Decoding (MVPA)

Design philosophy

Decoding (a.k.a. MVPA) in MNE largely follows the machine learning API of the scikit-learn package. Each estimator implements fit, transform, fit_transform, and (optionally) inverse_transform methods. For more details on this design, visit scikit-learn. For additional theoretical insights into the decoding framework in MNE, see 1.

For ease of comprehension, we will denote instantiations of the class using the same name as the class but in small caps instead of camel cases.

Let’s start by loading data for a simple two-class problem:

import numpy as np
import matplotlib.pyplot as plt

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

import mne
from mne.datasets import sample
from mne.decoding import (SlidingEstimator, GeneralizingEstimator, Scaler,
                          cross_val_multiscore, LinearModel, get_coef,
                          Vectorizer, CSP)

data_path = sample.data_path()

raw_fname = data_path + '/MEG/sample/sample_audvis_raw.fif'
tmin, tmax = -0.200, 0.500
event_id = {'Auditory/Left': 1, 'Visual/Left': 3}  # just use two
raw = mne.io.read_raw_fif(raw_fname, preload=True)

# The subsequent decoding analyses only capture evoked responses, so we can
# low-pass the MEG data. Usually a value more like 40 Hz would be used,
# but here low-pass at 20 so we can more heavily decimate, and allow
# the examlpe to run faster. The 2 Hz high-pass helps improve CSP.
raw.filter(2, 20)
events = mne.find_events(raw, 'STI 014')

# Set up pick list: EEG + MEG - bad channels (modify to your needs)
raw.info['bads'] += ['MEG 2443', 'EEG 053']  # bads + 2 more

# Read epochs
epochs = mne.Epochs(raw, events, event_id, tmin, tmax, proj=True,
                    picks=('grad', 'eog'), baseline=(None, 0.), preload=True,
                    reject=dict(grad=4000e-13, eog=150e-6), decim=10)
epochs.pick_types(meg=True, exclude='bads')  # remove stim and EOG

X = epochs.get_data()  # MEG signals: n_epochs, n_meg_channels, n_times
y = epochs.events[:, 2]  # target: Audio left or right

Out:

Opening raw data file /home/circleci/mne_data/MNE-sample-data/MEG/sample/sample_audvis_raw.fif...
    Read a total of 3 projection items:
        PCA-v1 (1 x 102)  idle
        PCA-v2 (1 x 102)  idle
        PCA-v3 (1 x 102)  idle
    Range : 25800 ... 192599 =     42.956 ...   320.670 secs
Ready.
Current compensation grade : 0
Reading 0 ... 166799  =      0.000 ...   277.714 secs...
Filtering raw data in 1 contiguous segment
Setting up band-pass filter from 2 - 20 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal bandpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 2.00
- Lower transition bandwidth: 2.00 Hz (-6 dB cutoff frequency: 1.00 Hz)
- Upper passband edge: 20.00 Hz
- Upper transition bandwidth: 5.00 Hz (-6 dB cutoff frequency: 22.50 Hz)
- Filter length: 991 samples (1.650 sec)

320 events found
Event IDs: [ 1  2  3  4  5 32]
145 matching events found
Applying baseline correction (mode: mean)
Not setting metadata
3 projection items activated
Loading data for 145 events and 421 original time points ...
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
    Rejecting  epoch based on EOG : ['EOG 061']
22 bad epochs dropped

Transformation classes

Scaler

The mne.decoding.Scaler will standardize the data based on channel scales. In the simplest modes scalings=None or scalings=dict(...), each data channel type (e.g., mag, grad, eeg) is treated separately and scaled by a constant. This is the approach used by e.g., mne.compute_covariance() to standardize channel scales.

If scalings='mean' or scalings='median', each channel is scaled using empirical measures. Each channel is scaled independently by the mean and standand deviation, or median and interquartile range, respectively, across all epochs and time points during fit (during training). The transform() method is called to transform data (training or test set) by scaling all time points and epochs on a channel-by-channel basis. To perform both the fit and transform operations in a single call, the fit_transform() method may be used. To invert the transform, inverse_transform() can be used. For scalings='median', scikit-learn version 0.17+ is required.

Note

Using this class is different from directly applying sklearn.preprocessing.StandardScaler or sklearn.preprocessing.RobustScaler offered by scikit-learn. These scale each classification feature, e.g. each time point for each channel, with mean and standard deviation computed across epochs, whereas mne.decoding.Scaler scales each channel using mean and standard deviation computed across all of its time points and epochs.

Vectorizer

Scikit-learn API provides functionality to chain transformers and estimators by using sklearn.pipeline.Pipeline. We can construct decoding pipelines and perform cross-validation and grid-search. However scikit-learn transformers and estimators generally expect 2D data (n_samples * n_features), whereas MNE transformers typically output data with a higher dimensionality (e.g. n_samples * n_channels * n_frequencies * n_times). A Vectorizer therefore needs to be applied between the MNE and the scikit-learn steps like:

# Uses all MEG sensors and time points as separate classification
# features, so the resulting filters used are spatio-temporal
clf = make_pipeline(Scaler(epochs.info),
                    Vectorizer(),
                    LogisticRegression(solver='lbfgs'))

scores = cross_val_multiscore(clf, X, y, cv=5, n_jobs=1)

# Mean scores across cross-validation splits
score = np.mean(scores, axis=0)
print('Spatio-temporal: %0.1f%%' % (100 * score,))

Out:

Spatio-temporal: 99.2%

PSDEstimator

The mne.decoding.PSDEstimator computes the power spectral density (PSD) using the multitaper method. It takes a 3D array as input, converts it into 2D and computes the PSD.

FilterEstimator

The mne.decoding.FilterEstimator filters the 3D epochs data.

Spatial filters

Just like temporal filters, spatial filters provide weights to modify the data along the sensor dimension. They are popular in the BCI community because of their simplicity and ability to distinguish spatially-separated neural activity.

Common spatial pattern

mne.decoding.CSP is a technique to analyze multichannel data based on recordings from two classes 2 (see also https://en.wikipedia.org/wiki/Common_spatial_pattern).

Let \(X \in R^{C\times T}\) be a segment of data with \(C\) channels and \(T\) time points. The data at a single time point is denoted by \(x(t)\) such that \(X=[x(t), x(t+1), ..., x(t+T-1)]\). Common spatial pattern (CSP) finds a decomposition that projects the signal in the original sensor space to CSP space using the following transformation:

(1)\[x_{CSP}(t) = W^{T}x(t)\]

where each column of \(W \in R^{C\times C}\) is a spatial filter and each row of \(x_{CSP}\) is a CSP component. The matrix \(W\) is also called the de-mixing matrix in other contexts. Let \(\Sigma^{+} \in R^{C\times C}\) and \(\Sigma^{-} \in R^{C\times C}\) be the estimates of the covariance matrices of the two conditions. CSP analysis is given by the simultaneous diagonalization of the two covariance matrices

(2)\[W^{T}\Sigma^{+}W = \lambda^{+}\]
(3)\[W^{T}\Sigma^{-}W = \lambda^{-}\]

where \(\lambda^{C}\) is a diagonal matrix whose entries are the eigenvalues of the following generalized eigenvalue problem

(4)\[\Sigma^{+}w = \lambda \Sigma^{-}w\]

Large entries in the diagonal matrix corresponds to a spatial filter which gives high variance in one class but low variance in the other. Thus, the filter facilitates discrimination between the two classes.

Note

The winning entry of the Grasp-and-lift EEG competition in Kaggle used the CSP implementation in MNE and was featured as a script of the week.

We can use CSP with these data with:

csp = CSP(n_components=3, norm_trace=False)
clf = make_pipeline(csp, LogisticRegression(solver='lbfgs'))
scores = cross_val_multiscore(clf, X, y, cv=5, n_jobs=1)
print('CSP: %0.1f%%' % (100 * scores.mean(),))

Out:

Computing data rank from raw with rank=None
    Using tolerance 4.4e-11 (2.2e-16 eps * 203 dim * 9.7e+02  max singular value)
    Estimated rank (mag): 203
    MAG: rank 203 computed from 203 data channels with 0 projectors
Reducing data rank from 203 -> 203
Estimating covariance using EMPIRICAL
Done.
Computing data rank from raw with rank=None
    Using tolerance 5.2e-11 (2.2e-16 eps * 203 dim * 1.1e+03  max singular value)
    Estimated rank (mag): 203
    MAG: rank 203 computed from 203 data channels with 0 projectors
Reducing data rank from 203 -> 203
Estimating covariance using EMPIRICAL
Done.
Computing data rank from raw with rank=None
    Using tolerance 4.2e-11 (2.2e-16 eps * 203 dim * 9.3e+02  max singular value)
    Estimated rank (mag): 203
    MAG: rank 203 computed from 203 data channels with 0 projectors
Reducing data rank from 203 -> 203
Estimating covariance using EMPIRICAL
Done.
Computing data rank from raw with rank=None
    Using tolerance 5.2e-11 (2.2e-16 eps * 203 dim * 1.2e+03  max singular value)
    Estimated rank (mag): 203
    MAG: rank 203 computed from 203 data channels with 0 projectors
Reducing data rank from 203 -> 203
Estimating covariance using EMPIRICAL
Done.
Computing data rank from raw with rank=None
    Using tolerance 4.2e-11 (2.2e-16 eps * 203 dim * 9.4e+02  max singular value)
    Estimated rank (mag): 203
    MAG: rank 203 computed from 203 data channels with 0 projectors
Reducing data rank from 203 -> 203
Estimating covariance using EMPIRICAL
Done.
Computing data rank from raw with rank=None
    Using tolerance 5.2e-11 (2.2e-16 eps * 203 dim * 1.1e+03  max singular value)
    Estimated rank (mag): 203
    MAG: rank 203 computed from 203 data channels with 0 projectors
Reducing data rank from 203 -> 203
Estimating covariance using EMPIRICAL
Done.
Computing data rank from raw with rank=None
    Using tolerance 4.2e-11 (2.2e-16 eps * 203 dim * 9.4e+02  max singular value)
    Estimated rank (mag): 203
    MAG: rank 203 computed from 203 data channels with 0 projectors
Reducing data rank from 203 -> 203
Estimating covariance using EMPIRICAL
Done.
Computing data rank from raw with rank=None
    Using tolerance 5e-11 (2.2e-16 eps * 203 dim * 1.1e+03  max singular value)
    Estimated rank (mag): 203
    MAG: rank 203 computed from 203 data channels with 0 projectors
Reducing data rank from 203 -> 203
Estimating covariance using EMPIRICAL
Done.
Computing data rank from raw with rank=None
    Using tolerance 4.2e-11 (2.2e-16 eps * 203 dim * 9.3e+02  max singular value)
    Estimated rank (mag): 203
    MAG: rank 203 computed from 203 data channels with 0 projectors
Reducing data rank from 203 -> 203
Estimating covariance using EMPIRICAL
Done.
Computing data rank from raw with rank=None
    Using tolerance 5.2e-11 (2.2e-16 eps * 203 dim * 1.1e+03  max singular value)
    Estimated rank (mag): 203
    MAG: rank 203 computed from 203 data channels with 0 projectors
Reducing data rank from 203 -> 203
Estimating covariance using EMPIRICAL
Done.
CSP: 87.0%

Source power comodulation (SPoC)

Source Power Comodulation (mne.decoding.SPoC) 3 identifies the composition of orthogonal spatial filters that maximally correlate with a continuous target.

SPoC can be seen as an extension of the CSP where the target is driven by a continuous variable rather than a discrete variable. Typical applications include extraction of motor patterns using EMG power or audio patterns using sound envelope.

xDAWN

mne.preprocessing.Xdawn is a spatial filtering method designed to improve the signal to signal + noise ratio (SSNR) of the ERP responses 4. Xdawn was originally designed for P300 evoked potential by enhancing the target response with respect to the non-target response. The implementation in MNE-Python is a generalization to any type of ERP.

Effect-matched spatial filtering

The result of mne.decoding.EMS is a spatial filter at each time point and a corresponding time course 5. Intuitively, the result gives the similarity between the filter at each time point and the data vector (sensors) at that time point.

Patterns vs. filters

When interpreting the components of the CSP (or spatial filters in general), it is often more intuitive to think about how \(x(t)\) is composed of the different CSP components \(x_{CSP}(t)\). In other words, we can rewrite Equation (1) as follows:

(5)\[x(t) = (W^{-1})^{T}x_{CSP}(t)\]

The columns of the matrix \((W^{-1})^T\) are called spatial patterns. This is also called the mixing matrix. The example Linear classifier on sensor data with plot patterns and filters discusses the difference between patterns and filters.

These can be plotted with:

# Fit CSP on full data and plot
csp.fit(X, y)
csp.plot_patterns(epochs.info)
csp.plot_filters(epochs.info, scalings=1e-9)
  • ../../_images/sphx_glr_plot_sensors_decoding_001.png
  • ../../_images/sphx_glr_plot_sensors_decoding_002.png

Out:

Computing data rank from raw with rank=None
    Using tolerance 4.8e-11 (2.2e-16 eps * 203 dim * 1.1e+03  max singular value)
    Estimated rank (mag): 203
    MAG: rank 203 computed from 203 data channels with 0 projectors
Reducing data rank from 203 -> 203
Estimating covariance using EMPIRICAL
Done.
Computing data rank from raw with rank=None
    Using tolerance 5.7e-11 (2.2e-16 eps * 203 dim * 1.3e+03  max singular value)
    Estimated rank (mag): 203
    MAG: rank 203 computed from 203 data channels with 0 projectors
Reducing data rank from 203 -> 203
Estimating covariance using EMPIRICAL
Done.

Decoding over time

This strategy consists in fitting a multivariate predictive model on each time instant and evaluating its performance at the same instant on new epochs. The mne.decoding.SlidingEstimator will take as input a pair of features \(X\) and targets \(y\), where \(X\) has more than 2 dimensions. For decoding over time the data \(X\) is the epochs data of shape n_epochs x n_channels x n_times. As the last dimension of \(X\) is the time, an estimator will be fit on every time instant.

This approach is analogous to SlidingEstimator-based approaches in fMRI, where here we are interested in when one can discriminate experimental conditions and therefore figure out when the effect of interest happens.

When working with linear models as estimators, this approach boils down to estimating a discriminative spatial filter for each time instant.

Temporal decoding

We’ll use a Logistic Regression for a binary classification as machine learning model.

# We will train the classifier on all left visual vs auditory trials on MEG

clf = make_pipeline(StandardScaler(), LogisticRegression(solver='lbfgs'))

time_decod = SlidingEstimator(clf, n_jobs=1, scoring='roc_auc', verbose=True)
scores = cross_val_multiscore(time_decod, X, y, cv=5, n_jobs=1)

# Mean scores across cross-validation splits
scores = np.mean(scores, axis=0)

# Plot
fig, ax = plt.subplots()
ax.plot(epochs.times, scores, label='score')
ax.axhline(.5, color='k', linestyle='--', label='chance')
ax.set_xlabel('Times')
ax.set_ylabel('AUC')  # Area Under the Curve
ax.legend()
ax.axvline(.0, color='k', linestyle='-')
ax.set_title('Sensor space decoding')
../../_images/sphx_glr_plot_sensors_decoding_003.png

You can retrieve the spatial filters and spatial patterns if you explicitly use a LinearModel

clf = make_pipeline(StandardScaler(),
                    LinearModel(LogisticRegression(solver='lbfgs')))
time_decod = SlidingEstimator(clf, n_jobs=1, scoring='roc_auc', verbose=True)
time_decod.fit(X, y)

coef = get_coef(time_decod, 'patterns_', inverse_transform=True)
evoked = mne.EvokedArray(coef, epochs.info, tmin=epochs.times[0])
joint_kwargs = dict(ts_args=dict(time_unit='s'),
                    topomap_args=dict(time_unit='s'))
evoked.plot_joint(times=np.arange(0., .500, .100), title='patterns',
                  **joint_kwargs)
../../_images/sphx_glr_plot_sensors_decoding_004.png

Temporal generalization

Temporal generalization is an extension of the decoding over time approach. It consists in evaluating whether the model estimated at a particular time instant accurately predicts any other time instant. It is analogous to transferring a trained model to a distinct learning problem, where the problems correspond to decoding the patterns of brain activity recorded at distinct time instants.

The object to for Temporal generalization is mne.decoding.GeneralizingEstimator. It expects as input \(X\) and \(y\) (similarly to SlidingEstimator) but generates predictions from each model for all time instants. The class GeneralizingEstimator is generic and will treat the last dimension as the one to be used for generalization testing. For convenience, here, we refer to it as different tasks. If \(X\) corresponds to epochs data then the last dimension is time.

This runs the analysis used in 6 and further detailed in 7:

# define the Temporal generalization object
time_gen = GeneralizingEstimator(clf, n_jobs=1, scoring='roc_auc',
                                 verbose=True)

scores = cross_val_multiscore(time_gen, X, y, cv=5, n_jobs=1)

# Mean scores across cross-validation splits
scores = np.mean(scores, axis=0)

# Plot the diagonal (it's exactly the same as the time-by-time decoding above)
fig, ax = plt.subplots()
ax.plot(epochs.times, np.diag(scores), label='score')
ax.axhline(.5, color='k', linestyle='--', label='chance')
ax.set_xlabel('Times')
ax.set_ylabel('AUC')
ax.legend()
ax.axvline(.0, color='k', linestyle='-')
ax.set_title('Decoding MEG sensors over time')
../../_images/sphx_glr_plot_sensors_decoding_005.png

Plot the full (generalization) matrix:

fig, ax = plt.subplots(1, 1)
im = ax.imshow(scores, interpolation='lanczos', origin='lower', cmap='RdBu_r',
               extent=epochs.times[[0, -1, 0, -1]], vmin=0., vmax=1.)
ax.set_xlabel('Testing Time (s)')
ax.set_ylabel('Training Time (s)')
ax.set_title('Temporal generalization')
ax.axvline(0, color='k')
ax.axhline(0, color='k')
plt.colorbar(im, ax=ax)
../../_images/sphx_glr_plot_sensors_decoding_006.png

Source-space decoding

Source space decoding is also possible, but because the number of features can be much larger than in the sensor space, univariate feature selection using ANOVA f-test (or some other metric) can be done to reduce the feature dimension. Interpreting decoding results might be easier in source space as compared to sensor space.

Exercise

  • Explore other datasets from MNE (e.g. Face dataset from SPM to predict Face vs. Scrambled)

References

1

Jean-Rémi King et al. (2018) “Encoding and Decoding Neuronal Dynamics: Methodological Framework to Uncover the Algorithms of Cognition”, 2018. The Cognitive Neurosciences VI. https://hal.archives-ouvertes.fr/hal-01848442/

2

Zoltan J. Koles. The quantitative extraction and topographic mapping of the abnormal components in the clinical EEG. Electroencephalography and Clinical Neurophysiology, 79(6):440–447, December 1991.

3

Dahne, S., Meinecke, F. C., Haufe, S., Hohne, J., Tangermann, M., Muller, K. R., & Nikulin, V. V. (2014). SPoC: a novel framework for relating the amplitude of neuronal oscillations to behaviorally relevant parameters. NeuroImage, 86, 111-122.

4

Rivet, B., Souloumiac, A., Attina, V., & Gibert, G. (2009). xDAWN algorithm to enhance evoked potentials: application to brain-computer interface. Biomedical Engineering, IEEE Transactions on, 56(8), 2035-2043.

5

Aaron Schurger, Sebastien Marti, and Stanislas Dehaene, “Reducing multi-sensor data to a single time course that reveals experimental effects”, BMC Neuroscience 2013, 14:122

6

Jean-Remi King, Alexandre Gramfort, Aaron Schurger, Lionel Naccache and Stanislas Dehaene, “Two distinct dynamic modes subtend the detection of unexpected sounds”, PLOS ONE, 2013, https://www.ncbi.nlm.nih.gov/pubmed/24475052

7

King & Dehaene (2014) ‘Characterizing the dynamics of mental representations: the temporal generalization method’, Trends In Cognitive Sciences, 18(4), 203-210. https://www.ncbi.nlm.nih.gov/pubmed/24593982

Total running time of the script: ( 0 minutes 25.064 seconds)

Estimated memory usage: 487 MB

Gallery generated by Sphinx-Gallery