Using cross-validation when computing RDMs#

This example demonstrates how to perform cross-validation when computing dissimilarity matrices (RDMs). When the data has repeated measurements of the same stimulus type, cross-validation can be used to provide much more robust distance estimates between stimulus types. Repeated measurements can for example be actual repetitions of the same stimulus within the same recording, or recordings on multiple volunteers with the same stimuli.

The dataset will be the kiloword dataset [1]: approximately 1,000 words were presented to 75 participants in a go/no-go lexical decision task while event-related potentials (ERPs) were recorded.

This dataset as provided does not have repeated measurements of the same stimuli. To illustrate cross-validation, we will treat words with the same number of letters as being repeated measurements of the same stimulus type.

Authors:
Marijn van Vliet <marijn.vanvliet@aalto.fi>
# Import required packages
import mne
import mne_rsa

MNE-Python contains a built-in data loader for the kiloword dataset. We use it here to read it as 960 epochs. Each epoch represents the brain response to a single word, averaged across all the participants. For this example, we speed up the computation, at a cost of temporal precision, by downsampling the data from the original 250 Hz. to 100 Hz.

data_path = mne.datasets.kiloword.data_path(verbose=True)
epochs = mne.read_epochs(data_path / "kword_metadata-epo.fif")
epochs = epochs.resample(100)
Reading /home/runner/mne_data/MNE-kiloword-data/kword_metadata-epo.fif ...
Isotrak not found
    Found the data of interest:
        t =    -100.00 ...     920.00 ms
        0 CTF compensation matrices available
Adding metadata with 8 columns
960 matching events found
No baseline correction applied
0 projection items activated

The epochs object contains a .metadata field that contains information about the 960 words that were used in the experiment. Let’s have a look at the metadata for the 10 random words:

epochs.metadata.sample(10)
WORD Concreteness WordFrequency OrthographicDistance NumberOfLetters BigramFrequency ConsonantVowelProportion VisualComplexity
142 nebula 4.368421 1.623249 2.85 6.0 266.166667 0.500000 67.767866
667 lunatic 4.000000 1.690196 2.95 7.0 502.142857 0.571429 56.254641
34 beer 6.050000 2.920645 1.30 4.0 562.750000 0.500000 71.383193
646 conduct 2.850000 2.445604 2.35 7.0 956.714286 0.714286 64.737597
787 frame 5.750000 2.646404 1.75 5.0 582.400000 0.600000 68.284290
776 plane 5.750000 2.911690 1.40 5.0 637.200000 0.600000 68.576867
781 scale 5.100000 3.118595 1.55 5.0 467.200000 0.600000 65.258107
337 spring 4.100000 3.061075 1.55 6.0 574.500000 0.833333 65.787796
552 stern 3.650000 2.190332 1.90 5.0 549.800000 0.800000 60.215695
953 antidote 4.850000 1.544068 3.30 8.0 461.750000 0.500000 63.860049


The kiloword dataset as provided does not have repeated measurements of the same stimuli. To illustrate cross-validation, we will treat words with the same number of letters as being repeated measurements of the same stimulus type.

To denote which epochs are repetitions of the same stimulus, we create a list labels that contains a label for each epoch indicating to which stimulus it belongs. Repetitions of the same stimulus need to have the same label, hence we will use the NumberOfLetters field of the metadata as label.

labels = epochs.metadata.NumberOfLetters.astype(int)

Many high-level functions in the MNE-RSA module can take the y list as a parameter to enable cross-validation. Notably the functions for performing RSA and computing RDMs. In this example, we will restrict the analysis to computing RDMs using a spatio-temporal searchlight on the sensor-level data.

rdms = mne_rsa.rdm_epochs(
    epochs,  # The EEG data
    labels=labels,  # Set labels to enable cross validation
    n_folds=5,  # Number of folds to use during cross validation
    dist_metric="sqeuclidean",  # Distance metric to compute the RDMs
    spatial_radius=0.45,  # Spatial radius of the searchlight patch in meters.
    temporal_radius=0.05,  # Temporal radius of the searchlight path in seconds.
    tmin=0.15,
    tmax=0.25,
    n_jobs=1,  # Use this to specify the number of CPU cores to use.
)  # To save time, only analyze this time interval

Plotting the cross-validated RDMs

mne_rsa.plot_rdms_topo(rdms, epochs.info)
Time point: 0
Creating spatio-temporal searchlight patches

<Figure size 640x480 with 29 Axes>

For performance reasons, the low-level functions of MNE-RSA do not take a y list for cross-validation. Instead, they require the data to be already split into folds. The mne_rsa.create_folds() function can create these folds.

X = epochs.get_data()
y = epochs.metadata.NumberOfLetters.astype(int)
folds = mne_rsa.create_folds(X, y, n_folds=5)

rdm = mne_rsa.compute_rdm_cv(folds, metric="euclidean")
mne_rsa.plot_rdms(rdm)
plot cross validation
<Figure size 200x200 with 2 Axes>

Total running time of the script: (0 minutes 2.687 seconds)

Gallery generated by Sphinx-Gallery