Working with Epoch metadata#

This tutorial shows how to add metadata to Epochs objects, and how to use Pandas query strings to select and plot epochs based on metadata properties.

For this tutorial we’ll use a different dataset than usual: the Kiloword dataset, which contains EEG data averaged across 75 subjects who were performing a lexical decision (word/non-word) task. The data is in Epochs format, with each epoch representing the response to a different stimulus (word). As usual we’ll start by importing the modules we need and loading the data:

import numpy as np
import pandas as pd
import mne

kiloword_data_folder = mne.datasets.kiloword.data_path()
kiloword_data_file = kiloword_data_folder / "kword_metadata-epo.fif"
epochs = mne.read_epochs(kiloword_data_file)

Reading /home/circleci/mne_data/MNE-kiloword-data/kword_metadata-epo.fif ...
Isotrak not found
    Found the data of interest:
        t =    -100.00 ...     920.00 ms
        0 CTF compensation matrices available
Adding metadata with 8 columns
960 matching events found
No baseline correction applied
0 projection items activated

Viewing `Epochs` metadata#

Restrictions on metadata DataFrames

Metadata DataFrames are less flexible than typical Pandas DataFrames. For example, the allowed data types are restricted to strings, floats, integers, or booleans, and the row labels are always integers corresponding to epoch numbers. Other capabilities of DataFrames, such as hierarchical indexing, are possible while the Epochs object is in memory, but will not survive saving the Epochs object to disk and reloading it.
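
As a rough illustration of the data-type restriction, the sketch below flags DataFrame columns whose dtype falls outside strings, floats, integers, and booleans. The helper name unsupported_columns and the Presented column (with made-up dates) are hypothetical and are not part of MNE-Python; this is not how MNE itself validates metadata.

```python
import pandas as pd
from pandas.api.types import (
    is_bool_dtype,
    is_float_dtype,
    is_integer_dtype,
    is_string_dtype,
)


def unsupported_columns(df):
    """Return names of columns whose dtype is not str, float, int, or bool.

    Hypothetical helper for illustration only; not part of MNE-Python.
    """
    checks = (is_bool_dtype, is_integer_dtype, is_float_dtype, is_string_dtype)
    return [
        name
        for name, column in df.items()
        if not any(check(column) for check in checks)
    ]


candidate = pd.DataFrame(
    {
        "WORD": ["film", "cent"],
        "NumberOfLetters": [4, 4],
        "Concreteness": [5.45, 5.90],
        "HighComplexity": [False, False],
        # Made-up datetime values, included only to trigger the check.
        "Presented": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    }
)

flagged = unsupported_columns(candidate)
print(flagged)
```

Running a check like this before assigning metadata is one way to spot columns that may need converting (for example, datetimes to strings or floats) before the Epochs object is saved.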

The metadata attached to Epochs objects is stored as a pandas.DataFrame containing one row for each epoch. The columns of this DataFrame can contain just about any information you want to store about each epoch; in this case, the metadata encodes information about the stimulus seen on each trial, including properties of the visual word form itself (e.g., NumberOfLetters, VisualComplexity), properties of the word’s meaning (e.g., its Concreteness), and its prominence in the English lexicon (e.g., WordFrequency). Here are all the variables; note that in a Jupyter notebook, a pandas.DataFrame is rendered as an HTML table instead of the normal Python output block:

epochs.metadata

	WORD	Concreteness	WordFrequency	OrthographicDistance	NumberOfLetters	BigramFrequency	ConsonantVowelProportion	VisualComplexity
0	film	5.450000	3.189490	1.75	4.0	343.250	0.750	55.783710
1	cent	5.900000	3.700704	1.35	4.0	546.750	0.750	63.141553
2	shot	4.600000	2.858537	1.20	4.0	484.750	0.750	64.600033
3	cold	3.700000	3.454540	1.15	4.0	1095.250	0.750	63.657457
4	main	3.000000	3.539076	1.35	4.0	686.000	0.500	68.945661
...	...	...	...	...	...	...	...	...
955	drudgery	3.473684	1.556303	2.95	8.0	486.125	0.625	69.732357
956	reversal	3.700000	1.991226	2.65	8.0	859.000	0.625	60.545879
957	billiard	5.500000	1.672098	2.90	8.0	528.875	0.625	55.838597
958	adherent	3.450000	0.698970	2.55	8.0	615.625	0.625	68.088112
959	solenoid	4.111111	0.301030	3.70	8.0	443.250	0.500	64.544507

960 rows × 8 columns

Viewing the metadata values for a given epoch and metadata variable is done using any of the Pandas indexing methods such as loc, iloc, at, and iat. Because the index of the dataframe is the integer epoch number, the name- and index-based selection methods will work similarly for selecting rows, except that name-based selection (with loc) is inclusive of the endpoint:

print("Name-based selection with .loc")
print(epochs.metadata.loc[2:4])

print("\nIndex-based selection with .iloc")
print(epochs.metadata.iloc[2:4])

Name-based selection with .loc
   WORD  ...  VisualComplexity
2  shot  ...         64.600033
3  cold  ...         63.657457
4  main  ...         68.945661

[3 rows x 8 columns]

Index-based selection with .iloc
   WORD  ...  VisualComplexity
2  shot  ...         64.600033
3  cold  ...         63.657457

[2 rows x 8 columns]
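
The at and iat accessors mentioned above return a single value rather than a block of rows. The sketch below uses a small stand-in DataFrame (named toy_metadata here, built from the first five rows shown earlier) so that it runs without downloading the dataset; with real data you would index epochs.metadata in the same way.

```python
import pandas as pd

# Stand-in for epochs.metadata, copied from the first five rows of the table.
toy_metadata = pd.DataFrame(
    {
        "WORD": ["film", "cent", "shot", "cold", "main"],
        "Concreteness": [5.45, 5.90, 4.60, 3.70, 3.00],
        "NumberOfLetters": [4.0, 4.0, 4.0, 4.0, 4.0],
    }
)

# Scalar access by row label and column name.
word_by_label = toy_metadata.at[2, "WORD"]

# Scalar access by integer row position and column position.
word_by_position = toy_metadata.iat[2, 0]

# Slices: .loc includes the endpoint, .iloc excludes it.
rows_loc = toy_metadata.loc[2:4]
rows_iloc = toy_metadata.iloc[2:4]

print(word_by_label, word_by_position, len(rows_loc), len(rows_iloc))
```

Here the row labels and row positions coincide because the index is the default integer range; after subselecting epochs they can differ, which is when the choice between the label-based and position-based accessors matters.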

Modifying the metadata#

As with any pandas.DataFrame, you can modify the metadata or add columns as needed. Here we convert the NumberOfLetters column from float to integer data type, and add a boolean column that arbitrarily divides the variable VisualComplexity into high and low groups.

epochs.metadata["NumberOfLetters"] = epochs.metadata["NumberOfLetters"].map(int)

epochs.metadata["HighComplexity"] = epochs.metadata["VisualComplexity"] > 65
epochs.metadata.head()

	WORD	Concreteness	WordFrequency	OrthographicDistance	NumberOfLetters	BigramFrequency	ConsonantVowelProportion	VisualComplexity	HighComplexity
0	film	5.45	3.189490	1.75	4	343.25	0.75	55.783710	False
1	cent	5.90	3.700704	1.35	4	546.75	0.75	63.141553	False
2	shot	4.60	2.858537	1.20	4	484.75	0.75	64.600033	False
3	cold	3.70	3.454540	1.15	4	1095.25	0.75	63.657457	False
4	main	3.00	3.539076	1.35	4	686.00	0.50	68.945661	True
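
The threshold of 65 above is arbitrary. As a hedged alternative, the sketch below uses a median split instead, together with astype(int) for the type conversion. The DataFrame toy_metadata and the column name AboveMedianComplexity are stand-ins chosen for this example; the values are the first five rows of the table, so the median here is not the median of the full dataset.

```python
import pandas as pd

# Stand-in for epochs.metadata, copied from the first five rows of the table.
toy_metadata = pd.DataFrame(
    {
        "WORD": ["film", "cent", "shot", "cold", "main"],
        "NumberOfLetters": [4.0, 4.0, 4.0, 4.0, 4.0],
        "VisualComplexity": [55.783710, 63.141553, 64.600033, 63.657457, 68.945661],
    }
)

# Convert the float column to an integer dtype in one vectorized call.
toy_metadata["NumberOfLetters"] = toy_metadata["NumberOfLetters"].astype(int)

# Median split: True for rows strictly above the median of these five values.
median_complexity = toy_metadata["VisualComplexity"].median()
toy_metadata["AboveMedianComplexity"] = (
    toy_metadata["VisualComplexity"] > median_complexity
)

print(toy_metadata)
```

A median split yields groups of roughly equal size, which may be preferable when the two groups are going to be averaged and compared.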

Selecting epochs using metadata queries#

All Epochs objects can be subselected by event name, index, or slice (see Subselecting epochs). But Epochs objects with metadata can also be queried using Pandas query strings by passing the query string just as you would normally pass an event name. For example:

print(epochs['WORD.str.startswith("dis")'])

<EpochsFIF |  8 events (all good), -0.1 – 0.92 s, baseline off, ~499 kB, data loaded, with metadata,
 'district': 1
 'display': 1
 'disarray': 1
 'disaster': 1
 'disease': 1
 'discord': 1
 'disposal': 1
 'distance': 1>

This capability uses the pandas.DataFrame.query() method under the hood, so you can check out the documentation of that method to learn how to format query strings. Here’s another example:

print(epochs["Concreteness > 6 and WordFrequency < 1"])

<EpochsFIF |  4 events (all good), -0.1 – 0.92 s, baseline off, ~267 kB, data loaded, with metadata,
 'lasso': 1
 'tentacle': 1
 'banjo': 1
 'corsage': 1>
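
To experiment with query strings before applying them to an Epochs object, you can call pandas.DataFrame.query() directly on a DataFrame. The sketch below does this on a stand-in DataFrame (toy_metadata, built from ten rows of the table shown earlier); engine="python" is passed explicitly here so that string methods such as str.startswith are evaluated by Python.

```python
import pandas as pd

# Stand-in for epochs.metadata, copied from rows of the table shown earlier.
toy_metadata = pd.DataFrame(
    {
        "WORD": [
            "film", "cent", "shot", "cold", "main",
            "drudgery", "reversal", "billiard", "adherent", "solenoid",
        ],
        "Concreteness": [
            5.45, 5.90, 4.60, 3.70, 3.00,
            3.473684, 3.70, 5.50, 3.45, 4.111111,
        ],
        "WordFrequency": [
            3.189490, 3.700704, 2.858537, 3.454540, 3.539076,
            1.556303, 1.991226, 1.672098, 0.698970, 0.301030,
        ],
    }
)

# String-method query: words beginning with "s".
starts_with_s = toy_metadata.query(
    'WORD.str.startswith("s")', engine="python"
)["WORD"].tolist()

# Compound numeric query: concrete but infrequent words.
concrete_rare = toy_metadata.query(
    "Concreteness > 5 and WordFrequency < 2", engine="python"
)["WORD"].tolist()

print(starts_with_s)
print(concrete_rare)
```

A query string that selects the expected rows here can then be passed unchanged as epochs["..."].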

Note also that traditional epochs subselection by condition name still works; MNE-Python will try the traditional method first before falling back on rich metadata querying.

epochs["solenoid"].compute_psd().plot(picks="data", exclude="bads")

Using multitaper spectrum estimation with 7 DPSS windows

One use of the Pandas query string approach is to select specific words for plotting:

words = ["typhoon", "bungalow", "colossus", "drudgery", "linguist", "solenoid"]
epochs["WORD in {}".format(words)].plot(n_channels=29)
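
# The block below is a standalone sketch (not part of the tutorial script) that
# shows what the formatted query string looks like: str.format() inserts the
# list's repr, so the query contains a literal Python list. toy_metadata is a
# stand-in for epochs.metadata using four words from the table shown earlier.

```python
import pandas as pd

toy_metadata = pd.DataFrame({"WORD": ["film", "drudgery", "billiard", "solenoid"]})

words = ["drudgery", "solenoid"]

# Formatting a list into the string inserts its repr, quotes included.
query_string = "WORD in {}".format(words)
print(query_string)

# The same string can be checked against a plain DataFrame.
selected = toy_metadata.query(query_string, engine="python")["WORD"].tolist()
print(selected)
```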

Notice that in this dataset, each “condition” (i.e., each word) occurs only once, whereas with the Sample dataset each condition (e.g., “auditory/left”, “visual/right”, etc.) occurred dozens of times. This makes the Pandas querying methods especially useful when you want to aggregate epochs that have different condition names but that share similar stimulus properties. For example, here we group epochs based on the number of letters in the stimulus word, and compare the average signal at electrode Pz for each group:

evokeds = dict()
query = "NumberOfLetters == {}"
for n_letters in epochs.metadata["NumberOfLetters"].unique():
    evokeds[str(n_letters)] = epochs[query.format(n_letters)].average()

mne.viz.plot_compare_evokeds(evokeds, cmap=("word length", "viridis"), picks="Pz")

NOTE: pick_channels() is a legacy function. New code should use inst.pick(...).
NOTE: pick_channels() is a legacy function. New code should use inst.pick(...).
NOTE: pick_channels() is a legacy function. New code should use inst.pick(...).
NOTE: pick_channels() is a legacy function. New code should use inst.pick(...).
NOTE: pick_channels() is a legacy function. New code should use inst.pick(...).
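
Before averaging, it can help to check how many epochs fall into each group, since groups with few epochs give noisier averages. The sketch below mirrors the loop above on a stand-in DataFrame (toy_metadata, built from ten rows of the table shown earlier) and counts matches per query instead of averaging; the groupby call is shown as an equivalent pandas shortcut.

```python
import pandas as pd

# Stand-in for epochs.metadata after converting NumberOfLetters to integers.
toy_metadata = pd.DataFrame(
    {
        "WORD": [
            "film", "cent", "shot", "cold", "main",
            "drudgery", "reversal", "billiard", "adherent", "solenoid",
        ],
        "NumberOfLetters": [4, 4, 4, 4, 4, 8, 8, 8, 8, 8],
    }
)

# Same query template as the tutorial loop, but counting rows per group.
counts = dict()
query = "NumberOfLetters == {}"
for n_letters in toy_metadata["NumberOfLetters"].unique():
    matching = toy_metadata.query(query.format(n_letters), engine="python")
    counts[str(n_letters)] = len(matching)

# Equivalent count using groupby.
group_sizes = toy_metadata.groupby("NumberOfLetters").size().to_dict()

print(counts)
print(group_sizes)
```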

Metadata can also be useful for sorting the epochs in an image plot. For example, here we order the epochs based on word frequency to see if there’s a pattern to the latency or intensity of the response:

sort_order = np.argsort(epochs.metadata["WordFrequency"])
epochs.plot_image(order=sort_order, picks="Pz")

Not setting metadata
960 matching events found
No baseline correction applied
0 projection items activated

Although there’s no obvious relationship in this case, such analyses may be useful for metadata variables that more directly index the time course of stimulus processing (such as reaction time).
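
To make the ordering step concrete, the sketch below applies np.argsort to the WordFrequency values of the first five rows of the table. The result is an array of row positions, lowest frequency first, which is the form that the order parameter receives above. The variable names are chosen for this example only.

```python
import numpy as np
import pandas as pd

# First five rows of the metadata table shown earlier.
toy_metadata = pd.DataFrame(
    {
        "WORD": ["film", "cent", "shot", "cold", "main"],
        "WordFrequency": [3.189490, 3.700704, 2.858537, 3.454540, 3.539076],
    }
)

# Positions that would sort the column in ascending order.
sort_order = np.argsort(toy_metadata["WordFrequency"].to_numpy())

# Reverse the array to get a descending order instead.
descending_order = sort_order[::-1]

# Words arranged from least to most frequent.
words_ascending = toy_metadata["WORD"].to_numpy()[sort_order].tolist()

print(sort_order, words_ascending)
```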

Adding metadata to an `Epochs` object#

You can add a metadata DataFrame to any Epochs object (or replace existing metadata) simply by assigning to the metadata attribute:

new_metadata = pd.DataFrame(
    data=["foo"] * len(epochs), columns=["bar"], index=range(len(epochs))
)
epochs.metadata = new_metadata
epochs.metadata.head()

Replacing existing metadata with 1 columns

	bar
0	foo
1	foo
2	foo
3	foo
4	foo

You can remove metadata from an Epochs object by setting its metadata to None:

epochs.metadata = None

Removing existing metadata
