# Export epochs to Pandas DataFrame

In this example the pandas exporter will be used to produce a DataFrame object. After exploring some basic features, a split-apply-combine workflow will be conducted to examine the latencies of the response maxima across epochs and conditions.

Note

Equivalent methods are available for raw and evoked data objects.

## Short Pandas Primer

### Pandas Data Frames

A data frame can be thought of as a combination of matrix, list and dict: it supports linear algebra and element-wise operations, but is size mutable and allows for labeled access to its data. In addition, the pandas data frame class provides many useful methods for restructuring, reshaping and visualizing data. As most methods return data frame instances, operations can be chained with ease, which makes it possible to write concise one-liners. Technically, a DataFrame can be seen as a high-level container for numpy arrays, so switching back and forth between numpy arrays and DataFrames is very easy. Taken together, these features make data frames well suited for interoperation with databases and for interactive data exploration and analysis. Additionally, pandas interfaces with the R statistical computing language, which covers a huge amount of statistical functionality.
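The properties listed above can be tried out on a small table. The following is a minimal sketch on made-up data; the frame `toy` and the channel names `ch_a` and `ch_b` are hypothetical and not part of the sample dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 samples from 2 made-up channels.
data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0],
                 [7.0, 8.0]])
toy = pd.DataFrame(data, columns=['ch_a', 'ch_b'])

# Element-wise arithmetic works as with numpy arrays.
doubled = toy * 2

# Labeled access: select a column by name, like a dict entry.
ch_a = toy['ch_a']

# Size mutable: add a new column in place.
toy['ch_sum'] = toy['ch_a'] + toy['ch_b']

# Most methods return new objects, so calls can be chained:
# subtract the column means, take absolute values, find the maximum.
centered_max = toy.sub(toy.mean()).abs().max()

# Round trip back to a plain numpy array.
back = toy[['ch_a', 'ch_b']].to_numpy()
```

The chained expression reads left to right, and each step returns a new object rather than modifying `toy`.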

### Export Options

The pandas exporter comes with a few options worth commenting on.

Pandas DataFrame objects use a so-called hierarchical index. This can be thought of as an array of unique tuples which, in our case, represents the higher-dimensional MEG data in a 2D data table. The column names are the channel names from the epochs object. The channels can be accessed like entries of a dictionary:

>>> df['MEG 2333']


Epochs and time slices can be accessed with the .loc indexer:

>>> df.loc[(1, 2), 'MEG 2333']


However, it is also possible to include this index as regular categorical data columns, which yields a long table format typically used for repeated measures designs. To take control of this feature on export, you can specify which of the three dimensions ‘condition’, ‘epoch’ and ‘time’ are passed to the Pandas index using the index parameter. Note that this decision can be reverted at any time, as demonstrated below.

Similarly, for convenience, it is possible to scale the times, e.g. from seconds to milliseconds.
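The index handling described in this section can be illustrated without MEG data. The sketch below builds a small frame with an ('epoch', 'time') hierarchical index by hand; the frame `toy` and the channel name `ch_a` are hypothetical, and the snippet only shows generic pandas behavior, not what the exporter does internally.

```python
import numpy as np
import pandas as pd

# Hypothetical toy layout: 2 epochs x 3 time points, one made-up channel.
index = pd.MultiIndex.from_product([[0, 1], [-100, 0, 100]],
                                   names=['epoch', 'time'])
toy = pd.DataFrame({'ch_a': np.arange(6.0)}, index=index)

# An (epoch, time) tuple addresses one row; the second argument
# picks the column.
value = toy.loc[(1, 0), 'ch_a']

# Moving the index levels into regular columns gives a long table ...
flat = toy.reset_index()

# ... and set_index reverts that decision at any time.
restored = flat.set_index(['epoch', 'time'])
```

Here `flat` has the columns 'epoch', 'time' and 'ch_a', while `restored` is equal to the original `toy`.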

### Some Instance Methods

Most numpy methods and many ufuncs can be found as instance methods, e.g. mean, median, var, std, mul, max, argmax etc. Below is an incomplete listing of additional useful data frame instance methods:

apply : apply function to data.

Any kind of custom function can be applied to the data. In combination with lambda this can be very useful.

describe : quickly generate summary stats.

Very useful for exploring data.

groupby : generate subgroups and initialize a ‘split-apply-combine’ operation.

Creates a group object. Subsequently, methods like apply, agg, or transform can be used to manipulate the underlying data separately but simultaneously. Finally, reset_index can be used to combine the results back into a data frame.

plot : wrapper around plt.plot.

However, it comes with some special options. For examples see below.

shape : shape attribute.

Gets the dimensions of the data frame.

values :

return underlying numpy array.

to_records :

export data as numpy record array.

to_dict :

export data as dict of arrays.
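The methods listed above can be tried on a small table. This is a minimal sketch on made-up data; the frame `toy`, the condition labels and the channel name `ch_a` are hypothetical. Note that `to_dict` returns nested dicts by default, so `orient='list'` is passed here to get one list per column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table: two conditions, one made-up channel.
toy = pd.DataFrame({
    'condition': ['aud', 'aud', 'vis', 'vis'],
    'ch_a': [1.0, 3.0, 2.0, 6.0],
})

# apply: run a custom function on every column (here, the range).
ranges = toy[['ch_a']].apply(lambda col: col.max() - col.min())

# describe: summary statistics (count, mean, std, quartiles, ...).
summary = toy['ch_a'].describe()

# groupby + agg + reset_index: split by condition, average, recombine.
means = toy.groupby('condition')['ch_a'].agg('mean').reset_index()

# shape, values, to_records and to_dict expose the data in other forms.
n_rows, n_cols = toy.shape
array = toy[['ch_a']].values
records = toy.to_records(index=False)
as_dict = toy.to_dict(orient='list')
```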

# Author: Denis Engemann <denis.engemann@gmail.com>
#

import mne
import matplotlib.pyplot as plt
import numpy as np
from mne.datasets import sample

print(__doc__)

data_path = sample.data_path()
raw_fname = data_path + '/MEG/sample/sample_audvis_filt-0-40_raw.fif'
event_fname = data_path + '/MEG/sample/sample_audvis_filt-0-40_raw-eve.fif'

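# These data already have an average EEG ref applied
raw = mne.io.read_raw_fif(raw_fname)

# For simplicity we will only consider the first 10 epochs
events = mne.read_events(event_fname)[:10]

# Pick gradiometers, EEG and EOG channels
picks = mne.pick_types(raw.info, meg='grad', eeg=True, eog=True)

tmin, tmax = -0.2, 0.5
baseline = (None, 0)

event_id = dict(auditory_l=1, auditory_r=2, visual_l=3, visual_r=4)

# NOTE: the reject thresholds are assumed values; the log below only shows
# that EOG-based rejection was active.
reject = dict(grad=4000e-13, eog=150e-6)
epochs = mne.Epochs(raw, events, event_id, tmin, tmax, proj=True, picks=picks,
                    baseline=baseline, preload=True, reject=reject)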

Out:

Opening raw data file /home/circleci/mne_data/MNE-sample-data/MEG/sample/sample_audvis_filt-0-40_raw.fif...
Read a total of 4 projection items:
PCA-v1 (1 x 102)  idle
PCA-v2 (1 x 102)  idle
PCA-v3 (1 x 102)  idle
Average EEG reference (1 x 60)  idle
Range : 6450 ... 48149 =     42.956 ...   320.665 secs
10 matching events found
Applying baseline correction (mode: mean)
Created an SSP operator (subspace dimension = 1)
4 projection items activated
Rejecting  epoch based on EOG : ['EOG 061']


Export DataFrame

# The following parameters scale the channels and times to plotting-friendly
# units. The info columns 'epoch' and 'time' will be used as hierarchical
# index whereas the condition is treated as categorical data. Note that
# this is optional. By passing None you could also print out all nesting
# factors in a long table style commonly used for analyzing repeated measure
# designs.

index, scaling_time, scalings = ['epoch', 'time'], 1e3, dict(grad=1e13)

df = epochs.to_data_frame(picks=None, scalings=scalings,
                          scaling_time=scaling_time, index=index)

# Create MEG channel selector and drop EOG channel.
meg_chs = [c for c in df.columns if 'MEG' in c]

df.pop('EOG 061')  # pop takes a column label, much like dict.pop takes a key.


Out:

Converting "time" to "<class 'numpy.int64'>"...
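The effect of the scaling parameters can be reproduced by hand. The sketch below is not MNE's implementation; it only illustrates, on hypothetical values, what multiplying by `scaling_time=1e3` and `grad=1e13` amounts to and why the times end up as integers.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
times_s = np.array([-0.2, 0.0, 0.1, 0.5])       # seconds
grad_si = np.array([1e-13, -2.5e-12, 4e-11])    # T/m

scaling_time = 1e3   # seconds -> milliseconds
grad_scale = 1e13    # same factor as scalings=dict(grad=1e13) above

# Scale the times and round to integers so they can serve as index labels.
times_ms = np.round(times_s * scaling_time).astype(np.int64)

# Scale the gradiometer values to numbers of a plot-friendly magnitude.
grad_scaled = grad_si * grad_scale
```

Integer time labels are convenient for label-based slicing, since exact matches on floating point labels are fragile.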


Explore Pandas MultiIndex

# Pandas is using a MultiIndex or hierarchical index to handle higher
# dimensionality while at the same time representing data in a flat 2d manner.

print(df.index.names, df.index.levels)

# Inspecting the index object unveils that 'epoch', 'time' are used
# for subsetting data. We can take advantage of that by using the
# .loc attribute, where in this case the first position indexes the MultiIndex
# and the second the columns, that is, channels.

# Plot some channels across the first three epochs
xticks, sel = np.arange(3, 600, 120), meg_chs[:15]
df.loc[:3, sel].plot(xticks=xticks)
mne.viz.tight_layout()

# Slice the time starting at t0 in epoch 2 and ending 500 ms after
# the baseline in epoch 3. Note that the second part of the tuple
# represents time in milliseconds from stimulus onset.
df.loc[(1, 0):(3, 500), sel].plot(xticks=xticks)
mne.viz.tight_layout()

# Note: For convenience the index was converted from floating point values
# to integer values. To restore the original values you can e.g. say
# df['times'] = np.tile(epochs.times, len(epochs))

# We now reset the index of the DataFrame to expose some Pandas
# pivoting functionality. To simplify the groupby operation we
# drop the indices to treat epoch and time as categorical factors.

df = df.reset_index()

# The ensuing DataFrame then is split into subsets reflecting a crossing
# between condition and trial number. The idea is that we can broadcast
# operations into each cell simultaneously.

factors = ['condition', 'epoch']
sel = factors + ['MEG 1332', 'MEG 1342']
grouped = df[sel].groupby(factors)

# To make the plot labels more readable let's edit the values of 'condition'.
df.condition = df.condition.apply(lambda name: name + ' ')

# Now we compare the mean response of two channels across conditions.
grouped.mean().plot(kind='bar', stacked=True, title='Mean MEG Response',
                    color=['steelblue', 'orange'])
mne.viz.tight_layout()

# We can even accomplish more complicated tasks in a few lines by calling
# the apply method and passing a function. Assume we wanted to know the time
# slice of the maximum response for each condition.

max_latency = grouped[sel[2]].apply(lambda x: df.time[x.idxmax()])

print(max_latency)

# Then, to make the plot labels more readable, let's edit the values of
# 'condition' again.
df.condition = df.condition.apply(lambda name: name + ' ')

plt.figure()
max_latency.plot(kind='barh', title='Latency of Maximum Response',
                 color=['steelblue'])
mne.viz.tight_layout()

# Finally, we will again remove the index to create a proper data table that
# can be used with statistical packages like statsmodels or R.

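final_df = max_latency.reset_index()
# rename returns a new frame, so assign the result; this only has an effect
# if the value column is still labelled 0 (the index is oblivious of names).
final_df = final_df.rename(columns={0: sel[2]})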

# The index is now written into regular columns so it can be used as factor.
print(final_df)

plt.show()

# To save as csv file, uncomment the next line.
# final_df.to_csv('my_epochs.csv')

# Note. Data Frames can be easily concatenated, e.g., across subjects.
# E.g. say:
#
# import pandas as pd
# group = pd.concat([df_1, df_2])
# group['subject'] = np.r_[np.ones(len(df_1)), np.ones(len(df_2)) + 1]


Out:

['epoch', 'time'] [[0, 1, 2, 3, 4, 5, 6, 7, 8], [-200, -193, -186, -180, -173, -166, -160, -153, -147, -140, -133, -127, -120, -113, -107, -100, -93, -87, -80, -73, -67, -60, -53, -47, -40, -33, -27, -20, -13, -7, 0, 7, 13, 20, 27, 33, 40, 47, 53, 60, 67, 73, 80, 87, 93, 100, 107, 113, 120, 127, 133, 140, 147, 153, 160, 166, 173, 180, 186, 193, 200, 206, 213, 220, 226, 233, 240, 246, 253, 260, 266, 273, 280, 286, 293, 300, 306, 313, 320, 326, 333, 340, 346, 353, 360, 366, 373, 380, 386, 393, 400, 406, 413, 420, 426, 433, 440, 446, 453, 460, ...]]
condition   epoch
auditory_l  1        100
5        186
auditory_r  3        100
7         13
visual_l    0        -47
4        100
8        406
visual_r    2        393
6        353
Name: MEG 1332, dtype: int64
condition  ...  MEG 1332
0  auditory_l  ...       100
1  auditory_l  ...       186
2  auditory_r  ...       100
3  auditory_r  ...        13
4    visual_l  ...       -47
5    visual_l  ...       100
6    visual_l  ...       406
7    visual_r  ...       393
8    visual_r  ...       353

[9 rows x 3 columns]
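The latency computation above can be followed step by step on a small table. The sketch below uses made-up data; the frame `toy`, the condition labels and the channel name `ch_a` are hypothetical. It applies the same groupby, idxmax and reset_index pattern as the example.

```python
import pandas as pd

# Hypothetical toy data: 2 conditions x 1 epoch each x 3 time points.
toy = pd.DataFrame({
    'condition': ['aud'] * 3 + ['vis'] * 3,
    'epoch': [0] * 3 + [1] * 3,
    'time': [0, 100, 200] * 2,
    'ch_a': [0.1, 0.9, 0.3, 0.2, 0.4, 0.8],
})

grouped = toy.groupby(['condition', 'epoch'])

# idxmax returns the row label of the maximum within each group;
# looking that label up in the 'time' column yields the latency.
max_latency = grouped['ch_a'].apply(lambda x: toy['time'][x.idxmax()])

# reset_index turns the group keys into regular columns.
table = max_latency.reset_index()
```

The lookup relies on the row labels of `toy` being unique, which holds after `reset_index` has replaced the hierarchical index with a default integer index.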


Long-format dataframes

# Many statistical modelling functions expect data in a long format
# where each row is one observation at a unique coordinate of factors
# such as sensors, conditions, subjects etc.
df_long = epochs.to_data_frame(long_format=True)
print(df_long.head())

# Here the MEG or EEG signal appears in the column "observation".
# The total length is therefore the number of epochs times the number of
# channels times the number of time points.
print(len(df_long), "=", epochs.get_data().size)

# To simplify subsetting and filtering a channel type column is added.
print(df_long[df_long.ch_type == 'eeg'].head())

# Note that some of the columns are transformed to "category" data types.
print(df_long.dtypes)


Out:

Converting "condition" to "category"...
Converting "epoch" to "category"...
Converting "channel" to "category"...
Converting "ch_type" to "category"...
condition  ... ch_type

[5 rows x 6 columns]
250902 = 250902
condition  ... ch_type
203  visual_l  ...     eeg
204  visual_l  ...     eeg
205  visual_l  ...     eeg
206  visual_l  ...     eeg
207  visual_l  ...     eeg

[5 rows x 6 columns]
condition      category
epoch          category
time            float64
channel        category
observation     float64
ch_type        category
dtype: object
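The relation between the wide and the long layout can be reproduced with plain pandas. The sketch below is not how the exporter builds its long format; it uses `DataFrame.melt` on a hypothetical wide table (`wide`, `ch_a` and `ch_b` are made-up names) to show where the "channel" and "observation" columns come from.

```python
import pandas as pd

# Hypothetical wide table: 2 time points, 2 made-up channels.
wide = pd.DataFrame({
    'time': [0, 100],
    'ch_a': [1.0, 2.0],
    'ch_b': [3.0, 4.0],
})

# melt stacks the channel columns into one 'observation' column and
# records the originating column name in 'channel'.
long_df = wide.melt(id_vars='time', var_name='channel',
                    value_name='observation')

# Storing repeated labels as 'category' marks them as factors.
long_df['channel'] = long_df['channel'].astype('category')

# Filtering on a factor level selects all rows of one channel.
only_b = long_df[long_df['channel'] == 'ch_b']
```

The long table has one row per combination of time point and channel, so its length is the product of the two.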


The pandas exporter facilitates processing MNE outputs in R: https://mne-tools.github.io/mne-r/articles/plot_evoked_multilevel_model.html

