Note

Click here to download the full example code

Repairing artifacts with ICA¶

This tutorial covers the basics of independent components analysis (ICA) and shows how ICA can be used for artifact repair; an extended example illustrates repair of ocular and heartbeat artifacts.

Page contents

What is ICA?
- ICA in MNE-Python
Example: EOG and ECG artifact repair
References

We begin as always by importing the necessary Python modules and loading some example data. Because ICA can be computationally intense, we’ll also crop the data to 60 seconds; and to save ourselves from repeatedly typing mne.preprocessing we’ll directly import a few functions and classes from that submodule:

import os
import mne
from mne.preprocessing import (ICA, create_eog_epochs, create_ecg_epochs,
                               corrmap)

sample_data_folder = mne.datasets.sample.data_path()
sample_data_raw_file = os.path.join(sample_data_folder, 'MEG', 'sample',
                                    'sample_audvis_raw.fif')
raw = mne.io.read_raw_fif(sample_data_raw_file)
raw.crop(tmax=60.)

Out:

Opening raw data file /home/circleci/mne_data/MNE-sample-data/MEG/sample/sample_audvis_raw.fif...
    Read a total of 3 projection items:
        PCA-v1 (1 x 102)  idle
        PCA-v2 (1 x 102)  idle
        PCA-v3 (1 x 102)  idle
    Range : 25800 ... 192599 =     42.956 ...   320.670 secs
Ready.

Note

Before applying ICA (or any artifact repair strategy), be sure to observe the artifacts in your data to make sure you choose the right repair tool. Sometimes the right tool is no tool at all — if the artifacts are small enough you may not even need to repair them to get good analysis results. See Overview of artifact detection for guidance on detecting and visualizing various types of artifact.

What is ICA?¶

Independent components analysis (ICA) is a technique for estimating independent source signals from a set of recordings in which the source signals were mixed together in unknown ratios. A common example of this is the problem of blind source separation: with 3 musical instruments playing in the same room, and 3 microphones recording the performance (each picking up all 3 instruments, but at varying levels), can you somehow “unmix” the signals recorded by the 3 microphones so that you end up with a separate “recording” isolating the sound of each instrument?

It is not hard to see how this analogy applies to EEG/MEG analysis: there are many “microphones” (sensor channels) simultaneously recording many “instruments” (blinks, heartbeats, activity in different areas of the brain, muscular activity from jaw clenching or swallowing, etc). As long as these various source signals are statistically independent and non-gaussian, it is usually possible to separate the sources using ICA, and then re-construct the sensor signals after excluding the sources that are unwanted.

ICA in MNE-Python ¶

If you want to perform ICA with no dimensionality reduction (other than the number of Independent Components (ICs) given in n_components, and any subsequent exclusion of ICs you specify in ICA.exclude), pass n_pca_components=None (this is the default value).

However, if you do want to reduce dimensionality, consider this example: if you have 300 sensor channels and you set n_pca_components=None and n_components=50, then the the first 50 PCs are sent to the ICA algorithm (yielding 50 ICs), and during reconstruction apply will use the 50 ICs plus PCs number 51-300 (the full PCA residual). If instead you specify n_pca_components=120 in apply, it will reconstruct using the 50 ICs plus the first 70 PCs in the PCA residual (numbers 51-120), thus discarding the smallest 180 components.

If you have previously been using EEGLAB’s runica() and are looking for the equivalent of its 'pca', n option to reduce dimensionality via PCA before the ICA step, set n_components=n during initialization and pass n_pca_components=n to apply.

MNE-Python implements three different ICA algorithms: fastica (the default), picard, and infomax. FastICA and Infomax are both in fairly widespread use; Picard is a newer (2017) algorithm that is expected to converge faster than FastICA and Infomax, and is more robust than other algorithms in cases where the sources are not completely independent, which typically happens with real EEG/MEG data. See 1 for more information.

The ICA interface in MNE-Python is similar to the interface in scikit-learn: some general parameters are specified when creating an ICA object, then the ICA object is fit to the data using its fit method. The results of the fitting are added to the ICA object as attributes that end in an underscore (_), such as ica.mixing_matrix_ and ica.unmixing_matrix_. After fitting, the ICA component(s) that you want to remove must be chosen, and the ICA fit must then be applied to the Raw or Epochs object using the ICA object’s apply method.

As is typically done with ICA, the data are first scaled to unit variance and whitened using principal components analysis (PCA) before performing the ICA decomposition. This is a two-stage process:

To deal with different channel types having different units (e.g., Volts for EEG and Tesla for MEG), data must be pre-whitened. If noise_cov=None (default), all data of a given channel type is scaled by the standard deviation across all channels. If noise_cov is a Covariance, the channels are pre-whitened using the covariance.
The pre-whitened data are then decomposed using PCA.

From the resulting principal components (PCs), the first n_components are then passed to the ICA algorithm if n_components is an integer number. It can also be a float between 0 and 1, specifying the fraction of explained variance that the PCs should capture; the appropriate number of PCs (i.e., just as many PCs as are required to explain the given fraction of total variance) is then passed to the ICA.

After visualizing the Independent Components (ICs) and excluding any that capture artifacts you want to repair, the sensor signal can be reconstructed using the ICA object’s apply method. By default, signal reconstruction uses all of the ICs (less any ICs listed in ICA.exclude) plus all of the PCs that were not included in the ICA decomposition (i.e., the “PCA residual”). If you want to reduce the number of components used at the reconstruction stage, it is controlled by the n_pca_components parameter (which will in turn reduce the rank of your data; by default n_pca_components=None resulting in no additional dimensionality reduction). The fitting and reconstruction procedures and the parameters that control dimensionality at various stages are summarized in the diagram below:

See the Notes section of the ICA documentation for further details. Next we’ll walk through an extended example that illustrates each of these steps in greater detail.

Example: EOG and ECG artifact repair ¶

Visualizing the artifacts ¶

Let’s begin by visualizing the artifacts that we want to repair. In this dataset they are big enough to see easily in the raw data:

# pick some channels that clearly show heartbeats and blinks
regexp = r'(MEG [12][45][123]1|EEG 00.)'
artifact_picks = mne.pick_channels_regexp(raw.ch_names, regexp=regexp)
raw.plot(order=artifact_picks, n_channels=len(artifact_picks),
         show_scrollbars=False)

We can get a summary of how the ocular artifact manifests across each channel type using create_eog_epochs like we did in the Overview of artifact detection tutorial:

eog_evoked = create_eog_epochs(raw).average()
eog_evoked.apply_baseline(baseline=(None, -0.2))
eog_evoked.plot_joint()

Out:

EOG channel index for this subject is: [375]
Filtering the data to remove DC offset to help distinguish blinks from saccades
Setting up band-pass filter from 1 - 10 Hz

FIR filter parameters
---------------------
Designing a two-pass forward and reverse, zero-phase, non-causal bandpass filter:
- Windowed frequency-domain design (firwin2) method
- Hann window
- Lower passband edge: 1.00
- Lower transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 0.75 Hz)
- Upper passband edge: 10.00 Hz
- Upper transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 10.25 Hz)
- Filter length: 6007 samples (10.001 sec)

Now detecting blinks and generating corresponding events
Found 10 significant peaks
Number of EOG events detected : 10
Not setting metadata
Not setting metadata
10 matching events found
No baseline correction applied
Created an SSP operator (subspace dimension = 3)
Loading data for 10 events and 601 original time points ...
0 bad epochs dropped
Applying baseline correction (mode: mean)
Created an SSP operator (subspace dimension = 3)
3 projection items activated
SSP projectors applied...
Removing projector <Projection | PCA-v1, active : True, n_channels : 102>
Removing projector <Projection | PCA-v2, active : True, n_channels : 102>
Removing projector <Projection | PCA-v3, active : True, n_channels : 102>
Removing projector <Projection | PCA-v1, active : True, n_channels : 102>
Removing projector <Projection | PCA-v2, active : True, n_channels : 102>
Removing projector <Projection | PCA-v3, active : True, n_channels : 102>

Now we’ll do the same for the heartbeat artifacts, using create_ecg_epochs:

ecg_evoked = create_ecg_epochs(raw).average()
ecg_evoked.apply_baseline(baseline=(None, -0.2))
ecg_evoked.plot_joint()

Out:

Reconstructing ECG signal from Magnetometers
Setting up band-pass filter from 8 - 16 Hz

FIR filter parameters
---------------------
Designing a two-pass forward and reverse, zero-phase, non-causal bandpass filter:
- Windowed frequency-domain design (firwin2) method
- Hann window
- Lower passband edge: 8.00
- Lower transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 7.75 Hz)
- Upper passband edge: 16.00 Hz
- Upper transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 16.25 Hz)
- Filter length: 6007 samples (10.001 sec)

Number of ECG events detected : 59 (average pulse 58 / min.)
Not setting metadata
Not setting metadata
59 matching events found
No baseline correction applied
Created an SSP operator (subspace dimension = 3)
Loading data for 59 events and 601 original time points ...
0 bad epochs dropped
Applying baseline correction (mode: mean)
Created an SSP operator (subspace dimension = 3)
3 projection items activated
SSP projectors applied...
Removing projector <Projection | PCA-v1, active : True, n_channels : 102>
Removing projector <Projection | PCA-v2, active : True, n_channels : 102>
Removing projector <Projection | PCA-v3, active : True, n_channels : 102>
Removing projector <Projection | PCA-v1, active : True, n_channels : 102>
Removing projector <Projection | PCA-v2, active : True, n_channels : 102>
Removing projector <Projection | PCA-v3, active : True, n_channels : 102>

Filtering to remove slow drifts ¶

Before we run the ICA, an important step is filtering the data to remove low-frequency drifts, which can negatively affect the quality of the ICA fit. The slow drifts are problematic because they reduce the independence of the assumed-to-be-independent sources (e.g., during a slow upward drift, the neural, heartbeat, blink, and other muscular sources will all tend to have higher values), making it harder for the algorithm to find an accurate solution. A high-pass filter with 1 Hz cutoff frequency is recommended. However, because filtering is a linear operation, the ICA solution found from the filtered signal can be applied to the unfiltered signal (see 2 for more information), so we’ll keep a copy of the unfiltered Raw object around so we can apply the ICA solution to it later.

filt_raw = raw.copy()
filt_raw.load_data().filter(l_freq=1., h_freq=None)

Out:

Reading 0 ... 36037  =      0.000 ...    60.000 secs...
Filtering raw data in 1 contiguous segment
Setting up high-pass filter at 1 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal highpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Filter length: 1983 samples (3.302 sec)

Fitting and plotting the ICA solution ¶

Now we’re ready to set up and fit the ICA. Since we know (from observing our raw data) that the EOG and ECG artifacts are fairly strong, we would expect those artifacts to be captured in the first few dimensions of the PCA decomposition that happens before the ICA. Therefore, we probably don’t need a huge number of components to do a good job of isolating our artifacts (though it is usually preferable to include more components for a more accurate solution). As a first guess, we’ll run ICA with n_components=15 (use only the first 15 PCA components to compute the ICA decomposition) — a very small number given that our data has over 300 channels, but with the advantage that it will run quickly and we will able to tell easily whether it worked or not (because we already know what the EOG / ECG artifacts should look like).

ICA fitting is not deterministic (e.g., the components may get a sign flip on different runs, or may not always be returned in the same order), so we’ll also specify a random seed so that we get identical results each time this tutorial is built by our web servers.

ica = ICA(n_components=15, random_state=97)
ica.fit(filt_raw)

Out:

Fitting ICA to data using 364 channels (please be patient, this may take a while)
Selecting by number: 15 components
Fitting ICA took 6.7s.

Some optional parameters that we could have passed to the fit method include decim (to use only every Nth sample in computing the ICs, which can yield a considerable speed-up) and reject (for providing a rejection dictionary for maximum acceptable peak-to-peak amplitudes for each channel type, just like we used when creating epoched data in the Overview of MEG/EEG analysis with MNE-Python tutorial).

Now we can examine the ICs to see what they captured. plot_sources will show the time series of the ICs. Note that in our call to plot_sources we can use the original, unfiltered Raw object:

raw.load_data()
ica.plot_sources(raw, show_scrollbars=False)

Out:

Reading 0 ... 36037  =      0.000 ...    60.000 secs...
Creating RawArray with float64 data, n_channels=16, n_times=36038
    Range : 25800 ... 61837 =     42.956 ...   102.956 secs
Ready.

Here we can pretty clearly see that the first component (ICA000) captures the EOG signal quite well, and the second component (ICA001) looks a lot like a heartbeat (for more info on visually identifying Independent Components, this EEGLAB tutorial is a good resource). We can also visualize the scalp field distribution of each component using plot_components. These are interpolated based on the values in the ICA mixing matrix:

ica.plot_components()

ICA components, ICA000, ICA001, ICA002, ICA003, ICA004, ICA005, ICA006, ICA007, ICA008, ICA009, ICA010, ICA011, ICA012, ICA013, ICA014

Note

plot_components (which plots the scalp field topographies for each component) has an optional inst parameter that takes an instance of Raw or Epochs. Passing inst makes the scalp topographies interactive: clicking one will bring up a diagnostic plot_properties window (see below) for that component.

In the plots above it’s fairly obvious which ICs are capturing our EOG and ECG artifacts, but there are additional ways visualize them anyway just to be sure. First, we can plot an overlay of the original signal against the reconstructed signal with the artifactual ICs excluded, using plot_overlay:

# blinks
ica.plot_overlay(raw, exclude=[0], picks='eeg')
# heartbeats
ica.plot_overlay(raw, exclude=[1], picks='mag')

Out:

Applying ICA to Raw instance
    Transforming to ICA space (15 components)
    Zeroing out 1 ICA component
    Projecting back using 364 PCA components
Applying ICA to Raw instance
    Transforming to ICA space (15 components)
    Zeroing out 1 ICA component
    Projecting back using 364 PCA components

We can also plot some diagnostics of each IC using plot_properties:

ica.plot_properties(raw, picks=[0, 1])

Out:

    Using multitaper spectrum estimation with 7 DPSS windows
Not setting metadata
Not setting metadata
30 matching events found
No baseline correction applied
0 projection items activated
0 bad epochs dropped
Not setting metadata
Not setting metadata
30 matching events found
No baseline correction applied
0 projection items activated
0 bad epochs dropped

In the remaining sections, we’ll look at different ways of choosing which ICs to exclude prior to reconstructing the sensor signals.

Selecting ICA components manually ¶

Once we’re certain which components we want to exclude, we can specify that manually by setting the ica.exclude attribute. Similar to marking bad channels, merely setting ica.exclude doesn’t do anything immediately (it just adds the excluded ICs to a list that will get used later when it’s needed). Once the exclusions have been set, ICA methods like plot_overlay will exclude those component(s) even if no exclude parameter is passed, and the list of excluded components will be preserved when using mne.preprocessing.ICA.save and mne.preprocessing.read_ica.

ica.exclude = [0, 1]  # indices chosen based on various plots above

Now that the exclusions have been set, we can reconstruct the sensor signals with artifacts removed using the apply method (remember, we’re applying the ICA solution from the filtered data to the original unfiltered signal). Plotting the original raw data alongside the reconstructed data shows that the heartbeat and blink artifacts are repaired.

# ica.apply() changes the Raw object in-place, so let's make a copy first:
reconst_raw = raw.copy()
ica.apply(reconst_raw)

raw.plot(order=artifact_picks, n_channels=len(artifact_picks),
         show_scrollbars=False)
reconst_raw.plot(order=artifact_picks, n_channels=len(artifact_picks),
                 show_scrollbars=False)
del reconst_raw

Out:

Applying ICA to Raw instance
    Transforming to ICA space (15 components)
    Zeroing out 2 ICA components
    Projecting back using 364 PCA components

Using an EOG channel to select ICA components ¶

It may have seemed easy to review the plots and manually select which ICs to exclude, but when processing dozens or hundreds of subjects this can become a tedious, rate-limiting step in the analysis pipeline. One alternative is to use dedicated EOG or ECG sensors as a “pattern” to check the ICs against, and automatically mark for exclusion any ICs that match the EOG/ECG pattern. Here we’ll use find_bads_eog to automatically find the ICs that best match the EOG signal, then use plot_scores along with our other plotting functions to see which ICs it picked. We’ll start by resetting ica.exclude back to an empty list:

ica.exclude = []
# find which ICs match the EOG pattern
eog_indices, eog_scores = ica.find_bads_eog(raw)
ica.exclude = eog_indices

# barplot of ICA component "EOG match" scores
ica.plot_scores(eog_scores)

# plot diagnostics
ica.plot_properties(raw, picks=eog_indices)

# plot ICs applied to raw data, with EOG matches highlighted
ica.plot_sources(raw, show_scrollbars=False)

# plot ICs applied to the averaged EOG epochs, with EOG matches highlighted
ica.plot_sources(eog_evoked)

Out:

... filtering ICA sources
Setting up band-pass filter from 1 - 10 Hz

FIR filter parameters
---------------------
Designing a two-pass forward and reverse, zero-phase, non-causal bandpass filter:
- Windowed frequency-domain design (firwin2) method
- Hann window
- Lower passband edge: 1.00
- Lower transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 0.75 Hz)
- Upper passband edge: 10.00 Hz
- Upper transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 10.25 Hz)
- Filter length: 6007 samples (10.001 sec)

... filtering target
Setting up band-pass filter from 1 - 10 Hz

FIR filter parameters
---------------------
Designing a two-pass forward and reverse, zero-phase, non-causal bandpass filter:
- Windowed frequency-domain design (firwin2) method
- Hann window
- Lower passband edge: 1.00
- Lower transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 0.75 Hz)
- Upper passband edge: 10.00 Hz
- Upper transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 10.25 Hz)
- Filter length: 6007 samples (10.001 sec)

    Using multitaper spectrum estimation with 7 DPSS windows
Not setting metadata
Not setting metadata
30 matching events found
No baseline correction applied
0 projection items activated
0 bad epochs dropped
Creating RawArray with float64 data, n_channels=16, n_times=36038
    Range : 25800 ... 61837 =     42.956 ...   102.956 secs
Ready.

Note that above we used plot_sources on both the original Raw instance and also on an Evoked instance of the extracted EOG artifacts. This can be another way to confirm that find_bads_eog has identified the correct components.

Using a simulated channel to select ICA components ¶

If you don’t have an EOG channel, find_bads_eog has a ch_name parameter that you can use as a proxy for EOG. You can use a single channel, or create a bipolar reference from frontal EEG sensors and use that as virtual EOG channel. This carries a risk however: you must hope that the frontal EEG channels only reflect EOG and not brain dynamics in the prefrontal cortex (or you must not care about those prefrontal signals).

For ECG, it is easier: find_bads_ecg can use cross-channel averaging of magnetometer or gradiometer channels to construct a virtual ECG channel, so if you have MEG channels it is usually not necessary to pass a specific channel name. find_bads_ecg also has two options for its method parameter: 'ctps' (cross-trial phase statistics 3) and 'correlation' (Pearson correlation between data and ECG channel).

ica.exclude = []
# find which ICs match the ECG pattern
ecg_indices, ecg_scores = ica.find_bads_ecg(raw, method='correlation',
                                            threshold='auto')
ica.exclude = ecg_indices

# barplot of ICA component "ECG match" scores
ica.plot_scores(ecg_scores)

# plot diagnostics
ica.plot_properties(raw, picks=ecg_indices)

# plot ICs applied to raw data, with ECG matches highlighted
ica.plot_sources(raw, show_scrollbars=False)

# plot ICs applied to the averaged ECG epochs, with ECG matches highlighted
ica.plot_sources(ecg_evoked)

Out:

Reconstructing ECG signal from Magnetometers
... filtering ICA sources
Setting up band-pass filter from 8 - 16 Hz

FIR filter parameters
---------------------
Designing a two-pass forward and reverse, zero-phase, non-causal bandpass filter:
- Windowed frequency-domain design (firwin2) method
- Hann window
- Lower passband edge: 8.00
- Lower transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 7.75 Hz)
- Upper passband edge: 16.00 Hz
- Upper transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 16.25 Hz)
- Filter length: 6007 samples (10.001 sec)

... filtering target
Setting up band-pass filter from 8 - 16 Hz

FIR filter parameters
---------------------
Designing a two-pass forward and reverse, zero-phase, non-causal bandpass filter:
- Windowed frequency-domain design (firwin2) method
- Hann window
- Lower passband edge: 8.00
- Lower transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 7.75 Hz)
- Upper passband edge: 16.00 Hz
- Upper transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 16.25 Hz)
- Filter length: 6007 samples (10.001 sec)

    Using multitaper spectrum estimation with 7 DPSS windows
Not setting metadata
Not setting metadata
30 matching events found
No baseline correction applied
0 projection items activated
0 bad epochs dropped
Creating RawArray with float64 data, n_channels=16, n_times=36038
    Range : 25800 ... 61837 =     42.956 ...   102.956 secs
Ready.

The last of these plots is especially useful: it shows us that the heartbeat artifact is coming through on two ICs, and we’ve only caught one of them. In fact, if we look closely at the output of plot_sources (online, you can right-click → “view image” to zoom in), it looks like ICA014 has a weak periodic component that is in-phase with ICA001. It might be worthwhile to re-run the ICA with more components to see if that second heartbeat artifact resolves out a little better:

# refit the ICA with 30 components this time
new_ica = ICA(n_components=30, random_state=97)
new_ica.fit(filt_raw)

# find which ICs match the ECG pattern
ecg_indices, ecg_scores = new_ica.find_bads_ecg(raw, method='correlation',
                                                threshold='auto')
new_ica.exclude = ecg_indices

# barplot of ICA component "ECG match" scores
new_ica.plot_scores(ecg_scores)

# plot diagnostics
new_ica.plot_properties(raw, picks=ecg_indices)

# plot ICs applied to raw data, with ECG matches highlighted
new_ica.plot_sources(raw, show_scrollbars=False)

# plot ICs applied to the averaged ECG epochs, with ECG matches highlighted
new_ica.plot_sources(ecg_evoked)

Out:

Fitting ICA to data using 364 channels (please be patient, this may take a while)
Selecting by number: 30 components
Fitting ICA took 6.5s.
Reconstructing ECG signal from Magnetometers
... filtering ICA sources
Setting up band-pass filter from 8 - 16 Hz

FIR filter parameters
---------------------
Designing a two-pass forward and reverse, zero-phase, non-causal bandpass filter:
- Windowed frequency-domain design (firwin2) method
- Hann window
- Lower passband edge: 8.00
- Lower transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 7.75 Hz)
- Upper passband edge: 16.00 Hz
- Upper transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 16.25 Hz)
- Filter length: 6007 samples (10.001 sec)

... filtering target
Setting up band-pass filter from 8 - 16 Hz

FIR filter parameters
---------------------
Designing a two-pass forward and reverse, zero-phase, non-causal bandpass filter:
- Windowed frequency-domain design (firwin2) method
- Hann window
- Lower passband edge: 8.00
- Lower transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 7.75 Hz)
- Upper passband edge: 16.00 Hz
- Upper transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 16.25 Hz)
- Filter length: 6007 samples (10.001 sec)

    Using multitaper spectrum estimation with 7 DPSS windows
Not setting metadata
Not setting metadata
30 matching events found
No baseline correction applied
0 projection items activated
0 bad epochs dropped
Not setting metadata
Not setting metadata
30 matching events found
No baseline correction applied
0 projection items activated
0 bad epochs dropped
Creating RawArray with float64 data, n_channels=31, n_times=36038
    Range : 25800 ... 61837 =     42.956 ...   102.956 secs
Ready.

Much better! Now we’ve captured both ICs that are reflecting the heartbeat artifact (and as a result, we got two diagnostic plots: one for each IC that reflects the heartbeat). This demonstrates the value of checking the results of automated approaches like find_bads_ecg before accepting them.

# clean up memory before moving on
del raw, filt_raw, ica, new_ica

Selecting ICA components using template matching ¶

When dealing with multiple subjects, it is also possible to manually select an IC for exclusion on one subject, and then use that component as a template for selecting which ICs to exclude from other subjects’ data, using mne.preprocessing.corrmap 4. The idea behind corrmap is that the artifact patterns are similar enough across subjects that corresponding ICs can be identified by correlating the ICs from each ICA solution with a common template, and picking the ICs with the highest correlation strength. corrmap takes a list of ICA solutions, and a template parameter that specifies which ICA object and which component within it to use as a template.

Since our sample dataset only contains data from one subject, we’ll use a different dataset with multiple subjects: the EEGBCI dataset 5 6. The dataset has 109 subjects, we’ll just download one run (a left/right hand movement task) from each of the first 4 subjects:

mapping = {
    'Fc5.': 'FC5', 'Fc3.': 'FC3', 'Fc1.': 'FC1', 'Fcz.': 'FCz', 'Fc2.': 'FC2',
    'Fc4.': 'FC4', 'Fc6.': 'FC6', 'C5..': 'C5', 'C3..': 'C3', 'C1..': 'C1',
    'Cz..': 'Cz', 'C2..': 'C2', 'C4..': 'C4', 'C6..': 'C6', 'Cp5.': 'CP5',
    'Cp3.': 'CP3', 'Cp1.': 'CP1', 'Cpz.': 'CPz', 'Cp2.': 'CP2', 'Cp4.': 'CP4',
    'Cp6.': 'CP6', 'Fp1.': 'Fp1', 'Fpz.': 'Fpz', 'Fp2.': 'Fp2', 'Af7.': 'AF7',
    'Af3.': 'AF3', 'Afz.': 'AFz', 'Af4.': 'AF4', 'Af8.': 'AF8', 'F7..': 'F7',
    'F5..': 'F5', 'F3..': 'F3', 'F1..': 'F1', 'Fz..': 'Fz', 'F2..': 'F2',
    'F4..': 'F4', 'F6..': 'F6', 'F8..': 'F8', 'Ft7.': 'FT7', 'Ft8.': 'FT8',
    'T7..': 'T7', 'T8..': 'T8', 'T9..': 'T9', 'T10.': 'T10', 'Tp7.': 'TP7',
    'Tp8.': 'TP8', 'P7..': 'P7', 'P5..': 'P5', 'P3..': 'P3', 'P1..': 'P1',
    'Pz..': 'Pz', 'P2..': 'P2', 'P4..': 'P4', 'P6..': 'P6', 'P8..': 'P8',
    'Po7.': 'PO7', 'Po3.': 'PO3', 'Poz.': 'POz', 'Po4.': 'PO4', 'Po8.': 'PO8',
    'O1..': 'O1', 'Oz..': 'Oz', 'O2..': 'O2', 'Iz..': 'Iz'
}

raws = list()
icas = list()

for subj in range(4):
    # EEGBCI subjects are 1-indexed; run 3 is a left/right hand movement task
    fname = mne.datasets.eegbci.load_data(subj + 1, runs=[3])[0]
    raw = mne.io.read_raw_edf(fname)
    # remove trailing `.` from channel names so we can set montage
    raw.rename_channels(mapping)
    raw.set_montage('standard_1005')
    # fit ICA
    ica = ICA(n_components=30, random_state=97)
    ica.fit(raw)
    raws.append(raw)
    icas.append(ica)

Out:

Extracting EDF parameters from /home/circleci/mne_data/MNE-eegbci-data/files/eegmmidb/1.0.0/S001/S001R03.edf...
EDF file detected
Setting channel info structure...
Creating raw.info structure...
Fitting ICA to data using 64 channels (please be patient, this may take a while)
Selecting by number: 30 components
Fitting ICA took 5.7s.
Extracting EDF parameters from /home/circleci/mne_data/MNE-eegbci-data/files/eegmmidb/1.0.0/S002/S002R03.edf...
EDF file detected
Setting channel info structure...
Creating raw.info structure...
Fitting ICA to data using 64 channels (please be patient, this may take a while)
Selecting by number: 30 components
Fitting ICA took 3.4s.
Extracting EDF parameters from /home/circleci/mne_data/MNE-eegbci-data/files/eegmmidb/1.0.0/S003/S003R03.edf...
EDF file detected
Setting channel info structure...
Creating raw.info structure...
Fitting ICA to data using 64 channels (please be patient, this may take a while)
Selecting by number: 30 components
Fitting ICA took 9.0s.
Extracting EDF parameters from /home/circleci/mne_data/MNE-eegbci-data/files/eegmmidb/1.0.0/S004/S004R03.edf...
EDF file detected
Setting channel info structure...
Creating raw.info structure...
Fitting ICA to data using 64 channels (please be patient, this may take a while)
Selecting by number: 30 components
Fitting ICA took 2.5s.

Now let’s run corrmap:

# use the first subject as template; use Fpz as proxy for EOG
raw = raws[0]
ica = icas[0]
eog_inds, eog_scores = ica.find_bads_eog(raw, ch_name='Fpz')
corrmap(icas, template=(0, eog_inds[0]))

Out:

Using channel Fpz as EOG channel
... filtering ICA sources
Setting up band-pass filter from 1 - 10 Hz

FIR filter parameters
---------------------
Designing a two-pass forward and reverse, zero-phase, non-causal bandpass filter:
- Windowed frequency-domain design (firwin2) method
- Hann window
- Lower passband edge: 1.00
- Lower transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 0.75 Hz)
- Upper passband edge: 10.00 Hz
- Upper transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 10.25 Hz)
- Filter length: 1600 samples (10.000 sec)

... filtering target
Setting up band-pass filter from 1 - 10 Hz

FIR filter parameters
---------------------
Designing a two-pass forward and reverse, zero-phase, non-causal bandpass filter:
- Windowed frequency-domain design (firwin2) method
- Hann window
- Lower passband edge: 1.00
- Lower transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 0.75 Hz)
- Upper passband edge: 10.00 Hz
- Upper transition bandwidth: 0.50 Hz (-12 dB cutoff frequency: 10.25 Hz)
- Filter length: 1600 samples (10.000 sec)

Median correlation with constructed map: 0.983
Displaying selected ICs per subject.
No maps selected for subject [1], consider a more liberal threshold.

The first figure shows the template map, while the second figure shows all the maps that were considered a “match” for the template (including the template itself). There were only three matches from the four subjects; notice the output message No maps selected for subject(s) 1, consider a more liberal threshold. By default the threshold is set automatically by trying several values; here it may have chosen a threshold that is too high. Let’s take a look at the ICA sources for each subject:

for index, (ica, raw) in enumerate(zip(icas, raws)):
    fig = ica.plot_sources(raw, show_scrollbars=False)
    fig.subplots_adjust(top=0.9)  # make space for title
    fig.suptitle('Subject {}'.format(index))

Out:

Creating RawArray with float64 data, n_channels=30, n_times=20000
    Range : 0 ... 19999 =      0.000 ...   124.994 secs
Ready.
Creating RawArray with float64 data, n_channels=30, n_times=19680
    Range : 0 ... 19679 =      0.000 ...   122.994 secs
Ready.
Creating RawArray with float64 data, n_channels=30, n_times=20000
    Range : 0 ... 19999 =      0.000 ...   124.994 secs
Ready.
Creating RawArray with float64 data, n_channels=30, n_times=19680
    Range : 0 ... 19679 =      0.000 ...   122.994 secs
Ready.

Notice that subject 1 does seem to have an IC that looks like it reflects blink artifacts (component ICA000). Notice also that subject 3 appears to have two components that are reflecting ocular artifacts (ICA000 and ICA002), but only one was caught by corrmap. Let’s try setting the threshold manually:

corrmap(icas, template=(0, eog_inds[0]), threshold=0.9)

Out:

Median correlation with constructed map: 0.951
Displaying selected ICs per subject.
At least 1 IC detected for each subject.

Now we get the message At least 1 IC detected for each subject (which is good). At this point we’ll re-run corrmap with parameters label='blink', plot=False to label the ICs from each subject that capture the blink artifacts (without plotting them again).

corrmap(icas, template=(0, eog_inds[0]), threshold=0.9, label='blink',
        plot=False)
print([ica.labels_ for ica in icas])

Out:

Median correlation with constructed map: 0.951
At least 1 IC detected for each subject.
[{'eog/0/Fpz': [0], 'eog': [0], 'blink': [0]}, {'blink': [0]}, {'blink': [0]}, {'blink': [0, 2]}]

Notice that the first subject has 3 different labels for the IC at index 0: “eog/0/Fpz”, “eog”, and “blink”. The first two were added by find_bads_eog; the “blink” label was added by the last call to corrmap. Notice also that each subject has at least one IC index labelled “blink”, and subject 3 has two components (0 and 2) labelled “blink” (consistent with the plot of IC sources above). The labels_ attribute of ICA objects can also be manually edited to annotate the ICs with custom labels. They also come in handy when plotting:

icas[3].plot_components(picks=icas[3].labels_['blink'])
icas[3].exclude = icas[3].labels_['blink']
icas[3].plot_sources(raws[3], show_scrollbars=False)

Out:

Creating RawArray with float64 data, n_channels=30, n_times=19680
    Range : 0 ... 19679 =      0.000 ...   122.994 secs
Ready.

As a final note, it is possible to extract ICs numerically using the get_components method of ICA objects. This will return a NumPy array that can be passed to corrmap instead of the tuple of (subject_index, component_index) we passed before, and will yield the same result:

template_eog_component = icas[0].get_components()[:, eog_inds[0]]
corrmap(icas, template=template_eog_component, threshold=0.9)
print(template_eog_component)

Out:

Median correlation with constructed map: 0.951
Displaying selected ICs per subject.
At least 1 IC detected for each subject.
[-0.44495318 -0.40983842 -0.38930046 -0.37648581 -0.4081585  -0.43522276
 -0.47475364 -0.29240548 -0.28406613 -0.26882023 -0.25526935 -0.28871567
 -0.30326613 -0.30316262 -0.21016141 -0.22111437 -0.21807985 -0.207049
 -0.22375209 -0.24014921 -0.25321907 -1.68027772 -1.4824676  -1.62847163
 -1.40547681 -1.36613268 -0.85900872 -1.07936549 -1.52916767 -0.69233214
 -0.77286864 -0.65947637 -0.61444207 -0.59479909 -0.64739204 -0.65451653
 -0.76721442 -0.78455491 -0.38975992 -0.43540168 -0.24198818 -0.29395946
 -0.12909375 -0.16455719 -0.17778243 -0.18440583 -0.13619307 -0.16361049
 -0.17566436 -0.16126002 -0.16756067 -0.16994137 -0.18353601 -0.17522907
 -0.14012023 -0.09664031 -0.12390408 -0.12631576 -0.1284134  -0.09195589
 -0.09006861 -0.09628503 -0.06248746 -0.0248017 ]

An advantage of using this numerical representation of an IC to capture a particular artifact pattern is that it can be saved and used as a template for future template-matching tasks using corrmap without having to load or recompute the ICA solution that yielded the template originally. Put another way, when the template is a NumPy array, the ICA object containing the template does not need to be in the list of ICAs provided to corrmap.

References ¶

1: Ablin P, Cardoso J, Gramfort A (2018). Faster Independent Component Analysis by Preconditioning With Hessian Approximations. IEEE Transactions on Signal Processing 66:4040–4049. https://doi.org/10.1109/TSP.2018.2844203
2: Winkler I, Debener S, Müller K-R, Tangermann M (2015). On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP. Proceedings of EMBC-2015, 4101–4105. https://doi.org/10.1109/EMBC.2015.7319296
3: Dammers J, Schiek M, Boers F, Silex C, Zvyagintsev M, Pietrzyk U, Mathiak K (2008). Integration of amplitude and phase statistics for complete artifact removal in independent components of neuromagnetic recordings. IEEE Transactions on Biomedical Engineering 55(10):2353–2362. https://doi.org/10.1109/TBME.2008.926677
4: Viola FC, Thorne J, Edmonds B, Schneider T, Eichele T, Debener S (2009). Semi-automatic identification of independent components representing EEG artifact. Clinical Neurophysiology 120(5):868–877. https://doi.org/10.1016/j.clinph.2009.01.015
5: Schalk G, McFarland DJ, Hinterberger T, Birbaumer N, Wolpaw JR (2004). BCI2000: A General-Purpose Brain-Computer Interface (BCI) System. IEEE Transactions on Biomedical Engineering 51(6):1034-1043. https://doi.org/10.1109/TBME.2004.827072
6: Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 101(23):e215-e220. https://doi.org/10.1161/01.CIR.101.23.e215

Total running time of the script: ( 1 minutes 31.111 seconds)

Estimated memory usage: 834 MB

Gallery generated by Sphinx-Gallery