Peaks Extraction

Peaks Extraction#

EMD_eeg(data, method='EMD', graph=False, extrema_detection='simple', nIMFs=5)[source]#

The Empirical Mode Decomposition is a data-adaptive multiresolution technique to decompose a signal into physically meaningful components, the Intrinsic Mode Functions (IMFs). It works like a dyadic filter bank. Hence, a log2 structure characterizes the relation between successive IMFs.

Parameters:

data (array (numDataPoints,)) – Single time series.
method (str, default=’EMD’) – Type of Empirical Mode Decomposition. The available types are:
- ‘EMD’
- ‘EEMD’
- ‘CEEMDAN’
graph (bool, default=False) – Defines if graph is created.
extrema_detection (str, default=’simple’) –
- ‘simple’
- ‘parabol’
nIMFs (int, default=5) – Number of IMFs to plot when ‘graph’ = True.

Returns:

eIMFs (array (nIMFS, numDataPoints)) – Returns an array of n Intrinsic Mode Functions by the initial number of data points.

Examples

>>> data = np.load('data_examples/EEG_pareidolia/parei_data_1000ts.npy')[0]
>>> IMFs = EMD_eeg(data, method="EMD", graph=False, extrema_detection="simple", nIMFs=5)
>>> print('DATA', data.shape, 'IMFs', IMFs.shape)
DATA (9501,) IMFs (11, 9501)

References

1. Huang, Norden E., et al. “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis.” Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 454.1971 (1998): 903-995.

SSA_EEG(data, n_components=3, graph=False, window_size=20)[source]#

Applies Singular Spectrum Analysis (SSA) to the input signal to extract its main frequency components.

Parameters:

data (numpy.ndarray) – The input signal as a 1D numpy array.
n_components (int, default=3) – The number of components to extract from the signal.
graph (bool, default=False) – Whether to plot the original signal and the extracted components.
window_size (int, default=20) – The size of the sliding window used in SSA.

Returns:

numpy.ndarray – The extracted components as a 2D numpy array, where each row corresponds to a component.

Examples

>>> data = np.load('data_examples/EEG_pareidolia/parei_data_1000ts.npy')[0]
>>> components = SSA_EEG(data[1000:2000], n_components=5, graph=True, window_size=50)
>>> print(components.shape)
(5, 1000)

extract_welch_peaks(data, sf, precision=0.5, min_freq=1, max_freq=None, FREQ_BANDS=None, average='median', noverlap=None, nperseg=None, nfft=None, find_peaks_method='maxima', width=2, rel_height=0.7, prominence=1, out_type='all', extended_returns=True, smooth=1)[source]#

Extract frequency peaks using Welch’s method for periodograms computation.

Parameters:

data (array (numDataPoints,)) – Single time series.
sf (int) – Sampling frequency.
precision (float, default=0.5) – Size of a frequency bin in Hertz.
min_freq (float, default=1.0) – Minimum frequency to consider when out_type=’all’.
max_freq (float) – Maximum frequency to consider when out_type=’all’.
FREQ_BANDS (List of lists) – Each sublist contains the minimum and maximum values for each frequency band.
average (str, default=’median’) – Method for averaging periodograms.
- ‘mean’
- ‘median’
nperseg (int, default=None) – Length of each segment. If None, nperseg = nfft/smooth
nfft (int, default=None) – Length of the FFT used, if a zero padded FFT is desired. If None, nfft = sf/(1/precision)
noverlap (int, default=None) – Number of points to overlap between segments. If None, noverlap = nperseg // 2.
find_peaks_method (str, default=’maxima’) –
- ‘maxima’
- ‘wavelet’
width (int, default=2) – Required width of peaks in samples.
rel_height (float, default=0.7) – Chooses the relative height at which the peak width is measured as a percentage of its prominence. 1.0 calculates the width of the peak at its lowest contour line while 0.5 evaluates at half the prominence height.
prominence (float, default=1) – Required prominence of peaks.
out_type (str, default=’all’) – Defines how many peaks are outputed. The options are:
- ‘single’
- ‘bands’
- ‘all’
extended_returns (bool, default=True) – Defines if psd and frequency bins values are output along the peaks and amplitudes.
smooth (int, default=1) – Number used to divide nfft to derive nperseg. Higher values will provide smoother periodograms.

Returns:

peak (List(float)) – Frequency value.
amp (List(float)) – Amplitude value.
freqs (Array) – Frequency bins
psd (Array) – Power spectrum density of each frequency bin

Examples

>>> data = np.load('data_examples/EEG_pareidolia/parei_data_1000ts.npy')[0]
>>> FREQ_BANDS = [[1, 3], [3, 8], [8, 13], [13, 20], [20, 30], [30, 50]]
>>> peaks, amps = extract_welch_peaks(
>>>                                   data,
>>>                                   1200,
>>>                                   FREQ_BANDS=FREQ_BANDS,
>>>                                   precision=0.5,
>>>                                   out_type="bands",
>>>                                   extended_returns=False,
>>>                                   )
>>> peaks
[1.5, 5.5, 9.0, 15.0, 22.5, 32.5]

compute_FOOOF(data, sf, precision=0.5, max_freq=80, noverlap=None, nperseg=None, nfft=None, n_peaks=5, extended_returns=False, graph=False)[source]#

FOOOF model the power spectrum as a combination of two distinct functional processes: - An aperiodic component, reflecting 1/f like characteristics - A variable number of periodic components (putative oscillations),

as peaks rising above the aperiodic component.

Parameters:

data (array (numDataPoints,)) – Single time series.
sf (int) – Sampling frequency.
precision (float, default=0.5) – Size of a frequency bin in Hertz before sending to FOOOF.
max_freq (float, default=80) – Maximum frequency to consider as a peak.
noverlap (int, default=None) – Number of points to overlap between segments. If None, noverlap = nperseg // 2.
nperseg (int) – Length of each segment.
nfft (int, default=None) – Length of the FFT used, if a zero padded FFT is desired. If None, the FFT length is nperseg.
n_peaks (int, default=5) – Maximum number of peaks. If FOOOF finds higher number of peaks, the peaks with highest amplitude will be retained.
extended_returns (bool, default=False) – Defines if psd and frequency bins values are output along the peaks and amplitudes.
graph (bool, default=False) – Defines if a graph is generated.

Returns:

peaks (List(float)) – Frequency values.
amps (List(float)) – Amplitude values.
freqs (Array) – Frequency bins
psd (Array) – Power spectrum density of each frequency bin

Examples

>>> data = np.load('data_examples/EEG_pareidolia/parei_data_1000ts.npy')[0]
>>> peaks, amps = compute_FOOOF(data, sf=1200, max_freq=50, n_peaks=3)
>>> peaks
[7.24, 14.71, 5.19]

References

Donoghue T, Haller M, Peterson EJ, Varma P, Sebastian P, Gao R, Noto T, Lara AH, Wallis JD, Knight RT, Shestyuk A, Voytek B (2020). Parameterizing neural power spectra into periodic and aperiodic components. Nature Neuroscience, 23, 1655-1665. DOI: 10.1038/s41593-020-00744-x

HilbertHuang1D(data, sf, graph=False, nIMFs=5, min_freq=1, max_freq=80, precision=0.1, bin_spread='log', smooth_sigma=None, keep_first_IMF=False)[source]#

The Hilbert-Huang transform provides a description of how the energy or power within a signal is distributed across frequency. The distributions are based on the instantaneous frequency and amplitude of a signal.

Parameters:

data (array (numDataPoints,)) – Single time series.
sf (int) – Sampling frequency.
graph (bool, default=False) – Defines if a graph is generated.
nIMFs (int, default=5) – Number of intrinsic mode functions (IMFs) to keep when Empirical Mode Decomposition (EMD) is computed.
min_freq (float, default=1) – Minimum frequency to consider.
max_freq (float, default=80) – Maximum frequency to consider.
precision (float, default=0.1) – Value in Hertz corresponding to the minimal step between two frequency bins.
bin_spread (str, default=’log’) –
- ‘linear’
- ‘log’
smooth_sigma (float, default=None) – Sigma value for gaussian smoothing.

Returns:

IF (array (numDataPoints,nIMFs)) – instantaneous frequencies associated with each IMF.
peaks (List(float)) – Frequency values.
amps (List(float)) – Amplitude values.
spec (array (nIMFs, nbins)) – Power associated with all bins for each IMF
bins (array (nIMFs, nbins)) – Frequency bins for each IMF

Examples

>>> data = np.load('data_examples/EEG_pareidolia/parei_data_1000ts.npy')[0]
>>> _, peaks, amps, _, _ = HilbertHuang1D(data, sf=1200, nIMFs=5)
>>> peaks
[2.24, 8.08, 11.97, 19.61, 64.06]

cepstrum(signal, sf, plot_cepstrum=False, min_freq=1.5, max_freq=80)[source]#

The cepstrum is the result of computing the inverse Fourier transform (IFT) of the logarithm of the estimated signal spectrum. The method is a tool for investigating periodic structures in frequency spectra.

Parameters:

signal (array (numDataPoints,)) – Single time series.
sf (int) – Sampling frequency.
plot_cepstrum (bool, default=False) – Determines wether a plot is generated.
min_freq (float, default=1.5) – Minimum frequency to consider.
max_freq (float, default=80) – Maximum frequency to consider.

Returns:

cepstrum (array (nbins,)) – Power of the cepstrum for each quefrency.
quefrency_vector (array(nbins,)) – Values of each quefrency bins.

cepstral_peaks(cepstrum, quefrency_vector, max_time, min_time)[source]#

This function extract cepstral peaks based on the :func:’biotuner.peaks_extraction.cepstrum’ function.

Parameters:

cepstrum (array) – Values of cepstrum power across all quefrency bins.
quefrency_vector (array) – Values of all the quefrency bins.
max_time (float) – Maximum value of the quefrency to keep in seconds.
min_time (float) – Minimum value of the quefrency to keep in seconds.

Returns:

peaks (List(float)) – Quefrency values.
amps (List(float)) – Amplitude values.

Examples

>>> data = np.load('data_examples/EEG_pareidolia/parei_data_1000ts.npy')[0]
>>> cepst, quef = cepstrum(data, 1000, plot_cepstrum=False, min_freq=1, max_freq=50)
>>> peaks, amps = cepstral_peaks(cepst, quef, 1, 0.01)
>>> print('PEAKS', [np.round(x) for x in peaks[0:3]])
>>> print('AMPS', [np.round(x) for x in amps[0:3]])
PEAKS [24.0, 16.0, 11.0]
AMPS [244.0, 151.0, 101.0]

pac_frequencies(ts, sf, method='duprelatour', n_values=10, drive_precision=0.05, max_drive_freq=6, min_drive_freq=3, sig_precision=1, max_sig_freq=50, min_sig_freq=8, low_fq_width=0.5, high_fq_width=1, plot=False)[source]#

A function to compute the comodulogram for phase-amplitude coupling: and extract the pairs of peaks with maximum coupling value.

Parameters:

ts (array (numDataPoints,)) – Single time series.
sf (int) – Sampling frequency.
method (str) –
- ‘ozkurt’
- ‘canolty’
- ‘tort’
- ‘penny’
- ‘vanwijk’
- ‘duprelatour’
- ‘colgin’
- ‘sigl’
- ‘bispectrum’
n_values (int, default=10) – Number of pairs of drive and modulated frequencies to keep.
drive_precision (float, default=0.05) – Value (hertz) of one frequency bin of the phase signal.
max_drive_freq (float, default=6) – Minimum value (hertz) of the phase signal.
min_drive_freq (float, default=3) – Maximum value (hertz) of the phase signal.
sig_precision (float, default=1) – Value (hertz) of one frequency bin of the amplitude signal.
max_sig_freq (float, default=50) – Maximum value (hertz) of the amplitude signal.
min_sig_freq (float, default=8) – Minimum value (hertz) of the amplitude signal.
low_fq_width (float, default=0.5) – Bandwidth of the band-pass filter (phase signal).
high_fq_width (float, default=1) – Bandwidth of the band-pass filter (amplitude signal).
plot (bool, default=False) – Determines if a plot of the comodulogram is created.

Returns:

pac_freqs (List of lists) – Each sublist correspond to pairs of frequencies for the phase and amplitude signals with maximal coupling value.
pac_coupling (List) – Coupling values associated with each pairs of phase and amplitude frequencies.

Examples

>>> data = np.load('data_examples/EEG_pareidolia/parei_data_1000ts.npy')[0]
>>> pac_frequencies(
>>>                 data,
>>>                 1200,
>>>                 method="canolty",
>>>                 n_values=5,
>>>                 drive_precision=0.1,
>>>                 max_drive_freq=6,
>>>                 min_drive_freq=3,
>>>                 sig_precision=1,
>>>                 max_sig_freq=50,
>>>                 min_sig_freq=10,
>>>                 low_fq_width=0.5,
>>>                 high_fq_width=1,
>>>                 plot=True,
>>>                 )
([[3.0, 10.0], [3.0, 11.0], [3.8, 15.0], [4.0, 11.0], [3.2, 11.0]],
[3.544482291850382e-08,
3.44758700485373e-08,
4.125714430185903e-08,
3.780184228154704e-08,
3.3232328382531826e-08])

polycoherence(data, *args, dim=2, **kwargs)[source]#

Calculate the polycoherence between frequencies and their sum frequency.

Parameters:

data (array_like) – 1D data array.
fs (float) – Sampling rate.
ofreqs (float) – Fixed frequencies.
dim ({‘sum’, 1, 2, 0}, optional) – Dimension of the polycoherence calculation:
- ‘sum’: 1D polycoherence with fixed frequency sum. The first argument after fs is the frequency sum. Other fixed frequencies possible.
- 1: 1D polycoherence as a function of f1, at least one fixed frequency (ofreq) is expected.
- 2: 2D polycoherence as a function of f1 and f2. ofreqs are additional fixed frequencies.
- 0: Polycoherence for fixed frequencies.
norm ({2, 0, tuple}, optional) – Normalization scheme:
- 2: Return polycoherence, n0 = n1 = n2 = 2 (default).
- 0: Return polyspectrum, <prod(spec(fi)) * conj(spec(sum(fi)))>.
- tuple(n1, n2): General case with n0=2.
synthetic (list, optional) – Used for synthetic signal for some frequencies. List of 3-item tuples (freq, amplitude, phase). The freq must coincide with the first fixed frequencies (ofreq, except for dim=’sum’).
flim1, flim2 (tuple, optional) – Frequency limits for 2D case.
**kwargs – Additional keyword arguments to pass to scipy.signal.spectrogram. Important parameters are nperseg, noverlap, and nfft.

Returns:

polycoherence (ndarray) – The polycoherence or polyspectrum.
freqs (ndarray) – Frequency array.
spectrum (ndarray) – The power spectrum of the signal.

Notes

< > denotes averaging and | | denotes absolute value.

References

FROM: trichter/polycoherence.

polyspectrum_frequencies(data, sf, precision, n_values=10, nperseg=None, noverlap=None, method='bicoherence', flim1=(2, 50), flim2=(2, 50), graph=False)[source]#

Calculate the frequencies and amplitudes of the top n polyspectral components using the bispectrum or bicoherence method.

Parameters:

data (array-like) – The input signal.
sf (float) – The sampling frequency of the input signal.
precision (float) – The desired frequency precision of the output.
n_values (int, default=10) – The number of top polyspectral components to return. Default is 10.
nperseg (int, default=None) – The length of each segment used in the FFT calculation. If not specified, defaults to sf.
noverlap (int, default=None) – The number of samples to overlap between adjacent segments. If not specified, When default is None, noverlap = sf // 10.
method (str, default=’bicoherence’) – The method to use for calculating the polyspectrum. Can be either:
- “bispectrum”
- “bicoherence”
flim1 (tuple of float, default=(2, 50)) – The frequency limits for the first frequency axis.
flim2 (tuple of float, default=(2, 50)) – The frequency limits for the second frequency axis.
graph (bool, default=False) – Whether to plot the polyspectrum using matplotlib.

Returns:

tuple of list of float – A tuple containing the frequencies and amplitudes of the top n polyspectral components, respectively.

Examples

>>> data = np.load('data_examples/EEG_pareidolia/parei_data_1000ts.npy')[0]
>>> polyspectrum_frequencies(data, sf=1200, precision=0.1, n_values=5, method="bicoherence",
>>>                      flim1=(15, 30), flim2=(2, 15), graph=True,
>>>                     )
([[23.25, 4.916666666666666],
[23.333333333333332, 4.916666666666666],
[23.166666666666664, 4.916666666666666],
[23.416666666666664, 4.916666666666666],
[27.25, 4.75]],
[[0.8518411], [0.84810454], [0.8344524], [0.83267957], [0.8235908]])

harmonic_recurrence(peaks, amps, min_freq=1, max_freq=30, min_harms=2, harm_limit=128)[source]#

Identify spectral peaks that have the highest recurrence in the spectrum based on their harmonic series.

Parameters:

peaks (list of floats) – List of all spectral peaks.
amps (list of floats) – List of amplitudes of the spectral peaks.
min_freq (float, default=1) – Minimum frequency to consider.
max_freq (float, default=30) – Maximum frequency to consider.
min_harms (int, default=2) – Minimum number of harmonic recurrence to keep a peak.
harm_limit (int, default=128) – Highest harmonic to consider.

Returns:

tuple of arrays – Returns a tuple of arrays containing:

max_n: Number of harmonic recurrences for each selected peak.
max_peaks: Frequencies of each selected peak.
max_amps: Amplitudes of each selected peak.
harmonics: List of harmonic ratios of each selected peak.
harmonic_peaks: Frequencies of peaks that share harmonic ratios with each selected peak.
harm_peaks_fit: List containing detailed information about each selected peak.

Examples

>>> data = np.load('data_examples/EEG_pareidolia/parei_data_1000ts.npy')[0]
>>> peaks, amps, freqs, psd = extract_welch_peaks(
>>>                             data,
>>>                             1000,
>>>                             precision=1,
>>>                             max_freq=150,
>>>                             extended_returns=True,
>>>                             out_type="all",
>>>                             min_freq=1)
>>> (max_n, peaks_temp,
>>>  amps_temp,harms,
>>>  harm_peaks,
>>>  harm_peaks_fit) = harmonic_recurrence(peaks, amps, min_freq=1,
>>>                                        max_freq=50, min_harms=2,
>>>                                        harm_limit=128)
>>> peaks_temp
array([ 2.,  6., 27., 44.])

endogenous_intermodulations(peaks, amps, order=3, min_IMs=2, max_freq=100)[source]#

Computes the intermodulation components (IMCs) for each pair of peaks and compares the IMCs with peak values. If a pair of peaks has a number of IMCs equal to or greater than the min_IMs parameter, these peaks and the associated IMCs are stored in the IMCs_all dictionary.

Parameters:

peaks (array_like) – An array of peak frequencies.
amps (array_like) – An array of amplitudes corresponding to the peak frequencies.
order (int, default=3) – The maximum order of intermodulation to consider.
min_IMs (int, default=2) – The minimum number of IMCs required for a pair of peaks to be stored in the IMCs_all dictionary.
max_freq (float, default=100) – The maximum frequency to consider when computing IMCs.

Returns:

EIMs (list) – A list of the endogenous intermodulation components (EIMs) for each peak.
IMCs_all (dict) – A dictionary containing information about all the pairs of peaks and their associated IMCs that satisfy the min_IMs threshold. The dictionary has the following keys:
- IMs: a list of lists, where each list contains the IMCs associated with a pair of peaks.
- peaks: a list of lists, where each list contains the frequencies of the two peaks.
- n_IMs: a list of integers, where each integer represents the number of IMCs associated with a pair of peaks.
- amps: a list of lists, where each list contains the amplitudes of the two peaks.
n_IM_peaks (int) – The total number of pairs of peaks and their associated IMCs that satisfy the min_IMs threshold.

Examples

>>> peaks = [5, 9, 13, 21]
>>> amps = [0.6, 0.5, 0.4, 0.3]
>>> EIMs, IMCs_all, n_IM_peaks = endogenous_intermodulations(peaks, amps, order=3, min_IMs=2, max_freq=50)
>>> IMCs_all
{'IMs': [[5, 21]], 'peaks': [[9, 13]], 'n_IMs': [2], 'amps': [[0.5, 0.4]]}

compute_sidebands(carrier, modulator, order=2)[source]#

Computes the frequency values of the sidebands resulting from the interaction of a carrier signal and a modulating signal.

Parameters:

carrier (float) – The frequency value of the carrier signal.
modulator (float) – The frequency value of the modulating signal.
order (int, default=2) – The order of the highest sideband to compute.

Returns:

numpy.ndarray – A sorted 1D array of frequency values for the sidebands.

Examples

>>> compute_sidebands(1000, 100, 3)
array([700., 800., 900., 1100., 1200., 1300.])

>>> compute_sidebands(500, 75, 2)
array([350., 425., 575., 650.])

class UnivariateTransformerMixin[source]#

Bases: object

Mixin class for all univariate transformers in pyts.

fit_transform(X, y=None, **fit_params)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like, shape = (n_samples, n_timestamps)) – Univariate time series.
y (None or array-like, shape = (n_samples,) (default = None)) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns:

X_new (array) – Transformed array.

class SingularSpectrumAnalysis(window_size=4, groups=None, lower_frequency_bound=0.075, lower_frequency_contribution=0.85, chunksize=None, n_jobs=1)[source]#

Bases: UnivariateTransformerMixin

Singular Spectrum Analysis.

Parameters:

window_size (int or float (default = 4)) – Size of the sliding window (i.e. the size of each word). If float, it represents the percentage of the size of each time series and must be between 0 and 1. The window size will be computed as max(2, ceil(window_size * n_timestamps)).
groups (None, int, ‘auto’, or array-like (default = None)) – The way the elementary matrices are grouped. If None, no grouping is performed. If an integer, it represents the number of groups and the bounds of the groups are computed as np.linspace(0, window_size, groups + 1).astype('int64'). If ‘auto’, then three groups are determined, containing trend, seasonal, and residual. If array-like, each element must be array-like and contain the indices for each group.
lower_frequency_bound (float (default = 0.075)) – The boundary of the periodogram to characterize trend, seasonal and residual components. It must be between 0 and 0.5. Ignored if groups is not set to ‘auto’.
lower_frequency_contribution (float (default = 0.85)) – The relative threshold to characterize trend, seasonal and residual components by considering the periodogram. It must be between 0 and 1. Ignored if groups is not set to ‘auto’.
chunksize (int or None (default = None)) – If int, the transformation of the whole dataset is performed using chunks (batches) and chunksize corresponds to the maximum size of each chunk (batch). If None, the transformation is performed on the whole dataset at once. Performing the transformation with chunks is likely to be a bit slower but requires less memory.
n_jobs (None or int (default = None)) – The number of jobs to use for the computation. Only used if chunksize is set to an integer.

References

Examples

>>> from pyts.datasets import load_gunpoint
>>> from pyts.decomposition import SingularSpectrumAnalysis
>>> X, _, _, _ = load_gunpoint(return_X_y=True)
>>> transformer = SingularSpectrumAnalysis(window_size=5)
>>> X_new = transformer.transform(X)
>>> X_new.shape
(50, 5, 150)

fit(X=None, y=None)[source]#

transform(X)[source]#

Transform the provided data.

Parameters:: X (array-like, shape = (n_samples, n_timestamps))
Returns:: X_new (array-like, shape = (n_samples, n_splits, n_timestamps)) – Transformed data. n_splits value depends on the value of groups. If groups=None, n_splits is equal to window_size. If groups is an integer, n_splits is equal to groups. If groups='auto', n_splits is equal to three. If groups is array-like, n_splits is equal to the length of groups. If n_splits=1, X_new is squeezed and its shape is (n_samples, n_timestamps).

Peaks Extraction

Contents

Peaks Extraction#