Stats

Statistical comparison functions for BiotunerGroup analyses.

Module type: Functions

These functions accept BiotunerGroup objects (or their .summary() DataFrames) and perform group-level statistical tests on harmonicity metrics.

Typical usage

>>> from biotuner.biotuner_group import BiotunerGroup
>>> from biotuner.stats import compare_all_metrics, plot_stats_comparison
>>>
>>> bt1 = BiotunerGroup(data_rest, sf=1000).compute_peaks().compute_metrics()
>>> bt2 = BiotunerGroup(data_task, sf=1000).compute_peaks().compute_metrics()
>>>
>>> pvals, tstats, direction = compare_all_metrics(bt1, bt2, data_labels=['rest', 'task'])
>>> plot_stats_comparison(pvals, tstats, direction, data_labels=['rest', 'task'])

ttest_groups(group1, group2, metrics: List[str] | None = None, alternative: str = 'two-sided') → DataFrame

Independent t-tests comparing all metrics between two groups.

Parameters:

group1, group2 (BiotunerGroup or pd.DataFrame) – Groups to compare. DataFrames should have metrics as columns.
metrics (list of str, optional) – Metrics to include. If None, uses all numeric columns present in both groups.
alternative (str, default='two-sided') – Hypothesis direction: 'two-sided', 'less', or 'greater'.

Returns:

results (pd.DataFrame) – Indexed by metric name with columns:

t_stat – t-statistic
p_value – p-value for the chosen alternative (two-sided by default)
mean_group1 – mean of group 1
mean_group2 – mean of group 2
higher_group – 1 if the group1 mean ≥ the group2 mean, else 2

Examples

>>> results = ttest_groups(bt_rest, bt_task)
>>> significant = results[results['p_value'] < 0.05]
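
For readers who want to see what a per-metric comparison of this kind involves, the following is a minimal sketch using scipy.stats.ttest_ind on two summary tables. It is not biotuner's implementation: the metric names, the toy values, and the handling of missing data (dropna) are assumptions made for illustration, and only the output columns follow the description above.

```python
import pandas as pd
from scipy import stats

# Hypothetical summary tables: one row per time series, one column per metric.
# Metric names and values are illustrative only.
rest = pd.DataFrame({"harmsim": [10.0, 12.0, 11.0, 13.0],
                     "tenney": [5.0, 6.0, 5.5, 6.5]})
task = pd.DataFrame({"harmsim": [20.0, 22.0, 21.0, 23.0],
                     "tenney": [5.0, 6.0, 5.5, 6.5]})

rows = {}
# Only metrics present in both tables are compared.
for metric in rest.columns.intersection(task.columns):
    a = rest[metric].dropna()
    b = task[metric].dropna()
    t_stat, p_value = stats.ttest_ind(a, b, alternative="two-sided")
    rows[metric] = {
        "t_stat": t_stat,
        "p_value": p_value,
        "mean_group1": a.mean(),
        "mean_group2": b.mean(),
        # 1 if group 1 has the higher (or equal) mean, else 2
        "higher_group": 1 if a.mean() >= b.mean() else 2,
    }

results = pd.DataFrame.from_dict(rows, orient="index")
significant = results[results["p_value"] < 0.05]
```

In this toy data only harmsim differs between the groups, so it is the only row that survives the p < 0.05 filter.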

ancova_groups(group1, group2, metric: str, covariate: str = 'peak_freq_mean', data_labels: List[str] | None = None) → DataFrame

ANCOVA comparing two groups on a metric while controlling for a covariate (by default, mean peak frequency).

Requires the pingouin package (pip install pingouin).

Parameters:

group1, group2 (BiotunerGroup or pd.DataFrame) – Groups to compare.
metric (str) – Dependent variable (outcome metric).
covariate (str, default='peak_freq_mean') – Covariate column (typically average peak frequency). Must exist in both group summaries.
data_labels (list of str, optional) – Names for the two groups. Defaults to ['group1', 'group2'].

Returns:

ancova_result (pd.DataFrame) – Output of pingouin.ancova with F-statistic and p-value.
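
To make "controlling for a covariate" concrete, here is a rough sketch of a two-group ANCOVA written as a comparison of two least-squares models: one with the covariate only, and one with the covariate plus a group indicator. This is not pingouin's code or biotuner's; the helper name ancova_two_groups and the synthetic data are assumptions for illustration. With pingouin installed, the equivalent call on a long-format table would be pingouin.ancova(data=..., dv=..., covar=..., between=...).

```python
import numpy as np
from scipy import stats


def ancova_two_groups(y1, x1, y2, x2):
    """F-test for a group effect on y after adjusting for covariate x.

    Hypothetical helper: compares the model y ~ 1 + x against
    y ~ 1 + x + group using ordinary least squares.
    """
    y = np.concatenate([y1, y2]).astype(float)
    x = np.concatenate([x1, x2]).astype(float)
    g = np.concatenate([np.zeros(len(y1)), np.ones(len(y2))])
    n = len(y)

    def rss(design):
        # Residual sum of squares of the least-squares fit.
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        return float(resid @ resid)

    ones = np.ones(n)
    rss_reduced = rss(np.column_stack([ones, x]))
    rss_full = rss(np.column_stack([ones, x, g]))
    df_resid = n - 3  # intercept, covariate slope, group offset
    f_stat = (rss_reduced - rss_full) / (rss_full / df_resid)
    p_value = stats.f.sf(f_stat, 1, df_resid)
    return f_stat, p_value


# Synthetic data: the metric rises with peak frequency in both groups.
rng = np.random.default_rng(0)
freq_rest = rng.uniform(8, 10, 20)
freq_task = rng.uniform(10, 12, 20)
noise = lambda: rng.normal(0, 0.1, 20)

# Case 1: groups differ only through the covariate (a pure confound).
f_conf, p_conf = ancova_two_groups(2 * freq_rest + noise(), freq_rest,
                                   2 * freq_task + noise(), freq_task)

# Case 2: the task group has a real offset of 5 on top of the covariate effect.
f_shift, p_shift = ancova_two_groups(2 * freq_rest + noise(), freq_rest,
                                     2 * freq_task + 5 + noise(), freq_task)
```

In the second case the group term explains variance that the covariate cannot, so the F-statistic is large. In the first case the raw group means differ, but that difference is attributable to peak frequency.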

compare_all_metrics(group1, group2, method: str = 'ttest', metrics: List[str] | None = None, data_labels: List[str] | None = None) → Tuple[DataFrame, DataFrame, DataFrame]

Compare all available metrics between two groups.

Runs a statistical test for every numeric metric column that appears in both group summaries.

Parameters:

group1, group2 (BiotunerGroup or pd.DataFrame) – Groups to compare.
method (str, default='ttest') – Statistical test:
- 'ttest' – independent samples t-test (no extra dependencies).
- 'ancova' – ANCOVA with peak_freq_mean as covariate. Requires pingouin; automatically skips the covariate column itself.
metrics (list of str, optional) – Subset of metrics to test. If None, tests all numeric columns present in both summaries.
data_labels (list of str, optional) – Names for the two groups. Defaults to ['group1', 'group2'].

Returns:

p_values (pd.DataFrame) – Column p_value, indexed by metric.
statistics (pd.DataFrame) – Column statistic (t or F), indexed by metric.
direction (pd.DataFrame) – Column direction: 1 if the group1 mean ≥ the group2 mean, 2 otherwise, or 0 if indeterminate (NaN values or the skipped covariate column).

Examples

>>> pvals, tstats, direction = compare_all_metrics(
...     bt_rest, bt_task, method='ttest', data_labels=['rest', 'task']
... )
>>> plot_stats_comparison(pvals, tstats, direction, data_labels=['rest', 'task'])
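
The three returned DataFrames share the same metric index, so they can be combined into a single table for inspection. The sketch below uses hand-made stand-ins with the column names documented above; the metric names and numbers are invented for illustration and do not come from a real analysis.

```python
import pandas as pd

labels = ["rest", "task"]
metrics = ["metric_a", "metric_b", "metric_c"]  # hypothetical metric names

# Stand-ins shaped like the documented return values.
pvals = pd.DataFrame({"p_value": [0.003, 0.40, 0.02]}, index=metrics)
tstats = pd.DataFrame({"statistic": [-3.4, 0.8, 2.5]}, index=metrics)
direction = pd.DataFrame({"direction": [2, 1, 1]}, index=metrics)

# Align on the metric index and translate direction codes into group labels.
table = pd.concat([pvals, tstats, direction], axis=1)
table["higher"] = table["direction"].map(
    {1: labels[0], 2: labels[1], 0: "indeterminate"}
)

# Keep the metrics below the conventional 0.05 threshold, smallest p first.
significant = table[table["p_value"] < 0.05].sort_values("p_value")
```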

correlate_metrics_peaks(bt_group, metrics: List[str] | None = None) → Tuple[DataFrame, DataFrame]

Correlate harmonicity metrics with peak frequency within a group.

Useful for assessing whether observed differences in a metric are confounded by differences in peak frequency.

Parameters:

bt_group (BiotunerGroup or pd.DataFrame) – Group with computed peaks and metrics.
metrics (list of str, optional) – Columns to include. If None, uses all numeric columns.

Returns:

corr_df (pd.DataFrame) – Absolute value of the Pearson correlation between each metric and peak frequency, in column correlation.
pval_df (pd.DataFrame) – Corresponding p-values, column p_value.

Raises:

ValueError – If no peak-frequency column is found (peak_freq_mean, peaks, or peak_freq).
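
As an illustration of the kind of check described here, the following sketch correlates each metric column with a peak-frequency column using scipy.stats.pearsonr and keeps the absolute value. It is an approximation of the documented behaviour, not the library's code; the metric names and values are made up, and peak_freq_mean is assumed to be the peak-frequency column.

```python
import pandas as pd
from scipy import stats

# Hypothetical group summary: one row per time series.
summary = pd.DataFrame({
    "peak_freq_mean": [8.0, 9.0, 10.0, 11.0, 12.0],
    "metric_a": [5.0, 4.2, 2.9, 2.1, 1.0],  # falls with peak frequency
    "metric_b": [3.0, 1.0, 4.0, 1.0, 3.0],  # no linear trend
})

rows = {}
for metric in summary.columns.drop("peak_freq_mean"):
    r, p = stats.pearsonr(summary[metric], summary["peak_freq_mean"])
    # Taking the absolute value discards the sign of the relationship.
    rows[metric] = {"correlation": abs(r), "p_value": p}

out = pd.DataFrame.from_dict(rows, orient="index")
corr_df, pval_df = out[["correlation"]], out[["p_value"]]
```

Because the correlation is reported as an absolute value, a metric that falls with peak frequency (metric_a here) scores as high as one that rises with it. A high value for a metric that also differs between groups suggests re-testing it with the ANCOVA option.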

plot_stats_comparison(p_values: DataFrame, statistics: DataFrame | None = None, direction: DataFrame | None = None, data_labels: List[str] | None = None, method_name: str = '', figsize: Tuple[int, int] = (14, 7), save_path: str | None = None, show: bool = True) → Figure

Plot statistical comparison results as a line chart with significance markers.

Each metric is shown on the x-axis; the y-axis shows the p-value. A dashed red line marks p = 0.05. Significant metrics are annotated with triangular markers indicating which group had the higher mean.

Parameters:

p_values (pd.DataFrame) – p-values indexed by metric name (output of compare_all_metrics() or ttest_groups()). Column p_value or first column is used.
statistics (pd.DataFrame, optional) – Test statistics (t or F) indexed by metric. Currently unused in the plot but kept for API consistency.
direction (pd.DataFrame, optional) – Direction DataFrame from compare_all_metrics() (column direction: 1=group1 higher, 2=group2 higher).
data_labels (list of str, optional) – Group names for the legend. Defaults to ['Group 1', 'Group 2'].
method_name (str, default='') – Method name appended to the title.
figsize (tuple, default=(14, 7)) – Figure size.
save_path (str, optional) – If provided, save figure to this path (300 dpi).
show (bool, default=True) – Call plt.show() at the end.

Returns:

fig (matplotlib.figure.Figure) – The figure containing the plot.
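
The layout described above can be approximated directly in matplotlib. The sketch below is a guess at the general shape of such a plot, not the function's actual drawing code: the marker orientation per group, the colours, and the input values are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

labels = ["rest", "task"]
metrics = ["metric_a", "metric_b", "metric_c"]  # hypothetical metric names
pvals = pd.DataFrame({"p_value": [0.003, 0.40, 0.02]}, index=metrics)
direction = pd.DataFrame({"direction": [2, 1, 1]}, index=metrics)

fig, ax = plt.subplots(figsize=(14, 7))

# p-value per metric as a line, plus the dashed threshold at p = 0.05.
ax.plot(metrics, pvals["p_value"], color="gray", marker="o")
ax.axhline(0.05, color="red", linestyle="--", label="p = 0.05")

# Triangles on significant metrics; orientation encodes the higher group.
significant = pvals["p_value"] < 0.05
for code, marker, label in [(1, "^", labels[0]), (2, "v", labels[1])]:
    subset = pvals[significant & (direction["direction"] == code)]
    ax.scatter(list(subset.index), subset["p_value"], marker=marker,
               s=120, zorder=3, label=f"{label} higher")

ax.set_xlabel("Metric")
ax.set_ylabel("p-value")
ax.legend()
```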
