Stats
Statistical comparison functions for BiotunerGroup analyses.
Module type: Functions
These functions accept BiotunerGroup objects
(or their .summary() DataFrames) and perform group-level statistical tests
on harmonicity metrics.
Typical usage
>>> from biotuner.biotuner_group import BiotunerGroup
>>> from biotuner.stats import compare_all_metrics, plot_stats_comparison
>>>
>>> bt1 = BiotunerGroup(data_rest, sf=1000).compute_peaks().compute_metrics()
>>> bt2 = BiotunerGroup(data_task, sf=1000).compute_peaks().compute_metrics()
>>>
>>> pvals, tstats, direction = compare_all_metrics(bt1, bt2, data_labels=['rest', 'task'])
>>> plot_stats_comparison(pvals, tstats, direction, data_labels=['rest', 'task'])
- ttest_groups(group1, group2, metrics: List[str] | None = None, alternative: str = 'two-sided') → DataFrame
Independent t-tests comparing all metrics between two groups.
- Parameters:
group1, group2 (BiotunerGroup or pd.DataFrame) – Groups to compare. DataFrames should have metrics as columns.
metrics (list of str, optional) – Metrics to include. If None, uses all numeric columns present in both groups.
alternative (str, default='two-sided') – Hypothesis direction: 'two-sided', 'less', or 'greater'.
- Returns:
results (pd.DataFrame) – Indexed by metric name with columns:
t_stat – t-statistic
p_value – two-sided (or directed) p-value
mean_group1 – mean of group 1
mean_group2 – mean of group 2
higher_group – 1 if group1 mean ≥ group2 mean, else 2
Examples
>>> results = ttest_groups(bt_rest, bt_task)
>>> significant = results[results['p_value'] < 0.05]
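The result table described above can be approximated outside the library with `scipy.stats.ttest_ind`. This is a minimal sketch, not biotuner's implementation: the helper name `ttest_table`, the placeholder columns `metric_a` and `metric_b`, and the synthetic data are all assumptions, and equal variances are assumed (SciPy's default).

```python
import numpy as np
import pandas as pd
from scipy import stats


def ttest_table(df1, df2, metrics=None, alternative="two-sided"):
    """Independent t-test per shared numeric column (hypothetical helper)."""
    if metrics is None:
        num1 = df1.select_dtypes("number").columns
        num2 = df2.select_dtypes("number").columns
        metrics = [c for c in num1 if c in num2]
    rows = {}
    for m in metrics:
        a, b = df1[m].dropna(), df2[m].dropna()
        t, p = stats.ttest_ind(a, b, alternative=alternative)
        rows[m] = {
            "t_stat": t,
            "p_value": p,
            "mean_group1": a.mean(),
            "mean_group2": b.mean(),
            "higher_group": 1 if a.mean() >= b.mean() else 2,
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Synthetic stand-ins for two group summaries (one row per time series).
rng = np.random.default_rng(0)
rest = pd.DataFrame({"metric_a": rng.normal(0.0, 1.0, 30),
                     "metric_b": rng.normal(5.0, 1.0, 30)})
task = pd.DataFrame({"metric_a": rng.normal(2.0, 1.0, 30),
                     "metric_b": rng.normal(5.0, 1.0, 30)})

results = ttest_table(rest, task)
significant = results[results["p_value"] < 0.05]
```

Here `metric_a` is generated with a two-standard-deviation shift between groups, so it should appear in `significant` with `higher_group` equal to 2.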
- ancova_groups(group1, group2, metric: str, covariate: str = 'peak_freq_mean', data_labels: List[str] | None = None) → DataFrame
ANCOVA comparing two groups on a metric while controlling for a covariate (peak frequency by default).
Requires the pingouin package (pip install pingouin).
- Parameters:
group1, group2 (BiotunerGroup or pd.DataFrame) – Groups to compare.
metric (str) – Dependent variable (outcome metric).
covariate (str, default='peak_freq_mean') – Covariate column (typically average peak frequency). Must exist in both group summaries.
data_labels (list of str, optional) – Names for the two groups. Defaults to ['group1', 'group2'].
- Returns:
ancova_result (pd.DataFrame) – Output of pingouin.ancova with F-statistic and p-value.
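As a rough illustration of what a one-covariate ANCOVA tests, the group effect can be obtained by comparing two least-squares fits: one with the covariate only, and one with the covariate plus a group indicator. The sketch below uses NumPy and SciPy rather than pingouin, so it is not the code path `ancova_groups` takes; the helper `ancova_group_effect`, the column `metric_a`, and the synthetic data are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats


def ancova_group_effect(df1, df2, metric, covariate="peak_freq_mean"):
    """F-test for the group effect on `metric`, adjusting for `covariate`."""
    data = pd.concat([df1.assign(group=0.0), df2.assign(group=1.0)],
                     ignore_index=True).dropna(subset=[metric, covariate])
    y = data[metric].to_numpy(dtype=float)
    ones = np.ones(len(data))
    cov = data[covariate].to_numpy(dtype=float)
    grp = data["group"].to_numpy(dtype=float)

    def rss(design):
        # Residual sum of squares of an ordinary least-squares fit.
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        return float(resid @ resid)

    rss_reduced = rss(np.column_stack([ones, cov]))     # covariate only
    rss_full = rss(np.column_stack([ones, cov, grp]))   # covariate + group
    df_resid = len(y) - 3                               # n minus 3 fitted parameters
    f_stat = (rss_reduced - rss_full) / (rss_full / df_resid)
    p_value = stats.f.sf(f_stat, 1, df_resid)
    return f_stat, p_value


# Synthetic data: the metric depends on peak frequency in both groups,
# and the second group has an additional constant offset of 1.5.
rng = np.random.default_rng(1)
freq_rest = rng.uniform(8, 12, 40)
freq_task = rng.uniform(8, 12, 40)
rest = pd.DataFrame({"peak_freq_mean": freq_rest,
                     "metric_a": 0.5 * freq_rest + rng.normal(0, 0.5, 40)})
task = pd.DataFrame({"peak_freq_mean": freq_task,
                     "metric_a": 0.5 * freq_task + 1.5 + rng.normal(0, 0.5, 40)})

f_stat, p_value = ancova_group_effect(rest, task, "metric_a")
```

The model here has no group-by-covariate interaction term, which matches the usual ANCOVA assumption of parallel slopes.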
- compare_all_metrics(group1, group2, method: str = 'ttest', metrics: List[str] | None = None, data_labels: List[str] | None = None) → Tuple[DataFrame, DataFrame, DataFrame]
Compare all available metrics between two groups.
Runs a statistical test for every numeric metric column that appears in both group summaries.
- Parameters:
group1, group2 (BiotunerGroup or pd.DataFrame) – Groups to compare.
method (str, default='ttest') – Statistical test:
'ttest' – independent samples t-test (no extra dependencies).
'ancova' – ANCOVA with peak_freq_mean as covariate. Requires pingouin; automatically skips the covariate column itself.
metrics (list of str, optional) – Subset of metrics to test. If None, tests all numeric columns present in both summaries.
data_labels (list of str, optional) – Names for the two groups. Defaults to ['group1', 'group2'].
- Returns:
p_values (pd.DataFrame) – Column p_value, indexed by metric.
statistics (pd.DataFrame) – Column statistic (t or F), indexed by metric.
direction (pd.DataFrame) – Column direction: 1 if group1 mean ≥ group2 mean, 2 otherwise, 0 if indeterminate (NaN values or covariate skip).
Examples
>>> pvals, tstats, direction = compare_all_metrics(
...     bt_rest, bt_task, method='ttest', data_labels=['rest', 'task']
... )
>>> plot_stats_comparison(pvals, tstats, direction, data_labels=['rest', 'task'])
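Because the three returned DataFrames are all indexed by metric, they can be joined into a single table for filtering. The values below are made-up placeholders shaped like the documented return values (columns `p_value`, `statistic`, `direction`); the metric names are hypothetical.

```python
import pandas as pd

# Placeholder frames mimicking the documented return shapes.
metrics = ["metric_a", "metric_b", "metric_c"]
pvals = pd.DataFrame({"p_value": [0.003, 0.40, 0.02]}, index=metrics)
tstats = pd.DataFrame({"statistic": [3.2, -0.8, -2.4]}, index=metrics)
direction = pd.DataFrame({"direction": [1, 2, 2]}, index=metrics)
data_labels = ["rest", "task"]

# Join on the shared metric index, then keep rows below the 0.05 threshold.
table = pd.concat([pvals, tstats, direction], axis=1)
significant = table[table["p_value"] < 0.05].copy()

# Translate direction codes into group labels; code 0 (indeterminate) becomes NaN.
significant["higher"] = significant["direction"].map(
    {1: data_labels[0], 2: data_labels[1]}
)
```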
- correlate_metrics_peaks(bt_group, metrics: List[str] | None = None) → Tuple[DataFrame, DataFrame]
Correlate harmonicity metrics with peak frequency within a group.
Useful for assessing whether observed differences in a metric are confounded by differences in peak frequency.
- Parameters:
bt_group (BiotunerGroup or pd.DataFrame) – Group with computed peaks and metrics.
metrics (list of str, optional) – Columns to include. If None, uses all numeric columns.
- Returns:
corr_df (pd.DataFrame) – Absolute Pearson correlation with peak frequency, column correlation.
pval_df (pd.DataFrame) – Corresponding p-values, column p_value.
- Raises:
ValueError – If no peak-frequency column is found (peak_freq_mean, peaks, or peak_freq).
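The confound check described above amounts to a Pearson correlation between each metric and the peak-frequency column, reported as an absolute value. The following is a sketch under assumptions, not the library's code: the helper `abs_corr_with_peak` is hypothetical, it looks only for a single named peak column rather than the three fallbacks listed under Raises, and the data are synthetic.

```python
import numpy as np
import pandas as pd
from scipy import stats


def abs_corr_with_peak(df, peak_col="peak_freq_mean", metrics=None):
    """Absolute Pearson r (and p-value) of each metric against `peak_col`."""
    if peak_col not in df.columns:
        raise ValueError(f"no peak-frequency column named {peak_col!r}")
    if metrics is None:
        metrics = [c for c in df.select_dtypes("number").columns if c != peak_col]
    corr, pval = {}, {}
    for m in metrics:
        pair = df[[peak_col, m]].dropna()
        r, p = stats.pearsonr(pair[peak_col], pair[m])
        corr[m] = abs(r)
        pval[m] = p
    corr_df = pd.Series(corr, name="correlation").to_frame()
    pval_df = pd.Series(pval, name="p_value").to_frame()
    return corr_df, pval_df


# Synthetic summary: metric_a tracks peak frequency (negatively), metric_b does not.
rng = np.random.default_rng(2)
freq = rng.uniform(8, 12, 50)
summary = pd.DataFrame({
    "peak_freq_mean": freq,
    "metric_a": -0.8 * freq + rng.normal(0, 0.3, 50),
    "metric_b": rng.normal(0, 1.0, 50),
})

corr_df, pval_df = abs_corr_with_peak(summary)
```

A metric with a high value in `corr_df`, such as `metric_a` here, is a candidate for the ANCOVA route rather than a plain t-test.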
- plot_stats_comparison(p_values: DataFrame, statistics: DataFrame | None = None, direction: DataFrame | None = None, data_labels: List[str] | None = None, method_name: str = '', figsize: Tuple[int, int] = (14, 7), save_path: str | None = None, show: bool = True) → Figure
Plot statistical comparison results as a line chart with significance markers.
Each metric is shown on the x-axis; the y-axis shows the p-value. A dashed red line marks p = 0.05. Significant metrics are annotated with triangular markers indicating which group had the higher mean.
- Parameters:
p_values (pd.DataFrame) – p-values indexed by metric name (output of compare_all_metrics() or ttest_groups()). Column p_value or first column is used.
statistics (pd.DataFrame, optional) – Test statistics (t or F) indexed by metric. Currently unused in the plot but kept for API consistency.
direction (pd.DataFrame, optional) – Direction DataFrame from compare_all_metrics() (column direction: 1 = group1 higher, 2 = group2 higher).
data_labels (list of str, optional) – Group names for the legend. Defaults to ['Group 1', 'Group 2'].
method_name (str, default='') – Method name appended to the title.
figsize (tuple, default=(14, 7)) – Figure size.
save_path (str, optional) – If provided, save figure to this path (300 dpi).
show (bool, default=True) – Call plt.show() at the end.
- Returns:
fig (matplotlib.figure.Figure)
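The layout described for this plot (metrics on the x-axis, p-values on the y-axis, a dashed red line at p = 0.05, triangular markers on significant metrics) can be sketched directly in matplotlib. This is an approximation under assumptions, not the function's actual drawing code: the input values are made up, and mapping up-pointing and down-pointing triangles to the first and second group is an arbitrary choice for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up inputs shaped like the documented p_values and direction frames.
metrics = ["metric_a", "metric_b", "metric_c"]
pvals = pd.DataFrame({"p_value": [0.003, 0.40, 0.02]}, index=metrics)
direction = pd.DataFrame({"direction": [1, 2, 2]}, index=metrics)
data_labels = ["rest", "task"]

fig, ax = plt.subplots(figsize=(14, 7))
x = list(range(len(pvals)))
p = pvals["p_value"].to_numpy()
ax.plot(x, p, color="black", marker="o", label="p-value")
ax.axhline(0.05, color="red", linestyle="--", label="p = 0.05")

# Mark significant metrics; marker orientation per group is an assumption.
codes = direction["direction"].to_numpy()
for code, marker, label in [(1, "^", data_labels[0]), (2, "v", data_labels[1])]:
    keep = (p < 0.05) & (codes == code)
    xs = [i for i, k in zip(x, keep) if k]
    ax.scatter(xs, p[keep], marker=marker, s=120, zorder=3,
               label=f"{label} higher")

ax.set_xticks(x)
ax.set_xticklabels(metrics, rotation=45, ha="right")
ax.set_ylabel("p-value")
ax.legend()
fig.tight_layout()
```

Saving at the documented resolution would then be `fig.savefig(path, dpi=300)`, where `path` is whatever output location is wanted.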