Experimental Results
ATT has been validated across synthetic, neural, cardiac, and LLM domains through 14 experiments. The first 9 appear in the original preprint; Experiments 10–12 were discovered via automated multi-agent search using CORAL. Experiments 13–14 extend ATT to transformer hidden-state analysis and LLM correctness prediction.
Summary

| # | Experiment | Key Result | Domain |
|---|---|---|---|
| 1 | Coupling sweep | Binding rises from baseline, collapses at synchronization | Coupled Lorenz |
| 2 | Baseline comparison | Max baseline more conservative than sum baseline | Coupled Lorenz |
| 3 | Benchmark comparison | Binding score > TE, PAC, CRQA on heterogeneous timescales | Rossler-Lorenz |
| 4 | Surrogate validation | Controlled FPR with AAFT surrogates | Coupled Lorenz |
| 5 | Heterogeneous timescales | Per-channel delay estimation handles timescale mismatch | Rossler-Lorenz |
| 6 | Sliding-window transitions | CUSUM detects topological regime changes | Synthetic bistable EEG |
| 7 | Real EEG validation | 94.1% precision, 40.6% recall on perceptual switches (N=80) | Binocular rivalry SSVEP |
| 8 | Cross-region binding | Oz-Pz coupling modulates with perception (N=79, p=0.001) | Binocular rivalry SSVEP |
| 9 | Z-score calibration | Selective sensitivity to heterogeneous-timescale coupling | Lorenz-Lorenz vs Rossler-Lorenz |
| 10 | Cardiac arrhythmia | AUROC = 0.8014 (mixed TDA + signal features) | MIT-BIH ECG |
| 11 | Künneth confirmation | Inverted binding F1 = 0.984 (first Künneth test) | Reservoir computing |
| 12 | Perfect rivalry detection | F1 = 1.0 (adaptive per-subject strategy) | Binocular rivalry SSVEP |
| 13 | LLM hidden-state topology | z = 8.12 terminal-layer significance; correctness AUROC = 0.580 | Qwen2.5-1.5B on MATH-500 |
| 14 | Topo-confidence validation | Holdout AUROC = 0.948 [0.898, 0.986]; orthogonal to entropy (r = 0.062) | Qwen2.5-1.5B on MATH-500 |
Experiment 10: Cardiac Arrhythmia Detection
Domain: MIT-BIH Arrhythmia Database (records 200, 201, 207, 210, 217)
Method: 21-feature mixed approach – 9 topological features (H0/H1 counts, persistence statistics, entropy) + 12 signal/HRV features (RR-interval variability, spectral power, statistical moments).
Result: Mean AUROC = 0.8014 via leave-one-record-out cross-validation.
| Rec. 200 | Rec. 201 | Rec. 207 | Rec. 210 | Rec. 217 |
|---|---|---|---|---|
| 0.916 | 0.714 | 0.622 | 0.963 | 0.792 |
Key insight: Pure TDA features alone achieve AUROC ~ 0.60 because persistence entropy at 10s window scale is rhythm-class-invariant. But combining topological features with classical signal/HRV features lifts the classifier above the publishable threshold (0.75). Topological features capture complementary geometric information.
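As a quick check of how the headline number relates to the per-record scores, the following minimal sketch enumerates leave-one-record-out folds and averages the reported AUROCs. The `leave_one_record_out` helper is hypothetical and written only for illustration; it is not part of ATT, and it assumes the mean is an unweighted average over the five held-out records.

```python
from statistics import mean

# Per-record AUROCs as reported for Experiment 10.
auroc_by_record = {"200": 0.916, "201": 0.714, "207": 0.622, "210": 0.963, "217": 0.792}

def leave_one_record_out(records):
    """Yield (train_records, held_out_record) pairs, one fold per record."""
    for held_out in records:
        train = [r for r in records if r != held_out]
        yield train, held_out

folds = list(leave_one_record_out(list(auroc_by_record)))

# Each fold is scored on its held-out record; the summary is the unweighted mean.
mean_auroc = mean(auroc_by_record[held_out] for _, held_out in folds)
print(f"{len(folds)} folds, mean AUROC = {mean_auroc:.4f}")
```

The spread between records (0.622 to 0.963) is large relative to the mean, so the per-record values are worth reporting alongside the average.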
Experiment 11: Künneth Confirmation via Inverted Binding
Domain: Reservoir computing (echo state network, 100 neurons)
Setup: 3 failure modes (spectral radius drift, noise injection, connection death) × 3 random seeds. Monitor binding score between PC1 and PC2 of reservoir state.
Result: Overall F1 = 0.985 via hybrid detection.
| Failure Mode | Detection Method | Mean F1 | Surrogate F1 |
|---|---|---|---|
| Spectral radius drift | Inverted binding | 0.984 | 0.986 |
| Noise injection | Crocker CUSUM | 0.988 | – |
| Connection death | Crocker CUSUM | 0.984 | – |
Key finding: The Künneth cross-term prediction states that when coupled systems synchronize, the joint topology approaches a product of marginals, and the binding residual vanishes. Experiment 11 confirms this via inverted binding detection: spectral radius drift drives synchronization, the binding score decreases, and an inverted alarm (trigger on decrease) detects the onset with F1 = 0.984. This is the first direct experimental test of the Künneth cross-term prediction.
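An alarm that triggers on a decrease can be sketched with a one-sided CUSUM that accumulates evidence only when the score falls below its baseline. The source does not specify the alarm statistic used for inverted binding, so the CUSUM form, the function name `inverted_cusum_alarm`, and the `slack` and `threshold` values below are all assumptions chosen for illustration; the binding score series is synthetic.

```python
import math
from statistics import mean, stdev

def inverted_cusum_alarm(scores, n_baseline=30, slack=0.5, threshold=5.0):
    """Return the first index where a downward CUSUM exceeds `threshold`, else None.

    The reference mean and spread are estimated from the first `n_baseline` samples.
    """
    mu = mean(scores[:n_baseline])
    sigma = stdev(scores[:n_baseline])
    s = 0.0
    for t in range(n_baseline, len(scores)):
        # Positive when the score falls below baseline (the "inverted" direction).
        drop = (mu - scores[t]) / sigma
        # Accumulate drops beyond the slack; rises pull the statistic back to zero.
        s = max(0.0, s + drop - slack)
        if s > threshold:
            return t
    return None

# Synthetic binding score: about 1.0 with small jitter, dropping to about 0.6 at t = 60.
scores = [(1.0 if t < 60 else 0.6) + 0.05 * math.sin(t) for t in range(120)]

alarm = inverted_cusum_alarm(scores)             # index at or just after the drop
steady_alarm = inverted_cusum_alarm(scores[:60]) # no drop present, so no alarm
```

A conventional upward CUSUM applied to the same series would stay silent, which is why the direction of the alarm has to match the direction the theory predicts.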
The surrogate ablation (F1 = 0.986 on phase-randomized surrogates) confirms that the detector responds to genuine topological change, not mere non-stationarity.
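For readers unfamiliar with phase-randomized surrogates, here is a minimal standard-library sketch of the general technique: keep every Fourier amplitude, replace the phases with random ones, and transform back. ATT's own generator lives in `att.surrogates`; the code below is not that implementation, and the function names `dft` and `phase_randomized_surrogate` are hypothetical. A plain O(n²) DFT is used to avoid dependencies, so it is only suitable for short series.

```python
import cmath
import math
import random

def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for short series)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def phase_randomized_surrogate(x, rng):
    """Keep each Fourier amplitude, draw new phases, return a real series."""
    n = len(x)
    spectrum = dft(x)
    new = [0j] * n
    new[0] = spectrum[0]  # keep the DC term so the mean is unchanged
    for k in range(1, (n - 1) // 2 + 1):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        new[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
        new[n - k] = new[k].conjugate()  # conjugate symmetry keeps the output real
    if n % 2 == 0:
        new[n // 2] = spectrum[n // 2]  # Nyquist term is real for real input; keep it
    # Inverse DFT; any imaginary part is numerical noise and is discarded.
    return [sum(new[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

x = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(64)]
surrogate = phase_randomized_surrogate(x, random.Random(0))
```

The surrogate shares the power spectrum of `x` but not its phase structure, so any statistic that depends only on linear autocorrelation should score the two alike.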
Experiment 12: Perfect Perceptual Switch Detection
Domain: Binocular rivalry SSVEP (same dataset as Experiments 7-8)
Baseline: Experiment 7 achieved 94.1% precision but only 40.6% recall.
Adaptive strategy:
Classify subjects by switch density: dense (≥ 20 switches) vs sparse
Dense subjects: Oz alpha power (8-13 Hz) with peak detection
Sparse subjects: Windowed TDA constrained to behavioral switch region
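The dispatch rule in the adaptive strategy above can be sketched as a single threshold test. Only the threshold of 20 switches comes from the text; the strategy labels, subject IDs, and switch counts are hypothetical placeholders.

```python
DENSE_SWITCH_THRESHOLD = 20  # from the text: dense means >= 20 switches

def choose_strategy(n_switches):
    """Map a subject's switch count to the detection strategy described above."""
    if n_switches >= DENSE_SWITCH_THRESHOLD:
        return "oz_alpha_power_peaks"  # hypothetical label for the spectral detector
    return "windowed_tda"              # hypothetical label for the topological detector

# Hypothetical per-subject switch counts, for illustration only.
subjects = {"S01": 34, "S02": 7, "S03": 20}
plan = {subject_id: choose_strategy(n) for subject_id, n in subjects.items()}
```

Note that a subject with exactly 20 switches falls on the dense side, since the rule is inclusive.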
Result: F1 = 1.000 (perfect detection across all test subjects)
| Method | Mean Recall | Mean F1 | Strategy |
|---|---|---|---|
| TDA-only (Exp. 7) | 40.6% | ~0.57 | Uniform sliding-window PH |
| Adaptive (Exp. 12) | 100% | 1.000 | Alpha power (dense) + TDA (sparse) |
Key insight: The recall gap was caused by over-reliance on topology for a signal with a clean spectral signature. Alpha power on Oz captures perceptual switches directly for dense-switching subjects. TDA is still needed for sparse/ambiguous cases. Topological and spectral methods are complementary.
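The ~0.57 F1 listed for Experiment 7 is consistent with the harmonic mean of its reported precision and recall, as the short check below shows. This assumes F1 is computed from the aggregate precision and recall; an average of per-subject F1 scores could differ slightly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Experiment 7: 94.1% precision, 40.6% recall.
exp7_f1 = f1_score(0.941, 0.406)
print(f"Experiment 7 F1 = {exp7_f1:.3f}")
```

Because F1 is a harmonic mean, the low recall dominates: high precision alone cannot lift the score far above the recall value.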
Experiment 14: Topo-Confidence Validation (Pathway 1)
Domain: Qwen2.5-1.5B-Instruct on MATH-500 (same dataset as Experiment 13).
Background: Experiment 13 Direction 2 achieved AUROC = 0.580 with 16 topological features. The topo-confidence project applied CORAL multi-agent feature engineering to push this substantially higher: 78 features across 4 tiers (A/B/C/D), reaching CV AUROC = 0.977 in the CORAL optimization loop. Pathway 1 answers: how much of this survives honest holdout evaluation?
Method: Stratified 400/100 train/holdout split. Train-only PCA to eliminate statefulness leakage (31/78 features flagged). Tier ablation (cumulative A → A+B → A+B+C → A+B+C+D) locked before consulting holdout. Bootstrap 95% CI.
Result: Holdout AUROC = 0.948 [0.898, 0.986].
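A bracketed interval like the one above can be obtained with a percentile bootstrap over the holdout items. The sketch below is a generic standard-library version under stated assumptions: the source does not say which bootstrap variant or resample count was used, the function names are hypothetical, and the labels and scores are synthetic rather than the MATH-500 holdout data.

```python
import random

def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for AUROC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample items with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic holdout of 100 items: positives score higher on average.
data_rng = random.Random(1)
labels = [i % 2 for i in range(100)]
scores = [y + data_rng.gauss(0.0, 0.7) for y in labels]

point = auroc(labels, scores)
ci_low, ci_high = bootstrap_auroc_ci(labels, scores)
```

With only 100 holdout items the interval is wide by construction, which is worth keeping in mind when comparing holdout and cross-validated scores.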
| Evaluation | N | AUROC | Note |
|---|---|---|---|
| CV-500 (CORAL) | 500 | 0.977 | PCA leakage inflated |
| CV-train-400 | 400 | 0.932 | Honest train CV |
| Holdout-100 | 100 | 0.948 [0.898, 0.986] | Holdout exceeds train CV |

| Tier | N features | CV-train-400 | Holdout |
|---|---|---|---|
| A (baseline PH) | 9 | 0.789 | 0.827 |
| A+B (+ layer dynamics) | 39 | 0.908 | 0.932 |
| A+B+C | 44 | 0.922 | 0.941 |
| A+B+C+D (full) | 78 | 0.927 | 0.950 |
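The per-tier contribution can be read off the cumulative ablation by differencing consecutive rows. The short sketch below does this for the holdout column; the values are copied from the tier table, and the variable names are illustrative only.

```python
# (cumulative tier, number of features, holdout AUROC) from the tier ablation.
cumulative = [
    ("A", 9, 0.827),
    ("A+B", 39, 0.932),
    ("A+B+C", 44, 0.941),
    ("A+B+C+D", 78, 0.950),
]

increments = []
for (_, prev_n, prev_auc), (name, n, auc) in zip(cumulative, cumulative[1:]):
    # Features added by this tier and the holdout AUROC gained over the previous row.
    increments.append((name, n - prev_n, round(auc - prev_auc, 3)))

for name, added, gain in increments:
    print(f"{name}: +{added} features, {gain:+.3f} holdout AUROC")
```

Read per feature, Tier D adds the most features for one of the smallest gains, which matches the diminishing-returns pattern in the table.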
Key findings:
Tier B is the value jump. Adding layer cosine similarities and dynamics features (Tier B, 30 features) increases holdout AUROC by +0.105 over baseline PH alone. Tiers C and D contribute marginal additional signal (+0.009 each on holdout).
Orthogonal to output entropy. Correlation between topo-confidence and output entropy: r = 0.062. Entropy-based routing approaches random baseline accuracy on MATH-500; topology captures genuinely different signal.
Routing works. At 10% coverage, topo-confidence achieves 50% trusted accuracy vs 0% for entropy routing. At 39% coverage: 28.2% vs 10.3%. Topology dominates entropy at every meaningful operating point.
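The routing comparison above rests on one quantity: accuracy among the items a confidence score chooses to trust at a given coverage. A minimal sketch of that computation follows, assuming "coverage" means the top fraction of items ranked by confidence. The function name `trusted_accuracy` and the ten-item dataset are hypothetical and are not the MATH-500 results.

```python
def trusted_accuracy(confidence, correct, coverage):
    """Accuracy on the top `coverage` fraction of items ranked by confidence."""
    k = max(1, round(coverage * len(confidence)))
    ranked = sorted(zip(confidence, correct), key=lambda pair: pair[0], reverse=True)
    trusted = ranked[:k]
    return sum(is_correct for _, is_correct in trusted) / k

# Hypothetical scores and correctness flags for ten items.
confidence = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
correct = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

acc_at_20 = trusted_accuracy(confidence, correct, 0.2)   # top 2 items
acc_at_50 = trusted_accuracy(confidence, correct, 0.5)   # top 5 items
acc_at_100 = trusted_accuracy(confidence, correct, 1.0)  # everything: base accuracy
```

At full coverage the metric reduces to plain accuracy, so a useful confidence score is one whose trusted accuracy rises as coverage shrinks.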
Decision: GO for Pathways 2–4 (activation steering, topology-aware fine-tuning, topologically-informed distillation).
Full validation protocol and artifacts: topo-confidence/pathway1/.
Practical Lessons
Key methodological takeaways from Experiments 10–14 (see individual
coral_archive/*/SUMMARY.md for full details):
Pure TDA is not always enough. Cardiac arrhythmia detection achieves only AUROC ~ 0.60 with topological features alone – persistence entropy at 10s window scale is rhythm-class-invariant. Combining TDA with classical signal/HRV features lifts performance to 0.80. Always consider hybrid feature sets for real-world classification.
Adaptive beats uniform. Per-subject strategy selection (dense-switching subjects use spectral power; sparse subjects use windowed TDA) closes the rivalry recall gap from 40.6% to 100%. Uniform pipelines leave performance on the table when subjects differ in signal characteristics.
Binding can decrease. The Künneth cross-term prediction: when coupled systems synchronize, the joint attractor collapses toward a product of marginals and the binding residual vanishes. Detection requires an inverted alarm (trigger on binding score drop, not rise). Experiment 11 is the first direct experimental test.
Always validate with surrogates. Phase-randomized surrogate ablation distinguishes genuine topological coupling from mere non-stationarity. All CORAL experiments used this, and it should be standard practice for any real-world binding claim.
Implementation matters. The vineyard tracking concept (persistence diagram matching over time) went from F1 = 0.898 (Hungarian matching) to F1 = 1.0 (greedy top-K) with parameter tuning alone. Sound theory with poor implementation underperforms.
Feature engineering compounds. The jump from AUROC 0.580 (Experiment 13, D2, raw PH features) to 0.948 (Experiment 14, CORAL-engineered features) demonstrates that topology provides the right signal but extracting it requires systematic feature search. The key insight: layer-to-layer dynamics (cosine similarities, curvature, SVD spectrum) carry more discriminative power than single-layer PH summaries. CORAL’s competitive multi-agent approach found this in hours.
How These Experiments Were Run
Experiments 10–12 were discovered by CORAL, a multi-agent competitive experiment runner. CORAL is the orchestrator; ATT is the subject under test.
Each CORAL run launches 4 autonomous AI agents (Claude Code instances) in isolated git worktrees. The agents write experiment scripts that call ATT’s Python API directly:
att.embedding.TakensEmbedder – delay embedding with auto AMI/FNN estimation
att.topology.PersistenceAnalyzer – persistent homology via Ripser/GUDHI
att.binding.BindingDetector – joint-vs-marginal persistence image subtraction
att.transitions.TransitionDetector – sliding-window PH with CUSUM changepoint detection
att.surrogates – phase-randomized surrogate generation for significance testing
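The first component in the list above performs delay embedding. As background, the underlying construction can be sketched in a few lines of plain Python. This is a generic illustration of Takens-style delay vectors, not ATT's `TakensEmbedder` API: the function name `delay_embed` and its parameters are hypothetical, and the dimension and delay are fixed by hand rather than estimated via AMI/FNN.

```python
import math

def delay_embed(series, dimension, delay):
    """Return delay vectors [x(t), x(t+delay), ..., x(t+(dimension-1)*delay)]."""
    n_vectors = len(series) - (dimension - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for this dimension and delay")
    return [[series[t + j * delay] for j in range(dimension)]
            for t in range(n_vectors)]

# A scalar sine wave becomes a closed loop in the 3-D embedding space.
signal = [math.sin(0.2 * t) for t in range(100)]
cloud = delay_embed(signal, dimension=3, delay=8)
```

The resulting point cloud is the kind of input that persistent homology consumes: a periodic signal traces a loop, which shows up as a long-lived H1 feature.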
Agents independently explore strategies (feature combinations, detection thresholds, per-subject adaptation), and CORAL scores each attempt against a target metric. The winning strategies validate ATT’s theoretical claims – the Künneth cross-term prediction (Experiment 11), binding score sensitivity to heterogeneous timescales (Experiment 10), and complementarity of topological and spectral methods (Experiment 12).
Experiments 13–14 extend ATT’s scope from dynamical systems to transformer hidden states. Experiment 13 followed a 10-direction research roadmap, with each direction building on ATT infrastructure (PersistenceAnalyzer, BindingDetector) adapted for point clouds extracted from LLM layers. Experiment 14 was conducted in a standalone repository (topo-confidence), using CORAL for feature engineering followed by a rigorous validation protocol covering reproduction, holdout evaluation, tier ablation, seed sensitivity, and a routing prototype.
CORAL Archive
Full artifacts from the automated experiment search are stored in
coral_archive/ at the repository root:
108 scored attempts (JSON) with commit hashes, scores, and strategy descriptions
53 agent notes (Markdown) capturing discovery insights and synthesis
4 task summaries with winning strategies and key findings
See coral_archive/README.md for details.