Five-run verification

Benchmark claims rerun, compared, and stress-checked for repeatability.

This page is the reproducibility layer for the current SCBE benchmark claims. Each benchmark was rerun five times with the same code and corpus, then compared against the previously saved artifact. The goal is not hype but to separate repeatable findings from one-off excitement.

Repeats: 5

Control: 4 passed in 0.74s

Generated: 2026-03-24T06:45:36.537377+00:00

Hyperbolic helix

1.7618

Mean helix separation across five reruns. The value matches the saved artifact exactly and remains 56.7% above the flat baseline on the same benchmark.

Null-space ablation

85.7% → 100.0%

Adding null-space raises attack detection in the ablation from 85.7% to 100.0% by catching the previously missed attacks, but it also drives the held-out false positive rate from 0.0% to 100.0%. That makes it a secondary feature, not a universal gate.

Unified triangulation

75.8%

Mean attack detection rate over five reruns. The script was stable, but this unified stack still underperformed the simpler high-precision gate.

Scientific verdict

5 / 5

All benchmark scripts reproduced exactly across five reruns. The reproducibility is strong. The promotion decision is still selective: helix separation survives, null-space helps only in the uncertain zone, and the simpler gate remains the cleanest detector.

Scientific method

Step	What was done
1	Loaded the prior artifact for each benchmark as the baseline snapshot.
2	Reran each benchmark script five times under the same code and corpus.
3	Captured top-level metrics after every run and computed mean, std, min, and max.
4	Compared repeated values against the baseline artifact rather than a memory of earlier claims.
5	Ran a deterministic adversarial regression lane as a control test.

Repeatability table

Benchmark	Metric	Baseline	Mean	Verdict
semantic_vs_stub	semantic_detection_rate	0.6703	0.6703	Exact reproduction
semantic_vs_stub	stub_detection_rate	0.8022	0.8022	Exact reproduction
hyperbolic_helix	helix_separation	1.7618	1.7618	Exact reproduction
hyperbolic_helix	flat_recall	0.5292	0.5292	Exact reproduction
unified_triangulation	detection_rate	0.7582	0.7582	Exact reproduction
unified_triangulation	false_positive_rate	0.1333	0.1333	Exact reproduction
null_space_ablation	e4_detection_rate	0.8571	0.8571	Exact reproduction
null_space_ablation	null_detection_rate	1.0000	1.0000	Exact reproduction
null_space_ablation	null_holdout_fp_rate	1.0000	1.0000	Exact reproduction
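
One way to produce a verdict column like the one above is to compare each rerun mean against the saved baseline at the reported precision. The sketch below uses three rows from the table. The tolerance, the `verdict` function, and the "Drift" label are assumptions for illustration, not the project's actual rule.

```python
import math


def verdict(baseline: float, mean: float, tol: float = 5e-5) -> str:
    """Label a metric by whether its rerun mean matches the saved baseline.

    The default tolerance is half a unit in the fourth decimal place,
    matching the four-decimal precision of the table (an assumption).
    """
    if math.isclose(baseline, mean, rel_tol=0.0, abs_tol=tol):
        return "Exact reproduction"
    return "Drift"


rows = [
    ("hyperbolic_helix", "helix_separation", 1.7618, 1.7618),
    ("unified_triangulation", "detection_rate", 0.7582, 0.7582),
    ("null_space_ablation", "null_holdout_fp_rate", 1.0, 1.0),
]
for benchmark, metric, baseline, mean in rows:
    print(f"{benchmark}\t{metric}\t{verdict(baseline, mean)}")
```

Comparing against the stored artifact, rather than a remembered number, is what step 4 of the method describes.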

The most important result here is reproducibility. The measurements repeated exactly across five reruns. The strongest geometric claim that survived this check is the helix separation advantage. The strongest practical warning that survived this check is that null-space should be routed only into the uncertain zone, because the universal-gate version buys a 100% attack detection rate at the cost of a 100% held-out false positive rate.
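
The routing recommendation can be sketched as a two-stage decision: a primary gate decides inputs it scores confidently, and the null-space check is consulted only for scores in an uncertain band. Everything in this sketch is hypothetical: the score scale, the thresholds `low` and `high`, and the `null_space_flags` callback are illustrative assumptions, not SCBE's actual interface.

```python
from typing import Callable


def route(
    score: float,
    null_space_flags: Callable[[], bool],
    low: float = 0.4,   # hypothetical lower edge of the uncertain zone
    high: float = 0.6,  # hypothetical upper edge of the uncertain zone
) -> str:
    """Return 'block' or 'allow', using the null-space check only when unsure."""
    if score >= high:
        return "block"  # primary gate is confident this is an attack
    if score < low:
        return "allow"  # primary gate is confident this is benign
    # Uncertain zone: let the null-space feature break the tie.
    return "block" if null_space_flags() else "allow"


print(route(0.9, lambda: False))  # confident attack, null-space not consulted
print(route(0.1, lambda: True))   # confident benign, null-space not consulted
print(route(0.5, lambda: True))   # uncertain, null-space decides
```

Under this arrangement, inputs the primary gate confidently scores as benign never reach the null-space check, so its held-out false positives cannot affect them.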