Five-run verification

Benchmark claims rerun, compared, and stress-checked for repeatability.

This page is the reproducibility layer for the current SCBE benchmark claims. Each benchmark was rerun five times under the same code and corpus, then compared against the previously saved artifact. The goal is not hype but to separate repeatable findings from one-off excitement.

Repeats: 5
Control: 4 passed in 0.74s
Generated: 2026-03-24T06:45:36.537377+00:00

Hyperbolic helix

1.7618

Mean helix separation across five reruns. This exactly reproduced the saved baseline and remained 56.7% above the flat baseline on the same benchmark.

Null-space ablation

85.7% → 100.0%

The null-space feature closes the missed attacks in the ablation, raising detection from 85.7% to 100.0%, but it also drives the held-out false-positive rate from 0.0% to 100.0%. That makes it a secondary feature, not a universal gate.

Unified triangulation

75.8%

Mean attack detection rate over five reruns. The script was stable, but this unified stack still underperformed the simpler high-precision gate.

Scientific verdict

5 / 5

All benchmark scripts reproduced exactly across five reruns. The reproducibility is strong. The promotion decision is still selective: helix separation survives, null-space helps only in the uncertain zone, and the simpler gate remains the cleanest detector.

Scientific method
Step  What was done
1     Loaded the prior artifact for each benchmark as the baseline snapshot.
2     Reran each benchmark script five times under the same code and corpus.
3     Captured top-level metrics after every run and computed mean, std, min, and max.
4     Compared repeated values against the baseline artifact rather than a memory of earlier claims.
5     Ran a deterministic adversarial regression lane as a control test.
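
Steps 3 and 4 above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual harness: the function names `summarize_runs` and `verdict`, the four-decimal rounding, and the "Drift" label are assumptions introduced here. Only the "Exact reproduction" label and the reported statistics (mean, std, min, max) come from this page.

```python
import statistics


def summarize_runs(values):
    """Summarize one metric across repeated runs (mean, std, min, max)."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),  # population std over the reruns
        "min": min(values),
        "max": max(values),
    }


def verdict(baseline, values, ndigits=4):
    """Compare repeated values against the saved baseline snapshot.

    Assumption: metrics are compared at the four-decimal precision
    used in the repeatability table. "Drift" is a hypothetical label.
    """
    stats = summarize_runs(values)
    identical = stats["min"] == stats["max"]
    matches = round(stats["mean"], ndigits) == round(baseline, ndigits)
    return "Exact reproduction" if identical and matches else "Drift"


# Five identical reruns of helix_separation against the saved baseline.
helix_runs = [1.7618] * 5
print(summarize_runs(helix_runs))
print(verdict(1.7618, helix_runs))
```

Checking `min == max` rather than only `std == 0.0` keeps the exactness test independent of how the standard deviation is computed.
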
Repeatability table
Benchmark              Metric                   Baseline  Mean    Std  Verdict
semantic_vs_stub       semantic_detection_rate  0.6703    0.6703  0.0  Exact reproduction
semantic_vs_stub       stub_detection_rate      0.8022    0.8022  0.0  Exact reproduction
hyperbolic_helix       helix_separation         1.7618    1.7618  0.0  Exact reproduction
hyperbolic_helix       flat_recall              0.5292    0.5292  0.0  Exact reproduction
unified_triangulation  detection_rate           0.7582    0.7582  0.0  Exact reproduction
unified_triangulation  false_positive_rate      0.1333    0.1333  0.0  Exact reproduction
null_space_ablation    e4_detection_rate        0.8571    0.8571  0.0  Exact reproduction
null_space_ablation    null_detection_rate      1.0       1.0     0.0  Exact reproduction
null_space_ablation    null_holdout_fp_rate     1.0       1.0     0.0  Exact reproduction
The most important result here is reproducibility: every measurement repeated exactly across five reruns. The strongest geometric claim that survived this check is the helix separation advantage. The strongest practical warning that survived is that null-space should be routed only into the uncertain zone, because the universal-gate version reaches a 100% attack detection rate only at the cost of a 100% held-out false-positive rate.
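
One way to picture "routing null-space only into the uncertain zone" is a two-stage decision, sketched below. This page does not define the uncertain zone or the gate's interface, so everything here is an assumption for illustration: the function name `route`, the `primary_score` and `null_space_flag` inputs, and the `low` and `high` thresholds are hypothetical placeholders, not values taken from the benchmarks.

```python
def route(primary_score, null_space_flag, low=0.3, high=0.7):
    """Decide 'block' or 'allow' for one input.

    Assumptions (all hypothetical): the primary gate emits a score in
    [0, 1], and the uncertain zone is the band low <= score < high.
    """
    if primary_score >= high:
        return "block"  # confident attack: primary gate decides alone
    if primary_score < low:
        return "allow"  # confident benign: null-space is never consulted
    # Uncertain zone: only here does the null-space signal break the tie.
    return "block" if null_space_flag else "allow"


print(route(0.9, False))  # confident attack
print(route(0.1, True))   # confident benign, null-space ignored
print(route(0.5, True))   # uncertain, null-space decides
```

The design point is that a confident benign score is never overridden by the null-space signal, which is the failure mode behind the 100% held-out false-positive rate of the universal-gate version.
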