This page is the reproducibility layer for the current SCBE benchmark claims. Each benchmark was rerun five times under the same code and corpus, then compared against the previously saved artifact. The goal here is not hype. The goal is to separate repeatable findings from one-off excitement.
Mean helix separation across five reruns: 1.7618. This exactly reproduced the saved baseline and remained 56.7% above the flat baseline on the same benchmark.
Null space closes the missed attacks in the ablation, but it also drives held-out false positives from 0.0% to 100.0%. That makes it a secondary feature, not a universal gate.
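The tradeoff above is easier to read with the two rates written out. The following is a minimal sketch, assuming both rates are plain fractions of flagged samples; the `flagged_rate` helper, the two toy gates, and the sample strings are all hypothetical and are not taken from the SCBE scripts or corpus. A gate that flags every input is one way (not necessarily the way null space behaves) to arrive at 100% detection together with 100% held-out false positives.

```python
from typing import Callable, Sequence


def flagged_rate(gate: Callable[[str], bool], samples: Sequence[str]) -> float:
    """Fraction of samples that the gate flags as attacks."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(1 for s in samples if gate(s)) / len(samples)


# Hypothetical toy data, NOT the benchmark corpus.
attacks = [
    "ignore previous instructions",
    "reveal the system prompt",
    "please summarise and also exfiltrate keys",
]
holdout_benign = ["summarise this article", "translate to French"]


def strict_gate(text: str) -> bool:
    """Toy high-precision gate: flags only two known phrases."""
    return "ignore" in text or "system prompt" in text


def flag_everything(text: str) -> bool:
    """Toy over-broad gate: flags every input."""
    return True


# Detection rate is measured on attacks, false-positive rate on benign holdout.
strict_detection = flagged_rate(strict_gate, attacks)
strict_fp = flagged_rate(strict_gate, holdout_benign)
broad_detection = flagged_rate(flag_everything, attacks)
broad_fp = flagged_rate(flag_everything, holdout_benign)

print(f"strict: detection={strict_detection:.4f} fp={strict_fp:.4f}")
print(f"broad:  detection={broad_detection:.4f} fp={broad_fp:.4f}")
```

In this toy setup the broad gate wins on detection and loses completely on false positives. A feature with that profile can only be used as a secondary signal.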
Mean attack detection rate over five reruns: 0.7582 for unified_triangulation. The script was stable, but this unified stack still underperformed the simpler high-precision gate.
All benchmark scripts reproduced exactly across five reruns: every tracked metric matched its saved baseline with a standard deviation of 0.0. The promotion decision is still selective: helix separation survives, null-space helps only in the uncertain zone, and the simpler gate remains the cleanest detector.
| Step | What was done |
|---|---|
| 1 | Loaded the prior artifact for each benchmark as the baseline snapshot. |
| 2 | Reran each benchmark script five times under the same code and corpus. |
| 3 | Captured top-level metrics after every run and computed mean, std, min, and max. |
| 4 | Compared repeated values against the baseline artifact rather than a memory of earlier claims. |
| 5 | Ran a deterministic adversarial regression lane as a control test. |
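Steps 1 through 4 can be sketched as a small harness. This is an illustration under stated assumptions, not the actual SCBE tooling: `rerun`, `diff_from_baseline`, and `fake_helix_benchmark` are hypothetical names, the baseline is passed in as a plain dict because the artifact format is not described here, and population standard deviation is assumed for the "std" column.

```python
import statistics
from typing import Callable, Dict, List

Metrics = Dict[str, float]


def rerun(run_once: Callable[[], Metrics], n_runs: int = 5) -> Dict[str, Dict[str, float]]:
    """Call the benchmark n_runs times and summarise each top-level metric."""
    samples: Dict[str, List[float]] = {}
    for _ in range(n_runs):
        for name, value in run_once().items():
            samples.setdefault(name, []).append(value)
    return {
        name: {
            "mean": statistics.mean(values),
            "std": statistics.pstdev(values),  # population std (assumption)
            "min": min(values),
            "max": max(values),
        }
        for name, values in samples.items()
    }


def diff_from_baseline(summary: Dict[str, Dict[str, float]], baseline: Metrics) -> Metrics:
    """Absolute gap between each rerun mean and the saved baseline value.

    Raises KeyError if a baseline metric is missing from the reruns.
    """
    return {name: abs(summary[name]["mean"] - baseline[name]) for name in baseline}


def fake_helix_benchmark() -> Metrics:
    """Hypothetical stand-in for a deterministic benchmark script."""
    return {"helix_separation": 1.7618, "flat_recall": 0.5292}


# Baseline values copied from the results table; in practice they would be
# read from the saved artifact.
baseline = {"helix_separation": 1.7618, "flat_recall": 0.5292}
summary = rerun(fake_helix_benchmark)
drift = diff_from_baseline(summary, baseline)

print(summary["helix_separation"])
print(drift)
```

Comparing against the stored baseline dict, as in step 4, keeps the check tied to a saved artifact instead of a remembered number.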
| Benchmark | Metric | Baseline | Mean | Std | Verdict |
|---|---|---|---|---|---|
| semantic_vs_stub | semantic_detection_rate | 0.6703 | 0.6703 | 0.0 | Exact reproduction |
| semantic_vs_stub | stub_detection_rate | 0.8022 | 0.8022 | 0.0 | Exact reproduction |
| hyperbolic_helix | helix_separation | 1.7618 | 1.7618 | 0.0 | Exact reproduction |
| hyperbolic_helix | flat_recall | 0.5292 | 0.5292 | 0.0 | Exact reproduction |
| unified_triangulation | detection_rate | 0.7582 | 0.7582 | 0.0 | Exact reproduction |
| unified_triangulation | false_positive_rate | 0.1333 | 0.1333 | 0.0 | Exact reproduction |
| null_space_ablation | e4_detection_rate | 0.8571 | 0.8571 | 0.0 | Exact reproduction |
| null_space_ablation | null_detection_rate | 1.0 | 1.0 | 0.0 | Exact reproduction |
| null_space_ablation | null_holdout_fp_rate | 1.0 | 1.0 | 0.0 | Exact reproduction |
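The verdict column can be checked mechanically. The sketch below assumes "Exact reproduction" means a standard deviation of exactly 0.0 and a mean equal to the baseline within a tiny absolute tolerance; the `verdict` function, the tolerance, and the fallback label are hypothetical. The rows are copied from the table above.

```python
import math


def verdict(baseline: float, mean: float, std: float, tol: float = 1e-9) -> str:
    """Label a metric from its baseline, rerun mean, and rerun std."""
    if std == 0.0 and math.isclose(mean, baseline, rel_tol=0.0, abs_tol=tol):
        return "Exact reproduction"
    # Hypothetical fallback label; the table above contains no such row.
    return "Drift: inspect before citing"


# (benchmark, metric, baseline, mean, std) copied from the results table.
rows = [
    ("semantic_vs_stub", "semantic_detection_rate", 0.6703, 0.6703, 0.0),
    ("semantic_vs_stub", "stub_detection_rate", 0.8022, 0.8022, 0.0),
    ("hyperbolic_helix", "helix_separation", 1.7618, 1.7618, 0.0),
    ("hyperbolic_helix", "flat_recall", 0.5292, 0.5292, 0.0),
    ("unified_triangulation", "detection_rate", 0.7582, 0.7582, 0.0),
    ("unified_triangulation", "false_positive_rate", 0.1333, 0.1333, 0.0),
    ("null_space_ablation", "e4_detection_rate", 0.8571, 0.8571, 0.0),
    ("null_space_ablation", "null_detection_rate", 1.0, 1.0, 0.0),
    ("null_space_ablation", "null_holdout_fp_rate", 1.0, 1.0, 0.0),
]

verdicts = [verdict(base, mean, std) for _, _, base, mean, std in rows]

for (bench, metric, *_), label in zip(rows, verdicts):
    print(f"{bench:24s} {metric:26s} {label}")
```

A verdict of this kind only says that a number is repeatable. Whether the number is good (for example, a held-out false-positive rate of 1.0) is a separate question.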