This is the public verification record of Flamehaven's governance review work. For commercial services, blueprints, and methodologies, visit flamehaven.space.
Flamehaven Labs

Verification Ledger

Public verification record for Flamehaven's governance reviews, physics verification runs, and biomedical audit scans. Every result is deterministic, locally executed, and fully inspectable.

What is the Flamehaven Verification Ledger?

The Flamehaven Verification Ledger is a transparent, reproducible public ledger of all capability evaluations, mathematical physics verification runs, and biomedical AI governance audits executed by Flamehaven.

Every log and audit report published here comes from a fully local, deterministic run with no network dependencies or external LLM layers, so the results can be reproduced from repository state.


Portal Overview & Verification Engines

Equation-to-Artifact (EQA)

Verifies theoretical math and physics breakthroughs by reproducing published equations as runnable, citable, and testable computational software.

1 Paper / 53 Runs Active
Biomolecular AI Validation (BAV)

Governs biomedical AI pipelines. Surfaces 34 experiments through 6 live cards, with one grouped truthful-null sub-series and a 26-entry foundational archive.

34 Experiments · 6 Live Cards
Bioscience Compliance (BSC)

Deterministic static scanning that audits bioscience AI surfaces. It evaluates clinical boundaries and dependency risks without the hazards of running the audited code.

2 Reports Active
Methodology & Frameworks

Review dashboards, verification blueprints, and compliance mapping frameworks demonstrating systematic review methods.

1 Framework Active (v3)
Equation-to-Artifact (EQA)

Mathematical & Physical Verification Ledger

Independent, deterministic reproduction of mathematical proofs, physics models, and discrete geometry equations. Rather than generating ungrounded hypotheses, this ledger validates theoretical claims by turning abstract formulas into executable, citable, and testable scientific software.

EQA Protocol

Structured conversion of abstract mathematical equations to verified digital software:

  • Step 1: Precision Lock: Re-deriving equations using arbitrary-precision libraries (e.g. mpmath) to avoid floating-point underflow and catastrophic cancellation.
  • Step 2: Constraint Verification: Checking local constraints, bounds, and algebraic-geometric admissibility programmatically.
  • Step 3: CI/CD Proof-of-Work: Regression testing codebases across multiple OS platforms and environments.
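
Step 1 can be illustrated with a minimal sketch. It uses Python's standard-library `decimal` module as a stand-in for mpmath (an assumption, made so the example has no third-party dependency), and the toy expression (1 + x) - 1 is chosen for illustration only; it is not one of the ledger's equations.

```python
from decimal import Decimal, getcontext


def naive_difference(x: float) -> float:
    """Evaluate (1 + x) - 1 in ordinary double-precision floats."""
    return (1.0 + x) - 1.0


def precise_difference(x: str, digits: int = 50) -> Decimal:
    """Evaluate (1 + x) - 1 with `digits` significant decimal digits."""
    getcontext().prec = digits
    return (Decimal(1) + Decimal(x)) - Decimal(1)


# The small term is absorbed when added to 1.0 in double precision.
naive = naive_difference(1e-20)
# With 50 significant digits the small term survives the subtraction.
precise = precise_difference("1e-20")
print(naive, precise)
```

In double precision the small term vanishes and the difference evaluates to zero, while the 50-digit computation retains it. The same effect motivates locking precision before comparing against published values.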
Scholarly Archival Alignment

Ensuring verified math software is fit for peer-reviewed citation:

  • Citable Metadata: Enforcing standardized CITATION.cff and Zenodo integrations to issue immutable DOIs.
  • Provenance Manifests: Freezing SHA-256 cryptographic digests of code files so the executed code can be verified as unchanged.
  • Audit Disclosures: Exposing explicit LaTeX paper sources alongside codebases to document deviations from literature.
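
A provenance manifest of the kind described above could be sketched as follows. The function names (`sha256_of_file`, `build_manifest`, `changed_files`) and the manifest layout (relative path mapped to hex digest) are assumptions for illustration; the ledger's actual manifest format is not documented here.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under `root` (POSIX-style relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List paths that were added, removed, or modified since `manifest`."""
    current = build_manifest(root)
    names = manifest.keys() | current.keys()
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

Freezing the result of `build_manifest` at publication time and later calling `changed_files` against the same tree gives a simple check that nothing has drifted.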
EQA-TEST-0055 2026-04-18
Research Artifact

All Elementary Functions from a Single Operator (AEFSO)

We evaluated whether the AEFSO operator eml(x,y) = exp(x) − ln(y) could serve as a TOE core component through a staged paper-to-TOE research stack: SPAR paper review, fhval validation, and 4 dogfood runs. Result: ACCEPT WITH BOUNDS at paper level, blocked for TOE core promotion, retained as OPTIONAL_REPRESENTATION_LAYER. The durable output is the missing-link IR discovery, not a core PASS.
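
To make the operator concrete, here is a small sketch of eml(x, y) = exp(x) − ln(y) together with two identities that follow directly from that definition: exp(x) = eml(x, 1), and ln(y) = eml(1, eml(eml(1, y), 1)) for y > 0. The helper names are illustrative assumptions, and whether these match the constructions used in the paper is not checked here.

```python
import math


def eml(x: float, y: float) -> float:
    """The operator from the entry above: eml(x, y) = exp(x) - ln(y)."""
    return math.exp(x) - math.log(y)


def exp_via_eml(x: float) -> float:
    """exp(x) = eml(x, 1), because ln(1) = 0."""
    return eml(x, 1.0)


def ln_via_eml(y: float) -> float:
    """ln(y) recovered from eml alone, for y > 0.

    eml(1, y)             = e - ln(y)
    eml(e - ln(y), 1)     = exp(e - ln(y)) = e**e / y
    eml(1, e**e / y)      = e - (e - ln(y)) = ln(y)
    """
    return eml(1.0, eml(eml(1.0, y), 1.0))


print(exp_via_eml(2.0), math.exp(2.0))
print(ln_via_eml(2.0), math.log(2.0))
```

The nested form for ln(y) involves a subtraction of nearby quantities, so in floating point it agrees with `math.log` only up to rounding error.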

[Report] [Repo] [Paper]
EQA-TEST-0056 2026-05-25
Verification Run

OpenAI Erdős Conjecture Disproof: Equation (2.2) Executable Reproduction

In May 2026, an OpenAI reasoning model proved that Erdős' discrete geometry conjecture is false and published a crucial exponent-excess equation (2.2). Naive floating-point evaluation collapses this value to zero because of catastrophic cancellation. This project builds an independent Python artifact that uses arbitrary-precision arithmetic to reproduce and verify the published numerical results. Result: the computation is reproduced to within 0.014% relative error, locally CI-verified, and published on Zenodo.

EQA-TEST-0054 2026-05-24
Governance Audit

LOGOS-to-TOE Intake Governance Gate Verification

We evaluated whether the offline reasoning sidecar pipeline could safely ingest theoretical math solver candidates without bypassing mandatory governance review. The system was presented with an incomplete research dossier lacking a concrete algebraic model candidate. Result: this is a pre-SPAR intake governance audit, not a math verdict. LOGOS returned zero candidate results, so the contract score fell to 0.625, dangerous pass risk stayed at 1.0, and LawBinder issued INHIBIT. The useful outcome is that the boundary held: no candidate, no promotion, SPAR still mandatory.

[Repo] [Paper]
EQA-TEST-0053 2026-05-23
Runtime Audit

Reasoning Model Sidecar Pipeline & Namespace Integrity Scan

We audited active Python environment variables, package search hierarchies, and potential import path collisions between standalone and embedded reasoning APIs in local workspaces. The scan resolved library ambiguities and measured import-time dependencies. Result: this is a replay-stable runtime integration audit, not a physics verdict. `logos` resolves to [workspace]/RExSyn-Nexus-main/src/logos/__init__.py, direct Flamehaven-LOGOS import times out at 20s, AATS smoke also times out at 20s, while TOE contract tests still pass. The experiment yields a concrete boundary: keep LOGOS offline-only until namespace isolation and latency fixes land.
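
One way to check where a module name resolves, without importing it, is the standard-library `importlib.util.find_spec`. The sketch below is an illustration of that kind of check, not the audit's actual tooling; `resolve_module_origin` is a hypothetical helper name.

```python
import importlib.util
from typing import Optional


def resolve_module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module name would load from.

    For a top-level name this does not execute the module itself, so it
    can be used even when the import would hang. Returns None when the
    name cannot be found on the current search path.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Example with a standard-library package; in an audited workspace the
# same call could be made with the name under investigation.
print(resolve_module_origin("json"))
```

Comparing the returned origin against the expected package root is enough to detect the kind of name collision described above, where a workspace copy takes precedence over a standalone install.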

[Report] [Repo] [Paper]
EQA-TEST-0052 2026-05-10
Review Artifact

Fluid Dynamics GTE Pedagogy Hypothesis

We evaluated the hypothesis that the General Transport Equation (GTE) is the universal pedagogical foundation for fluid dynamics, unifying Mass, Momentum, Energy, and Scalar transport via φ-substitution. The hypothesis was sourced from an expert academic discussion on LinkedIn. Result: the mathematical core is bounded-valid, but the universal pedagogy claim overreaches the reviewed evidence. The same encoded review payload yields three policy-surface outcomes (historical `73 / MINOR REVISION`, current TOE legacy `76 / MINOR REVISION`, current toe-spar `98 / ACCEPT`), so the real insight is review-policy drift, not a timeless single verdict.

[Report] [Repo] [Paper]
EQA-ARCHIVE TOE-TEST-0001 ~ 0051
Historical Records

TOE-TEST Foundational Runs (0001 ~ 0051)

The foundational Flamehaven-TOE experiment series that preceded the active EQA ledger (0052+): string-theory / topology physics, quantum-biology and protein spin-qubit studies, and the verification-methodology layers themselves. Each entry opens the verbatim source report in the Ledger Inspector (local-workspace paths sanitized; report content unedited). Ordered most-recent first.


Biomolecular AI Validation (BAV)

Biomolecular AI Validation Ledger

Validates whether an entire biomedical AI pipeline — RExSyn reasoning + NNSL resonance + LawBinder governance — deserves trust, not just whether one model looks confident. Treats model disagreement as signal, gates fail-closed, and keeps accepted results separate from held diagnostics.

Governance Protocol

How a multi-engine pipeline is judged trustworthy — disagreement, honesty, and chain reliability:

  • Multi-Model Disagreement: AlphaFold 3 / 2 / Chai-1 / Boltz-2 cross-validation. Rising drift exposes hidden topology conflict — disagreement is signal, not noise (KEEP_OBSERVER when convergence fails).
  • Honesty Gating (SR9 / DI2): Cross-domain resonance must clear SR9 ≥ 0.70 and logical drift DI2 ≤ 0.30; the pipeline abstains rather than hallucinate confidence.
  • End-to-End Reliability: p_e2e = capture × transfer × model × clinical, with LawBinder fail-closed escalation to human review.
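
The two numeric rules in this list can be sketched in a few lines. The thresholds (SR9 ≥ 0.70, DI2 ≤ 0.30) and the product formula come from the text; the function names, the "PROCEED" / "ABSTAIN" labels, and the sample factor values are assumptions for illustration and do not describe the NNSL or LawBinder implementations.

```python
from math import prod

SR9_MIN = 0.70  # guard from the text: SR9 >= 0.70
DI2_MAX = 0.30  # guard from the text: DI2 <= 0.30


def honesty_gate(sr9: float, di2: float) -> str:
    """Return "PROCEED" only when both guards hold, otherwise "ABSTAIN".

    Comparisons with NaN evaluate to False, so a missing or invalid
    metric falls through to "ABSTAIN" (fail-closed behaviour).
    """
    if sr9 >= SR9_MIN and di2 <= DI2_MAX:
        return "PROCEED"
    return "ABSTAIN"


def p_e2e(capture: float, transfer: float, model: float, clinical: float) -> float:
    """End-to-end reliability as the product of the four stage factors."""
    return prod((capture, transfer, model, clinical))


print(honesty_gate(0.25, 0.10))      # low SR9: the gate abstains
print(p_e2e(0.5, 0.5, 1.0, 1.0))     # illustrative factors only
```

Because the factors multiply, a single weak stage caps the whole chain, which is why the pipeline is judged end to end and not per model.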
Governance Reproducibility

Why each verdict is auditable and how accepted results stay clean:

  • Path Separation: An accepted legacy-replay anchor is never blended with a held current-regeneration path — controlled expansion without breaking the PASS/BLOCK baseline.
  • Provenance Manifests: Verbatim run payloads frozen with SHA-256; all card values are live-fetched from them, never hardcoded.
  • Honest Scope: Pipeline reliability & governance only — explicitly not clinical efficacy, with disabled features disclosed.
📖 Metrics & Engines — Glossary
Each metric is tagged by where its authority comes from:
  • EXTERNAL: third-party-defined and checkable
  • DERIVED: recomputable from external inputs
  • RULE: deterministic rule citing an external basis
  • ADVISORY: Flamehaven internal, not externally validated
SR9 (Scientific Resonance) ADVISORY — cross-domain consistency: does the reasoning stay coherent across chemistry, genomics, and proteomics? Higher is better (guard ≥ 0.70).
DI2 (Dimensional Integrity) ADVISORY — reasoning drift: internal contradiction across inference steps. Lower is better (guard ≤ 0.30).
NNSL ADVISORY — the Flamehaven semantic-resonance engine that computes SR9 / DI2 (internal).
RExSyn ADVISORY — the Flamehaven hypothesis-synthesis engine (multi-validator); runs observer-first.
LawBinder RULE — a fail-closed governance gate that escalates to human review when uncertain.
LOGOS ADVISORY — Flamehaven's internal multi-engine reasoning orchestrator: coordinates inference across several solver engines under LawBinder governance; candidate generator for the EQA intake gate. (Codename, not a claim of autonomy.)
TOE ADVISORY — the mathematical-model verification engine: turns candidate physics models into runnable, gated checks (run IDs TOE-TEST-NNNN). Named for the Theory-of-Everything physics it tests against — it does not claim to be one.
p_e2e DERIVED — end-to-end reliability = capture × transfer × model × clinical (recomputable from the shown factors).
pLDDT / PAE / pTM EXTERNAL — standard AlphaFold confidence metrics (per-residue confidence, predicted aligned error, predicted TM-score), defined by DeepMind/EBI.
Brier / ECE EXTERNAL — standard calibration metrics (probability accuracy; lower is better).
ENGINE OVERVIEW RExSyn + NNSL
Context

What This Lane Validates: the RExSyn + NNSL Governance Chain

BAV does not do drug discovery; it validates whether a biomedical AI pipeline deserves trust. Two engines anchor the chain, each audited for code health (pipeline-insight Omega), with LawBinder acting as the policy gate:

  • RExSyn (Nexus) — trinity hypothesis + multi-validator synthesis. Audited Omega 0.665 (Revoked): runs observer-first, not as an accepted decision-maker.
  • NNSL — semantic resonance / governance verification (SR9 cross-domain coherence, DI2 logical drift). Audited Omega 0.919 (Certified).
  • LawBinder — fail-closed policy gate: escalates to human review when uncertain, never silently approves.
EXP-034 · METHODLOCK 2026-04-19
GO · Path Held

When a Pipeline Passes — But One Path Must Still Be Held

Modal expansion (AlphaFold-EBI observer, AlphaGenome live) was admitted only after the parity anchor reproduced. The legacy-replay path was accepted (GO); the current-regeneration path was held (HOLD) rather than blended into success. Result: Final stage gate = PASS for the accepted anchor, with a separate diagnostic HOLD for current regeneration. Accuracy delta stayed 0.0: non-degradation, not repair.

EXP-033 · LAWBINDER-CRITIC 2026-03-10
Pipeline Audit

How Do You Know the Entire Pipeline Is Wrong — Not Just One Model?

Validating each model in isolation missed the chain failure. Against the EXP-032 parity baseline, EXP-033 current repro2 drove all PASS-eligible controls to BLOCK while keeping dangerous false-pass at zero. Result: balanced accuracy fell to 0.50, pass recall collapsed to 0.00, and rule routing shifted from R6_pass to R5_e2e_floor. This is a pipeline failure record, not a model victory.
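
The reported figures are consistent with the usual definition of balanced accuracy as the mean of per-class recall: with PASS recall at 0.00 and every BLOCK control still blocked (which is how zero dangerous false-pass is read here), the result is (0.00 + 1.00) / 2 = 0.50. The sketch below reproduces that arithmetic; the control counts (three per class) are hypothetical and are not taken from the EXP-033 payload.

```python
def balanced_accuracy(truths: list[str], preds: list[str]) -> float:
    """Mean of per-class recall over the classes present in `truths`."""
    recalls = []
    for cls in sorted(set(truths)):
        indices = [i for i, t in enumerate(truths) if t == cls]
        hits = sum(1 for i in indices if preds[i] == cls)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


# Hypothetical controls: every PASS-eligible case is driven to BLOCK,
# and every BLOCK case stays BLOCK.
truths = ["PASS", "PASS", "PASS", "BLOCK", "BLOCK", "BLOCK"]
preds = ["BLOCK"] * 6
print(balanced_accuracy(truths, preds))
```

A gate that blocks everything is perfectly safe against false passes yet no better than chance at discrimination, which is the failure this record documents.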

EXP-032 · ADAPTIVE-GATE 2026-03-07
Verdict GO

Adaptive Gate: Pipeline Governance & PASS/BLOCK Discrimination

Accepted legacy-replay parity anchor for the RExSyn + NNSL governance chain. The benchmark spans two labeled control classes expanded into six arm payloads; current-regeneration outputs are diagnostic-only and excluded from success claims. Result: Clinical parity is GO (PASS->PASS, BLOCK->BLOCK, dangerous false-pass 0.0), but LawBinder still returns ESCALATE on both classes. Shadow hints are non-binding, strict-evidence recheck still fails, and the replay remains governance-only rather than a production gate activation.

EXP-031 · OOD-ABLATION 2026-02-09
Keep Observer

Trinity Protocol: Multi-Model Disagreement Under Out-of-Distribution Stress

Cross-checked AF2 / AF3 / Boltz-2 / Chai-1 artifact metrics on an out-of-distribution protein-ligand target across three observer-only validator arms. AlphaGenome participated as a redesign engine, not as a direct validator score source. Adding independent validators increased structural disagreement rather than reducing it — exposing topology conflict invisible to any single model. Result: All arms returned "Unverified (Drift Detected)" / failed convergence. Manual validator metrics were preserved as observer-only evidence, promotion evidence remained insufficient, and the disposition stayed KEEP_OBSERVER (do not target).

[Paper]
EXP-028 · POST-OVERLAY 2026-02-05
Honest Abstain

The Honesty Test: It Looked Perfect, Then It Failed

Integrating AlphaFold 3 + AlphaGenome produced confident, well-calibrated outputs — yet the honesty check (SR9 cross-domain resonance, DI2 logical drift) flagged contradictory reasoning on a very small pilot. Result: Brier 0.0056 and AUC 1.0 are real, but phase1 used only 6 samples, phase3 tested only 2, and the deployed gate fell back from 0.5 to 0.075 with Youden J = 0. The honest result is not "perfect performance." It is a tiny fallback-gated pilot that refuses to overclaim.
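
For readers unfamiliar with the two statistics, here is a minimal sketch of the Brier score (mean squared error of predicted probabilities) and Youden's J (sensitivity + specificity − 1). The four-sample data set is invented for illustration and is not the experiment's data; it only shows one way J = 0 can arise, namely when a very low threshold labels every sample positive.

```python
def brier_score(probs: list[float], labels: list[int]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def youden_j(probs: list[float], labels: list[int], threshold: float) -> float:
    """Sensitivity + specificity - 1 at the given decision threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives + tn / negatives - 1.0


# Invented example data: two positives, two negatives.
probs = [0.9, 0.8, 0.1, 0.2]
labels = [1, 1, 0, 0]
print(brier_score(probs, labels))
print(youden_j(probs, labels, threshold=0.5))
print(youden_j(probs, labels, threshold=0.075))
```

On samples this small, a low Brier score and a perfect AUC can coexist with a gate that carries no discriminative information, which is why the entry refuses to report the pilot as a performance result.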

EXP-005~007 · SEP UPADACITINIB 2026-01-24
Truthful Null

How Failing in 2 Hours Saved 8 Months — Upadacitinib Topical Formulation

Early RExSyn line-first control runs screened three lipid-based carriers (SLN, NLC, Liposomal Gel) for topical Upadacitinib, with manual fact injection present upstream and the NNSL SR9 honesty gate (≥ 0.80) deciding eligibility. Result: all three carriers scored SR9 ~0.23–0.28 and were rejected. The useful finding is the honest null, not a fully autonomous pipeline win: the public record now exposes the manual-assisted provenance, the validator split, and the fact that the active decision window was ~2 hours after separate literature pre-work.

EXP-001~030 · ARCHIVE 2026-01~04
Archived

Foundational Iterations (EXP-001 ~ 030)

The early RExSyn / NNSL iteration sequence that established the trinity-consensus, multimodal, and governance primitives. Experiments EXP-028 and EXP-031~034 graduated from this lineage into full ledger cards above. Archived as historical record — reproducibility / governance only.

Foundational Iteration Registry (26 experiments)
Bioscience Compliance (BSC)

Bioscience Repository Compliance Ledger

Deterministic static scanning that audits bioscience AI surfaces. It evaluates clinical boundaries, safety limitations, and license compliance to enforce safe, transparent distribution.

BSC Protocol

Multi-stage compliance scanning that audits repository surfaces without runtime hazards:

  • Stage 1: Intent Audit: Evaluating the repository README to verify transparency and stated diagnostic intent.
  • Stage 2R: Repo Consistency: Static dependency-pinning audits and repository surface structure checks.
  • Stage 3: Responsibility Integrity: Safety checks that audit clinical boundaries and compliance with clinical-use restrictions.
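
As an illustration of what a static dependency-pinning check in Stage 2R might look like, the sketch below scans the text of a `requirements.txt`-style file and reports entries that are not pinned with `==`. The rule, the regular expression, and the function name are assumptions; the scanner's actual criteria are not documented in this ledger.

```python
import re

# A requirement counts as pinned when a name (optionally with extras)
# is followed by "==" and a version. This is an assumed, simplified rule.
PINNED = re.compile(r"^[A-Za-z0-9._\-\[\]]+\s*==\s*[^=\s]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    findings = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as -r or -e
        if not PINNED.match(line):
            findings.append(line)
    return findings


sample = "numpy==1.26.4\nrequests>=2.0\npandas\n# dev tools\n"
print(unpinned_requirements(sample))
```

Because the check only reads text, it never installs or imports the audited dependencies, in keeping with the static, no-runtime approach described above.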
Compliance Archival Alignment

Ensuring verified compliance audits are fully documented and citable:

  • Citable Metadata: Providing standardized, inspectable audit report metadata schemas.
  • Provenance Manifests: Freezing SHA-256 cryptographic digests of code files and audited snapshots.
  • Audit Disclosures: Exposing complete static analysis findings, warnings, and failure points transparently.
How to read scores & tiers
Score scale: 0 (Critical Risk) · 30 · 50 · 70 · 100 (Clear)
Tiers: T0 Quarantine · T1 Review Required · T2 Conditional · T3 Clear

Score = weighted sum of Stage 1 (README Intent) + Stage 2R (Repo Consistency) + Stage 3 (Code/Bio Responsibility) − penalty.  Tier is the deployment-readiness verdict derived from score thresholds and key risk signals.  Stage 4 (Replication) is a separate lane and does not alter the formal tier.
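
The scoring rule can be sketched as below. The stage weights are not published here, so `weighted_score` takes them as parameters and the values in the example are purely hypothetical. The tier cut-offs at 30, 50, and 70 are read off the scale shown above, and treating each cut-off as the inclusive lower bound of the next tier is an assumption. The sketch maps score to tier only; the "key risk signals" that also feed the real verdict are omitted.

```python
def weighted_score(stages: dict[str, float], weights: dict[str, float],
                   penalty: float) -> float:
    """Weighted sum of stage scores minus a penalty (weights are caller-supplied)."""
    return sum(weights[name] * stages[name] for name in weights) - penalty


def tier_for_score(score: float) -> str:
    """Map a 0-100 score to a tier using the 30 / 50 / 70 cut-offs."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 30:
        return "T0"
    if score < 50:
        return "T1"
    if score < 70:
        return "T2"
    return "T3"


# Hypothetical weights and stage scores, for illustration only.
example = weighted_score(
    stages={"S1": 80, "S2R": 40, "S3": 40},
    weights={"S1": 0.5, "S2R": 0.25, "S3": 0.25},
    penalty=10,
)
print(example, tier_for_score(example))
print(tier_for_score(48), tier_for_score(60))
```

Under these cut-offs, the two published final scores (48 and 60) land in T1 and T2 respectively, matching the tier numbers shown on the report cards.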

Reports
Bioscience Compliance · 2026-05-18
Audit Date: 2026-05-18 · Expires: 2026-07-02
Final Score: 48 / 100 · T1 Quarantine
Stage scores: S1 75 · S2R 40 · S3 25 · S4* 30
Selection & Evaluation Brief: Selected for static safety audit as a high-utility bioscience repository with suspected clinical capability. Evaluated under the Bioscience Repository Compliance framework. The resulting T1 Quarantine verdict is driven by critical clinical-use boundary omissions (R2R_D2), lack of workflow replication pathways (R2R_D4), and governance documentation gaps. This is an exploratory verification run; strict prohibition is enforced against any patient-adjacent or clinical environments.
Findings:
  • Missing clinical use boundary (R2R_D2): −20
  • Unsupported workflow claim (R2R_D4): −15
  • C2 Dependency Pinning: WARN
  • C1 Hardcoded Credentials: PASS
Bioscience Compliance · 2026-05-21
Audit Date: 2026-05-21 · Expires: 2026-07-05
Final Score: 60 / 100 · T2 Caution
Stage scores: S1 70 · S2R 50 · S3 54 · S4* 35
Selection & Evaluation Brief: Audited as a high-utility clinical-support assistant repository to evaluate safety-critical diagnostic claims. The resulting T2 Caution verdict reflects moderate alignment: while robust CI/CD hygiene (S3_T1), clear data provenance controls (S3_B1), and documented limitations (S3_B2) are established, the absence of an explicit clinical-use boundary restriction (R2R_D2) requires independent verification prior to any commercial or non-research deployment.
Findings:
  • Missing clinical use boundary (R2R_D2): −20
  • CI/CD workflow files present (S3_T1): +15
  • Data provenance controls (S3_B1): +15
  • Bias/limitations documented (S3_B2): +8
  • C5 Compliance Boundary Integrity: WARN


Methodology & Frameworks

Verification Methodology Hub

Reusable templates, operational frameworks, and practical code supporting deterministic AI verification and bioscience compliance auditing. All resources are structured, citable, and ready to deploy.

Methodology Protocol

Structured review templates, frameworks, and operational audit protocols:

  • PR Action Plan v3: Agent review dashboard for systematic pull-request audit workflows with deterministic verdict dispatch.
  • Audit Frameworks: Governance gate protocols and verification methodology frameworks (in preparation).
  • Practical Code: Downloadable scan scripts, JSON schemas, and ledger utilities (in preparation).
Resource Archival Alignment

Ensuring all methodology resources are citable, inspectable, and reproducible:

  • Versioned Templates: Every template is version-tagged and archived with a stable ledger reference.
  • Citable Metadata: Resources link directly to originating audit records and verification runs.
  • Open Distribution: All practical code and frameworks are published as static, zero-dependency artifacts.
Resources
Templates
HTML Effectiveness Template
html-effectiveness framework · Blank reusable · 9 document types · Zero dependencies
Frameworks
No frameworks published yet.
Practical Code
No code resources published yet.