Design | Chapter 8 | Prerequisites: 1-Design-Thesis | 24 min read | Home
8. Module Architecture
All modules are implemented. The codebase consists of 28 source files and 8 scripts.
8.0 Three-Lane Verification Architecture
Proof verification operates across three lanes at different granularities:
| Lane | Purpose | Backend | Latency | Metric Category |
| A | Step-wise tactic search (inner loop) | Pantograph (local) | ~1ms/tactic | raw_success |
| B | Proof audit: verify, repair, decompose | Axle API (cloud) | ~100-200ms/call | axle_assisted_success, axle_repair_only |
| C | High-assurance recheck (leaderboard only) | lean4checker / Comparator / SafeVerify | seconds | independent validation |
Lane A is the core search loop (proof_search.py → lean_interface.py). It applies tactics one at a time within a 600-step budget.
Lane B is the ProofAuditor service (proof_auditor.py → Axle API). It operates on complete or near-complete proofs after Lane A finishes. Operations: verify_proof (authoritative check), repair_proofs (close remaining goals with terminal tactics), sorry2lemma (decompose failures into subgoals for retry). Lane B never inflates Lane A metrics — results are tagged with strict SuccessCategory.
Lane C is optional and reserved for results submitted to benchmarks/leaderboards. Axle itself notes that verify_proof trusts the Lean environment and is not adversarial-safe (see Axle docs). For high-assurance, use lean4checker, Comparator, or SafeVerify.
Cost controls (Lane B):
- Content-hash cache (SHA-256) avoids redundant Axle API calls
- Caller filters to top-N candidates or high critic score before invoking
- Bounded async concurrency (
max_concurrencyin config) - Graceful degradation on timeout/429/503 (returns failed
AuditResult, search continues)
Metric separation prevents inflated claims. Benchmark reports always show:
raw_success: proved by Lane A alone, no Axle involvementaxle_assisted_success: Lane A partial + Axle decompose/repairaxle_repair_only: Axlerepair_proofsclosed remaining goals
Scope caveat: Axle is designed for simple imports/theorems/definitions. Tool count and API surface are evolving (v1.0.0, March 2026). Pin environment aggressively in config; update only with Mathlib version bumps.
src/
├── encoder.py GoalEncoder (math-native model, see Research §3.2)
├── goal_analyzer.py GoalAnalyzer (features + bank positions + anchor logits)
├── bridge.py InformationBridge (continuous → discrete bottleneck)
├── proof_navigator.py ProofNavigator (ternary navigational decoder)
├── proof_network.py Proof network database interface
├── resolution.py Structured resolution (navigate + spread)
├── proof_search.py Lane A: outer search loop with Lean interaction
├── lean_interface.py Lane A: Pantograph-based Lean kernel interaction
├── proof_auditor.py Lane B: Axle-based verification/repair/decomposition
├── nav_contracts.py Navigational data contracts (NavigationalExample, NavOutput, etc.)
├── ternary_decoder.py TernaryLinear, ternary_quantize (shared)
├── domain_gate.py OOD detection gate
├── losses.py NavigationalLoss (UW-SO adaptive)
├── lowering.py Deterministic tactic lowering
├── contracts.py Core data contracts (ProofExample, etc.)
├── data.py Dataset loading (NavigationalDataset, etc.)
├── verification.py Proof verification
├── pab_profile.py PAB profile artifact
├── pab_metrics.py PAB metric functions
├── pab_tracker.py Checkpoint accumulation
├── behavioral_fingerprint.py Behavioral signatures
├── trainer.py Training loop
├── trainer_steps.py Train step mixin
├── trainer_checks.py Gradient health, logging
├── trainer_constants.py Domain inference
├── trainer_setup.py Pipeline construction
├── trainer_validation.py Proof validation
└── evaluate.py Evaluation harness
scripts/
├── extract_proof_network.py Populate proof network from Mathlib
├── build_nav_training_data.py Convert proof traces to nav labels
├── anchor_gap_analysis.py Iterative anchor gap analysis
├── tactic_maps.py Tactic-to-direction mapping tables
├── train_navigator.py Training script with curriculum
├── eval_retrieval.py Nav vs dense retrieval comparison
├── eval_spreading.py Spreading activation evaluation
└── run_benchmark.py MiniF2F + Mathlib benchmark runner
configs/
└── wayfinder.yaml Full Wayfinder experiment configuration
data/ (generated/downloaded at runtime)
├── proof_network.db SQLite semantic network
├── nav_training.jsonl Navigational training data
├── nav_eval.jsonl Navigational eval data (frozen)
└── leandojo_mathlib.jsonl LeanDojo extracted dataset
8.1 Navigational Modules
proof_navigator.py: Replaces ternary_decoder.py's vocabulary classification heads with 6 navigational direction heads (one per navigable bank). Inherits TernaryLinear for hidden layers. Outputs 6 ternary directions (3^6 = 729 coarse bins) + continuous anchor logits + progress + critic. Configurable via navigable_banks for graceful degradation to 3-bank.
proof_network.py: SQLite interface for the semantic network. Adapted from ModelAtlas's db.py. Schema: entities, entity_positions, anchors, entity_anchors, entity_links, anchor_idf. Core functions: navigate(), spread(), get_anchor_set(), compute_idf(), batch_get_positions().
resolution.py: Converts ProofNavigator output to concrete tactics and premises. Builds a StructuredQuery from ternary directions + top-k anchors. Calls navigate() on the tactic and premise entity tables. Combines with spread() scores from proof context. Returns ranked (tactic, premise_list) pairs.
proof_search.py: Outer search loop. Manages open goals, proof context, search budget. Calls the neural pipeline once per goal, resolution symbolically, and the Lean kernel for verification. Goal selection uses critic (solvability estimate) and progress (remaining steps). Uses Pipeline dataclass bundling encoder/analyzer/bridge/navigator.
lean_interface.py (Lane A): Wrapper around LeanDojo's Pantograph interaction protocol. Provides try_tactic(goal_state, tactic_text) -> TacticResult with success/failure and new goal states. Supports stub backend for offline testing. Configurable via lean.backend in config.
proof_auditor.py (Lane B): Axle API wrapper for proof verification, repair, and subgoal decomposition. ProofAuditor service with verify(), repair(), decompose(), check(), and extract_theorems(). Content-hash cache, bounded concurrency, graceful degradation on API errors. All results tagged with SuccessCategory (raw/assisted/repair_only). Optional dependency: pip install wayfinder[axle].
nav_contracts.py: Navigational data contracts including NavigationalExample, NavOutput, StructuredQuery, Candidate, SearchContext, BANK_NAMES.
8.2 Retained Modules
All modules from the BalancedSashimi port are retained. The GoalAnalyzer is extended (not replaced) with bank and anchor heads. The InformationBridge is unchanged. The TernaryLinear and STE infrastructure is shared between the existing TernaryDecoder and the new ProofNavigator.
Sources: docs/WAYFINDER_DESIGN.md