4. Training
4.1 Data Preparation
Training corpus: LeanDojo's extracted Mathlib dataset (~98,734 theorems with proof traces). Each example provides:
- Theorem statement and goal state
- Tactic proof with intermediate goal states
- Premise information (which lemmas are referenced)
Conversion to navigational training data: For each proof step, extract:
- Ground-truth navigational directions (from the tactic used, mapped to bank coordinates)
- Ground-truth anchors (from the goal state and tactic/premise analysis)
- Ground-truth progress (number of remaining proof steps)
- Ground-truth solvability (1.0 for all goals in successful proofs)
Negative examples: Failed proof attempts, wrong tactic selections, incorrect premises — sourced from proof search logs and synthetic perturbation.
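The per-step extraction above can be sketched as follows. This is a minimal illustration, not the LeanDojo API: the trace shape (a list of `(goal_state, tactic, premises)` steps), the `tactic_to_coords` mapping, and the `NavExample` fields are all assumed for the sketch.

```python
from dataclasses import dataclass

@dataclass
class NavExample:
    goal_state: str
    direction: tuple     # bank coordinates for the tactic (assumed mapping)
    anchors: list        # premise names referenced by the tactic
    progress: int        # ground-truth remaining proof steps
    solvability: float   # 1.0 for every goal in a successful proof

def extract_examples(proof_trace, tactic_to_coords):
    """Convert one successful proof trace into navigational training examples.

    proof_trace: list of (goal_state, tactic, premises) steps.
    tactic_to_coords: tactic name -> bank coordinates (hypothetical mapping).
    """
    n = len(proof_trace)
    examples = []
    for i, (goal, tactic, premises) in enumerate(proof_trace):
        examples.append(NavExample(
            goal_state=goal,
            direction=tactic_to_coords[tactic],
            anchors=premises,
            progress=n - i,      # steps remaining from this goal
            solvability=1.0,     # successful proof: all goals solvable
        ))
    return examples
```

Negative examples from failed search branches would reuse the same shape with solvability labels below 1.0.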
4.2 Curriculum Learning
Inspired by LeanAgent's complexity ordering (complexity = e^S, where S is the number of proof steps) and the navigational wavefront concept:
1. Phase A (warmup): Train on 1-2 step proofs only. The model learns basic navigation near the origin.
2. Phase B (expansion): Gradually increase proof depth to 3-5 step proofs.
3. Phase C (full): All proof lengths, with oversampling of medium-difficulty theorems.
This follows the Mutation Theory's phase-transition insight: start with easy kills (short proofs near the origin), then venture into harder regions.
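A minimal sampler for the three phases might look like the following. The phase boundaries and the medium-difficulty band (3-7 steps) and oversampling weight are illustrative choices, not values from this document.

```python
import random

def curriculum_sample(theorems, phase, rng=random):
    """Sample one training theorem according to the curriculum phase.

    theorems: list of dicts with a 'steps' field (proof length).
    phase: "A" (warmup), "B" (expansion), or "C" (full).
    """
    if phase == "A":  # warmup: 1-2 step proofs only
        pool = [t for t in theorems if t["steps"] <= 2]
        return rng.choice(pool)
    if phase == "B":  # expansion: proofs up to 5 steps
        pool = [t for t in theorems if t["steps"] <= 5]
        return rng.choice(pool)
    # Phase C: all lengths, oversample medium difficulty (3-7 steps here)
    weights = [3.0 if 3 <= t["steps"] <= 7 else 1.0 for t in theorems]
    return rng.choices(theorems, weights=weights, k=1)[0]
```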
Navigational wavefront curriculum (Phase C refinement). During expert iteration (Phase 5), the curriculum should be navigational, not just complexity-based: order theorems by their distance in the proof network from already-proven results, creating a wavefront of learnability that expands outward through mathematical space. Theorems whose premises are 1-2 hops from already-proven theorems are trained first; theorems requiring navigation into unexplored regions of the semantic network are deferred. This addresses the saturation problem observed across systems: up to 98.5% of generated proofs during expert iteration are incorrect, and the remaining unproven theorems become exponentially harder. STP (Self-play Theorem Prover) found that 47% of self-generated conjectures at the "cusp of provability" are successfully proved, vs. only 11.4% of real unproved statements. The wavefront curriculum targets this cusp by definition.
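The hop-distance ordering can be sketched as iterative relaxation over the premise dependency DAG. The `premises_of` mapping (theorem name to its premise names) is an assumed representation of the proof network; theorems whose dependencies never resolve stay outside the wavefront and are deferred.

```python
def wavefront_order(premises_of, proven):
    """Order unproven theorems by hop distance from already-proven results.

    premises_of: theorem name -> list of premise theorem names (assumed shape).
    proven: set of already-proven theorem names (distance 0).
    A theorem's distance is 1 + the max distance of its premises.
    """
    dist = {t: 0 for t in proven}
    changed = True
    while changed:  # relax until distances stabilize (graph is a DAG)
        changed = False
        for thm, prems in premises_of.items():
            if thm in dist:
                continue
            if all(p in dist for p in prems):
                dist[thm] = 1 + max((dist[p] for p in prems), default=0)
                changed = True
    # Theorems still without a distance depend on unexplored regions: deferred.
    frontier = [t for t in premises_of if t in dist and t not in proven]
    return sorted(frontier, key=lambda t: dist[t])  # nearest (the cusp) first
```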
4.3 PAB Integration
PAB metrics are collected at every checkpoint (every 50 training steps):
- Loss components (L_nav, L_anchor, L_progress, L_critic)
- Ternary crystallization rate
- Per-bank navigational accuracy
- Progress prediction correlation with actual remaining steps
- Critic calibration (predicted solvability vs. actual proof success)
- Domain-wise accuracy (by Mathlib namespace)
Sources: docs/WAYFINDER_RESEARCH.md