4. Training
4.1 Data Preparation
Training corpus: LeanDojo's extracted Mathlib dataset (~98,734 theorems with proof traces). Each example provides:
- Theorem statement and goal state
- Tactic proof with intermediate goal states
- Premise information (which lemmas are referenced)
Conversion to navigational training data: For each proof step, extract:
- Ground-truth navigational directions (from the tactic used, mapped to bank coordinates)
- Ground-truth anchors (from the goal state and tactic/premise analysis)
- Ground-truth progress (number of remaining proof steps)
- Ground-truth solvability (1.0 for all goals in successful proofs)
Negative examples: Failed proof attempts, wrong tactic selections, incorrect premises — sourced from proof search logs and synthetic perturbation.
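The per-step extraction above can be sketched as follows. This is a minimal illustration, not the LeanDojo API: the trace shape (a list of `(goal_state, tactic, premises)` steps), the `tactic_to_coords` mapping, and the `NavExample` fields are all assumed for the sketch.

```python
from dataclasses import dataclass

@dataclass
class NavExample:
    goal_state: str
    direction: tuple     # bank coordinates for the tactic (assumed mapping)
    anchors: list        # premise names referenced by the tactic
    progress: int        # ground-truth remaining proof steps
    solvability: float   # 1.0 for every goal in a successful proof

def extract_examples(proof_trace, tactic_to_coords):
    """Convert one successful proof trace into navigational training examples.

    proof_trace: list of (goal_state, tactic, premises) steps.
    tactic_to_coords: tactic name -> bank coordinates (hypothetical mapping).
    """
    n = len(proof_trace)
    examples = []
    for i, (goal, tactic, premises) in enumerate(proof_trace):
        examples.append(NavExample(
            goal_state=goal,
            direction=tactic_to_coords[tactic],
            anchors=premises,
            progress=n - i,      # steps remaining from this goal
            solvability=1.0,     # successful proof: all goals solvable
        ))
    return examples
```

Negative examples from failed search branches would reuse the same shape with solvability labels below 1.0.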
4.2 Curriculum Learning
Inspired by LeanAgent's complexity ordering (complexity = e^S, where S is the number of proof steps) and the navigational wavefront concept:
1. Phase A (warmup): Train on 1-2 step proofs only. The model learns basic navigation near the origin.
2. Phase B (expansion): Gradually increase proof depth to 3-5 step proofs.
3. Phase C (full): All proof lengths, with oversampling of medium-difficulty theorems.
This follows the Mutation Theory's phase-transition insight: start with easy kills (short proofs near the origin), then venture into harder regions.
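A minimal sampler for the three phases might look like the following. The phase boundaries and the medium-difficulty band (3-7 steps) and oversampling weight are illustrative choices, not values from this document.

```python
import random

def curriculum_sample(theorems, phase, rng=random):
    """Sample one training theorem according to the curriculum phase.

    theorems: list of dicts with a 'steps' field (proof length).
    phase: "A" (warmup), "B" (expansion), or "C" (full).
    """
    if phase == "A":  # warmup: 1-2 step proofs only
        pool = [t for t in theorems if t["steps"] <= 2]
        return rng.choice(pool)
    if phase == "B":  # expansion: proofs up to 5 steps
        pool = [t for t in theorems if t["steps"] <= 5]
        return rng.choice(pool)
    # Phase C: all lengths, oversample medium difficulty (3-7 steps here)
    weights = [3.0 if 3 <= t["steps"] <= 7 else 1.0 for t in theorems]
    return rng.choices(theorems, weights=weights, k=1)[0]
```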
Navigational wavefront curriculum (Phase C refinement). During expert iteration (Phase 5), the curriculum should be navigational, not just complexity-based: order theorems by their distance in the proof network from already-proven results, creating a wavefront of learnability that expands outward through mathematical space. Theorems whose premises are 1-2 hops from already-proven theorems are trained first; theorems requiring navigation into unexplored regions of the semantic network are deferred. This addresses the saturation problem observed across systems: up to 98.5% of generated proofs during expert iteration are incorrect, and the remaining unproven theorems become exponentially harder. STP (Self-play Theorem Prover) found that 47% of self-generated conjectures at the "cusp of provability" are successfully proved, vs. only 11.4% of real unproved statements. The wavefront curriculum targets this cusp by definition.
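The hop-distance ordering can be sketched as iterative relaxation over the premise dependency DAG. The `premises_of` mapping (theorem name to its premise names) is an assumed representation of the proof network; theorems whose dependencies never resolve stay outside the wavefront and are deferred.

```python
def wavefront_order(premises_of, proven):
    """Order unproven theorems by hop distance from already-proven results.

    premises_of: theorem name -> list of premise theorem names (assumed shape).
    proven: set of already-proven theorem names (distance 0).
    A theorem's distance is 1 + the max distance of its premises.
    """
    dist = {t: 0 for t in proven}
    changed = True
    while changed:  # relax until distances stabilize (graph is a DAG)
        changed = False
        for thm, prems in premises_of.items():
            if thm in dist:
                continue
            if all(p in dist for p in prems):
                dist[thm] = 1 + max((dist[p] for p in prems), default=0)
                changed = True
    # Theorems still without a distance depend on unexplored regions: deferred.
    frontier = [t for t in premises_of if t in dist and t not in proven]
    return sorted(frontier, key=lambda t: dist[t])  # nearest (the cusp) first
```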
4.3 PAB Integration
PAB metrics are collected at every checkpoint (every 50 training steps):
- Loss components (L_nav, L_anchor, L_progress, L_critic)
- Ternary crystallization rate
- Per-bank navigational accuracy
- Progress prediction correlation with actual remaining steps
- Critic calibration (predicted solvability vs. actual proof success)
- Domain-wise accuracy (by Mathlib namespace)
Sources: docs/WAYFINDER_RESEARCH.md