Research | Chapter 8 | Prerequisites: 5-Evaluation

7. Discussion

7.1 The Intelligence Budget Argument

The key insight driving Wayfinder is about where to spend intelligence. Current theorem provers spend enormous compute on neural inference during search — every candidate tactic, every candidate premise, every tree node expansion requires a forward pass through a large model. The model must encode all mathematical knowledge in its weights because search is parametric.

Wayfinder inverts this: mathematical knowledge is encoded explicitly in a structured semantic network (the proof network). The neural model handles only the genuinely hard part — understanding the current proof state and predicting which direction to go. Search and retrieval are outsourced to deterministic symbolic operations that are orders of magnitude cheaper per query.

This is the same design philosophy as ModelAtlas: "Don't waste tokens on problems that have been solved for 50 years — waste them on your terms." Set intersection, IDF computation, and Bellman-Ford are solved problems. Understanding a Lean goal state is not. Spend the neural compute where it matters.
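The cost asymmetry above can be made concrete with a toy sketch of the symbolic side: premise retrieval as IDF-weighted set intersection over anchor labels. The corpus, anchor names, and scoring are illustrative assumptions, not Wayfinder's actual anchor dictionary; the point is only that this entire operation is deterministic and runs in microseconds, with no forward pass.

```python
import math

# Hypothetical toy corpus: premise name -> set of anchor labels.
# (Anchor names are illustrative, not the actual anchor dictionary.)
PREMISES = {
    "add_comm":       {"algebra", "commutative", "addition"},
    "mul_comm":       {"algebra", "commutative", "multiplication"},
    "add_assoc":      {"algebra", "associative", "addition"},
    "continuous_add": {"topology", "continuity", "addition"},
}

def idf(anchor: str) -> float:
    """Inverse document frequency: rare anchors are more informative."""
    n = sum(1 for anchors in PREMISES.values() if anchor in anchors)
    return math.log(len(PREMISES) / n) if n else 0.0

def retrieve(goal_anchors: set[str]) -> list[tuple[str, float]]:
    """Score each premise by IDF-weighted anchor overlap (pure set ops)."""
    scored = []
    for name, anchors in PREMISES.items():
        overlap = anchors & goal_anchors  # cheap set intersection
        if overlap:
            scored.append((name, sum(idf(a) for a in overlap)))
    return sorted(scored, key=lambda t: -t[1])
```

For a goal anchored as `{"commutative", "addition"}`, `add_comm` ranks first because it matches both anchors; premises matching only one anchor trail behind, weighted by how rare that anchor is.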

7.2 Limitations

1. Proof network construction requires Mathlib-specific engineering. Bank positions and anchors must be extracted from Lean's type system, namespace hierarchy, and proof corpus. This is significant upfront work. The anchor gap analysis procedure (Section 2.3) mitigates but does not eliminate this cost.

2. Scoring mechanism sensitivity. The scoring mechanism (multiplicative, confidence-weighted, or hybrid — see Section 2.4) must balance precision against noise tolerance. The optimal mechanism likely varies with proof network maturity and may need to evolve during training. This is an active design question with multiple viable alternatives under evaluation.

3. Anchor dictionary coverage. The system can only retrieve premises whose relevant properties are captured by anchors. Subtle mathematical relationships not expressible as discrete labels will be missed. The iterative anchor gap analysis (Section 2.3) is designed to systematically close these gaps, but long-tail mathematical concepts will always be harder to anchor than common ones.

4. Encoder quality. The encoder is the system's perceptual bottleneck (Section 3.2). The strategy of evaluating math-native models, aggressively pruned large models, and fine-tuned sentence transformers mitigates this risk, but the optimal encoder may not be known until empirical evaluation in Phase 1.

5. Comparison fairness. DeepSeek-Prover-V2 uses a 671B-parameter model. A direct comparison on theorems-proved rate is therefore not meaningful; our contribution is architectural efficiency (proofs found per unit of compute), not raw performance.
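The trade-off behind limitation 2 can be sketched numerically. The three functions below are illustrative stand-ins for the mechanisms named in Section 2.4, not their actual definitions: a multiplicative score is precise but brittle (one noisy near-zero signal collapses it), a confidence-weighted average is noise-tolerant but dilute, and a hybrid interpolates between them.

```python
def multiplicative_score(signals: list[float]) -> float:
    """Every signal gates the result: one near-zero signal kills the score.
    High precision, but brittle when the proof network is noisy."""
    s = 1.0
    for x in signals:
        s *= x
    return s

def confidence_weighted_score(signals: list[float],
                              weights: list[float]) -> float:
    """Weighted average: tolerant of one bad signal, but dilutes precision."""
    return sum(w * x for w, x in zip(weights, signals)) / sum(weights)

def hybrid_score(signals: list[float], weights: list[float],
                 alpha: float = 0.5) -> float:
    """Interpolate: alpha -> 1 behaves multiplicatively, alpha -> 0 additively.
    (alpha as an interpolation knob is an assumption for illustration.)"""
    m = multiplicative_score(signals)
    c = confidence_weighted_score(signals, weights)
    return alpha * m + (1 - alpha) * c
```

With signals `[0.9, 0.9, 0.05]` the multiplicative score is 0.0405 (the noisy signal dominates) while the equal-weight average is about 0.62; which behavior is right depends on whether that 0.05 is real evidence or proof-network noise, which is why the optimal mechanism may shift as the network matures.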

7.3 Hammer Complementarity

Wayfinder is not a replacement for ATP-based hammer tools (Sledgehammer, CoqHammer, LeanHammer). It is complementary. The AUTOMATION bank cleanly separates their domains:

  • AUTOMATION ≤ -1 (fully decidable): Goals that can be translated to first-order logic and solved by external ATPs. These are the hammer's domain. When the navigator predicts AUTOMATION ≤ -1, delegate to LeanHammer/Aesop directly. This is the "known region" of proof space — no navigation needed.
  • AUTOMATION ≥ 0 (requires structured reasoning): Goals needing specific lemma selection, manual case analysis, induction, or multi-step decomposition. These are Wayfinder's domain. Hammers fail here because the goals aren't translatable to first-order logic, or the right premises aren't in the hammer's premise set.

LeanHammer (2025) reports that a hybrid neural+symbolic approach achieves a 37.3% proof rate, outperforming any single method by 21%. Wayfinder + hammer is a natural hybrid: navigational retrieval feeds premise candidates to hammers for the decidable portion, while structured proof search handles the non-decidable portion. The AUTOMATION bank direction is the routing decision.
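The routing decision reduces to a threshold on the navigator's predicted AUTOMATION position. A minimal sketch, assuming the prediction is a scalar bank position (the function name and return labels are illustrative):

```python
def route(predicted_automation: float) -> str:
    """Route a goal by its predicted AUTOMATION bank position.
    Positions <= -1 are fully decidable: hand off to a hammer.
    Everything else needs structured navigation of the proof network."""
    if predicted_automation <= -1:
        return "hammer"    # e.g. delegate to LeanHammer / Aesop
    return "navigate"      # Wayfinder's structured proof search
```

In the combined system, a goal routed to `"navigate"` can still spawn decidable subgoals, each of which is re-routed through the same check, so hammers are invoked exactly where the known region begins.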

7.4 The Broader Vision

If navigational proof search works — if structured semantic networks can compete with dense retrieval for mathematical reasoning — the implications extend beyond theorem proving:

  • Any domain with structured output: Code generation (API calls from a known catalog), molecular design (reactions from known reaction types), circuit synthesis (components from known libraries).
  • Any domain with a verification oracle: The Lean kernel provides binary pass/fail. Any domain with an equivalent (compiler, simulator, SAT solver) can use the same navigational architecture.
  • Any domain where knowledge structure is known a priori: Mathematics has a rich, well-understood structure (algebraic hierarchies, topological properties, order relations). Any domain with similar structure can be encoded in a semantic network.

The thesis of Wayfinder is not specific to theorem proving. It is a general claim about the relationship between structured knowledge and neural prediction: when the knowledge structure is known, encode it explicitly and let the neural network navigate it. Don't force the neural network to learn what you already know.



Sources: docs/WAYFINDER_RESEARCH.md
