Experiment Plan | Chapter 6 | Prerequisites: Phase-3 | 28 min read | Home

Phase 4: Evaluation and Ablation (Week 7-9)

Goal: Deep analysis of results across all three streams.

4.1 Stream 1 Analysis: Navigation vs. Dense Retrieval

Questions to answer:

  • Where does navigational retrieval outperform dense retrieval? (Which domains, which proof types?)
  • Where does it underperform? (Is the deficit in bank positions, anchors, or IDF weighting?)
  • What is the compute profile? (Neural forward passes saved per proof)
  • Does the multiplicative scoring prevent good matches that additive scoring would find?

4.2 Stream 2 Analysis: Navigational Decoder vs. Token Classification

Questions to answer:

  • Do navigational coordinates generalize to unseen theorem types better than vocabulary indices?
  • Does the ternary decoder's crystallization pattern reveal structured learning?
  • Which navigational bank is learned first? Does the order match architectural expectations?
  • Does the anchor prediction quality correlate with proof success?

4.3 Stream 3 Analysis: PAB Trajectory Evaluation

Questions to answer:

  • Does navigational accuracy trajectory predict final proof success better than loss trajectory?
  • Does crystallization rate predict navigational consistency?
  • Does progress prediction accuracy improve monotonically? Does it plateau?
  • Does PAB detect failure modes that endpoint metrics miss?

4.4 Ablation Matrix

Run full evaluation for each ablation variant from Phase 2:

VariantMiniF2FMathlibRetrieval R@16FP/proofNotes
Full WayfinderPrimary
Dense retrieval (no proof network)Navigation thesis
Tactic classification (no navigation)Architecture thesis
No spreading activationSpreading thesis
No progress headProgress thesis
Continuous decoder (no ternary)Ternary thesis
No IDF weightingIDF thesis
No bank alignment (anchors only)Bank thesis
Binary critic (BCE, not soft MSE)HTPS soft-target thesis
No proof history inputLeanProgress history thesis
No hammer delegationHammer complementarity thesis
No accessible-premises filterReProver filtering thesis
3-bank navigation (original design)6-bank expansion thesis

4.5 Write-up

Deliverables:

  • Updated WAYFINDER_RESEARCH.md with experimental results
  • docs/EXPERIMENT_RESULTS.md — raw results ledger
  • PAB profile comparisons across all variants
  • Failure analysis with navigational interpretation


Sources: docs/WAYFINDER_PLAN.md

See Also