Phase 4: Evaluation and Ablation

Experiment Plan | Chapter 6 | Prerequisites: Phase-3

Phase 4: Evaluation and Ablation (Weeks 7-9)

Goal: Deep analysis of results across all three streams.

Questions to answer:

Where does navigational retrieval outperform dense retrieval? (Which domains, which proof types?)
Where does it underperform? (Is the deficit in bank positions, anchors, or IDF weighting?)
What is the compute profile? (Neural forward passes saved per proof)
Does multiplicative scoring suppress good matches that additive scoring would find?
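
The multiplicative-versus-additive question can be made concrete with a toy comparison. The sketch below assumes, purely for illustration, that each candidate premise has three component match scores in [0, 1] (loosely named after the bank, anchor, and IDF signals mentioned above) and that the two rules are a plain product and an unweighted sum. The candidate names, score values, and function names are hypothetical and are not taken from the Wayfinder implementation.

```python
from math import prod


def additive_score(components):
    """Unweighted sum of component scores (assumed additive rule)."""
    return sum(components)


def multiplicative_score(components):
    """Product of component scores (assumed multiplicative rule).

    A single zero component drives the whole score to zero.
    """
    return prod(components)


def rank(candidates, score_fn):
    """Return candidate names sorted from best to worst under score_fn."""
    return sorted(candidates, key=lambda name: score_fn(candidates[name]), reverse=True)


# Hypothetical component scores: (bank match, anchor match, IDF-weighted match).
candidates = {
    "lemma_a": (0.9, 0.9, 0.0),  # strong on two signals, zero on the third
    "lemma_b": (0.5, 0.5, 0.5),  # mediocre on all three
}

additive_rank = rank(candidates, additive_score)
multiplicative_rank = rank(candidates, multiplicative_score)

# Candidates whose top-1 status flips between the two rules are the cases
# this evaluation question asks about.
flipped = additive_rank[0] != multiplicative_rank[0]
```

In this toy case the additive rule ranks lemma_a first while the product ranks it last, because one zero component vetoes the match. Counting such flips on real retrieval logs, and checking how many of the vetoed candidates were ground-truth premises, would be one way to answer the question.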

Questions to answer:

Do navigational coordinates generalize to unseen theorem types better than vocabulary indices?
Does the ternary decoder's crystallization pattern reveal structured learning?
Which navigational bank is learned first? Does the order match architectural expectations?
Does anchor prediction quality correlate with proof success?
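
The correlation question in this list could be checked with a Pearson coefficient between a per-theorem anchor quality score and a 0/1 proof outcome (with a binary variable this is the point-biserial correlation). The sketch below is a minimal stdlib version; the choice of quality metric, the variable names, and the sample values are assumptions for illustration, not measured results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-theorem data: anchor prediction quality and proof outcome.
anchor_quality = [0.2, 0.4, 0.6, 0.8]
proved = [0, 0, 1, 1]  # 1 = proof found, 0 = proof failed

r = pearson(anchor_quality, proved)
```

A coefficient near zero would suggest anchor quality is not the bottleneck for proof success. A strong positive value would support it, although correlation alone does not establish that better anchors cause more proofs.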

Questions to answer:

Does the navigational accuracy trajectory predict final proof success better than the loss trajectory?
Does crystallization rate predict navigational consistency?
Does progress prediction accuracy improve monotonically? Does it plateau?
Does PAB detect failure modes that endpoint metrics miss?
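
The monotonicity and plateau question can be tested mechanically on a series of checkpoint accuracies. The sketch below is one possible formulation: the tolerance, the window length, the minimum-gain threshold, and the sample accuracy values are all assumed for illustration and would need tuning against real training logs.

```python
def is_monotonic_non_decreasing(values, tolerance=0.0):
    """True if each value is at least the previous one, minus a noise tolerance."""
    return all(b >= a - tolerance for a, b in zip(values, values[1:]))


def plateau_start(values, window=3, min_gain=0.01):
    """Return the first index whose gain over the next `window` checkpoints
    is below `min_gain`, or None if no such index exists."""
    for i in range(len(values) - window):
        if values[i + window] - values[i] < min_gain:
            return i
    return None


# Hypothetical progress-prediction accuracy at successive checkpoints.
accuracy = [0.50, 0.60, 0.68, 0.72, 0.725, 0.726, 0.727]

monotonic = is_monotonic_non_decreasing(accuracy)
plateau_at = plateau_start(accuracy, window=3, min_gain=0.01)
```

With these sample values the series never decreases, and the plateau is flagged at checkpoint index 3, where the gain over the following three checkpoints falls below 0.01.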

Run full evaluation for each ablation variant from Phase 2:

Variant	MiniF2F	Mathlib	Retrieval R@16	Forward passes/proof	Notes
Full Wayfinder	—	—	—	—	Primary
Dense retrieval (no proof network)	—	—	—	—	Navigation thesis
Tactic classification (no navigation)	—	—	—	—	Architecture thesis
No spreading activation	—	—	—	—	Spreading thesis
No progress head	—	—	—	—	Progress thesis
Continuous decoder (no ternary)	—	—	—	—	Ternary thesis
No IDF weighting	—	—	—	—	IDF thesis
No bank alignment (anchors only)	—	—	—	—	Bank thesis
Binary critic (BCE, not soft MSE)	—	—	—	—	HTPS soft-target thesis
No proof history input	—	—	—	—	LeanProgress history thesis
No hammer delegation	—	—	—	—	Hammer complementarity thesis
No accessible-premises filter	—	—	—	—	ReProver filtering thesis
3-bank navigation (original design)	—	—	—	—	6-bank expansion thesis
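
For filling in the Retrieval R@16 column, the sketch below assumes the metric is standard recall at cutoff 16: the fraction of ground-truth premises that appear among the top 16 retrieved items, averaged over queries. The function names and the example premise identifiers are hypothetical; the plan itself does not specify the exact definition.

```python
def recall_at_k(ranked, relevant, k=16):
    """Fraction of relevant items that appear in the top-k of a ranked list."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("recall is undefined when there are no relevant items")
    top_k = set(ranked[:k])
    return len(top_k & relevant_set) / len(relevant_set)


def mean_recall_at_k(queries, k=16):
    """Average recall@k over (ranked, relevant) pairs."""
    if not queries:
        raise ValueError("need at least one query")
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in queries) / len(queries)


# Hypothetical retrieval outputs for two proof states (small k for readability).
queries = [
    (["p3", "p7", "p1", "p9"], {"p1", "p2"}),  # one of two premises in the top 3
    (["p4", "p5", "p6", "p8"], {"p4"}),        # the only premise is retrieved
]

example_recall = recall_at_k(["p3", "p7", "p1", "p9"], {"p1", "p2"}, k=3)
example_mean = mean_recall_at_k(queries, k=3)
```

Running the same function over every variant's retrieval output, with the same queries and the same cutoff, keeps the column comparable across rows.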

Deliverables:

Sources: docs/WAYFINDER_PLAN.md