Experiment Plan | Chapter 1

Overview

This document specifies the sequence of work, tooling, compute decisions, and stop/go criteria for running the Wayfinder experiments. All source code and scripts are already implemented (27 src files, 8 scripts, 1 config); this plan covers executing the data pipeline and the experimental evaluation.

This plan produces three categories of results:

  • Stream 1 (Navigation): Does structured semantic navigation — bank positions, IDF-weighted anchors, spreading activation — outperform dense embedding retrieval for premise selection and proof search?
  • Stream 2 (Architecture): Does a ternary navigational decoder that produces directional coordinates, resolved through a semantic network, outperform tactic-token classification?
  • Stream 3 (Process Evaluation): Does PAB's trajectory evaluation reveal information about the navigational learning process that endpoint metrics cannot?
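To make the Stream 1 mechanism concrete, here is a minimal sketch of spreading activation from IDF-weighted anchors over a semantic graph. All names, the decay schedule, and the graph shape are illustrative assumptions, not Wayfinder's actual API:

```python
from collections import defaultdict

def spread_activation(graph, seeds, decay=0.5, threshold=0.01, max_hops=3):
    """Propagate activation outward from seed anchors.

    graph: node -> list of (neighbor, edge_weight)
    seeds: node -> initial activation (e.g. its IDF weight)
    Activation decays each hop; pulses below `threshold` are dropped.
    """
    activation = defaultdict(float)
    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for node, act in frontier.items():
            activation[node] += act
            for neighbor, weight in graph.get(node, []):
                passed = act * weight * decay
                if passed > threshold:
                    next_frontier[neighbor] += passed
        frontier = next_frontier
        if not frontier:
            break
    for node, act in frontier.items():  # flush the last hop
        activation[node] += act
    return dict(activation)

# Toy example: premises reachable from a goal concept.
graph = {
    "goal": [("lemma_a", 0.8), ("lemma_b", 0.4)],
    "lemma_a": [("lemma_c", 0.9)],
}
scores = spread_activation(graph, {"goal": 1.0})
ranked = sorted(scores, key=scores.get, reverse=True)
```

The point of the sketch is that premise ranking falls out of graph traversal alone, with no dense-vector similarity call, which is what Stream 1 compares against embedding retrieval.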

Every phase states what it validates in each stream. The experimental pipeline is designed so that a single set of experiments tests all three claims. Results are recorded in docs/EXPERIMENT_RESULTS.md.

Time horizon: 6-10 weeks for Phases 0-4; Phase 5+ is contingent on results.

Compute budget: Apple Silicon (M-series Mac) as primary target. The architecture is designed for efficiency: small learnable navigational components (~400K trainable params) and symbolic search (SQLite). Encoder selection (Phase 0.6) will determine whether the encoder is frozen, fine-tuned, or aggressively pruned from a larger model — this is the main variable in compute requirements. A pruned math-native model (70-350M effective params) may require brief cloud GPU time for the pruning/fine-tuning step but runs locally thereafter.
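To illustrate why the symbolic-search side costs no GPU time, here is a hedged sketch of a semantic network stored in SQLite, with neighbor expansion as a plain SQL query. The schema and names are assumptions for illustration, not the project's actual layout:

```python
import sqlite3

# Toy semantic network: weighted directed edges in a single table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, weight REAL)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("goal", "lemma_a", 0.8),
        ("goal", "lemma_b", 0.4),
        ("lemma_a", "lemma_c", 0.9),
    ],
)

def neighbors(node, min_weight=0.1):
    """Expand one search step: strongest outgoing edges first."""
    rows = conn.execute(
        "SELECT dst, weight FROM edges WHERE src = ? AND weight >= ? "
        "ORDER BY weight DESC",
        (node, min_weight),
    )
    return rows.fetchall()
```

Because each expansion is an indexed table lookup, the search loop runs comfortably on a laptop CPU, leaving the encoder as the only component whose size drives compute requirements.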



Sources: docs/WAYFINDER_PLAN.md
