The Proof Network — Wayfinder

Design | Chapter 2 | Prerequisites: 1-Design-Thesis | 24 min read | Home

2. The Proof Network

The proof network is a SQLite database encoding mathematical structure. It is the architectural equivalent of ModelAtlas's network.db — the precomputed semantic space through which all search navigates.

2.1 Proof Banks

Six orthogonal signed dimensions. Each bank has a zero state at the mode — the most common mathematical situation a working mathematician encounters.

Bank	Zero State	Negative Direction	Positive Direction	Extraction Source
STRUCTURE	equality / rewrite	decidable, arithmetic	quantified, dependent	Goal type analysis
DOMAIN	general algebra	concrete (ℕ, ℤ, ℚ)	abstract (topology, category)	Namespace hierarchy
DEPTH	2-3 tactic proof	trivial (1 tactic)	deep (10+ tactics)	Proof length
AUTOMATION	partially automated	fully auto (omega, simp)	manual reasoning	Tactic analysis
CONTEXT	moderate context (3-5 hyps)	no hypotheses	rich context (10+)	Goal state parsing
DECOMPOSITION	single goal	atomic (no splits)	multi-subgoal	Proof structure

Signed position encoding: Each entity stores (sign, depth) per bank. sign ∈ {-1, 0, +1}, depth ∈ {0, 1, 2, 3}. Signed position = sign × depth.

Why signed hierarchies instead of categories: A categorical "domain" field with values {algebra, topology, analysis} can't express proximity. Signed positions give gradient scoring — a linear algebra lemma (DOMAIN = 0, between concrete and abstract) scores better against an algebra query than a topology lemma, without binary mismatch. Zero placement at the mode means most queries resolve near the origin.

2.2 Anchor Dictionary

A shared vocabulary of semantic labels. Target: ~300 anchors at launch, growing organically as the system discovers mathematical patterns.

Bootstrap anchors (created before training):

STRUCTURE bank: equality, inequality, membership, universal-quantifier, existential-quantifier, iff-statement, implication, negation, dependent-type, pi-type, sigma-type, propositional, data-type, function-type

DOMAIN bank: nat-arithmetic, int-arithmetic, rat-arithmetic, real-analysis, complex-analysis, group-theory, ring-theory, field-theory, module-theory, linear-algebra, topology, metric-space, measure-theory, probability, order-theory, lattice-theory, set-theory, finset, multiset, category-theory, homological-algebra, number-theory, combinatorics, graph-theory

DEPTH bank: one-liner, two-step, multi-step, deep-induction, structural-recursion, nested-proof

AUTOMATION bank: omega-solvable, simp-solvable, decide-solvable, norm-num-solvable, linarith-solvable, ring-solvable, field-simp-solvable, aesop-solvable, tauto-solvable, needs-manual-intro, needs-manual-cases, needs-manual-induction, needs-manual-apply, needs-manual-exact

CONTEXT bank: hypothesis-free, hypothesis-light, hypothesis-rich, uses-local-def, uses-instance, uses-classical, uses-choice

DECOMPOSITION bank: single-goal, needs-cases, needs-induction, needs-have-chain, needs-constructor, multi-branch, needs-rcases, needs-obtain

Cross-cutting anchors (mathematical properties, not bank-specific): monotonicity, commutativity, associativity, distributivity, idempotency, transitivity, reflexivity, symmetry, antisymmetry, injectivity, surjectivity, bijectivity, continuity, differentiability, integrability, compactness, connectedness, finiteness, countability, convergence, boundedness, well-foundedness, decidability, cancellation, absorption

Known anchor gaps (to be addressed in bootstrap):

Type coercion: needs-cast, nat-int-coercion, subtype-coercion, coe-simp — Mathlib proofs frequently struggle with coercion between numeric types
Algebraic hierarchy: monoid-level, group-level, ring-level, field-level — position in the typeclass hierarchy
Proof patterns: split-and-recombine, contrapositive-argument, epsilon-delta, sequence-limit, diagonal-argument — recurring structural patterns beyond individual tactics

Anchor lifecycle: 1. Bootstrap: ~300 anchors from the lists above, plus known gap anchors. 2. Deterministic extraction: Type analysis and namespace matching assign anchors with confidence 1.0. 3. Pattern matching: Proof text analysis assigns anchors with confidence 0.8. 4. Discovery: New anchors may be minted when the extraction pipeline encounters unclassifiable patterns. These start at confidence 0.5 and are validated by checking co-occurrence with existing anchors. 5. Gap analysis (iterative, Phase 0-1): For 500 random proof steps, attempt resolution with current anchors. For each failure (correct premise not in top-16), identify what anchors would have connected goal to premise. Cluster gaps by theme, add as new anchors, re-run. Iterate until top-16 recall exceeds 70% on the sample. This is the highest-leverage Phase 0 activity.

IDF computation: idf(anchor) = log(N / count_entities_with_anchor). Cached in anchor_idf table. Invalidated and recomputed after any batch update to entity-anchor links.

2.3 Entity Links

Explicit relationships between mathematical entities:

Relation	Weight	Extraction Source
`depends_on`	0.9	Proof references this lemma
`same_namespace`	0.7	Share Mathlib namespace (e.g., `Nat.`)
`same_declaration_block`	0.8	Adjacent in source file
`commonly_co_occurs`	0.6	Frequently used together in proofs
`tactic_precedes`	0.5	Tactic T commonly follows tactic S
`generalization_of`	0.85	One lemma generalizes another

Links enable the spreading activation channel. When a proof has already used lemma A, activation spreads through depends_on and commonly_co_occurs links to find related lemmas.

2.4 Tactic Entities

Tactics are entities in the proof network, not just string labels. Each tactic has bank positions and anchors:

Tactic	STRUCTURE	AUTOMATION	Anchors
`omega`	(-1, 2)	(-1, 2)	`omega-solvable`, `nat-arithmetic`, `int-arithmetic`, `decidable`
`simp`	(-1, 1)	(-1, 1)	`simp-solvable`, `rewriting`, `simplification`
`intro`	(0, 0)	(+1, 1)	`needs-manual-intro`, `universal-quantifier`, `implication`
`cases`	(+1, 1)	(+1, 1)	`needs-cases`, `multi-branch`, `inductive-type`
`induction`	(+1, 2)	(+1, 2)	`needs-induction`, `structural-recursion`, `nat-arithmetic`
`apply`	(0, 0)	(+1, 1)	`needs-manual-apply`, `implication`
`exact`	(-1, 1)	(+1, 1)	`needs-manual-exact`, `one-liner`
`have`	(+1, 1)	(+1, 1)	`needs-have-chain`, `multi-step`, `decomposition`
`rw`	(0, 0)	(0, 0)	`rewriting`, `equality`
`linarith`	(-1, 2)	(-1, 2)	`linarith-solvable`, `inequality`, `arithmetic`

When the Proof Navigator produces navigational coordinates, navigate() over the tactic entity table finds the tactics whose bank positions align with the predicted direction.

External validation: This two-level resolution (coarse tactic category → fine argument selection) is validated by TacticToe's finding that argument prediction for abstracted tactics has the highest impact on proof success. The hard part isn't selecting apply vs cases — it's selecting which lemma to apply. Navigation handles the coarse level; the proof network's anchor-based premise retrieval handles the fine level. See WAYFINDER_RESEARCH.md Section 2.5 for the activation steering evidence that further supports this decomposition.

Sources: docs/WAYFINDER_DESIGN.md

2. The Proof Network

2.1 Proof Banks

2.2 Anchor Dictionary

2.3 Entity Links

2.4 Tactic Entities

See Also