Design | Chapter 2 | Prerequisites: 1-Design-Thesis | 24 min read | Home

2. The Proof Network

The proof network is a SQLite database encoding mathematical structure. It is the architectural equivalent of ModelAtlas's network.db — the precomputed semantic space through which all search navigates.

2.1 Proof Banks

Six orthogonal signed dimensions. Each bank has a zero state at the mode — the most common mathematical situation a working mathematician encounters.

BankZero StateNegative DirectionPositive DirectionExtraction Source
STRUCTUREequality / rewritedecidable, arithmeticquantified, dependentGoal type analysis
DOMAINgeneral algebraconcrete (ℕ, ℤ, ℚ)abstract (topology, category)Namespace hierarchy
DEPTH2-3 tactic prooftrivial (1 tactic)deep (10+ tactics)Proof length
AUTOMATIONpartially automatedfully auto (omega, simp)manual reasoningTactic analysis
CONTEXTmoderate context (3-5 hyps)no hypothesesrich context (10+)Goal state parsing
DECOMPOSITIONsingle goalatomic (no splits)multi-subgoalProof structure

Signed position encoding: Each entity stores (sign, depth) per bank. sign ∈ {-1, 0, +1}, depth ∈ {0, 1, 2, 3}. Signed position = sign × depth.

Why signed hierarchies instead of categories: A categorical "domain" field with values {algebra, topology, analysis} can't express proximity. Signed positions give gradient scoring — a linear algebra lemma (DOMAIN = 0, between concrete and abstract) scores better against an algebra query than a topology lemma, without binary mismatch. Zero placement at the mode means most queries resolve near the origin.

2.2 Anchor Dictionary

A shared vocabulary of semantic labels. Target: ~300 anchors at launch, growing organically as the system discovers mathematical patterns.

Bootstrap anchors (created before training):

STRUCTURE bank: equality, inequality, membership, universal-quantifier, existential-quantifier, iff-statement, implication, negation, dependent-type, pi-type, sigma-type, propositional, data-type, function-type

DOMAIN bank: nat-arithmetic, int-arithmetic, rat-arithmetic, real-analysis, complex-analysis, group-theory, ring-theory, field-theory, module-theory, linear-algebra, topology, metric-space, measure-theory, probability, order-theory, lattice-theory, set-theory, finset, multiset, category-theory, homological-algebra, number-theory, combinatorics, graph-theory

DEPTH bank: one-liner, two-step, multi-step, deep-induction, structural-recursion, nested-proof

AUTOMATION bank: omega-solvable, simp-solvable, decide-solvable, norm-num-solvable, linarith-solvable, ring-solvable, field-simp-solvable, aesop-solvable, tauto-solvable, needs-manual-intro, needs-manual-cases, needs-manual-induction, needs-manual-apply, needs-manual-exact

CONTEXT bank: hypothesis-free, hypothesis-light, hypothesis-rich, uses-local-def, uses-instance, uses-classical, uses-choice

DECOMPOSITION bank: single-goal, needs-cases, needs-induction, needs-have-chain, needs-constructor, multi-branch, needs-rcases, needs-obtain

Cross-cutting anchors (mathematical properties, not bank-specific): monotonicity, commutativity, associativity, distributivity, idempotency, transitivity, reflexivity, symmetry, antisymmetry, injectivity, surjectivity, bijectivity, continuity, differentiability, integrability, compactness, connectedness, finiteness, countability, convergence, boundedness, well-foundedness, decidability, cancellation, absorption

Known anchor gaps (to be addressed in bootstrap):

  • Type coercion: needs-cast, nat-int-coercion, subtype-coercion, coe-simp — Mathlib proofs frequently struggle with coercion between numeric types
  • Algebraic hierarchy: monoid-level, group-level, ring-level, field-level — position in the typeclass hierarchy
  • Proof patterns: split-and-recombine, contrapositive-argument, epsilon-delta, sequence-limit, diagonal-argument — recurring structural patterns beyond individual tactics

Anchor lifecycle: 1. Bootstrap: ~300 anchors from the lists above, plus known gap anchors. 2. Deterministic extraction: Type analysis and namespace matching assign anchors with confidence 1.0. 3. Pattern matching: Proof text analysis assigns anchors with confidence 0.8. 4. Discovery: New anchors may be minted when the extraction pipeline encounters unclassifiable patterns. These start at confidence 0.5 and are validated by checking co-occurrence with existing anchors. 5. Gap analysis (iterative, Phase 0-1): For 500 random proof steps, attempt resolution with current anchors. For each failure (correct premise not in top-16), identify what anchors would have connected goal to premise. Cluster gaps by theme, add as new anchors, re-run. Iterate until top-16 recall exceeds 70% on the sample. This is the highest-leverage Phase 0 activity.

IDF computation: idf(anchor) = log(N / count_entities_with_anchor). Cached in anchor_idf table. Invalidated and recomputed after any batch update to entity-anchor links.

Explicit relationships between mathematical entities:

RelationWeightExtraction Source
depends_on0.9Proof references this lemma
same_namespace0.7Share Mathlib namespace (e.g., Nat.)
same_declaration_block0.8Adjacent in source file
commonly_co_occurs0.6Frequently used together in proofs
tactic_precedes0.5Tactic T commonly follows tactic S
generalization_of0.85One lemma generalizes another

Links enable the spreading activation channel. When a proof has already used lemma A, activation spreads through depends_on and commonly_co_occurs links to find related lemmas.

2.4 Tactic Entities

Tactics are entities in the proof network, not just string labels. Each tactic has bank positions and anchors:

TacticSTRUCTUREAUTOMATIONAnchors
omega(-1, 2)(-1, 2)omega-solvable, nat-arithmetic, int-arithmetic, decidable
simp(-1, 1)(-1, 1)simp-solvable, rewriting, simplification
intro(0, 0)(+1, 1)needs-manual-intro, universal-quantifier, implication
cases(+1, 1)(+1, 1)needs-cases, multi-branch, inductive-type
induction(+1, 2)(+1, 2)needs-induction, structural-recursion, nat-arithmetic
apply(0, 0)(+1, 1)needs-manual-apply, implication
exact(-1, 1)(+1, 1)needs-manual-exact, one-liner
have(+1, 1)(+1, 1)needs-have-chain, multi-step, decomposition
rw(0, 0)(0, 0)rewriting, equality
linarith(-1, 2)(-1, 2)linarith-solvable, inequality, arithmetic

When the Proof Navigator produces navigational coordinates, navigate() over the tactic entity table finds the tactics whose bank positions align with the predicted direction.

External validation: This two-level resolution (coarse tactic category → fine argument selection) is validated by TacticToe's finding that argument prediction for abstracted tactics has the highest impact on proof success. The hard part isn't selecting apply vs cases — it's selecting which lemma to apply. Navigation handles the coarse level; the proof network's anchor-based premise retrieval handles the fine level. See WAYFINDER_RESEARCH.md Section 2.5 for the activation steering evidence that further supports this decomposition.



Sources: docs/WAYFINDER_DESIGN.md

See Also