Design | Chapter 2 | Prerequisites: 1-Design-Thesis | 24 min read | Home
2. The Proof Network
The proof network is a SQLite database encoding mathematical structure. It is the architectural equivalent of ModelAtlas's network.db — the precomputed semantic space through which all search navigates.
2.1 Proof Banks
Six orthogonal signed dimensions. Each bank has a zero state at the mode — the most common mathematical situation a working mathematician encounters.
| Bank | Zero State | Negative Direction | Positive Direction | Extraction Source |
| STRUCTURE | equality / rewrite | decidable, arithmetic | quantified, dependent | Goal type analysis |
| DOMAIN | general algebra | concrete (ℕ, ℤ, ℚ) | abstract (topology, category) | Namespace hierarchy |
| DEPTH | 2-3 tactic proof | trivial (1 tactic) | deep (10+ tactics) | Proof length |
| AUTOMATION | partially automated | fully auto (omega, simp) | manual reasoning | Tactic analysis |
| CONTEXT | moderate context (3-5 hyps) | no hypotheses | rich context (10+) | Goal state parsing |
| DECOMPOSITION | single goal | atomic (no splits) | multi-subgoal | Proof structure |
Signed position encoding: Each entity stores (sign, depth) per bank. sign ∈ {-1, 0, +1}, depth ∈ {0, 1, 2, 3}. Signed position = sign × depth.
Why signed hierarchies instead of categories: A categorical "domain" field with values {algebra, topology, analysis} can't express proximity. Signed positions give gradient scoring — a linear algebra lemma (DOMAIN = 0, between concrete and abstract) scores better against an algebra query than a topology lemma, without binary mismatch. Zero placement at the mode means most queries resolve near the origin.
2.2 Anchor Dictionary
A shared vocabulary of semantic labels. Target: ~300 anchors at launch, growing organically as the system discovers mathematical patterns.
Bootstrap anchors (created before training):
STRUCTURE bank: equality, inequality, membership, universal-quantifier, existential-quantifier, iff-statement, implication, negation, dependent-type, pi-type, sigma-type, propositional, data-type, function-type
DOMAIN bank: nat-arithmetic, int-arithmetic, rat-arithmetic, real-analysis, complex-analysis, group-theory, ring-theory, field-theory, module-theory, linear-algebra, topology, metric-space, measure-theory, probability, order-theory, lattice-theory, set-theory, finset, multiset, category-theory, homological-algebra, number-theory, combinatorics, graph-theory
DEPTH bank: one-liner, two-step, multi-step, deep-induction, structural-recursion, nested-proof
AUTOMATION bank: omega-solvable, simp-solvable, decide-solvable, norm-num-solvable, linarith-solvable, ring-solvable, field-simp-solvable, aesop-solvable, tauto-solvable, needs-manual-intro, needs-manual-cases, needs-manual-induction, needs-manual-apply, needs-manual-exact
CONTEXT bank: hypothesis-free, hypothesis-light, hypothesis-rich, uses-local-def, uses-instance, uses-classical, uses-choice
DECOMPOSITION bank: single-goal, needs-cases, needs-induction, needs-have-chain, needs-constructor, multi-branch, needs-rcases, needs-obtain
Cross-cutting anchors (mathematical properties, not bank-specific): monotonicity, commutativity, associativity, distributivity, idempotency, transitivity, reflexivity, symmetry, antisymmetry, injectivity, surjectivity, bijectivity, continuity, differentiability, integrability, compactness, connectedness, finiteness, countability, convergence, boundedness, well-foundedness, decidability, cancellation, absorption
Known anchor gaps (to be addressed in bootstrap):
- Type coercion:
needs-cast,nat-int-coercion,subtype-coercion,coe-simp— Mathlib proofs frequently struggle with coercion between numeric types - Algebraic hierarchy:
monoid-level,group-level,ring-level,field-level— position in the typeclass hierarchy - Proof patterns:
split-and-recombine,contrapositive-argument,epsilon-delta,sequence-limit,diagonal-argument— recurring structural patterns beyond individual tactics
Anchor lifecycle: 1. Bootstrap: ~300 anchors from the lists above, plus known gap anchors. 2. Deterministic extraction: Type analysis and namespace matching assign anchors with confidence 1.0. 3. Pattern matching: Proof text analysis assigns anchors with confidence 0.8. 4. Discovery: New anchors may be minted when the extraction pipeline encounters unclassifiable patterns. These start at confidence 0.5 and are validated by checking co-occurrence with existing anchors. 5. Gap analysis (iterative, Phase 0-1): For 500 random proof steps, attempt resolution with current anchors. For each failure (correct premise not in top-16), identify what anchors would have connected goal to premise. Cluster gaps by theme, add as new anchors, re-run. Iterate until top-16 recall exceeds 70% on the sample. This is the highest-leverage Phase 0 activity.
IDF computation: idf(anchor) = log(N / count_entities_with_anchor). Cached in anchor_idf table. Invalidated and recomputed after any batch update to entity-anchor links.
2.3 Entity Links
Explicit relationships between mathematical entities:
| Relation | Weight | Extraction Source |
depends_on | 0.9 | Proof references this lemma |
same_namespace | 0.7 | Share Mathlib namespace (e.g., Nat.) |
same_declaration_block | 0.8 | Adjacent in source file |
commonly_co_occurs | 0.6 | Frequently used together in proofs |
tactic_precedes | 0.5 | Tactic T commonly follows tactic S |
generalization_of | 0.85 | One lemma generalizes another |
Links enable the spreading activation channel. When a proof has already used lemma A, activation spreads through depends_on and commonly_co_occurs links to find related lemmas.
2.4 Tactic Entities
Tactics are entities in the proof network, not just string labels. Each tactic has bank positions and anchors:
| Tactic | STRUCTURE | AUTOMATION | Anchors |
omega | (-1, 2) | (-1, 2) | omega-solvable, nat-arithmetic, int-arithmetic, decidable |
simp | (-1, 1) | (-1, 1) | simp-solvable, rewriting, simplification |
intro | (0, 0) | (+1, 1) | needs-manual-intro, universal-quantifier, implication |
cases | (+1, 1) | (+1, 1) | needs-cases, multi-branch, inductive-type |
induction | (+1, 2) | (+1, 2) | needs-induction, structural-recursion, nat-arithmetic |
apply | (0, 0) | (+1, 1) | needs-manual-apply, implication |
exact | (-1, 1) | (+1, 1) | needs-manual-exact, one-liner |
have | (+1, 1) | (+1, 1) | needs-have-chain, multi-step, decomposition |
rw | (0, 0) | (0, 0) | rewriting, equality |
linarith | (-1, 2) | (-1, 2) | linarith-solvable, inequality, arithmetic |
When the Proof Navigator produces navigational coordinates, navigate() over the tactic entity table finds the tactics whose bank positions align with the predicted direction.
External validation: This two-level resolution (coarse tactic category → fine argument selection) is validated by TacticToe's finding that argument prediction for abstracted tactics has the highest impact on proof success. The hard part isn't selecting apply vs cases — it's selecting which lemma to apply. Navigation handles the coarse level; the proof network's anchor-based premise retrieval handles the fine level. See WAYFINDER_RESEARCH.md Section 2.5 for the activation steering evidence that further supports this decomposition.
Sources: docs/WAYFINDER_DESIGN.md