01.The Translation Paradoxmajoropen▾
Informal mathematics relies on high-context ambiguity; formal mathematics requires zero-ambiguity type precision. Type inference fails because natural language omits declarations.
Human writers omit type declarations — "Let n be a number" implies n ∈ ℕ or n ∈ ℝ depending on context.
Model infers types stochastically, guessing Float for reals or Int for naturals, causing cascading type mismatch errors.
(x : Float) … → Compiler Error: expected Set ℝ, got Float.01.Dimensional Misalignmentmajoropen▾
Models correctly identify algebraic structures but fail to capture dimensionality, producing valid code that formalizes a different problem.
Notation ℚ[x,y] vs ℚ[x] provides minimal visual distinction but a critical semantic difference.
Model defaults to univariate Polynomial instead of MvPolynomial (Fin 2), treating Y as a constant coefficient.
Irreducible (X²+Y²−1 : Polynomial ℚ) := sorry02.Semantic Drift — Tautology Trapmajoropen▾
Models optimize for token likelihood, generating valid code that is mathematically trivial compared to the intended statement.
Complex geometric predicates are statistically rare; simple tautologies are statistically common in training data.
Model rewrites "Show four points form a convex quadrilateral" as "The convex hull of four points is convex" — a tautology.
"Show that the convex hull of four points is convex." (tautology)03.Implicit Knowledge Gapstructuralopen▾
Human textbooks omit steps "obvious" to readers that formal compilers strictly require, causing missing type class instance errors.
Proofs say "it is trivial to see that X is a group" — the implicit prerequisite is absent from the source text.
Lean 4 rejects code because a type class instance is missing from the dependency graph.
failed to synthesize instance Group X01.Definition Bottleneckstructuralpartial▾
Fundamental conflict between the imperative flow of natural language and the declarative structure of dependent type theory.
Informal math defines objects linearly. Formal systems require dependency injection or hypothesis lifting.
Models insert let bindings into theorem signatures, syntactically illegal in Lean 4.
Syntax error: expected ';' or unexpected token 'let'.01.Library Hallucinationmajorpartial▾
Models hallucinate Mathlib predicates that don't exist, causing identifier errors despite correct mathematical reasoning.
Lean's Mathlib is volatile and lacks formalized versions of many advanced concepts.
Model invents predicates like IsIcosahedronSet, UniformDistrib, IsNormal — all absent from Mathlib.
unknown identifier 'IsNormal'01.Errors in Validation Pipelinestructuralpartial▾
Both syntactic and semantic validation checks can be inaccurate, producing false positives or negatives. Real accuracy is typically lower than reported.
Semantic evaluation pipelines often contain LLMs which can be unexpectedly unreliable as verifiers.
False positive and false negative errors occur, inflating reported benchmark performance.
Incorrect "Pass" verdict on semantically wrong formalization.01.Translation of Complex Logicminoropen▾
Competition-level problems contain bidirectional logic that models simplify, losing necessary converse directions.
Informal math contains iff constraints the LLM understands but encodes only partially.
Model formalizes only one direction of an iff, producing a strictly weaker statement.
Formalization encodes only the → direction; ← direction is absent.01."The Wild" vs "The Lab"structuralopen▾
Benchmarks like MiniF2F cover well-supported high-school problems. Real-world definitions are far more demanding and context-dependent.
Wild definitions are intricate, context-dependent, and implicit — assuming extensive background theory.
Models tuned on MiniF2F fail on real-world definitions requiring deeply nested formal structures.
Compilation failure or semantically incorrect formalization.