Large Language Models (LLMs) exhibit persistent logical failures in complex reasoning due to the lack of an internal axiomatic framework [1]. We propose Mathesis, a neuro-symbolic architecture that encodes mathematical states as higher-order hypergraphs and uses a Symbolic Reasoning Kernel (SRK), a differentiable logic engine that maps constraints to a continuous energy landscape. By defining a global energy function E(G), where zero energy implies logical consistency, the SRK yields gradient-based signals to train a Hypergraph Transformer Brain, turning proof search into energy minimization. Multi-step deduction is enabled via Monte Carlo Tree Search and Evolutionary Proof Search, guided by learned value functions and semantic unification.
Large language models (LLMs) achieve strong performance on linguistic tasks and code generation by modeling the statistical distribution of natural language [2]. However, they exhibit systematic failures in formal mathematical reasoning, often generating steps that violate basic axioms, so-called "hallucinations" [1]. This stems from the probabilistic nature of transformer architectures, which lack mechanisms for logical verification or enforcement of semantic constraints. Although chain-of-thought (CoT) prompting induces intermediate reasoning steps, it does not ensure their logical validity: the underlying process remains high-dimensional sequence prediction, not symbolic derivation [3].
Neuro-symbolic architectures aim to combine neural pattern recognition with symbolic rigor. For example, AlphaGeometry solves Olympiad-level geometry problems by coupling a generative model with a symbolic deduction engine [4]. Yet conventional neuro-symbolic systems typically employ non-differentiable solvers that act as black boxes, yielding only sparse binary feedback (e.g., "proof valid/invalid"). Without gradient signals from the symbolic component, the neural module cannot be trained directly to satisfy logical constraints. Prior efforts toward differentiable logic, such as tensor programs or neural logic machines, struggle to scale beyond small, finite domains due to the unbounded search space of mathematics [5].
We introduce Mathesis, a new architecture that overcomes gradient sparsity through a symbolic reasoning kernel (SRK). The SRK acts as a differentiable "physics engine" for logic: it embeds mathematical hypergraphs into a continuous energy landscape where logical consistency corresponds to a zero-energy state. This yields dense, gradient-based feedback for a Hypergraph Transformer Brain, steering its generative policy toward axiom-compliant derivations. Unlike prior approaches, Mathesis encodes mathematical states as higher-order hypergraphs (Section 4), capturing multi-arity relations and nested logical connectives with high fidelity. The system integrates this neuro-symbolic core with structured search strategies, including Monte Carlo tree search (MCTS) and evolutionary proof search (EPS), to enable deliberate, "System 2" reasoning (Section 6).
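To make the energy-based training signal concrete, the following minimal PyTorch sketch is a toy of our own, not the paper's SRK: the constraint set, the quadratic penalty, and all names (equality_violation, energy) are illustrative assumptions. It only shows how an energy built from soft constraint violations, where E = 0 means all constraints are satisfied, yields dense gradients on term embeddings rather than a single valid/invalid bit.

```python
# Toy illustration only: a differentiable "energy" over soft constraint violations.
# E = 0 corresponds to all constraints satisfied; names here are not the SRK's API.
import torch

def equality_violation(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Soft penalty for a constraint Equals(a, b); zero iff the embeddings coincide.
    return (a - b).pow(2).sum()

def energy(embeddings, constraints):
    # Global energy: sum of per-constraint violation terms.
    return sum(equality_violation(embeddings[u], embeddings[v]) for u, v in constraints)

# Toy state: embeddings for two terms and a single asserted constraint Equals(x, y).
emb = {name: torch.randn(4, requires_grad=True) for name in ("x", "y")}
constraints = [("x", "y")]

E = energy(emb, constraints)
E.backward()                      # dense gradient signal w.r.t. every term embedding
print(float(E), emb["x"].grad)    # non-zero energy => non-zero corrective gradient
```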
To facilitate rigorous neuro-symbolic reasoning, we formalize the mathematical workspace as a structured, higher-order heterogeneous hypergraph. This representation distinguishes between syntactic construction (terms) and semantic truth (facts), and explicitly handles nested logical structures and variable quantification scopes [6].
We define the state of a proof as a tuple tracking structure, truth status, and variable binding scopes.
A mathematical state is a tuple S = (G, F), where G = (V, E) is a directed higher-order hypergraph.
• V is the set of nodes, representing mathematical terms (e.g., variables x, constants 0, compound terms x + y).
• E is the set of hyperedges, representing relations, operations, and logical connectives.
• To support nested logic (e.g., (A ∧ B) ⇒ C), we adopt a higher-order definition: a hyperedge e ∈ E is an ordered sequence of elements from V ∪ E. That is, an edge can connect nodes or other edges. This structure is essential for capturing the compositional nature of complex logical formulas, a challenge also addressed in modern knowledge hypergraph reasoning [7].
• F ⊆ E is the set of Facts: a distinguished subset of hyperedges representing assertions currently held to be true within the global context (e.g., axioms, premises, and derived theorems).
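As a concrete reading of this definition, a minimal Python sketch might look as follows; the class and field names are our own illustrative choices, not the paper's implementation. The key point is that hyperedge elements may be nodes or other hyperedges, and that F is a distinguished subset of E.

```python
# Minimal sketch of the state S = (G, F); class and field names are illustrative.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    """A mathematical term, e.g. a variable x, a constant 0, or a compound term x + y."""
    name: str

@dataclass(frozen=True)
class HyperEdge:
    """An ordered sequence of elements drawn from V ∪ E (nodes or other hyperedges)."""
    label: str        # e.g. "Equals", "Implies"
    elements: tuple   # tuple of Node and/or HyperEdge; order matters

@dataclass
class MathState:
    """S = (G, F): the hypergraph G = (V, E) plus the set of facts F ⊆ E."""
    nodes: set = field(default_factory=set)   # V
    edges: set = field(default_factory=set)   # E
    facts: set = field(default_factory=set)   # F ⊆ E

    def assert_fact(self, e: HyperEdge) -> None:
        """Mark an existing hyperedge as currently held true in the global context."""
        assert e in self.edges, "facts must be existing hyperedges"
        self.facts.add(e)
```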
Typing System: We define type mappings ϕ_V : V → T_V and ϕ_E : E → T_E to enforce semantic consistency.
• Node Types (T_V): {Variable, Constant, CompoundTerm}.
• Hyperedge Types (T_E): We distinguish three semantic categories:
- Constructors (T_Con): Functional operations that define a term. Inputs are drawn from V, and the output maps to a unique CompoundTerm node in V (e.g., the term x + y constructed from v_x and v_y).
- Predicates (T_Pred): Atomic logical assertions (e.g., Equals(v_a, v_b), Parallel(l_1, l_2)).
- Connectives (T_Conn): Higher-order logical operators taking edges as inputs (e.g., Implies(e_premise, e_conclusion), And(e_1, e_2)).
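For illustration, the typing maps could be realized with simple enumerations. The enum members below mirror the categories in the text; everything else, including the Add constructor and the lookup-table representation of ϕ_V and ϕ_E, is an assumption made for the sketch.

```python
# Illustrative sketch of the typing maps ϕ_V : V → T_V and ϕ_E : E → T_E.
from enum import Enum, auto

class NodeType(Enum):       # T_V
    VARIABLE = auto()
    CONSTANT = auto()
    COMPOUND_TERM = auto()

class EdgeType(Enum):       # T_E, grouped into three semantic categories
    # Constructors (T_Con): functional operations that define a CompoundTerm
    ADD = auto()            # hypothetical example of a constructor
    # Predicates (T_Pred): atomic logical assertions over nodes
    EQUALS = auto()
    PARALLEL = auto()
    # Connectives (T_Conn): higher-order operators taking edges as inputs
    IMPLIES = auto()
    AND = auto()

# ϕ_V and ϕ_E as plain lookup tables keyed by node / edge identifiers.
phi_V = {"x": NodeType.VARIABLE, "0": NodeType.CONSTANT, "x+y": NodeType.COMPOUND_TERM}
phi_E = {"e_eq": EdgeType.EQUALS, "e_impl": EdgeType.IMPLIES}
```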
Quantification and Scoping: To handle quantification (∀, ∃), we introduce Scope Attributes on hyperedges. A quantified statement is represented by a hyperedge e_quant of type ForAll or Exists.
• e_quant = (V_bound, e_body)
• V_bound ⊂ V: The set of variables bound by this quantifier.
• e_body ∈ E: The logical formula (edge) being quantified.
Example: The statement "∀x, (x = x)" is represented by:
1. Term: Node v_x (Type: Variable).
2. Predicate: Edge e_eq = (v_x, v_x) (Type: Equals).
3. Quantification: Edge e_root = ({v_x}, e_eq) (Type: ForAll).
4. Fact Status: e_root ∈ F. Note that e_eq is not in F independently; it is only true within the context of the quantifier.
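The same example can be spelled out as a small, self-contained toy encoding; the namedtuple fields and all identifiers are our own illustrative choices and may differ from the paper's concrete data structures.

```python
# Self-contained toy encoding of "∀x, (x = x)"; all names are illustrative.
from collections import namedtuple

Node = namedtuple("Node", ["name", "type"])
Edge = namedtuple("Edge", ["type", "elements", "bound_vars"])

# 1. Term: the variable node x.
v_x = Node(name="x", type="Variable")

# 2. Predicate: e_eq = Equals(x, x), an edge over nodes.
e_eq = Edge(type="Equals", elements=(v_x, v_x), bound_vars=frozenset())

# 3. Quantification: e_root = ForAll({x}, e_eq), an edge whose body is another edge.
e_root = Edge(type="ForAll", elements=(e_eq,), bound_vars=frozenset({v_x}))

# 4. Fact status: only the quantified statement is asserted in the global context.
facts = {e_root}
assert e_eq not in facts   # e_eq is true only within the scope of the quantifier
```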
We frame Automated Theorem Proving (ATP) as a search for a valid derivation path that adds the goal statement to the set of Facts F.