Dimensional Type Systems and Deterministic Memory Management: Design-Time Semantic Preservation in Native Compilation

We present a compilation framework in which dimensional type annotations persist through multi-stage MLIR lowering, enabling the compiler to jointly resolve numeric representation selection and deterministic memory management as coeffect properties o…

Authors: Houston Haynes

Houston Haynes
SpeakEZ Technologies, Asheville, NC
hhaynes2@alumni.unca.edu
March 2026

Abstract

We present a compilation framework in which dimensional type annotations persist through multi-stage MLIR lowering, enabling the compiler to jointly resolve numeric representation selection and deterministic memory management as coeffect properties of a single program semantic graph (PSG). The coupling between these two concerns is the central contribution: dimensional inference determines value ranges; value ranges determine representation selection; representation selection determines word width and memory footprint; and memory footprint, combined with escape classification, determines allocation strategy, cache behavior, and cross-target transfer fidelity. Each step in this chain consumes the output of the preceding inference.

The Dimensional Type System (DTS) extends Hindley–Milner unification with constraints drawn from finitely generated abelian groups, yielding dimensional inference that is decidable in polynomial time, complete (no annotations required), and principal. Where conventional systems erase dimensional annotations before code generation, DTS carries them as compilation metadata through each lowering stage, making them available at the point where representation selection and memory placement decisions occur. The dimensional range of computed values guides per-target format choice: posit arithmetic with tapered precision on FPGA targets, IEEE 754 on general-purpose CPUs, or fixed-point on neuromorphic cores.

Deterministic Memory Management (DMM), formalized as a coeffect discipline within the same graph, unifies escape analysis and memory placement with the dimensional framework.
The escape analysis classifies value lifetimes into four categories (stack-scoped, closure-captured, return-escaping, byref-escaping), each mapping to a specific allocation strategy verified at compile time. For posit targets, the quire accumulator’s allocation, lifetime, and exact accumulation semantics are resolved as coeffect properties within the PSG.

We identify implications for auto-differentiation: the dimensional algebra is closed under the chain rule, and forward-mode gradient computation [3] exhibits a specific coeffect signature (no activation tape, $O(1)$ auxiliary memory per layer) that the framework can verify. The practical consequence is a development environment where escape diagnostics, allocation strategy, representation fidelity, and cache locality estimation are design-time views over the compilation graph.

1 Introduction

1.1 Dimensional Annotation Lifetime

Contemporary type systems for numeric computation differ in how long dimensional information remains available during compilation.

Systems with dimensional annotations (F#’s Units of Measure [13], Boost.Units in C++ [23]) discard those annotations before code generation. The dimensional information serves as a compile-time check and then vanishes; the emitted code is dimensionally unaware. We refer to this as early erasure: the annotations are consumed during type checking and do not survive to the compilation stages where representation selection and memory placement decisions occur.

Systems with rich dependent types (F* [26], Idris [4], Agda [17]) preserve type-level information into generated code, but at the cost of decidability: type checking in the general dependent case is undecidable, and practical systems rely on SMT solvers with timeout heuristics.

Neither approach satisfies the requirements of systems that interface with physical reality across heterogeneous hardware targets.
A sensor fusion pipeline running on an x86 host, an FPGA accelerator, and a neuromorphic processor needs dimensional constraints that persist through compilation long enough to guide memory placement and inform cross-target data transfer protocols. Early erasure discards this information before it can be used. Full dependent types provide interactive development environments in practice (Lean 4, Agda, Idris 2), but their type checking is undecidable in general, and practical systems rely on timeout heuristics and fuel limits for solver-backed verification.

DTS takes a middle path: dimensional annotations persist as compilation metadata through multi-stage lowering, available at each stage where they inform decisions, and are dropped before native code emission. The annotations do not exist at runtime; there is no reified type information, no typeof, and no runtime dispatch on dimensions. The distinction from early erasure is annotation lifetime, not reification.

DTS’s restriction to decidable algebraic theories (abelian groups over $\mathbb{Z}$, enum sorts, bitvector constraints) guarantees bounded-time inference for every query, a property that simplifies language server architecture and enables unconditional response-time guarantees for design-time tooling. The tradeoff is expressiveness: DTS cannot encode arbitrary predicates.

The decidability guarantee enables a specific category of design-time feedback: multi-target resolution, memory placement analysis, escape diagnostics, and representation fidelity scoring. Encoding these compilation-internal properties as types in a dependent system would impose an overhead that is architecturally unnecessary when the compiler already computes these properties during normal elaboration.

1.2 Contribution

This paper makes three claims:

1. Dimensional annotations that persist through compilation enable joint resolution of representation selection and memory management.
This coupling is the central contribution and the reason DTS and DMM share a paper. Dimensional inference determines value ranges; value ranges determine representation selection; representation selection determines word width and memory footprint; memory footprint, combined with escape classification, determines allocation strategy, cache behavior, and cross-target transfer fidelity. These decisions compose within the Program Semantic Graph (PSG) as coeffect properties, and the chain cannot be decomposed without losing information that flows between stages. The algebraic foundation is a finitely generated abelian group over $\mathbb{Z}$, which places DTS in a specific formal niche: decidable in polynomial time, fully inferrable via extension of Hindley–Milner unification, and preservable as metadata through multi-stage compilation without altering the generated code’s operational semantics. This niche is distinct from both dependent types and parametric polymorphism (Section 2.4).

2. The inference machinery derives composition-dependent properties that determine downstream compilation decisions. Dimensional annotations can enter the system through multiple paths: Hindley–Milner inference from unannotated source code (the default), explicit programmer annotation, domain library bindings (e.g., a physics library that pre-populates dimensional constraints), or external tooling including AI-assisted code generation. The compilation pipeline’s behavior is identical regardless of provenance. The inference contribution is not annotation convenience; it is the derivation of properties that emerge from constraint interaction across the program graph.
Dimensional range, escape classification, and representation compatibility are composition-dependent: they cannot be determined from any single value’s annotation but arise from the interaction of constraints at function boundaries, loop nesting, and cross-module interfaces. These derived properties jointly determine representation selection, word width, allocation strategy, and cache behavior.

3. The unified DTS+DMM graph enables a novel category of software design-time tooling. Because the PSG retains dimensional and memory annotations through compilation, a language server can surface the compiler’s internal analysis as interactive design guidance: escape analysis diagnostics, allocation promotion warnings, cache locality estimates, and restructuring suggestions. This transforms the compilation graph from a transient build artifact into a persistent design-time resource.

1.3 Scope and Context

The system described here is implemented in the Clef programming language and the Fidelity compilation framework. Clef is a functional language in the ML family whose primary syntactic and semantic lineage is F#, but several other systems were formative in its design. F* [26] demonstrated that representation width and type identity could be treated as independent concerns, a separation that directly informed Clef’s approach to dimensional preservation: the type carries the physical semantics while the representation (posit width, float format, fixed-point configuration) is resolved independently per target. F*’s use of SMT-LIB2 [2] for automated proof discharge also established the feasibility of integrating solver-backed verification into an ML-family workflow, a pattern that informs the Fidelity framework’s constraint architecture. OCaml’s module system and its approach to abstract types influenced the design of Clef’s compilation unit boundaries.
The Fidelity compiler’s multi-pass architecture draws on the nanopass methodology [22], originally developed in Scheme, which demonstrated that decomposing compilation into many small, independently verifiable transformations produces compilers that are easier to extend and reason about.

The Fidelity framework compiles Clef source through a canonical MLIR middle-end (Composer) that fans out to multiple backend pathways: LLVM for CPU, GPU, MCU, and WebAssembly targets; CIRCT for FPGA synthesis via vendor toolchains (e.g., Vivado); and MLIR-AIE for AI Engine architectures. The dimensional and coeffect annotations described in this paper are carried through this fan-out as PSG codata, available to every lowering path. The design-time tooling is provided by Lattice (compiler services and language server protocol implementation) and Atelier (integrated development environment). Throughout this paper, we use Clef syntax for examples, but the formal properties of DTS and DMM are language-independent.

The binary PSG described in this paper is generalized in companion work [8] to a Program Hypergraph (PHG), where the same inference machinery extends to grade inference over Clifford algebras and co-location constraints for spatial dataflow targets. The PHG introduces $k$-ary hyperedges that capture irreducible multi-way relations, including geometric products, tile assignment constraints, and DMA route configurations, that the binary PSG cannot represent without introducing semantically empty intermediate nodes. Where the present paper demonstrates the DTS/DMM coupling for scalar and tensor workloads, the PHG paper extends the argument to geometric algebra neural networks and physics-aware computation, with direct implications for the continuous learning and spatial partitioning applications of forward-mode automatic differentiation.
2 Dimensional Type Systems: Formal Characterization

2.1 Algebraic Foundation

A dimensional type system assigns to each numeric value a dimension drawn from a finitely generated free abelian group. The base dimensions (length, time, mass, temperature, electric current, luminous intensity, amount of substance) generate the group under multiplication, with integer exponents.

Formally, let $D = \mathbb{Z}^n$ be the dimension space, where $n$ is the number of base dimensions. Each dimension $d \in D$ is a vector of integer exponents:

$$d_{\text{velocity}} = (1, -1, 0, 0, \ldots) \quad (\text{length}^{1} \cdot \text{time}^{-1}) \tag{1}$$

$$d_{\text{force}} = (1, -2, 1, 0, \ldots) \quad (\text{length}^{1} \cdot \text{time}^{-2} \cdot \text{mass}^{1}) \tag{2}$$

Dimensional consistency of an arithmetic expression reduces to linear algebra over $\mathbb{Z}$: addition requires operand dimensions to be equal; multiplication adds exponent vectors; division subtracts them; exponentiation scales them. These operations are closed in $\mathbb{Z}^n$ and decidable in $O(n)$ per operation.

This is the critical distinction from dependent types. A dependent type can encode an arbitrary predicate over values. Checking whether two dependent types are equal may require proving an arbitrary theorem. Dimensional consistency checking requires comparing two integer vectors, a constant-time operation per base dimension.

2.2 Inference via Extended Hindley–Milner Unification

F#’s Units of Measure system [13] demonstrated that dimensional constraints integrate naturally with Hindley–Milner type inference. The extension is direct: type variables carry an associated dimension variable; unification of type variables propagates to unification of dimension variables; dimension unification reduces to solving systems of linear equations over $\mathbb{Z}$.

The inference algorithm proceeds as follows:

1. Constraint generation. Each arithmetic operation generates a dimensional constraint. Addition of a + b generates $d(a) = d(b)$.
Multiplication of a * b generates $d(\text{result}) = d(a) + d(b)$.

2. Unification. Dimensional constraints form a system of linear equations over $\mathbb{Z}^n$. The system is solved by Gaussian elimination over $\mathbb{Z}$, yielding either a unique solution, a parametric family of solutions (dimensional polymorphism), or no solution (dimensional inconsistency).

3. Generalization. Unsolved dimension variables in a function’s type are generalized to dimension parameters, producing dimensionally polymorphic functions. A function

    let scale factor value = factor * value

infers type

    float<'d1> -> float<'d2> -> float<'d1 * 'd2>

without any annotation.

The inference is complete (every dimensionally consistent program can be typed without annotation), principal (the inferred type is the most general), and decidable (the constraint system is finite and the solution algorithm terminates). These properties are shared with standard Hindley–Milner inference and are not shared with dependent type inference in general.

Annotation provenance and composition-dependent properties. Dimensional annotations can enter the system through several paths: HM inference from unannotated source (the default described above), explicit programmer annotation, domain library bindings that pre-populate dimensional constraints for specific fields (e.g., a planned Fidelity.Physics library), or external tooling including AI-assisted code generation. The compilation pipeline treats all annotations identically regardless of provenance; the downstream representation selection and memory management decisions are the same whether a dimension was inferred or declared.

The inference machinery’s contribution extends beyond annotation convenience.
The chain from dimensional constraint through range analysis, representation selection, word width, and cache behavior produces composition-dependent properties: properties that emerge from constraint interaction across the program graph and cannot be determined from any individual value’s annotation. A function that multiplies a mass by an acceleration inherits a force dimension, and the range of the result is bounded by the product of the input ranges, which in turn constrains the posit or IEEE 754 format that the compiler selects. These derived ranges, escape classifications, and representation compatibilities propagate through function boundaries, loop nesting, and cross-module interfaces. Companion work on the Program Hypergraph [8] demonstrates a concrete case: grade inference in Clifford algebra, using the same constraint machinery, identifies that approximately 95% of the Cayley table entries are structurally zero for typical grade combinations in 3D Projective Geometric Algebra, producing a 20× code generation improvement that no per-value annotation could provide.

2.3 Preservation Through Multi-Stage Compilation

The defining property of DTS, as distinct from F#’s Units of Measure, is that dimensional annotations persist through compilation. In F#, units are discarded during IL generation; a float<meters> becomes a float64 in the emitted Common Intermediate Language. This is early erasure: dimensions serve as compile-time checks and are then discarded before the compilation stages where they could inform representation selection or memory placement.

In DTS, dimensions are carried as attributes through the compilation pipeline:

Stage 1: Source → Typed AST. Dimensional inference produces a fully annotated AST where every numeric expression carries its resolved dimension.

Stage 2: Typed AST → PSG. The Program Semantic Graph preserves dimensional annotations as node attributes.
The PSG is the central data structure for both compilation and design-time services; dimensional information in the PSG is accessible to the language server for design-time resolution display.

Stage 3: PSG → MLIR. The PSG carries dimensional annotations and coeffects as codata: node-level attributes computed during PSG elaboration and saturation, then observed during the code generation traversal. Alex (the compiler targeting layer) traverses the enriched PSG via a zipper and emits MLIR using portable dialects (memref, arith, func). The dimensional and coeffect information resides in the PSG as codata; it guides the MLIR emission but is not itself encoded into the MLIR representation.

Stage 4: MLIR → Target-specific lowering. The MLIR emitted in Stage 3 fans out to backend-specific lowering pipelines: the LLVM dialect for CPU, GPU, MCU, and WebAssembly targets; CIRCT dialects for FPGA synthesis; or MLIR-AIE dialects for AI Engine architectures. By this point, the dimensional and coeffect codata from the PSG has already guided representation selection: a float<meters> may have been lowered to float64 on x86 via the LLVM backend, posit<32,2> on an FPGA target via CIRCT, or fixed<24,signed> on a neuromorphic core.

Stage 5: Target dialect → Machine code. At the final lowering stage, dimensional attributes are no longer needed for code generation and are lowered to debug metadata (DWARF annotations on x86, equivalent metadata on other targets). The dimensions do not affect the operational semantics of the generated code; they are metadata that can be consumed by debuggers, profilers, and post-mortem analysis tools.

This preservation model has a specific property: dimensions never influence control flow or data layout in a way that could cause divergence between a dimensioned and undimensioned compilation of the same program.
The generated instructions are identical; only the metadata and target-specific numeric representation selections differ. This is weaker than full dependent type preservation (where type information can affect runtime behavior) but stronger than early erasure (where dimensional information is discarded before the compilation stages where it could inform representation and memory decisions).

2.4 DTS is Not Dependent Typing

The relationship between DTS and dependent type systems warrants careful delineation, as imprecise classification would position DTS as a restricted dependent type system. This mischaracterizes the algebraic structure.

Table 1: Comparison of DTS and dependent type systems.

| Property | DTS | Dependent Types |
|---|---|---|
| Type checking | Decidable (linear algebra over $\mathbb{Z}$) | Undecidable in general |
| Inference | Complete and principal | Incomplete; requires annotations |
| Runtime representation | No runtime cost; metadata only | May require runtime witnesses |
| Expressiveness | Abelian group constraints on numeric types | Arbitrary predicates over values |
| Proof obligations | None (consistency is syntactic) | May require interactive proof |
| Compilation model | Attributes that guide code generation | Types that participate in code generation |

A dependent type system can encode dimensional constraints (one can define Vector (n : Nat) in Idris and enforce length-indexed operations). But the encoding uses the full power of dependent types to express a constraint that DTS captures with a restricted algebraic structure. The restriction is not a limitation; it is the source of the decidability, completeness, and inference properties that make DTS practical for interactive design-time tooling.

The analogy is to regular expressions and context-free grammars.
Regular expressions are not “restricted CFGs”; they are a distinct formal class with distinct closure properties, distinct recognition algorithms, and distinct practical applications. DTS occupies an analogous position relative to dependent types: a distinct formal class that happens to overlap in expressive power for a specific domain (dimensional constraints on numeric values) but differs in every computational property that matters for practical tooling.

2.5 Extension: Memory Dimensions

The DTS framework extends naturally beyond physical units. Any constraint domain that forms a finitely generated abelian group can be encoded as a dimension. Memory space identifiers (stack, arena, heap, specific hardware memory regions) form such a group under a trivial multiplication (the “product” of two memory spaces is the pair of constraints that both must be satisfied).

More precisely, memory dimensions do not form an abelian group under the same arithmetic as physical dimensions. They form an enumeration sort in the SMT sense: a finite set of values with equality but no arithmetic. The dimensional algebra handles this by assigning memory dimensions to a separate sort within the constraint system. Physical dimensions are solved by Gaussian elimination over $\mathbb{Z}$; memory dimensions are solved by equality unification over a finite domain. Both are decidable and both participate in the same inference pass.

This is the bridge to DMM. Memory placement is a dimensional constraint solved by the same machinery that solves physical unit constraints. The unification of these two constraint domains within a single inference framework is the formal basis for the design-time tooling described in Section 4.
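The two constraint sorts described in Sections 2.1 and 2.5 can be illustrated with a minimal Python sketch. It assumes a truncated three-dimension basis (length, time, mass) instead of the full set of base dimensions, and every name in it (`Dim`, `require_same_dim`, `MemSpace`, `unify_mem`) is hypothetical; none of them is part of Clef or the Fidelity framework. Physical dimensions are integer exponent vectors with arithmetic, while memory spaces form an enumeration sort that supports equality only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dim:
    """Exponent vector over an assumed basis (length, time, mass)."""
    exps: Tuple[int, ...]

    def __mul__(self, other: "Dim") -> "Dim":
        # Multiplication of values adds exponent vectors.
        return Dim(tuple(a + b for a, b in zip(self.exps, other.exps)))

    def __truediv__(self, other: "Dim") -> "Dim":
        # Division of values subtracts exponent vectors.
        return Dim(tuple(a - b for a, b in zip(self.exps, other.exps)))

    def __pow__(self, k: int) -> "Dim":
        # Integer exponentiation scales the exponent vector.
        return Dim(tuple(a * k for a in self.exps))


def require_same_dim(a: Dim, b: Dim) -> Dim:
    """Addition constraint d(a) = d(b): an O(n) vector comparison."""
    if a != b:
        raise ValueError(f"dimension mismatch: {a.exps} vs {b.exps}")
    return a


class MemSpace(Enum):
    """Enumeration sort: a finite set with equality but no arithmetic."""
    STACK = "stack"
    ARENA = "arena"
    HEAP = "heap"


def unify_mem(a: Optional[MemSpace], b: Optional[MemSpace]) -> Optional[MemSpace]:
    """Equality unification over a finite domain; None is an unsolved variable."""
    if a is None:
        return b
    if b is None:
        return a
    if a is not b:
        raise ValueError(f"memory space mismatch: {a.value} vs {b.value}")
    return a


LENGTH, TIME, MASS = Dim((1, 0, 0)), Dim((0, 1, 0)), Dim((0, 0, 1))
velocity = LENGTH / TIME            # exponents (1, -1, 0), matching Eq. (1)
force = MASS * LENGTH / TIME ** 2   # exponents (1, -2, 1), matching Eq. (2)
```

The sketch checks fully known dimensions only. Inference with dimension variables would additionally require solving linear equations over the integers, which is omitted here.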
2.6 Representation Selection as a Dimensional Function

The persistence of dimensional annotations through compilation creates a capability that early-erasure systems cannot provide: the compiler can select numeric representations based on the dimensional domain of the values being computed.

IEEE 754 distributes precision uniformly across its representable range. A float64 allocates the same number of mantissa bits to values near 1.0 as to values near $10^{300}$. For computations whose values span a narrow dimensional range (gravitational forces between $10^{-11}$ and $10^{30}$ newtons, membrane potentials between −80 and +40 millivolts, sensor readings between 0 and 100 celsius), the majority of IEEE 754’s precision budget is allocated to ranges that the computation will never visit.

Gustafson’s posit arithmetic [7, 6] makes a different allocation. Posits use tapered precision: a variable-length regime field concentrates mantissa bits near 1.0 and reduces precision at extremes. The Posit Standard (2022) [19] standardized the exponent size ($es = 2$) across all bit widths, enabling trivial conversion between precisions by appending or rounding bits.

Recent work on bounded posits (b-posits) [9] constrains the regime field to a fixed maximum size ($rs \le 6$), which bounds the regime to between 2 and 6 bits. This constraint enables decoder implementation via simple multiplexers, achieving 79% less power, 71% smaller area, and 60% reduced latency compared to standard posit decoders, while matching or exceeding IEEE-compliant float32 hardware performance. A further consequence of the bounded regime is hardware reuse across precisions: with $rs = 6$, the maximum non-fraction field width is $1 + rs + es$ bits, which is identical for 16-bit, 32-bit, and 64-bit operands. IEEE 754 cannot share decode hardware across precisions because the exponent field width and bias change with format.
The b-posit design eliminates this obstacle.

DTS provides the formal mechanism for what posit arithmetic presupposes: knowledge of which value ranges matter for a given computation. The dimensional annotation on a value constrains its semantic range. The compiler can evaluate how different representations distribute precision across that range and select the one that minimizes worst-case relative error.

Formally, given a value $v$ with dimension $d$ and a value range $[a, b]$ (inferred from dimensional constraints, domain annotations, or platform binding specifications), and a set of available representations $R = \{r_1, \ldots, r_k\}$ on target $T$, the compiler selects:

$$r^{*} = \arg\min_{r \in R} \; \max_{x \in [a,b]} \frac{|x - \mathrm{round}_r(x)|}{|x|} \tag{3}$$

For IEEE 754, the worst-case relative error is approximately $2^{-p}$ (where $p$ is the mantissa width) uniformly across the representable range. For posits with $es = 2$, the worst-case relative error is minimal near 1.0 and increases toward the regime extremes. The dimensional range $[a, b]$ determines which distribution is preferable.

Representation selection is a deterministic function from dimensional constraints and target capabilities.
The function is computable at compile time; its inputs are properties of the PSG (dimensional annotations and platform bindings), and its output is a code generation decision that the language server can surface at design time:

    force: float<newtons>
      Dimensional range: [1e-11, 1e30] (from gravitational constant and stellar masses)
      +-- x86_64: float64 (worst-case relative error: 1.11e-16, uniform)
      +-- xilinx: posit<32, es=2> (worst-case relative error: 2.3e-8 at range extremes,
      |           1.5e-9 near 1.0)
      +-- Note: posit provides 10x better precision in [0.01, 100] subrange
          where 94% of computed forces reside

The design environment shows which representation was selected and why: the dimensional range, the precision distribution of each candidate representation, and the overlap between the precision “sweet spot” and the actual value distribution.

This capability is bidirectional. If the engineer specifies a posit representation explicitly (because the computation benefits from tapered precision), the dimensional constraints can verify that the posit’s dynamic range encompasses the expected value range. For posit32 with $es = 2$, the representable range is approximately $[10^{-36}, 10^{36}]$. If the dimensional range exceeds this, the compiler emits a diagnostic:

    Warning: posit<32, es=2> dynamic range [1e-36, 1e36] does not cover
      full dimensional range [1e-11, 1e72] of astronomicalDistance
      Consider: float64 (covers full range) or scaling to AU (fits posit range)

The suggestion to scale to astronomical units is itself a dimensional operation: the compiler knows that 1 AU ≈ $1.5 \times 10^{11}$ meters, and that re-dimensioning the computation in AU brings the value range closer to posit32’s representable bounds. This guidance is possible only because the dimension survives to the point where representation selection occurs.
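The selection rule of Eq. (3) can be sketched in Python under a deliberately simplified error model. The assumptions are: the range is strictly positive, the relative error is treated as constant within each binade, IEEE formats are limited to their normal range, and the truncation of posit exponent bits at the regime extremes is ignored. The function names (`ieee_rel_error`, `posit_rel_error`, `worst_case`, `select`) and the `REPS` table are hypothetical illustrations, not the Fidelity compiler's interface, and the numbers this model produces are not expected to match the diagnostics quoted above.

```python
import math


def ieee_rel_error(p, emin, emax):
    """Relative-error model for an IEEE format with p significand bits."""
    def err(e):
        # Uniform 2^-p inside the normal exponent range, unrepresentable outside.
        return 2.0 ** -p if emin <= e <= emax else math.inf
    return err


def posit_rel_error(n, es):
    """Relative-error model for an n-bit posit with es exponent bits."""
    max_e = (n - 2) * 2 ** es  # binary exponent of the largest posit
    def err(e):
        if not -max_e <= e < max_e:
            return math.inf
        k = e // 2 ** es                           # regime value (floor division)
        regime_len = k + 2 if k >= 0 else -k + 1   # run of bits plus terminator
        frac_bits = max(n - 1 - regime_len - es, 0)
        return 2.0 ** -(frac_bits + 1)
    return err


def worst_case(err, a, b):
    """Maximum modeled relative error over the binades covering [a, b], a > 0."""
    lo, hi = math.floor(math.log2(a)), math.floor(math.log2(b))
    return max(err(e) for e in range(lo, hi + 1))


def select(reps, a, b):
    """Eq. (3): pick the representation minimizing worst-case relative error."""
    name = min(reps, key=lambda r: worst_case(reps[r], a, b))
    return name, worst_case(reps[name], a, b)


REPS = {
    "float32": ieee_rel_error(24, -126, 127),
    "float64": ieee_rel_error(53, -1022, 1023),
    "posit32": posit_rel_error(32, 2),
}

# Among 32-bit formats, a narrow range around 1.0 favors the posit,
# while a range spanning sixty decades favors float32.
narrow = select({k: REPS[k] for k in ("float32", "posit32")}, 0.01, 100.0)
wide = select({k: REPS[k] for k in ("float32", "posit32")}, 1e-30, 1e30)
```

Because the model returns an infinite error for any binade outside a format's dynamic range, the same function also reproduces the coverage check behind the warning above: for the range [1e-11, 1e72], posit32 is rejected and float64 is selected.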
3 Deterministic Memory Management as Coeffect Discipline

3.1 Coeffects and Contextual Properties

Effects describe what a computation does to its environment (mutation, I/O, exceptions). Coeffects describe what a computation requires from its environment (capabilities, resources, contextual assumptions) [18]. Memory allocation strategy is a coeffect: a function that allocates from an arena requires that an arena exists in its calling context; a function that places values on the stack requires that the stack frame outlives those values.

In the Clef/Fidelity framework, coeffects are tracked in the PSG as annotations on computation nodes. The coeffect system handles three categories:

Allocation coeffects. Where does a value’s storage come from? Stack frame, arena, reference-counted heap, static memory, hardware-specific region (FPGA BRAM, neuromorphic neuron state memory).

Lifetime coeffects. How long does a value persist? Lexical scope (stack), arena scope (freed when arena is released), ownership-based (freed when last reference drops), static (program lifetime).

Capability coeffects. What does the computation require from its context? Mutable access, target-specific hardware features, dimensional consistency of inputs.

3.2 Escape Analysis as Coeffect Propagation

Classical escape analysis determines whether a value outlives its creating scope. In most compilers, this is a binary classification (escapes or does not) used to decide between stack and heap allocation. The analysis runs during optimization, is opaque to the software engineer, and produces no design-time feedback. Ownership-based systems such as Rust [10] brought lifetime verification to the surface as a compile-time discipline, requiring the engineer to annotate lifetimes at function boundaries; the compiler then accepts or rejects the program based on those annotations.
The coeffect model described here pursues the same goal of static lifetime verification, with a different annotation strategy and a different response to violations.

In the coeffect model, escape analysis is a propagation of lifetime constraints through the PSG. When a value is created, it receives a tentative lifetime coeffect (typically the lexical scope of its binding). When the value is used, the usage imposes a lifetime requirement (the value must live at least as long as the usage site's scope). If the usage's required lifetime exceeds the value's tentative lifetime, the value's lifetime is promoted. The promotion is recorded in the PSG as a coeffect annotation, a visible, navigable property of the graph. The language server can report: "this value was created with stack-eligible lifetime but promoted to arena allocation because it escapes via the return path at line 42."

The formal rule:

$$\text{If } \lambda_{\text{required}}(v, \text{use}_i) > \lambda_{\text{tentative}}(v) \text{ for any use } i, \text{ then } \lambda(v) := \max_i \bigl( \lambda_{\text{required}}(v, \text{use}_i) \bigr) \tag{4}$$

where $\lambda$ denotes the lifetime ordering: stack < arena < heap < static.

3.2.1 Escape Classification

The binary escapes/does-not-escape model discards information. A value that escapes via closure capture has different allocation requirements than one that escapes via return value or byref parameter. The coeffect system classifies escape behavior into a discriminated union that preserves this information:

$$\text{EscapeKind}(v) \in \{\, \text{StackScoped},\ \text{ClosureCapture}(t),\ \text{ReturnEscape},\ \text{ByRefEscape} \,\} \tag{5}$$

where $t$ identifies the closure node that captures $v$. Each classification maps to a specific allocation strategy and lifetime bound:

Table 2: Escape classification and allocation strategy mapping.

| Escape Classification | Allocation Strategy | Lifetime Bound | Diagnostic |
|---|---|---|---|
| StackScoped | Stack (memref.alloca) | Lexical scope | None (optimal) |
| ClosureCapture(t) | Arena (closure env.) | Lifetime of closure t | "Captured by closure at line n" |
| ReturnEscape | Arena (caller's scope) | Caller's scope | "Escapes via return path" |
| ByRefEscape | Arena (param. origin) | Origin scope of ref. | "Escapes via byref parameter" |

The classification is computed during PSG elaboration, before the traversal that generates MLIR. This ordering is critical: the PSG's zipper-based traversal witnesses escape annotations that were resolved during elaboration; it does not compute them during emission. The traversal is purely navigational; all allocation decisions are properties of the graph, not decisions made during code generation.¹

The classification interacts with the lifetime ordering. A ClosureCapture(t) escape imposes the constraint $\lambda(v) \ge \lambda(t)$: the captured value must live at least as long as the closure that captures it. If the closure itself escapes (is returned, stored in a data structure, passed to another function), the constraint propagates transitively. The PSG records the full escape chain, enabling the language server to display the transitive reason for a promotion: "this value was promoted to arena because it is captured by a closure that is returned from the enclosing function."

3.2.2 Compositional Allocation Resolution

The escape classification determines allocation strategy, but the resolution must compose across function boundaries without requiring source-level duplication. A function that operates on a Span should work identically whether the span is stack-allocated, arena-allocated, or backed by a hardware memory region.

The compositional principle: allocation strategy is resolved at the point of use by detecting the type's memory representation and composing the appropriate access operations.
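As a rough illustration of how the rows of Table 2 could be consulted, the following Python sketch maps an escape classification to an allocation strategy and a diagnostic string. It is not the Clef implementation: the `Escape` record, the `STRATEGY` dictionary, and both helper functions are hypothetical names introduced here, and the strategy labels are copied from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Escape:
    """Hypothetical record for one escape classification (Eq. 5)."""
    kind: str                      # "StackScoped", "ClosureCapture", "ReturnEscape", "ByRefEscape"
    closure: Optional[str] = None  # capturing closure node t (ClosureCapture only)
    line: Optional[int] = None     # source line used in the diagnostic


# Rows of Table 2: classification -> (allocation strategy, lifetime bound)
STRATEGY = {
    "StackScoped": ("stack", "lexical scope"),
    "ClosureCapture": ("arena (closure env.)", "lifetime of closure"),
    "ReturnEscape": ("arena (caller's scope)", "caller's scope"),
    "ByRefEscape": ("arena (param. origin)", "origin scope of ref."),
}


def allocation_for(escape: Escape) -> str:
    """Look up the allocation strategy for a classification."""
    return STRATEGY[escape.kind][0]


def diagnostic_for(escape: Escape) -> Optional[str]:
    """Return the diagnostic text from Table 2, or None for the optimal case."""
    if escape.kind == "StackScoped":
        return None
    if escape.kind == "ClosureCapture":
        return f"Captured by closure at line {escape.line}"
    if escape.kind == "ReturnEscape":
        return "Escapes via return path"
    return "Escapes via byref parameter"


captured = Escape("ClosureCapture", closure="t1", line=42)
print(allocation_for(captured), "|", diagnostic_for(captured))
```

Because the lookup is a pure function of the classification, it fits the paper's point that allocation decisions are read from the graph rather than made during emission.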
When the compiler encounters a mutable variable reference where a value is expected, it composes a load operation transparently:

$$\text{resolve}(v) = \begin{cases} v & \text{if } \tau(v) \text{ is a value type} \\ \text{load}(v) & \text{if } \tau(v) = \text{MemRef}(\tau') \end{cases} \tag{6}$$

This is the lvalue/rvalue distinction expressed as a type-driven transformation. The resolution is computed from the type, not from parameter threading, preserving the monadic composition of the compilation pipeline. Each compilation phase remains a pure transformation from annotated graph to annotated graph; no phase carries hidden state about which values have been loaded and which have not.

¹ This separation has a practical consequence for the inline keyword. When a function allocates on the stack and returns a pointer, the pointer becomes invalid when the function returns. Marking the function inline causes the compiler to expand the function body at the call site, lifting the allocation to the caller's frame. This is escape analysis by annotation: the inline keyword asserts that the function should not create a distinct stack frame, and the compiler verifies that the inlined allocation does not escape the caller. The coeffect system records this as a mandatory inline constraint, distinct from performance-motivated inlining, which the compiler defers to the MLIR optimization pipeline where full program context is available.

3.3 The Push, Bounded, and Poll Models of Coeffect Specification

Developers interact with the coeffect system through three models that form a spectrum analogous to type annotation in ML-family languages. The parallel is direct: type inference transformed programming from ceremony to expression by letting the compiler determine what it could from context. Lifetime inference follows the same principle.

Push model (explicit declaration).
The engineer annotates a function with explicit coeffect constraints:

    let processReadings [] [] (sensors: Span< float >) : ProcessedData =
        // ...

The compiler propagates these constraints forward through the function body. Every value in the body inherits the target and memory constraints from the declaration. Inference resolves the remaining details (specific register allocation, BRAM placement on FPGA, cache line alignment) within the declared constraints. The PSG reaches saturation quickly because the engineer has provided sufficient boundary conditions for the inference to converge without ambiguity.

Bounded model (scoped inference). The engineer provides scope boundaries; the compiler infers within those bounds:

    let processReadings () = arena {
        let! readings = readSensors ()
        let summary = summarize readings
        return (readings, summary)
    }

The computation expression marks the lifetime boundary. The let! syntax signals allocation from the arena. The compiler handles parameter threading, reference passing, and cleanup. The source specifies where inference should operate (within the arena scope); the compiler determines how values are allocated and when they are released. This is analogous to annotating function signatures while leaving local bindings inferred, a common pattern in ML-family languages.

Poll model (full inference). The engineer writes without coeffect annotations:

    let processReadings sensors =
        // ...

The compiler infers coeffects from usage context. If the function is called from three sites with different target configurations, the inference engine unifies across all call sites, propagating constraints backward to determine the function's coeffect requirements. The function eventually reaches the same saturated state, but the path is longer and the result may be context-dependent: the function may resolve differently depending on which call site is considered.
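The context dependence of the poll model can be made concrete with a small Python sketch. It applies the promotion rule of Eq. (4) once per call site and then checks whether the resolutions agree. The call-site names, the `resolve_per_call_site` helper, and the per-site requirement lists are all assumptions made for illustration; the paper does not specify how the inference engine represents them.

```python
from enum import IntEnum


class Lifetime(IntEnum):
    """Lifetime ordering from Eq. (4): stack < arena < heap < static."""
    STACK = 0
    ARENA = 1
    HEAP = 2
    STATIC = 3


def promote(tentative, required_by_uses):
    """Eq. (4): the resolved lifetime is the largest requirement, never below the tentative one."""
    return max([tentative, *required_by_uses])


def resolve_per_call_site(tentative, call_sites):
    """Hypothetical poll-model step: resolve the same value once per call site."""
    return {site: promote(tentative, uses) for site, uses in call_sites.items()}


def is_context_dependent(resolutions):
    """True when call sites disagree, the case where tooling would suggest an annotation."""
    return len(set(resolutions.values())) > 1


# Illustrative call sites (names are made up); each lists the lifetimes its uses require.
call_sites = {
    "cpu_main": [Lifetime.STACK],
    "fpga_kernel": [Lifetime.STACK, Lifetime.ARENA],
    "logger": [Lifetime.STACK],
}
resolutions = resolve_per_call_site(Lifetime.STACK, call_sites)
for site, lifetime in resolutions.items():
    print(site, lifetime.name)
print("context-dependent:", is_context_dependent(resolutions))
```

In this toy setup one call site forces a promotion to arena while the others stay on the stack, which is the kind of variation the push and bounded models remove by fixing the boundary conditions up front.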
The three models correspond to a spectrum of inference scope:

Table 3: Push, Bounded, and Poll models of coeffect specification.

| Model | Type analogy | Developer provides | Compiler infers | PSG saturation |
|---|---|---|---|---|
| Push | let x: int = 5 | Full coeffect constraints | Internal details | Immediate |
| Bounded | let f (x: int) = ... | Scope boundaries | Allocation within scope | Fast |
| Poll | let x = 5 | Nothing | All coeffects from context | Context-dependent |

No model is incorrect. The push model produces PSG nodes that saturate faster, remain stable under dependency changes, and display unambiguous resolution in the design-time tooling. The bounded model offers a middle ground with modest annotation cost and fast convergence. The poll model imposes no annotation burden but produces nodes whose saturation depends on external context.

The design-time tooling exploits these differences to provide "pit of success" guidance. When a function's coeffect resolution varies across call sites, the language server displays the variation and suggests either a bounded scope (computation expression) or an explicit annotation. The engineer is not compelled to annotate; the tooling shows the consequences of not annotating. It rewards more explicit models with cleaner, more stable resolution display, creating a natural gradient toward explicit coeffect specification for functions where it matters.

3.4 Escape-Driven Restructuring Guidance

The most concrete instance of design-time coeffect guidance is escape-driven memory promotion. When the compiler determines that a stack-eligible value must be promoted to arena allocation due to an escape path, the language server can analyze the escape path and propose structural alternatives:

Caller-provided buffer. The escape occurs because the function allocates internally and returns the result. If the caller provides the destination buffer, the value never escapes the callee's frame.
The function signature changes from producing a value to filling a caller-owned buffer.

Continuation-passing style. If the caller needs only transient access to the value, the function can accept a continuation that consumes the value within the callee's frame. The value never escapes; stack allocation is preserved.

Explicit promotion. If the intended design calls for the value to outlive the callee's frame (because it will be shared across subsystems or stored in a long-lived data structure), the allocation strategy is annotated explicitly. The promotion still occurs, but it is declared intent, verified by the compiler.

Each alternative is a concrete refactoring with quantifiable consequences: the caller-provided buffer eliminates allocation entirely; the continuation preserves stack locality (and by extension, cache residency); the explicit annotation documents intent and stabilizes the PSG against future changes. In an ownership-based system, the same escape would produce a rejection; the engineer must diagnose the escape path and arrive at one of these restructurings independently. The coeffect model surfaces the diagnosis and the alternatives together.

The restructuring guidance is generated from the same PSG that performs dimensional inference. The escape path is a chain of edges in the graph; the lifetime promotion is a coeffect annotation on those edges; the alternative restructurings are graph transformations that the compiler can preview before the engineer accepts them. There is no separate analysis tool; the compilation graph is the analysis tool.

3.5 The Quire as Coeffect Case Study

The posit quire accumulator provides a concrete illustration of how DTS and DMM converge on a single construct.
A quire is a fixed-width exact accumulator that holds intermediate results of multiply-add operations without rounding; rounding occurs once, when the final result is converted back to a posit value [7]. The Posit Standard (2022) [19] defines the quire width as $16n$ bits for an $n$-bit posit, yielding a 512-bit accumulator for posit32. This fixed relationship between posit precision and quire width simplifies both hardware implementation and compiler modeling.

From the DTS perspective, the quire is a numeric container whose dimensional semantics are determined by the posit values it accumulates. A quire accumulating products of a float in newtons and a float in meters carries dimension newtons · meters = joules. The dimensional algebra tracks through the fused multiply-add operations:

    let work (forces: Span< float >) (distances: Span< float >) : float =
        let mutable q = Quire.zero
        for i in 0 .. forces.Length - 1 do
            q <- Quire.fma q forces.[i] distances.[i]
            // dimension: newtons * meters = joules
        Quire.toPosit q // single rounding, dimension preserved

The source code carries no dimensional annotations beyond the parameter types. DTS infers that q carries dimension joules and that the final conversion preserves this dimension. The quire's internal representation is invisible to the dimensional algebra; what matters is that the dimension flows through the accumulation chain and is verified at the output.

From the DMM perspective, the quire is a memory resource with specific coeffect requirements:

Allocation coeffect. For posit32, the 512-bit quire occupies 64 bytes, exactly one cache line on a typical architecture. On a CPU target, this is stack-eligible for short-lived accumulations and arena-eligible for long-lived ones. On an FPGA target, the quire is a 512-bit value in the posit arithmetic pipeline, mapped to fabric resources by synthesis.
On a neuromorphic target, the quire may be unavailable entirely (the target lacks the accumulator width), triggering a capability coeffect failure.

Lifetime coeffect. The quire must persist across the entire accumulation loop. Its lifetime is bounded by the loop scope in the common case. If the quire escapes (returned from a function, stored in a data structure for incremental accumulation across function calls), the same escape analysis from Section 3.2 applies: the compiler detects the promotion and surfaces it at design time.

Capability coeffect. Not all targets support exact accumulation. The coeffect system records this as a capability requirement:

Table 4: Quire support across target architectures.

| Target | Quire support | Coeffect resolution |
|---|---|---|
| x86_64 | Software emulation (64 B on stack) | Allocation: stack; ~50 cycles/FMA |
| Xilinx FPGA | 512-bit fabric pipeline | Allocation: fabric; 1 cycle/FMA |
| RISC-V + Xposit | Hardware quire instruction | Allocation: arch. register; 1 cycle/FMA |
| Neuromorphic (Loihi 2) | Not available | Capability failure |

The convergence is in the PSG. The quire node carries dimensional annotations (from DTS), allocation and lifetime annotations (from DMM), and capability annotations (from the coeffect system). All three are properties of the same graph node, resolved by the same inference pipeline, visible through the same language server interface. The design-time view:

    q: Quire (exact accumulator)
      Dimension: joules (inferred from fma operands)
      +-- x86_64: stack, 64 bytes, 1 cache line, ~50 cycles/fma
      +-- xilinx: 512-bit fabric pipeline, 1 cycle/fma
      +-- loihi2: not available (no exact accumulation support)
      Lifetime: loop scope (lines 3-5), no escape detected

The quire is a value with dimensional, allocation, and capability properties that the existing DTS+DMM framework handles through its standard inference and coeffect machinery.
Its size is deterministic for a given posit precision ($16n$ bits per the Posit Standard [19]), making memory analysis straightforward: once the target's posit width is known, the quire's cache footprint and allocation strategy follow directly.

4 The Program Semantic Graph as Design-Time Resource

4.1 Elaboration, Saturation, and Latent Preservation

The PSG progresses through two computational phases:

Elaboration. Raw parsed syntax is enriched with type and dimensional information through inference. Each node acquires type annotations, dimensional constraints, and coeffect requirements. Elaboration is the expensive phase; it involves constraint generation, unification, and resolution across the full dependency graph.

Saturation. The elaborated graph is iteratively refined until all inference variables are resolved and all coeffect constraints are propagated to fixpoint. A saturated node has a complete, stable set of annotations: its type, dimension, memory placement, lifetime, and target-specific resolution are all determined.

Concretely, the coeffects computed during elaboration and saturation include:

Table 5: Coeffect categories computed during PSG elaboration and saturation.

| Coeffect Category | What It Resolves | When Consumed |
|---|---|---|
| Emission strategy | Inline, separate function, or module init? | MLIR generation |
| Capture analysis | Outer-scope variables a lambda requires | Closure layout, escape classification |
| Lifetime requirements | Minimum lifetime for a value | Allocation strategy selection |
| SSA pre-assignment | SSA identifier for the node's result | MLIR emission |
| Dimensional resolution | Physical dimension of a value | Representation selection, transfer fidelity |
| Target reachability | Configured targets where node is reachable | Code generation filtering |

These coeffects are all computed before the graph traversal that generates target code.
The traversal is purely navigational: it visits nodes in dependency order, observes the pre-computed coeffects, and emits the corresponding target representation. This "passive traversal" model, inspired by Petricek's coeffect formalization [18] and Huet's zipper for immutable graph navigation, ensures that the same coeffect annotations consumed by code generation are available to the language server for design-time display. There is no separate analysis; the compilation graph is the analysis.

Because the PSG persists as a long-lived structure in the language server, the current design leans toward latent preservation: when a subgraph becomes inactive (a feature flag is disabled, a target is dropped), its saturated annotations are retained rather than discarded, allowing reactivation without full re-elaboration.

4.2 Three-State Node Model

The PSG assigns each node one of three states:

Table 6: Three-state node model for PSG nodes.

| State | Elaborated | Saturated | Active | Optimizer | Language Server |
|---|---|---|---|---|---|
| Live | Yes | Yes | Yes | Yes | Full resolution display |
| Latent | Yes | Yes | No | No | Dimmed, preserved resolution |
| Fresh | No | No | No | No | Syntax only, no resolution |

A live node participates in compilation and design-time display. A latent node is excluded from compilation but retains its annotations for inspection and rapid reactivation. A fresh node has been parsed but never elaborated; it appears in the design-time display as syntax without type or dimensional resolution.

The distinction between latent and fresh is operationally significant. Reactivating a latent node is O(boundary); the elaboration and saturation work has already been done. Activating a fresh node is O(subgraph); the full inference pipeline must run.
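The state distinctions of Table 6 and the two activation costs can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the single `resolved` flag (standing in for "elaborated and saturated", which Table 6 never separates), and the string cost labels are assumptions introduced here, not part of the Fidelity implementation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical PSG node carrying only the flags that Table 6 distinguishes."""
    name: str
    resolved: bool = False  # elaborated and saturated
    active: bool = False    # participates in compilation

    @property
    def state(self) -> str:
        if not self.resolved:
            return "Fresh"
        return "Live" if self.active else "Latent"


def deactivate(node: Node) -> None:
    """Soft delete: the node leaves compilation but keeps its annotations."""
    node.active = False


def activate(node: Node) -> str:
    """Activate a node and report which class of work was needed."""
    if node.state == "Live":
        return "none"
    if node.state == "Fresh":
        node.resolved = True   # stand-in for running the full inference pipeline
        cost = "subgraph"      # O(subgraph)
    else:
        cost = "boundary"      # O(boundary): annotations are already present
    node.active = True
    return cost


n = Node("processReadings")
print(n.state, "->", activate(n), "->", n.state)   # first activation pays for inference
deactivate(n)
print(n.state, "->", activate(n), "->", n.state)   # reactivation reuses saturated annotations
```

The sketch makes the asymmetry visible: only the first activation pays for inference, while a deactivate/activate round trip touches nothing but the active flag.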
The design-time tooling reflects this difference: latent nodes display their resolved types (which are likely still correct), while fresh nodes display only their syntactic structure with a prompt to build.

4.3 Soft Delete and Reachability

The latent preservation model implies a soft-delete semantics for reachability analysis. When the compiler determines that a node is unreachable under the current configuration (feature set, target set, dependency set), it marks the node as latent. The node's edges are annotated with a reachability bitvector: one bit per configured target, indicating on which targets the edge is active.

This per-target reachability is essential for multi-target compilation. A function may be reachable on x86 and FPGA but unreachable on a neuromorphic target (because the target lacks floating-point computation paths). The reachability status of the function is not a single boolean; it is a bitvector that the language server can display as a per-target compatibility matrix.

The optimizer and code generator consume only the active subgraph; they filter on the reachability bitvector during graph traversal. The language server consumes the full graph; it displays latent nodes with their preserved resolution, enabling inspection of code paths that are not currently compiled but could be activated by changing the configuration.

4.4 Design-Time Feedback as Compilation Byproduct

The PSG-as-design-resource model produces several categories of design-time feedback that are byproducts of the compilation process, not separate analyses:

Dimensional resolution display. Every numeric value carries its resolved dimension in the PSG. The language server renders this as inline annotations, hover tooltips, and a persistent resolution panel showing the current function's dimensional resolution across all configured targets.

Memory placement display.
Every value carries its resolved allocation strategy and lifetime in the PSG. The language server renders this alongside dimensional information, showing where each value lives in the target's memory topology.

Escape analysis diagnostics. When the coeffect system promotes a value's allocation strategy (stack to arena, arena to heap), the promotion is recorded in the PSG as a coeffect annotation. The language server renders this as a diagnostic with the escape path, the promotion reason, and restructuring alternatives.

Cache locality estimates. For values in hot loops (detected via loop nesting analysis, also a PSG annotation), the language server can estimate cache residency based on the value's size, alignment, and allocation strategy. A stack-allocated 800-byte span occupies 12.5 L1 cache lines and is guaranteed contiguous; an arena-allocated span of the same size may or may not be contiguous depending on arena state. The estimated performance difference can be quantified and displayed.

Cross-target transfer analysis. When a value crosses a hardware boundary (FPGA to CPU, CPU to NPU), the compiler resolves the transfer protocol, latency, bandwidth, and precision fidelity of any numeric conversion. This information is a PSG annotation on the transfer edge. The language server renders it as a diagnostic on the value's usage at the boundary, making visible exactly what happens when a computation result moves between targets. For hardware/software co-design workflows, the engineer sees the cost of a target boundary before committing to an architecture partition.

None of these feedback categories require a separate analysis pass. They are all properties of the PSG that the compiler computes as part of normal compilation. The language server reads the PSG; the design-time tooling is a view over the compilation graph.
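The arithmetic behind the cache locality estimate can be checked with a short Python sketch. It assumes a 64-byte cache line (the line size implied by the quire discussion in Section 3.5) and a contiguous buffer; the function names and the alignment parameter are illustrative and do not describe the language server's actual estimator.

```python
CACHE_LINE_BYTES = 64  # assumption: typical L1 line size


def cache_line_fraction(size_bytes: int, line: int = CACHE_LINE_BYTES) -> float:
    """Size of a buffer expressed in cache lines (may be fractional)."""
    return size_bytes / line


def cache_lines_touched(size_bytes: int, start_offset: int = 0,
                        line: int = CACHE_LINE_BYTES) -> int:
    """Number of distinct lines a contiguous buffer overlaps.

    start_offset is the buffer's starting byte offset within its first line.
    """
    if size_bytes == 0:
        return 0
    first = start_offset // line
    last = (start_offset + size_bytes - 1) // line
    return last - first + 1


print(cache_line_fraction(800))        # 12.5 lines of data
print(cache_lines_touched(800))        # 13 lines when line-aligned
print(cache_lines_touched(800, 40))    # 14 lines when starting 40 bytes into a line
print(cache_lines_touched(64))         # an aligned 64-byte quire fits one line
```

The gap between 12.5 lines of data and 13 or 14 lines actually touched shows why alignment is listed among the inputs to the estimate.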
5 Related Work

5.1 Units of Measure in F#

Kennedy's Units of Measure system for F# [13] established the core inference algorithm for dimensional types in an ML-family language. The system is elegant, fully inferrable, and integrated with Hindley–Milner unification. Its limitation, by design, is early erasure: units are checked at compile time and discarded during IL generation, before the compilation stages where they could inform representation selection or memory placement. DTS extends Kennedy's algebraic framework with dimensional persistence through compilation, multi-target resolution, and integration with the coeffect system for memory dimensions.

5.2 Dependent Types in F*, Idris, and Agda

F* [26] is an ML-family language with dependent types and effect tracking, drawing from F#, OCaml, and Standard ML, and using an SMT solver (Z3) for automated proof discharge. Two aspects of F*'s design were particularly influential for DTS. First, F*'s treatment of representation as a concern separable from type identity informed the core DTS principle that a float carries its dimensional semantics independently of whether the underlying representation is a 64-bit IEEE 754 float, a 32-bit posit, or a 16-bit fixed-point value. In F*, refinement types can constrain values without altering their runtime representation; DTS applies an analogous separation at the level of physical dimensions and numeric format. Second, F*'s integration of SMT-LIB2 [2] via Z3 demonstrated that solver-backed constraint resolution could be embedded transparently within an ML-family type checking workflow, a pattern that informs how the Fidelity framework resolves dimensional, memory, and target constraints during PSG elaboration.

Idris [4] provides dependent types with a focus on practical programming. Agda [17] is a proof assistant that doubles as a programming language.
All three systems can encode dimensional constraints, but the encoding uses the full power of dependent types, sacrificing decidability and complete inference. DTS achieves the same dimensional correctness guarantees with a restricted algebraic framework that preserves these properties.

5.3 Rust Ownership and Borrow Checking

Rust's ownership system [10] provides deterministic memory management through a discipline of ownership, borrowing, and lifetime annotation. The borrow checker is a static analysis that rejects programs where lifetimes are inconsistent. Rust's approach front-loads the annotation burden: the engineer specifies lifetimes in function signatures, and the compiler verifies them.

The Clef/Fidelity approach differs in three respects. First, the analysis operates at a different depth in the compilation pipeline. Our understanding of ownership-based borrow checking is that it analyzes a mid-level intermediate representation derived from the surface syntax. The Clef coeffect analysis operates on the Program Semantic Graph after type checking, SRTP resolution, and dimensional inference have completed; the escape classifier therefore has access to semantic information that is not syntactically visible at the function signature level. This depth of context enables escape classifications (Section 3.2.1) that account for dimensional constraints, resolved type parameters, and closure capture structure jointly.

Second, lifetimes are inferred by default (the poll model of Section 3.3), with explicit annotation available when the engineer needs control (the push model) or when inference produces surprising results. This parallels the difference between mandatory lifetime annotations and ML-family type inference: both achieve static guarantees, but the annotation burden falls differently.

Table 7: Comparison of Rust and Clef memory management approaches.

| Property | Rust | Clef |
|---|---|---|
| Lifetime specification | Mandatory at function boundaries | Inferred; three levels of explicitness |
| Allocation strategy | Ownership-determined | Coeffect-determined |
| Design-time feedback | Accept/reject with error diagnostics | Escape diagnostics with restructuring alternatives |
| Annotation cost | Every function with references | Only where inference is insufficient |
| Analysis depth | Mid-level IR from surface syntax | PSG after type checking, SRTP, and dimensional inference |
| Multi-target implications | Single compilation target | Strategy may vary per target |

Third, the design-time tooling provides graduated feedback. When the coeffect system promotes a value's allocation, the language server displays the escape path and proposes concrete restructuring alternatives (Section 3.4). In an accept/reject model, the engineer diagnoses the escape path and restructures the code independently; the Clef model treats the compiler's escape analysis as a design-time resource, surfacing the reasons for the allocation decision alongside actionable alternatives. The static guarantee is preserved in both cases; the difference lies in the feedback granularity during development.

A further distinction emerges in multi-target compilation. When a single codebase targets multiple backends with different memory hierarchies, a fixed ownership model applies the same allocation strategy everywhere. The coeffect model allows the same function's allocation decisions to vary by target: a value that is stack-allocated on a general-purpose CPU might be placed in a scratchpad region on an embedded MCU, or mapped to a different memory tier on an accelerator. The escape classification is target-invariant; the allocation response to that classification is target-specific.
This separation is consistent with the representation selection model of Section 2.6, where the dimensional annotation constrains the value semantics and the target determines the concrete representation.

5.4 Koka Effects and Coeffects

Our review of Koka [16] showed that its effect tracking in the type system allows the compiler to specialize effect handling (e.g., eliminating heap allocation for effects that can be handled on the stack). The coeffect model in Clef extends this to memory placement: allocation strategy is a coeffect that flows through the semantic graph and is resolved at each call site. The integration with dimensional types is novel: a value's physical dimension and its memory placement are jointly tracked in the same graph, enabling diagnostics that relate dimensional correctness to memory behavior.

5.5 Posit Arithmetic and Domain-Aware Representation

Gustafson's posit arithmetic [7] addresses the numeric representation problem from the hardware and arithmetic side: tapered precision allocates more mantissa bits to value ranges near 1.0, where most computations concentrate, and fewer bits to extreme ranges. The Posit Standard (2022) [19] unified the exponent size (es = 2) across all precisions and formalized the quire accumulator at $16n$ bits for $n$-bit posits, providing exact accumulation for dot products and fused multiply-add sequences. Gustafson's comprehensive treatment [6] extends this foundation with parameterizable formats, including bounded posits (b-posits) where the regime field is constrained to a maximum size rs, and asymmetric configurations where the precision profile can differ for magnitudes above and below 1.
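To make the tapered-precision claim concrete, the following Python sketch decodes standard posit bit patterns with es = 2 and reports how many fraction bits each pattern carries. It is an illustrative reference decoder written for this discussion, not code from the Fidelity toolchain; the function name `decode_posit` and its return convention are assumptions.

```python
def decode_posit(bits: int, n: int = 8, es: int = 2):
    """Decode an n-bit posit pattern; returns (value, fraction_bit_count)."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0, 0
    if bits == 1 << (n - 1):
        return float("nan"), 0                 # NaR (not a real)
    negative = (bits >> (n - 1)) == 1
    if negative:
        bits = (-bits) & mask                  # two's complement negation
    body_len = n - 1
    body = bits & ((1 << body_len) - 1)
    # Regime: run of identical bits starting at the top of the body.
    first = (body >> (body_len - 1)) & 1
    run = 0
    while run < body_len and ((body >> (body_len - 1 - run)) & 1) == first:
        run += 1
    k = run - 1 if first == 1 else -run
    rem_len = max(0, body_len - run - 1)       # bits left after regime and its terminator
    rem = body & ((1 << rem_len) - 1)
    e_len = min(es, rem_len)
    exponent = (rem >> (rem_len - e_len)) << (es - e_len)  # truncated exponent bits read as 0
    frac_len = rem_len - e_len
    frac = rem & ((1 << frac_len) - 1)
    value = (1 + frac / (1 << frac_len)) * 2.0 ** (k * (1 << es) + exponent)
    return (-value if negative else value), frac_len


# Near 1.0 a posit8 keeps 3 fraction bits; at the extremes it keeps none.
for pattern in (0x01, 0x40, 0x44, 0x7F):
    value, frac_bits = decode_posit(pattern)
    print(f"0x{pattern:02X}: value={value!r}, fraction bits={frac_bits}")
```

For posit32 the same decoder reports 27 fraction bits at 1.0, which is the sense in which precision concentrates where the regime field is shortest.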
Jonnalagadda, Thotli, and Gustafson [9] provide the first hardware efficiency analysis of bounded posits, demonstrating that the bounded regime constraint eliminates the variable-length field decoding overhead that has historically been the primary objection to posit hardware. The b-posit decoder matches IEEE float hardware in area and latency while preserving posit's superior accuracy properties. This result is directly relevant to DTS: the representation selection function of Section 2.6 can now include b-posit configurations in its candidate set with confidence that the hardware cost is competitive with IEEE 754.

Posit arithmetic implicitly assumes that the compiler or engineer knows which value ranges matter for a given computation. DTS makes this knowledge explicit and formal: the dimensional annotation constrains the value range, and the representation selection function (Section 2.6) could use this constraint to choose among a variety of representations, including IEEE 754, posit, b-posit, or fixed-point formats. The two systems are complementary: posit provides the representation with domain-matched precision distribution; DTS provides the formal mechanism for determining which domain applies.

The quire accumulator illustrates this complementarity at the DMM level. The quire is a memory resource whose allocation, lifetime, and target availability are coeffect properties (Section 3.5). Without the coeffect framework, quire management is ad hoc; with it, the compiler can verify that quire lifetime is correct, that the target supports exact accumulation, and that the allocation strategy matches the accumulation pattern. The deterministic quire size ($16n$ bits for a given posit precision) makes this analysis straightforward.

5.6 MLIR and Multi-Level Compilation

MLIR [15] provides the infrastructure for multi-stage compilation with extensible dialects and attributes.
The DTS preservation model relies on MLIR's attribute system to carry dimensional metadata through dialect lowering. The contribution is not to MLIR itself but to the demonstration that dimensional type metadata can be preserved through the full MLIR compilation pipeline without loss, using standard MLIR extension mechanisms.

5.7 Rank Polymorphism and Shape-Indexed Types

Slepak, Shivers, and Manolios develop Remora [25], a rank-polymorphic array language whose type system tracks array shape as a sequence of natural-number indices. The system uses restricted dependent types to verify that rank-polymorphic lifting produces shape-consistent results, with decidable type checking and a proof of type soundness. Slepak et al. formalize rank-polymorphic type inference as constraint satisfaction over string equations [24]; DTS inference operates over integer linear constraints in abelian groups (Section 2.2). The dimensional indices in Remora are elements of the free monoid over $\mathbb{N}$ (array shapes); the dimensional indices in DTS are elements of $\mathbb{Z}^k$ (physical quantities). Both systems demonstrate that encoding dimensional information at the type level enables verification that conventional type systems cannot express. The architectural difference is that Remora's shape indices require dependent types with existential quantification for dynamic shapes, while DTS dimensional indices are fully inferrable within extended Hindley–Milner unification. This distinction reflects the underlying algebraic complexity: shape concatenation in the free monoid admits less structure for inference than integer linear constraints in an abelian group.

6 Future Work

6.1 Formal Decidability Proof

The decidability claim for DTS inference rests on the reduction to linear algebra over $\mathbb{Z}$.
A formal proof of decidability, including the interaction between physical dimensions and memory dimensions (which use different algebraic structures within the same constraint system), would strengthen the theoretical foundation.

6.2 Unified Shape and Quantity Indices

The orthogonality of array-shape indices (as in rank-polymorphic systems such as Remora [25]) and physical-quantity dimensions (as in DTS) suggests that both can coexist as independent axes in a unified type-level index structure. A matrix of forces has both a shape (e.g., $3 \times 4$, from the domain of rank polymorphism) and a physical dimension (newtons, from the domain of DTS). Neither system alone captures both. A system combining shape-indexed rank polymorphism with physical dimensional inference would verify both geometric compatibility and quantity consistency, properties that are currently checked by separate systems or not checked at all. The algebraic structures involved, the free monoid over $\mathbb{N}$ for shapes and $\mathbb{Z}^k$ for quantities, are independent and compose as a direct product. Whether inference over this product structure preserves the decidability and principal-type properties of either component is an open question.

6.3 Quantified Design-Time Feedback

The cache locality estimates and performance projections described in Section 4.4 are currently heuristic. Integration with hardware performance models (cache hierarchy simulators, memory bandwidth models, PCIe latency tables) would produce quantified estimates with confidence intervals, further grounding the restructuring guidance in measurable costs.

6.4 Incremental Adoption Through Porting

A practical adoption path for Clef would be the incremental porting of existing codebases.
Code arriving from Rust carries lifetime annotations but no dimensional discipline; the porting process would preserve the lifetime structure while the PSG infers dimensional constraints over the existing control flow. Code from TypeScript or Go carries neither dimensional annotations nor explicit lifetime management; porting from these languages would be a deeper refinement, where the design-time tooling would surface both dimensional and lifetime information that the PSG infers from an initial unadorned translation. Python and C would represent a similar starting point, with the additional challenge of weak or absent static typing at the source.

In each case, the porting process would be a multi-pass refinement: an initial translation would produce valid Clef source with minimal annotations, and the design-time tooling would guide the engineer toward progressively stronger constraints. Each pass through the feedback loop would add annotations that the compiler can verify, tightening the program’s static guarantees incrementally. The goal would be a “pit of success” model where the tooling makes the well-typed, lifetime-correct version of the code easier to reach than the under-specified version. For engineers accustomed to garbage-collected or dynamically typed environments, this graduated path could reduce the friction of adopting a statically typed, low-level compilation target. The design of this refinement workflow, including how the language server would prioritize suggestions and how partial annotation would interact with inference, warrants dedicated study.

6.5 Posit Hardware Co-Design and Dimensional Range Analysis

The representation selection function in Section 2.6 is currently a compile-time decision.
For reconfigurable targets (FPGAs), the compiler could go further: given the dimensional ranges of all values in a computation, the compiler could determine whether a non-standard b-posit configuration [9] (e.g., 20-bit with $es = 2$ and $rs = 5$, or an asymmetric configuration with different precision profiles for magnitudes above and below 1 [6]) would provide better precision-per-bit than any standard configuration. The bounded regime field makes this search tractable: $rs$ values between 2 and 6 combined with $es$ values between 1 and 5 produce a small, enumerable parameter space. This would require extending the CIRCT compilation path to parameterize the posit arithmetic pipeline based on dimensional analysis results, a form of type-directed hardware synthesis.

6.6 Dataflow Architectures and Control-Flow/Data-Flow Partitioning

The DTS+DMM model as presented in this paper assumes a control-flow execution model, but the PSG’s structure may also be relevant to the growing class of dataflow and spatial architectures. Coarse-Grained Reconfigurable Arrays (CGRAs), spatial dataflow accelerators, and other non-Von Neumann compute fabrics are proliferating as alternatives to GPU-centric approaches for HPC and AI inference workloads. These architectures execute computation graphs spatially across arrays of processing elements with explicit data movement between them. The PSG’s coeffect annotations, which already describe data dependencies, escape behavior, and memory placement, carry information that could inform the partitioning of a computation graph across spatial hardware.

A longer-term question is whether the DTS+DMM framework could eventually support inference about which sections of a codebase would benefit from control-flow execution and which would be better suited to dataflow mapping.
The PSG’s saturation phase computes dependency structure, memory access patterns, and dimensional constraints for every subgraph; this information could, in principle, inform a partitioning heuristic that routes compute-bound, regular subgraphs toward spatial targets and irregular, branch-heavy subgraphs toward Von Neumann cores. This is a substantial open problem that the current paper does not address, but the PSG’s structure appears to provide a natural starting point for investigating it.

The PSG’s binary edge structure is sufficient for the claims presented here, but certain compilation decisions for spatial dataflow targets would expose its limits. As a concrete example, AMD’s XDNA 2 NPU arranges AI Engine tiles in a two-dimensional grid with explicit, programmer-managed data movement via DMA and configurable interconnect [21]. Mapping operations to this architecture requires co-locating sets of operations on tiles, configuring sets of data routes between tiles, and partitioning sets of columns into spatial workload contexts. These are constraints over sets of nodes, and their natural formalism is the hyperedge. A heterogeneous workstation combining a Von Neumann host, a spatial dataflow accelerator, and a reconfigurable fabric would present multiple targeting strategies with distinct transfer boundaries and memory hierarchies. The coeffect interactions at these boundaries, where dimensional constraints, escape analysis, capability requirements, and transfer fidelity converge on a single partitioning decision, are already implicitly multi-way in the current PSG; a Program Hypergraph (PHG) generalization would make them first-class.
We defer this generalization to a subsequent paper, noting that hypergraph partitioning for spatial mapping is an established problem in VLSI placement [12] and that MLIR’s AIE dialect [1] provides infrastructure for spatial dataflow targeting within the existing Fidelity compilation pipeline.

6.7 Delimited Continuations and Interaction Nets

A separate line of investigation concerns the PSG’s potential role as a transparent compute graph that mediates between control-flow and data-flow execution models at a finer granularity than target-level partitioning. Clef adopts computation expressions from the F# tradition, and under analysis these decompose into two fundamental patterns: delimited continuations (DCont) for sequential, effectful computations, and interaction nets (Inet) for pure, parallelizable computations. If the PSG’s coeffect annotations could classify subgraphs along this axis, the compiler would have a basis for routing effectful regions toward stack-based continuation implementations and pure regions toward parallel execution, whether on SIMD units, GPU warps, or spatial dataflow tiles.

Both sides of this duality are now represented in the MLIR ecosystem. Kang et al. [11] at Carnegie Mellon University introduce a DCont dialect for MLIR that models delimited continuations as first-class operations, targeting WebAssembly’s emerging stack switching primitives. Coll [5] at the University of Buenos Aires introduces an Inet dialect that implements the three Symmetric Interaction Combinators (Erase, Construct, Duplicate) from Lafont’s interaction net formalism [14] as MLIR operations with declarative rewrite rules. Together, these two dialects demonstrate that both continuation-based sequential control flow and interaction-net-based parallel graph reduction can be represented and lowered within the same MLIR infrastructure that the Fidelity compilation pipeline uses for code generation.
The implications for DTS+DMM are speculative but worth noting. A PSG that carries both dimensional/coeffect annotations and DCont/Inet classification would be a compilation artifact that simultaneously describes what a computation means (dimensions, types), how it manages resources (escape analysis, allocation), and whether its execution is inherently sequential or parallelizable. This would extend the design-time feedback model: the language server could surface not only escape diagnostics and allocation strategies but also the continuation structure of effectful code and the parallelism opportunities in pure regions. We consider this a promising direction for future work.

6.8 Formal Verification Integration

Verification is a central commitment of the Fidelity framework, driven by the goal of producing systems suitable for high-reliability domains: real-time control, embedded systems, safety-critical infrastructure. The PSG’s dimensional and coeffect annotations provide the foundation for a dual-phase verification model that the framework implements across the design-time and compile-time boundaries.

The first phase operates at design time. Because the DTS constraints reduce to quantifier-free linear integer arithmetic (QF_LIA), the dimensional proof obligations that the PSG generates are decidable and solvable in bounded time by SMT solvers such as Z3. The language server derives these obligations automatically from PSG structure during elaboration, verifying dimensional consistency and memory safety properties without requiring developer annotations. The bounded decidability of QF_LIA is essential: it means the verification feedback meets real-time response requirements for interactive design-time tooling, providing continuous proof status as the engineer works.

The second phase operates at compile time, using MLIR’s SMT dialect to embed verification conditions directly in the intermediate representation.
As mlir-opt transformations execute, the embedded SMT assertions validate that each lowering pass preserves the semantic properties established during design-time verification. This creates translation validation across the full compilation pipeline: the proofs generated at the PSG level are carried through each transformation and re-verified after lowering, providing end-to-end assurance that the properties the engineer observes at design time are preserved in the emitted code. The two phases reinforce each other: design-time verification establishes the properties, and compile-time verification confirms their preservation.

To our knowledge, this combination of SMT-backed design-time proof generation with MLIR-level translation validation has not been assembled in existing compilation frameworks. The bounded decidability of the underlying constraint theories (QF_LIA for dimensional algebra, coeffect lattices for memory safety) is what makes the dual-phase model tractable, and is the basis for our confidence in this architectural direction.

6.9 Information Accrual and Deferred Optimization

The PSG’s persistence as a design-time resource raises a question about when optimization decisions should be made. Let $I_k$ represent the information available at compilation stage $k$. The stages common to all targets (source parsing, PSG elaboration, MLIR emission, MLIR optimization) form a shared prefix; the backend-specific stages diverge at the fan-out point:

\[ I_{\text{source}} \subset I_{\text{PSG}} \subset I_{\text{MLIR}} \subset I_{\text{MLIR-opt}} \subset I_{\text{backend}} \subset I_{\text{native}} \tag{7} \]

At the source level, the compiler knows types and dimensions. At the PSG level, it additionally knows coeffects, escape classifications, and saturated annotations. At the MLIR level, it knows the full program structure in SSA form. At the MLIR optimization level, it knows call frequencies and loop nesting.
Beyond this point, the information set is backend-specific: the LLVM path adds target-specific parameters for CPU, GPU, MCU, or WebAssembly (cache line sizes, pipeline depths, SIMD widths, memory constraints); the CIRCT path adds FPGA resource budgets, timing constraints, and routing topology; other backends contribute their own target-specific context.

The quality of optimization decisions $Q$ improves with available information:

\[ Q(I_k) < Q(I_{k+1}) \quad \text{for all } k \tag{8} \]

This formalizes an architectural principle: decisions that can be deferred to later compilation stages should be, because later stages have strictly more information. DTS annotations exemplify this. Dimensional information preserved through early stages enables representation selection at the MLIR level, where the target architecture is known. Had the dimensions been discarded at the source level (the early-erasure model of F#’s Units of Measure), the representation selection decision would be impossible at the point where it can be made optimally.

The principle extends to memory management. Escape classification (Section 3.2.1) is computed during PSG elaboration because it requires type and scope information. Allocation strategy is resolved during MLIR emission because it requires target memory topology. Cache alignment, register allocation, and hardware resource mapping are determined during backend-specific lowering because they require target-specific parameters (microarchitectural details for CPU targets via LLVM, resource budgets and timing for FPGA targets via CIRCT). Each decision is made at the stage where its inputs are first available, which is the stage where the decision can be made with maximum context.

6.10 Implications for Numerically Disciplined Machine Learning

The formal properties of DTS have implications for machine learning that the present paper identifies but does not fully develop.
We note four specific connections that warrant independent investigation.

Dimensional algebra under differentiation. The dimensional algebra is closed under differentiation. If $f : \mathbb{R}\langle d_1 \rangle \to \mathbb{R}\langle d_2 \rangle$, where $\langle d \rangle$ denotes the dimensional annotation, then:

\[ \frac{\partial f}{\partial x} : \mathbb{R}\langle d_2 \cdot d_1^{-1} \rangle \tag{9} \]

The gradient of a loss function with dimension $\langle \text{loss} \rangle$ with respect to a parameter with dimension $\langle d \rangle$ carries dimension $\langle \text{loss} \cdot d^{-1} \rangle$. This property follows from the abelian group structure: differentiation is division in the dimensional algebra, and division is closed in $\mathbb{Z}^n$. The inference algorithm of Section 2.2 extends to auto-differentiation graphs without modification: each gradient node inherits a dimension from the chain rule, and dimensional consistency of the full gradient computation is verified by the same Gaussian elimination that verifies the forward pass.

The practical consequence: in a physics-informed model where the loss function includes terms with physical units (force residuals in newtons, energy conservation violations in joules), DTS can verify that gradient accumulation respects dimensional consistency. A gradient with dimension $\langle \text{newtons}/\text{meters} \rangle$ cannot be accumulated with a gradient of dimension $\langle \text{joules}/\text{seconds} \rangle$ without a dimensional error. This verification is decidable, requires no annotation beyond the physical dimensions already present in the forward computation, and has zero runtime cost.

Forward-mode differentiation as a coeffect property. Baydin, Pearlmutter, Syme, Wood, and Torr [3] demonstrated that the forward gradient, an unbiased estimate of the gradient computed via forward-mode automatic differentiation, can replace backpropagation entirely. The forward gradient is evaluated in a single forward pass, eliminating the backward pass and the activation tape it requires. This has a specific coeffect signature within the DMM framework.
Reverse-mode AD (backpropagation) requires storing intermediate activations for the backward pass, imposing an $O(L)$ auxiliary memory requirement where $L$ is the number of layers. This is a coeffect: the backward pass requires the activation tape as a contextual resource. Table 8 summarizes the coeffect signatures of the two modes.

Table 8: Coeffect signatures of reverse-mode and forward-mode automatic differentiation.

AD Mode            Auxiliary Memory    Gradient                       Activation Tape
-----------------  ------------------  -----------------------------  -----------------------------
Reverse-mode       $O(L \cdot B)$      Exact (full Jacobian$^\top$)   Required; spans backward pass
Forward-mode [3]   $O(1)$ per layer    Unbiased estimate              Not required

The forward-mode coeffect signature (no activation tape, $O(1)$ auxiliary memory per layer) means the escape analysis of Section 3.2 is trivially satisfied: no intermediate values escape their layer’s scope, and the entire gradient computation is stack-eligible. The coeffect system can verify this property at compile time: given a computation graph annotated with AD mode, the lifetime analysis confirms that forward-mode imposes no lifetime obligations beyond the current layer’s scope.

The quire accumulator (Section 3.5) compounds this advantage. Forward-mode computes a directional derivative $\nabla_v f(\theta) = \langle \nabla f(\theta), v \rangle$ for a random perturbation vector $v$. The inner product is an accumulation of products, exactly the operation the quire makes exact. The coeffect system tracks the quire’s lifetime through the forward pass identically to how it tracks quire lifetime in any accumulation loop: allocation at loop entry, accumulation within the loop body, conversion at loop exit.
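The forward-mode pattern and the three quire lifetime phases described above can be sketched in a few lines. This is an illustrative sketch only, written in Python rather than Clef, and it is not the Fidelity implementation. The names `Dual`, `directional_derivative`, and `forward_gradient` are hypothetical, and Python's exact rational type `Fraction` is used as a stand-in for the quire: it gives exact accumulation of sums and products with a single rounding at exit, but it is not a fixed-size accumulator and says nothing about posit hardware.

```python
from fractions import Fraction


class Dual:
    """Dual number (primal, tangent) over exact rationals.

    Fraction is an assumed stand-in for a quire: sums of products
    accumulate without intermediate rounding.
    """

    def __init__(self, primal, tangent=0):
        self.primal = Fraction(primal)
        self.tangent = Fraction(tangent)

    @staticmethod
    def _lift(value):
        # Constants carry a zero tangent.
        return value if isinstance(value, Dual) else Dual(value)

    def __add__(self, other):
        o = Dual._lift(other)
        return Dual(self.primal + o.primal, self.tangent + o.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        o = Dual._lift(other)
        # Product rule: (a * b)' = a * b' + a' * b
        return Dual(self.primal * o.primal,
                    self.primal * o.tangent + self.tangent * o.primal)

    __rmul__ = __mul__


def directional_derivative(f, theta, v):
    """Single forward pass computing <grad f(theta), v>; no tape is stored."""
    seeded = [Dual(t, vi) for t, vi in zip(theta, v)]  # accumulator allocated at entry
    out = f(seeded)                                    # exact accumulation in the body
    return float(out.tangent)                          # one rounding at exit


def forward_gradient(f, theta, v):
    """Forward-gradient estimate <grad f(theta), v> * v for a perturbation v."""
    d = directional_derivative(f, theta, v)
    return [d * vi for vi in v]


# A linear model whose tangent is an inner product of badly scaled terms.
xs = [1e16, 1.0, -1e16]


def linear(theta):
    return sum(t * x for t, x in zip(theta, xs))


exact = directional_derivative(linear, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])

# The same inner product with ordinary float accumulation loses the middle term.
naive = 0.0
for x in xs:
    naive += 1.0 * x
```

In this sketch `exact` evaluates to 1.0 while `naive` evaluates to 0.0, because the float accumulator rounds away the unit term before the large terms cancel. Only the current `Dual` values are live at any point in the pass, which is the property the coeffect signature in Table 8 describes for forward mode.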
The convergence of these three properties (DTS verifying dimensional consistency of the gradient graph, forward-mode eliminating the activation tape coeffect, and the quire providing exact accumulation) produces a system where gradient computation is dimensionally verified, memory-minimal, and numerically exact. Each property is independently established; their composition within the PSG is the novel contribution.

Representation selection for neural network value distributions. Neural network activations and gradients have well-characterized value distributions, typically concentrated near zero with heavy tails. The representation selection function of Section 2.6 applies: given the dimensional range of activations in a specific layer (inferrable from training statistics or dimensional constraints on the input domain), the compiler can select posit widths that concentrate precision where the values cluster. The quire (Section 3.5) provides exact gradient accumulation, eliminating the rounding errors that compound across millions of parameters during training. This connection between DTS (which provides the dimensional range) and posit arithmetic (which provides domain-matched precision) is an instance of the representation selection framework applied to a specific computational domain.

The bounded posit (b-posit) format [9] extends this connection. ML workloads operate over a narrower dynamic range than general scientific computing, typically $[10^{-14}, 10^{1}]$, which permits smaller exponent and regime field sizes than the $es = 2$, $rs = 6$ configuration suited to HPC. Gustafson [6] describes asymmetric b-posit configurations where the precision profile differs for magnitudes below and above 1: a steeper taper on the left half of the posit ring (magnitudes $< 1$, where most activations reside) paired with a flatter, higher-accuracy profile on the right half.
An exponent bias shift from $2^{0}$ to $2^{-2}$ or $2^{-3}$ centers the high-precision region on the activation distribution’s mode. Research at the National University of Singapore has demonstrated that such configurations maintain classification accuracy down to 5-bit representations, with a sharp accuracy degradation threshold at 4 bits.

We see DTS as a formal mechanism that could make these configurations selectable at compile time. The dimensional range annotation on a neural network layer’s activations constrains the value distribution; the representation selection function evaluates candidate b-posit parameterizations ($es$, $rs$, exponent bias) against that distribution. The b-posit’s bounded regime field ensures that the hardware cost of the selected configuration is predictable, and the format’s cross-precision hardware reuse property (Section 2.6) means a single decode unit can serve 8-bit, 16-bit, and 32-bit b-posit operations in a mixed-precision training pipeline.

Physics-informed loss term verification. Physics-informed neural networks [20] encode physical laws as differentiable loss terms. A loss term that penalizes violations of Newton’s second law would compute $F - ma$ and minimize the squared residual. DTS can verify that $F$, $m$, and $a$ carry dimensions $\langle \text{newtons} \rangle$, $\langle \text{kg} \rangle$, and $\langle \text{m} \cdot \text{s}^{-2} \rangle$ respectively, and that the subtraction $F - ma$ is dimensionally consistent. This verification is a compile-time check on the loss function’s structure, not a runtime constraint on the trained model’s outputs. It ensures that the physics constraints imposed during training are dimensionally well-formed, a property that existing ML frameworks cannot verify because dimensional information is never encoded.

7 Conclusion

Dimensional Type Systems are not a restricted form of dependent types.
They are a distinct formal category with distinct algebraic structure (finitely generated abelian groups), distinct computational properties (decidable, fully inferrable, principal types), and distinct practical applications (preservation through multi-stage compilation, multi-target resolution, domain-aware representation selection, integration with memory management coeffects).

The integration of DTS with Deterministic Memory Management through a shared coeffect discipline in the Program Semantic Graph produces a unified framework for design-time semantic analysis. The compiler’s internal representation becomes the engineer’s design tool. Escape classification, allocation promotion, cache locality estimation, representation fidelity diagnostics, and cross-target transfer analysis are all views over the same graph that enforces dimensional consistency. The escape classification taxonomy (Section 3.2.1) demonstrates that escape analysis need not be binary: distinguishing closure capture from return escape from byref escape enables targeted allocation strategies and precise engineering diagnostics.

The convergence of DTS with posit arithmetic demonstrates that the framework’s implications extend beyond type theory. Gustafson’s posit representation [7, 6] presupposes that the compiler knows which value ranges matter; DTS provides the formal mechanism for that knowledge. The bounded posit format [9] resolves the hardware efficiency concern that has historically limited posit adoption, making posit configurations viable candidates in the representation selection function. The quire accumulator presupposes that memory management is deterministic and verifiable; DMM as a coeffect discipline provides that guarantee. Neither system was designed with the other in mind, yet they compose naturally within the PSG because both formalize properties of numeric computation that existing type systems leave implicit.
The information accrual principle (Section 6.9) formalizes why preservation matters: each compilation stage has strictly more information than its predecessor, and decisions made at later stages are strictly better informed. Dimensional annotations preserved through early stages enable representation selection, escape-aware allocation, and cross-target transfer analysis at the stages where those decisions can be made optimally. Early erasure forecloses these possibilities; dimensional persistence enables them.

The practical consequence is that the compiler’s internal analysis (escape classification, allocation strategy, representation fidelity, cache residency) is available as design-time feedback without a separate tooling layer. The PSG serves both roles because the information required for compilation and the information useful for software design are the same information.

This paper has presented three claims. First, that dimensional annotations persisting through compilation enable the compiler to jointly resolve representation selection and deterministic memory management, and that this coupling is the reason DTS and DMM belong in a single framework (Sections 1–4). Second, that the inference machinery derives composition-dependent properties, including dimensional range, escape classification, and representation compatibility, that emerge from constraint interaction across the program graph and cannot be replaced by per-value annotation regardless of provenance (Sections 2–3). Third, that the unified graph enables design-time analysis, including representation fidelity diagnostics and cross-target transfer analysis, that early-erasure systems cannot provide (Sections 4–6).
The posit quire case study (Section 3.5) and the forward-mode auto-differentiation analysis (Section 6.10) illustrate specific applications; the formal properties on which they depend are established in the referenced literature [7, 19, 6, 9, 3].

Acknowledgments

This paper owes a particular debt to John L. Gustafson, whose detailed correspondence on posit arithmetic, bounded posit parameterization, and domain-specific precision tuning shaped how the author thinks about representation selection. The treatment of asymmetric b-posit configurations and hardware reuse in Sections 2.6 and 6.10 reflects his influence directly.

Don Syme’s F# and its Units of Measure system are the type-theoretic substrate from which DTS draws its inference architecture. His feedback on this manuscript sharpened the framing of dimensional persistence and the relationship between annotation provenance and compilation-stage decisions.

Paul Snively provided early guidance on verification reference materials that opened a line of investigation the author would not have pursued otherwise; the formal verification aspects of the Fidelity framework research bear his mark.

Martin Coll’s work on the Inet dialect for MLIR and his ongoing engagement with the Fidelity project have been a consistent source of both technical insight and encouragement.

Software Availability

The Clef language, Composer compiler, and supporting libraries described in this paper are developed under the Fidelity Framework project. Source repositories are available at https://github.com/FidelityFramework. The language specification, design rationale, and compiler documentation are published at https://clef-lang.com. Central components of the framework are dual-licensed; terms are detailed in each repository.
All components referenced in this paper, including the DTS inference engine, escape analysis pipeline, and BAREWire interchange protocol, are under active development.

References

[1] AMD/Xilinx. MLIR-AIE: An MLIR-based toolchain for AMD AI engines, 2024. github.com/Xilinx/mlir-aie.

[2] Clark Barrett, Aaron Stump, and Cesare Tinelli. The SMT-LIB standard: Version 2.0. In Proceedings of the 8th International Workshop on Satisfiability Modulo Theories (SMT), 2010.

[3] Atilim Günes Baydin, Barak A. Pearlmutter, Don Syme, Frank Wood, and Philip Torr. Gradients without backpropagation. arXiv preprint arXiv:2202.08587, 2022.

[4] Edwin Brady. Idris, a general-purpose dependently typed programming language: Design and implementation. Journal of Functional Programming, 23(5), 2013.

[5] Martin Coll. Inet dialect: Declarative rewrite rules for interaction nets. MLIR Open Design Meeting, 2025. April 10, 2025. University of Buenos Aires.

[6] John L. Gustafson. Every Bit Counts: Posit Computing. Chapman & Hall/CRC Computational Science. CRC Press, 2024.

[7] John L. Gustafson and Isaac T. Yonemoto. Beating floating point at its own game: Posit arithmetic. Supercomputing Frontiers and Innovations, 4(2), 2017.

[8] Houston Haynes. The program hypergraph: Multi-way relational structure for geometric algebra, spatial compute, and physics-aware compilation, 2026. Companion paper. SpeakEZ Technologies.

[9] Aditya Anirudh Jonnalagadda, Rishi Thotli, and John L. Gustafson. Closing the gap between float and posit hardware efficiency. arXiv preprint arXiv:2603.01615, 2025.

[10] Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and Derek Dreyer. RustBelt: Securing the foundations of the Rust programming language. In Proceedings of the 45th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2018.

[11] Byeongjee Kang, Harsh Desai, Limin Jia, and Brandon Lucia. WAMI: Compilation to WebAssembly through MLIR without losing abstraction. arXiv preprint arXiv:2506.16048, 2025.

[12] George Karypis and Vipin Kumar. Multilevel k-way hypergraph partitioning. VLSI Design, 11(3):285–300, 2000.

[13] Andrew Kennedy. Types for units-of-measure: Theory and practice. In Central European Functional Programming School, volume 6299 of LNCS. Springer, 2009.

[14] Yves Lafont. Interaction nets. Proceedings of the 17th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages 95–108, 1990.

[15] Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko. MLIR: Scaling compiler infrastructure for domain specific computation. In Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2021.

[16] Daan Leijen. Koka: Programming with row polymorphic effect types. In Proceedings of the 5th Workshop on Mathematically Structured Functional Programming (MSFP), 2014.

[17] Ulf Norell. Towards a practical programming language based on dependent type theory. PhD thesis, Chalmers University of Technology, 2007.

[18] Tomas Petricek, Dominic Orchard, and Alan Mycroft. Coeffects: A calculus of context-dependent computation. In Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming (ICFP), 2014.

[19] Posit Working Group. Standard for posit arithmetic (2022), 2022. posithub.org.

[20] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.

[21] Alejandro Rico, Saurabh Pareek, Javier Cabezas, David Clarke, et al. AMD XDNA NPU in Ryzen AI processors. IEEE Micro, 44(6):73–83, 2024.

[22] Dipanwita Sarkar, Oscar Waddell, and R. Kent Dybvig. A nanopass infrastructure for compiler education. In Proceedings of the Ninth ACM SIGPLAN International Conference on Functional Programming (ICFP ’04), pages 201–212. ACM, 2004.

[23] Matthias Schabel and Steven Watanabe. Boost.units: Zero-overhead dimensional analysis and unit/quantity manipulation and conversion, 2008. Boost C++ Libraries.

[24] Justin Slepak, Panagiotis Manolios, and Olin Shivers. Rank polymorphism viewed as a constraint problem. In Proceedings of the 5th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming (ARRAY@PLDI), 2018.

[25] Justin Slepak, Olin Shivers, and Panagiotis Manolios. An array-oriented language with static rank polymorphism. In Proceedings of the 23rd European Symposium on Programming (ESOP), volume 8410 of LNCS, pages 27–46. Springer, 2014.

[26] Nikhil Swamy, Cătălin Hriţcu, Chantal Keller, Aseem Rastogi, Antoine Delignat-Lavaud, Simon Forest, Karthikeyan Bhargavan, Cédric Fournet, Pierre-Yves Strub, Markulf Kohlweiss, Jean-Karim Zinzindohoué, and Santiago Zanella-Béguelin. Dependent types and multi-monadic effects in F*. In Proceedings of the 43rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2016.

A DTS Inference Example

Consider the following unannotated Clef function:

    let computeForce mass1 mass2 distance =
        let g = 6.674e-11
        g * mass1 * mass2 / (distance * distance)

The DTS inference proceeds as follows:

1. g is assigned dimension variable 'd_g.
2. mass1 is assigned 'd_m1, mass2 is assigned 'd_m2.
3. distance is assigned 'd_dist.
4. g * mass1 generates constraint: d(result_1) = 'd_g + 'd_m1.
5. result_1 * mass2 generates constraint: d(result_2) = 'd_g + 'd_m1 + 'd_m2.
6. distance * distance generates constraint: d(denom) = 2 · 'd_dist.
7. result_2 / denom generates constraint: d(return) = 'd_g + 'd_m1 + 'd_m2 − 2 · 'd_dist.

At this point, the function is dimensionally polymorphic: it accepts any combination of dimensions that satisfies the algebraic constraints. If the function is called with mass1 : float<kg>, mass2 : float<kg>, distance : float<m>, unification resolves:

• 'd_m1 = kg, 'd_m2 = kg, 'd_dist = m
• 'd_g = $\text{m}^3 \cdot \text{kg}^{-1} \cdot \text{s}^{-2}$ (inferred from the known value of the gravitational constant, or from the return type if annotated as float<newtons>)
• Return dimension: $\text{m}^3 \cdot \text{kg}^{-1} \cdot \text{s}^{-2} + \text{kg} + \text{kg} - 2 \cdot \text{m} = \text{kg} \cdot \text{m} \cdot \text{s}^{-2} = \text{newtons}$ ✓

Here the sums are sums of exponent vectors, the additive notation used for the constraints above. The inference is complete without any dimensional annotations in the source code.

B Escape Analysis and Restructuring Example

Consider:

    let processReadings (sensors: Span<float>) =
        let readings = sensors |> Span.map (fun s -> s * calibrationFactor)
        let summary = summarize readings
        (readings, summary)

The coeffect analysis determines:

1. readings is created from a Span.map operation. Tentative lifetime: lexical scope of processReadings.
2. readings is used in summarize readings. Required lifetime: lexical scope of processReadings. No promotion needed for this usage.
3. readings appears in the return tuple (readings, summary). Required lifetime: caller’s scope. This exceeds the tentative lifetime.
4. Promotion: readings lifetime is promoted from stack (lexical scope) to arena (caller’s scope).

The language server surfaces the promotion and proposes three alternatives:

Alternative 1: Caller-provided buffer.

    let processReadings (sensors: Span<float>) (output: Span<float>) =
        sensors |> Span.mapInto output (fun s -> s * calibrationFactor)
        summarize output

Coeffect: no escape, stack-eligible. Allocation cost: zero (caller owns the buffer).

Alternative 2: Continuation style.

    let processReadings (sensors: Span<float>) (k: Span<float> → Summary → 'a) =
        let readings = sensors ▷ Span.map (fun s → s * calibrationFactor)
        k readings (summarize readings)

Coeffect: no escape, stack-eligible. Allocation cost: zero (continuation runs within frame).

Alternative 3: Explicit annotation.

    let processReadings [] (sensors: Span<float>) =
        let readings = sensors ▷ Span.map (fun s → s * calibrationFactor)
        let summary = summarize readings
        (readings, summary)

Coeffect: declared arena allocation. Allocation cost: arena allocation (amortized). PSG annotation: confirmed intent, stable under dependency changes.

C Representation Selection with Posit Arithmetic

Consider a gravitational force computation compiled for two targets: x86_64 (CPU) and a Xilinx FPGA with a posit arithmetic pipeline.

    let computeForce (m1: float) (m2: float) (r: float) : float =
        let g = 6.674e-11
        g * m1 * m2 / (r * r)

The DTS inference resolves the return dimension as newtons ($\mathrm{kg} \cdot \mathrm{m} \cdot \mathrm{s}^{-2}$). The compiler's representation selection proceeds per target:

x86_64 target. The platform binding specifies IEEE 754 float64 as the default numeric representation. The dimensional range of the gravitational constant ($6.674 \times 10^{-11}$) combined with plausible mass and distance ranges (planetary: $10^{22}$ to $10^{30}$ kg, $10^{6}$ to $10^{11}$ m) produces force values spanning roughly $10^{-2}$ to $10^{25}$ newtons. IEEE 754 float64 covers this range with uniform relative error of $\approx 1.11 \times 10^{-16}$, well within engineering precision. Selection: float64.

Xilinx FPGA target. The platform binding specifies posit32 ($es = 2$) as the preferred representation. The dynamic range of posit32 extends to approximately $10^{\pm 36}$. The dimensional range $[10^{-2}, 10^{25}]$ newtons falls well within this bound. Posit32 with $es = 2$ provides approximately $2^{-27}$ relative error near 1.0, degrading to $2^{-8}$ at the regime extremes.
For forces near $10^{0}$ newtons (the most common case in n-body simulation), posit32 provides better precision than float32 and comparable precision to float64.

The compiler selects posit32 for the FPGA target and emits the force computation into the posit arithmetic pipeline: regime extraction, fraction multiplication in DSP48 slices, accumulation in the quire. The quire persists for exactly the duration of the accumulation loop, a 512-bit value in the FPGA fabric.

The language server displays the cross-target resolution:

    computeForce: float → float → float → float
    +-- x86_64: float64 → float64 → float64 → float64
    |     Precision: 1.11e-16 relative error (uniform)
    |     Quire: not used (no accumulation loop detected)
    +-- xilinx: posit32 → posit32 → posit32 → posit32
    |     Precision: ~1.5e-9 in [0.01, 100], ~3.9e-3 at regime extremes
    |     Quire: available, 512-bit fabric pipeline
    |     Dynamic range: [1e-36, 1e36] covers [1e-2, 1e25]
    +-- Transfer (xilinx → x86_64): posit32 → float64
          Protocol: BAREWire over PCIe
          Fidelity: 1.0 (lossless; float64 range exceeds posit32 range)

The cross-target transfer fidelity of 1.0 (lossless) is a consequence of the dimensional analysis: every posit32 value within its representable range is exactly representable in float64, which covers $10^{\pm 308}$. The compiler proves this at compile time from the representation specifications. A transfer in the opposite direction (float64 → posit32) would show fidelity $< 1.0$ with a precision loss estimate derived from the dimensional range.

This example illustrates the full DTS+DMM pipeline for posit arithmetic: dimensional inference determines the value range, representation selection chooses the numeric format per target, the quire's allocation and lifetime are resolved as coeffects, and the language server presents the complete picture as an interactive design-time diagnostic.
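
The range and fidelity checks in this appendix can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the compiler's implementation: the names `Format`, `posit`, `covers`, and `lossless` are hypothetical, each format is summarized only by its power-of-two exponent range and maximum fraction width, and special values (posit NaR, IEEE NaN, infinities, subnormals) are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Format:
    """Hypothetical summary of a binary numeric format (illustration only)."""
    name: str
    min_exp: int    # smallest power-of-two scale of a normal nonzero value
    max_exp: int    # largest power-of-two scale
    frac_bits: int  # maximum number of explicit fraction bits


def posit(nbits: int, es: int) -> Format:
    """Describe a posit<nbits, es> format from its two parameters."""
    # maxpos = useed^(nbits - 2) with useed = 2^(2^es); minpos = 1 / maxpos.
    max_exp = (nbits - 2) * (2 ** es)
    # Sign bit, shortest regime (2 bits), and es exponent bits; the rest is fraction.
    frac_bits = nbits - 3 - es
    return Format(f"posit{nbits}", -max_exp, max_exp, frac_bits)


# IEEE 754 binary64, normal range only (a conservative simplification).
FLOAT64 = Format("float64", -1022, 1023, 52)
POSIT32 = posit(32, 2)


def covers(fmt: Format, lo: float, hi: float) -> bool:
    """True if the magnitude interval [lo, hi] lies inside the format's range."""
    return 2.0 ** fmt.min_exp <= lo and hi <= 2.0 ** fmt.max_exp


def lossless(src: Format, dst: Format) -> bool:
    """True if every finite value of src is exactly representable in dst.

    Sufficient condition: dst spans src's exponent range and has at least
    as many fraction bits as src ever uses.
    """
    return (dst.min_exp <= src.min_exp
            and src.max_exp <= dst.max_exp
            and src.frac_bits <= dst.frac_bits)


if __name__ == "__main__":
    force_range = (1e-2, 1e25)  # dimensional range in newtons, from the text
    for fmt in (FLOAT64, POSIT32):
        print(fmt.name, "covers force range:", covers(fmt, *force_range))
    print("posit32 -> float64 lossless:", lossless(POSIT32, FLOAT64))
    print("float64 -> posit32 lossless:", lossless(FLOAT64, POSIT32))
```

Under these assumptions the posit32 description has a maximum scale of $2^{120}$ (about $1.3 \times 10^{36}$) and at most 27 fraction bits, so the containment test reduces to three integer comparisons. That is consistent with the claim that the lossless direction can be established from the representation specifications alone, without inspecting any runtime values.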
