Dimensional Type Systems and Deterministic Memory Management: Design-Time Semantic Preservation in Native Compilation

Houston Haynes
SpeakEZ Technologies, Asheville, NC
hhaynes2@alumni.unca.edu
March 2026

Abstract

We present a compilation framework in which dimensional type annotations persist through multi-stage MLIR lowering, enabling the compiler to jointly resolve numeric representation selection and deterministic memory management as coeffect properties of a single program semantic graph (PSG). The coupling between these two concerns is the central contribution: dimensional inference determines value ranges; value ranges determine representation selection; representation selection determines word width and memory footprint; and memory footprint, combined with escape classification, determines allocation strategy, cache behavior, and cross-target transfer fidelity. Each step in this chain consumes the output of the preceding inference.

The Dimensional Type System (DTS) extends Hindley–Milner unification with constraints drawn from finitely generated abelian groups, yielding dimensional inference that is decidable in polynomial time, complete (no annotations required), and principal. Where conventional systems erase dimensional annotations before code generation, DTS carries them as compilation metadata through each lowering stage, making them available at the point where representation selection and memory placement decisions occur. The dimensional range of computed values guides per-target format choice: posit arithmetic with tapered precision on FPGA targets, IEEE 754 on general-purpose CPUs, or fixed-point on neuromorphic cores.

Deterministic Memory Management (DMM), formalized as a coeffect discipline within the same graph, unifies escape analysis and memory placement with the dimensional framework. The escape analysis classifies value lifetimes into four categories (stack-scoped, closure-captured, return-escaping, byref-escaping), each mapping to a specific allocation strategy verified at compile time. For posit targets, the quire accumulator's allocation, lifetime, and exact accumulation semantics are resolved as coeffect properties within the PSG. We identify implications for auto-differentiation: the dimensional algebra is closed under the chain rule, and forward-mode gradient computation [3] exhibits a specific coeffect signature (no activation tape, $O(1)$ auxiliary memory per layer) that the framework can verify. The practical consequence is a development environment where escape diagnostics, allocation strategy, representation fidelity, and cache locality estimation are design-time views over the compilation graph.

1 Introduction

1.1 Dimensional Annotation Lifetime

Contemporary type systems for numeric computation differ in how long dimensional information remains available during compilation. Systems with dimensional annotations (F#'s Units of Measure [13], Boost.Units in C++ [23]) discard those annotations before code generation. The dimensional information serves as a compile-time check and then vanishes; the emitted code is dimensionally unaware. We refer to this as early erasure: the annotations are consumed during type checking and do not survive to the compilation stages where representation selection and memory placement decisions occur.

Systems with rich dependent types (F* [26], Idris [4], Agda [17]) preserve type-level information into generated code, but at the cost of decidability: type checking in the general dependent case is undecidable, and practical systems rely on SMT solvers with timeout heuristics.

Neither approach satisfies the requirements of systems that interface with physical reality across heterogeneous hardware targets.
A sensor fusion pipeline running on an x86 host, an FPGA accelerator, and a neuromorphic processor needs dimensional constraints that persist through compilation long enough to guide memory placement and inform cross-target data transfer protocols. Early erasure discards this information before it can be used. Full dependent types provide interactive development environments in practice (Lean 4, Agda, Idris 2), but their type checking is undecidable in general, and practical systems rely on timeout heuristics and fuel limits for solver-backed verification.

DTS takes a middle path: dimensional annotations persist as compilation metadata through multi-stage lowering, available at each stage where they inform decisions, and are dropped before native code emission. The annotations do not exist at runtime; there is no reified type information, no typeof, and no runtime dispatch on dimensions. The distinction from early erasure is annotation lifetime, not reification.

DTS's restriction to decidable algebraic theories (abelian groups over $\mathbb{Z}$, enum sorts, bitvector constraints) guarantees bounded-time inference for every query, a property that simplifies language server architecture and enables unconditional response-time guarantees for design-time tooling. The tradeoff is expressiveness: DTS cannot encode arbitrary predicates.

The decidability guarantee enables a specific category of design-time feedback: multi-target resolution, memory placement analysis, escape diagnostics, and representation fidelity scoring. Encoding these compilation-internal properties as types in a dependent system would impose an overhead that is architecturally unnecessary when the compiler already computes these properties during normal elaboration.

1.2 Contribution

This paper makes three claims:

1. Dimensional annotations that persist through compilation enable joint resolution of representation selection and memory management. This coupling is the central contribution and the reason DTS and DMM share a paper. Dimensional inference determines value ranges; value ranges determine representation selection; representation selection determines word width and memory footprint; memory footprint, combined with escape classification, determines allocation strategy, cache behavior, and cross-target transfer fidelity. These decisions compose within the Program Semantic Graph (PSG) as coeffect properties, and the chain cannot be decomposed without losing information that flows between stages. The algebraic foundation is a finitely generated abelian group over $\mathbb{Z}$, which places DTS in a specific formal niche: decidable in polynomial time, fully inferrable via extension of Hindley–Milner unification, and preservable as metadata through multi-stage compilation without altering the generated code's operational semantics. This niche is distinct from both dependent types and parametric polymorphism (Section 2.4).

2. The inference machinery derives composition-dependent properties that determine downstream compilation decisions. Dimensional annotations can enter the system through multiple paths: Hindley–Milner inference from unannotated source code (the default), explicit programmer annotation, domain library bindings (e.g., a physics library that pre-populates dimensional constraints), or external tooling including AI-assisted code generation. The compilation pipeline's behavior is identical regardless of provenance. The inference contribution is not annotation convenience; it is the derivation of properties that emerge from constraint interaction across the program graph. Dimensional range, escape classification, and representation compatibility are composition-dependent: they cannot be determined from any single value's annotation but arise from the interaction of constraints at function boundaries, loop nesting, and cross-module interfaces. These derived properties jointly determine representation selection, word width, allocation strategy, and cache behavior.

3. The unified DTS+DMM graph enables a novel category of software design-time tooling. Because the PSG retains dimensional and memory annotations through compilation, a language server can surface the compiler's internal analysis as interactive design guidance: escape analysis diagnostics, allocation promotion warnings, cache locality estimates, and restructuring suggestions. This transforms the compilation graph from a transient build artifact into a persistent design-time resource.

1.3 Scope and Context

The system described here is implemented in the Clef programming language and the Fidelity compilation framework. Clef is a functional language in the ML family whose primary syntactic and semantic lineage is F#, but several other systems were formative in its design. F* [26] demonstrated that representation width and type identity could be treated as independent concerns, a separation that directly informed Clef's approach to dimensional preservation: the type carries the physical semantics while the representation (posit width, float format, fixed-point configuration) is resolved independently per target. F*'s use of SMT-LIB2 [2] for automated proof discharge also established the feasibility of integrating solver-backed verification into an ML-family workflow, a pattern that informs the Fidelity framework's constraint architecture. OCaml's module system and its approach to abstract types influenced the design of Clef's compilation unit boundaries.
The Fidelity compiler's multi-pass architecture draws on the nanopass methodology [22], originally developed in Scheme, which demonstrated that decomposing compilation into many small, independently verifiable transformations produces compilers that are easier to extend and reason about.

The Fidelity framework compiles Clef source through a canonical MLIR middle-end (Composer) that fans out to multiple backend pathways: LLVM for CPU, GPU, MCU, and WebAssembly targets; CIRCT for FPGA synthesis via vendor toolchains (e.g., Vivado); and MLIR-AIE for AI Engine architectures. The dimensional and coeffect annotations described in this paper are carried through this fan-out as PSG codata, available to every lowering path. The design-time tooling is provided by Lattice (compiler services and language server protocol implementation) and Atelier (integrated development environment). Throughout this paper, we use Clef syntax for examples, but the formal properties of DTS and DMM are language-independent.

The binary PSG described in this paper is generalized in companion work [8] to a Program Hypergraph (PHG), where the same inference machinery extends to grade inference over Clifford algebras and co-location constraints for spatial dataflow targets. The PHG introduces $k$-ary hyperedges that capture irreducible multi-way relations, including geometric products, tile assignment constraints, and DMA route configurations, that the binary PSG cannot represent without introducing semantically empty intermediate nodes. Where the present paper demonstrates the DTS/DMM coupling for scalar and tensor workloads, the PHG paper extends the argument to geometric algebra neural networks and physics-aware computation, with direct implications for the continuous learning and spatial partitioning applications of forward-mode automatic differentiation.
2 Dimensional Type Systems: Formal Characterization

2.1 Algebraic Foundation

A dimensional type system assigns to each numeric value a dimension drawn from a finitely generated free abelian group. The base dimensions (length, time, mass, temperature, electric current, luminous intensity, amount of substance) generate the group under multiplication, with integer exponents.

Formally, let $D = \mathbb{Z}^n$ be the dimension space, where $n$ is the number of base dimensions. Each dimension $d \in D$ is a vector of integer exponents:

\[ d_{\text{velocity}} = (1, -1, 0, 0, \ldots) \quad (\text{length}^{1} \cdot \text{time}^{-1}) \tag{1} \]

\[ d_{\text{force}} = (1, -2, 1, 0, \ldots) \quad (\text{length}^{1} \cdot \text{time}^{-2} \cdot \text{mass}^{1}) \tag{2} \]

Dimensional consistency of an arithmetic expression reduces to linear algebra over $\mathbb{Z}$: addition requires operand dimensions to be equal; multiplication adds exponent vectors; division subtracts them; exponentiation scales them. These operations are closed in $\mathbb{Z}^n$ and decidable in $O(n)$ per operation.

This is the critical distinction from dependent types. A dependent type can encode an arbitrary predicate over values. Checking whether two dependent types are equal may require proving an arbitrary theorem. Dimensional consistency checking requires comparing two integer vectors, a constant-time operation per base dimension.

2.2 Inference via Extended Hindley–Milner Unification

F#'s Units of Measure system [13] demonstrated that dimensional constraints integrate naturally with Hindley–Milner type inference. The extension is direct: type variables carry an associated dimension variable; unification of type variables propagates to unification of dimension variables; dimension unification reduces to solving systems of linear equations over $\mathbb{Z}$.

The inference algorithm proceeds as follows:

1. Constraint generation. Each arithmetic operation generates a dimensional constraint. Addition of a + b generates $d(a) = d(b)$. Multiplication of a * b generates $d(\text{result}) = d(a) + d(b)$.

2. Unification. Dimensional constraints form a system of linear equations over $\mathbb{Z}^n$. The system is solved by Gaussian elimination over $\mathbb{Z}$, yielding either a unique solution, a parametric family of solutions (dimensional polymorphism), or no solution (dimensional inconsistency).

3. Generalization. Unsolved dimension variables in a function's type are generalized to dimension parameters, producing dimensionally polymorphic functions. A function let scale factor value = factor * value infers type float<'d1> -> float<'d2> -> float<'d1 * 'd2> without any annotation.

The inference is complete (every dimensionally consistent program can be typed without annotation), principal (the inferred type is the most general), and decidable (the constraint system is finite and the solution algorithm terminates). These properties are shared with standard Hindley–Milner inference and are not shared with dependent type inference in general.

Annotation provenance and composition-dependent properties. Dimensional annotations can enter the system through several paths: HM inference from unannotated source (the default described above), explicit programmer annotation, domain library bindings that pre-populate dimensional constraints for specific fields (e.g., a planned Fidelity.Physics library), or external tooling including AI-assisted code generation. The compilation pipeline treats all annotations identically regardless of provenance; the downstream representation selection and memory management decisions are the same whether a dimension was inferred or declared.

The inference machinery's contribution extends beyond annotation convenience.
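The exponent-vector algebra of Section 2.1 and the per-operation constraint rules of Section 2.2 can be made concrete with a small sketch. The Python below is illustrative only and is not part of the Clef or Fidelity implementation: the Dim class, the three-dimension basis, and the check_add function are hypothetical names introduced here. It checks dimensions on values that are already annotated; it does not implement the unification or generalization steps.

```python
from dataclasses import dataclass
from typing import Tuple


# Assumption: a truncated basis (length, time, mass) for brevity;
# the paper's D = Z^n ranges over all base dimensions.
@dataclass(frozen=True)
class Dim:
    """A dimension as a vector of integer exponents (hypothetical helper)."""
    exps: Tuple[int, ...]

    def __mul__(self, other: "Dim") -> "Dim":
        # Multiplication adds exponent vectors.
        return Dim(tuple(a + b for a, b in zip(self.exps, other.exps)))

    def __truediv__(self, other: "Dim") -> "Dim":
        # Division subtracts exponent vectors.
        return Dim(tuple(a - b for a, b in zip(self.exps, other.exps)))

    def __pow__(self, k: int) -> "Dim":
        # Integer exponentiation scales the exponent vector.
        return Dim(tuple(a * k for a in self.exps))


def check_add(a: Dim, b: Dim) -> Dim:
    """Addition requires equal dimensions: the constraint d(a) = d(b)."""
    if a != b:
        raise TypeError(f"dimension mismatch: {a.exps} vs {b.exps}")
    return a


LENGTH = Dim((1, 0, 0))
TIME = Dim((0, 1, 0))
MASS = Dim((0, 0, 1))

VELOCITY = LENGTH / TIME           # (1, -1, 0), as in Equation (1)
FORCE = MASS * LENGTH / TIME ** 2  # (1, -2, 1), as in Equation (2)
```

Each operation touches every exponent once, which is the $O(n)$ per-operation cost stated in Section 2.1. Adding a force to a velocity with check_add raises a TypeError, the analogue of a dimensional inconsistency.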
The chain from dimensional constraint through range analysis, representation selection, word width, and cache behavior produces composition-dependent properties: properties that emerge from constraint interaction across the program graph and cannot be determined from any individual value's annotation. A function that multiplies a mass by an acceleration inherits a force dimension, and the range of the result is bounded by the product of the input ranges, which in turn constrains the posit or IEEE 754 format that the compiler selects. These derived ranges, escape classifications, and representation compatibilities propagate through function boundaries, loop nesting, and cross-module interfaces. Companion work on the Program Hypergraph [8] demonstrates a concrete case: grade inference in Clifford algebra, using the same constraint machinery, identifies that approximately 95% of the Cayley table entries are structurally zero for typical grade combinations in 3D Projective Geometric Algebra, producing a 20× code generation improvement that no per-value annotation could provide.

2.3 Preservation Through Multi-Stage Compilation

The defining property of DTS, as distinct from F#'s Units of Measure, is that dimensional annotations persist through compilation. In F#, units are discarded during IL generation; a float becomes a float64 in the emitted Common Intermediate Language. This is early erasure: dimensions serve as compile-time checks and are then discarded before the compilation stages where they could inform representation selection or memory placement.

In DTS, dimensions are carried as attributes through the compilation pipeline:

Stage 1: Source → Typed AST. Dimensional inference produces a fully annotated AST where every numeric expression carries its resolved dimension.

Stage 2: Typed AST → PSG. The Program Semantic Graph preserves dimensional annotations as node attributes. The PSG is the central data structure for both compilation and design-time services; dimensional information in the PSG is accessible to the language server for design-time resolution display.

Stage 3: PSG → MLIR. The PSG carries dimensional annotations and coeffects as codata: node-level attributes computed during PSG elaboration and saturation, then observed during the code generation traversal. Alex (the compiler targeting layer) traverses the enriched PSG via a zipper and emits MLIR using portable dialects (memref, arith, func). The dimensional and coeffect information resides in the PSG as codata; it guides the MLIR emission but is not itself encoded into the MLIR representation.

Stage 4: MLIR → Target-specific lowering. The MLIR emitted in Stage 3 fans out to backend-specific lowering pipelines: the LLVM dialect for CPU, GPU, MCU, and WebAssembly targets; CIRCT dialects for FPGA synthesis; or MLIR-AIE dialects for AI Engine architectures. By this point, the dimensional and coeffect codata from the PSG has already guided representation selection: a float may have been lowered to float64 on x86 via the LLVM backend, posit<32,2> on an FPGA target via CIRCT, or fixed<24,signed> on a neuromorphic core.

Stage 5: Target dialect → Machine code. At the final lowering stage, dimensional attributes are no longer needed for code generation and are lowered to debug metadata (DWARF annotations on x86, equivalent metadata on other targets). The dimensions do not affect the operational semantics of the generated code; they are metadata that can be consumed by debuggers, profilers, and post-mortem analysis tools.

This preservation model has a specific property: dimensions never influence control flow or data layout in a way that could cause divergence between a dimensioned and undimensioned compilation of the same program.
The generated instructions are identical; only the metadata and target-specific numeric representation selections differ. This is weaker than full dependent type preservation (where type information can affect runtime behavior) but stronger than early erasure (where dimensional information is discarded before the compilation stages where it could inform representation and memory decisions).

2.4 DTS is Not Dependent Typing

The relationship between DTS and dependent type systems warrants careful delineation, as imprecise classification would position DTS as a restricted dependent type system. This mischaracterizes the algebraic structure.

Table 1: Comparison of DTS and dependent type systems.

Property | DTS | Dependent Types
Type checking | Decidable (linear algebra over $\mathbb{Z}$) | Undecidable in general
Inference | Complete and principal | Incomplete; requires annotations
Runtime representation | No runtime cost; metadata only | May require runtime witnesses
Expressiveness | Abelian group constraints on numeric types | Arbitrary predicates over values
Proof obligations | None (consistency is syntactic) | May require interactive proof
Compilation model | Attributes that guide code generation | Types that participate in code generation

A dependent type system can encode dimensional constraints (one can define Vector (n : Nat) in Idris and enforce length-indexed operations). But the encoding uses the full power of dependent types to express a constraint that DTS captures with a restricted algebraic structure. The restriction is not a limitation; it is the source of the decidability, completeness, and inference properties that make DTS practical for interactive design-time tooling.

The analogy is to regular expressions and context-free grammars. Regular expressions are not "restricted CFGs"; they are a distinct formal class with distinct closure properties, distinct recognition algorithms, and distinct practical applications. DTS occupies an analogous position relative to dependent types: a distinct formal class that happens to overlap in expressive power for a specific domain (dimensional constraints on numeric values) but differs in every computational property that matters for practical tooling.

2.5 Extension: Memory Dimensions

The DTS framework extends naturally beyond physical units. Any constraint domain that forms a finitely generated abelian group can be encoded as a dimension. Memory space identifiers (stack, arena, heap, specific hardware memory regions) form such a group under a trivial multiplication (the "product" of two memory spaces is the pair of constraints that both must be satisfied).

More precisely, memory dimensions do not form an abelian group under the same arithmetic as physical dimensions. They form an enumeration sort in the SMT sense: a finite set of values with equality but no arithmetic. The dimensional algebra handles this by assigning memory dimensions to a separate sort within the constraint system. Physical dimensions are solved by Gaussian elimination over $\mathbb{Z}$; memory dimensions are solved by equality unification over a finite domain. Both are decidable and both participate in the same inference pass.

This is the bridge to DMM. Memory placement is a dimensional constraint solved by the same machinery that solves physical unit constraints. The unification of these two constraint domains within a single inference framework is the formal basis for the design-time tooling described in Section 4.

2.6 Representation Selection as a Dimensional Function

The persistence of dimensional annotations through compilation creates a capability that early-erasure systems cannot provide: the compiler can select numeric representations based on the dimensional domain of the values being computed.

IEEE 754 distributes precision uniformly across its representable range.
A float64 allocates the same number of mantissa bits to values near 1.0 as to values near $10^{300}$. For computations whose values span a narrow dimensional range (gravitational forces between $10^{-11}$ and $10^{30}$ newtons, membrane potentials between −80 and +40 millivolts, sensor readings between 0 and 100 degrees Celsius), the majority of IEEE 754's precision budget is allocated to ranges that the computation will never visit.

Gustafson's posit arithmetic [7, 6] makes a different allocation. Posits use tapered precision: a variable-length regime field concentrates mantissa bits near 1.0 and reduces precision at extremes. The Posit Standard (2022) [19] standardized the exponent size ($es = 2$) across all bit widths, enabling trivial conversion between precisions by appending or rounding bits.

Recent work on bounded posits (b-posits) [9] constrains the regime field to a fixed maximum size ($rs \le 6$), which bounds the regime to between 2 and 6 bits. This constraint enables decoder implementation via simple multiplexers, achieving 79% less power, 71% smaller area, and 60% reduced latency compared to standard posit decoders, while matching or exceeding IEEE-compliant float32 hardware performance. A further consequence of the bounded regime is hardware reuse across precisions: with $rs = 6$, the maximum non-fraction field width is $1 + rs + es$ bits, which is identical for 16-bit, 32-bit, and 64-bit operands. IEEE 754 cannot share decode hardware across precisions because the exponent field width and bias change with format. The b-posit design eliminates this obstacle.

DTS provides the formal mechanism for what posit arithmetic presupposes: knowledge of which value ranges matter for a given computation. The dimensional annotation on a value constrains its semantic range. The compiler can evaluate how different representations distribute precision across that range and select the one that minimizes worst-case relative error.

Formally, given a value $v$ with dimension $d$ and a value range $[a, b]$ (inferred from dimensional constraints, domain annotations, or platform binding specifications), and a set of available representations $R = \{r_1, \ldots, r_k\}$ on target $T$, the compiler selects:

\[ r^{*} = \arg\min_{r \in R} \; \max_{x \in [a, b]} \frac{|x - \mathrm{round}_r(x)|}{|x|} \tag{3} \]

For IEEE 754, the worst-case relative error is approximately $2^{-p}$ (where $p$ is the mantissa width) uniformly across the representable range. For posits with $es = 2$, the worst-case relative error is minimal near 1.0 and increases toward the regime extremes. The dimensional range $[a, b]$ determines which distribution is preferable.

Representation selection is a deterministic function from dimensional constraints and target capabilities. The function is computable at compile time; its inputs are properties of the PSG (dimensional annotations and platform bindings), and its output is a code generation decision that the language server can surface at design time:

    force: float
      Dimensional range: [1e-11, 1e30] (from gravitational constant and stellar masses)
      +-- x86_64: float64 (worst-case relative error: 1.11e-16, uniform)
      +-- xilinx: posit<32, es=2> (worst-case relative error: 2.3e-8 at range extremes,
      |                            1.5e-9 near 1.0)
      +-- Note: posit provides 10x better precision in [0.01, 100] subrange
          where 94% of computed forces reside

The design environment shows which representation was selected and why: the dimensional range, the precision distribution of each candidate representation, and the overlap between the precision "sweet spot" and the actual value distribution.

This capability is bidirectional: the dimensional range can check a representation that the engineer chose explicitly, as well as drive an automatic selection.
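The selection rule in Equation (3) can be approximated numerically. The sketch below is a Python illustration under stated assumptions, not the Fidelity compiler's procedure. It estimates the worst-case relative error by sampling the range on a logarithmic grid rather than deriving it analytically. Because the Python standard library has no posit type, it substitutes two 16-bit stand-ins: IEEE 754 binary16 and an unsigned fixed-point format with 9 fractional bits. All function and candidate names are hypothetical.

```python
import struct
from typing import Callable, Dict

Rounder = Callable[[float], float]


def round_float16(x: float) -> float:
    """Round to IEEE 754 binary16 and back (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def make_fixed_point(frac_bits: int) -> Rounder:
    """Round to a fixed-point grid with the given number of fractional bits.

    Assumption: overflow of the integer part is not modeled.
    """
    scale = 1 << frac_bits
    return lambda x: round(x * scale) / scale


def worst_case_relative_error(round_fn: Rounder, a: float, b: float,
                              samples: int = 2001) -> float:
    """Sampled estimate of max |x - round(x)| / |x| over [a, b], with 0 < a <= b."""
    worst = 0.0
    for i in range(samples):
        x = a * (b / a) ** (i / (samples - 1))  # log-spaced sample point
        worst = max(worst, abs(x - round_fn(x)) / abs(x))
    return worst


def select_representation(candidates: Dict[str, Rounder],
                          a: float, b: float) -> str:
    """Equation (3): pick the candidate with the smallest worst-case error."""
    return min(candidates,
               key=lambda name: worst_case_relative_error(candidates[name], a, b))


# Hypothetical candidate set standing in for the per-target set R.
CANDIDATES: Dict[str, Rounder] = {
    "float16": round_float16,
    "fixed16_frac9": make_fixed_point(9),
}
```

With these stand-ins, a narrow range such as [50, 100] favors the fixed-point format, while a wide range such as [0.001, 100] favors the floating-point format. This range-dependence of the preferred format is what Equation (3) formalizes.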
If the engineer specifies a posit representation explicitly (because the computation benefits from tapered precision), the dimensional constraints can verify that the posit's dynamic range encompasses the expected value range. For posit32 with $es = 2$, the representable range is approximately $[10^{-36}, 10^{36}]$. If the dimensional range exceeds this, the compiler emits a diagnostic:

    Warning: posit<32, es=2> dynamic range [1e-36, 1e36] does not cover
      full dimensional range [1e-11, 1e72] of astronomicalDistance
      Consider: float64 (covers full range) or scaling to AU (fits posit range)

The suggestion to scale to astronomical units is itself a dimensional operation: the compiler knows that 1 AU ≈ $1.5 \times 10^{11}$ meters, and that re-dimensioning the computation in AU brings the value range closer to posit32's representable bounds. This guidance is possible only because the dimension survives to the point where representation selection occurs.

3 Deterministic Memory Management as Coeffect Discipline

3.1 Coeffects and Contextual Properties

Effects describe what a computation does to its environment (mutation, I/O, exceptions). Coeffects describe what a computation requires from its environment (capabilities, resources, contextual assumptions) [18]. Memory allocation strategy is a coeffect: a function that allocates from an arena requires that an arena exists in its calling context; a function that places values on the stack requires that the stack frame outlives those values.

In the Clef/Fidelity framework, coeffects are tracked in the PSG as annotations on computation nodes. The coeffect system handles three categories:

Allocation coeffects. Where does a value's storage come from? Stack frame, arena, reference-counted heap, static memory, hardware-specific region (FPGA BRAM, neuromorphic neuron state memory).

Lifetime coeffects. How long does a value persist? Lexical scope (stack), arena scope (freed when arena is released), ownership-based (freed when last reference drops), static (program lifetime).

Capability coeffects. What does the computation require from its context? Mutable access, target-specific hardware features, dimensional consistency of inputs.

3.2 Escape Analysis as Coeffect Propagation

Classical escape analysis determines whether a value outlives its creating scope. In most compilers, this is a binary classification (escapes or does not) used to decide between stack and heap allocation. The analysis runs during optimization, is opaque to the software engineer, and produces no design-time feedback. Ownership-based systems such as Rust [10] brought lifetime verification to the surface as a compile-time discipline, requiring the engineer to annotate lifetimes at function boundaries; the compiler then accepts or rejects the program based on those annotations. The coeffect model described here pursues the same goal of static lifetime verification, with a different annotation strategy and a different response to violations.

In the coeffect model, escape analysis is a propagation of lifetime constraints through the PSG. When a value is created, it receives a tentative lifetime coeffect (typically the lexical scope of its binding). When the value is used, the usage imposes a lifetime requirement (the value must live at least as long as the usage site's scope). If the usage's required lifetime exceeds the value's tentative lifetime, the value's lifetime is promoted.

The promotion is recorded in the PSG as a coeffect annotation, a visible, navigable property of the graph. The language server can report: "this value was created with stack-eligible lifetime but promoted to arena allocation because it escapes via the return path at line 42."

The formal rule:

\[ \text{If } \lambda_{\text{required}}(v, \text{use}_i) > \lambda_{\text{tentative}}(v) \text{ for any use } i, \text{ then } \lambda(v) := \max_i \bigl( \lambda_{\text{required}}(v, \text{use}_i) \bigr) \tag{4} \]

where $\lambda$ denotes the lifetime ordering: stack < arena < heap < static.

3.2.1 Escape Classification

The binary escapes/does-not-escape model discards information. A value that escapes via closure capture has different allocation requirements than one that escapes via return value or byref parameter. The coeffect system classifies escape behavior into a discriminated union that preserves this information:

\[ \mathrm{EscapeKind}(v) \in \{ \mathrm{StackScoped},\ \mathrm{ClosureCapture}(t),\ \mathrm{ReturnEscape},\ \mathrm{ByRefEscape} \} \tag{5} \]

where $t$ identifies the closure node that captures $v$. Each classification maps to a specific allocation strategy and lifetime bound:

Table 2: Escape classification and allocation strategy mapping.

Escape Classification | Allocation Strategy | Lifetime Bound | Diagnostic
StackScoped | Stack (memref.alloca) | Lexical scope | None (optimal)
ClosureCapture(t) | Arena (closure env.) | Lifetime of closure t | "Captured by closure at line n"
ReturnEscape | Arena (caller's scope) | Caller's scope | "Escapes via return path"
ByRefEscape | Arena (param. origin) | Origin scope of ref. | "Escapes via byref parameter"

The classification is computed during PSG elaboration, before the traversal that generates MLIR. This ordering is critical: the PSG's zipper-based traversal witnesses escape annotations that were resolved during elaboration; it does not compute them during emission. The traversal is purely navigational; all allocation decisions are properties of the graph, not decisions made during code generation. [Footnote 1]

The classification interacts with the lifetime ordering. A ClosureCapture(t) escape imposes the constraint $\lambda(v) \ge \lambda(t)$: the captured value must live at least as long as the closure that captures it.
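The promotion rule in Equation (4) and the closure constraint above amount to taking a maximum over an ordered set of lifetimes. A minimal Python sketch follows, assuming the four-level ordering given in the text. The Lifetime enum and the promote function are hypothetical names, and the sketch omits the PSG traversal and the diagnostic chain that the paper describes.

```python
from enum import IntEnum
from typing import Iterable


class Lifetime(IntEnum):
    """Lifetime ordering from the text: stack < arena < heap < static."""
    STACK = 0
    ARENA = 1
    HEAP = 2
    STATIC = 3


def promote(tentative: Lifetime, required: Iterable[Lifetime]) -> Lifetime:
    """Apply Equation (4): promote only if some use requires a longer lifetime."""
    required = list(required)
    if any(r > tentative for r in required):
        return max(required)
    return tentative


# A stack-eligible value with one local use and one capture by a closure
# whose own lifetime is ARENA: the constraint lambda(v) >= lambda(t) promotes it.
captured = promote(Lifetime.STACK, [Lifetime.STACK, Lifetime.ARENA])
```

Modeling the ordering as an IntEnum keeps the comparison and the maximum in one place, so a transitive promotion is a repeated application of promote with the closure's own promoted lifetime as a requirement.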
If the closure itself escapes (is returned, stored in a data structure, passed to another function), the constraint propagates transitively. The PSG records the full escape chain, enabling the language server to display the transitive reason for a promotion: “this value was promoted to arena because it is captured by a closure that is returned from the enclosing function.”

3.2.2 Compositional Allocation Resolution

The escape classification determines allocation strategy, but the resolution must compose across function boundaries without requiring source-level duplication. A function that operates on a Span should work identically whether the span is stack-allocated, arena-allocated, or backed by a hardware memory region.

The compositional principle: allocation strategy is resolved at the point of use by detecting the type's memory representation and composing the appropriate access operations. When the compiler encounters a mutable variable reference where a value is expected, it composes a load operation transparently:

$$\text{resolve}(v) = \begin{cases} v & \text{if } \tau(v) \text{ is a value type} \\ \text{load}(v) & \text{if } \tau(v) = \text{MemRef}(\tau') \end{cases} \tag{6}$$

This is the lvalue/rvalue distinction expressed as a type-driven transformation. The resolution is computed from the type, not from parameter threading, preserving the monadic composition of the compilation pipeline. Each compilation phase remains a pure transformation from annotated graph to annotated graph; no phase carries hidden state about which values have been loaded and which have not.

¹ This separation has a practical consequence for the inline keyword. When a function allocates on the stack and returns a pointer, the pointer becomes invalid when the function returns. Marking the function inline causes the compiler to expand the function body at the call site, lifting the allocation to the caller's frame. This is escape analysis by annotation: the inline keyword asserts that the function should not create a distinct stack frame, and the compiler verifies that the inlined allocation does not escape the caller. The coeffect system records this as a mandatory inline constraint, distinct from performance-motivated inlining, which the compiler defers to the MLIR optimization pipeline where full program context is available.

3.3 The Push, Bounded, and Poll Models of Coeffect Specification

Developers interact with the coeffect system through three models that form a spectrum analogous to type annotation in ML-family languages. The parallel is direct: type inference transformed programming from ceremony to expression by letting the compiler determine what it could from context. Lifetime inference follows the same principle.

Push model (explicit declaration). The engineer annotates a function with explicit coeffect constraints:

    let processReadings [] [] (sensors: Span<float>) : ProcessedData =
        // ...

The compiler propagates these constraints forward through the function body. Every value in the body inherits the target and memory constraints from the declaration. Inference resolves the remaining details (specific register allocation, BRAM placement on FPGA, cache line alignment) within the declared constraints. The PSG reaches saturation quickly because the engineer has provided sufficient boundary conditions for the inference to converge without ambiguity.

Bounded model (scoped inference). The engineer provides scope boundaries; the compiler infers within those bounds:

    let processReadings () = arena {
        let! readings = readSensors ()
        let summary = summarize readings
        return (readings, summary)
    }

The computation expression marks the lifetime boundary. The let! syntax signals allocation from the arena. The compiler handles parameter threading, reference passing, and cleanup.
The source specifies where inference should operate (within the arena scope); the compiler determines how values are allocated and when they are released. This is analogous to annotating function signatures while leaving local bindings inferred, a common pattern in ML-family languages.

Poll model (full inference). The engineer writes without coeffect annotations:

    let processReadings sensors =
        // ...

The compiler infers coeffects from usage context. If the function is called from three sites with different target configurations, the inference engine unifies across all call sites, propagating constraints backward to determine the function's coeffect requirements. The function eventually reaches the same saturated state, but the path is longer and the result may be context-dependent: the function may resolve differently depending on which call site is considered.

The three models correspond to a spectrum of inference scope:

Table 3: Push, Bounded, and Poll models of coeffect specification.

| Model | Type analogy | Developer provides | Compiler infers | PSG saturation |
|---|---|---|---|---|
| Push | let x: int = 5 | Full coeffect constraints | Internal details | Immediate |
| Bounded | let f (x: int) = ... | Scope boundaries | Allocation within scope | Fast |
| Poll | let x = 5 | Nothing | All coeffects from context | Context-dependent |

No model is incorrect. The push model produces PSG nodes that saturate faster, remain stable under dependency changes, and display unambiguous resolution in the design-time tooling. The bounded model offers a middle ground with modest annotation cost and fast convergence. The poll model imposes no annotation burden but produces nodes whose saturation depends on external context.

The design-time tooling exploits these differences to provide “pit of success” guidance.
When a function's coeffect resolution varies across call sites, the language server displays the variation and suggests either a bounded scope (computation expression) or an explicit annotation. The engineer is not compelled to annotate; the tooling shows the consequences of not annotating. It rewards more explicit models with cleaner, more stable resolution display, creating a natural gradient toward explicit coeffect specification for functions where it matters.

3.4 Escape-Driven Restructuring Guidance

The most concrete instance of design-time coeffect guidance is escape-driven memory promotion. When the compiler determines that a stack-eligible value must be promoted to arena allocation due to an escape path, the language server can analyze the escape path and propose structural alternatives:

Caller-provided buffer. The escape occurs because the function allocates internally and returns the result. If the caller provides the destination buffer, the value never escapes the callee's frame. The function signature changes from producing a value to filling a caller-owned buffer.

Continuation-passing style. If the caller needs only transient access to the value, the function can accept a continuation that consumes the value within the callee's frame. The value never escapes; stack allocation is preserved.

Explicit promotion. If the intended design calls for the value to outlive the callee's frame (because it will be shared across subsystems or stored in a long-lived data structure), the allocation strategy is annotated explicitly. The promotion still occurs, but it is declared intent, verified by the compiler.

Each alternative is a concrete refactoring with quantifiable consequences: the caller-provided buffer eliminates allocation entirely; the continuation preserves stack locality (and by extension, cache residency); the explicit annotation documents intent and stabilizes the PSG against future changes.
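The first two restructurings change only the shape of the function signature, which a small Python sketch can show. Python has no stack allocation and cannot enforce that a continuation does not leak its argument, so this illustrates the signature shapes only; the function names and the pairwise-mean computation are hypothetical stand-ins for a function that allocates and returns its result.

```python
from typing import Callable, List, TypeVar

R = TypeVar("R")


def pairwise_means(xs: List[float]) -> List[float]:
    """Original shape: allocates internally and returns (a ReturnEscape)."""
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]


def pairwise_means_into(xs: List[float], out: List[float]) -> int:
    """Caller-provided buffer: fills `out` and returns the count written."""
    n = max(len(xs) - 1, 0)
    for i in range(n):
        out[i] = (xs[i] + xs[i + 1]) / 2
    return n


def with_pairwise_means(xs: List[float], k: Callable[[List[float]], R]) -> R:
    """Continuation-passing: the result is visible only while `k` runs."""
    return k(pairwise_means(xs))


buf = [0.0] * 3
written = pairwise_means_into([1.0, 3.0, 5.0, 7.0], buf)
print(written, buf)                                       # 3 [2.0, 4.0, 6.0]
print(with_pairwise_means([1.0, 3.0, 5.0, 7.0], max))     # 6.0
```

In both rewritten shapes the intermediate result has no path out of the callee other than through storage the caller already owns, which is the property the escape classifier would check.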
In an ownership-based system, the same escape would produce a rejection; the engineer must diagnose the escape path and arrive at one of these restructurings independently. The coeffect model surfaces the diagnosis and the alternatives together.

The restructuring guidance is generated from the same PSG that performs dimensional inference. The escape path is a chain of edges in the graph; the lifetime promotion is a coeffect annotation on those edges; the alternative restructurings are graph transformations that the compiler can preview before the engineer accepts them. There is no separate analysis tool; the compilation graph is the analysis tool.

3.5 The Quire as Coeffect Case Study

The posit quire accumulator provides a concrete illustration of how DTS and DMM converge on a single construct. A quire is a fixed-width exact accumulator that holds intermediate results of multiply-add operations without rounding; rounding occurs once, when the final result is converted back to a posit value [7]. The Posit Standard (2022) [19] defines the quire width as $n^2/2$ bits for an $n$-bit posit, yielding a 512-bit accumulator for posit32. This fixed relationship between posit precision and quire width simplifies both hardware implementation and compiler modeling.

From the DTS perspective, the quire is a numeric container whose dimensional semantics are determined by the posit values it accumulates. A quire accumulating products of float<newtons> and float<meters> carries dimension newtons · meters = joules. The dimensional algebra tracks through the fused multiply-add operations:

    let work (forces: Span<float<newtons>>) (distances: Span<float<meters>>) : float =
        let mutable q = Quire.zero
        for i in 0 .. forces.Length - 1 do
            q <- Quire.fma q forces.[i] distances.[i]  // dimension: newtons * meters = joules
        Quire.toPosit q  // single rounding, dimension preserved

The source code carries no dimensional annotations beyond the parameter types.
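The dimensional bookkeeping behind this example can be sketched as arithmetic on exponent vectors, where multiplying quantities adds exponents and accumulation requires equal dimensions. The sketch below is an assumption-laden illustration rather than the DTS inference algorithm: it fixes a hypothetical three-base fragment (kg, m, s), and the names `Dim` and `fma_dim` are invented for the example. `None` stands in for the not-yet-unified dimension of `Quire.zero`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dim:
    """Exponent vector over the base order (kg, m, s)."""
    exps: Tuple[int, int, int]

    def __mul__(self, other: "Dim") -> "Dim":
        # Multiplying quantities adds their exponent vectors.
        return Dim(tuple(a + b for a, b in zip(self.exps, other.exps)))

    def __truediv__(self, other: "Dim") -> "Dim":
        # Dividing quantities subtracts their exponent vectors.
        return Dim(tuple(a - b for a, b in zip(self.exps, other.exps)))


KG, M, S = Dim((1, 0, 0)), Dim((0, 1, 0)), Dim((0, 0, 1))
NEWTON = KG * M / (S * S)      # (1, 1, -2)
JOULE = NEWTON * M             # (1, 2, -2)


def fma_dim(acc: Optional[Dim], a: Dim, b: Dim) -> Dim:
    """Dimension of acc + a * b; None models the not-yet-unified Quire.zero."""
    product = a * b
    if acc is not None and acc != product:
        raise TypeError(f"dimension mismatch: {acc.exps} vs {product.exps}")
    return product


q = None
for _ in range(3):             # three iterations of the accumulation loop
    q = fma_dim(q, NEWTON, M)
print(q == JOULE)              # True
```

The first `fma` fixes the accumulator's dimension; every later iteration is then checked against it, mirroring how a unification variable for the quire would be resolved once and verified thereafter.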
DTS infers that q carries dimension joules and that the final conversion preserves this dimension. The quire's internal representation is invisible to the dimensional algebra; what matters is that the dimension flows through the accumulation chain and is verified at the output.

From the DMM perspective, the quire is a memory resource with specific coeffect requirements:

Allocation coeffect. For posit32, the 512-bit quire occupies 64 bytes, exactly one cache line on architectures with 64-byte cache lines. On a CPU target, this is stack-eligible for short-lived accumulations and arena-eligible for long-lived ones. On an FPGA target, the quire is a 512-bit value in the posit arithmetic pipeline, mapped to fabric resources by synthesis. On a neuromorphic target, the quire may be unavailable entirely (the target lacks the accumulator width), triggering a capability coeffect failure.

Lifetime coeffect. The quire must persist across the entire accumulation loop. Its lifetime is bounded by the loop scope in the common case. If the quire escapes (returned from a function, stored in a data structure for incremental accumulation across function calls), the same escape analysis from Section 3.2 applies: the compiler detects the promotion and surfaces it at design time.

Capability coeffect. Not all targets support exact accumulation. The coeffect system records this as a capability requirement:

Table 4: Quire support across target architectures.

| Target | Quire support | Coeffect resolution |
|---|---|---|
| x86_64 | Software emulation (64 B on stack) | Allocation: stack; ~50 cycles/FMA |
| Xilinx FPGA | 512-bit fabric pipeline | Allocation: fabric; 1 cycle/FMA |
| RISC-V + Xposit | Hardware quire instruction | Allocation: arch. register; 1 cycle/FMA |
| Neuromorphic (Loihi 2) | Not available | Capability failure |

The convergence is in the PSG.
The quire node carries dimensional annotations (from DTS), allocation and lifetime annotations (from DMM), and capability annotations (from the coeffect system). All three are properties of the same graph node, resolved by the same inference pipeline, visible through the same language server interface. The design-time view:

    q: Quire (exact accumulator)
      Dimension: joules (inferred from fma operands)
      +-- x86_64: stack, 64 bytes, 1 cache line, ~50 cycles/fma
      +-- xilinx: 512-bit fabric pipeline, 1 cycle/fma
      +-- loihi2: not available (no exact accumulation support)
      Lifetime: loop scope (lines 3-5), no escape detected

The quire is a value with dimensional, allocation, and capability properties that the existing DTS+DMM framework handles through its standard inference and coeffect machinery. Its size is deterministic for a given posit precision ($n^2/2$ bits per the Posit Standard [19]), making memory analysis straightforward: once the target's posit width is known, the quire's cache footprint and allocation strategy follow directly.

4 The Program Semantic Graph as Design-Time Resource

4.1 Elaboration, Saturation, and Latent Preservation

The PSG progresses through two computational phases:

Elaboration. Raw parsed syntax is enriched with type and dimensional information through inference. Each node acquires type annotations, dimensional constraints, and coeffect requirements. Elaboration is the expensive phase; it involves constraint generation, unification, and resolution across the full dependency graph.

Saturation. The elaborated graph is iteratively refined until all inference variables are resolved and all coeffect constraints are propagated to fixpoint. A saturated node has a complete, stable set of annotations: its type, dimension, memory placement, lifetime, and target-specific resolution are all determined.
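Propagation to fixpoint can be illustrated for the lifetime coeffect alone. The sketch below is a minimal model under stated assumptions, not the PSG implementation: nodes are plain strings, lifetimes are integers in the order stack < arena < heap < static, and the function name `saturate` and the `outlives` constraint list are hypothetical. Each pair `(v, t)` encodes a constraint of the form $\lambda(v) \ge \lambda(t)$ from Section 3.2.1.

```python
from typing import Dict, List, Tuple

STACK, ARENA, HEAP, STATIC = range(4)  # stack < arena < heap < static


def saturate(tentative: Dict[str, int],
             outlives: List[Tuple[str, str]]) -> Dict[str, int]:
    """Propagate constraints lam[v] >= lam[t] until no lifetime changes.

    Each pair (v, t) says v must live at least as long as t. Lifetimes only
    ever increase and the ordering is finite, so the loop terminates.
    """
    lam = dict(tentative)
    changed = True
    while changed:
        changed = False
        for v, t in outlives:
            if lam[v] < lam[t]:
                lam[v] = lam[t]   # promotion, recorded as a coeffect annotation
                changed = True
    return lam


# x is captured by closure c1, c1 is captured by c2, and c2 is returned
# from the enclosing function (so it starts at arena lifetime).
tentative = {"x": STACK, "c1": STACK, "c2": ARENA}
outlives = [("x", "c1"), ("c1", "c2")]
print(saturate(tentative, outlives))   # {'x': 1, 'c1': 1, 'c2': 1}
```

The example reproduces the transitive promotion described earlier: `x` reaches arena lifetime only on the second pass, after `c1` has been promoted by its own capture.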
Concretely, the coeffects computed during elaboration and saturation include:

Table 5: Coeffect categories computed during PSG elaboration and saturation.

| Coeffect Category | What It Resolves | When Consumed |
|---|---|---|
| Emission strategy | Inline, separate function, or module init? | MLIR generation |
| Capture analysis | Outer-scope variables a lambda requires | Closure layout, escape classification |
| Lifetime requirements | Minimum lifetime for a value | Allocation strategy selection |
| SSA pre-assignment | SSA identifier for the node's result | MLIR emission |
| Dimensional resolution | Physical dimension of a value | Representation selection, transfer fidelity |
| Target reachability | Configured targets where node is reachable | Code generation filtering |

These coeffects are all computed before the graph traversal that generates target code. The traversal is purely navigational: it visits nodes in dependency order, observes the pre-computed coeffects, and emits the corresponding target representation. This “passive traversal” model, inspired by Petricek's coeffect formalization [18] and Huet's zipper for immutable graph navigation, ensures that the same coeffect annotations consumed by code generation are available to the language server for design-time display. There is no separate analysis; the compilation graph is the analysis.

Because the PSG persists as a long-lived structure in the language server, the current design leans toward latent preservation: when a subgraph becomes inactive (a feature flag is disabled, a target is dropped), its saturated annotations are retained rather than discarded, allowing reactivation without full re-elaboration.

4.2 Three-State Node Model

The PSG maintains three states for each node:

Table 6: Three-state node model for PSG nodes.
| State | Elaborated | Saturated | Active | Optimizer | Language Server |
|---|---|---|---|---|---|
| Live | Yes | Yes | Yes | Yes | Full resolution display |
| Latent | Yes | Yes | No | No | Dimmed, preserved resolution |
| Fresh | No | No | No | No | Syntax only, no resolution |

A live node participates in compilation and design-time display. A latent node is excluded from compilation but retains its annotations for inspection and rapid reactivation. A fresh node has been parsed but never elaborated; it appears in the design-time display as syntax without type or dimensional resolution.

The distinction between latent and fresh is operationally significant. Reactivating a latent node is $O(\text{boundary})$; the elaboration and saturation work has already been done. Activating a fresh node is $O(\text{subgraph})$; the full inference pipeline must run. The design-time tooling reflects this difference: latent nodes display their resolved types (which are likely still correct), while fresh nodes display only their syntactic structure with a prompt to build.

4.3 Soft Delete and Reachability

The latent preservation model implies a soft-delete semantics for reachability analysis. When the compiler determines that a node is unreachable under the current configuration (feature set, target set, dependency set), it marks the node as latent. The node's edges are annotated with a reachability bitvector: one bit per configured target, indicating on which targets the edge is active.

This per-target reachability is essential for multi-target compilation. A function may be reachable on x86 and FPGA but unreachable on a neuromorphic target (because the target lacks floating-point computation paths). The reachability status of the function is not a single boolean; it is a bitvector that the language server can display as a per-target compatibility matrix.

The optimizer and code generator consume only the active subgraph; they filter on the reachability bitvector during graph traversal.
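A reachability bitvector of this kind is straightforward to model with one flag bit per target. The following sketch is illustrative only: the target set, the node names, and the helpers `active_nodes` and `compatibility_row` are hypothetical, and the bitvector is attached to nodes rather than edges for brevity.

```python
from enum import IntFlag
from typing import Dict, List


class Target(IntFlag):
    """One bit per configured target (the target set here is hypothetical)."""
    X86_64 = 1
    FPGA = 2
    NEUROMORPHIC = 4


ALL_TARGETS = (Target.X86_64, Target.FPGA, Target.NEUROMORPHIC)


def active_nodes(reach: Dict[str, Target], target: Target) -> List[str]:
    """Nodes the optimizer and code generator would visit for one target."""
    return [name for name, bits in reach.items() if bits & target]


def compatibility_row(bits: Target) -> Dict[str, bool]:
    """Per-target view of one node, as a language server might display it."""
    return {t.name: bool(bits & t) for t in ALL_TARGETS}


reach = {
    "fftKernel": Target.X86_64 | Target.FPGA,   # needs floating-point paths
    "spikeEncode": Target.NEUROMORPHIC,
    "legacyPath": Target(0),                    # latent on every target
}
print(active_nodes(reach, Target.FPGA))         # ['fftKernel']
print(compatibility_row(reach["fftKernel"]))
# {'X86_64': True, 'FPGA': True, 'NEUROMORPHIC': False}
```

A node whose bitvector is zero is filtered out of every code generation pass but remains in the dictionary, which is the soft-delete behavior: nothing is discarded, only masked.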
The language server consumes the full graph; it displays latent nodes with their preserved resolution, enabling inspection of code paths that are not currently compiled but could be activated by changing the configuration.

4.4 Design-Time Feedback as Compilation Byproduct

The PSG-as-design-resource model produces several categories of design-time feedback that are byproducts of the compilation process, not separate analyses:

Dimensional resolution display. Every numeric value carries its resolved dimension in the PSG. The language server renders this as inline annotations, hover tooltips, and a persistent resolution panel showing the current function's dimensional resolution across all configured targets.

Memory placement display. Every value carries its resolved allocation strategy and lifetime in the PSG. The language server renders this alongside dimensional information, showing where each value lives in the target's memory topology.

Escape analysis diagnostics. When the coeffect system promotes a value's allocation strategy (stack to arena, arena to heap), the promotion is recorded in the PSG as a coeffect annotation. The language server renders this as a diagnostic with the escape path, the promotion reason, and restructuring alternatives.

Cache locality estimates. For values in hot loops (detected via loop nesting analysis, also a PSG annotation), the language server can estimate cache residency based on the value's size, alignment, and allocation strategy. Assuming 64-byte cache lines, a stack-allocated 800-byte span holds 12.5 lines' worth of data (13 L1 cache lines when aligned to a line boundary) and is guaranteed contiguous; an arena-allocated span of the same size may or may not be contiguous depending on arena state. The estimated performance difference can be quantified and displayed.

Cross-target transfer analysis.
When a value crosses a hardware boundary (FPGA to CPU, CPU to NPU), the compiler resolves the transfer protocol, latency, bandwidth, and precision fidelity of any numeric conversion. This information is a PSG annotation on the transfer edge. The language server renders it as a diagnostic on the value's usage at the boundary, making visible exactly what happens when a computation result moves between targets. For hardware/software co-design workflows, the engineer sees the cost of a target boundary before committing to an architecture partition.

None of these feedback categories require a separate analysis pass. They are all properties of the PSG that the compiler computes as part of normal compilation. The language server reads the PSG; the design-time tooling is a view over the compilation graph.

5 Related Work

5.1 Units of Measure in F#

Kennedy's Units of Measure system for F# [13] established the core inference algorithm for dimensional types in an ML-family language. The system is elegant, fully inferrable, and integrated with Hindley–Milner unification. Its limitation, by design, is early erasure: units are checked at compile time and discarded during IL generation, before the compilation stages where they could inform representation selection or memory placement. DTS extends Kennedy's algebraic framework with dimensional persistence through compilation, multi-target resolution, and integration with the coeffect system for memory dimensions.

5.2 Dependent Types in F*, Idris, and Agda

F* [26] is an ML-family language with dependent types and effect tracking, drawing from F#, OCaml, and Standard ML, and using an SMT solver (Z3) for automated proof discharge. Two aspects of F*'s design were particularly influential for DTS.
First, F*'s treatment of representation as a concern separable from type identity informed the core DTS principle that a float carries its dimensional semantics independently of whether the underlying representation is a 64-bit IEEE 754 float, a 32-bit posit, or a 16-bit fixed-point value. In F*, refinement types can constrain values without altering their runtime representation; DTS applies an analogous separation at the level of physical dimensions and numeric format. Second, F*'s integration of SMT-LIB2 [2] via Z3 demonstrated that solver-backed constraint resolution could be embedded transparently within an ML-family type checking workflow, a pattern that informs how the Fidelity framework resolves dimensional, memory, and target constraints during PSG elaboration.

Idris [4] provides dependent types with a focus on practical programming. Agda [17] is a proof assistant that doubles as a programming language. All three systems can encode dimensional constraints, but the encoding uses the full power of dependent types, sacrificing decidability and complete inference. DTS achieves the same dimensional correctness guarantees with a restricted algebraic framework that preserves these properties.

5.3 Rust Ownership and Borrow Checking

Rust's ownership system [10] provides deterministic memory management through a discipline of ownership, borrowing, and lifetime annotation. The borrow checker is a static analysis that rejects programs where lifetimes are inconsistent. Rust's approach front-loads the annotation burden: the engineer specifies lifetimes in function signatures, and the compiler verifies them.

The Clef/Fidelity approach differs in three respects. First, the analysis operates at a different depth in the compilation pipeline. Our understanding of ownership-based borrow checking is that it analyzes a mid-level intermediate representation derived from the surface syntax.
The Clef coeffect analysis operates on the Program Semantic Graph after type checking, SRTP resolution, and dimensional inference have completed; the escape classifier therefore has access to semantic information that is not syntactically visible at the function signature level. This depth of context enables escape classifications (Section 3.2.1) that account for dimensional constraints, resolved type parameters, and closure capture structure jointly.

Second, lifetimes are inferred by default (the poll model of Section 3.3), with explicit annotation available when the engineer needs control (the push model) or when inference produces surprising results. This parallels the difference between mandatory lifetime annotations and ML-family type inference: both achieve static guarantees, but the annotation burden falls differently.

Table 7: Comparison of Rust and Clef memory management approaches.

| Property | Rust | Clef |
|---|---|---|
| Lifetime specification | Mandatory at function boundaries | Inferred; three levels of explicitness |
| Allocation strategy | Ownership-determined | Coeffect-determined |
| Design-time feedback | Accept/reject with error diagnostics | Escape diagnostics with restructuring alternatives |
| Annotation cost | Every function with references | Only where inference is insufficient |
| Analysis depth | Mid-level IR from surface syntax | PSG after type checking, SRTP, and dimensional inference |
| Multi-target implications | Single compilation target | Strategy may vary per target |

Third, the design-time tooling provides graduated feedback. When the coeffect system promotes a value's allocation, the language server displays the escape path and proposes concrete restructuring alternatives (Section 3.4).
In an accept/reject model, the engineer diagnoses the escape path and restructures the code independently; the Clef model exposes the compiler's escape analysis as a design-time resource, surfacing the reasons for the allocation decision alongside actionable alternatives. The static guarantee is preserved in both cases; the difference lies in the feedback granularity during development.

A further distinction emerges in multi-target compilation. When a single codebase targets multiple backends with different memory hierarchies, a fixed ownership model applies the same allocation strategy everywhere. The coeffect model allows the same function's allocation decisions to vary by target: a value that is stack-allocated on a general-purpose CPU might be placed in a scratchpad region on an embedded MCU, or mapped to a different memory tier on an accelerator. The escape classification is target-invariant; the allocation response to that classification is target-specific. This separation is consistent with the representation selection model of Section 2.6, where the dimensional annotation constrains the value semantics and the target determines the concrete representation.

5.4 Koka Effects and Coeffects

Our review of Koka [16] showed that its effect tracking in the type system allows the compiler to specialize effect handling (e.g., eliminating heap allocation for effects that can be handled on the stack). The coeffect model in Clef extends this to memory placement: allocation strategy is a coeffect that flows through the semantic graph and is resolved at each call site. The integration with dimensional types is novel: a value's physical dimension and its memory placement are jointly tracked in the same graph, enabling diagnostics that relate dimensional correctness to memory behavior.
5.5 Posit Arithmetic and Domain-Aware Representation

Gustafson's posit arithmetic [7] addresses the numeric representation problem from the hardware and arithmetic side: tapered precision allocates more mantissa bits to value ranges near 1.0, where most computations concentrate, and fewer bits to extreme ranges. The Posit Standard (2022) [19] unified the exponent size ($es = 2$) across all precisions and formalized the quire accumulator at $n^2/2$ bits for $n$-bit posits, providing exact accumulation for dot products and fused multiply-add sequences. Gustafson's comprehensive treatment [6] extends this foundation with parameterizable formats, including bounded posits (b-posits) where the regime field is constrained to a maximum size $rs$, and asymmetric configurations where the precision profile can differ for magnitudes above and below 1.

Jonnalagadda, Thotli, and Gustafson [9] provide the first hardware efficiency analysis of bounded posits, demonstrating that the bounded regime constraint eliminates the variable-length field decoding overhead that has historically been the primary objection to posit hardware. The b-posit decoder matches IEEE float hardware in area and latency while preserving posit's superior accuracy properties. This result is directly relevant to DTS: the representation selection function of Section 2.6 can now include b-posit configurations in its candidate set with confidence that the hardware cost is competitive with IEEE 754.

Posit arithmetic implicitly assumes that the compiler or engineer knows which value ranges matter for a given computation. DTS makes this knowledge explicit and formal: the dimensional annotation constrains the value range, and the representation selection function (Section 2.6) could use this constraint to choose among a variety of representations, including IEEE 754, posit, b-posit, or fixed-point formats.
The two systems are complementary: posit provides the representation with domain-matched precision distribution; DTS provides the formal mechanism for determining which domain applies.

The quire accumulator illustrates this complementarity at the DMM level. The quire is a memory resource whose allocation, lifetime, and target availability are coeffect properties (Section 3.5). Without the coeffect framework, quire management is ad hoc; with it, the compiler can verify that quire lifetime is correct, that the target supports exact accumulation, and that the allocation strategy matches the accumulation pattern. The deterministic quire size ($n^2/2$ bits for a given posit precision) makes this analysis straightforward.

5.6 MLIR and Multi-Level Compilation

MLIR [15] provides the infrastructure for multi-stage compilation with extensible dialects and attributes. The DTS preservation model relies on MLIR's attribute system to carry dimensional metadata through dialect lowering. The contribution is not to MLIR itself but to the demonstration that dimensional type metadata can be preserved through the full MLIR compilation pipeline without loss, using standard MLIR extension mechanisms.

5.7 Rank Polymorphism and Shape-Indexed Types

Slepak, Shivers, and Manolios develop Remora [25], a rank-polymorphic array language whose type system tracks array shape as a sequence of natural-number indices. The system uses restricted dependent types to verify that rank-polymorphic lifting produces shape-consistent results, with decidable type checking and a proof of type soundness. Slepak et al. formalize rank-polymorphic type inference as constraint satisfaction over string equations [24]; DTS inference operates over integer linear constraints in abelian groups (Section 2.2).
The dimensional indices in Remora are elements of the free monoid over $\mathbb{N}$ (array shapes); the dimensional indices in DTS are elements of $\mathbb{Z}^k$ (physical quantities). Both systems demonstrate that encoding dimensional information at the type level enables verification that conventional type systems cannot express. The architectural difference is that Remora's shape indices require dependent types with existential quantification for dynamic shapes, while DTS dimensional indices are fully inferrable within extended Hindley–Milner unification. This distinction reflects the underlying algebraic complexity: shape concatenation in the free monoid admits less structure for inference than integer linear constraints in an abelian group.

6 Future Work

6.1 Formal Decidability Proof

The decidability claim for DTS inference rests on the reduction to linear algebra over $\mathbb{Z}$. A formal proof of decidability, including the interaction between physical dimensions and memory dimensions (which use different algebraic structures within the same constraint system), would strengthen the theoretical foundation.

6.2 Unified Shape and Quantity Indices

The orthogonality of array-shape indices (as in rank-polymorphic systems such as Remora [25]) and physical-quantity dimensions (as in DTS) suggests that both can coexist as independent axes in a unified type-level index structure. A matrix of forces has both a shape (e.g., $3 \times 4$, from the domain of rank polymorphism) and a physical dimension (newtons, from the domain of DTS). Neither system alone captures both. A system combining shape-indexed rank polymorphism with physical dimensional inference would verify both geometric compatibility and quantity consistency, properties that are currently checked by separate systems or not checked at all. The algebraic structures involved, the free monoid over $\mathbb{N}$ for shapes and $\mathbb{Z}^k$ for quantities, are independent and compose as a direct product.
Whether inference over this product structure preserves the decidability and principal-type properties of either component is an open question.

6.3 Quantified Design-Time Feedback

The cache locality estimates and performance projections described in Section 4.4 are currently heuristic. Integration with hardware performance models (cache hierarchy simulators, memory bandwidth models, PCIe latency tables) would produce quantified estimates with confidence intervals, further grounding the restructuring guidance in measurable costs.

6.4 Incremental Adoption Through Porting

A practical adoption path for Clef would be the incremental porting of existing codebases. Code arriving from Rust carries lifetime annotations but no dimensional discipline; the porting process would preserve the lifetime structure while the PSG infers dimensional constraints over the existing control flow. Code from TypeScript or Go carries neither dimensional annotations nor explicit lifetime management; porting from these languages would be a deeper refinement, where the design-time tooling would surface both dimensional and lifetime information that the PSG infers from an initial unadorned translation. Python and C would represent a similar starting point, with the additional challenge of weak or absent static typing at the source.

In each case, the porting process would be a multi-pass refinement: an initial translation would produce valid Clef source with minimal annotations, and the design-time tooling would guide the engineer toward progressively stronger constraints. Each pass through the feedback loop would add annotations that the compiler can verify, tightening the program’s static guarantees incrementally. The goal would be a “pit of success” model where the tooling makes the well-typed, lifetime-correct version of the code easier to reach than the under-specified version.
For engineers accustomed to garbage-collected or dynamically typed environments, this graduated path could reduce the friction of adopting a statically typed, low-level compilation target. The design of this refinement workflow, including how the language server would prioritize suggestions and how partial annotation would interact with inference, warrants dedicated study.

6.5 Posit Hardware Co-Design and Dimensional Range Analysis

The representation selection function in Section 2.6 is currently a compile-time decision. For reconfigurable targets (FPGAs), the compiler could go further: given the dimensional ranges of all values in a computation, the compiler could determine whether a non-standard b-posit configuration [9] (e.g., 20-bit with es = 2 and rs = 5, or an asymmetric configuration with different precision profiles for magnitudes above and below 1 [6]) would provide better precision-per-bit than any standard configuration. The bounded regime field makes this search tractable: rs values between 2 and 6 combined with es values between 1 and 5 produce a small, enumerable parameter space. This would require extending the CIRCT compilation path to parameterize the posit arithmetic pipeline based on dimensional analysis results, a form of type-directed hardware synthesis.

6.6 Dataflow Architectures and Control-Flow/Data-Flow Partitioning

The DTS+DMM model as presented in this paper assumes a control-flow execution model, but the PSG’s structure may also be relevant to the growing class of dataflow and spatial architectures. Coarse-Grained Reconfigurable Arrays (CGRAs), spatial dataflow accelerators, and other non-von Neumann compute fabrics are proliferating as alternatives to GPU-centric approaches for HPC and AI inference workloads. These architectures execute computation graphs spatially across arrays of processing elements with explicit data movement between them.
The PSG’s coeffect annotations, which already describe data dependencies, escape behavior, and memory placement, carry information that could inform the partitioning of a computation graph across spatial hardware.

A longer-term question is whether the DTS+DMM framework could eventually support inference about which sections of a codebase would benefit from control-flow execution and which would be better suited to dataflow mapping. The PSG’s saturation phase computes dependency structure, memory access patterns, and dimensional constraints for every subgraph; this information could, in principle, inform a partitioning heuristic that routes compute-bound, regular subgraphs toward spatial targets and irregular, branch-heavy subgraphs toward von Neumann cores. This is a substantial open problem that the current paper does not address, but the PSG’s structure appears to provide a natural starting point for investigating it.

The PSG’s binary edge structure is sufficient for the claims presented here, but certain compilation decisions for spatial dataflow targets would expose its limits. As a concrete example, AMD’s XDNA 2 NPU arranges AI Engine tiles in a two-dimensional grid with explicit, programmer-managed data movement via DMA and configurable interconnect [21]. Mapping operations to this architecture requires co-locating sets of operations on tiles, configuring sets of data routes between tiles, and partitioning sets of columns into spatial workload contexts. These are constraints over sets of nodes, and their natural formalism is the hyperedge. A heterogeneous workstation combining a von Neumann host, a spatial dataflow accelerator, and a reconfigurable fabric would present multiple targeting strategies with distinct transfer boundaries and memory hierarchies.
The coeffect interactions at these boundaries, where dimensional constraints, escape analysis, capability requirements, and transfer fidelity converge on a single partitioning decision, are already implicitly multi-way in the current PSG; a Program Hypergraph (PHG) generalization would make them first-class. We defer this generalization to a subsequent paper, noting that hypergraph partitioning for spatial mapping is an established problem in VLSI placement [12] and that MLIR’s AIE dialect [1] provides infrastructure for spatial dataflow targeting within the existing Fidelity compilation pipeline.

6.7 Delimited Continuations and Interaction Nets

A separate line of investigation concerns the PSG’s potential role as a transparent compute graph that mediates between control-flow and data-flow execution models at a finer granularity than target-level partitioning. Clef adopts computation expressions from the F# tradition, and under analysis these decompose into two fundamental patterns: delimited continuations (DCont) for sequential, effectful computations, and interaction nets (Inet) for pure, parallelizable computations. If the PSG’s coeffect annotations could classify subgraphs along this axis, the compiler would have a basis for routing effectful regions toward stack-based continuation implementations and pure regions toward parallel execution, whether on SIMD units, GPU warps, or spatial dataflow tiles.

Both sides of this duality are now represented in the MLIR ecosystem. Kang et al. [11] at Carnegie Mellon University introduce a DCont dialect for MLIR that models delimited continuations as first-class operations, targeting WebAssembly’s emerging stack switching primitives.
Coll [5] at the University of Buenos Aires introduces an Inet dialect that implements the three Symmetric Interaction Combinators (Erase, Construct, Duplicate) from Lafont’s interaction net formalism [14] as MLIR operations with declarative rewrite rules. Together, these two dialects demonstrate that both continuation-based sequential control flow and interaction-net-based parallel graph reduction can be represented and lowered within the same MLIR infrastructure that the Fidelity compilation pipeline uses for code generation.

The implications for DTS+DMM are speculative but worth noting. A PSG that carries both dimensional/coeffect annotations and DCont/Inet classification would be a compilation artifact that simultaneously describes what a computation means (dimensions, types), how it manages resources (escape analysis, allocation), and whether its execution is inherently sequential or parallelizable. This would extend the design-time feedback model: the language server could surface not only escape diagnostics and allocation strategies but also the continuation structure of effectful code and the parallelism opportunities in pure regions. We consider this a promising direction for future work.

6.8 Formal Verification Integration

Verification is a central commitment of the Fidelity framework, driven by the goal of producing systems suitable for high-reliability domains: real-time control, embedded systems, safety-critical infrastructure. The PSG’s dimensional and coeffect annotations provide the foundation for a dual-phase verification model that the framework implements across the design-time and compile-time boundaries.

The first phase operates at design time. Because the DTS constraints reduce to quantifier-free linear integer arithmetic (QF_LIA), the dimensional proof obligations that the PSG generates are decidable and solvable in bounded time by SMT solvers such as Z3.
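As a small illustration of why such obligations fall within QF_LIA, consider a single multiplication constraint. The sketch below assumes a three-element basis (kg, m, s) and writes each dimension as an integer exponent vector; the basis, the notation $d_x$, and the choice of Newton’s second law as the example are assumptions made for illustration, not the PSG’s actual encoding.

```latex
\begin{align*}
d_F &= (1,\ 1,\ -2), \qquad d_m = (1,\ 0,\ 0), \qquad d_a = (a_1,\ a_2,\ a_3) \\
\text{obligation for } F = m \cdot a:\quad d_F &= d_m + d_a \\
\Longleftrightarrow\quad 1 &= 1 + a_1 \;\wedge\; 1 = 0 + a_2 \;\wedge\; -2 = 0 + a_3 \\
\Longrightarrow\quad d_a &= (0,\ 1,\ -2), \quad \text{i.e. } \mathrm{m \cdot s^{-2}}
\end{align*}
```

The obligation is a quantifier-free conjunction of linear equalities over integer unknowns, one per basis element, which is the fragment an SMT solver decides.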
The language server derives these obligations automatically from PSG structure during elaboration, verifying dimensional consistency and memory safety properties without requiring developer annotations. The bounded decidability of QF_LIA is essential: it means the verification feedback meets real-time response requirements for interactive design-time tooling, providing continuous proof status as the engineer works.

The second phase operates at compile time, using MLIR’s SMT dialect to embed verification conditions directly in the intermediate representation. As mlir-opt transformations execute, the embedded SMT assertions validate that each lowering pass preserves the semantic properties established during design-time verification. This creates translation validation across the full compilation pipeline: the proofs generated at the PSG level are carried through each transformation and re-verified after lowering, providing end-to-end assurance that the properties the engineer observes at design time are preserved in the emitted code. The two phases reinforce each other: design-time verification establishes the properties, and compile-time verification confirms their preservation.

To our knowledge, this combination of SMT-backed design-time proof generation with MLIR-level translation validation has not been assembled in existing compilation frameworks. The bounded decidability of the underlying constraint theories (QF_LIA for dimensional algebra, coeffect lattices for memory safety) is what makes the dual-phase model tractable, and is the basis for our confidence in this architectural direction.

6.9 Information Accrual and Deferred Optimization

The PSG’s persistence as a design-time resource raises a question about when optimization decisions should be made. Let $I_k$ represent the information available at compilation stage $k$.
The stages common to all targets (source parsing, PSG elaboration, MLIR emission, MLIR optimization) form a shared prefix; the backend-specific stages diverge at the fan-out point:

$$I_{\text{source}} \subset I_{\text{PSG}} \subset I_{\text{MLIR}} \subset I_{\text{MLIR-opt}} \subset I_{\text{backend}} \subset I_{\text{native}} \tag{7}$$

At the source level, the compiler knows types and dimensions. At the PSG level, it additionally knows coeffects, escape classifications, and saturated annotations. At the MLIR level, it knows the full program structure in SSA form. At the MLIR optimization level, it knows call frequencies and loop nesting. Beyond this point, the information set is backend-specific: the LLVM path adds target-specific parameters for CPU, GPU, MCU, or WebAssembly (cache line sizes, pipeline depths, SIMD widths, memory constraints); the CIRCT path adds FPGA resource budgets, timing constraints, and routing topology; other backends contribute their own target-specific context.

The quality of optimization decisions $Q$ improves with available information:

$$Q(I_k) < Q(I_{k+1}) \quad \text{for all } k \tag{8}$$

This formalizes an architectural principle: decisions that can be deferred to later compilation stages should be, because later stages have strictly more information. DTS annotations exemplify this. Dimensional information preserved through early stages enables representation selection at the MLIR level, where the target architecture is known. Had the dimensions been discarded at the source level (the early-erasure model of F#’s Units of Measure), the representation selection decision would be impossible at the point where it can be made optimally.

The principle extends to memory management. Escape classification (Section 3.2.1) is computed during PSG elaboration because it requires type and scope information. Allocation strategy is resolved during MLIR emission because it requires target memory topology.
Cache alignment, register allocation, and hardware resource mapping are determined during backend-specific lowering because they require target-specific parameters (microarchitectural details for CPU targets via LLVM, resource budgets and timing for FPGA targets via CIRCT). Each decision is made at the stage where its inputs are first available, which is the stage where the decision can be made with maximum context.

6.10 Implications for Numerically Disciplined Machine Learning

The formal properties of DTS have implications for machine learning that the present paper identifies but does not fully develop. We note four specific connections that warrant independent investigation.

Dimensional algebra under differentiation. The dimensional algebra is closed under differentiation. If $f : \mathbb{R}\langle d_1 \rangle \to \mathbb{R}\langle d_2 \rangle$, where $\langle d \rangle$ denotes the dimensional annotation, then:

$$\frac{\partial f}{\partial x} : \mathbb{R}\langle d_2 \cdot d_1^{-1} \rangle \tag{9}$$

The gradient of a loss function with dimension $\langle \text{loss} \rangle$ with respect to a parameter with dimension $\langle d \rangle$ carries dimension $\langle \text{loss} \cdot d^{-1} \rangle$. This property follows from the abelian group structure: differentiation is division in the dimensional algebra, and division is closed in $\mathbb{Z}^n$. The inference algorithm of Section 2.2 extends to auto-differentiation graphs without modification: each gradient node inherits a dimension from the chain rule, and dimensional consistency of the full gradient computation is verified by the same Gaussian elimination that verifies the forward pass.

The practical consequence: in a physics-informed model where the loss function includes terms with physical units (force residuals in newtons, energy conservation violations in joules), DTS can verify that gradient accumulation respects dimensional consistency. A gradient with dimension $\langle \text{newtons}/\text{meters} \rangle$ cannot be accumulated with a gradient of dimension $\langle \text{joules}/\text{seconds} \rangle$ without a dimensional error.
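The rule in Equation (9) and the accumulation check can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DTS implementation: the basis (kg, m, s), the `Dim` class, and the helper names `grad_dim` and `accumulate` are all hypothetical, and the check runs at runtime here, whereas DTS performs it during inference.

```python
from dataclasses import dataclass


# Assumed basis for this sketch: exponents over (kg, m, s).
@dataclass(frozen=True)
class Dim:
    """A dimension as an integer exponent vector, i.e. an element of Z^3."""
    exps: tuple

    def __mul__(self, other):
        # Multiplying quantities adds exponent vectors.
        return Dim(tuple(a + b for a, b in zip(self.exps, other.exps)))

    def __truediv__(self, other):
        # Dividing quantities subtracts exponent vectors.
        return Dim(tuple(a - b for a, b in zip(self.exps, other.exps)))


def grad_dim(output: Dim, param: Dim) -> Dim:
    """Dimension of d(output)/d(param): division in the dimensional algebra."""
    return output / param


def accumulate(lhs: Dim, rhs: Dim) -> Dim:
    """Gradients may be summed only when their dimensions are identical."""
    if lhs != rhs:
        raise TypeError(f"dimensional error: {lhs.exps} vs {rhs.exps}")
    return lhs


KG, M, S = Dim((1, 0, 0)), Dim((0, 1, 0)), Dim((0, 0, 1))
NEWTON = KG * M / (S * S)   # kg * m * s^-2
JOULE = NEWTON * M          # kg * m^2 * s^-2
```

With these definitions, `grad_dim(NEWTON, M)` yields the exponent vector `(1, 0, -2)` and `grad_dim(JOULE, S)` yields `(1, 2, -3)`, so `accumulate` rejects their sum by comparing exponent vectors alone, without evaluating any numeric value. This mirrors the verification described above.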
This verification is decidable, requires no annotation beyond the physical dimensions already present in the forward computation, and has zero runtime cost.

Forward-mode differentiation as a coeffect property. Baydin, Pearlmutter, Syme, Wood, and Torr [3] demonstrated that the forward gradient, an unbiased estimate of the gradient computed via forward-mode automatic differentiation, can replace backpropagation entirely. The forward gradient is evaluated in a single forward pass, eliminating the backward pass and the activation tape it requires.

This has a specific coeffect signature within the DMM framework. Reverse-mode AD (backpropagation) requires storing intermediate activations for the backward pass, imposing an $O(L)$ auxiliary memory requirement where $L$ is the number of layers. This is a coeffect: the backward pass requires the activation tape as a contextual resource. Table 8 summarizes the coeffect signatures of the two modes.

Table 8: Coeffect signatures of reverse-mode and forward-mode automatic differentiation.

| AD Mode | Auxiliary Memory | Gradient | Activation Tape |
|---|---|---|---|
| Reverse-mode | $O(L \cdot B)$ | Exact (full Jacobian$^\top$) | Required; spans backward pass |
| Forward-mode [3] | $O(1)$ per layer | Unbiased estimate | Not required |

The forward-mode coeffect signature (no activation tape, $O(1)$ auxiliary memory per layer) means the escape analysis of Section 3.2 is trivially satisfied: no intermediate values escape their layer’s scope, and the entire gradient computation is stack-eligible. The coeffect system can verify this property at compile time: given a computation graph annotated with AD mode, the lifetime analysis confirms that forward-mode imposes no lifetime obligations beyond the current layer’s scope.

The quire accumulator (Section 3.5) compounds this advantage. Forward-mode computes a directional derivative $\nabla_v f(\theta) = \langle \nabla f(\theta), v \rangle$ for a random perturbation vector $v$.
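A minimal dual-number sketch shows how the directional derivative is obtained in one forward pass with no stored activations. The `Dual` class, the helper names, and the example `loss` function are hypothetical illustrations; the perturbation vector is passed in explicitly rather than sampled, and ordinary Python numbers stand in for posit values and the quire.

```python
class Dual:
    """Dual number val + tan*eps with eps**2 == 0; `tan` carries the tangent."""

    def __init__(self, val, tan=0):
        self.val, self.tan = val, tan

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + a'eps)(b + b'eps) = ab + (a b' + a' b) eps
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def directional_derivative(f, theta, v):
    """Single forward pass returning (f(theta), <grad f(theta), v>)."""
    out = f([Dual(t, d) for t, d in zip(theta, v)])
    return out.val, out.tan


def forward_gradient(f, theta, v):
    """Forward gradient estimate: the directional derivative scaled by v."""
    _, d = directional_derivative(f, theta, v)
    return [d * vi for vi in v]


def loss(theta):
    # Hypothetical two-parameter loss used only for this example.
    x, y = theta
    return x * x + 3 * y
```

For `theta = [2, 5]` and `v = [1, 2]`, the gradient is `(4, 3)`, so `directional_derivative(loss, theta, v)` returns `(19, 10)`. Each intermediate `Dual` is dead as soon as its consumer has run, which is the property the coeffect signature records. The `tan` component is the inner product $\langle \nabla f(\theta), v \rangle$, built up as a running sum of products during the forward pass.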
The inner product is an accumulation of products, exactly the operation the quire makes exact. The coeffect system tracks the quire’s lifetime through the forward pass identically to how it tracks quire lifetime in any accumulation loop: allocation at loop entry, accumulation within the loop body, conversion at loop exit.

The convergence of these three properties (DTS verifying dimensional consistency of the gradient graph, forward-mode eliminating the activation tape coeffect, and the quire providing exact accumulation) produces a system where gradient computation is dimensionally verified, memory-minimal, and numerically exact. Each property is independently established; their composition within the PSG is the novel contribution.

Representation selection for neural network value distributions. Neural network activations and gradients have well-characterized value distributions, typically concentrated near zero with heavy tails. The representation selection function of Section 2.6 applies: given the dimensional range of activations in a specific layer (inferrable from training statistics or dimensional constraints on the input domain), the compiler can select posit widths that concentrate precision where the values cluster. The quire (Section 3.5) provides exact gradient accumulation, eliminating the rounding errors that compound across millions of parameters during training. This connection between DTS (which provides the dimensional range) and posit arithmetic (which provides domain-matched precision) is an instance of the representation selection framework applied to a specific computational domain.

The bounded posit (b-posit) format [9] extends this connection. ML workloads operate over a narrower dynamic range than general scientific computing, typically $[10^{-14}, 10^{1}]$, which permits smaller exponent and regime field sizes than the es = 2, rs = 6 configuration suited to HPC.
Gustafson [6] describes asymmetric b-posit configurations where the precision profile differs for magnitudes below and above 1: a steeper taper on the left half of the posit ring (magnitudes $< 1$, where most activations reside) paired with a flatter, higher-accuracy profile on the right half. An exponent bias shift from $2^{0}$ to $2^{-2}$ or $2^{-3}$ centers the high-precision region on the activation distribution’s mode. Research at the National University of Singapore has demonstrated that such configurations maintain classification accuracy down to 5-bit representations, with a sharp accuracy degradation threshold at 4 bits.

We see DTS as a formal mechanism that could make these configurations selectable at compile time. The dimensional range annotation on a neural network layer’s activations constrains the value distribution; the representation selection function evaluates candidate b-posit parameterizations (es, rs, exponent bias) against that distribution. The b-posit’s bounded regime field ensures that the hardware cost of the selected configuration is predictable, and the format’s cross-precision hardware reuse property (Section 2.6) means a single decode unit can serve 8-bit, 16-bit, and 32-bit b-posit operations in a mixed-precision training pipeline.

Physics-informed loss term verification. Physics-informed neural networks [20] encode physical laws as differentiable loss terms. A loss term that penalizes violations of Newton’s second law would compute $F - ma$ and minimize the squared residual. DTS can verify that $F$, $m$, and $a$ carry dimensions $\langle \text{newtons} \rangle$, $\langle \text{kg} \rangle$, and $\langle \text{m} \cdot \text{s}^{-2} \rangle$ respectively, and that the subtraction $F - ma$ is dimensionally consistent. This verification is a compile-time check on the loss function’s structure, not a runtime constraint on the trained model’s outputs.
It ensures that the physics constraints imposed during training are dimensionally well-formed, a property that existing ML frameworks cannot verify because dimensional information is never encoded.

7 Conclusion

Dimensional Type Systems are not a restricted form of dependent types. They are a distinct formal category with distinct algebraic structure (finitely generated abelian groups), distinct computational properties (decidable, fully inferrable, principal types), and distinct practical applications (preservation through multi-stage compilation, multi-target resolution, domain-aware representation selection, integration with memory management coeffects).

The integration of DTS with Deterministic Memory Management through a shared coeffect discipline in the Program Semantic Graph produces a unified framework for design-time semantic analysis. The compiler’s internal representation becomes the engineer’s design tool. Escape classification, allocation promotion, cache locality estimation, representation fidelity diagnostics, and cross-target transfer analysis are all views over the same graph that enforces dimensional consistency. The escape classification taxonomy (Section 3.2.1) demonstrates that escape analysis need not be binary: distinguishing closure capture from return escape from byref escape enables targeted allocation strategies and precise engineering diagnostics.

The convergence of DTS with posit arithmetic demonstrates that the framework’s implications extend beyond type theory. Gustafson’s posit representation [7, 6] presupposes that the compiler knows which value ranges matter; DTS provides the formal mechanism for that knowledge. The bounded posit format [9] resolves the hardware efficiency concern that has historically limited posit adoption, making posit configurations viable candidates in the representation selection function.
The quire accumulator presupposes that memory management is deterministic and verifiable; DMM as a coeffect discipline provides that guarantee. Neither system was designed with the other in mind, yet they compose naturally within the PSG because both formalize properties of numeric computation that existing type systems leave implicit.

The information accrual principle (Section 6.9) formalizes why preservation matters: each compilation stage has strictly more information than its predecessor, and decisions made at later stages are strictly better informed. Dimensional annotations preserved through early stages enable representation selection, escape-aware allocation, and cross-target transfer analysis at the stages where those decisions can be made optimally. Early erasure forecloses these possibilities; dimensional persistence enables them.

The practical consequence is that the compiler’s internal analysis (escape classification, allocation strategy, representation fidelity, cache residency) is available as design-time feedback without a separate tooling layer. The PSG serves both roles because the information required for compilation and the information useful for software design are the same information.

This paper has presented three claims. First, that dimensional annotations persisting through compilation enable the compiler to jointly resolve representation selection and deterministic memory management, and that this coupling is the reason DTS and DMM belong in a single framework (Sections 1–4). Second, that the inference machinery derives composition-dependent properties, including dimensional range, escape classification, and representation compatibility, that emerge from constraint interaction across the program graph and cannot be replaced by per-value annotation regardless of provenance (Sections 2–3).
Third, that the unified graph enables design-time analysis, including representation fidelity diagnostics and cross-target transfer analysis, that early-erasure systems cannot provide (Sections 4–6). The posit quire case study (Section 3.5) and the forward-mode auto-differentiation analysis (Section 6.10) illustrate specific applications; the formal properties on which they depend are established in the referenced literature [7, 19, 6, 9, 3].

Acknowledgments

This paper owes a particular debt to John L. Gustafson, whose detailed correspondence on posit arithmetic, bounded posit parameterization, and domain-specific precision tuning shaped how the author thinks about representation selection. The treatment of asymmetric b-posit configurations and hardware reuse in Sections 2.6 and 6.10 reflects his influence directly.

Don Syme’s F# and its Units of Measure system are the type-theoretic substrate from which DTS draws its inference architecture. His feedback on this manuscript sharpened the framing of dimensional persistence and the relationship between annotation provenance and compilation-stage decisions.

Paul Snively provided early guidance on verification reference materials that opened a line of investigation the author would not have pursued otherwise; the formal verification aspects of the Fidelity framework research bear his mark.

Martin Coll’s work on the Inet dialect for MLIR and his ongoing engagement with the Fidelity project have been a consistent source of both technical insight and encouragement.

Software Availability

The Clef language, Composer compiler, and supporting libraries described in this paper are developed under the Fidelity Framework project. Source repositories are available at https://github.com/FidelityFramework. The language specification, design rationale, and compiler documentation are published at https://clef-lang.com.
Central components of the framework are dual-licensed; terms are detailed in each repository. All components referenced in this paper, including the DTS inference engine, escape analysis pipeline, and BAREWire interchange protocol, are under active development.

References

[1] AMD/Xilinx. MLIR-AIE: An MLIR-based toolchain for AMD AI engines, 2024. github.com/Xilinx/mlir-aie.
[2] Clark Barrett, Aaron Stump, and Cesare Tinelli. The SMT-LIB standard: Version 2.0. In Proceedings of the 8th International Workshop on Satisfiability Modulo Theories (SMT), 2010.
[3] Atılım Güneş Baydin, Barak A. Pearlmutter, Don Syme, Frank Wood, and Philip Torr. Gradients without backpropagation. arXiv preprint arXiv:2202.08587, 2022.
[4] Edwin Brady. Idris, a general-purpose dependently typed programming language: Design and implementation. Journal of Functional Programming, 23(5), 2013.
[5] Martin Coll. Inet dialect: Declarative rewrite rules for interaction nets. MLIR Open Design Meeting, 2025. April 10, 2025. University of Buenos Aires.
[6] John L. Gustafson. Every Bit Counts: Posit Computing. Chapman & Hall/CRC Computational Science. CRC Press, 2024.
[7] John L. Gustafson and Isaac T. Yonemoto. Beating floating point at its own game: Posit arithmetic. Supercomputing Frontiers and Innovations, 4(2), 2017.
[8] Houston Haynes. The program hypergraph: Multi-way relational structure for geometric algebra, spatial compute, and physics-aware compilation, 2026. Companion paper. SpeakEZ Technologies.
[9] Aditya Anirudh Jonnalagadda, Rishi Thotli, and John L. Gustafson. Closing the gap between float and posit hardware efficiency. arXiv preprint arXiv:2603.01615, 2025.
[10] Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and Derek Dreyer. RustBelt: Securing the foundations of the Rust programming language.
In Proceedings of the 45th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2018.
[11] Byeongjee Kang, Harsh Desai, Limin Jia, and Brandon Lucia. WAMI: Compilation to WebAssembly through MLIR without losing abstraction. arXiv preprint arXiv:2506.16048, 2025.
[12] George Karypis and Vipin Kumar. Multilevel k-way hypergraph partitioning. VLSI Design, 11(3):285–300, 2000.
[13] Andrew Kennedy. Types for units-of-measure: Theory and practice. In Central European Functional Programming School, volume 6299 of LNCS. Springer, 2009.
[14] Yves Lafont. Interaction nets. Proceedings of the 17th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages 95–108, 1990.
[15] Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko. MLIR: Scaling compiler infrastructure for domain specific computation. In Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2021.
[16] Daan Leijen. Koka: Programming with row polymorphic effect types. In Proceedings of the 5th Workshop on Mathematically Structured Functional Programming (MSFP), 2014.
[17] Ulf Norell. Towards a practical programming language based on dependent type theory. PhD thesis, Chalmers University of Technology, 2007.
[18] Tomas Petricek, Dominic Orchard, and Alan Mycroft. Coeffects: A calculus of context-dependent computation. In Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming (ICFP), 2014.
[19] Posit Working Group. Standard for posit arithmetic (2022), 2022. posithub.org.
[20] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis.
Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
[21] Alejandro Rico, Saurabh Pareek, Javier Cabezas, David Clarke, et al. AMD XDNA NPU in Ryzen AI processors. IEEE Micro, 44(6):73–83, 2024.
[22] Dipanwita Sarkar, Oscar Waddell, and R. Kent Dybvig. A nanopass infrastructure for compiler education. In Proceedings of the Ninth ACM SIGPLAN International Conference on Functional Programming (ICFP ’04), pages 201–212. ACM, 2004.
[23] Matthias Schabel and Steven Watanabe. Boost.units: Zero-overhead dimensional analysis and unit/quantity manipulation and conversion, 2008. Boost C++ Libraries.
[24] Justin Slepak, Panagiotis Manolios, and Olin Shivers. Rank polymorphism viewed as a constraint problem. In Proceedings of the 5th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming (ARRAY@PLDI), 2018.
[25] Justin Slepak, Olin Shivers, and Panagiotis Manolios. An array-oriented language with static rank polymorphism. In Proceedings of the 23rd European Symposium on Programming (ESOP), volume 8410 of LNCS, pages 27–46. Springer, 2014.
[26] Nikhil Swamy, Cătălin Hriţcu, Chantal Keller, Aseem Rastogi, Antoine Delignat-Lavaud, Simon Forest, Karthikeyan Bhargavan, Cédric Fournet, Pierre-Yves Strub, Markulf Kohlweiss, Jean-Karim Zinzindohoué, and Santiago Zanella-Béguelin. Dependent types and multi-monadic effects in F*. In Proceedings of the 43rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2016.

A DTS Inference Example

Consider the following unannotated Clef function:

    let computeForce mass1 mass2 distance =
        let g = 6.674e-11
        g * mass1 * mass2 / (distance * distance)

The DTS inference proceeds as follows:

1. g is assigned dimension variable 'd_g.
2. mass1 is assigned 'd_m1, mass2 is assigned 'd_m2.
3. distance is assigned 'd_dist.
4. g * mass1 generates constraint: d(result_1) = 'd_g + 'd_m1.
5. result_1 * mass2 generates constraint: d(result_2) = 'd_g + 'd_m1 + 'd_m2.
6. distance * distance generates constraint: d(denom) = 2 · 'd_dist.
7. result_2 / denom generates constraint: d(return) = 'd_g + 'd_m1 + 'd_m2 − 2 · 'd_dist.

At this point, the function is dimensionally polymorphic: it accepts any combination of dimensions that satisfies the algebraic constraints. If the function is called with mass1 : float, mass2 : float, distance : float, where the arguments carry the dimensions kg, kg, and m respectively, unification resolves:

• 'd_m1 = kg, 'd_m2 = kg, 'd_dist = m
• 'd_g = $\mathrm{m^{3} \cdot kg^{-1} \cdot s^{-2}}$ (inferred from the known value of the gravitational constant, or from the return type if that is dimensionally annotated)
• Return dimension: $\mathrm{m^{3} \cdot kg^{-1} \cdot s^{-2}} + \mathrm{kg} + \mathrm{kg} - 2 \cdot \mathrm{m} = \mathrm{kg \cdot m \cdot s^{-2}}$ = newtons ✓

The inference is complete without any dimensional annotations in the source code.

B Escape Analysis and Restructuring Example

Consider:

    let processReadings (sensors: Span<float>) =
        let readings = sensors |> Span.map (fun s -> s * calibrationFactor)
        let summary = summarize readings
        (readings, summary)

The coeffect analysis determines:

1. readings is created from a Span.map operation. Tentative lifetime: lexical scope of processReadings.
2. readings is used in summarize readings. Required lifetime: lexical scope of processReadings. No promotion needed for this usage.
3. readings appears in the return tuple (readings, summary). Required lifetime: caller's scope. This exceeds the tentative lifetime.
4. Promotion: readings lifetime is promoted from stack (lexical scope) to arena (caller's scope).

The language server surfaces the promotion and proposes three alternatives:

Alternative 1: Caller-provided buffer.

    let processReadings (sensors: Span<float>) (output: Span<float>) =
        sensors |> Span.mapInto output (fun s -> s * calibrationFactor)
        summarize output

Coeffect: no escape, stack-eligible. Allocation cost: zero (caller owns the buffer).

Alternative 2: Continuation style.

    let processReadings (sensors: Span<float>) (k: Span<float> -> Summary -> 'a) =
        let readings = sensors |> Span.map (fun s -> s * calibrationFactor)
        k readings (summarize readings)

Coeffect: no escape, stack-eligible. Allocation cost: zero (continuation runs within frame).

Alternative 3: Explicit annotation.

    let processReadings [] (sensors: Span<float>) =
        let readings = sensors |> Span.map (fun s -> s * calibrationFactor)
        let summary = summarize readings
        (readings, summary)

Coeffect: declared arena allocation. Allocation cost: arena allocation (amortized). PSG annotation: confirmed intent, stable under dependency changes.

C Representation Selection with Posit Arithmetic

Consider a gravitational force computation compiled for two targets: x86_64 (CPU) and a Xilinx FPGA with a posit arithmetic pipeline.

    let computeForce (m1: float) (m2: float) (r: float) : float =
        let g = 6.674e-11
        g * m1 * m2 / (r * r)

The DTS inference resolves the return dimension as newtons ($\mathrm{kg \cdot m \cdot s^{-2}}$). The compiler's representation selection proceeds per target:

x86_64 target. The platform binding specifies IEEE 754 float64 as the default numeric representation. The dimensional range of the gravitational constant ($6.674 \times 10^{-11}$) combined with plausible mass and distance ranges (planetary: $10^{22}$ to $10^{30}$ kg, $10^{6}$ to $10^{11}$ m) produces force values spanning roughly $10^{-2}$ to $10^{25}$ newtons. IEEE 754 float64 covers this range with uniform relative error of $\approx 1.11 \times 10^{-16}$, well within engineering precision. Selection: float64.

Xilinx FPGA target. The platform binding specifies posit32 (es = 2) as the preferred representation.
The dynamic range of posit32 extends to approximately $10^{\pm 36}$. The dimensional range $[10^{-2}, 10^{25}]$ newtons falls well within this bound. Posit32 with es = 2 provides approximately $2^{-27}$ relative error near 1.0, degrading to $2^{-8}$ at the regime extremes. For forces near $10^{0}$ newtons (the most common case in n-body simulation), posit32 provides better precision than float32 and comparable precision to float64.

The compiler selects posit32 for the FPGA target and emits the force computation into the posit arithmetic pipeline: regime extraction, fraction multiplication in DSP48 slices, accumulation in the quire. The quire persists for exactly the duration of the accumulation loop, a 512-bit value in the FPGA fabric.

The language server displays the cross-target resolution:

    computeForce: float -> float -> float -> float
    +-- x86_64: float64 -> float64 -> float64 -> float64
    |     Precision: 1.11e-16 relative error (uniform)
    |     Quire: not used (no accumulation loop detected)
    +-- xilinx: posit32 -> posit32 -> posit32 -> posit32
    |     Precision: ~1.5e-9 in [0.01, 100], ~3.9e-3 at regime extremes
    |     Quire: available, 512-bit fabric pipeline
    |     Dynamic range: [1e-36, 1e36] covers [1e-2, 1e25]
    +-- Transfer (xilinx -> x86_64): posit32 -> float64
          Protocol: BAREWire over PCIe
          Fidelity: 1.0 (lossless; float64 range exceeds posit32 range)

The cross-target transfer fidelity of 1.0 (lossless) is a consequence of the dimensional analysis: every posit32 value within its representable range is exactly representable in float64, which covers $10^{\pm 308}$ and whose 52-bit fraction exceeds the at most 27 fraction bits of a posit32. The compiler proves this at compile time from the representation specifications. A transfer in the opposite direction (float64 -> posit32) would show fidelity < 1.0 with a precision loss estimate derived from the dimensional range.
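The lossless-transfer claim can be spot-checked outside the compiler. The following is a minimal Python sketch, not the paper's proof procedure: it decodes posit32 (es = 2) bit patterns into exact rationals and tests whether conversion to a Python float (an IEEE 754 float64) preserves the value. The function names `decode_posit32` and `is_lossless_in_float64` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction

ES = 2  # exponent field width assumed for posit32, as in the text


def decode_posit32(bits):
    """Exact value of a posit32 (es=2) bit pattern as a Fraction; None for NaR."""
    bits &= 0xFFFFFFFF
    if bits == 0:
        return Fraction(0)
    if bits == 0x80000000:
        return None  # NaR (not a real)
    negative = (bits >> 31) == 1
    if negative:
        bits = (-bits) & 0xFFFFFFFF  # negative posits decode via two's complement
    body = format(bits & 0x7FFFFFFF, "031b")  # the 31 bits after the sign
    first = body[0]
    run = len(body) - len(body.lstrip(first))  # regime run length
    k = run - 1 if first == "1" else -run
    rest = body[run + 1:]  # skip the regime terminator bit, if present
    exponent = int(rest[:ES].ljust(ES, "0"), 2)  # truncated exponent bits read as 0
    frac_bits = rest[ES:]
    frac = Fraction(int(frac_bits, 2), 1 << len(frac_bits)) if frac_bits else Fraction(0)
    value = (1 + frac) * Fraction(2) ** (k * (1 << ES) + exponent)
    return -value if negative else value


def is_lossless_in_float64(bits):
    """True if the posit32 value survives conversion to float64 unchanged."""
    value = decode_posit32(bits)
    return value is None or Fraction(float(value)) == value


# Edge cases: smallest and largest magnitudes, 1.0, 2.0, and the most negative value.
samples = [0x00000001, 0x7FFFFFFF, 0x40000000, 0x48000000, 0x80000001, 0x40000001]
print(all(is_lossless_in_float64(b) for b in samples))
```

A sampled check of this kind does not replace the compile-time argument from the representation specifications; it only illustrates why the argument holds. The largest posit32 magnitude decodes to 2^120, and the longest fraction field is 27 bits, both comfortably inside float64.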
This example illustrates the full DTS+DMM pipeline for posit arithmetic: dimensional inference determines the value range, representation selection chooses the numeric format per target, the quire's allocation and lifetime are resolved as coeffects, and the language server presents the complete picture as an interactive design-time diagnostic.
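The first two steps of this pipeline can be sketched in Python. This is an illustration under stated assumptions rather than the compiler's implementation: it shows only the exponent-vector arithmetic behind the Appendix A constraints (no unification over dimension variables) and a simple range-coverage test. The names `Dim` and `covers` are hypothetical, the posit32 bounds are taken as 2^-120 and 2^120, and the force range is the one quoted in the text.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    """Exponent vector over the base dimensions (m, kg, s); illustrative only."""
    m: int = 0
    kg: int = 0
    s: int = 0

    def __mul__(self, other):
        # Multiplying quantities adds exponent vectors (the abelian group operation).
        return Dim(self.m + other.m, self.kg + other.kg, self.s + other.s)

    def __truediv__(self, other):
        # Dividing quantities subtracts exponent vectors (the group inverse).
        return Dim(self.m - other.m, self.kg - other.kg, self.s - other.s)


G = Dim(m=3, kg=-1, s=-2)       # dimension of the gravitational constant
KG = Dim(kg=1)
M = Dim(m=1)
NEWTON = Dim(m=1, kg=1, s=-2)

# Mirrors g * mass1 * mass2 / (distance * distance) from Appendix A.
force_dim = G * KG * KG / (M * M)


def covers(format_range, value_range):
    """True if every value in value_range lies within the format's dynamic range."""
    lo, hi = format_range
    vlo, vhi = value_range
    return lo <= vlo and vhi <= hi


POSIT32_RANGE = (2.0 ** -120, 2.0 ** 120)                   # assumed posit32 (es=2) bounds
FLOAT64_RANGE = (sys.float_info.min, sys.float_info.max)    # normal float64 bounds
FORCE_RANGE = (1e-2, 1e25)                                  # newtons, as quoted in the text

print(force_dim == NEWTON)
print(covers(POSIT32_RANGE, FORCE_RANGE), covers(FLOAT64_RANGE, FORCE_RANGE))
```

Representing dimensions as integer exponent vectors keeps each constraint a linear equation over the integers, which is what makes the additive notation in Appendix A natural.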
