On Decision-Valued Maps and Representational Dependence
A computational engine applied to different representations of the same data can produce different discrete outcomes, with some representations preserving the result and others changing it entirely. A decision-valued map records which representations preserve the outcome and which change it, associating each member of a declared representation family with the discrete result it produces. This paper formalizes decision-valued maps and describes DecisionDB, an infrastructure that logs, replays, and audits these relationships using identifiers computed from content and artifacts stored in write-once form. Deterministic replay recovers each recorded decision identifier exactly from stored artifacts, with all three identifying fields matching their persisted values. The contribution partitions representation space into persistence regions and boundaries, and treats decision reuse as a mechanically checkable condition.
💡 Research Summary
The paper introduces a formalism called a Decision‑Valued Map (DVM) to capture how different deterministic representations of the same underlying data affect the discrete outcome produced by a fixed computational engine. A DVM is a function f : R → D that maps each representation r in a family R (generated from a frozen snapshot s by varying explicit parameters) to a decision identity d in a set D, after the engine E processes r and an equivalence policy π reduces the raw output to an identifier. The authors define three structural concepts: persistence regions (connected subsets of R where f is constant), boundaries (where f changes), and fractures (boundaries where an arbitrarily small parameter change is enough to flip the decision).
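To make the structure concrete, here is a minimal sketch of a DVM over a one-dimensional parameter grid, together with a scan for persistence regions. The function names (`decision_valued_map`, `persistence_regions`) and the toy engine/policy are illustrative, not DecisionDB's actual API:

```python
def decision_valued_map(params, make_representation, engine, policy):
    """Sketch of f : R -> D. Each parameter value indexes a representation r
    in the family R; the engine's raw output is reduced to a decision
    identity by the equivalence policy pi."""
    return {p: policy(engine(make_representation(p))) for p in params}

def persistence_regions(f_map):
    """Split a sorted parameter grid into maximal runs where the decision is
    constant; the gap between two runs is a boundary of the map."""
    items = sorted(f_map.items())
    regions, current = [], [items[0]]
    for prev, cur in zip(items, items[1:]):
        if cur[1] == prev[1]:
            current.append(cur)          # same decision: region continues
        else:
            regions.append(current)      # decision changed: boundary here
            current = [cur]
    regions.append(current)
    return regions
```

On a toy grid where the policy flips from decision "A" to "B" at a threshold, this yields exactly two persistence regions separated by one boundary.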
To materialize and audit such maps, the authors built DecisionDB, a Python‑SQLite package that stores every artifact (snapshot, representation, engine run, policy, decision) as an immutable, content‑addressed object. Identifiers are derived from a canonical JSON serialization and a truncated SHA‑256 hash, guaranteeing that identical content always yields the same ID regardless of when or where it is computed. The relational schema consists of five tables (snapshots, representations, engine_runs, decisions, f_map) linked by foreign‑key constraints, forming a complete provenance chain from the original world state to the final decision. All writes are append‑only and idempotent.
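The identifier scheme described above can be sketched in a few lines: canonical JSON (sorted keys, no whitespace) fed into a truncated SHA‑256, plus an `INSERT OR IGNORE` on the content-derived primary key to make writes append-only and idempotent. The table shape and helper names here are assumptions for illustration, not DecisionDB's actual schema:

```python
import hashlib
import json
import sqlite3

def content_id(payload: dict, length: int = 16) -> str:
    """Content-addressed ID: canonical JSON + truncated SHA-256. Sorting keys
    and stripping whitespace makes the ID independent of dict ordering and of
    where or when it is computed."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE decisions (decision_id TEXT PRIMARY KEY, payload TEXT)")

def put(con, payload: dict) -> str:
    """Append-only, idempotent write: re-inserting identical content is a
    no-op because the primary key is derived from the content itself."""
    did = content_id(payload)
    con.execute("INSERT OR IGNORE INTO decisions VALUES (?, ?)",
                (did, json.dumps(payload, sort_keys=True)))
    return did
```

Because the ID is a pure function of content, two writers on different machines assign the same identifier to the same artifact, which is what makes the provenance chain joinable across tables.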
The “sweep protocol” operationalizes the DVM construction: (1) freeze a snapshot and assign a content‑addressed ID; (2) declare a representation family via a deterministic factory and define a parameter grid; (3) create a sweep plan that includes the engine configuration and the equivalence policy; (4) run the engine independently for each representation, persisting raw outputs as immutable files; (5) apply the policy to extract decision IDs and populate the f_map table. Because every step is content‑addressed, a replay verification step can reload the stored raw output and policy, recompute the three identifiers (policy ID, payload hash, decision ID), and confirm they match the persisted values. Successful replay proves the entire pipeline is deterministic and self‑consistent.
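The replay step can be sketched as recomputing the three identifiers from the stored raw output and policy, then comparing each against its persisted value. The record field names and ID construction below are assumptions about how such a check might look, not the package's verified format:

```python
import hashlib
import json

def content_id(obj, n: int = 16) -> str:
    """Truncated SHA-256 of canonical JSON (assumed ID scheme)."""
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()[:n]

def verify_replay(record, raw_output: bytes, policy_spec, policy_fn):
    """Recompute policy ID, payload hash, and decision ID from stored
    artifacts and compare them with the persisted values."""
    policy_id = content_id(policy_spec)
    payload_hash = hashlib.sha256(raw_output).hexdigest()[:16]
    decision_id = content_id({"policy_id": policy_id,
                              "decision": policy_fn(raw_output)})
    return {
        "policy_id": policy_id == record["policy_id"],
        "payload_hash": payload_hash == record["payload_hash"],
        "decision_id": decision_id == record["decision_id"],
    }
```

A replay passes only when all three comparisons hold, which is the self-consistency condition the paper's verification step checks.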
The authors demonstrate the approach on a graph‑routing task. A directed graph with 564 nodes and fixed edge attributes serves as the snapshot. Two representation parameters control edge‑cost construction: a “neighbor weight” and a “second‑order weight”. The engine is a Dijkstra shortest‑path solver, and the policy hashes the ordered list of node IDs in the resulting path, thereby defining decision identities as distinct routes. Sweeping the neighbor weight from 0.5 to 1.0 leaves the path unchanged (Decision A), revealing a persistence region. Sweeping the second‑order weight from 0.25 to 0.5 causes the path to switch from a 16‑node route (Decision A) to a 14‑node route (Decision B), exposing a fracture boundary between the two parameter values. All raw outputs and decisions are stored immutably, and replay verification confirms that policy ID, payload hash, and decision ID are reproduced exactly.
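The mechanism behind the fracture can be reproduced on a toy graph: a weight parameter mixes two edge-cost components, and sweeping it flips which route Dijkstra selects, which in turn changes the path-hash decision identity. The four-node graph and cost formula below are illustrative stand-ins for the paper's 564-node snapshot:

```python
import hashlib
import heapq

def dijkstra(adj, src, dst):
    """Standard Dijkstra; returns the node list from src to dst."""
    dist, prev, pq, seen = {src: 0.0}, {}, [(0.0, src)], set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]

def route_decision(edges, w, src, dst):
    """Representation parameter w mixes two cost components per edge
    (cost = base + w * second_order); the decision identity is a hash of
    the ordered node list, as in the paper's policy."""
    adj = {}
    for u, v, base, second in edges:
        adj.setdefault(u, []).append((v, base + w * second))
    path = tuple(dijkstra(adj, src, dst))
    return path, hashlib.sha256("-".join(path).encode()).hexdigest()[:16]
```

With edges chosen so one route depends on the second-order component and the other does not, sweeping w across the crossover point switches the selected path, producing two distinct decision IDs on either side of the boundary.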
In the related‑work discussion, the paper positions DVM against specification‑curve analysis, multiverse analysis, sensitivity analysis, and reproducibility efforts in machine learning. While those methods focus on continuous effect estimates, variance decomposition, or training‑run variability, DVM uniquely addresses discrete decision outcomes and maps the topology of representation space. It can be combined with sensitivity indices (by binarizing the decision map) but does not require a continuous output metric. Moreover, the content‑addressed provenance model provides stronger auditability than typical reproducibility checklists, which often omit deterministic replay and immutable storage.
Overall, the contribution is threefold: (1) a formal definition of decision‑valued maps that partitions representation space into persistence regions and boundaries; (2) an infrastructure (DecisionDB) that records the full provenance chain using content‑addressed identifiers, enabling deterministic replay and audit; (3) an empirical illustration showing how small representation changes can either preserve or fracture decision identity. The framework is applicable to any analytical pipeline where discrete outcomes depend on representation choices—routing, classification, resource allocation, etc.—and offers a systematic way to assess decision robustness, support reuse, and satisfy regulatory or legal audit requirements.