Associative Array Model of SQL, NoSQL, and NewSQL Databases
The success of SQL, NoSQL, and NewSQL databases is a reflection of their ability to provide significant functionality and performance benefits for specific domains, such as financial transactions, internet search, and data analysis. The BigDAWG polystore seeks to provide a mechanism to allow applications to transparently achieve the benefits of diverse databases while insulating applications from the details of these databases. Associative arrays provide a common approach to the mathematics found in different databases: sets (SQL), graphs (NoSQL), and matrices (NewSQL). This work presents the SQL relational model in terms of associative arrays and identifies the key mathematical properties that are preserved within SQL. These properties include associativity, commutativity, distributivity, identities, annihilators, and inverses. Performance measurements on distributivity and associativity show the impact these properties can have on associative array operations. These results demonstrate that associative arrays could provide a mathematical model for polystores to optimize the exchange of data and execution queries.
💡 Research Summary
The paper proposes a unified mathematical framework for the three dominant database paradigms—SQL (relational), NoSQL (key‑value/graph), and NewSQL (matrix‑oriented)—by modeling them all as associative arrays. An associative array is defined as a sparse matrix whose row and column keys can be any totally ordered set (strings, numbers, etc.), not just integer indices. The authors first recast the relational model in this language: a relation becomes an associative array, and classic relational operators (selection, projection, join, union, intersection) are expressed as element‑wise addition (⊕), element‑wise multiplication (⊗), and matrix multiplication (·). They then show that graph operations used in NoSQL and linear‑algebraic operations used in NewSQL map to the same three primitives, establishing a common algebraic core. Crucially, the associative‑array algebra satisfies key algebraic properties—associativity, commutativity, distributivity, identities, annihilators, and inverses—providing a solid basis for query optimization and operator reordering across heterogeneous systems. Experimental results demonstrate that exploiting associativity and distributivity can dramatically reduce memory consumption and execution time for large‑scale matrix multiplications and element‑wise computations. The paper also describes practical integration via the D4M (Dynamic Distributed Dimensional Data Model) library and the GraphBLAS specification, showing how these tools expose associative arrays as a lingua franca for interacting with PostgreSQL, Accumulo, SciDB, and other engines. Within the BigDAWG polystore architecture, associative arrays serve as the intermediate representation that enables transparent data casting, analytic translation, and automatic routing to the most suitable engine for a given workload. The authors conclude that associative arrays not only bridge the mathematical gap among SQL, NoSQL, and NewSQL but also lay the groundwork for future work on cost modeling, transactional semantics, and security considerations in polystore environments.
Comments & Academic Discussion
Loading comments...
Leave a Comment