Guided Grammar Convergence


📝 Original Info

  • Title: Guided Grammar Convergence
  • ArXiv ID: 1503.08476
  • Date: 2015-03-31
  • Authors: Vadim Zaytsev

📝 Abstract

Relating formal grammars is a hard problem that balances between language equivalence (which is known to be undecidable) and grammar identity (which is trivial). In this paper, we investigate several milestones between those two extremes and propose a methodology for inconsistency management in grammar engineering. While conventional grammar convergence is a practical approach relying on human experts to encode differences as transformation steps, guided grammar convergence is a more narrowly applicable technique that infers such transformation steps automatically by normalising the grammars and establishing a structural equivalence relation between them. This allows us to perform a case study of automatically inferring bidirectional transformations between 11 grammars (in a broad sense) of the same artificial functional language: parser specifications with different combinator libraries, definite clause grammars, concrete syntax definitions, algebraic data types, metamodels, XML schemata, object models.

📄 Full Content

# Introduction

Modern grammar theory has shifted its focus from general-purpose programming languages to a broader scope of software languages that comprise programming languages, domain-specific languages, markup languages, API libraries, interaction protocols, etc. Such software languages are specified by grammars in a broad sense that still rely on the familiar infrastructure of terminals, nonterminals and production rules, but specify a general commitment to grammatical structure found in software systems. In that sense, a type-safe program commits to a particular type system; a program that uses a library commits to using its exposed interface; an XML document commits to the structure defined by its schema. Failure to commit in any of these cases would mean errors in the interpretation of the language entity. These, and many other, scenarios can be expressed and resolved in terms of grammar technology, but not all structural commitments profit from a grammatical approach (the most remarkably problematic ones being indentation policies and naming conventions).

One of the problems of multiple implementations of the same language, which has been known for many years, is having both an abstract syntax definition and a concrete syntax definition. Basically, the abstract syntax defines the kinds of entities that inhabit the language and must be handled by the semantics specification. A concrete syntax shows how to write down language entities and how to read them back. It is not uncommon for a programming language to have several possible concrete syntaxes: for example, any binary operation may use prefix, infix or postfix notation, without any changes to the language semantics. Indeed, we have seen infix dialects of postfix Forth (Forthwrite, InfixForth) and prefix dialects of infix REBOL (Boron). For software languages, the problem is broader: we can speak of one intended language specification and a variety of abstract and concrete syntaxes, data models, class dictionaries, metamodels, ontologies and similar contracts that conform to it.

Our definition of the intended language relies on bidirectional transformations, and in particular on their notation by Meertens, which we redefine here for the sake of completeness and clarity:

For a relation $`R \subseteq S \times T`$, a semi-maintainer is a function $`\updr:S \times T \to T`$, such that $`\forall x\in S, \forall y \in T, \langle x, x \updr y \rangle \in R`$, and $`\forall x\in S, \forall y \in T, \langle x, y \rangle \in R \Rightarrow x \updr y = y`$.

The first property is called correctness and ensures that the update caused by the semi-maintainer restores the relation. The second property is hippocraticness and states that an update has no effect (“does no harm”) if the original pair is already in the relation. Other properties of bidirectional transformations, such as undoability, are often unachievable. A maintainer is a pair of semi-maintainers $`\updr`$ and $`\updl`$. A bidirectional mapping is a relation together with its maintainer.
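The two properties can be checked mechanically on a small finite domain. The following is a minimal sketch, not taken from the paper: the relation `in_relation`, the two candidate update functions and the helper names are all hypothetical and chosen only for illustration. It assumes $`S = T`$ is a small set of integers and that $`R`$ relates numbers differing by at most 1.

```python
from itertools import product
from typing import Callable, Iterable

# Hypothetical toy relation R: x and y are related when they differ by
# at most 1. It is deliberately not functional, so several y values can
# be consistent with the same x.
def in_relation(x: int, y: int) -> bool:
    return abs(x - y) <= 1

SemiMaintainer = Callable[[int, int], int]

def keep_if_consistent(x: int, y: int) -> int:
    """Leave y untouched when the pair is consistent, otherwise repair it."""
    return y if in_relation(x, y) else x

def always_overwrite(x: int, y: int) -> int:
    """Restores the relation, but discards y even when it was already fine."""
    return x

def is_correct(upd: SemiMaintainer, xs: Iterable[int], ys: Iterable[int]) -> bool:
    # Correctness: the pair (x, upd(x, y)) is always in R.
    return all(in_relation(x, upd(x, y)) for x, y in product(xs, ys))

def is_hippocratic(upd: SemiMaintainer, xs: Iterable[int], ys: Iterable[int]) -> bool:
    # Hippocraticness: if (x, y) is already in R, the update returns y.
    return all(upd(x, y) == y for x, y in product(xs, ys) if in_relation(x, y))

DOMAIN = range(5)  # assumed finite stand-in for both S and T

print(is_correct(keep_if_consistent, DOMAIN, DOMAIN))      # True
print(is_hippocratic(keep_if_consistent, DOMAIN, DOMAIN))  # True
print(is_correct(always_overwrite, DOMAIN, DOMAIN))        # True
print(is_hippocratic(always_overwrite, DOMAIN, DOMAIN))    # False
```

Under these assumptions, `always_overwrite` is correct but not hippocratic: for the consistent pair (0, 1) it returns 0 instead of leaving 1 in place, so only `keep_if_consistent` qualifies as a semi-maintainer.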

A grammar $`G`$ conforms to the language intended by the master grammar $`M`$, if there exists a bidirectional mapping between instances of their languages.

\begin{align*}
            G \models L(M) \iff\:
            &\exists R \subseteq L(G) \times L(M)\\
            &\exists \updr:L(G)\times L(M) \to L(M)\\
            &\exists \updl:L(G)\times L(M) \to L(G)
\end{align*}

Naturally, $`G\models L(G)`$ holds for any grammar $`G`$.

For example, consider a concrete syntax $`G_c`$ of a programming language used by programmers and an abstract syntax $`M=G_a`$ used by a software reengineering tool. We would need $`\updr`$ to produce abstract syntax trees from parse trees and $`\updl`$ to propagate changes made by the reengineering tool back to parse trees. If those can be constructed (examples of algorithms have been seen), then $`G_c`$ conforms to the language intended by $`G_a`$. As another example, consider an object model used in a tool that stores its objects in an external database (XML or relational): the existence of a bidirectional mapping between entries (trees or tables) in the database and the objects in memory means that they represent the same intended language, even though they use very different ways to describe it and one may be a superlanguage of the other. For a more detailed formalisation and discussion of bidirectional mappings and grammars, the reader is referred elsewhere.
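The concrete/abstract syntax scenario can be illustrated with a deliberately tiny language. The sketch below is an assumption-laden toy rather than the construction used in the paper: the "concrete syntax" is a string of integers separated by `+` with free whitespace, the "abstract syntax" is a non-empty list of integers, and the names `parse`, `pretty`, `related`, `upd_r` and `upd_l` are hypothetical.

```python
from typing import List

def parse(text: str) -> List[int]:
    """Concrete to abstract: split on '+'; int() tolerates surrounding spaces."""
    return [int(tok) for tok in text.split("+")]

def pretty(tree: List[int]) -> str:
    """Abstract to concrete, using one fixed layout (tree must be non-empty)."""
    return " + ".join(str(n) for n in tree)

def related(text: str, tree: List[int]) -> bool:
    """The relation R: the text parses to exactly this tree."""
    return parse(text) == tree

def upd_r(text: str, tree: List[int]) -> List[int]:
    """Forward semi-maintainer: rebuild the abstract tree from the text."""
    return parse(text)

def upd_l(text: str, tree: List[int]) -> str:
    """Backward semi-maintainer: keep the user's layout if still consistent,
    otherwise regenerate the text from the (changed) tree."""
    return text if related(text, tree) else pretty(tree)

print(upd_r("1+2", [9]))      # [1, 2]
print(upd_l("1+2", [1, 2]))   # 1+2      (layout preserved)
print(upd_l("1+2", [1, 3]))   # 1 + 3    (regenerated after a tree edit)
```

Both functions restore the relation, and neither changes its target when the pair is already consistent. In particular, `upd_l` keeps the original spacing of `"1+2"` instead of normalising it, which is the kind of behaviour the hippocraticness property asks for.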

Roadmap. In the following sections, we will briefly present the following milestones of relationships between languages:

§7. Grammar identity: structural equality of grammars

§14. *Nominal eq

Reference

This content is AI-processed based on open access ArXiv data.
