Relating formal grammars is a hard problem that balances between language equivalence (which is known to be undecidable) and grammar identity (which is trivial). In this paper, we investigate several milestones between those two extremes and propose a methodology for inconsistency management in grammar engineering. While conventional grammar convergence is a practical approach relying on human experts to encode differences as transformation steps, guided grammar convergence is a more narrowly applicable technique that infers such transformation steps automatically by normalising the grammars and establishing a structural equivalence relation between them. This allows us to perform a case study on automatically inferring bidirectional transformations between 11 grammars (in a broad sense) of the same artificial functional language: parser specifications with different combinator libraries, definite clause grammars, concrete syntax definitions, algebraic data types, metamodels, XML schemata, and object models.
# Introduction
Modern grammar theory has shifted its focus from general-purpose
programming languages to a broader scope of software languages that
comprise programming languages, domain-specific languages, markup
languages, API libraries, interaction protocols, etc. Such software
languages are specified by grammars in a broad sense that still rely
on the familiar infrastructure of terminals, nonterminals and production
rules, but specify a general commitment to grammatical structure found in
software systems. In that sense, a type-safe program commits to a
particular type system; a program that uses a library commits to using
its exposed interface; an XML document commits to the structure defined
by its schema. Failure to commit in any of these cases leads to
errors in the interpretation of the language entity. These and many other
scenarios can be expressed and resolved in terms of grammar technology,
but not all structural commitments profit from a grammatical approach:
indentation policies and naming conventions are the most notably
problematic examples.
One long-known problem of multiple implementations of the same language
is having both an abstract syntax definition and a concrete syntax
definition. The abstract syntax
defines the kinds of entities that inhabit the language and must be
handled by the semantics specification. A concrete syntax shows how to write
down language entities and how to read them back. It is not uncommon for
a programming language to have several possible concrete syntaxes: for
example, any binary operation may use prefix, infix or postfix notation,
without any changes to the language semantics. Indeed, we have seen
infix dialects of postfix Forth (Forthwrite, InfixForth) and prefix
dialects of infix REBOL (Boron). For software languages, the problem is
broader: we can speak of one intended language specification and a
variety of abstract and concrete syntaxes, data models, class
dictionaries, metamodels, ontologies and similar contracts that conform
to it.
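The point about notations can be made concrete with a minimal sketch. The names `Num`, `BinOp`, `prefix`, `infix`, `postfix` and `evaluate` below are hypothetical and not taken from the paper; the sketch only illustrates one abstract syntax rendered in three concrete notations while the semantics stays untouched.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    """Abstract syntax: an integer literal."""
    value: int


@dataclass(frozen=True)
class BinOp:
    """Abstract syntax: a binary operation over two subexpressions."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]


def prefix(e: Expr) -> str:
    """Concrete syntax with the operator before its operands."""
    if isinstance(e, Num):
        return str(e.value)
    return f"{e.op} {prefix(e.left)} {prefix(e.right)}"


def infix(e: Expr) -> str:
    """Concrete syntax with the operator between its operands."""
    if isinstance(e, Num):
        return str(e.value)
    return f"({infix(e.left)} {e.op} {infix(e.right)})"


def postfix(e: Expr) -> str:
    """Concrete syntax with the operator after its operands."""
    if isinstance(e, Num):
        return str(e.value)
    return f"{postfix(e.left)} {postfix(e.right)} {e.op}"


def evaluate(e: Expr) -> int:
    """Semantics is defined on the abstract syntax only."""
    if isinstance(e, Num):
        return e.value
    left, right = evaluate(e.left), evaluate(e.right)
    return left + right if e.op == "+" else left * right


tree = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
print(prefix(tree))    # + 1 * 2 3
print(infix(tree))     # (1 + (2 * 3))
print(postfix(tree))   # 1 2 3 * +
print(evaluate(tree))  # 7
```

Since `evaluate` never inspects the printed form, adding a fourth notation would require a new printer (and parser) but no change to the semantics.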
Our definition of the intended language relies on bidirectional
transformations and in particular on the notation introduced for them by
Meertens, which we restate here for the sake of completeness and clarity:
For a relation
$`R \subseteq S \times T`$, a semi-maintainer is a function
$`\updr:S \times T \to T`$, such that
$`\forall x\in S, \forall y \in T, \langle x, x \updr y \rangle \in R`$,
and
$`\forall x\in S, \forall y \in T, \langle x, y \rangle \in R \Rightarrow x \updr y = y`$.
The first property is called correctness and ensures that the update
caused by the semi-maintainer restores the relation. The second property
is hippocraticness and states that an update has no effect (“does no
harm”) if the original pair is already in the relation. Other
properties of bidirectional transformations such as undoability are
often unachievable. A maintainer is a pair of semi-maintainers
$`\updr`$ and $`\updl`$. A bidirectional mapping is a relation and its
maintainer.
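As an illustration of these definitions, the following sketch checks correctness and hippocraticness by brute force on a small finite example. Everything in it is an assumption made for illustration: the sets `SOURCES` and `TARGETS`, the relation `related`, and the function names do not come from the paper. The properties of the left semi-maintainer are taken to be the mirror image of the ones stated above for the right one.

```python
from itertools import product

# Hypothetical finite samples of S and T, chosen only for illustration.
SOURCES = [0, 1, 2]
TARGETS = [(n, tag) for n in (0, 1, 2) for tag in ("p", "q")]


def related(x, y):
    """R: an integer x is related to any pair y whose first component is x."""
    return y[0] == x


def upd_right(x, y):
    """Semi-maintainer S x T -> T: overwrite the number, keep the tag."""
    return (x, y[1])


def upd_left(x, y):
    """Semi-maintainer S x T -> S: read the number back from the pair."""
    return y[0]


def upd_right_lossy(x, y):
    """Restores the relation but always resets the tag."""
    return (x, "")


def correct_right(upd):
    """The updated target is always related to the source."""
    return all(related(x, upd(x, y)) for x, y in product(SOURCES, TARGETS))


def hippocratic_right(upd):
    """A target that is already related to the source is left unchanged."""
    return all(upd(x, y) == y
               for x, y in product(SOURCES, TARGETS) if related(x, y))


def correct_left(upd):
    """Mirror of correct_right for the S-valued semi-maintainer."""
    return all(related(upd(x, y), y) for x, y in product(SOURCES, TARGETS))


def hippocratic_left(upd):
    """Mirror of hippocratic_right for the S-valued semi-maintainer."""
    return all(upd(x, y) == x
               for x, y in product(SOURCES, TARGETS) if related(x, y))


print(correct_right(upd_right), hippocratic_right(upd_right))
print(correct_left(upd_left), hippocratic_left(upd_left))
print(correct_right(upd_right_lossy), hippocratic_right(upd_right_lossy))
```

The pair `(upd_right, upd_left)` passes all four checks on this sample and therefore behaves as a maintainer for `related` there. `upd_right_lossy` is correct but not hippocratic, because it discards the tag even when the pair is already in the relation.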
A grammar $`G`$
conforms to the language intended by the master grammar $`M`$ if
there exists a bidirectional mapping between instances of their
languages.
\begin{align*}
G \models L(M) \iff\:
&\exists R \subseteq L(G) \times L(M)\\
&\exists \updr:L(G)\times L(M) \to L(M)\\
&\exists \updl:L(G)\times L(M) \to L(G)
\end{align*}
Naturally, $`G\models L(G)`$ holds for any grammar $`G`$.
For example, consider a concrete syntax $`G_c`$ of a programming
language used by programmers and an abstract syntax $`M=G_a`$ used by a
software reengineering tool. We would need to produce abstract syntax
trees from parse trees and to propagate changes made by the reengineering
tool back to parse trees. If such mappings can be constructed (examples of
suitable algorithms have been published), then $`G_c`$ conforms to the language
intended by $`G_a`$. As another example, consider an object model used
in a tool that stores its objects in an external database (XML or
relational): the existence of a bidirectional mapping between entries
(trees or tables) in the database and the objects in memory means that
they represent the same intended language, even though they use very
different ways to describe it and one may be a superlanguage of the
other. For a more detailed formalisation and discussion of bidirectional
mappings and grammars, the reader is referred elsewhere.
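The concrete/abstract example can be sketched in a few lines. The sketch below is not the algorithm from the cited literature: it assumes a toy concrete syntax of a single assignment `name = value` with free layout, an abstract syntax that is a pair of strings, and hypothetical function names. It also assumes that abstract names and values consist of word characters only.

```python
import re

# Hypothetical concrete syntax: one assignment "name = value" with free layout.
ASSIGNMENT = re.compile(r"(\s*)(\w+)(\s*=\s*)(\w+)(\s*)")


def parse(concrete):
    """Map a concrete string to its abstract pair (name, value)."""
    m = ASSIGNMENT.fullmatch(concrete)
    if m is None:
        raise ValueError(f"not an assignment: {concrete!r}")
    return (m.group(2), m.group(4))


def related(concrete, abstract):
    """R: a string is related to the pair it parses to."""
    return parse(concrete) == abstract


def to_abstract(concrete, abstract):
    """Restore R by recomputing the abstract side from the concrete side."""
    return parse(concrete)


def to_concrete(concrete, abstract):
    """Restore R by changing the concrete side while keeping its layout."""
    m = ASSIGNMENT.fullmatch(concrete)
    if m is None:
        raise ValueError(f"not an assignment: {concrete!r}")
    lead, _, separator, _, trail = m.groups()
    name, value = abstract
    return f"{lead}{name}{separator}{value}{trail}"


source = "  x  =  1 "
edited = to_concrete(source, ("x", "42"))  # the tool changed the value
print(repr(edited))                        # '  x  =  42 '
print(related(edited, ("x", "42")))        # True
print(to_concrete(source, ("x", "1")) == source)  # True: nothing to repair
```

The layout captured in `lead`, `separator` and `trail` exists only on the concrete side, which is why the update towards the concrete side takes the old concrete string as an argument instead of pretty-printing the abstract pair from scratch.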
Roadmap. The following sections briefly present these
milestones of relationships between languages:
§7. Grammar identity: structural
equality of grammars
§14. *Nominal eq