A Language for Generic Programming in the Large

A Language for Generic Programming in the Lar ge Jeremy G. Siek a and Andre w Lumsdaine b a Deptartment of Electrical and Computer Engineering, University of Color ado at Boulder , USA b Computer Science Department, Indiana University , USA Abstract Generic programming is an effecti v e methodology for dev eloping reusable software libraries. Many pro- gramming languages provide generics and ha ve features for describing interfaces, b ut none completely support the idioms used in generic programming. T o address this need we dev eloped the language G . The central feature of G is the concept , a mechanism for or ganizing constraints on generics that is inspired by the needs of modern C ++ libraries. G provides modular type checking and separate compilation (even of generics). These characteristics support modular software dev elopment, especially the smooth integration of independently de veloped components. In this article we present the rationale for the design of G and demonstrate the expressi veness of G with two case studies: porting the Standard T emplate Library and the Boost Graph Library from C ++ to G . The design of G shares much in common with the concept e xtension proposed for the next C ++ Standard (the authors participated in its design) but there are important dif ferences described in this article. K ey wor ds: programming language design, generic programming, generics, polymorphism, concepts, associated types, software reuse, type classes, modules, signatures, functors, virtual types 1. Introduction The 1968 N A TO Conference on Software Engineering identiﬁed a software crisis affect- ing lar ge systems such as IBM’ s OS/360 and the SABRE airline reservation system [1, 2]. At this conference McIlro y gave an invited talk entitled Mass-pr oduced Softwar e Components [3] proposing the systematic creation of reusable software components as a solution to the software crisis. He observed that most software is created from similar building blocks, so programmer productivity would be increased if a standard set of blocks could be shared among many soft- ware products. W e are beginning to see the beneﬁts of softw are reuse; Douglas McIlroy’ s vision is gradually becoming a reality . The number of commercial and open source software libraries Email addr esses: jeremy.siek@colorado.edu (Jeremy G. Siek), lums@osl.iu.edu (Andrew Lumsdaine). Preprint submitted to Elsevier 30 October 2018 is steadily growing and application b uilders often turn to libraries for user -interface components, database access, report creation, numerical routines, and network communication, to name a few . Furthermore, larger software companies have beneﬁted from the creation of in-house domain- speciﬁc libraries which they use to support entire software product lines [4]. As the ﬁeld of software engineering progresses, we learn better techniques for building reusable software. In the 1980s Musser and Stepanov dev eloped a methodology for creating highly reusable algorithm libraries [5, 6, 7, 8], using the term generic pr ogr amming for their work. 1 Their ap- proach was nov el in that the y wrote algorithms not in terms of particular data structures b ut rather in terms of abstract requirements on structures based on the needs of the algorithm. Such generic algorithms could operate on any data structure provided that it meet the speciﬁed requirements. Preliminary versions of their generic algorithms were implemented in Scheme, Ada, and C. In the early 1990s Stepanov and Musser took advantage of the template feature of C ++ [9] to con- struct the Standard T emplate Library (STL) [10, 11]. The STL became part of the C ++ Standard, which brought generic programming into the mainstream. Since then, the methodology has been successfully applied to the creation of libraries in numerous domains [12, 13, 14, 15, 16]. The ease with which programmers dev elop and use generic libraries varies greatly depend- ing on the language features av ailable for expressing polymorphism and requirements on type parameters. In 2003 we performed a comparativ e study of modern language support for generic programming [17]. The initial study included C ++ , SML, Haskel, Eiffel, Jav a, and C#, and we ev aluated the languages by porting a representative subset of the Boost Graph Library [13] to each of them. W e recently updated the study to include OCaml and Cecil [18]. While some languages performed quite well, none were ideal for generic programming. Unsatisﬁed with the state of the art, we beg an to in vestigate ho w to improv e language support for generic programming. In general we wanted a language that could express the idioms of generic programming while also providing modular type chec king and separate compilation . In the context of generics, modular type checking means that a generic function or class can be type checked independently of any instantiation and that the type check guarantees that any well- typed instantiation will produce well-typed code. Separate compilation is the ability to compile a generic function to nativ e assembly code that can be linked into an application in constant time. Our desire for modular type checking was a reaction to serious problems that plague the de- velopment and use of C ++ template libraries. A C ++ template deﬁnition is not type checked until after it is instantiated, making templates difﬁcult to validate in isolation. Even worse, clients of template libraries are exposed to confusing error messages when they accidentally misuse the library . For example, the following code tries to use stable_sort with the iterators from the list class. std::list< int > l; std::stable_sort(l.begin(), l.end()); Fig. 1 sho ws a portion of the error message from GNU C ++ . The error message includes functions and types that the client should not have to know about such as __inplace_stable_sort and _List_iterator . It is not clear from the error message who is responsible for the error . The error message points inside the STL so the client might conclude that there is an error in the STL. This problem is not speciﬁc to the GNU C ++ implementation, but is instead a symptom of the delayed type checking mandated by the C ++ language deﬁnition. 1 The term generic programming is often used to mean an y use of generics, i.e., an y use of parametric polymorphism or templates. The term is also used in the functional programming community for function generation based on algebraic datatypes, i.e., polytypic programming. Here, we use generic programming solely in the sense of Musser and Stepanov . 2 stl algo.h: In function ‘void std:: inplace stable sort( RandomAccessIter, RandomAccessIter) [with RandomAccessIter = std:: List iterator < int, int&, int ∗ > ]’: stl algo.h:2565: instantiated from ‘void std::stable sort( RandomAccessIter, RandomAccessIter) [with RandomAccessIter = std:: List iterator < int, int&, int ∗ > ]’ stable sort error.cpp:5: instantiated from here stl algo.h:2345: error: no match for ‘std:: List iterator < int, int&, int ∗ > & std:: List iterator < int, int&, int ∗ > &’ operator stl algo.h:2565: instantiated from ‘void std::stable sort( RandomAccessIter, RandomAccessIter) [with RandomAccessIter = std:: List iterator < int, int&, int ∗ > ]’ stable sort error.cpp:5: instantiated from here stl algo.h:2349: error: no match for ‘std:: List iterator < int, int&, int ∗ > & std:: List iterator < int, int&, int ∗ > &’ operator stl algo.h:2352: error: no match for ‘std:: List iterator < int, int&, int ∗ > & std:: List iterator < int, int&, int ∗ > &’ operator Fig. 1. A portion of the error message from a misuse of stable_sort . Our desire for separate compilation was driven by the increasingly long compile times we (and others) were e xperiencing when composing sophisticated template libraries. W ith C ++ templates, the compilation time of an application is a function of the amount of code in the application plus the amount of code in all template libraries used by the application (both directly and indirectly). W e would much prefer a scenario where generic libraries can be separately compiled so that the compilation time of an application is just a function of the amount of code in the application. W ith these desiderata in hand we began laying the theoretical groundwork by developing the calculus F G [19]. F G is based on System F [20, 21], the standard calculus for parametric poly- morphism, and like System F , F G has a modular type checker and can be separately compiled. In addition, F G provides features for precisely expressing constraints on generics, introducing the concept feature with support for associated types and same-type constraints. The main tech- nical challenge overcome in F G is dealing with type equality inside of generic functions. One of the key design choices in F G is that models are lexically scoped, making F G more modular than Haskell in this regard. (W e discuss this in more detail in Section 3.6.1.) Concurrently with our work on F G , Chakrav arty , Keller , and Peyton Jones responded to our comparati ve study by dev eloping an extension to Haskell to support associated types [22, 23]. The next step after F G was to add two more features needed to express generic libraries: concept-based overloading (used for algorithm specialization) and implicit argument deduc- tion. Fully general implicit argument deduction is non-trivial in the presence of ﬁrst-class poly- mophism (which is present in G ), but some mild restrictions make the problem tractable (Sec- tion 3.5). Ho wever , we discovered a a deep tension between concept-based ov erloading and sep- arate compilation [24]. At this point our work bifurcated into two language designs: the lan- guage G which supports separate compilation and only a basic form of concept-based overload- ing [25, 26], and the concepts extension to C ++ [27], which provides full support for concept- based ov erloading but not separate compilation. For the next re vision of the C ++ Standard, popu- larly referred to as C ++ 0X, separate compilation for templates was not practical because the lan- guage already included template specialization, a feature that is also deeply incompatible with separate compilation. Thus, for C ++ 0X it made sense to provide full support for concept-based ov erloading. F or G we placed separate compilation as a higher priority , leaving out template spe- cialization and requiring programmers to work around the lack of full concept-based overloading (see Section X). T able 1 shows the results of our comparativ e study of language support for generic program- ming [18] augmented with new columns for C ++ 0X and G and augmented with three new rows: modular type checking (pre viously part of “separate compilation”), lexically scoped models, and concept-based ov erloading. T able 2 giv es a brief description of the e valuation criteria. The rest of this article describes the design of G in detail. W e revie w the essential ideas of generic programming and surve y of the idioms used in the Standard T emplate Library (Sec- 3 T able 1 The level of support for generic programming in se veral languages. A black circle indicates full support for the feature or characteristic whereas a white circle indices lack of support. The rating of “-” in the C ++ column indicates that while C ++ does not explicitly support the feature, one can still program as if the feature were supported. C ++ SML OCaml Haskell Jav a C# Cecil C ++ 0X G Multi-type concepts - # ∗ # # G # Multiple constraints - G # G # Associated type access G # † G # G # G # Constraints on assoc. types - † G # G # Retroactiv e modeling - # # T ype aliases # # # Separate compilation # G # # Implicit arg. deduction # G # Modular type checking # G # G # G # Lexically scoped models # # # # # # # Concept-based overloading # # # # # G # ∗ Using the multi-parameter type class extension to Haskell [28]. † Using the proposed associated types extension to Haskell [23]. T able 2 Glossary of Evaluation Criteria Criterion Deﬁnition Multi-type concepts Multiple types can be simultaneously constrained. Multiple constraints More than one constraint can be placed on a type parameter . Associated type access T ypes can be mapped to other types within the context of a generic function. Constraints on associated types Concepts may include constraints on associated types. Retroactiv e modeling The ability to add new modeling relationships after a type has been deﬁned. T ype aliases A mechanism for creating shorter names for types is provided. Separate compilation Generic functions can be compiled independently of calls to them. Implicit argument deduction The arguments for the type parameters of a generic function can be deduced and do not need to be explicitly provided by the programmer . Modular type checking Generic functions can be compiled independently of calls to them. Lexically scoped models Model declarations are treated like any other declaration, and are in scope for the remainder of enclosing namespace. Models may be explicitly imported from other namespaces. Concept-based overloading There can be multiple generic functions with the same name but differing constraints. For a particular call, the most speciﬁc ov erload is chosen. tion 2). This provides the motiv ation for the design of the language features in G (Section 3). W e then e valuate G with respect to a port of the Standard T emplate Library (Section 4) and the Boost Graph Library (Section 5). W e conclude with a surve y of related work (Section 6) and with the future directions for our work (Section 7). This article is an updated and greatly extended version of [26], providing a more detailed 4 Generic programming is a sub-discipline of computer science that deals with ﬁnding abstract represen- tations of efﬁcient algorithms, data structures, and other software concepts, and with their systematic organization. The goal of generic programming is to e xpress algorithms and data structures in a broadly adaptable, interoperable form that allows their direct use in softw are construction. Ke y ideas include: – Expressing algorithms with minimal assumptions about data abstractions, and vice versa, thus making them as interoperable as possible. – Lifting of a concrete algorithm to as general a le vel as possible without losing ef ﬁciency; i.e., the most abstract form such that when specialized back to the concrete case the result is just as efﬁcient as the original algorithm. – When the result of lifting is not general enough to cov er all uses of an algorithm, additionally pro- viding a more general form, but ensuring that the most efﬁcient specialized form is automatically chosen when applicable. – Providing more than one generic algorithm for the same purpose and at the same level of abstrac- tion, when none dominates the others in efﬁciency for all inputs. This introduces the necessity to provide suf ﬁciently precise characterizations of the domain for which each algorithm is the most efﬁcient. Fig. 2. Deﬁnition of Generic Programming from Jazayeri, Musser , and Loos[29] rationale for the design of G and extending our previous comparati ve study to include G by ev aluating a port of the Boost Graph Library to G . 2. Generic Programming and the STL Fig. 2 reproduces the standard deﬁnition of generic programming from Jazayeri, Musser , and Loos [29]. The generic programming methodology always consists of the following steps: 1) identify a family of useful and efﬁcient concrete algorithms with some commonality , 2) resolve the differences by forming higher-lev el abstractions, and 3) lift the concrete algorithms so they operate on these new abstractions. When applicable, there is a fourth step to implement automatic selection of the best algorithm, as described in Fig. 2. 2.1. T ype r equir ements, concepts, and models The merge algorithm from the STL, sho wn in Fig. 3, serv es as a good example of generic pro- gramming. The algorithm does not directly work on a particular data structure, such as an array or linked list, but instead operates on an abstract entity , a concept. A concept is a collection of requirements on a type, or to look at it a dif ferent way , it is the set of all types that satisfy the re- quirements. F or e xample, the Input Iterator concept requires that the type hav e an increment and dereference operation, and that both are constant-time operations. (W e italicize concept names.) A type that meets the requirements is said to model the concept. (It helps to read “models” as “implements”.) For example, the models of the Input Iter ator concept include the b uilt-in pointer types, such as int* , the iterator type for the std::list class, and the istream_iterator adaptor . Constraints on type parameters are primarily expressed by requiring the corresponding type arguments to model certain concepts. In the merge template, the argument for InIter1 is required to model the Input Iterator concept. T ype requirements are not expressible in C ++ , so the con vention is to specify type requirements in comments or documentation as in Fig. 3. 5 Fig. 3. The merge algorithm in C++. template < typename InIter1, typename InIter2, typename OutIter> // wher e InIter1 models Input Iterator, InIter2 models Input Iterator. // OutIter models Output Iterator, writing the value type of InIter1. // The value type of InIter1 and InIter2 are the same type. // The value type of InIter1 is Less Than Comparable. OutIter merge(InIter1 first1, InIter1 last1, InIter2 first2, InIter2 last2, OutIter result) { while (first1 != last1 && first2 != last2) { if (*first2 < *first1) { *result = *first2; ++first2; } else { *result = *first1; ++first1; } ++result; } return copy(first2, last2, copy(first1, last1, result)); } The type requirements for merge refer to relationships between types, such as the value_type of InIter1 . This is an example of an associated type , which maps between types that are part of a concept. The merge algorithm also needs to express that the value_type of InIter1 and InIter2 are the same, which we call same-type constraints . Furthermore, the merge algorithm includes an example of how associated types and modeling requirements can be combined: the value_type of the input iterators is required to be Less Than Comparable . Fig. 4 shows the deﬁnition of the Input Iterator concept following the presentation style used in the SGI STL documentation [30, 31]. In the description, the variable X is used as a place holder for the modeling type. The Input Iterator concept requires se veral associated types: value_type , difference_type , and iterator_category . Associated types change from model to model. For example, the associated value_type for int * is int and the associated value_type for list< char >::iterator is char . The Input Iter ator concept requires that the associated types be accessible via the iterator_traits class. (Traits classes are discussed in Section 2.4). The count algorithm, which computes the number of occurrences of a v alue within a sequence, is a simple e xample for the need of this access mechanism, for it needs to access the difference_type to specify its return type: template < typename Iter, typename T> typename iterator_traits::difference_type count(Iter first, Iter last, const T& value); The reason that count uses the iterator-speciﬁc difference type instead of int is to accom- modate iterators that trav erse sequences that may be too long to be measured with an int . In general, a concept may consist of the following kinds of requirements. reﬁnements are analogous to inheritance. They allow one concept to include the requirements from another concept. operations specify the functions that must be implemented for the modeling type. 6 associated types specify mappings between types, and in C ++ are provided using traits classes, which we discuss in Section 2.4. nested requir ements include requirements on associated types such as modeling a certain con- cept or being the same-type as another type. For example, the Input Iterator concept requires that the associated difference_type be a signed integral type. semantic in variants specify behavioral e xpectations about the modeling type. complexity guarantees specify constraints on how much time or space may be used by an op- eration. 2.2. Overview of the STL The high-lev el structure of the STL is sho wn in Fig. 5. The STL contains over ﬁfty generic algorithms and 18 container classes. The generic algorithms are implemented in terms of a family of iterator concepts, and the containers each provide iterators that model the appropriate iterator concepts. As a result, the STL algorithms may be used with any of the STL containers. In fact, the STL algorithms may be used with any data structure that exports iterators with the required capabilities. Fig. 6 shows the hierarchy of STL ’ s iterator concepts. An arrow indicates that the source con- cept is a reﬁnement of the target. The iterator concepts arose from the requirements of algorithms: the need to express the minimal requirements for each algorithm. For example, the merge algo- rithm passes through a sequence once, so it only requires the basic requirements of Input Iter ator for the two ranges it reads from and Output Iterator for the range it writes to. The search algorithm, which ﬁnds occurrences of a particular subsequence within a larger sequence, must make multiple passes through the sequence so it requires F orwar d Iterator . The inplace_merge algorithm needs to move backwards and forwards through the sequence, so it requires Bidirec- tional Iterator . And ﬁnally , the sort algorithm needs to jump arbitrary distances within the sequence, so it requires Random Access Iterator . (The sort function uses the introsort algo- rithm [32] which is partly based on quicksort [33].) Grouping type requirements into concepts enables signiﬁcant reuse of these speciﬁcations: the Input Iterator concept is directly used as a type requirement in over 28 of the STL algorithms. The F orwar d Iterator , which reﬁnes Input Iterator , is used in the speciﬁcation of o ver 22 STL algorithms. The STL includes a handful of common data structures. When one of these data structures does not fulﬁll some specialized purpose, the programmer is encouraged to implement the appropriate specialized data structure. All of the STL algorithms can then be made a vailable for the new data structure at the small cost of implementing iterators. Many of the STL algorithms are higher -order: they take functions as parameters, allo wing the user to customize the algorithm to their o wn needs. The STL deﬁnes o ver 25 function objects for creating and composing functions. The STL also contains a collection of adaptor classes, which are parameterized classes that implement some concept in terms of the type parameter (which is the adapted type). For example, the back_insert_iterator adaptor implements Output Iterator in terms of any model of Back Insertion Sequence . The generic copy algorithm can then be used with back_insert_iterator to append some integers to a list. Adaptors play an important role in the plug-and-play nature of the STL and enable a high degree of reuse. 7 Input Iterator Description An Input Iter ator is an iterator that may be dereferenced to refer to some object, and that may be incremented to obtain the next iterator in a sequence. Input Iterators are not required to be mutable. The underlying sequence elements is not required to be persistent. For e xample, an Input Iterator could be reading input from the terminal. Thus, an algorithm may not make multiple passes through a sequence using an Input Iterator . Reﬁnement of T rivial Iterator . Notation X A type that is a model of Input Iterator T The value type of X i, j Objects of type X t Object of type T Associated types iterator_traits::value_type The type of the value obtained by dereferencing an Input Iter ator iterator_traits::difference_type A signed inte gral type used to represent the distance from one iterator to another , or the number of elements in a range. iterator_traits::iterator_category A type con vertible to input_iterator_tag . Deﬁnitions An iterator is past-the-end if it points beyond the last element of a container . Past-the-end v alues are nonsingular and nondereferenceable. An iterator is valid if it is dereferenceable or past-the-end. An iterator i is incr ementable if there is a ”ne xt” iterator, that is, if ++i is well-deﬁned. Past-the-end iterators are not incrementable. An Input Iterator j is r eachable from an Input Iterator i if, after applying operator++ to i a ﬁnite number of times, i == j . The notation [i,j) refers to a range of iterators be ginning with i and up to b ut not including j . The range [i,j) is a valid range if both i and j are valid iterators, and j is reachable from i . V alid expressions In addition to the expressions in T rivial Iterator , the following expressions must be v alid. expression return type semantics, pre/post-conditions *i Con vertible to T pre: i is incrementable ++i X& pre: i is dereferenceable, post: i is dereferenceable or past the end i++ Equiv alent to ( void )++i . *i++ Equiv alent to {T t = *i; ++i; return t;} Complexity guarantees All operations are amortized constant time. Models istream_iterator , int * , list::iterator , ... Fig. 4. Documentation for the Input Iterator concept. 8 Iterator Interfaces Algorithms Containers partition merge stable_sort sort_heap binary_search Forward Bidirectional Random Access list vector map set T[] Adaptors Function Objects multiplies binder1st mem_fun reverse_iterator back_insert_iterator stack priority_queue ... ... ... ... ... Fig. 5. High-lev el structure of the STL. Random Access Bidirectional Forward Input Output Fig. 6. The reﬁnement hierarchy of iterator concepts. 2.3. The pr oblem of argument dependent name lookup in C++ In C ++ , uses of names inside of a template deﬁnition, such as the use of operator < inside of merge , are resolved after instantiation. F or example, when merge is instantiated with an iterator whose value_type is of type foo::bar , ov erload resolution looks for an operator < deﬁned for foo::bar . If there is no such function deﬁned in the scope of merge , the C ++ compiler also searches the namespace where the arguments’ types are deﬁned, so looks for operator< in namespace foo . This rule is known as ar gument dependent lookup (ADL). The combination of implicit instantiation and ADL makes it con venient to call generic func- tions. This is a nice improv ement over passing concept operations as explicit arguments to a generic function, as in the inc example from Section 1. Ho wev er, ADL has two ﬂaws. The ﬁrst problem is that the programmer calling the generic algorithm no longer has control over which functions are used to satisfy the concept operations. Suppose that namespace foo is a third party library and the application programmer writing the main function has deﬁned his own operator < for foo::bar . ADL does not ﬁnd this new operator < . The second and more sev ere problem with ADL is that it opens a hole in the protection that namespaces are suppose to provide. ADL is applied uniformly to all name lookup, whether or not the name is associated with a concept in the type requirements of the template. Thus, it is possible for calls to helper functions to get hijacked by functions with the same name in other namespaces. Fig. 7 shows an example of how this can happen. The function template lib::generic fun calls load with the intention of in voking lib::load . In main we call generic fun with an object of type foo::bar , so in the call to load , x also has type foo::bar . Thus, argument dependent lookup also consider namespace foo when searching for load . There happens to be a function named load in namespace foo , and it is a slightly better match than lib::foo , so it is called instead, thereby hijacking the call to load . 9 Fig. 7. Example problem caused by ADL. namespace lib { template < typename T> void load(T x, string) { std::cout << "Proceeding as normal!\n"; } template < typename T> void generic_fun(T x) { load(x, "file"); } } namespace foo { struct bar { int n; }; template < typename T> void load(T x, const char *) { std::cout << "Hijacked!\n"; } } int main() { foo::bar a; lib::generic_fun(a); } // Output: Hijack ed! 2.4. T raits classes, template specialization, and separ ate type chec king The traits class idiom plays an important role in writing generic algorithms in C ++ . Unfortu- nately there is a deep incompatibility between the underlying language feature, template special- ization, and our goal of separate type checking. A traits class [34] maps from a type to other types or functions. T raits classes rely on C ++ tem- plate specialization to perform this mapping. For e xample, the follo wing is the primary template deﬁnition for iterator_traits . template < typename Iterator> struct iterator_traits { ... }; A specialization of iterator_traits is deﬁned by specifying particular type arguments for the template parameter and by specifying an alternate body for the template. The code below shows a user -deﬁned iterator class, named my_iter , and a specialization of iterator_traits for my_iter . class my_iter { float operator *() { ... } ... }; template <> struct iterator_traits { typedef float value_type; typedef int difference_type; typedef input_iterator_tag iterator_category; }; When the type iterator_traits is used in other parts of the program it refers to the abov e specialization. In general, a template use refers to the most speciﬁc specialization that matches the template ar guments, if there is one, or else it refers to an instantiation of the primary 10 template deﬁnition. The use of iterator_traits within a template (and template specialization) represents a problem for separate compilation. Consider how a compiler might type check the following unique_copy function template. template < typename InIter, typename OutIter> OutIter unique_copy(InIter first, InIter last, OutIter result) { typename iterator_traits::value_type value = *first; // ... } T o check the ﬁrst line of the body , the compiler needs to kno w that the type of *first is the same type as (or at least conv ertible to) the value_type member of iterator_traits . Howe ver , prior to instantiation, the compiler does not know what type InIter will be instantiated to, and which specialization of iterator_traits to choose (and dif ferent specializations may hav e different deﬁnitions of the value_type ). Thus, if we hope to provide modular type checking, we must dev elop and alternative to using traits classes for accessing associated types. 2.5. Concept-based overloading using the tag dispatc hing idiom One of the main points in the deﬁnition of generic programming in Fig. 2 is that it is sometimes necessary to pro vide more than one generic algorithm for the same purpose. When this happens, the standard approach in C ++ libraries is to provide automatic dispatching to the appropriate algorithm using the tag dispatching idiom or enable_if [35]. Fig. 8 shows the advance algo- rithm of the STL as it is typically implemented using the tag dispatching idiom. The advance algorithm mov es an iterator forward (or backward) n positions. There are three overloads of advance_dispatch , each with an e xtra iterator tag parameter . The C ++ Standard Library deﬁnes the follo wing iterator tag classes, with their inheritance hierarchy mimicking the reﬁnement hi- erarchy of the corresponding concepts. struct input_iterator_tag {}; struct output_iterator_tag {}; struct forward_iterator_tag : public input_iterator_tag {}; struct bidirectional_iterator_tag : public forward_iterator_tag {}; struct random_access_iterator_tag : public bidirectional_iterator_tag {}; The main advance function obtains the tag for the particular iterator from iterator_traits and then calls advance_dispatch . Normal static ov erload resolution then chooses the appro- priate ov erload of advance_dispatch . Both the use of traits and the overload resolution rely on knowing actual argument types of the template and the late type checking of C ++ templates. So the tag dispatching idiom provides another challenge for designing a language for generic programming with separate type checking. 2.6. Reverse iter ators and conditional models The reverse_iterator class template adapts a model of Bidirectional Iterator and imple- ments Bidirectional Iterator , ﬂipping the direction of trav ersal so operator ++ goes backwards and operator -- goes forwards. An excerpt from the reverse_iterator class template is shown belo w . 11 Fig. 8. The advance algorithm and the tag dispatching idiom. template < typename InIter, typename Distance> void advance_dispatch(InIter& i, Distance n, input_iterator_tag ) { while (n--) ++i; } template < typename BidirIter, typename Distance> void advance_dispatch(BidirIter& i, Distance n, bidirectional_iterator_tag ) { if (n > 0) while (n--) ++i; else while (n++) --i; } template < typename RandIter, typename Distance> void advance_dispatch(RandIter& i, Distance n, random_access_iterator_tag ) { i += n; } template < typename InIter, typename Distance> void advance(InIter& i, Distance n) { typename iterator_traits::iterator_category cat; advance_dispatch(i, n, cat ); } template < typename Iter> class reverse_iterator { protected : Iter current; public : explicit reverse_iterator(Iter x) : current(x) { } reference operator *() const { Iter tmp = current; return *--tmp; } reverse_iterator& operator ++() { --current; return * this ; } reverse_iterator& operator --() { ++current; return * this ; } reverse_iterator operator +(difference_type n) const { return reverse_iterator(current - n); } ... }; The reverse_iterator class template is an example of a type that models a concept con- ditionally: if Iter models Random Access Iterator , then so does reverse_iterator . The deﬁnition of reverse_iterator deﬁnes all the operations, such as operator + , required of a Random Access Iterator . The implementations of these operations rely on the Random Ac- cess Iterator operations of the underlying Iter . One might wonder why reverse_iterator can be used on iterators such as list< int >::iterator that are bidirectional but not random access. The reason this works is that a member function such as operator + is type check ed and compiled only if it is used. For G we need a different mechanism to handle this, since function deﬁnitions are always type check ed. 12 2.7. Summary of language r equir ements In this section we surveyed how generic programming is accomplished in C ++ , taking note of the variety of language features and idioms that are used in current practice. In this section we summarize the ﬁndings as a list of requirements for a language to support generic programming. (i) The language provides type parameterized functions with the ability to express constraints on the type parameters. The deﬁnitions of parameterized functions are type checked inde- pendently of how the y are instantiated. (ii) The language provides a mechanism, such as “concepts”, for naming and grouping re- quirements on types, and a mechanism for composing concepts (reﬁnement). (iii) T ype requirements include: – requirements for functions and parameterized functions – associated types – requirements on associated types – same-type constraints (iv) The language provides an implicit mechanism for providing type-speciﬁc operations to a generic function, but this mechanism should maintain modularity (in contrast to argument dependent lookup in C ++ ). (v) The language implicitly instantiates generic functions when they are used. (vi) The language provides a mechanism for concept-based dispatching between algorithms. (vii) The language provides function e xpressions and function parameters. (viii) The language supports conditional modeling. 3. The Design of G G is a statically typed imperativ e language with syntax and memory model similar to C ++ . W e ha ve implemented a compiler that translates G to C ++ , but G could also be interpreted or compiled to byte-code. Compilation units are separately type checked and may be separately compiled, relying only on forward declarations from other compilation units (ev en compilation units containing generic functions and classes). The languages features of G that support generic programming are the following: – Concept and model deﬁnitions, including associated types and same-type constraints; – Constrained polymorphic functions, classes, structs, and type-safe unions; – Implicit instantiation of polymorphic functions; and – Concept-based function ov erloading. In addition, G includes the basic types and control constructs C ++ . 3.1. Concepts The following grammar deﬁnes the syntax for concepts. de cl ← concept cid < tyid , . . . > { cmem . . . }; cmem ← funsig | fundef // Requir ed operations | type tyid ; // Associated types | typ e == typ e ; // Same type constraints | refines cid < typ e , . . . >; | require cid < typ e , . . . >; 13 Fig. 9. The deﬁnition of the Input Iterator concept in G . concept InputIterator { type value; type difference; refines EqualityComparable; refines Regular; // Re gular reﬁnes Assignable and CopyConstructible require SignedIntegral; fun operator *(X b) -> value@; fun operator ++(X! c) -> X!; }; The grammar variable cid is for concept names and tyid is for type variables. The type v ariables are place holders for the modeling type (or a list of types for multi-type concepts). funsig and fundef are function signatures and deﬁnitions, whose syntax we introduce later in this section. In a concept, a function signature says that a model must deﬁne a function with the speciﬁed signature. A function deﬁnition in a concept provides a default implementation. The syntax type tyid ; declares an associated type; a model of the concept must provide a type deﬁnition for the gi ven type name. The syntax typ e == typ e introduces a same type constraint. In the conte xt of a model deﬁnition, the two type e xpressions must refer to the same type. When the concept is used in the type requirements of a polymorphic function or class, this type equality may be assumed. T ype equality in G is non-trivial, and is explained in Section 3.9. Concepts may be composed with refines and require . The distinction is that reﬁnement brings in the associated types from the “super” concept. Fig. 9 sho ws an example of a concept deﬁnition in G , the deﬁnition of InputIterator . 3.2. Models The modeling relation between a type and a concept is established with a model deﬁnition using the following syntax. de cl ← model [ < tyid , . . . > ] [ where { c onstr aint , . . . } ] cid < typ e , . . . > { de cl . . . }; The following shows an example of the Monoid concept and a model deﬁnition that makes int a model of Monoid , using addition for the binary operator and zero for the identity element. concept Monoid { fun identity_elt() -> T@; fun binary_op(T,T) -> T@; }; model Monoid< int > { fun binary_op( int x, int y) -> int@ { return x + y; } fun identity_elt() -> int@ { return 0; } }; A model deﬁnition must satisfy all requirements of the concept. Requirements for associated types are satisﬁed by type deﬁnitions. Requirements for operations may be satisﬁed by function deﬁnitions in the model, by the where clause, or by functions in the lexical scope preceding the 14 Fig. 10. reverse_iterator conditionally models the Random Access Iterator concept. model where { RandomAccessIterator } RandomAccessIterator< reverse_iterator > { fun operator +(reverse_iterator r, difference n) -> reverse_iterator@ { return @reverse_iterator(r.current + n); } fun operator -(reverse_iterator r, difference n) -> reverse_iterator@ { return @reverse_iterator(r.current - n); } fun operator -(reverse_iterator a, reverse_iterator b) -> difference { return a.current - b.current; } }; model deﬁnition. Reﬁnements and nested requirements are satisﬁed by preceding model deﬁni- tions in the lexical scope or by the where clause. A model may be parameterized by placing type variables inside <> ’ s after the model keyw ord. The following deﬁnition establishes that all pointer types are models of InputIterator . model InputIterator { type value = T; type difference = ptrdiff_t; }; The optional where clause in a model deﬁnition can be used to introduce constraints on the type variables. Constraints are either modeling constraints or same-type constraints. c onstr aint ← cid < typ e , . . . > | typ e == typ e Using the where clause we can express conditional modeling. As mentioned in Section 2.6, we need conditional modeling to say that reverse_iterator is a model of Random Access Iterator whenev er the underlying iterator is. Fig. 10 shows is a model deﬁnition that says just this. The rules for type checking parameterized model deﬁnitions with constraints is essentially the same as for generic functions, which we discuss in Section 3.4. 3.3. Nominal versus structural conformance One of the fundamental design choices of G was to include model deﬁnitions. After all, it is possible to instead hav e the compiler ﬁgure out when a type has implemented all of the re- quirements of a concept. W e refer to the approach of using explicit model deﬁnitions nominal conformance whereas the implicit approach we call structural conformance . An example of the nominal versus structural distinction can be seen in the example below . Do the concepts create two ways to refer to the same concept or are they different concepts that happen to have the same constraints? 15 concept A { fun foo(T x) -> T; }; concept B { fun foo(T x) -> T; }; W ith nominal conformance, the above are two different concepts, whereas with structural con- formance, A and B are two names for the same concept. Examples of language mechanisms providing nominal conformance include Jav a interfaces and Haskell type classes. Examples of language mechanisms providing structural conformance include ML signatures [36], Objective Caml object types [37], CLU type sets [38], and Cforall speciﬁcations [39]. Choosing between nominal and structural conformance is difﬁcult because both options have good arguments in their f av or . Structural conformance is more conv enient than nominal conformance W ith nominal con- formance, the modeling relationship is established by an explicit declaration. For example, a Ja va class declares that it implements an interface. In Haskell, an instance declaration establishes the conformance between a particular type and a type class. When the compiler sees the explicit declaration, it checks whether the modeling type satisﬁes the requirements of the concept and, if so, adds the type and concept to the modeling relation. Structural conformance, on the other hand, requires no e xplicit declarations. Instead, the com- piler determines on a need-to-kno w basis whether a type models a concept. The adv antage is that programmers need not spend time writing explicit declarations. Nominal conformance is safer than structural conformance The usual argument against structural conformance is that it is prone to accidental conformance . The classic example of this is a cowbo y object being passed to something expecting a Window [40]. The Window interface includes a draw() method, which the cowboy has, so the type system does not complain even though something wrong has happened. This is not a particularly strong argument because the programmer has to make a big mistake for this kind accidental conformance to occur . Howe ver , the situation changes for languages that support concept-based ov erloading. For example, in Section 2.5 we discussed the tag-dispatching idiom used in C ++ to select the best advance algorithm depending on whether the iterator type models Random Access Iterator or only Input Iterator . With concept-based overloading, it becomes possible for accidental confor- mance to occur without the programmer making a mistake. The follo wing C ++ code is an e xample where an error would occur if structural conformance were used instead of nominal. std::vector< int > v; std::istream_iterator< int > in(std::cin), in_end; v.insert(v.begin(), in, in_end); The vector class has two versions of insert , one for models of Input Iterator and one for models of F orwar d Iter ator . An Input Iterator may be used to tra verse a range only a single time, whereas a F orwar d Iterator may traverse through its range multiple times. Thus, the version of insert for Input Iterator must resize the vector multiple times as it progresses through the input range. In contrast, the version of insert for F orwar d Iterator is more efﬁcient because it ﬁrst discov ers the length of the range (by calling std::distance , which trav erses the input range), resizes the vector to the correct length, and then initializes the v ector from the range. The problem with the abov e code is that istream_iterator fulﬁlls the syntactic require- ments for a F orward Iterator but not the semantic requirements: it does not support multiple passes. That is, with structural conformance, there is a false positiv e and insert dispatches to 16 the version for F orward Iterator s. The program resizes the vector to the appropriate size for all the input but it does not initialize the v ector because all of the input has already been read. Why not both? It is conceiv able to provide both nominal and structural conformance on a concept-by-concept basis, which is in fact the approach used in the concept extension for C ++ 0X. Concepts that are intended to be used for dispatching could be nominal and other concepts could be structural. This matches the current C ++ practice: some concepts come with traits classes that provide nominal conformance whereas other concepts do not (the default situation with C ++ tem- plates is structural conformance). Howe ver , providing both nominal conformance and structural conformance complicates the language, especially for programmers new to the language, and de- grades its uniformity . Therefore, with G we provide only nominal conformance, giving priority to safety and simplicity ov er con venience. 3.4. Generic Functions The syntax for generic functions is shown below . The name of the function is the identiﬁer after fun , the type parameters are between the <> ’ s and are constrained by the requirement in the where clause. A function’ s parameters are between the () ’ s and the return type of a function comes after the -> . fundef ← fun id [ < tyid , . . . > ] [ where { c onstr aint , . . . } ] ( typ e p ass [ id ] , . . . ) -> typ e p ass { stmt . . . } funsig ← fun id [ < tyid , . . . > ] [ where { c onstr aint , . . . } ] ( typ e p ass [ id ] , . . . ) -> typ e p ass ; de cl ← fundef | funsig p ass ← mut r ef // pass by r eference | @ // pass by value mut ← const |  // constant | ! // mutable r ef ← & |  The default parameter passing mode in G is read-only pass-by-reference. Read-write pass-by- reference is indicated by ! and pass-by-v alue is indicated by @ . The merge algorithm, implemented as a generic function in G , is shown in Fig. 11. The func- tion is parameterized on three types: Iter1 , Iter2 , and Iter3 . The dot notation is used to refer to a member of a model, including associated types such as the value type of an iterator . asso c ← cid < typ e , . . . >. id | cid < typ e , . . . >. asso c typ e ← asso c The Output Iterator concept used in the merge function is an example of a multi-parameter concept. It has a type parameter X for the iterator and a type parameter T for the type that can be written to the iterator . The following is the deﬁnition of the Output Iter ator concept. concept OutputIterator { refines Regular; fun operator <<(X! c, T t) -> X!; }; In general the body of a generic function contains a sequence of statements. Syntax for some of the statements in G is deﬁned in the following grammar . stmt ← let id = expr ; | while ( expr ) stmt | return expr ; | expr ; | . . . 17 Fig. 11. The merge algorithm in G . fun merge where { InputIterator, InputIterator, LessThanComparable.value>, InputIterator.value == InputIterator.value, OutputIterator.value> } (Iter1@ first1, Iter1 last1, Iter2@ first2, Iter2 last2, Iter3@ result) -> Iter3@ { while (first1 != last1 and first2 != last2) { if (*first2 < *first1) { result << *first2; ++first2; } else { result << *first1; ++first1; } } return copy(first2, last2, copy(first1, last1, result)); } The let form introduces local variables, deducing the type of the variable from the right-hand side expression (similar to the auto proposal for C ++ 0X [41]). The body of a generic function is type checked separately from any instantiation of the func- tion. The type parameters are treated as abstract types so no type-speciﬁc operations may be applied to them unless otherwise speciﬁed by the where clause. The where clause introduces surrogate model deﬁnitions and function signatures (for all the required concept operations) into the scope of the function. Multiple functions with the same name may be deﬁned, and static overload resolution is per- formed by G to decide which function to inv oke at a particular call site depending on the argument types and also depending on which model deﬁnitions are in scope. When more than one overload may be called, the most speciﬁc ov erload is called (if one e xists) according to the rules described in Section 3.10. 3.5. Function calls and implicit instantiation The syntax for calling functions (or polymorphic functions) is the C-style notation: expr ← expr ( expr , . . . ) Arguments for the type parameters of a polymorphic function need not be supplied at the call site: G will deduce the type arguments by unifying the types of the ar guments with the types of the parameters and then implicitly instantiate the polymorphic function. The design issues surrounding implicit instantiation are described below . All of the requirements in the where clause must be satisﬁed by model deﬁnitions in the lexical scope preceding the function call, as described in Section 3.6. The follo wing is an example of calling the generic accumulate function. In this case, the generic function is implicitly instantiated with type argument int * . fun main() -> int@@ { 18 let a = new int [8]; a[0] = 1; a[1] = 2; a[2] = 3; a[3] = 4; a[4] = 5; let s = accumulate(a, a + 5); if (s == 15) return 0; else return -1; } A polymorphic function may be explicitly instantiated using this syntax: expr ← expr <| ty , . . . |> Follo wing Mitchell [42] we view implicit instantiation as a kind of coercion that transforms an expression of one type to another type. In the example above, the accumulate function was coerced from fun where { InputIterator, Monoid.value> } (Iter@, Iter) -> InputIterator.value@ to fun ( int *@, int *) -> InputIterator< int *>.value@ There are sev eral kinds of implicit coercions in G , and together they form a subtyping relation ≤ . The subtyping relation is reﬂexi ve and transitiv e. Like C ++ , G contains some bidirectional implicit coercions, such as float ≤ double and double ≤ float , so ≤ is not anti-symmetric. The subtyping relation for G is deﬁned by a set of subtyping rules. The follo wing is the subtyping rule for generic function instantiation. (I N S T ) Γ satisﬁes c Γ ` fun< α >where { c } ( σ )-> τ ≤ [ ρ/α ]( fun( σ )-> τ ) The type parameters α are substituted for type arguments ρ and the constraints in the where clause must be satisﬁed in the current en vironment. T o apply this rule, the compiler must choose the type arguments. W e call this type ar gument deduction and discuss it in more detail momen- tarily . Constraint satisfaction is discussed in Section 3.6. The subtyping relation allo ws for coercions during type checking according to the subsump- tion rule: (S U B ) Γ ` e : σ Γ ` σ ≤ τ Γ ` e : τ The ( S U B ) rule is not syntax-directed so its addition to the type system would result in a non- deterministic type checking algorithm. The standard workaround is to omit the above rule and instead allow coercions in other rules of the type system such as the rule for function application. The following is a rule for function application that allows coercions in both the function type and in the argument types. (A P P ) Γ ` e 1 : τ 1 Γ ` e 2 : σ 2 Γ ` τ 1 ≤ fun( σ 3 )-> τ 2 Γ ` σ 2 ≤ σ 3 Γ ` e 1 ( e 2 ) : τ 2 As mentioned above, the type checker must guess the type arguments ρ to apply the ( I N S T ) rule. In addition, the ( A P P ) rule includes sev eral types that appear from nowhere: σ 3 and τ 2 . The problem of deducing these types is equiv alent to trying to ﬁnd solutions to a system of inequalities. Consider the following e xample program. fun apply( fun (T)->T f, T x) -> T { return f(x); } fun id(U a) -> U { return a; } fun main() -> int@ { return apply(id, 0); } 19 The application apply(id, 0) type checks if there is a solution to the following system: fun ( fun (T)->T, T) -> T ≤ fun ( α , β ) -> γ fun (U)->U ≤ α int ≤ β The following type assignment is a solution to the abo ve system. α = fun ( int )-> int β = int γ = int Unfortunately , not all systems of inequalities are as easy to solve as the abov e example. In f act, with Mitchell’ s original set of subtyping rules, the problem of solving systems of inequalities was prov ed undecidable by Tiuryn and Urzyczyn [43]. There are several approaches to dealing with this undecidability . Remove the (A R R OW ) rule. Mitchell’ s subtyping relation included the usual co/contrav ariant rule for functions. (A R R OW ) σ 2 ≤ σ 1 τ 1 ≤ τ 2 fun( σ 1 )-> τ 1 ≤ fun( σ 2 )-> τ 2 The ( A R R OW ) rule is nice to hav e because it allo ws a function to be coerced to a dif ferent type so long as the parameter and return types are coercible in the appropriate way . In the following example the standard ilogb function is passed to foo ev en though it does not match the expected type. The (A R R OW ) rule allows for this coercion because int is coercible to double . include "math.h"; // fun ilogb(double x) > int; fun foo( fun ( int )->int@ f) -> int@ { return f(1); } fun main() -> int@ { return foo(ilogb); } Howe ver , the ( A R RO W ) rule is one of the culprits in the undecidability of the subtyping problem; removing it makes the problem decidable [43]. The language ML F of Le Botlan and Remy [44] takes this approach, and for the time being, so does G . W ith this restriction, type ar - gument deduction is reduced to the v ariation of uniﬁcation deﬁned in [44]. Instead of working on a set of variable assignments, this uniﬁcation algorithm keeps track of either a type assignment or the tightest lo wer bound seen so far for each v ariable. The ( A P P ) rule for G is reformulated as follows to use this unify algorithm. (A P P ) Γ ` e 1 : τ 1 Γ ` e 2 : σ 2 Q = { τ 1 ≤ α, σ 2 ≤ β } Q 0 = unify ( α, fun( β )-> γ , Q ) Γ ` e 1 ( e 2 ) : Q 0 ( γ ) In languages where functions are often written in curried form, it is important to provide e ven more ﬂexibility than in the above ( A P P ) rule by postponing instantiation, as is done in ML F . Consider the apply example ag ain, but this time written in curried form. fun apply( fun (T)->T f) -> ( fun (T)->T)@ { return fun (T x) { return f(x); }; } fun id(U a) -> U { return a; } fun main() -> int@ { return apply(id)(0); } In the ﬁrst application apply(id) we do not yet kno w that T should be bound to int . The instantiation needs to be delayed until the second application apply(id)(0) . In general, each application contributes to the system of inequalities that needs to be solved to instantiate the generic function. In ML F , the return type of each application encodes a partial system of in- 20 equalities. The inequalities are recorded in the types as lower bounds on type parameters. The following is an e xample of such a type. fun where { fun (T)->T ≤ U } (U) -> U Postponing instantiation is not as important in G because functions take multiple parameters and currying is seldom used. Remov al of the arrow rule means that, in some circumstances, the programmer would hav e to wrap a function inside another function before passing the function as an argument. Restrict the language to predicative polymorphism Another alternativ e is to restrict the language so that only monotypes (non-generic types) may be used as the type arguments in an instantiation. This approach is used in by Odersky and L ¨ aufer [45] and also by Pe yton Jones and Shields [46]. Howe ver , this approach reduces the expressi veness of the language for the sake of the con venience of implicit instantiation. Restrict the language to second-class polymorphism Restricting the language of types to disallow polymorphic types nested inside other types is another way to make the subtyping problem decidable. With this restriction the subtyping problem is solved by normal uniﬁcation. Languages such as SML and Haskell 98 use this approach. Like the restriction to predicativ e polymorphism, this approach reduces the expressi veness of the language for the sake of implicit instantiation (and type inference). Howe ver , there are many motiv ating use cases for ﬁrst-class polymorphism [47], so throwing out ﬁrst-class polymorphism is not our preferred alternati ve. Use a semi-decision procedur e Y et another alternativ e is to use a semi-decision procedure for the subtyping problem. The advantage of this approach is that it allows implicit instantiation to work in more situations, though it is not clear whether this e xtra ﬂexibility is needed in practice. The do wn side is that there are instances of the subtyping problem where the procedure di verges and nev er returns with a solution. 3.6. Model lookup (constraint satisfaction) The basic idea behind model lookup is simple although some of the details are a bit compli- cated. Consider the follo wing program containing a generic function foo with a requirement for C . concept C { }; model C< int > { }; fun foo where { C } (T x) -> T { return x; } fun main() -> int@ { return foo(0); // lookup model C < int > } At the call foo(0) , the compiler deduces the binding T= int and then seeks to satisfy the where clause, with int substituted for T . In this case the constraint C< int > must be satisﬁed. In the scope of the call foo(0) there is a model deﬁnition for C< int > , so the constraint is satisﬁed. W e call C< int > the model head . 3.6.1. Lexical scoping of models The design choice to look for models in the lexical scope of the instantiation is an important choice for G , and differentiates it from both Haskell and the concept extension for C ++ . This 21 Fig. 12. Intentionally overlapping models. module A { model Monoid< int > { fun binary_op( int x, int y) -> int@ { return x + y; } fun identity_elt() -> int@ { return 0; } }; fun sum(Iter first, Iter last) -> int { return accumulate(first, last); } } module B { model Monoid< int > { fun binary_op( int x, int y) -> int@ { return x * y; } fun identity_elt() -> int@ { return 1; } }; fun product(Iter first, Iter last) -> int { return accumulate(first, last); } } choice impro ves the modularity of G by pre venting model declarations in separate modules from accidentally conﬂicting with one another . For example, in Fig. 12 we create sum and product functions in modules A and B respec- tiv ely by instantiating accumulate in the presence of different model declarations. This exam- ple would not type check in Haskell, ev en if the two instance declarations were to be placed in different modules, because instance declarations implicitly leak out of a module when anything in the module is used by another module. This example would be ille gal in C ++ 0X concept exten- sion because 1) model deﬁnitions must appear in the same namespace as their concept, and 2) if placed in the same namespace, the two model deﬁnitions would violate the one-deﬁnition-rule. It is also quite possible for separately developed modules to include model deﬁnitions that accidentally overlap. In G , this is not a problem, as the model deﬁnitions will each apply within their own module. Model deﬁnitions may be explicitly imported from one module to another . The syntax for modules and import declarations is sho wn belo w . An interesting extension would be parameterized modules, but we lea ve that for future work. de cl ← module mid { de cl . . . } // module | scope mid = sc op e ; // scope alias | import sc op e . cid < typ e , . . . >; // import model | public : de cl . . . // public r e gion | private : de cl . . . // private r e gion 3.6.2. Constrained models In G , a model deﬁnition may itself be parameterized and the type parameters constrained by a where clause. Fig. 13 shows a typical example of a parameterized model. The model deﬁnition 22 Fig. 13. Example of parameterized model deﬁnition. concept Comparable { fun operator ==(T,T)->bool@; }; model Comparable< int > { }; struct list { / ∗ ... ∗ / }; model where { Comparable } Comparable< list > { fun operator ==(list x, list y) -> bool@ { / ∗ ... ∗ / } }; fun generic_foo where { Comparable } (C a, C b) -> bool@ { return a == b; } fun main() -> int@ { let l1 = @list< int >(); let l2 = @list< int >(); generic_foo(l1,l2); return 0; } in the example says that for any type T , list is a model of Comparable if T is a model of Comparable . Thus, a model deﬁnition is like an inference rule or a Horn clause [48] in logic programming. For e xample, a model deﬁnition of the form model where { P1, ..., Pn } Q { ... }; corresponds to the Horn clause: ( P 1 and . . . and P n ) implies Q The model deﬁnitions from the example in Fig. 13 could be represented in Prolog with the following tw o rules: comparable( int ). comparable(list(T)) :- comparable(T). The algorithm for model lookup is essentially a logic programming engine: it performs uni- ﬁcation and backward chaining (similar to how instance lookup is performed in Haskell). Uni- ﬁcation is used to determine when the head of a model deﬁnition matches. For example, in Fig. 13, in the call to generic_foo the constraint Comparable< list< int > > needs to be sat- isﬁed. There is a model deﬁnition for Comparable< list > and uniﬁcation of list< int > and list succeeds with the type assignment T = int . Howe ver , we hav e not yet satis- ﬁed Comparable< list< int > > because the where clause of the parameterized model must also be satisﬁed. The model lookup algorithm therefore proceeds recursively and tries to satisfy Comparable< int > , which in this case is tri vial. This process is called bac kwar d chaining : it 23 starts with a goal (a constraint to be satisﬁed) and then applies matching rules (model deﬁnitions) to reduce the goal into subgoals. Eventually the subgoals are reduced to facts (model deﬁnitions without a where clause) and the process is complete. As is typical of Prolog implementations, G processes subgoals in a depth-ﬁrst manner . It is possible for multiple model deﬁnitions to match a constraint. When this happens the most speciﬁc model deﬁnition is used, if one exists. Otherwise the program is ill-formed. W e say that deﬁnition A is a mor e speciﬁc model than deﬁnition B if the head of A is a substitution instance of the head of B and if the where clause of B implies the where clause of A . In this context, implication means that for ev ery constraint c in the where clause of A , c is satisﬁed in the current en vironment augmented with the assumptions from the where clause of B . G places v ery fe w restrictions on the form of a model deﬁnition. The only restriction is that all type parameters of a model must appear in the head of the model. That is, they must appear in the type arguments to the concept being modeled. For example, the following model deﬁnition is ill formed because of this restriction. concept C { }; model C { }; // ill formed, U is not in an argument to C This restriction ensures that unifying a constraint with the model head alw ays produces assign- ments for all the type parameters. Horn clause logic is by nature powerful enough to be T urning-complete. For example, it is possible to express general recursive functions. The program in Fig. 14 computes the Acker- mann function at compile time by encoding it in model deﬁnitions. This po wer comes at a price: determining whether a constraint is satisﬁed by a set of model deﬁnitions is in general unde- cidable. Thus, model lookup is not guaranteed to terminate and programmers must take some care in writing model deﬁnitions. W e could restrict the form of model deﬁnitions to achiev e de- cidability howe ver there are two reasons not to do so. First, restrictions would complicate the speciﬁcation of G and make it harder to learn. Second, there is the danger of ruling out useful model deﬁnitions. 3.7. Impr oved err or messages In the introduction we showed how users of generic libraries in C ++ are plagued by hard to understand error messages. The introduction of concepts and where clauses in G solves this problem. The following is the same misuse of the stable_sort function, but this time written in G . 4 fun main() -> int@{ 5 let v = @list< int >(); 6 stable_sort(begin(v), end(v)); 7 return 0; 8 } In contrast to long C ++ error message (Fig. 1), in G we get the following: test/stable_sort_error.hic:6: In application stable_sort(begin(v), end(v)), Model MutableRandomAccessIterator> needed to satisfy requirement, but it is not defined. 24 Fig. 14. The Ackermann function encoded in model deﬁnitions. struct zero { }; struct suc { }; concept Ack { type result; }; model Ack { type result = suc; }; model where { Ack > } Ack, zero> { type result = Ack >.result; }; model where { Ack,y>, Ack,y>.result > } Ack< suc,suc > { type result = Ack,y>.result >.result; }; fun foo( int ) { } fun main() -> int@ { type two = suc< suc >; type three = suc; foo(@Ack.result()); // err or: T ype (suc < suc < suc < suc < suc < suc < suc < suc < suc < zero >>>>>>>>> ) // does not match type (int) } A related problem that plagues authors of generic C ++ libraries is that type errors often go unnoticed during library de velopment. Again, this is because C ++ delays type checking templates until instantiation. One of the reasons for such type errors is that the implementation of a template is not consistent with its documented type requirements. This problem is directly addressed in G : the implementation of a generic function is type- checked with respect to its where clause, independently of any instantiations. Thus, when a generic function successfully compiles, it is guaranteed to be free of type errors and the imple- mentation is guaranteed to be consistent with the type requirements in the where clause. Interestingly , while implementing the STL in G , the type checker caught sev eral errors in the STL as deﬁned in C ++ . One such error was in replace_copy . The implementation below was translated directly from the GNU C ++ Standard Library , with the where clause matching the requirements for replace_copy in the C ++ Standard [49]. 196 fun replace_copy 197 where { InputIterator, Regular, EqualityComparable, 198 OutputIterator.value>, 199 OutputIterator, 200 EqualityComparable2.value,T> } 201 (Iter1@ first, Iter1 last, Iter2@ result, T old, T neu) -> Iter2@ { 202 for ( ; first != last; ++first) 203 result << *first == old ? neu : *first; 204 return result; 205 } 25 The G compiler gives the follo wing error message: stl/sequence_mutation.hic:203: The two branches of the conditional expression must have the same type or one must be coercible to the other. This is a subtle bug, which explains why it has gone unnoticed for so long. The type requirements say that both the value type of the iterator and T must be writable to the output iterator , but the requirements do not say that the value type and T are the same type, or coercible to one another . 3.8. Generic classes, structs, and unions The syntax for generic classes, structs, and unions is deﬁned below . The grammar v ariable clid is for class, struct, and union names. de cl ← class clid p olyhdr { classmem . . . }; de cl ← struct clid p olyhdr { mem . . . }; de cl ← union clid p olyhdr { mem . . . }; mem ← typ e id ; classmem ← mem | p olyhdr clid ( typ e p ass [ id ] , . . . ) { stmt . . . } | ~ clid () { stmt . . . } poly hdr ← [ < tyid , . . . > ] [ where { c onstr aint , . . . } ] Classes consist of data members, constructors, and a destructor . There are no member func- tions; normal functions are used instead. Data encapsulation ( public / private ) is speciﬁed at the module lev el instead of inside the class. Class, struct, and unions are used as types using the syntax below . Such a type is well-formed if the type arguments are well-formed and if the requirements in its where clause are satisﬁed. typ e ← clid [ < typ e , . . . > ] 3.9. T ype equality There are sev eral language constructions in G that make it difﬁcult to decide when two types are equal. Generic functions complicate type equality because the names of the type parameters do not matter . So, for example, the follo wing two function types are equal: fun (T)->T = fun (U)->U The order of the type parameters does matter (because a generic function may be explicitly instantiated) so the following tw o types are not equal. fun (S,T)->T 6 = fun (S,T)->T Inside the scope of a generic function, type parameters with different names are assumed to be different types (this is a conserv ati ve assumption). So, for example, the follo wing program is ill formed because variable a has type S whereas function f is e xpecting an argument of type T . fun foo(S a, fun (T)->T f) -> T { return f(a); } Associated types and same-type constraints also af fect type equality . First, if there is a model deﬁnition in the current scope such as: model C< int > { type bar = bool ; }; 26 then we hav e the equality C< int >.bar = bool . Inside the scope of a generic function, same-type constraints help determine when two types are equal. For e xample, the following v ersion of foo is well formed: fun foo_1 where { T == S } ( fun (T)->T f, S a) -> T { return f(a); } There is a subtle dif ference between the abov e version of foo and the following one. The reason for the difference is that same-type constraints are check ed after type argument deduction. fun foo_2( fun (T)->T f, T a) -> T { return f(a); } fun id( double x) -> double { return x; } fun main() -> int@ { foo_1(id, 1.0); // ok foo_1(id, 1); // err or: Same type requir ement violated, double != int foo_2(id, 1.0); // ok foo_2(id, 1); // ok } In the ﬁrst call to foo 1 the compiler deduces T=double and S=double from the arguments id and 1.0 . The compiler then checks the same-type constraint T == S , which in this case is satisﬁed. For the second call to foo 1 , the compiler deduces T=double and S=int and then the same-type constraint T == S is not satisﬁed. The ﬁrst call to foo 2 is straightforward. For the second call to foo 2 , the compiler deduces T=double from the type of id and the ar gument 1 is implicitly coerced to double . T ype equality is a congruence r elation , which means sev eral things. First it means type equality is an equivalence r elation , so it is reﬂexiv e, transitive, and symmetric. Thus, for any types ρ , σ , and τ we have – τ = τ – σ = τ implies τ = σ – ρ = σ and σ = τ implies ρ = τ For e xample, the following function is well formed: fun foo where { R == S, S == T} ( fun (T)->S f, R a) -> T { return f(a); } The type expression R (the type of a ) and the type expression T (the parameter type of f ) both denote the same type. The second aspect of type equality being a congruence is that it propagates in certain ways with respect to type constructors. For example, if we know that S = T then we also know that fun (S)->S = fun (T)->T . Similarly , if we hav e deﬁned a generic struct such as: struct bar { }; then S = T implies bar = bar . The propagation of equality also goes in the other direc- tion. For example, bar = bar implies that S = T . The congruence extends to associated types. So S = T implies C.bar = C.bar . Howe ver , for associated types, the propagation does not go in the rev erse direction. So C.bar = C.bar does not imply that S = T . For example, gi ven the model deﬁnitions model C< int > { type bar = bool ; }; model C< float > { type bar = bool ; }; we hav e C< int >.bar = C< float >.bar but this does not imply that int = float . 27 Like type parameters, associated types are in general assumed to be dif ferent from one another . So the following program is ill-formed: concept C { type bar; }; fun foo where { C, C } (C.bar a, fun (C.bar)->T f) -> T { return f(a); } The next program is also ill formed. concept D { type bar; type zow; }; fun foo where { D } (D.bar a, fun (D.zow)->T f) -> T { return f(a); } In the compiler for G we use the congruence closure algorithm by Nelson and Oppen [50] to keep track of which types are equal. The algorithm is efﬁcient: O ( n log n ) time complexity on av erage, where n is the number of types. It has O ( n 2 ) time complexity in the worst case. This can be improv ed by instead using the Do wney-Sethi-T arjan algorithm which is O ( n log n ) in the worst case [51]. 3.10. Function overloading and concept-based overloading Multiple functions with the same name may be deﬁned and static overload resolution is per- formed to decide which function to in vok e at a particular call site. The resolution depends on the ar gument types and on the model deﬁnitions in scope. When more than one ov erload may be called, the most speciﬁc overload is called if one exists. The basic ov erload resolution rules are based on those of C ++ . In the following simple e xample, the second foo is called. fun foo() -> int@ { return -1; } fun foo( int x) -> int@ { return 0; } fun foo( double x) -> int@ { return -1; } fun foo(T x) -> int@ { return -1; } fun main() -> int@ { return foo(3); } The ﬁrst foo has the wrong number of arguments, so it is immediately dropped from consid- eration. The second and fourth are giv en priority ov er the third because they can exactly match the argument type int (for the fourth, type argument deduction results in T= int ), whereas the third foo requires an implicit coercion from int to double . The second foo is fa vored ov er the fourth because it is more speciﬁc. A function f is a more speciﬁc overload than function g if g is callable from f but not vice versa. A function g is callable from function f if you could call g from inside f , forwarding all the parameters of f as arguments to g , without causing a type error . More formally , if f has type fun< t f >where C f ( σ f ) -> τ f and g has type fun< t g >where C g ( σ g ) -> τ g then g is callable from f if σ f ≤ [ t g /ρ ] σ g and C f implies [ t g /ρ ] C g for some ρ . In general there may not be a most speciﬁc ov erload in which case the program is ill-formed. In the following example, both foo ’ s are callable from each other and therefore neither is more speciﬁc. fun foo( double x) -> int@ { return 1; } 28 Fig. 15. The advance algorithms using concept-based overloading. fun advance where { InputIterator } (Iter! i, InputIterator.difference@ n) { for (; n != zero(); --n) ++i; } fun advance where { BidirectionalIterator } (Iter! i, InputIterator.difference@ n) { if (zero() < n) for (; n != zero(); --n) ++i; else for (; n != zero(); ++n) --i; } fun advance where { RandomAccessIterator } (Iter! i, InputIterator.difference@ n) { i = i + n; } fun foo( float x) -> int@ { return -1; } fun main() -> int@ { return foo(3); } In the next e xample, neither foo is callable from the other so neither is more speciﬁc. fun foo(T x, int y) -> int@ { return 1; } fun foo( int x, T y) -> int@ { return -1; } fun main() -> int@ { return foo(3, 4); } In Section 2.5 we showed how to accomplish concept-based o verloading of se veral versions of advance using the tag dispatching idiom in C ++ . Fig. 15 shows three overloads of advance im- plemented in G . The signatures for these overloads are the same except for their where clauses. The concept BidirectionalIterator is a reﬁnement of InputIterator , so the second ver- sion of advance is more speciﬁc than the ﬁrst. The concept RandomAccessIterator is a re- ﬁnement of BidirectionalIterator , so the third advance is more speciﬁc than the second. The code in Fig. 16 shows two calls to advance . The ﬁrst call is with an iterator for a singly- linked list. This iterator is a model of InputIterator but not RandomAccessIterator ; the ov erload resolution chooses the ﬁrst version of advance . The second call to advance is with a pointer which is a RandomAccessIterator so the second version of advance is called. Concept-based overloading in G is entirely based on static information available during the type checking and compilation of the call site. This presents some difﬁculties when trying to resolve to optimized versions of an algorithm from within another generic function. Section ?? discusses the issues that arise and presents an idiom that ameliorates the problem. 3.11. Function expr essions The following is the syntax for function e xpressions and function types. 29 Fig. 16. Example calls to advance and overload resolution. use "slist.g"; use "basic_algorithms.g"; // for copy use "iterator_functions.g"; // for advance use "iterator_models.g"; // for iterator models for int ∗ fun main() -> int@ { let sl = @slist< int >(); push_front(1, sl); push_front(2, sl); push_front(3, sl); push_front(4, sl); let in_iter = begin(sl); advance(in_iter, 2); // calls version 1, linear time let rand_iter = new int [4]; copy(begin(sl), end(sl), rand_iter); advance(rand_iter, 2); // calls version 3, constant time if (*in_iter == *rand_iter) return 0; else return -1; } The body of a function expression may be either a sequence of statements enclosed in braces or a single expression following a colon. The return type of a function expression is deduced from the return statements in the body , or from the single expression. The following e xample computes the sum of an array using for_each and a function expres- sion. 2 fun main() -> int@ { let n = 8; let a = new int [n]; for ( let i = 0; i != n; ++i) a[i] = i; let sum = 0; for_each(a, a + n, fun ( int x) p=&sum { *p = *p + x; }); return sum - (n * (n-1))/2; } The expression fun ( int x) p=&sum { *p = *p + x; } creates a function object. The body of a function expression is not lexically scoped, so a direct use of sum in the body would be an error . The initialization p=&sum declares a data member inside the function object with type int * and copy constructs the member with the address &sum . 2 Of course, the accumulate function is the appropriate algorithm for this computation, but then the example would not demonstrate the use of function expressions. 30 The primary motiv ation for non-lexically scoped function expressions is to keep the design close to C ++ so that function expressions can be directly compiled to function objects in C ++ . Howe ver , this design has some drawbacks as we discov ered while porting the STL to G . Most STL implementations implement two separate versions of find_subsequence , one written in terms of operator == and the other in terms of a function object. The version us- ing operator == could be written in terms of the one that takes a function object, but it is not written that way . The original reason for this was to improve efﬁcienc y , but with with a modern optimizing compiler there should be no difference in ef ﬁciency: all that is needed to erase the difference is some simple inlining. The G implementation we write the operator == version of find_subsequence in terms of the higher-order v ersion. The follo wing code sho ws how this is done and is a bit more complicated than we would ha ve liked. fun find_subsequence where { ForwardIterator, ForwardIterator, ForwardIterator.value == ForwardIterator.value, EqualityComparable.value> } (Iter1 first1, Iter1 last1, Iter2 first2, Iter2 last2) -> Iter1@@ { type T = ForwardIterator.value; let cmp = model EqualityComparable. operator ==; return find_subsequence(first1, last1, first2, last2, fun (T a,T b) c=cmp: c(a, b)); } It would ha ve been simpler to write the function e xpression as fun (T a, T b): a == b Howe ver , this is an error in G because the operator == from the EqualityComparable<..> requirement is a local name, not a global one, and is therefore not in scope for the body of the function expression. The workaround is to store the comparison function as a data member of the function object. The expression model EqualityComparable. operator == accesses the operator == member from the model of EqualityComparable for type T . Examples such as these are a con vincing argument that lexical scoping should be allowed in function expressions, and the ne xt generation of G will support this feature. 3.12. F irst-class polymorphism In the introduction we mentioned that G is based on System F . One of the hallmarks of System F is that it provides ﬁrst class polymorphism. That is, polymorphic objects may be passed to and returned from functions. This is in contrast to the ML family of languages, where polymorphism is second class. In Section 3.5 we discussed how the restriction to second-class polymorphism simpliﬁes type argument deduction, reducing it to normal uniﬁcation. Howe ver , we prefer to retain ﬁrst-class polymorphism and use the somewhat more complicated v ariant of uniﬁcation from ML F . One of the reasons to retain ﬁrst-class polymorphism is to retain the expressi veness of function objects in C ++ . A function object may ha ve member function templates and may therefore by used polymorphically . The following program is a simple use of ﬁrst-class polymorphism in G . Note that f is applied to ar guments of different types. 31 Fig. 17. Some STL Algorithms in G . fun find where { InputIterator } (Iter@ first, Iter last, fun (InputIterator.value)->bool@ pred) -> Iter@ { while (first != last and not pred(*first)) ++first; return first; } fun find where { InputIterator, EqualityComparable.value> } (Iter@ first, Iter last, InputIterator.value value) -> Iter@ { while (first != last and not (*first == value)) ++first; return first; } fun remove where { MutableForwardIterator, EqualityComparable.value> } (Iter@ first, Iter last, InputIterator.value value) -> Iter@ { first = find(first, last, value); let i = @Iter(first); return first == last ? first : remove_copy(++i, last, first, value); } fun foo( fun (T)->T f) -> int@ { return f(1) + d2i(f(-1.0)); } fun id(T x) -> T { return x; } fun main() -> int@ { return foo(id); } 4. Analysis of G and the STL In this section we analyze the interdependence of the language features of G and generic li- brary design in light of implementing the STL. A primary goal of generic programming is to express algorithms with minimal assumptions about data abstractions, so we ﬁrst look at ho w the generic functions of G can be used to accomplish this. Another goal of generic programming is efﬁcienc y , so we inv estigate the use of function overloading in G to accomplish automatic algo- rithm selection. W e conclude this section with a brief look at implementing generic containers and adaptors in G . 4.1. Algorithms Fig. 17 depicts a few simple STL algorithms implemented using generic functions in G . The STL provides two versions of most algorithms, such as the overloads for find in Fig. 17. The ﬁrst version is higher-order , taking a predicate function as its third parameter while the sec- ond version relies on operator == . Functions are ﬁrst-class in G , so the higher-order version is straightforward to express. As is typical in the STL, there is a high-degree of internal reuse: remove uses remove_copy and and find . 32 Fig. 18. The STL Iterator Concepts in G (Part I). concept InputIterator { type value; type difference; refines EqualityComparable; refines Regular; require SignedIntegral; fun operator *(X) -> value@; fun operator ++(X!) -> X!; }; concept OutputIterator { refines Regular; fun operator <<(X!, T) -> X!; }; concept ForwardIterator { refines DefaultConstructible; refines InputIterator; fun operator *(X) -> value; }; concept MutableForwardIterator { refines ForwardIterator; refines OutputIterator; require Regular; fun operator *(X) -> value!; }; 4.2. Iterators Figures 18 and 19 show the STL iterator hierarchy as represented in G . Required operations are expressed in terms of function signatures, and associated types are expressed with a nested type requirement. The reﬁnement hierarchy is established with the refines clauses and nested model requirements with require . The semantic in variants and complexity guarantees of the iterator concepts are not expressible in G as they are be yond the scope of its type system. 4.3. Automatic Algorithm Selection T o realize the generic programming efﬁcienc y goals, G provides mechanisms for automatic algorithm selection. The follo wing code shows two ov erloads for copy . (W e omit the third o ver- load to save space.) The ﬁrst version is for input iterators and the second for random access, which uses an integer counter thereby allowing some compilers to better optimize the loop. The two signatures are the same except for the where clause. fun copy where { InputIterator, OutputIterator.value> } (Iter1@ first, Iter1 last, Iter2@ result) -> Iter2@ { for (; first != last; ++first) result << *first; 33 Fig. 19. The STL Iterator Concepts in G (Part II). concept BidirectionalIterator { refines ForwardIterator; fun operator --(X!) -> X!; }; concept MutableBidirectionalIterator { refines BidirectionalIterator; refines MutableForwardIterator; }; concept RandomAccessIterator { refines BidirectionalIterator; refines LessThanComparable; fun operator +(X, difference) -> X@; fun operator -(X, difference) -> X@; fun operator -(X, X) -> difference@; }; concept MutableRandomAccessIterator { refines RandomAccessIterator; refines MutableBidirectionalIterator; }; return result; } fun copy where { RandomAccessIterator, OutputIterator.value> } (Iter1@ first, Iter1 last, Iter2@ result) -> Iter2@ { for (n = last - first; n > zero(); --n, ++first) result << *first; return result; } The use of dispatching algorithms such as copy inside other generic algorithms is challenging because overload resolution is based on the surrogate models from the where clause and not on models deﬁned for the instantiating type arguments. (This rule is needed for separate type checking and compilat ion). Thus, a call to an ov erloaded function such as copy may resolve to a non-optimal overload. Consider the following implementation of merge . The Iter1 and Iter2 types are required to model InputIterator and the body of merge contains two calls to copy . fun merge where { InputIterator, InputIterator, LessThanComparable.value>, InputIterator.value == InputIterator.value, OutputIterator.value> } (Iter1@ first1, Iter1 last1, Iter2@ first2, Iter2 last2, Iter3@ result) -> Iter3@ { ... return copy(first2, last2, copy(first1, last1, result)); } 34 This merge function always calls the slow version of copy ev en though the actual iterators may be random access. In C ++ , with tag dispatching, the fast version of copy is called because the ov erload resolution occurs after template instantiation. Ho wev er, C ++ does not hav e separate type checking for templates. T o enable dispatching for copy , the type information at the instantiation of merge must be carried into the body of merge (suppose it is instantiated with a random access iterator). This can be done with a combination of concept and model declarations. First, deﬁne a concept with a single operation that corresponds to the algorithm. concept CopyRange { fun copy_range(I1,I1,I2) -> I2@; }; Next, add a requirement for this concept to the type requirements of merge and replace the calls to copy with the concept operation copy_range . fun merge where { ..., CopyRange, CopyRange } (Iter1@ first1, Iter1 last1, Iter2@ first2, Iter2 last2, Iter3@ result) -> Iter3@ { ... return copy_range(first2, last2, copy_range(first1, last1, result)); } The ﬁnal step of the idiom is to create parameterized model declarations for CopyRange . The where clauses of the model deﬁnitions match the where clauses of the respecti ve ov erloads for copy . In the body of each copy_range there is a call to copy which will resolve to the appropriate ov erload. model where { InputIterator, OutputIterator.value> } CopyRange { fun copy_range(Iter1 first, Iter1 last, Iter2 result) -> Iter2@ { return copy(first, last, result); } }; model where { RandomAccessIterator, OutputIterator.value> } CopyRange { fun copy_range(Iter1 first, Iter1 last, Iter2 result) -> Iter2@ { return copy(first, last, result); } }; A call to merge with a random access iterator will use the second model to satisfy the re- quirement for CopyRange . Thus, when copy_range is in voked inside merge , the f ast v ersion of copy is called. A nice property of this idiom is that calls to generic algorithms need not change. A disadvantage of this idiom is that the interface of the generic algorithms becomes more complex. 4.4. Containers The containers of the STL are implemented in G using polymorphic classes. Fig. 20 sho ws an excerpt of the doubly-linked list container in G . As usual, a dummy sentinel node is used in the implementation. W ith each STL container comes iterator types that translate between the uni- form iterator interface and data-structure speciﬁc operations. Fig. 20 shows the list_iterator 35 Fig. 20. Excerpt from a doubly-linked list container in G . struct list_node where { Regular, DefaultConstructible } { list_node* next; list_node* prev; T data; }; class list where { Regular, DefaultConstructible } { list() : n( new list_node()) { n->next = n; n->prev = n; } ~list() { ... } list_node* n; }; class list_iterator where { Regular, DefaultConstructible } { ... list_node* node; }; fun operator * where { Regular, DefaultConstructible } (list_iterator x) -> T { return x.node->data; } fun operator ++ where { Regular, DefaultConstructible } (list_iterator! x) -> list_iterator! { x.node = x.node->next; return x; } fun begin where { Regular, DefaultConstructible } (list l) -> list_iterator@ { return @list_iterator(l.n->next); } fun end where { Regular, DefaultConstructible } (list l) -> list_iterator@ { return @list_iterator(l.n); } which implements operator * in terms of x.node->data and implements operator ++ by per - forming the assignment x.node = x.node->next . Not shown in Fig. 20 is the implementation of the mutable iterator for list (the list_iterator provides read-only access). The deﬁnitions of the two iterator types are nearly identical, the only difference is that operator * returns by read-only reference for the constant iterator whereas it returns by read-write reference for the mutable iterator . The code for these two iterators should be reused but G does not yet hav e a language mechanism for this kind of reuse. In C ++ this kind of reuse can be expressed using the Curiously Recurring T emplate Pattern (CR TP) [52] and by parameterizing the base iterator class on the return type of operator * . This approach can not be used in G because the parameter passing mode may not be parameterized. Further , the semantics of polymorphism in G does not match the intended use here, we want to gener ate code for the two iterator types at library construction time. A separate generative mechanism is needed to complement the generic features of G . As a temporary solution, we used the m4 macro system to factor the common code from the iterators. The following is an excerpt from the implementation of the iterator operators. define(‘forward_iter_ops’, ‘fun operator* where { Regular, DefaultConstructible } ($1 x) -> T $2 { return x.node->data; } ...’) forward_iter_ops(list_iterator, &) / ∗ r eadonly ∗ / 36 forward_iter_ops(mutable_list_iter, !) / ∗ r eadwrite ∗ / 4.5. Adaptors The reverse_iterator class is a representati ve e xample of an STL adaptor . class reverse_iterator where { Regular, DefaultConstructible } { reverse_iterator(Iter base) : curr(base) { } reverse_iterator(reverse_iterator other) : curr(other.curr) { } Iter curr; }; The Regular requirement on the underlying iterator is needed for the copy constructor and DefaultConstructible is needed for the default constructor . This adaptor ﬂips the direction of tra versal of the underlying iterator , which is accomplished with the following operator * and operator ++ . There is a call to operator -- on the underlying Iter type we need to include the requirement for Bidir ectional Iterator . fun operator * where { BidirectionalIterator } (reverse_iterator r) -> BidirectionalIterator.value { let tmp = @Iter(r.curr); return *--tmp; } fun operator ++ where { BidirectionalIterator } (reverse_iterator! r) -> reverse_iterator! { --r.curr; return r; } Polymorphic model deﬁnitions are used to establish that reverse_iterator is a model of the iterator concepts, as we discussed in Section 3.2. 5. The Boost Graph Library A group of us at the Open Systems Lab performed a comparati ve study of language sup- port for generic programming [17]. W e ev aluated a half dozen modern programming languages by implementing a subset of the Boost Graph Library [13] in each language. W e implemented a family of algorithms associated with breadth-ﬁrst search, including Dijkstra’ s single-source shortest paths [53] and Prim’ s minimum spanning tree algorithms [54]. This section extends the previous study to include G . W e give a brief ov erview of the BGL, describe the implementation of the BGL in G , and compare the results to those in our earlier study [17]. 5.1. An overview of the BGL gr aph searc h algorithms Figure 21 depicts some graph search algorithms from the BGL, their relationships, and how they are parameterized. Each large box represents an algorithm and the attached small boxes represent type parameters. An arrow labeled from one algorithm to another speciﬁes that one algorithm is implemented using the other . An arrow labeled from a type parameter to an unboxed name speciﬁes that the type parameter must model that concept. For example, the breadth-ﬁrst search algorithm has three type parameters: G , C , and Vis . Each of 37 these has requirements: G must model the V ertex List Graph and Incidence Graph concepts, C must model the Read/Write Map concept, and Vis must model the BFS V isitor concept. The breadth-ﬁrst search algorithm is implemented using the graph search algorithm. Breadth-First Search G Dijkstra Shortest Paths G D W < + Johnson All-Pairs G W < + Prim Min Span Tree G D W < Graph Search G Vis B Incidence Graph Vertex List Graph Bellman-Ford Shortest Paths G D W < + Edge List Graph Read-Map Read/Write-Map Read/Write-Map Read-Map C Read/Write-Map Vertex List Graph Vis BFS Visitor Visitor Bag C Read/Write-Map Fig. 21. Graph algorithm parameterization and reuse within the Boost Graph Library . Arrows for redundant models relationships are not shown. F or example, the type parameter G of breadth-ﬁrst search must also model Incidence Graph because breadth-ﬁrst search uses graph search. The core algorithm of this library is graph search, which traverses a graph and performs user- deﬁned operations at certain points in the search. The order in which vertices are visited is con- trolled by a type argument, B , that models the Bag concept. This concept abstracts a data structure with insert and remo ve operations b ut no requirements on the order in which items are removed. When B is bound to a FIFO queue, the traversal order is breadth-ﬁrst. When it is bound to a priority queue based on distance to a source vertex, the order is closest-ﬁrst, as in Dijkstra’ s single-source shortest paths algorithm. Graph search is also parameterized on actions to take at ev ent points during the search, such as when a vertex is ﬁrst discovered. This parameter, Vis , must model the V isitor concept (which is not to be confused with the V isitor design pattern). The graph search algorithm also takes a type parameter C for mapping each verte x to a color and C must model the Read/Write Map concept. The colors are used as markers to keep track of the progression of the algorithm through the graph. The Read Map and Read/Write Map concepts represent variants of an important abstraction in the graph library: the pr operty map . In practice, graphs represent domain-speciﬁc entities. For example, a graph might depict the layout of a communication network, its vertices repre- senting endpoints and its edges representing direct links. In addition to the number of vertices and the edges between them, a graph may associate v alues to its elements. Each vertex of a communication network graph might have a name and each edge a maximum transmission rate. Some algorithms require access to domain information associated with the graph representation. For example, Prim’ s minimum spanning tree algorithm requires “weight” information associated with each edge in a graph. Property maps provide a conv enient implementation-agnostic means of expressing, to algorithms, relations between graph elements and domain-speciﬁc data. Some graph data structures directly contain associated v alues with each node; others use external asso- ciativ e data structures to implement these relationships. Interfaces based on property maps work equally well with both representations. The graph algorithms are all parameterized on the graph type. Breadth-ﬁrst search takes a type parameter G , which must model two concepts, Incidence Graph and V erte x List Graph . The 38 Incidence Graph concept deﬁnes an interface for accessing out-edges of a vertex. V ertex List Graph speciﬁes an interface for accessing the vertices of a graph in an unspeciﬁed order . The Bellman-Ford shortest paths algorithm [55] requires a model of the Edge List Graph concept, which provides access to all the edges of a graph. That graph capabilities are partitioned among three concepts illustrates generic programming’ s emphasis on minimal algorithm requirements. The Bellman-Ford shortest paths algorithm re- quires of a graph only the operations described by the Edge List Graph concept. Breadth-ﬁrst search, in contrast, requires the functionality of two separate concepts. By partitioning the func- tionality of graphs, each algorithm can be used with any data type that meets its minimum re- quirements. If the three ﬁne-grained graph concepts were replaced with one monolithic concept, each algorithm would require more from its graph type parameter than necessary and w ould thus unnecessarily restrict the set of types with which it could be used. The graph library design is suitable for e valuating generic programming capabilities of lan- guages because its implementation in volves a rich variety of generic programming techniques. Most of the algorithms are implemented using other library algorithms: breadth-ﬁrst search and Dijkstra’ s shortest paths use graph search, Prim’ s minimum spanning tree algorithm uses Dijk- stra’ s algorithm, and Johnson’ s all-pairs shortest paths algorithm [56] uses both Dijkstra’ s and Bellman-Ford shortest paths. Furthermore, type parameters for some algorithms, such as the G parameter to breadth-ﬁrst search, must model multiple concepts. In addition, the algorithms require certain relationships between type parameters. For example, consider the graph search algorithm. The C type argument, as a model of Read/Write Map , is required to have an associ- ated ke y type. The G type argument is required to have an associated vertex type. Graph search requires that these two types be the same. As in our earlier study , we focus the e valuation on the interface of the breadth-ﬁrst search algorithm and the infrastructure surrounding it, including concept deﬁnitions and an example use of the algorithm. 5.2. Implementation in G So far we hav e implemented breadth-ﬁrst search and Dijkstra’ s single-source shortest paths in G . This required deﬁning sev eral of the graph and property map concepts and an implementation of the adjacency list class, a FIFO queue, and a priority queue. The interface for the breadth-ﬁrst search algorithm is straightforward to express in G . It has three type parameters: the graph type G , the color map type C , and the visitor type Vis . The requirements on the type parameters are expressed with a where clause, using concepts that we describe belo w . In the interface of breadth first search , associated types and same-type constraints play an important role in accurately tracking the relationships between the graph type, its verte x descriptor type, and the color property map. type Color = int ; let black = 0; let gray = 1; let white = 2; fun breadth_first_search where { IncidenceGraph, VertexListGraph, ReadWritePropertyMap, PropertyMap.key == IncidenceGraph.vertex_descriptor, 39 PropertyMap.value == Color, BFSVisitor } (G g, IncidenceGraph.vertex_descriptor@ s, C c, Vis vis) { / ∗ ... ∗ / } Figure 22 shows the deﬁnition of se veral graph concepts in G . The Graph concept requires the associated types vertex descriptor and edge descriptor and some basic functionality for those types such as copy construction and equality comparison. This concept also includes the source and target functions. The Graph concept serves to factor common requirements out of the IncidenceGraph and VertexListGraph concepts. The IncidenceGraph concept introduces the capability to access out-edges of a vertex. The access is provided by the out edge iterator associated type. The requirements for the out- edge iterator are slightly more than the standard InputIterator concept and slightly less than the ForwardIterator concept. The out-edge iterator must allo w for multiple passes but deref- erencing an out-edge iterator need not return a reference (for example, it may return by-value instead). Thus we deﬁne the following ne w concept to express these requirements. concept MultiPassIterator { refines DefaultConstructible; refines InputIterator; // semantic r equirement: allow multiple passes thr ough the range }; In Figure 22, the IncidenceGraph concept uses same-type constraints to require that the value type of the iterator to be the same type as the edge descriptor . The VertexListGraph concepts adds the capability of trav ersing all the vertices in the graph using the associated vertex iterator . Figure 23 shows the implementation of a graph in terms of a vector of singly-linked lists. V ertex descriptors are integers and edge descriptors are pairs of integers. The out-edge iterator is implemented with the vg out edge iter class whose implementation is shown in Figure 24. The basic idea behind this iterator is to provide a dif ferent view of the list of target vertices, making it appear as a list of source-target pairs. The property map concepts are deﬁned in Figure 25. The ReadWritePropertyMap is a re- ﬁnement of the ReadablePropertyMap concept, which requires the get function, and the WritablePropertyMap concept, which requires the put function. Both of these concepts re- ﬁne the PropertyMap concept which includes the associated key and value types. Figure 26 shows the deﬁnition of the BFSVisitor concept. This concept is naturally expressed as a multi-parameter concept because the visitor and graph types are independent: a particular visitor may be used with man y dif ferent concrete graph types and vice versa. The use of refines for Graph in BFSVisitor is some what odd, require would be more natural, but the reﬁnement provides direct (and con venient) access to the vertex and edge descriptor types. An alternative would be use to require and some type aliases, but type aliases ha ve not yet been added to concept deﬁnitions. Figure 27 presents an example use of the breadth_first_search function to output vertices in breadth-ﬁrst order . T o do so, the test_vis visitor o verrides the function discover_vertex ; empty implementations of the other visitor functions are provided by default_bfs_visitor . A graph is constructed using the AdjacencyList class, and then breadth_first_search is called. 40 Fig. 22. Graph concepts in G . concept Graph { type vertex_descriptor; require DefaultConstructible; require Regular; require EqualityComparable; type edge_descriptor; require DefaultConstructible; require Regular; require EqualityComparable; fun source(edge_descriptor, G) -> vertex_descriptor@; fun target(edge_descriptor, G) -> vertex_descriptor@; }; concept IncidenceGraph { refines Graph; type out_edge_iterator; require MultiPassIterator; edge_descriptor == InputIterator.value; fun out_edges(vertex_descriptor, G) -> pair@; fun out_degree(vertex_descriptor, G) -> int@; }; concept VertexListGraph { refines Graph; type vertex_iterator; require MultiPassIterator; vertex_descriptor == InputIterator.value; fun vertices(G) -> pair@; fun num_vertices(G) -> int@; }; 6. Related W ork There is a long history of programming language support for polymorphism, dating back to the 1970s [20, 57, 58, 59]. An early precursor to G ’ s concept feature can be seen in CLU’ s type set feature [58]. T ype sets differ from concepts in that they rely on structural conformance whereas concepts use nominal conformance established by a model deﬁnition. Also, G provides a means for composing concepts via reﬁnement whereas CLU does not provide a means for composing type sets. Finally , CLU does not provide support for associated types. In mathematics, the notion of algebraic structure is equiv alent to G ’ s concept, and has been in 41 Fig. 23. Implementation of a graph with a vector of lists. fun source(pair< int , int > e, vector< slist< int > >) -> int@ { return e.first; } fun target(pair< int , int > e, vector< slist< int > >) -> int@ { return e.second; } model Graph< vector< slist< int > > > { type vertex_descriptor = int ; type edge_descriptor = pair< int , int >; }; fun out_edges( int src, vector< slist< int > > G) -> pair@ { return make_pair(@vg_out_edge_iter(src, begin(G[src])), @vg_out_edge_iter(src, end(G[src]))); } fun out_degree( int src, vector< slist< int > > G) -> int@ { return size(G[src]); } model IncidenceGraph< vector< slist< int > > > { type out_edge_iterator = vg_out_edge_iter; }; fun vertices(vector< slist< int > > G) -> pair@ { return make_pair(@counting_iter(0), @counting_iter(size(G))); } fun num_vertices(vector< slist< int > > G) -> int@ { return size(G); } model VertexListGraph< vector< slist< int > > > { type vertices_size_type = int ; type vertex_iterator = counting_iter; }; use for a very long time [60]. T ype classes The concept feature in G is heavily inﬂuenced by the type class feature of Haskell [61], with its nominal conformance and explicit model deﬁnitions. Howe ver , G ’ s sup- port for associated types, same type constraints, and concept-based overloading is novel. Also, G ’ s type system is fundamentally different from Haskell’ s: it is based on System F [20, 57] in- stead of Hindley-Milner type inference [59]. This difference has some repercussions. In G there is more control over the scope of concept operations because where clauses introduce concept operations into the scope of the body . This difference allows Haskell to infer type requirements but induces the restriction that two type classes in the same module may not ha ve operations with the same name. A dif ference we discussed in Section 3.6.1 is that in G , overlapping models may coexist in separate scopes but still be used in the same program, whereas in Haskell ov erlapping models may not be used in the same program. Haskell performed quite well in our compara- tiv e study of support for generic programming [17]. Howe ver , we pointed out that Haskell was 42 Fig. 24. Out-edge iterator for the vector of lists. class vg_out_edge_iter { vg_out_edge_iter() { } vg_out_edge_iter( int src, slist_iterator< int > iter) : src(src), iter(iter) { } vg_out_edge_iter(vg_out_edge_iter x) : iter(x.iter), src(x.src) { } slist_iterator< int > iter; int src; }; fun operator =(vg_out_edge_iter! me, vg_out_edge_iter other) -> vg_out_edge_iter! { me.iter = other.iter; me.src = other.src; return me; } model DefaultConstructible { }; model Regular { }; fun operator ==(vg_out_edge_iter x, vg_out_edge_iter y) -> bool@ { return x.iter == y.iter; } fun operator !=(vg_out_edge_iter x, vg_out_edge_iter y) -> bool@ { return x.iter != y.iter; } model EqualityComparable { }; fun operator *(vg_out_edge_iter x) -> pair< int , int >@ { return make_pair(x.src, *x.iter); } fun operator ++(vg_out_edge_iter! x) -> vg_out_edge_iter! { ++x.iter; return x; } model InputIterator { type value = pair< int , int >; type difference = ptrdiff_t; }; model MultiPassIterator { }; missing support for associated types and work to remedy this has been reported in [22, 23]. W ehr , Lammel, and Thiemann[62] have proposed extending Jav a with generalized interfaces, which bear a close resemblance to G ’ s concepts and Haskell’ s type classes, b ut add the capability of run-time dispatch using existential quantiﬁcation. ( G currently provides only uni versal quan- tiﬁcation, although programmers can workaround this limitation with an tricky encoding [63]). Signatures and functors A rough analogy can be made between SML signatures [36] and G concepts, and between ML structures and G models. Howe ver , there are signiﬁcant dif fer- ences. Functors are module-le vel constructs and therefore provide a more coarse-grained mecha- nism for parameterization than do generic functions. More importantly , functors require explicit instantiation with a structure, thereby making their use more heavyweight than generic func- tions in F G , which perform automatic lookup of the required model or instance. The associated types and same-type constraints of G are roughly equiv alent to types nested in ML signatures and to type sharing respectively . W e reuse some implementation techniques from ML such as a union/ﬁnd-based algorithm for deciding type equality [64]. There are numerous other languages 43 Fig. 25. Property map concepts in G . concept PropertyMap { type key; type value; }; concept ReadablePropertyMap { refines PropertyMap; fun get(Map, key) -> value; }; concept WritablePropertyMap { refines PropertyMap; fun put(Map, key, value); }; concept ReadWritePropertyMap { refines ReadablePropertyMap; refines WritablePropertyMap; }; Fig. 26. Breadth-ﬁrst search visitor concept. concept BFSVisitor { refines Regular; refines Graph; fun initialize_vertex(Vis v, vertex_descriptor d, G g) {} fun discover_vertex(Vis v, vertex_descriptor d, G g) {} fun examine_vertex(Vis v, vertex_descriptor d, G g) {} fun examine_edge(Vis v, edge_descriptor d, G g) {} fun tree_edge(Vis v, edge_descriptor d, G g) {} fun non_tree_edge(Vis v, edge_descriptor d, G g) {} fun gray_target(Vis v, edge_descriptor d, G g) {} fun black_target(Vis v, edge_descriptor d, G g) {} fun finish_vertex(Vis v, vertex_descriptor d, G g) {} }; with parameterized modules [65, 66, 67] that require explicit instantiation with a structure. Recently , Dreyer , Harper, Chakra varty , and Keller proposed an extension of SML signatures/- functors, call modular type classes [68], that provides many of the beneﬁts of Haskell type classes such as implicit instantiation and instance passing. The design for modular type classes differs from concepts in G primarily in that it supports the conv enience of type inference at the price of disallowing o verlapping instances in a giv en scope and ﬁrst-class polymorphism. Subtype-bound polymorphism Less closely related to G are languages based on subtype- bounded polymorphism [69] such as Jav a, C#, and Eiffel. W e found subtype-bounded poly- 44 Fig. 27. Example use of the BFS generic function. struct test_vis { }; fun discover_vertex(test_vis, int v, G g) { printf("%d ", v); } model where { Graph, Graph.vertex_descriptor == int } BFSVisitor { }; fun main() -> int@ { let n = 7; let g = @vector< slist< int > >(n); push_front(1, g[0]); push_front(4, g[0]); push_front(2, g[1]); push_front(3, g[1]); push_front(4, g[3]); push_front(6, g[3]); push_front(5, g[4]); let src = 0; let color = new Color[n]; for ( let i = 0; i != n; ++i) color[i] = white; breadth_first_search(g, src, color, @test_vis()); return 0; } morphism less suitable for generic programming and refer the reader to [17] for an in-depth discussion. Row variable polymorphism OCaml’ s object types[37, 70] and polymorphism o ver row vari- ables provide fairly good support for generic programming. Howe ver , OCaml lacks support for associated types so it suffers from clutter due to extra type parameters in generic functions. PolyTOIL [71], with its match-bound polymorphism, provides similar support for generic pro- gramming as OCaml but also lacks associated types. V irtual types One of the proposed solutions for dealing with binary methods and associated types in object-oriented languages is virtual types , that is, the nesting of abstract types in inter- faces and type deﬁnitions within classes or objects. The beginning of this line of research was the virtual patterns feature of the BET A language [72]. Patterns are a generalization of classes, ob- jects, and procedures. An adaptation of virtual patterns to object-oriented classes, called virtual classes , was created by Madsen and Moller -Pedersen [73] and an adaptation for Jav a was created by Thorup [74]. These early designs for virtual types were not statically type safe, but relied on dynamic type checking. Howe ver , a statically type safe version was created by T orgersen [75]. A statically type safe version of BET A ’ s virtual patterns was de veloped for the gbeta language of Ernst [76, 77]; the Scala programming language also includes type safe virtual types [78, 79]. 7. Conclusion This article presents a ne w programming language named G that is designed to meet the needs of large-scale generic libraries. W e demonstrated this with an implementation of the Standard 45 T emplate Library (STL) and the Boost Graph Library (BGL). W e were able to implement all of the abstractions in the STL and BGL in a straightforward manner . Further , G is particularly well- suited for the development of reusable components due to its support of modular type checking and separate compilation. G ’ s strong type system provides support for the independent v alidation of components and G ’ s system of concepts and constraints allo ws for rich interactions between components without sacriﬁcing encapsulation. The language features present in G promise to in- crease programmer productivity with respect to the de velopment and use of generic components. Acknowledgments W e thank Ronald Garcia, Jeremiah W illcock, Doug Gregor , Jaakko J ¨ arvi, Dave Abrahams, Dav e Musser , and Alexander Stepanov for many discussions and collaborations that informed this work. This work was supported by NSF grants EIA-0131354 and CCF-0702362, and by a grant from the Lilly Endowment. References [1] Randell, B.: Software engineering in 1968. In: ICSE ’79: Proceedings of the 4th interna- tional conference on Software engineering, Piscataw ay , NJ, USA, IEEE Press (1979) 1–10 [2] Frederick P . Brooks, J.: The Mythical Man-Month: Essays on Softw . Addison-W esley Longman Publishing Co., Inc., Boston, MA, USA (1978) [3] McIlroy , M.D.: Mass-produced software components. In Buxton, J.M., Naur , P ., Randell, B., eds.: Proceedings of Software Engineering Concepts and T ech- niques, 1968 N A TO Conference on Software Engineering. (1969) 138–155 http://www.cs.dartmouth.edu/ ~ doug/components.txt . [4] Clements, P ., Northrop, L.: Software Product Lines: Practices and Patterns. Addison W es- ley , Reading, MA (2002) [5] Kapur , D., Musser , D.R., Stepanov , A.: Operators and algebraic structures. In: Proc. of the Conference on Functional Programming Languages and Computer Architecture, Portsmouth, New Hampshire, A CM (1981) [6] Musser , D.R., Stepanov , A.A.: Generic programming. In Gianni, P .P ., ed.: Symbolic and algebraic computation: ISSAC ’88, Rome, Italy , July 4–8, 1988: Proceedings. V olume 358 of Lecture Notes in Computer Science., Berlin, Springer V erlag (1989) 13–25 [7] Musser , D.R., Stepanov , A.A.: A library of generic algorithms in Ada. In: Using Ada (1987 International Ada Conference), New Y ork, NY , A CM SIGAda (1987) 216–225 [8] Kershenbaum, A., Musser , D., Stepanov , A.: Higher order imperati ve programming. T ech- nical Report 88-10, Rensselaer Polytechnic Institute (1988) [9] Stroustrup, B.: Parameterized types for C++. In: USENIX C++ Conference. (1988) [10] Stepanov , A.A., Lee, M.: The Standard T emplate Library. T echnical Report X3J16/94- 0095, WG21/N0482, ISO Programming Language C++ Project (1994) [11] Austern, M.H.: Generic programming and the STL: Using and extending the C++ Standard Template Library . Professional Computing Series. Addison-W esley Longman Publishing Co., Inc., Boston, MA, USA (1998) 46 [12] K ¨ othe, U.: Reusable Software in Computer V ision. In: Handbook on Computer V ision and Applications. V olume 3. Acadamic Press (1999) [13] Siek, J., Lee, L.Q., Lumsdaine, A.: The Boost Graph Library: User Guide and Reference Manual. Addison-W esley Longman Publishing Co., Inc., Boston, MA, USA (2002) [14] Boissonnat, J.D., Cazals, F ., Da, F ., Devillers, O., Pion, S., Rebuf at, F ., T eillaud, M., Yvinec, M.: Programming with CGAL: the example of triangulations. In: Proceedings of the ﬁf- teenth annual symposium on Computational geometry , ACM Press (1999) 421–422 [15] Pitt, W .R., Williams, M.A., Stev en, M., Sweeney , B., Bleasby , A.J., Moss, D.S.: The bioinformatics template library: generic components for biocomputing. Bioinformatics 17 (2001) 729–737 [16] T royer , M., T odo, S., Trebst, S., and, A.F .: (ALPS: Algorithms and Libraries for Physics Simulations) http://alps.comp- phys.org/ . [17] Garcia, R., J ¨ arvi, J., Lumsdaine, A., Siek, J., W illcock, J.: A comparati ve study of lan- guage support for generic programming. In: OOPSLA ’03: Proceedings of the 18th annual A CM SIGPLAN conference on Object-oriented programing, systems, languages, and ap- plications, New Y ork, NY , USA, A CM Press (2003) 115–134 [18] Garcia, R., Jarvi, J., Lumsdaine, A., Siek, J., Willcock, J.: An extended comparative study of language support for generic programming. Journal of Functional Programming 17 (2007) 145–205 [19] Siek, J., Lumsdaine, A.: Essential language support for generic programming. In: PLDI ’05: Proceedings of the A CM SIGPLAN 2005 conference on Programming language design and implementation, New Y ork, NY, USA, ACM Press (2005) 73–84 [20] Girard, J.Y .: Interpr ´ etation Fonctionnelle et ´ Elimination des Coupures de l’Arithm ´ etique d’Ordre Sup ´ erieur . Th ` ese de doctorat d’ ´ etat, Univ ersit ´ e Paris VII, P aris, France (1972) [21] Reynolds, J.C.: T ypes, abstraction and parametric polymorphism. In Mason, R.E.A., ed.: Information Processing 83, Amsterdam, Else vier Science Publishers B. V . (North-Holland) (1983) 513–523 [22] Chakrav arty , M.M.T ., K eller , G., Peyton Jones, S., Marlow , S.: Associated types with class. In: POPL ’05: Proceedings of the 32nd A CM SIGPLAN-SIGA CT symposium on Principles of programming languages, New Y ork, NY , USA, A CM Press (2005) 1–13 [23] Chakrav arty , M.M.T ., Keller , G., Peyton Jones, S.: Associated type synonyms. In: ICFP ’05: Proceedings of the International Conference on Functional Programming, Ne w Y ork, NY , USA, ACM Press (2005) 241–253 [24] Jarvi, J., Gregor , D., Willcock, J., Lumsdaine, A., Siek, J.G.: Algorithm specialization in generic programming - challenges of constrained generics in C++. In: PLDI ’06: Pro- ceedings of the ACM SIGPLAN 2006 conference on Programming language design and implementation, New Y ork, NY, USA, ACM Press (2006) [25] Siek, J.G.: A Language for Generic Programming. PhD thesis, Indiana Uni versity (2005) [26] Siek, J., Lumsdaine, A.: Language requirements for large-scale generic libraries. In: GPCE ’05: Proceedings of the fourth international conference on Generati ve Programming and Component Engineering. (2005) accepted for publication. [27] Gregor , D., J ¨ arvi, J., Siek, J.G., Reis, G.D., Stroustrup, B., Lumsdaine, A.: Concepts: Lin- guistic support for generic programming in C++. In: Proceedings of the 2006 A CM SIG- PLAN conference on Object-oriented programming, systems, languages, and applications (OOPSLA ’06). (2006) [28] Peyton Jones, S., Jones, M., Meijer , E.: T ype classes: an exploration of the design space. In: Proceedings of the Second Haskell W orkshop. (1997) 47 [29] Jazayeri, M., Loos, R., Musser, D., Stepanov , A.: Generic Programming. In: Report of the Dagstuhl Seminar on Generic Programming, Schloss Dagstuhl, Germany (1998) [30] Austern, M.: (draft) technical report on standard library extensions. T echnical Report N1711=04-0151, ISO/IEC JTC 1, Information T echnology , Subcommittee SC 22, Program- ming Language C++ (2004) [31] Silicon Graphics, Inc.: SGI Implementation of the Standard T emplate Library . (2004) http://www.sgi.com/tech/stl/ . [32] Musser , D.R.: Introspectiv e sorting and selection algorithms. Software Practice and Expe- rience 27 (1997) 983–993 [33] Hoare, C.A.R.: Algorithm 64: Quicksort. Communications of the A CM 4 (1961) 321 [34] Myers, N.C.: Traits: a ne w and useful template technique. C++ Report (1995) [35] J ¨ arvi, J., W illcock, J., Lumsdaine, A.: Algorithm specialization and concept constrained genericity . In: Concepts: a Linguistic Foundation of Generic Programming, Adobe Systems (2004) [36] Milner , R., T ofte, M., Harper, R.: The Deﬁnition of Standard ML. MIT Press (1990) [37] Leroy , X., Doligez, D., Garrigue, J., R ´ emy , D., V ouillon, J.: The Objecti ve Caml Documen- tation and User’ s Manual. (2003) [38] Liskov , B., Atkinson, R., Bloom, T ., Moss, E., Schaffert, C., Scheiﬂer , B., Snyder , A.: CLU reference manual. T echnical Report LCS-TR-225, MIT , Cambridge, MA, USA (1979) [39] Ditchﬁeld, G.J.: Overvie w of Cforall. University of W aterloo (1996) [40] Magnusson, B.: Code reuse considered harmful. Journal of Object-Oriented Programming 4 (1991) [41] J ¨ arvi, J., Stroustrup, B., Gregor , D., Siek, J.: Decltype and auto. T echnical Report N1478=03-0061, ISO/IEC JTC 1, Information T ech- nology , Subcommittee SC 22, Programming Language C++ (2003) http://www.open- std.org/jtc1/sc22/wg21/docs/papers/2003/n1478.pdf . [42] Mitchell, J.C.: Polymorphic type inference and containment. Information and Computation 76 (1988) 211–249 [43] T iuryn, J., Urzyczyn, P .: The subtyping problem for second-order types is undecidable. Information and Computation 179 (2002) 1–18 [44] Le Botlan, D., R ´ emy , D.: MLF: Raising ML to the power of System-F. In: Proceedings of the International Conference on Functional Programming (ICFP 2003), Uppsala, Sweden, A CM Press (2003) 27–38 [45] Odersky , M., L ¨ aufer , K.: Putting type annotations to work. In: Proceedings of the 23rd A CM SIGPLAN-SIGACT symposium on Principles of programming languages, ACM Press (1996) 54–67 [46] Jones, S.P ., Shields, M.: Practical type inference for arbitrary-rank types. submitted to the Journal of Functional Programming (2004) [47] chieh Shan, C.: Sexy types in action. SIGPLAN Notices 39 (2004) 15–22 [48] Horn, A.: On sentences which are true of direct unions of algebras. Journal of Symbolic Logic 16 (1951) 14–21 [49] International Organization for Standardization: ISO/IEC 14882:1998: Programming lan- guages — C++, Genev a, Switzerland (1998) [50] Nelson, G., Oppen, D.C.: Fast decision procedures based on congruence closure. J. A CM 27 (1980) 356–364 [51] Downe y , P .J., Sethi, R., T arjan, R.E.: V ariations on the common subexpression problem. Journal of the A CM (J ACM) 27 (1980) 758–771 48 [52] Coplien, J.: Curiously recurring template patterns. C++ Report (1995) 24–27 [53] Dijkstra, E.: A note on two problems in connexion with graphs. Numerische Mathematik 1 (1959) 269–271 [54] Prim, R.: Shortest connection networks and some generalizations. Bell System T echnical Journal 36 (1957) 1389–1401 [55] Bellman, R.: On a routing problem. Quarterly of Applied Mathematics 16 (1958) 87–90 [56] Johnson, D.B.: Ef ﬁcient algorithms for shortest paths in sparse networks. Journal of the A CM 24 (1977) 1–13 [57] Reynolds, J.C.: T owards a theory of type structure. In Robinet, B., ed.: Programming Symposium. V olume 19 of LNCS., Berlin, Springer-V erlag (1974) 408–425 [58] Liskov , B., Snyder , A., Atkinson, R., Schaf fert, C.: Abstraction mechanisms in CLU. Com- munications of the A CM 20 (1977) 564–576 [59] Milner , R.: A theory of type polymorphism in programming. Journal of Computer and System Sciences 17 (1978) 348–375 [60] Bourbaki, N.: Elements of Mathematics. Theory of Sets. Springer (1968) [61] W adler , P ., Blott, S.: Ho w to make ad-hoc polymorphism less ad-hoc. In: A CM Symposium on Principles of Programming Languages, A CM (1989) 60–76 [62] W ehr , S., L ¨ ammel, R., Thiemann, P .: JavaGI: Generalized Interfaces for Jav a. In: ECOOP 2007, Proceedings. LNCS, Springer-V erlag (2007) 25 pages; T o appear . [63] Pierce, B.C.: T ypes and programming languages. MIT Press, Cambridge, MA, USA (2002) [64] MacQueen, D.: An implementation of Standard ML modules. In: Proceedings of the 1988 A CM Conference on LISP and Functional Programming, Snowbird, UT, New Y ork, NY , A CM (1988) 212–223 [65] Poll, E., Thompson, S.: The T ype System of Aldor. T echnical Report 11-99, Computing Laboratory , University of K ent at Canterbury , Kent CT2 7NF , UK (1999) [66] Goguen, J.A., Wink er , T ., Meseguer , J., Futatsugi, K., Jouannaud, J.P .: Introducing OBJ. In: Applications of Algebraic Speciﬁcation using OBJ. Cambridge Uni versity Press (1992) [67] : Ada 95 Reference Manual. (1997) [68] Dreyer , D., Harper , R., Chakrav arty , M.M.T ., Keller , G.: Modular type classes. In: POPL ’07: Proceedings of the 34th annual A CM SIGPLAN-SIGA CT symposium on Principles of programming languages, New Y ork, NY , USA, A CM Press (2007) 63–70 [69] Canning, P ., Cook, W ., Hill, W ., Olthoff, W ., Mitchell, J.C.: F-bounded polymorphism for object-oriented programming. In: FPCA ’89: Proceedings of the fourth international conference on Functional programming languages and computer architecture, New Y ork, NY , USA, ACM Press (1989) 273–280 [70] R ´ emy , D., V ouillon, J.: Objecti ve ML: An ef fectiv e object-oriented extension to ML. The- ory And Practice of Object Systems 4 (1998) 27–50 A preliminary version appeared in the proceedings of the 24th A CM Conference on Principles of Programming Languages, 1997. [71] Bruce, K.B., Schuett, A., van Gent, R.: PolyTOIL: A type-safe polymorphic object-oriented language. In Olthoff, W ., ed.: Proceedings of ECOOP ’95 . Number 952 in Lecture Notes in Computer Science, Springer-V erlag (1995) 27–51 [72] Kristensen, B.B., Madsen, O.L., Møller-Pedersen, B., Nygaard, K.: Abstraction mechanisms in the BET A programming language. In: POPL ’83: Proceedings of the 10th A CM SIGA CT -SIGPLAN symposium on Principles of programming languages, Ne w Y ork, NY , USA, ACM Press (1983) 285–298 [73] Madsen, O.L., Moller-Pedersen, B.: V irtual classes: a po werful mechanism in object- oriented programming. In: OOPSLA ’89: Conference proceedings on Object-oriented pro- 49 gramming systems, languages and applications, New Y ork, NY , USA, A CM Press (1989) 397–406 [74] Thorup, K.K.: Genericity in Java with virtual types. In: ECOOP ’97. V olume 1241 of Lecture Notes in Computer Science. (1997) 444–471 [75] T orgersen, M.: V irtual types are statically safe. In: FOOL 5: The Fifth International W ork- shop on Foundations of Object-Oriented Languages. (1998) [76] Ernst, E.: gbeta – a Language with V irtual Attributes, Block Structure, and Propagating, Dynamic Inheritance. PhD thesis, Department of Computer Science, Univ ersity of Aarhus, ˚ Arhus, Denmark (1999) [77] Ernst, E.: Family polymorphism. In: ECOOP ’01. V olume 2072 of Lecture Notes in Com- puter Science., Springer (2001) 303–326 [78] Odersky , M., Cremet, V ., R ¨ ockl, C., Zenger , M.: A nominal theory of objects with depen- dent types. In: Proc. ECOOP’03. Springer LNCS (2003) [79] Odersky , M., al.: An overvie w of the Scala programming language. T echnical Report IC/2004/64, EPFL Lausanne, Switzerland (2004) 50

A Language for Generic Programming in the Large

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment