Using RDF to Model the Structure and Process of Systems


Authors:
- Marko A. Rodriguez (Los Alamos National Laboratory)
- Jennifer H. Watkins (Los Alamos National Laboratory)
- Johan Bollen (Los Alamos National Laboratory)
- Carlos Gershenson (New England Complex Systems Institute)

Marko A. Rodriguez, Jennifer H. Watkins, Johan Bollen
Los Alamos National Laboratory
{marko,jhw,jbollen}@lanl.gov

Carlos Gershenson
New England Complex Systems Institute
carlos@necsi.org

October 28, 2018

Abstract

Many systems can be described in terms of networks of discrete elements and their various relationships to one another. A semantic network, or multi-relational network, is a directed labeled graph consisting of a heterogeneous set of entities connected by a heterogeneous set of relationships. Semantic networks serve as a promising general-purpose modeling substrate for complex systems. Various standardized formats and tools are now available to support practical, large-scale semantic network models. First, the Resource Description Framework (RDF) offers a standardized semantic network data model that can be further formalized by ontology modeling languages such as RDF Schema (RDFS) and the Web Ontology Language (OWL). Second, the recent introduction of highly performant triple-stores (i.e. semantic network databases) allows semantic network models on the order of $10^{9}$ edges to be efficiently stored and manipulated. RDF and its related technologies are currently used extensively in the domains of computer science, digital library science, and the biological sciences. This article provides an introduction to RDF/RDFS/OWL and an examination of their suitability for modeling discrete-element complex systems.

Citation: Rodriguez, M.A., Watkins, J.H., Bollen, J., Gershenson, C., "Using RDF to Model the Structure and Process of Systems", International Conference on Complex Systems, Boston, Massachusetts, October 2007.

1 Introduction

The figurehead of the Semantic Web initiative, Tim Berners-Lee, describes the Semantic Web as
"an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation" [2].

However, Berners-Lee's definition assumes an application space that is specific to the "web" and to the interaction between humans and machines. More generally, the Semantic Web is a conglomeration of standards and technologies that can be used in various disparate application spaces. At its core, the Semantic Web is simply a highly-distributed, standardized semantic network (i.e. directed labeled network) data model and a set of tools to operate on that data model. With respect to the purpose of this article, the Semantic Web and its associated technologies can be leveraged to model and manipulate any system that can be represented as a heterogeneous set of discrete elements connected to one another by a heterogeneous set of relationships, whether those elements are web pages, automata, cells, people, cities, etc. This article introduces complexity science researchers to a collection of standards designed for modeling the heterogeneous relationships that compose systems, and to technologies that support large-scale data sets on the order of $10^{9}$ edges.

This article has the following outline. Section 2 presents a review of the Resource Description Framework (RDF). RDF is the standardized data model for representing a semantic network and is the foundational technology of the Semantic Web. Section 3 presents a review of both RDF Schema (RDFS) and the Web Ontology Language (OWL). RDFS and OWL are languages for abstractly defining the topological features of an RDF network and are analogous, in some ways, to the database schemas of relational databases (e.g. MySQL and Oracle). Section 4 presents a review of triple-store technology and its similarities to and differences from the relational database.
Finally, Section 5 presents the semantic network programming language Neno and the RDF virtual machine Fhat.

2 The Resource Description Framework

The Resource Description Framework (RDF) is a standardized data model for representing a semantic network [5]. RDF is not a syntax (i.e. data format). There exist various RDF syntaxes, and depending on the application space one syntax may be more appropriate than another.

An RDF-based semantic network is called an RDF network. An RDF network differs from the common directed network because the edges in the network are qualified. For instance, in a directed network, an edge is represented as an ordered pair $(i, j)$. This relationship states that $i$ is related to $j$ by some unspecified type of relationship. Because edges are not qualified, all edges have a homogeneous meaning in a directed network (e.g. a coauthorship network, a friendship network, a transportation network). In an RDF network, on the other hand, edges are qualified such that a relationship is represented by an ordered triple $\langle i, \omega, j \rangle$. A triple can be interpreted as a statement composed of a subject, a predicate, and an object: the subject $i$ is related to the object $j$ by the predicate $\omega$. For instance, a scholarly network can be represented as an RDF network where an article cites an article, an author collaborates with an author, and an author is affiliated with an institution. Because edges are qualified, a heterogeneous set of elements can interact in multiple different ways within the same RDF network representation. It is the labeled edge that makes the Semantic Web, and the semantic network in general, an appropriate data model for systems that require this level of description.

In an RDF network, elements (i.e. vertices, nodes) are called resources, and resources are identified by Uniform Resource Identifiers (URI) [1].
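To make the contrast between an ordered pair $(i, j)$ and a qualified triple concrete, here is a minimal Python sketch that assumes no RDF library. The `URI`, `Literal`, and `Triple` classes, the `http://example.org/scholar#` namespace, and the resource names are hypothetical illustrations, and blank nodes are ignored. As in RDF, subjects and predicates are URIs while only the object position may also hold a literal.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: object  # e.g. an int, str, float, or date


Term = Union[URI, Literal]


@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    obj: Term

    def __post_init__(self):
        # Only the object position may hold a literal; subject and predicate are URIs.
        if not isinstance(self.subject, URI) or not isinstance(self.predicate, URI):
            raise TypeError("subject and predicate must be URIs")
        if not isinstance(self.obj, (URI, Literal)):
            raise TypeError("object must be a URI or a literal")


EX = "http://example.org/scholar#"  # hypothetical namespace for illustration


def ex(name: str) -> URI:
    return URI(EX + name)


# A tiny heterogeneous scholarly network: three relationship types plus a literal.
network = {
    Triple(ex("article1"), ex("cites"), ex("article2")),
    Triple(ex("alice"), ex("collaboratesWith"), ex("bob")),
    Triple(ex("alice"), ex("affiliatedWith"), ex("lab")),
    Triple(ex("alice"), ex("name"), Literal("Alice")),
}


def directed_projection(triples):
    """Drop the predicate, keeping only the (i, j) pairs a plain directed network stores."""
    return {(t.subject, t.obj) for t in triples}


def edges_labeled(triples, predicate):
    """Recover a single relationship type, which the projection above cannot do."""
    return {(t.subject, t.obj) for t in triples if t.predicate == predicate}
```

For example, `edges_labeled(network, ex("cites"))` yields only the citation edge, while `directed_projection(network)` mixes citation, collaboration, affiliation, and naming into one undifferentiated edge set.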
The purpose of the URI is to provide a standardized, globally-unique naming convention for identifying any type of resource, where a "resource" can be anything (e.g. physical, virtual, conceptual, etc.). The URI allows every vertex and edge label in a semantic network to be uniquely identified such that RDF networks from disparate organizations can be unioned to form larger, and perhaps more complete, models. The Semantic Web can thus span institutional boundaries to support a world-scale model. The generic syntax for a URI is

    <scheme name>:<hierarchical part>[#<fragment>]

Examples of entities that can be denoted by a URI include:

• a physical object (e.g. http://www.lanl.gov/people#marko)
• a physical component (e.g. http://www.lanl.gov/people#markos_arm)
• a virtual object (e.g. http://www.lanl.gov/index.html)
• an abstract class (e.g. http://www.lanl.gov/people#Human).

Even though each of the URIs presented above has an http scheme name, only one is a Uniform Resource Locator (URL) [9] in the popular sense: namely, http://www.lanl.gov/index.html. The URL is a subclass of the URI; a URL is an address to a particular harvestable resource. While URIs can point to harvestable resources, in general it is best to think of the URI as an address (i.e. pointer) to a particular concept. With respect to the previously presented URIs, Marko, his arm, and the class of humans are all concepts that are uniquely identified by some prescribed globally-unique URI.

Along with URI resources, RDF supports the concept of a literal. Example literals include the integer 1, the string "marko", the float (or double) 1.034, the date 2007-11-30, etc. Refer to the XML Schema Datatypes (XSD) specification for the complete classification of literals [3]. If $U$ is the set of all URIs and $L$ is the set of all literals, then an RDF network (or the Semantic Web in general) can be formally defined as¹

$$G \subseteq \langle U \times U \times (U \cup L) \rangle. \tag{1}$$

To ease readability and creation, URI schemes and hierarchical parts are usually abbreviated with a prefix. For example, in the following two triples, lanl is the prefix for http://www.lanl.gov/people#:

    ⟨lanl:marko, lanl:worksWith, lanl:jhw⟩
    ⟨lanl:marko, lanl:hasBodyPart, lanl:markos_arm⟩

These triples are diagrammed in Figure 1. The union of all RDF triples is the Semantic Web.

[Figure 1: Two RDF triples as an RDF network.]

¹ Note that there also exists the concept of a blank node (i.e. anonymous node). Blank nodes are important for creating n-ary relationships in RDF networks. Please refer to the official RDF specification for more information on the role of blank nodes.

The benefit of RDF, and perhaps what is not generally appreciated, is that with RDF it is possible to represent anything in relation to anything by any type of qualified relationship. In many cases, this generality can lead to an uncontrolled soup of relationships; however, thanks to ontology languages such as RDFS and OWL, it is possible to formally constrain the topological features of an RDF network and thus of subsets of the larger Semantic Web.

3 The RDF Schema and Web Ontology Language

The RDF Schema (RDFS) [4] and the Web Ontology Language (OWL) [6] are both RDF languages used to abstractly define resources in an RDF network. RDFS is simpler than OWL and is useful for creating class hierarchies and for specifying how instances of those classes can relate to one another. It provides three important constructs: rdfs:domain, rdfs:range, and rdfs:subClassOf². While other constructs exist, these three tend to be the most frequently used when developing an RDFS ontology. Figure 2 provides an example of how these constructs are used. With RDFS (and OWL), there is a sharp distinction between the ontological level and the instance level of an RDF network. The ontological level defines abstract classes (e.g.
lanl:Human) and how they are related to one another. The instance level is tied to the ontological level using the rdf:type predicate³. For example, any lanl:Human can be the rdfs:domain (subject) of a lanl:worksFor triple that has a lanl:Institution as its rdfs:range (object). Note that lanl:Laboratory is an rdfs:subClassOf lanl:Institution. According to the property of subsumption in RDFS reasoning, subclasses inherit their parent class restrictions. Thus, lanl:marko can have a lanl:worksFor relationship with lanl:LANL. Note that RDFS is not intended to constrain relationships, but instead to infer new relationships based on restrictions. For instance, if lanl:marko lanl:worksFor some other organization denoted X, it is inferred that X is an rdf:type of lanl:Institution. While this is not intuitive for those familiar with constraint-based database schemas, such inferencing of new relationships is the norm in the RDFS and OWL world.

[Figure 2: The relationship between an instance and its ontology.]

² rdfs is a prefix for http://www.w3.org/2000/01/rdf-schema#
³ rdf is a prefix for http://www.w3.org/1999/02/22-rdf-syntax-ns#

Beyond the previously presented RDFS constructs, OWL has one primary construct that is used repeatedly: owl:Restriction⁴. Example owl:Restrictions include, but are not limited to, owl:maxCardinality, owl:minCardinality, owl:cardinality, owl:hasValue, etc. With OWL, it is possible to state that a lanl:Human can work for no more than one lanl:Institution. In such cases, the owl:maxCardinality restriction would be specified on the lanl:worksFor predicate. If there exist the triples

    ⟨lanl:marko, lanl:worksFor, lanl:LANL⟩
    ⟨lanl:marko, lanl:worksFor, lanl:LosAlamos⟩

an OWL reasoner would infer that lanl:LANL and lanl:LosAlamos are the same entity.
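The RDFS and OWL behaviour described above is inference rather than validation. The plain-Python sketch below, over (subject, predicate, object) string tuples, applies three RDFS entailment rules (subclass typing, domain, range) until nothing new is derived, then emulates the effect of an owl:maxCardinality of 1 on lanl:worksFor. It is a toy, not a conformant reasoner: the rule set is a small subset, the cardinality restriction is assumed to apply to every subject of the property rather than only to instances of the restricted class, and typing lanl:LANL as a lanl:Laboratory is an illustrative assumption.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
SAME_AS = "owl:sameAs"


def rdfs_closure(triples):
    """Naive forward chaining over three RDFS rules until no new triple appears."""
    graph = set(triples)
    while True:
        inferred = set()
        for s, p, o in graph:
            if p == SUBCLASS_OF:
                # rdfs9: x rdf:type s, s rdfs:subClassOf o  =>  x rdf:type o
                inferred |= {(x, RDF_TYPE, o) for x, p2, c in graph if p2 == RDF_TYPE and c == s}
            elif p == DOMAIN:
                # rdfs2: x s y, s rdfs:domain o  =>  x rdf:type o
                inferred |= {(x, RDF_TYPE, o) for x, p2, _ in graph if p2 == s}
            elif p == RANGE:
                # rdfs3: x s y, s rdfs:range o  =>  y rdf:type o
                inferred |= {(y, RDF_TYPE, o) for _, p2, y in graph if p2 == s}
        if inferred <= graph:  # fixpoint reached
            return graph
        graph |= inferred


def max_cardinality_one(triples, prop):
    """Assuming prop has owl:maxCardinality 1, objects shared by one subject denote one entity."""
    objects = {}
    for s, p, o in triples:
        if p == prop:
            objects.setdefault(s, set()).add(o)
    return {(a, SAME_AS, b) for objs in objects.values() for a in objs for b in objs if a != b}


ontology = {
    ("lanl:worksFor", DOMAIN, "lanl:Human"),
    ("lanl:worksFor", RANGE, "lanl:Institution"),
    ("lanl:Laboratory", SUBCLASS_OF, "lanl:Institution"),
}
instances = {
    ("lanl:LANL", RDF_TYPE, "lanl:Laboratory"),
    ("lanl:marko", "lanl:worksFor", "lanl:LANL"),
    ("lanl:marko", "lanl:worksFor", "lanl:LosAlamos"),
}

closed = rdfs_closure(ontology | instances)
same = max_cardinality_one(closed, "lanl:worksFor")
```

Nothing here rejects data: lanl:marko is inferred to be a lanl:Human, lanl:LosAlamos is inferred to be a lanl:Institution, and the second lanl:worksFor triple produces owl:sameAs statements instead of an error.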
The inference that lanl:LANL and lanl:LosAlamos are the same entity follows from the cardinality restriction on the lanl:worksFor predicate. There are two popular tools for creating RDFS and OWL ontologies: Protégé⁵ (open source) and TopBraid Composer⁶ (proprietary).

⁴ owl is a prefix for http://www.w3.org/2002/07/owl#
⁵ Protégé available at: http://protege.stanford.edu/
⁶ TopBraid Composer available at: http://www.topbraidcomposer.com/

4 The Triple-Store

There are many ways in which RDF networks are stored and distributed. In the simple situation, an RDF network is encoded in one of the many RDF syntaxes and made available through a web server (i.e. as a web document). In other situations, where RDF networks are large, a triple-store is used. A triple-store is to an RDF network what a relational database is to a data table. Other names for triple-stores include semantic repository, RDF store, graph store, and RDF database. There are many different proprietary and open-source triple-store providers. The most popular proprietary solutions include AllegroGraph⁷, Oracle RDF Spatial⁸, and the OWLIM semantic repository⁹. The most popular open-source solution is Open Sesame¹⁰.

The primary interface to a triple-store is SPARQL [7]. SPARQL is analogous to the relational database query language SQL. However, SPARQL is perhaps more similar to the query model employed by logic languages such as Prolog. The example query

    SELECT ?x WHERE { ?x <http://www.lanl.gov/people#worksWith> <http://www.lanl.gov/people#jhw> . }

returns all resources that work with lanl:jhw. The variable ?x is a binding variable that must hold true for the duration of the query: each result is a resource that, when substituted for ?x, makes every triple pattern in the WHERE clause match a triple in the store. A more complicated example is

    SELECT ?x ?y WHERE { ?x ?y . ?x . ?y . ?y . ?x . }

The above query returns all collaborators such that one collaborator works for the Los Alamos National Laboratory (LANL) and the other collaborator works for the New England Complex Systems Institute (NECSI).
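SPARQL's basic graph patterns can be read as a conjunction of triple patterns whose shared variables must bind consistently. A minimal Python sketch of that evaluation model follows; it is not how any particular triple-store is implemented. The data, the necsi:NECSI resource, and the three-pattern collaborator query are hypothetical simplifications (the paper's second query has more patterns), predicates are restricted to constants, and results are returned as a set, as with SELECT DISTINCT.

```python
from collections import defaultdict


def is_var(term):
    return term.startswith("?")


def unify(term, value, binding):
    """Return binding extended so that term matches value, or None on a conflict."""
    if is_var(term):
        if term in binding:
            return binding if binding[term] == value else None
        return {**binding, term: value}
    return binding if term == value else None


class TripleStore:
    """Toy store indexed by predicate; many real triple-stores keep several index orders."""

    def __init__(self, triples):
        self.by_predicate = defaultdict(set)  # predicate -> {(subject, object)}
        for s, p, o in triples:
            self.by_predicate[p].add((s, o))

    def select(self, variables, patterns):
        """Evaluate a basic graph pattern (constant predicates only) with nested-loop joins."""
        bindings = [{}]
        for s_term, p, o_term in patterns:
            extended = []
            for b in bindings:
                for s, o in self.by_predicate.get(p, ()):
                    b2 = unify(s_term, s, b)
                    if b2 is not None:
                        b2 = unify(o_term, o, b2)
                    if b2 is not None:
                        extended.append(b2)
            bindings = extended
        return {tuple(b[v] for v in variables) for b in bindings}


store = TripleStore({
    ("lanl:marko", "lanl:worksWith", "lanl:jhw"),
    ("lanl:jbollen", "lanl:worksWith", "lanl:jhw"),
    ("lanl:marko", "lanl:worksWith", "necsi:carlos"),
    ("lanl:marko", "lanl:worksFor", "lanl:LANL"),
    ("necsi:carlos", "lanl:worksFor", "necsi:NECSI"),
})

# Analogue of: SELECT ?x WHERE { ?x lanl:worksWith lanl:jhw . }
coworkers = store.select(["?x"], [("?x", "lanl:worksWith", "lanl:jhw")])

# A simplified cross-institution collaborator query.
pairs = store.select(["?x", "?y"], [
    ("?x", "lanl:worksWith", "?y"),
    ("?x", "lanl:worksFor", "lanl:LANL"),
    ("?y", "lanl:worksFor", "necsi:NECSI"),
])
```

Each pattern narrows the set of candidate bindings, so the shared variables ?x and ?y act as join keys. A relational encoding would express the same query as self-joins over a single three-column table.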
An example return of the LANL/NECSI collaborator query would be

    -------------------------------
    | ?x           | ?y           |
    -------------------------------
    | lanl:marko   | necsi:carlos |
    | lanl:jhw     | necsi:carlos |
    | lanl:jbollen | necsi:carlos |
    -------------------------------

Such a query would require a complex joining of tables in the relational database model to yield the same information. Unlike the relational database index, the triple-store index is optimized for such semantic network queries (i.e. multi-relational queries). The triple-store is a useful tool for storing, querying, and manipulating an RDF network.

⁷ AllegroGraph available at: http://www.franz.com/products/allegrograph/
⁸ Oracle RDF Spatial available at: http://www.oracle.com/technology/tech/semantic_technologies/
⁹ OWLIM available at: http://www.ontotext.com/owlim/
¹⁰ Open Sesame available at: http://www.openrdf.org/

5 A Semantic Network Programming Language and an RDF Virtual Machine

Neno/Fhat is a semantic network programming language and RDF virtual machine (RVM) specification [8]. Neno is an object-oriented language similar to C++ and Java. However, instead of Neno code compiling down to machine code or Java byte-code, Neno compiles to Fhat triple-code. An example Neno class is

    owl:Thing lanl:Human {
      lanl:Institution lanl:worksFor[0..1];
      xsd:nil lanl:quit(lanl:Institution x) {
        this.worksFor =- x;
      }
    }

The above code defines the class lanl:Human. Any instance of lanl:Human can have either 0 or 1 lanl:worksFor relationships (i.e. an owl:maxCardinality of 1). Furthermore, when the method lanl:quit is executed, it will destroy any lanl:worksFor triple from that lanl:Human instance to the provided lanl:Institution x.

Fhat is a virtual machine that is encoded in an RDF network and processes Fhat triple-code. This means that Fhat's program counter, operand stack, variable frames, etc., are RDF sub-networks.
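Neno's key idea is that an object's fields are triples rather than memory slots. The Python sketch below is only an analogy to the lanl:Human class above, not Neno or Fhat: the object stores nothing itself and reads and writes a shared set of triples. The `join` method, the closed-world enforcement of the [0..1] bound, and the resource names are assumptions for illustration; `quit` mirrors the triple removal performed by `this.worksFor =- x;`.

```python
class Human:
    """Hypothetical Python analogue of the Neno class lanl:Human; state lives in shared triples."""

    MAX_WORKS_FOR = 1  # mirrors lanl:worksFor[0..1]

    def __init__(self, graph, uri):
        self.graph = graph  # a set of (subject, predicate, object) tuples
        self.uri = uri
        graph.add((uri, "rdf:type", "lanl:Human"))

    def works_for(self):
        return {o for s, p, o in self.graph if s == self.uri and p == "lanl:worksFor"}

    def join(self, institution):
        # Hypothetical method (not in the Neno example). Assumption: the [0..1] bound is
        # enforced as a closed-world check; an OWL reasoner would instead infer owl:sameAs.
        current = self.works_for()
        if institution not in current and len(current) >= self.MAX_WORKS_FOR:
            raise ValueError(f"{self.uri} already has {self.MAX_WORKS_FOR} lanl:worksFor value(s)")
        self.graph.add((self.uri, "lanl:worksFor", institution))

    def quit(self, institution):
        # Analogue of `this.worksFor =- x;`: remove the triple from this instance to x.
        self.graph.discard((self.uri, "lanl:worksFor", institution))


graph = set()
marko = Human(graph, "lanl:marko")
marko.join("lanl:LANL")
marko.quit("lanl:LANL")
```

Because the instance's state is just triples in `graph`, any other process with access to the same set sees the effect of `quit` immediately, which loosely parallels how Neno objects and Fhat's own state share one RDF substrate.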
Figure 3 depicts a Fhat processor (A) processing Neno triple-code (B) and other RDF data (C).

[Figure 3 (a triple-store containing A, B, and C): The Fhat RVM and Neno triple-code commingle with other RDF data.]

With Neno, it is possible to represent both the system model and its algorithmic processes in a single RDF network. Furthermore, with Fhat, it is possible to include the virtual machine that executes those algorithms in the same substrate. Given that the Semantic Web is a distributed data structure, where sub-networks of the larger Semantic Web RDF network exist in different triple-stores or RDF documents around the world, it is possible to leverage Neno/Fhat to allow for distributed computing across these various data sets. If a particular model exists at domain X and a researcher located at domain Y needs to utilize that model for a computation, it is not necessary for the researcher at domain Y to download the data set from X. Instead, a Fhat processor and associated Neno code can move to domain X to utilize the data and return with results. In Neno/Fhat, the data does not move to the process; the process moves to the data.

6 Conclusion

This article presented a review of the standards and technologies associated with the Semantic Web that can be used for complex systems modeling. The World Wide Web provides a common, standardized substrate whereby researchers can easily publish and distribute documents (e.g. web pages, scholarly articles, etc.). Now, with the Semantic Web, researchers can easily publish and distribute models and processes (e.g. data sets, algorithms, computing machines, etc.).

References

[1] Tim Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource Identifier (URI): Generic Syntax, January 2005.
[2] Tim Berners-Lee, James A. Hendler, and Ora Lassila. The Semantic Web. Scientific American, pages 34–43, May 2001.
[3] Paul V. Biron and Ashok Malhotra. XML Schema Part 2: Datatypes Second Edition. Technical report, World Wide Web Consortium, 2004.
[4] Dan Brickley and R.V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. Technical report, World Wide Web Consortium, 2004.
[5] Frank Manola and Eric Miller. RDF Primer: W3C Recommendation, February 2004.
[6] Deborah L. McGuinness and Frank van Harmelen. OWL Web Ontology Language Overview, February 2004.
[7] Eric Prud'hommeaux and Andy Seaborne. SPARQL Query Language for RDF. Technical report, World Wide Web Consortium, October 2004.
[8] Marko A. Rodriguez. General-purpose computing on a semantic network substrate. Technical Report LA-UR-07-2885, Los Alamos National Laboratory, 2007.
[9] W3C/IETF. URIs, URLs, and URNs: Clarifications and Recommendations 1.0, September 2001.
