MeLinDa: an interlinking framework for the web of data

The web of data consists of data published on the web in such a way that they can be interpreted and connected together. It is thus critical to establish links between these data, both for the web of data and for the semantic web that it contributes …

Authors: François Scharffe (LIRMM), Jérôme Euzenat (INRIA Grenoble Rhône-Alpes / LIG Laboratoire d'Informatique de Grenoble)

Research report (Rapport de recherche) n° 7691, INRIA, July 2011, 21 pages
ISSN 0249-6399, ISRN INRIA/RR--7691--FR+ENG
Theme: Knowledge and Data Representation and Management (Perception, Cognition, Interaction)
Project-team (Équipe-projet): exmo
Centre de recherche INRIA Grenoble – Rhône-Alpes, 655, avenue de l'Europe, 38334 Montbonnot Saint Ismier. Telephone: +33 4 76 61 52 00. Fax: +33 4 76 61 52 52

François Scharffe (∗), Jérôme Euzenat (†)

Abstract: The web of data consists of data published on the web in such a way that they can be interpreted and connected together. It is thus critical to establish links between these data, both for the web of data and for the semantic web that it contributes to feed. We consider here the various techniques developed for that purpose and analyze their commonalities and differences. We propose a general framework and show how the diverse techniques fit in the framework. From this framework we consider the relation between data interlinking and ontology matching. Although they can be considered similar at a certain level (they both relate formal entities), they serve different purposes, but would find a mutual benefit in collaborating. We thus present a scheme under which it is possible for data linking tools to take advantage of ontology alignments.

Key-words: Semantic web, data interlinking, instance matching, ontology alignment, web of data

This work has been partially published as [Scharffe and Euzenat, 2010].

∗ LIRMM, Montpellier, France, Francois.Scharffe@lirmm.fr; part of this work was achieved when this author was at INRIA Grenoble Rhône-Alpes.
† INRIA & LIG, Montbonnot, France, Jerome.Euzenat@inria.fr

French title: MeLinDa: un cadre pour le liage des données du web (MeLinDa: a framework for linking web data)

Résumé (translated from French): The web of data consists in publishing data on the web in such a way that they can be interpreted and connected together. It is therefore vital to establish links between these data, both for the web of data and for the semantic web that it helps to feed. We consider the various techniques proposed to this end and analyze their similarities and differences. We propose a general framework in which the different techniques fit, and we show how they fit into it. This framework makes it possible to consider the relations between data linking and ontology alignment: although these activities can be considered similar (they find relations between entities), they do not have the same goal but would benefit from collaborating. We propose an architecture allowing linking tools to take advantage of alignments between ontologies.

Mots-clés (keywords): semantic web, data linking, ontology alignment, web of data

1 Introduction

The web of data is the network resulting from publishing structured data sources in RDF and interlinking these data sources with explicit links. A large quantity of structured data is being published, particularly through the Linking Open Data project [1]. Web data sets are expressed according to one or more vocabularies or ontologies, which range from simple database schema exposure to full-fledged ontologies.

The web of data requires interlinking the various published data sources. Given the large amount of published data, it is necessary to provide means to automatically link those data. Many tools were recently proposed in order to solve this problem, each having its own characteristics (see Section 4).
In many cases, data sets containing similar resources are published using different ontologies. Hence, data interlinking tools need to reconcile these ontologies before finding the links between entities. This could be done automatically, but more often it is done manually and built into the link specifications. This has two drawbacks: (a) it prevents reusing the work done in ontology matching for reconciling ontologies, and (b) the information about reconciling the ontologies is mixed with the information about how to identify entities.

Hence, the goal of this work is to analyse existing interlinking tools and to determine (1) how they fit in the same framework, (2) whether it is possible to define a language for specifying the linking techniques to be used, and (3) how data interlinking is related to ontology matching.

The contributions of this report are as follows:
– A comprehensive survey of existing data interlinking tools,
– A characterization of task/problem categories for web data set interlinking,
– A proposal for improving data interlinking tools with ontology alignments.

For that purpose, after briefly introducing the challenges of data interlinking and ontology matching (Section 2), we provide a general framework for data interlinking in which all these tools can be included (Section 3). From this analysis, we review six data interlinking tools and the way they are built (Section 4). This framework clearly separates the data interlinking and ontology matching activities, and we show how these can collaborate through three different languages for links, data linking specifications and ontology alignments (Section 5). We provide examples of an expressive alignment language (Section 6) and a linking specification language (Section 7). Finally, we show how these two languages can be adapted for cooperating (Section 8).
[1] http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

2 Web of data, data interlinking, and ontology alignment

We briefly introduce linked data and the data interlinking problem. We provide examples of this problem and explain why it requires specific linking tools. We then present why these tools could take advantage of ontology matching and alignments.

2.1 Linked data

The web of data is based on the following four principles [Berners-Lee, 2009; Heath and Bizer, 2011]:
1. Resources are identified by URIs.
2. URIs are dereferenceable.
3. When a URI is dereferenced, a description of the identified resource should be returned, ideally adapted through content negotiation.
4. Published web data sets must contain links to other web data sets.

As long as they follow these rules, linked data can be published in various ways (RDF data sets, SPARQL endpoints, XHTML+RDFa pages [Adida et al., 2008], databases exposed through HTTP [Bizer, 2003; Sahoo et al., 2009]). Web data sets can also be constructed collaboratively, through the use of specialized tools [Völkel et al., 2006].

2.2 The data interlinking problem and linksets

A main problem on the web of data is to create links between entities of different data sets. Most often, this consists of identifying the same entity across different data sets and publishing a link between them as an owl:sameAs statement (shortened as sameAs hereafter). We call this task data interlinking and summarize it in Figure 1.

Figure 1: The data interlinking problem. (Diagram labels: URI1, URI2, data interlinking, owl:sameAs.)

Once identified, the links discovered between two data sets must also be published in order to be reused. The VoiD vocabulary [Alexander et al., 2009] allows for describing linksets as special data sets containing links between resources of two given data sets.
A linkset is represented as an RDF named graph described using VoiD annotations, as shown in the RDF/N3 code below:

    { a void:Linkset ;
        void:target ;
        void:target ; }
    { owl:sameAs . }

Once linksets are constructed, two approaches are proposed to retrieve equivalences between resources. In the first, each real world entity is assigned a global identifier that is then related to every URI describing this entity. This is the approach taken in the OKKAM project [Bouquet et al., 2008], which proposes the usage of Entity Name Servers taking the role of resource name repositories. The other approach uses equivalence lists maintained with interlinked resources across data sets. There is thus no global identifier in this approach, but equivalence links can be followed using a third-party web service, e.g., http://sameas.org, or a bilateral protocol [Volz et al., 2009].

The data interlinking task can be achieved manually or with the help of data interlinking tools. These tools take as input two data sets and ultimately provide a linkset. In addition, they use what we call a linking specification, i.e., a “script” specifying how and/or what to link. Indeed, given data set sizes, the search space for resource interlinking can reach several billion resources [2]. It is thus necessary to use heuristics giving hints to the interlinking system about where to look for the corresponding resources in the two data sets. These linking specifications can be specific to a pair of data sets and can be reused for regenerating linksets (we provide an example of such a specification in the Silk language in Section 7).

[2] 4.2 billion RDF triples related by 142 million links: source Wikipedia, May 2009.

2.3 Interlinking data sets

Mining for similar resources in two web data sets raises many problems.
Each data set having its own namespace, resources in different data sets are given different URIs. Also, although naming conventions exist [Sauermann and Cyganiak, 2008], there is no formal nor standard way of naming resources. For example, if we take the URI for the famous musician Johann Sebastian Bach in various web data sets, we obtain different results though they all represent the same real world object (Table 1).

Dataset | URI
MusicBrainz | http://musicbrainz.org/artist/24f1766e-9635-4d58-a4d4-9413f9f98a4c
LastFM | http://www.last.fm/music/Johann+Sebastian+Bach
DBpedia | http://dbpedia.org/resource/Johann_Sebastian_Bach
OpenCyc | http://sw.opencyc.org/concept/Mx4rwJw6npwpEbGdrcN5Y29ycA

Table 1: Varying URIs across different data sets.

As this example demonstrates, URIs are different across data sets, both because of their namespaces and because of their fragments. Fragments are generated according to two strategies: an internal ID, as for MusicBrainz and OpenCyc, or the concatenation of some of the resource properties, as for LastFM and DBpedia. When the first strategy is used, an interlinking system might not be able to find correspondences between two resources by looking at URIs only.

Fortunately, dereferencing URIs can be used for retrieving more information about entities: property values and related resources can be observed. But for the same real-world entity, the same property can take different values, making the interlinking process more difficult. This can be because of varying value approximations across data sets, because of different units of measure, because of mistakes in the data sets, or because of loose ontological specifications. For instance, the property foaf:name does not specify in what format the name should be given. “J.S. Bach”, “Bach, J.S.” or “Johann Sebastian Bach” are possible values for this property.
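To make the difficulty concrete, the three name forms quoted above can be compared programmatically. The sketch below is only an illustration, not a technique taken from any of the tools surveyed in this report: the function name_key is hypothetical, and it assumes non-empty Western-style names written either as "Given Surname" or as "Surname, Given". It reduces each value to a (surname, initials) key so that values which differ as strings can still be recognized as candidates for the same entity.

```python
import re
from typing import Tuple


def name_key(name: str) -> Tuple[str, str]:
    """Reduce a personal name to (surname, given-name initials), lowercased.

    Hypothetical helper: assumes a non-empty name in "Given Surname" or
    "Surname, Given" form.
    """
    name = name.strip()
    if "," in name:
        # "Bach, J.S." -> surname comes first, given names follow the comma
        surname, given = (part.strip() for part in name.split(",", 1))
    else:
        # "J.S. Bach" or "Johann Sebastian Bach" -> surname is the last token
        tokens = name.split()
        surname, given = tokens[-1], " ".join(tokens[:-1])
    # Split given names on dots and whitespace, keep the first letter of each
    initials = "".join(tok[0] for tok in re.split(r"[.\s]+", given) if tok)
    return surname.lower(), initials.lower()


if __name__ == "__main__":
    for value in ("J.S. Bach", "Bach, J.S.", "Johann Sebastian Bach"):
        print(value, "->", name_key(value))
```

Such a key is deliberately lossy: two different people sharing a surname and initials would collide, so it can only propose candidate pairs that a finer comparison must then confirm or reject.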
Hence, data interlinking tools have to compare property values in order to decide if two entities are the same, and must be linked, or not. For that purpose, tools use similarity measures based on the type of values, e.g., strings, numbers, dates, and aggregate the results of these measures. This activity is reminiscent of record linkage in databases, which has been given considerable attention [Fellegi and Sunter, 1969; Winkler, 2006; Elmagarmid et al., 2007]. The tools studied in Section 4 reuse many of the record linkage techniques.

Another problem is caused by the usage of heterogeneous ontologies for describing data sets. In this case, the same resource is typed according to different classes and described with different predicates belonging to different ontologies. For example, a name in a data set can be attributed using the foaf:name data property from the FOAF ontology while it is attributed using the vcard:N object property from the VCard ontology in another data set. Hence, for the interlinking techniques to work, it is necessary that the data sets use the same ontology or that data interlinking tools are aware of the correspondences between ontologies.

2.4 Ontology matching and alignment

Ontology matching allows for finding correspondences between ontology entities [Euzenat and Shvaiko, 2007]. The result of this process is called an ontology alignment. Once the ontologies are matched, the alignment can be stored and retrieved when an application needs to use data described according to another ontology [Euzenat et al., 2007a].

Matching ontologies requires overcoming the mismatches originating from the different conceptualizations of the domains described by ontologies [Visser et al., 1997; Klein, 2001]. These mismatches may be of different natures: terminological mismatches concern differences of naming, such as the usage of synonym terms for concept labeling; conceptual mismatches concern different conceptualizations of the domain, such as structuring along different properties; structural mismatches concern heterogeneous structures, like different granularities in the class hierarchies. Ontology matching is similar to database schema matching [Rahm and Bernstein, 2001]. Specific works on ontology matching were proposed in the last ten years [Noy and Musen, 2000] that now reach maturity [Euzenat and Shvaiko, 2007]. It is not the purpose of this paper to describe any particular technique.

While different URI constructions and variations of property values can find automatic solutions, the problem of having heterogeneous ontologies is, in most interlinking tools, solved by manually specifying the correspondences (see Table 2, Section 4). This considerably complicates the interlinking process. Ontology matching techniques can be used to facilitate the interlinking task, and ontology alignments can be reused in linking specifications.

The goal of this paper is to investigate the relationships between data interlinking and ontology matching. In particular, we want to understand if these two activities should be merged into a single activity and share the same formats or if there are good reasons to keep them separated. In the second perspective, we also want to establish how they can benefit from one another. For that purpose, we analyzed available systems for data interlinking.

3 A framework for data interlinking

We provide, in this section, a general framework encompassing the various approaches used to interlink resources on the web of data. In the most general case illustrated in Figure 1, two web data sets are interlinked using a method for comparing their resources.
We do not specify at this stage if the method should be automatic or manual. Neither do we specify if the two data sets are described using a common ontology or if the ontologies describing their resources differ. The following subsections consider these questions. We first consider each case that may happen when interlinking data and describe it abstractly and through an example. In the end, we unify all these cases in a common framework.

3.1 Manual interlinking

In the first case, illustrated in Figure 2, resources are manually interlinked.

Figure 2: Example of manually linked resources.

Manually linking resources can be performed using collaborative tools in the case of large data sets.

3.2 URI correspondence

In some cases, illustrated in Figure 3, resources can be trivially linked using a simple transformation of their URIs.

Figure 3: Data interlinking through URI transformation. (Diagram labels: URI1, URI2, URI transformation, owl:sameAs.)

A set of rules can be defined to identify equivalent resources from their identifier. For example, in the data set LastFM [3], the URI representing an artist is built on the pattern “First_name+Last_name”. Person URIs in DBpedia [4] are built around the pattern “FirstName_LastName”. A trivial algorithm can be developed to find equivalent artists based on their URIs. This is illustrated in Figure 4 for the composer J.S. Bach.

Figure 4: Example of resource linking using the correspondence between URIs.
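The URI transformation described in this subsection can be sketched in a few lines. This is a minimal illustration assuming the two URI patterns of Table 1 (LastFM artists under /music/ with "+" separators, DBpedia resources under http://dbpedia.org/resource/ with "_" separators). The function names are hypothetical, and the result is only a candidate link: the sketch does not check that the DBpedia resource exists, and it ignores percent-encoding and disambiguation suffixes.

```python
from typing import Optional
from urllib.parse import urlparse

LASTFM_ARTIST_PREFIX = "/music/"                  # assumed LastFM artist path
DBPEDIA_RESOURCE_BASE = "http://dbpedia.org/resource/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def lastfm_to_dbpedia(lastfm_uri: str) -> Optional[str]:
    """Rewrite a LastFM artist URI into a candidate DBpedia URI.

    Returns None when the URI does not follow the assumed artist pattern.
    """
    path = urlparse(lastfm_uri).path
    if not path.startswith(LASTFM_ARTIST_PREFIX):
        return None
    local_name = path[len(LASTFM_ARTIST_PREFIX):]
    if not local_name or "/" in local_name:
        # Empty name, or a sub-page of the artist (albums, tracks, ...)
        return None
    # "Johann+Sebastian+Bach" -> "Johann_Sebastian_Bach"
    return DBPEDIA_RESOURCE_BASE + local_name.replace("+", "_")


def same_as_triple(subject_uri: str, object_uri: str) -> str:
    """Serialize one owl:sameAs link as an N-Triples line."""
    return f"<{subject_uri}> <{OWL_SAME_AS}> <{object_uri}> ."


if __name__ == "__main__":
    source = "http://www.last.fm/music/Johann+Sebastian+Bach"
    target = lastfm_to_dbpedia(source)
    if target is not None:
        print(same_as_triple(source, target))
```

Keeping the rewriting rule separate from the serialization of the link makes it possible to swap in another pair of URI patterns without touching the output format.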
3.3 Datasets sharing the same ontologies

Beyond URIs, it may be necessary to consider the ontologies in order to identify entities. In a first case, illustrated in Figure 5, the two data sets to interlink are described by the same ontology. The role of the interlinking system is to analyze resources of the same type in order to detect the equivalent ones. To do this, the system compares resource properties with a similarity measure. Systems in this category take as input the properties to compare, the type of comparison algorithm to use for each property, and the method to aggregate the similarity measures of the various properties in order to construct a measure between two resources.

Figure 5: Interlinking two data sets described according to the same ontology. (Diagram labels: O1, URI1, URI2, resource matching of data sets described by the same ontology, owl:sameAs.)

For example, Jamendo [5] and MusicBrainz [6], two data sets containing musicological data, are both described according to a common music ontology [Raimond et al., 2007]. The artist J.S. Bach can be identified in both data sets by observing the first name and last name properties of the class MusicArtist. It is not possible in this case to identify the equivalence of resources based on their URIs. This example is illustrated in Figure 6.

Figure 6: Example of linking resources described according to the same ontology.

[3] http://last.fm
[4] http://dbpedia.org
[5] http://www.jamendo.com
[6] http://www.musicbrainz.org

3.4 Datasets described with heterogeneous ontologies

Datasets can be described by different ontologies. This case is illustrated in Figure 7.
In order to know which types of entities have to be linked together, the system needs to know the correspondences between these types of entities. Then it can work as if there were a single ontology. We represent this case in Figure 7 by introducing the correspondences between ontology classes as an alignment. This alignment is presented as implicit because it does not exist as such, but is mixed with the linking specification or the data interlinking system.

Consider two data sets, one described using FOAF, the other using VCard. The linking specification will indicate to the tool to compare entities of type foaf:Person and entities of type vcard:VC, and that when comparing resources of these types, the property foaf:givenname should be compared to vcard:fn, and the property foaf:familyname to the property vcard:ln. This is an implicit alignment containing two correspondences.

Figure 7: Two data sets interlinked using an implicit alignment. (Diagram labels: O1, O2, URI1, URI2, implicit alignment, resource matching of data sets described by different ontologies, owl:sameAs.)

For example, OpenCyc [7] represents the artist J.S. Bach using a different ontology than the one used to describe MusicBrainz. The properties “firstname” and “lastname” correspond to a property “EnglishID” in which both names are concatenated. The class MusicArtist in the Music Ontology corresponds to a class Classical Music Composer in OpenCyc. An alignment between classes and properties needs to be specified in order to find an equivalence between the two resources. This example is illustrated in Figure 8.

Figure 8: Example of two data sets described with heterogeneous ontologies.
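The FOAF/VCard example of this subsection can be sketched as follows, with the class and property correspondences written down as plain data next to the comparison logic. Everything here is an assumption made for illustration: resources are modelled as Python dictionaries rather than RDF graphs, difflib's SequenceMatcher stands in for whatever string measure a real tool would use, the 0.9 threshold is arbitrary, and the function names are hypothetical. The property names are those used in the text above.

```python
from difflib import SequenceMatcher
from typing import Dict

# Correspondences from the example: one between classes, two between properties.
CLASS_ALIGNMENT: Dict[str, str] = {"foaf:Person": "vcard:VC"}
PROPERTY_ALIGNMENT: Dict[str, str] = {
    "foaf:givenname": "vcard:fn",
    "foaf:familyname": "vcard:ln",
}


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (stand-in for a real measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resource_similarity(source: Dict[str, str], target: Dict[str, str]) -> float:
    """Average the similarities of all aligned property pairs.

    Returns 0.0 when the classes are not aligned or no aligned pair is present.
    """
    if CLASS_ALIGNMENT.get(source.get("rdf:type", "")) != target.get("rdf:type"):
        return 0.0
    scores = [
        string_similarity(source[p_src], target[p_tgt])
        for p_src, p_tgt in PROPERTY_ALIGNMENT.items()
        if p_src in source and p_tgt in target
    ]
    return sum(scores) / len(scores) if scores else 0.0


def should_link(source: Dict[str, str], target: Dict[str, str],
                threshold: float = 0.9) -> bool:
    """Decide whether a sameAs link should be proposed (threshold is assumed)."""
    return resource_similarity(source, target) >= threshold


if __name__ == "__main__":
    foaf_bach = {"rdf:type": "foaf:Person",
                 "foaf:givenname": "Johann Sebastian", "foaf:familyname": "Bach"}
    vcard_bach = {"rdf:type": "vcard:VC",
                  "vcard:fn": "Johann Sebastian", "vcard:ln": "Bach"}
    print(should_link(foaf_bach, vcard_bach))
```

In this sketch the two dictionaries at the top play the role of the implicit alignment: they carry knowledge about the ontologies, yet they live inside the linking code, which is the mixing of concerns that this section describes.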
3.5 Data interlinking with alignments

Another approach, illustrated in Figure 9, takes advantage of an already existing explicit alignment between the two ontologies used by the data sets.

Figure 9: Two data sets matched using an explicit alignment. (Diagram labels: O1, O2, URI1, URI2, explicit alignment, resource matching of data sets described by different ontologies, owl:sameAs.)

An additional possibility, not found in existing systems, would be for the data linking system to first match the two ontologies before using the resulting alignment for supporting data interlinking. In such a system, ontology matching and data interlinking would be merged. Figure 10 unifies all these processes in a single description. This framework clarifies the interactions between data interlinking and ontology matching.

Figure 10: General framework for data interlinking involving ontology matching. (Diagram labels: O1, O2, URI1, URI2, ontology matching, alignment, data interlinking, owl:sameAs.)

The next section discusses different systems and their position with respect to the proposed framework.

[7] sw.opencyc.org

4 Data interlinking tool analysis

The work presented in this section is the result of the MeLinDa experiment conducted jointly with the linked open data mailing list. We asked interlinking tool developers to send us the linking specifications their tools take as input. We then compared these specifications and evaluated the possibility to publish them in a common language [8]. Six systems took part in the experiment. We are aware of at least two other systems not analyzed in this study [Saïs et al., 2008; Hogan et al., 2007].

We present below different criteria along which the tools can be compared, then we briefly describe the specifics of each tool and provide a comparison of them along the criteria (Table 2).

4.1 Analysis criteria

For each analyzed tool, we tried to answer several questions reproduced below.
We will then describe and categorize each tool according to these questions.

Degree of automation
– Is the tool completely automatic (a black box)?
– Does the tool need to be parametrized by the user? What kind of parameters (data matching techniques, ontology alignment)?

Used matching techniques
– String matching?
– External functions (values conversion, data transformations)?
– Similarity propagation?
– Other techniques?

Access
– How does the tool access data? (SPARQL endpoint, RDF dump, URL)

Ontologies
– Does the tool take into account ontologies associated to the data sets?
– Does the tool allow to interlink data sets described according to different ontologies?
– If the ontologies differ, does the tool perform ontology matching?

Output
– What does the tool produce as output (sameAs links, VoiD linkset, other type of links)?
– Does the tool propose to merge the two input data sets?

Domain
– Is the tool specific to a given domain?

Post-processing
– Does the tool perform any post-processing operations (consistency checking and inconsistency resolution)?

[8] http://melinda.inrialpes.fr. MeLinDa stands for Meta-Linking Data.

4.2 Tools

We considered the following six tools. Table 2 summarizes the analysis.

4.2.1 RKB-CRS

The co-reference resolution system (CRS) of the RKB knowledge base [Jaffri et al., 2008] is built around URI equivalence lists. These lists are built using a Java program working on the specific domain of universities and conferences. A new program needs to be written for each pair of data sets to integrate. Each program consists of the selection of the resources to compare, and their comparison using string similarity on the resource property values.

4.2.2 LD-Mapper

LD-Mapper [Raimond et al., 2008] is a data set integration tool working on the music domain.
This tool is based on a similarity aggregation algorithm taking into account the similarity of a resource’s neighbors in the graph describing it. It requires little user configuration but only works with data sets described with the Music Ontology [Raimond et al., 2007]. LD-Mapper is implemented in Prolog.

4.2.3 ODD-Linker

ODD-Linker [Hassanzadeh et al., 2009] is an interlinking tool implemented on top of a tool mining for equivalent records in relational databases. ODD-Linker uses SQL queries for identifying and comparing resources. The tool translates link specifications expressed in LinQL, a dedicated language originally developed for duplicate record detection in relational databases. Its usage in the context of linked data is thus limited to relational databases exposed as linked data. LinQL is nonetheless an expressive formalism for link specifications. The language supports many string matching algorithms, hyponyms and synonyms, and conditions on attribute values.

Criterion | RKB-CRS | LD-Mapper | ODD-Linker | RDF-AI | Silk | Knofuss
Ontologies | multiple | multiple | single | single | single | multiple
Automation | semi-automatic | automatic | semi-automatic | semi-automatic | semi-automatic | semi-automatic
User input | program | none | link spec., query | data set structure, alignment method | link spec., alignment method | fusion onto.
Input format | Java | Prolog | LinQL | XML | Silk-LSL (XML) | OWL
Matching techniques | string | string, similarity propagation | string | string, Wordnet | string | string, adaptive learning
Onto. alignment | no | no | no | no | no | yes, in input
Output | owl:sameAs | owl:sameAs | linkset | alignment format, linkset, merged data set | alignment format | alignment format, linkset, merged data set
Data access | API | local copy | ODBC | local copy | SPARQL | local copy
Domain | publications | Music Ontology | independent | independent | independent | independent
Post-processing | no | no | no | no | no | inconsistency resolution

Table 2: Comparison of data linking tools.
4.2.4 RDF-AI

RDF-AI [Scharffe et al., 2009] is an architecture for data set matching and fusion. It generates an alignment that can be later used either to generate a linkset, or to merge two data sets. The interlinking parameters of RDF-AI are given in a set of XML files corresponding to the different steps of the process. The data set structure and the resources to match are described in two files. This description corresponds to a small ontology containing only resources of interest and the properties to use in the matching process. A pre-processing file describes operations to perform on resources before matching. Translation of properties and name reordering are performed before looking for links. A matching configuration file describes which techniques should be used for which resources. A threshold for generating the linkset from the alignment can be specified. Additionally, when data sets need to be merged, a configuration file describes the fusion method to use. The prototype works with a local copy of the data sets and is implemented in Java.

4.2.5 Silk

Silk [Bizer et al., 2009] is an interlinking tool parametrized by a link specification language: the Silk Link Specification Language (Silk-LSL, see Section 7). The user specifies the type of resources to link and the comparison techniques to use. Datasets are referenced by giving the URI of the SPARQL endpoint from which they are accessible. A named graph can be specified in order to link only resources belonging to this graph. Resources to be linked are specified using their type, or the RDF path to access them. Silk uses many string comparison techniques, numerical and date similarity measures, concept distances in a taxonomy, and set similarities. A condition allows for specifying the matching algorithm used to match resources.
Matching algorithms can be combined using a set of operators (MAX, MIN, AVG), and literals can be transformed before the comparison by specifying a transformation function, concatenating or splitting resources. Regular expressions can be used to preprocess resources. Silk takes as input two web data sets accessible behind a SPARQL endpoint. When resources are matched with a confidence over a given threshold, the tool outputs sameAs links or any other RDF predicate specified by the user. The first version of Silk was implemented in Python; version 2 is a new implementation in Scala.

4.2.6 Knofuss

The Knofuss architecture [Nikolov et al., 2008] aims at providing support for data set fusion. A specificity of Knofuss is the possibility to use existing ontology alignments. The resource comparison process is driven by a dedicated ontology for each data set specifying resources to compare, as well as the comparison techniques to use. Each ontology gives, for each type of resource to be matched, an application context defined as a SPARQL query for this type of resource. An object context model is also defined to specify properties that will be used to match these resource types. Corresponding application contexts are given the same ID in the two ontologies, and one application context indicates which similarity metric should be used for comparing them. When the two data sets are described using different ontologies, an ontology alignment can be specified. This alignment is given in the ontology alignment format [David et al., 2011]. Knofuss allows for exporting links between data sets, but was originally designed to merge equivalent resources. It includes a consistency resolution module which ensures that the data set resulting from the fusion of the two data sets is consistent with respect to the ontologies.
The parameters of the fusion operation are also given in the input ontologies. Knofuss works with local copies of the data sets and is implemented in Java.

An analysis of these tools according to the criteria of Section 4.1 is summarized in Table 2. Obviously there is a lot of variation between these tools in spite of their common goal. Even if they are very diverse, each of these data interlinking tools fits in the proposed framework, as shown in Table 3.

Case | Tool
Manual link specification (§3.1) |
URI correspondence (§3.2) | RKB-CRS
Common ontology (§3.3) | LD-Mapper, ODD-Linker
Different ontologies, implicit alignment (§3.4) | RDF-AI, Silk
Different ontologies, explicit alignment (§3.5) | Knofuss

Table 3: Classification of analyzed tools with regard to the framework.

The goal of the next section is to consider how using ontology alignments could lead to more automation for the interlinking task, as well as how linked data could provide evidence for obtaining better ontology alignments.

5 Matching/linking cooperation

Although ontology matching and data interlinking can be similar at a certain level (they both relate formal entities), there are important differences: one acts at the schema level and the other at the instance level. In fact, ontology matching can take advantage of linked data as an external source of information for ontology matching, and, conversely, data interlinking can benefit from ontology matching by using correspondences to focus the search for potential instance level links.

These differences are reflected in the types of specification involved in these processes:
– A link, e.g., a sameAs statement, tells which City in wikipedia corresponds to which P (place) in geonames, e.g., Manchester sameAs Manchester.
– A linking specification tells how to find the former, e.g., for linking a City to a P, evaluate how close the label of the first one is to the name of the second one with some measure, e.g., jaroSimilarity, evaluate how close the populationTotal of the first one is to the population of the second one with another measure, e.g., numSimilarity, average the two values and, if the result is above .9, generate the sameAs statement.
– An ontology alignment tells which components from one ontology correspond to which components in the other. For example, dbpedia:City is a kind of geonames:P and, in this context, label is equivalent to name and populationTotal is equivalent to population.

This results in two process specifications (interlinking and matching) and their results (linksets between data and alignments between ontologies). The situation is summarized by Table 4.

            process                    result
instance    linking specification      linkset
class       matcher                    alignment

Table 4: Interlinking and matching processes and their results.

By clearly establishing these differences, we obtain a natural partitioning between data links, linking specifications and ontology alignments, and the languages for expressing them:
– The assertion expression language (e.g., RDF and VoiD) allows for representing equivalence between resources in data sets;
– The linking specification language (e.g., Silk, LinQL) allows for defining how to search for equivalence between resources;
– The alignment representation language (e.g., the Alignment format or EDOAL) allows for specifying equivalence rules between ontological entities.

It would be useful to take advantage of the framework of Section 3 to help tools interoperate.
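The City/P linking specification described in the list above (compare names, compare populations, average, threshold at .9) can be sketched in Python as follows. This is only an illustration under stated assumptions: `jaro_similarity` is the standard Jaro measure, while `num_similarity` (one minus the relative difference), the `Resource` record and the `ex:` identifiers are hypothetical choices made for this sketch and are not taken from Silk or any other tool.

```python
from dataclasses import dataclass


def jaro_similarity(s1: str, s2: str) -> float:
    """Standard Jaro string similarity, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unused equal character within the matching window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def num_similarity(a: float, b: float) -> float:
    """Hypothetical numeric measure: 1 minus the relative difference."""
    if a == b:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / max(abs(a), abs(b)))


@dataclass(frozen=True)
class Resource:
    uri: str          # hypothetical identifier
    name: str         # rdfs:label on one side, name on the other
    population: int   # populationTotal on one side, population on the other


def same_as_links(cities, places, threshold=0.9):
    """Average both similarities; emit a sameAs triple above the threshold."""
    links = []
    for c in cities:
        for p in places:
            score = (jaro_similarity(c.name, p.name)
                     + num_similarity(c.population, p.population)) / 2
            if score > threshold:
                links.append((c.uri, "owl:sameAs", p.uri))
    return links
```

The nested loop compares every pair of resources, which is exactly the search space that an ontology alignment could help narrow down.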
Such interoperability would present many advantages, in particular the possibility to share, distribute and improve linking specifications, as well as to reuse or extend them instead of computing them again whenever a data set is modified. It would also allow linking specifications to be composed, so that it would be possible to go from one data set to another without going through an intermediary.

We propose a scheme under which it is possible for data linking tools to take ontology alignments as a way to constrain their solution space. Figure 10 provides a natural way to implement this collaboration. We first present an expressive language for ontology alignments that can be exploited by data interlinking systems (Section 6) and briefly introduce the linking specification language Silk-LSL (Section 7). Then we show how they could fruitfully be combined for data interlinking (Section 8).

6 EDOAL: an expressive ontology alignment language

EDOAL (Expressive Declarative Ontology Alignment Language) is the new name of the OMWG mapping language for expressing ontology alignments [Euzenat et al., 2007b], which has been available through the Alignment API since version 3.1. This language is an extension of the Alignment format [Euzenat, 2004] that can be generated by most matchers. Its main purpose is to offer more expressiveness in the way alignments are expressed. It has the advantage of being declarative and of being able to specify transformations like those needed in order to construct links between resources.

A first advantage of the expressiveness of EDOAL is the possibility to express correspondences between non named entities.
For instance, a simple assertion such as “a pianist is a musician who plays piano” can be expressed by (Figure 11):

    :dbp-mo a align:Alignment;
      align:onto1 ;
      align:onto2 ;
      align:map [ :map1 a align:Cell;
        align:entity1 dbp:Pianist;
        align:entity2 [ a edoal:Class;
          edoal:and mo:MusicArtist;
          edoal:and [ a edoal:PropertyValueConstraint;
            edoal:property mo:instrument;
            edoal:value mo:Piano. ]. ];
        align:relation align:equivalent; ].

This can help restrict the search space of data interlinking tools far beyond what they currently do (named classes).

Figure 11: Correspondence between non named resources (Pianist = MusicArtist with ∃ instrument Piano).

In addition, in EDOAL, it is possible to express that two classes are equivalent, and that their instances are equivalent modulo a transformation. This can be used for covering, without further information, the URI correspondence case of the framework (Section 3.2). For instance, Figure 12 shows an EDOAL correspondence using regular expression transformations for identifying musician instances between two data sets with different conventions.

Figure 12: Expression of a resource equivalence represented in an expressive ontology alignment language (Classical Music Performer ≤ MusicArtist, with the rdf:id patterns http://.../([^_]*)$1_([^.]*])$2.rdf and http://.../([^+]*)$2+([^/]*)$1/).

Of course, this can only work when there exist such correspondences, i.e., an exact method for generating links. Most of the time, data interlinking systems still need to use heuristics to find links between entities. This can be provided by the simple Alignment format, but EDOAL can do more by indicating where to look to establish the correspondence. In particular, EDOAL allows for expressing contextual relations between elements.
For instance, the typical example in the Silk documentation is the linking of DBpedia cities and Geonames P(laces) through comparing their names and populations. Expressing this with the simple alignment:

    :dbp-geo a align:Alignment;
      align:onto1 ;
      align:onto2 ;
      align:map [ :map1 a align:Cell;
        align:entity1 dbpedia:City;
        align:entity2 gn:P;
        align:relation align:subsumedBy. ];
      align:map [ :map2 a align:Cell;
        align:entity1 dbpedia:populationTotal;
        align:entity2 gn:population;
        align:relation align:equivalent. ];
      align:map [ :map3 a align:Cell;
        align:entity1 rdfs:label;
        align:entity2 gn:name;
        align:relation align:equivalent. ].

does not express the expected meaning because, of course, rdfs:label is not equivalent to gn:name. One could consider expressing that gn:name is more specific than rdfs:label. This is correct but still not precise enough. The intended meaning is that, in the context of dbpedia:City and gn:P, these two properties are equivalent. This is what EDOAL can express through the schema of Figure 13, corresponding to the following alignment:

    :dbp-geo a align:Alignment;
      align:onto1 ;
      align:onto2 ;
      align:map [ :map1 a align:Cell;
        align:entity1 dbpedia:City;
        align:entity2 gn:P;
        align:relation align:subsumedBy. ];
      align:map [ :map2 a align:Cell;
        align:entity1 [ a align:Property;
          edoal:and dbpedia:populationTotal;
          edoal:and [ a edoal:PropertyDomainRestriction;
            edoal:domain dbpedia:City. ] ];
        align:entity2 [ a align:Property;
          edoal:and gn:population;
          edoal:and [ a edoal:PropertyDomainRestriction;
            edoal:domain gn:P. ] ];
        align:relation align:equivalent. ];
      align:map [ :map3 a align:Cell;
        align:entity1 [ a align:Property;
          edoal:and rdfs:label;
          edoal:and [ a edoal:PropertyDomainRestriction;
            edoal:domain dbpedia:City. ] ];
        align:entity2 [ a align:Property;
          edoal:and gn:name;
          edoal:and [ a edoal:PropertyDomainRestriction;
            edoal:domain gn:P. ] ];
        align:relation align:equivalent. ].
Figure 13: Contextual matching of two classes and their properties (City ≤ P; rdfs:label = name; populationTotal = population).

Even if such an alignment would provide information to data interlinking tools, this is still not sufficient. Of course, it tells which properties should be equivalent and thus can be used for identifying entities. But it does not tell how to take them into account. So, this alignment would be sufficient to link entities if the values of rdfs:label were exactly the same as those of gn:name and the values of populationTotal were exactly the same as those of population, but not otherwise.

EDOAL provides more features for transforming this information, as we have seen in Figure 12. This could be helpful but the problem is deeper: data interlinking is a decision problem rather than just a transformation. It is the role of the data linking specification to tell when a particular dbpedia:City and a gn:P should be considered the same. This is why we propose to use data interlinking specifications together with alignments.

7 Silk-LSL: a linking specification language

Below is the Silk-LSL [Bizer et al., 2009] specification to interlink cities in the two data sets DBpedia and Geonames:

    http://demo_sparql_server1/sparql
    http://dbpedia.org
    http://demo_sparql_server2/sparql
    http://sws.geonames.org/
    owl:sameAs
    ?a rdf:type dbpedia:City
    ?b rdf:type gn:P

This specification fulfills two roles:
– It is an alignment: it specifies the classes in which entities to link can be found. Restrictions to dbpedia:City and gn:P are in fact an alignment between these two concepts. Similarly, the compared properties, populationTotal and population on the one hand and rdfs:label and name on the other, provide the correspondences between properties.
– It specifies how to link entities.
Indeed, what Silk brings in addition is the specification of how to decide if two entities should be linked: when the average (AVG) of their respective similarities (Compare) is over a threshold (Threshold; there are two thresholds, one for accepting the equivalence automatically and one for drawing the attention of a user).

It could be possible to refer to an external alignment between the two underlying ontologies instead of specifying it in the linking specification. This approach would present obvious reuse advantages when other data sets requiring the same alignment, i.e., using the same ontologies, need to be interlinked.

8 Data interlinking using ontology alignments

Apart from Knofuss, interlinking tools do not provide the possibility to use an ontology alignment. Knofuss still needs to specify queries on both data sets, from whose results equivalent resources will be identified. Indeed, using an explicit alignment, provided that it is expressive enough, can serve two functions:
1. narrowing the search space through pointing to equivalent concepts, and
2. providing the properties that can be used for identifying concepts.

There are two ways to articulate ontology alignment and linking specifications:
– Transforming an expressive alignment into a linking specification: this requires that the alignment contains as much information as possible and that matchers be able to produce such descriptions. This has the advantage that, from the alignment, the specification may be transformed into different linking specification languages.
– Enabling linking specifications to refer explicitly to alignments and eventually to matchers: this requires extending specification languages for that purpose.

We consider the latter option below.
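For contrast, the first option (transforming an alignment into a linking specification) can be sketched in a few lines of Python. The `Cell` and `LinkingSpec` records, the `alignment_to_spec` function and the default metric, aggregation and threshold are all hypothetical simplifications introduced for this illustration; they are not the Alignment API or Silk data model. The sketch mainly shows that the alignment supplies the classes and property pairs, while the metrics, the aggregation and the threshold still have to come from elsewhere.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cell:
    """One correspondence of an alignment (hypothetical, simplified)."""
    entity1: str
    entity2: str
    relation: str   # e.g. "equivalent" or "subsumedBy"
    kind: str       # "class" or "property"


@dataclass
class LinkingSpec:
    """Skeleton of a linking specification (hypothetical, simplified)."""
    source_class: str
    target_class: str
    comparisons: list = field(default_factory=list)  # (prop1, prop2, metric)
    aggregation: str = "AVG"
    threshold: float = 0.9


DEFAULT_METRIC = "jaroSimilarity"  # assumed default, not prescribed by the text


def alignment_to_spec(cells, metrics=None, threshold=0.9):
    """Derive a linking specification skeleton from alignment cells.

    The class correspondence narrows the search space; the equivalent
    property correspondences give the values to compare. Metrics that the
    alignment cannot provide are taken from `metrics` or a default.
    """
    metrics = metrics or {}
    class_cells = [c for c in cells if c.kind == "class"]
    if len(class_cells) != 1:
        raise ValueError("this sketch expects exactly one class correspondence")
    cls = class_cells[0]
    spec = LinkingSpec(cls.entity1, cls.entity2, threshold=threshold)
    for c in cells:
        if c.kind == "property" and c.relation == "equivalent":
            metric = metrics.get(c.entity1, DEFAULT_METRIC)
            spec.comparisons.append((c.entity1, c.entity2, metric))
    return spec


# The DBpedia/Geonames correspondences used in the text, as simplified cells.
dbp_geo = [
    Cell("dbpedia:City", "gn:P", "subsumedBy", "class"),
    Cell("dbpedia:populationTotal", "gn:population", "equivalent", "property"),
    Cell("rdfs:label", "gn:name", "equivalent", "property"),
]
spec = alignment_to_spec(dbp_geo, metrics={"dbpedia:populationTotal": "numSimilarity"})
```

Because the output is a neutral record, the same skeleton could in principle be serialized into different linking specification languages, which is the advantage noted for this option.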
Given that the alignment is available, it is possible to simplify the Silk specification and refer to the alignment, by introducing three types of information: which alignments to use (UseAlignment), the correspondences whose entities must be linked (LinkCell) and which matched properties can be compared for identifying entities (CellParam).

    owl:sameAs

The specifics of the data interlinking task remain in this specification: how to compare values, how to aggregate their results and when to issue the link or not.

In fact, the symbiosis between the alignment and the linking specification can be rendered even more automatic, e.g., by defining default rules for comparing values of a given type, default rules for aggregating metrics, and default threshold rules. However, it is also useful that the linking specification designer can keep control of what the interlinking tool does and, even if a correspondence is not in an alignment, be able to define it.

This approach presents several advantages:
1. The link specification is simplified, reducing the manual input;
2. There is a clear separation between links, linking specifications, and ontology alignments;
3. The same alignment can be reused for linking any two data sets described according to the same ontologies.

9 Conclusion

Interlinking data sets becomes an even more important problem as their number quickly increases. In order to scale, the interlinking task has to be as automated as possible.

We have studied various existing data interlinking tools and observed the following:
– Beyond the variations between these systems, it is possible to define a general framework covering the different levels of expressiveness (ranging from a Prolog program to compositions of linking specifications).
– Although there is a relevant similarity with ontology alignment, an ontology alignment language is not enough to express linking specifications, particularly because it is not its primary goal to identify individual entities.

We have thus proposed an architecture based on three different languages, each having its own precise purpose: expressing links, expressing linking specifications, and expressing ontology alignments.

This architecture can be used in order to organize a better collaboration between ontology matchers and data interlinking tools. This can be achieved with only minimal extensions to existing languages.

In particular, we have illustrated the ontology alignment part with EDOAL, an expressive ontology alignment language that offers the necessary concepts for being used in data interlinking. On the data interlinking side, we have focussed on the Silk-LSL language, which seems to be at once declarative and powerful enough to express a wide range of constraints on data interlinking. Extending it with the capacity to benefit from ontology alignments would allow tools using it to benefit from the wide range of ontology alignment techniques and tools.

The domain of interlinking data on the web is quickly expanding. New needs and new techniques appear. It is thus important not to constrain innovation with a narrow language. Developing standard tools to share link specifications will greatly improve those techniques. There is still a lot of work to do in order to achieve this goal.

Acknowledgements

We thank the data interlinking tool developers who provided us with linking specifications for their tools. This work was conducted in the context of the Datalift project funded by the French ANR (Agence Nationale de la Recherche) under grant ANR-10-CORD-009.

References

[Adida et al., 2008] Ben Adida, Mark Birbeck, Shane McCarron, and Steven Pemberton.
RDFa in XHTML: Syntax and processing. Recommendation, W3C, 2008. http://www.w3.org/TR/rdfa-syntax/.
[Alexander et al., 2009] Keith Alexander, Richard Cyganiak, Michael Hausenblas, and Jun Zhao. Describing linked datasets - on the design and usage of void, the 'vocabulary of interlinked datasets'. In Linked Data on the Web Workshop (LDOW09), Workshop at 18th International World Wide Web Conference (WWW09), Madrid, Spain, 2009.
[Berners-Lee, 2009] Tim Berners-Lee. Linked-data design issues. W3C design issue document, June 2009. http://www.w3.org/DesignIssues/LinkedData.html.
[Bizer et al., 2009] Christian Bizer, Julius Volz, Georgi Kobilarov, and Martin Gaedke. Silk - a link discovery framework for the web of data. In 18th International World Wide Web Conference, April 2009.
[Bizer, 2003] Christian Bizer. D2r map - a database to rdf mapping language. In Proc. 12th WWW conference poster session, 2003.
[Bouquet et al., 2008] Paolo Bouquet, Heiko Stoermer, and Barbara Bazzanella. An Entity Naming System for the Semantic Web. In Proceedings of the 5th European Semantic Web Conference (ESWC2008), LNCS, June 2008.
[David et al., 2011] Jérôme David, Jérôme Euzenat, François Scharffe, and Cássia Trojahn dos Santos. The Alignment API 4.0. Semantic web journal, 2(1):3–10, 2011.
[Elmagarmid et al., 2007] Ahmed Elmagarmid, Panagiotis Ipeirotis, and Vassilios Verykios. Duplicate record detection: A survey. IEEE Transactions on Knowledge and Data Engineering, 19(1):1–16, January 2007.
[Euzenat and Shvaiko, 2007] Jérôme Euzenat and Pavel Shvaiko. Ontology matching. Springer-Verlag, Heidelberg (DE), 2007.
[Euzenat et al., 2007a] Jérôme Euzenat, Adrian Mocan, and François Scharffe.
Ontology Management: Semantic Web, Semantic Web Services, and Business Applications, chapter Ontology Alignments: an ontology management perspective. Springer, 2007.
[Euzenat et al., 2007b] Jérôme Euzenat, François Scharffe, and Antoine Zimmermann. D2.2.10: Expressive alignment language and implementation. Project deliverable 2.2.10, Knowledge Web NoE (FP6-507482), 2007.
[Euzenat, 2004] Jérôme Euzenat. An API for ontology alignment. In Frank van Harmelen, Sheila McIlraith, and Dimitri Plexousakis, editors, The Semantic Web - ISWC 2004: Third International Semantic Web Conference, Hiroshima, Japan, November 7-11, 2004. Proceedings, volume 3298, pages 698–712. Springer, 2004.
[Fellegi and Sunter, 1969] Ivan Fellegi and Alan Sunter. A theory for record linkage. Journal of the American Statistical Association, 64(328):1183–1210, 1969.
[Hassanzadeh et al., 2009] Oktie Hassanzadeh, Lipyeow Lim, Anastasios Kementsietsidis, and Min Wang. A declarative framework for semantic link discovery over relational data. In WWW '09: Proceedings of the 18th international conference on World wide web, pages 1101–1102, New York, NY, USA, 2009. ACM.
[Heath and Bizer, 2011] Tom Heath and Christian Bizer. Linked Data: Evolving the Web into a Global Data Space, volume 1 of Synthesis Lectures on the Semantic Web: Theory and Technology. Morgan & Claypool, 1 edition, 2011.
[Hogan et al., 2007] Aidan Hogan, Andreas Harth, and Stefan Decker. Performing object consolidation on the semantic web data graph. In Proceedings of 1st I3: Identity, Identifiers, Identification Workshop, 2007.
[Jaffri et al., 2008] Afraz Jaffri, Hugh Glaser, and Ian Millard. Managing uri synonymity to enable consistent reference on the semantic web. In IRSW2008 - Identity and Reference on the Semantic Web 2008 at ESWC, 2008.
[Klein, 2001] Michael Klein.
Combining and relating ontologies: an analysis of problems and solutions. In Workshop on Ontologies and Information Sharing, 2001.
[Nikolov et al., 2008] Andriy Nikolov, Victoria Uren, Enrico Motta, and Anne de Roeck. Handling instance coreferencing in the knofuss architecture. In Proceedings of the workshop: Identity and Reference on the Semantic Web at 5th European Semantic Web Conference (ESWC 2008), 2008.
[Noy and Musen, 2000] Natalya Fridman Noy and Mark A. Musen. Prompt: Algorithm and tool for automated ontology merging and alignment. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, pages 450–455. AAAI Press / The MIT Press, 2000.
[Rahm and Bernstein, 2001] Erhard Rahm and Philip Bernstein. A survey of approaches to automatic schema matching. VLDB Journal: Very Large Data Bases, 10(4):334–350, 2001.
[Raimond et al., 2007] Yves Raimond, Samer Abdallah, Mark Sandler, and Frederick Giasson. The music ontology. In Proceedings of the International Conference on Music Information Retrieval, 2007.
[Raimond et al., 2008] Yves Raimond, Christopher Sutton, and Mark Sandler. Automatic interlinking of music datasets on the semantic web. In Proceedings of the Linking Data On the Web workshop at WWW'2008, 2008.
[Sahoo et al., 2009] Satya Sahoo, Wolfgang Halb, Sebastian Hellmann, Kingsley Idehen, Ted Thibodeau, Sören Auer, Juan Sequeda, and Ahmet Ezzat. A survey of current approaches for mapping relational databases to rdf. Report, W3C RDB2RDF incubator group, 2009.
[Saïs et al., 2008] Fatia Saïs, Nathalie Pernelle, and Marie-Christine Rousset. Combining a logical and a numerical method for data reconciliation. Journal of Data Semantics, 12, 2008.
[Sauermann and Cyganiak, 2008] Leo Sauermann and Richard Cyganiak.
Cool URIs for the semantic web. W3C note, W3C, March 2008. http://www.w3.org/TR/2008/NOTE-cooluris-20080331/.
[Scharffe and Euzenat, 2010] François Scharffe and Jérôme Euzenat. Méthodes et outils pour lier le web des données. In Actes 17e conférence AFIA-AFRIF sur reconnaissance des formes et intelligence artificielle (RFIA), Caen (FR), pages 678–685, 2010.
[Scharffe et al., 2009] François Scharffe, Yanbin Liu, and Chunguang Zhou. RDF-AI: an architecture for RDF datasets matching, fusion and interlink. In Workshop on Identity and Reference in Knowledge Representation, IJCAI 2009, 2009.
[Visser et al., 1997] Pepjijn R. S. Visser, Dean M. Jones, T. J. M. Bench-Capon, and M. J. R. Shave. An analysis of ontological mismatches: Heterogeneity versus interoperability. In AAAI 1997 Spring Symposium on Ontological Engineering, Stanford, USA, 1997.
[Völkel et al., 2006] Max Völkel, Markus Krötzsch, Denny Vrandecic, Heiko Haller, and Rudi Studer. Semantic wikipedia. In WWW, pages 585–594, 2006.
[Volz et al., 2009] Julius Volz, Christian Bizer, and Martin Gaedke. Web of data link maintenance protocol. Protocol specification, Freie Universität Berlin, 2009.
[Winkler, 2006] William Winkler. Overview of record linkage and current research directions. Technical Report 2006-2, Statistical Research Division. U.S. Census Bureau, 2006.

Contents

1 Introduction
2 Web of data, data interlinking, and ontology alignment
  2.1 Linked data
  2.2 The data interlinking problem and linksets
  2.3 Interlinking data sets
  2.4 Ontology matching and alignment
3 A framework for data interlinking
  3.1 Manual interlinking
  3.2 URI correspondence
  3.3 Datasets sharing the same ontologies
  3.4 Datasets described with heterogeneous ontologies
  3.5 Data interlinking with alignments
4 Data interlinking tool analysis
  4.1 Analysis criteria
  4.2 Tools
    4.2.1 RKB-CRS
    4.2.2 LD-Mapper
    4.2.3 ODD-linker
    4.2.4 RDF-AI
    4.2.5 Silk
    4.2.6 Knofuss
5 Matching/linking cooperation
6 EDOAL: an expressive ontology alignment language
7 Silk-LSL: a linking specification language
8 Data interlinking using ontology alignments
9 Conclusion
