The university-industry knowledge relationship: Analyzing patents and the science base of technologies

Journal of the American Society for Information Science & Technology (forthcoming)

Loet Leydesdorff 1
University of Amsterdam, Science & Technology Dynamics
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam, The Netherlands
loet@leydesdorff.net ; http://www.leydesdorff.net

Abstract

Via the Internet, information scientists can obtain cost-free access to large databases in the “hidden” or “deep web.” These databases are often far more structured than the Internet domains themselves. The patent database of the U.S. Patent and Trademark Office is used in this study to examine the science base of patents in terms of the literature references in these patents. University-based patents at the global level are compared with results when using the national economy of the Netherlands as a system of reference. Methods for accessing the on-line databases and for the visualization of the results are specified. The conclusion is that “biotechnology” has historically generated a model for theorizing about university-industry relations that cannot easily be generalized to other sectors and disciplines.

1 I would like to thank Andrea Scharnhorst for comments on a previous draft of this paper.

1. Introduction

Perhaps even larger than the Internet itself are the resources which can be accessed and studied via the web. These resources are sometimes called the “hidden web,” the “invisible web” or the “deep web” (Bergman, 2001; Sherman & Price, 2001). Unlike most web-based resources, which evolve and change with the development of the Internet over the years, some of the databases of the hidden web are certified and fixed. For example, the database of the U.S. Patent and Trademark Office (USPTO) contains all U.S. patents since 1976 in html-format (at http://www.uspto.gov ).
The data have a legal status since a patent can be challenged in court, and therefore the text can no longer be changed after the patent has been issued. Furthermore, the examination by the patent examiner is highly codified (Granstrand, 1999). The same data are offered on-line by commercial hosts like Dialog in formats that facilitate organization of the data and integrate them with data from other national or international (e.g., European) patent offices. The Derwent Innovation Index even offers an integration of the patent data with the Web-of-Science data of the Institute for Scientific Information. This commercial access, however, is relatively expensive for the purposes of academic research and higher education.

In this study, I explore the on-line data from an information-theoretical perspective, that is, with a focus on how the knowledge base of the patents can perhaps be revealed. This question is theoretically interesting because patents have increasingly become a repository of information about how the socially organized production of scientific knowledge is interfaced with the economy (Noble, 1977). Organized knowledge production and control (Whitley, 1984) follows a logic of development and differentiation different from (potentially knowledge-based) innovation processes in the economy (e.g., Mansfield, 1989). The development of indicators for the knowledge base of an economic system can be considered a priority for innovation policies and the emerging program of innovation studies (e.g., David & Foray, 2002; Nelson, 1993; OECD/Eurostat, 1997; Leydesdorff & Meyer, 2003; Leydesdorff & Scharnhorst, 2003).

Patent data have been used extensively in economic geography, business economics, and macro-economics as indicators of the innovativeness of corporations, industries, and regions (Jaffe & Trajtenberg, 2002; Meyer, 2000; Pavitt, 1984).
However, the specific interest of information scientists in how the patents relate to their knowledge base (e.g., Bhattacharya et al., 2003; Grupp & Schmoch, 1999; Narin & Olivastro, 1988, 1992) is not facilitated by using the value added by commercial databases like Derwent. The so-called “non-patent literature references” (NPLR) contain references to scientific journal literature and book chapters among other things, but this field has remained poorly organized in the commercial format. Abbreviations of journal names, for example, are not standardized.

In the case of scientific references, most patents provide titles between quotation marks in order to distinguish them from journal names or from the title of an edited volume. I will use this indicator as a point of access for exploring the knowledge base of patents. Because the practice of using quotation marks is almost exclusively the case for formalized literature, 1 I hypothesize that this indicator can be used as a proxy for accessing the knowledge base of patents.

1 Sometimes newspaper articles are also included using this format.

Two domains will be explored for the year 2002:

1. All patents containing an address with the root “univ*” signifying “university” among the assignees. 2 Since 1980, the Bayh-Dole Act in the United States and similar legislation in other countries have granted universities the right to patents based on federally funded research. This led to an important increase in the participation of universities in the patenting domain (Henderson et al., 1998; Sampat et al., 2003). Universities can be among the assignees of patents. Inventor names remain natural persons.

2. For the comparison, I have used the domain of all U.S. patents in 2002 with a Dutch address among the assignees or the inventors. 3 These patents can be considered as relevant to the knowledge base of the Netherlands as a national economy (Nelson, 1993).

2. Methods and materials

During 2002 a total of 184,531 patents were issued. Of these, 3,455 patents could be retrieved with the root “univ$” in the address fields of inventors or assignees. After correction for words like “universal” and “universe,” 3,291 patents remained that had been assigned to a university. As expected, the word “university” never figured among the inventor addresses. Note that a number of universities do not use the word “university” in their names (e.g., MIT). Nevertheless, the delineation with the word “university” provides us with a convenient domain of patents for statistical exploration.

2 The precise query was as follows: “isd/$/$/2002 and an/univ$”. The $ is used as a wild card in the USPTO database, and therefore the query looks for all patent data that were issued in 2002 (“isd” = issue date) and that contain the root “univ” in the name of the assignee (field code: “an”).

3 The precise query was as follows: “isd/$/$/2002 and (acn/nl or icn/nl)”. The abbreviation “nl” is used for the Netherlands; “acn” is the field code for the name of the country of the assignee and “icn” for the name of the country of the inventor.

Second, patents with an origin in The Netherlands (as inventor or assignee) were downloaded as an example of a geographically contained set of foreign patent holders within the U.S. patent domain. In 2002 it happens to be the case that there are 1,963 patents with a Dutch assignee and equally 1,963 patents with a Dutch inventor. The combined set, however, contains 2,827 patents with a Dutch address (2,824 of these patents could be retrieved). More than national patents, these foreign patents indicate an investment in the global marketplace. 4 The investments are made because of the value of the intellectual property to be protected. The two domains are institutional and geographical, respectively.
While the university-based patents can be used as an indicator of university-industry relations, the Dutch patents can be expected to represent the internationally oriented sectors of a national economy.

4 In 2002, the number of patents with a Dutch address among the assignees in the database of the European Patent Office is 3,193 and the number of patents with a Dutch address among the inventors is 2,667. The combined set (with an OR) is 3,680. Thus, the number of patents published in the USPTO database is 76.8% of the number of patents published with a Dutch address in the EPO database. Note that these two sets do not have to be based on the same patents.

2.1 Methodological considerations

How does this difference in delineation influence the knowledge base of the corresponding sets of patents? This question will be pursued by analyzing the title words in the two sets and by relating these title words to the title words of the scientific documents cited in these sets.

New developments in visualization software enable us to map asymmetrical matrices using large datasets. 5 Pajek, a freeware program for visualization developed by mathematicians at the University of Ljubljana, 6 for example, contains a subroutine for analyzing asymmetrical matrices in either direction (Q- or R-mode structural analysis) or bimodally. Since innovations take place at interfaces, the mapping of asymmetries at interfaces in terms of variation, selection, and codification can be considered a priority from the perspective of evolutionary economics and innovation studies (David & Foray, 2002; Leydesdorff, 2003).

Both the analysis of title words and the analysis of the interfaces with the NPLR will be visualized using the algorithm of Kamada & Kawai (1989) as it is available in Pajek. This algorithm represents the network as a system of springs with relaxed lengths proportional to the edge length.
Nodes are iteratively repositioned to minimize the overall “energy” of the spring system using a steepest descent procedure. The procedure is analogous to some forms of non-metric multi-dimensional scaling. 7 I will compare the results of this relational analysis with the results of a (positional) factor analysis of the same map (Burt, 1982; Leydesdorff, 1995). In order to keep the visualizations readable, the analysis will pragmatically be limited to the approximately one hundred most frequently occurring words for each case.

As a similarity measure among vectors of word distributions in titles of patents I shall use the cosine (Salton & McGill, 1983). This measure has an advantage over the Pearson correlation (used by the factor analysis) in that the similarity is insensitive to the number of zeros because the cosine is not based on the mean of the distribution (Larsen & Ingwersen, 2002; Ortega Priego, 2003; cf. Ahlgren et al., 2003; White, 2003). 8 In empirical cases, these two similarity measures often lead to similar results (Leydesdorff & Zaal, 1989). 9 In the case of the asymmetrical (bi-modal) matrices of words in the titles of patents versus words in the titles of the corresponding literature references, the cell values are not normalized.

5 Limited datasets could previously be mapped asymmetrically using Quasi-Correspondence Analysis (Tijssen et al., 1987).

6 The homepage of Pajek can be found at http://vlado.fmf.uni-lj.si/pub/networks/pajek

7 A disadvantage of this model is that unconnected nodes may remain randomly positioned across the visualization. Unconnected nodes are therefore not included in the visualizations below. For more details about the different algorithms, see, for example, the overview in the introduction to the social network image animator software package SoNIA at http://www.stanford.edu/~skyebend .

2.2 The retrieval of data from the web

Patent data are brought on-line by the U.S. Patent and Trademark Office (at http://www.uspto.gov ) and by the European Patent Office (at http://ep.espacenet.com ). The latter database also contains the data of the World Intellectual Property Organization. However, the European and world patents are not fully standardized and are partly in other formats, while the U.S. database is standardized, organized in hypertext mark-up language (html), and accessible for searching by robots. 10 Furthermore, the U.S. database is often used in scientometric research for comparative purposes because it standardizes the presence of other nations in a single representation (Narin & Olivastro, 1988, 1992). This database allows, among other things, for the retrieval of citation patterns in terms of both the previous patents cited and the scientific (that is, non-patent) literature cited. Additionally, the follow-up in terms of “being cited” in later patents can be traced.

8 Salton’s cosine is defined as the cosine of the angle enclosed between two vectors x and y as follows:

$$\mathrm{Cosine}(x,y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\left(\sum_{i=1}^{n} x_i^2\right) \cdot \left(\sum_{i=1}^{n} y_i^2\right)}} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$

9 The Jaccard Index not only differs by a factor of two from the cosine (Hamers et al., 1989), but in empirical cases often leads to results which are rather different from the Pearson correlation (Leydesdorff & Zaal, 1988). Strong relations in the database (segments) are foregrounded by the Jaccard Index, while Salton’s cosine organizes the relations geometrically so that they can be visualized as structural patterns of relations (Luukkonen et al., 1993; cf. Michelet, 1988). Factor or eigenvector analysis enables us to analyze this construct in terms of its orthogonal dimensions (Wagner & Leydesdorff, 1993).

10 The USPTO states a limitation on bulk downloads of the data at http://www.uspto.gov/patft/help/notices.htm .
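The difference between the two similarity measures discussed above can be illustrated with a small sketch. The vectors below are invented for illustration and are not taken from the patent data; the function names are likewise hypothetical. The sketch computes Salton's cosine as defined in footnote 8 and obtains the Pearson correlation as the cosine of the mean-centered vectors, so that the effect of appending shared zeros can be seen directly.

```python
from math import sqrt


def cosine(x, y):
    """Salton's cosine: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Invented word-occurrence vectors (illustrative only).
x = [1, 2, 3]
y = [3, 1, 2]

# The same vectors with three shared zero cells appended.
x_padded = x + [0, 0, 0]
y_padded = y + [0, 0, 0]

print(cosine(x, y), cosine(x_padded, y_padded))    # unchanged by the zeros
print(pearson(x, y), pearson(x_padded, y_padded))  # changes sign in this example
```

In this toy case the cosine stays at 11/14 regardless of the padding, while the Pearson correlation moves from -0.5 to 0.625 because the added zeros shift the means.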
Because of its legal status within the American administration, the USPTO database is extremely well organized in terms of search terms and reliability. The search options are documented with help screens (Black, 2002). The results can be retrieved in consecutive screens of fifty titles. These titles are hyperlinked with the full texts of the patents containing all the information available in html format. The labels are consistent and therefore the data can conveniently be parsed and brought under the control of relational database management.

Web search engines do not go more than two levels deep into the USPTO’s Web site because they cannot query a database. 11 However, the on-line retrieval can be automated by using a routine in Visual Basic. Visual Basic 6 was the first version to contain a so-called Internet Transfer Control. This component enables us to download the data from a structured database on the Internet such as the one under study here. The routine for searching all the patents in 2002 with a Dutch address among the inventors or assignees is provided as an example in Appendix 1. By cutting and pasting the search command from the Internet search into the long line which defines the URL string as the variable named “strURL,” one is able to adapt this routine to one’s specific requirements. The parameter N controls the record number and the parameter P reflects that the titles are provided in consecutive screens of fifty records. The download, parsing, and organization into a relational database management system can thus be fully automated. This allows researchers to exploit these data without constraints. 10

11 Search engines do not access URLs which contain a question mark because this indicates the use of script technology.
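The paging logic of such a retrieval routine can be sketched as follows. This is not the Visual Basic routine of Appendix 1 but a minimal Python illustration of the same idea: a fixed query string, a page parameter, and screens of fifty records. The endpoint `https://example.org/patent-search` and the URL parameter names `query` and `p` are hypothetical placeholders, not the USPTO's actual interface, and the sketch only builds the URLs without sending any request.

```python
from math import ceil
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, used for illustration only.
BASE_URL = "https://example.org/patent-search"
PAGE_SIZE = 50  # titles are provided in screens of fifty records


def result_pages(query, total_hits, page_size=PAGE_SIZE):
    """Return (url, first_record, last_record) for each result screen."""
    pages = []
    for p in range(1, ceil(total_hits / page_size) + 1):
        first = (p - 1) * page_size + 1
        last = min(p * page_size, total_hits)
        url = BASE_URL + "?" + urlencode({"query": query, "p": p})
        pages.append((url, first, last))
    return pages


# The query for patents with a Dutch address (see footnote 3) and the
# size of the combined set reported in the text.
pages = result_pages("isd/$/$/2002 and (acn/nl or icn/nl)", 2827)
print(len(pages), pages[-1][1:])
```

A downloader would iterate over these pages, fetch each screen, and follow the hyperlinks to the full texts. Any such loop should respect the limitation on bulk downloads that the USPTO states (footnote 10).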
If spiders encounter a “?” in a URL or link, they are programmed to stop crawling because they could encounter poorly written script or intentional “spider traps” (at http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html#Why2 ; cf. Reddi et al., 2003).

As the reader will note, a similar routine can be written for any database which provides a systematic indication of the sequential results (e.g., the AltaVista Advanced Search Engine). However, some databases have deliberately blocked this mode of searching by a robot (e.g., ISI’s Web of Science, Google). 12

3. Results

3.1 University-based patents

Among the 184,531 patents with an issue date in 2002, 3,291 refer to universities among the names of the assignees. The total number of assignees is 3,823 and the total number of inventors 9,217. In sum, not many of these patents are co-assigned, but many of them are co-invented. The 3,291 records contain 44,268 references to patents and 62,138 references to non-patent literature. The scientific references thus outnumber the patent references for this university-based sample. These patents contain 5,148 unique words (after correction for the stopwords). 13 The 102 most frequently occurring words among these are used for the visualizations in Figures 1 and 2 below. The words included occur with a frequency of more than 26 times.

12 The search engine Google offers an alternative by using its own so-called APIs. These allow for searching the database also on dates, albeit using the Julian calendar. The AltaVista Advanced Search Engine is hitherto the only database allowing for searching with calendar dates (Leydesdorff, 2001).

13 For reasons of consistency, the stopword list available at http://www.uspto.gov/patft/help/stopword.htm was used throughout this study as a standard corrective to the inclusion and exclusion of common words. Otherwise, the words are corrected only for the plural “s.”

Figure 1
Co-occurrence network of 102 title words in patents with a university address during 2002 (N Patents = 3,291; Word frequency > 26; 75 words connected at the threshold level of co-occurrences ≥ 10).

Figure 1 shows the co-word map given a threshold of ten co-occurrences and before the normalization. It is clear that the most frequently occurring word is “method(s).” This word draws most of the other words into a star-shaped network. However, one cluster is visible containing the words “fiber,” “liquid,” “bundle,” etc. This cluster is related to the central set through the words “polymer,” “structure,” “high,” and “temperature.”

Upon normalization using the cosine formula, 5 the picture changes to exhibit the intellectual organization (Figure 2). The main areas of technological activities in which universities patent are now visible as clusters. The relatively low value of the threshold (that is, cosine ≥ 0.1) indicates that this structure is relatively robust.

[Figure 2; cluster labels in the map: biomedicine, thin films, molecular biology, fibers]

Figure 2
Cosine normalized map of 102 title words in 3,291 patents with a university address in 2002 (N Patents = 3,291; Word frequency > 26; 85 words connected at the threshold level of cosine ≥ 0.1).

The various clusters indicated in the map can be further explored and designated using factor analysis of the matrix. The factor analysis positions the clusters differently in a multi-dimensional space. Figure 3 shows the plot of components 1 and 2 in a six-factor solution of this matrix. (Note that factor analysis uses by default the Pearson correlation as a similarity measure.)
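The step from a raw co-occurrence map (as in Figure 1) to a cosine-normalized map (as in Figure 2) can be sketched on a toy example. The five "titles" below are invented and the function names are hypothetical; the thresholds are chosen for the toy data and are not the values used in the study. The sketch shows why a very frequent word such as "method" dominates the raw map, while the cosine normalization brings out the pairs of words that specifically occur together.

```python
from itertools import combinations
from math import sqrt


def occurrence_vectors(titles, vocabulary):
    """Binary word-by-title vectors: 1 if the word occurs in the title."""
    return {w: [1 if w in t else 0 for t in titles] for w in vocabulary}


def cooccurrence(u, v):
    """Number of titles in which both words occur."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Co-occurrence normalized by the lengths of both occurrence vectors."""
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return cooccurrence(u, v) / norm if norm else 0.0


def edges(vectors, measure, threshold):
    """Word pairs whose similarity reaches the threshold."""
    result = {}
    for a, b in combinations(sorted(vectors), 2):
        value = measure(vectors[a], vectors[b])
        if value >= threshold:
            result[(a, b)] = value
    return result


# Invented titles, each reduced to its set of title words.
titles = [
    {"method", "gene", "protein"},
    {"method", "gene", "protein"},
    {"method", "film"},
    {"method", "film"},
    {"method", "fiber"},
]
vectors = occurrence_vectors(titles, {"method", "gene", "protein", "film", "fiber"})

raw_edges = edges(vectors, cooccurrence, 2)   # star around "method"
cosine_edges = edges(vectors, cosine, 0.7)    # only the specific pair remains
print(sorted(raw_edges))
print(sorted(cosine_edges))
```

In the raw map three of the four edges run through "method"; after normalization, at this toy threshold, only the "gene"/"protein" pair is retained.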
[Figure 3: plot of the 102 title words on Component 1 and Component 2; both axes run from -.2 to 1.0]

Figure 3
Results of the factor analysis of the co-occurrences of 102 title words in 3,291 patents with a university address in 2002.

Figure 3 exhibits two of the clusters indicated in Figure 2 as the major dimensions of the matrix. Another (fifth) factor (with factor loadings for the title words “growth,” “sequence,” and “factor”) exhibits interfactorial complexity with the one designated above as “molecular biology” (containing the title words “nucleic,” “acid,” and “encoding”). In principle, one would thus be able to use the factor loadings of words in a positional analysis for purposes of the designation of clusters in the visualization of mutual relations (Burt, 1982). However, the development of this technique would reach beyond the scope of this study (Leydesdorff, in preparation).

3.2 The scientific knowledge base of university-based patents

The 38,509 literature references that contain title words within quotation marks can be broken down into 25,078 unique words.
Of these words, we use again the approximately one hundred words which occur most frequently. These one hundred words were found to occur more than 438 times in these references. The co-occurrences of these words with the 102 title words of patents used in the previous analysis can be organized into an asymmetrical or bi-modal matrix (Table 2). The title words of the patents are used as the column variables and the title words in the NPLRs as the case labels. Using Pajek this matrix can be represented as in Figure 4.

Columns: 102 title words of patents →
Rows: 100 title words in references ↓

             method  system  using  apparatus  composition  protein   cell
cell           2087     182    414         26          653      466   1210
protein        1549     121    374         34          501      956    412
gene           1788     275    329         31          678      544    772
human          1150     134    211         16          477      230    535
dna             706     116     84          8          190      156    254
expression      640      94    102          2          229      181    372
receptor        683       3     91          2          198      197    175
virus           686     198    184          2          187      212    803
factor          525      16    102          5          178       81    174
tumor           570      42    121          2          322      138    210
synthesis       438      36    121         13          160       79     84
peptide         586      27    180         11          177      297    141
growth          498      22    165         10          164       56    207

Table 2
Part of the asymmetrical matrix of 102 title words in patents versus the 100 most frequently occurring title words in the literature references within these patents.

Figure 4
Bimodal representation of the 102 most frequently occurring words in titles of patents (> 26 times) and the 100 most frequently occurring words in titles of scientific citations (> 438 times) in these patents (3,291 patents; 38,509 scientific references).

The figure shows that the title words of the patents in the biomedical sector are sorted in the middle of a set of title words from the cited scientific literature. The connecting patent words (in white) are in the center because this set ties together the words from the literature, which have various foci.
A group of patent words on the bottom-left side of this figure is only related to the research side of the biomedical literature and not to more clinically oriented words like “patient,” “therapy” or “treatment.” Title words from these patents are positioned outside the halo formed by the title words of the NPLR.

The depiction in Figure 4 should be read with the caveat that it remains a visualization of a system. The system under study may have more than a single (relative) minimum for the “energy” (Kamada & Kawai, 1989). Another local minimum, for example, can be found in this matrix so that the title words of patents envelop the title words of the NPLR. The patents draw on the scientific references as their knowledge base and this can be depicted either as a focusing device in the middle of the variation or as a ball with the original variation contained within it.

If we focus on the subset of 1,920 patents which contain NPLR, we can generate a list of words occurring most frequently in these patents only. The 96 words occurring more than 17 times in these patents are mapped in Figure 5 after cosine normalization, with a threshold of cosine ≥ 0.1 (as previously). As can be expected on the basis of the above analysis, the biotechnology field and the molecular biology field are completely dominant in this subset. Note that the agricultural applications of biotechnology are marginal.

[Figure 5; cluster labels in the map: pharma, electronic instruments, biotechnology, molecular biology]

Figure 5
Cosine normalized map of relations between 96 title words in patents with literature references and a university address in 2002 (N Patents = 1,920; Word frequency > 17; 90 words connected at the threshold level of cosine ≥ 0.1).

3.3 Dutch patents

Unlike the patents with a university address among the assignees, the 2,824 patents with a geographical address in the Netherlands contain far more references to other patents than to non-patent literature, namely 31,514 and 6,396 references, respectively.
Among the 6,396 non-patent literature references some 3,440 contain title words between quotation marks, and these references belong to only 643 patents in the set. Thus, the science base of this set, as indicated by formal literature references, is much less prominent than in the case of the previous set. The role of the Dutch universities is marginal: while 29 of these 2,824 patents contain a university address, only 15 of these university addresses (0.5%) are located in the Netherlands.

[Figure 6; cluster labels in the map: medical systems, flowers, chemistry, electro-technical, coating, cars, energy]

Figure 6
Cosine normalized map of 105 co-occurring words in patents (in 2002) with a Dutch address among the assignees or inventors (N Patents = 2,824; Word frequency > 22; 94 words connected at the threshold level of cosine ≥ 0.1).

Figure 6 shows the normalized co-occurrence map of the 105 title words that occur with a frequency of more than 22 among the 4,005 unique title words contained in the set. The picture exhibits a recognizable representation of the Dutch industrial structure with a dominance of electro-technical and chemical applications. Multinational corporations are dominant in the set. For example, Philips, with a focus on electro-technical systems, holds 768 of the 1,963 patents (39.1%) with a Dutch address among the assignees. Medical systems are related to the electro-technical side of the set through imaging devices. The occurrence of a small set of patents related to the names of flowers is noteworthy.

Figure 7 exhibits the occurrence of these 105 title words of patents in relation to the 3,440 scientific literature citations that contain title words between quotation marks. The latter constitute a domain of 6,072 unique words, of which we selected the 101 that occur with a frequency larger than 31.
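The selection of title words used in each of the analyses (removal of stopwords, correction for the plural “s,” and a frequency threshold) can be sketched as follows. The stopword set below is a short illustrative stand-in and not the USPTO stopword list; the rule for stripping the plural is an assumed, deliberately naive one; and the titles are invented. Whether the original counts were made per word token or per title is not stated in the text, and the sketch simply counts tokens.

```python
import re
from collections import Counter

# Illustrative stand-in only; the study used the USPTO stopword list.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "with"}


def normalize(word):
    """Assumed naive correction for the plural 's' (e.g. 'cells' -> 'cell')."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def title_words(title):
    """Lower-case the title, drop stopwords, and correct for the plural."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [normalize(t) for t in tokens if t not in STOPWORDS]


def frequent_words(titles, more_than):
    """Words whose total frequency is larger than the given threshold."""
    counts = Counter(w for t in titles for w in title_words(t))
    return {w: n for w, n in counts.items() if n > more_than}


# Invented titles, for illustration only.
titles = [
    "Methods for gene expression",
    "Method of gene delivery",
    "Gene therapy for cells",
    "Cell imaging method",
]
selected = frequent_words(titles, 1)
print(selected)
```

With real data the threshold would be chosen, as in the text, so that approximately one hundred words remain (for example, a frequency larger than 31 for the words in the Dutch literature references).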
Figure 7
Network of title words in patents 14 with a Dutch address among the assignees or inventors in relation to the title words used in their literature references (N Patents = 2,824; Word frequency > 22; 3,440 literature references with 6,072 unique words of which 101 occur with a frequency > 31).

The picture shows that the references are concentrated in the bio-medical sector. A relatively small set of (643) patents is related in this way, and these title words are placed in the center of a halo with the title words from the literature references. The other patent words are not related in this way. Title words of these patents are placed outside the halo of bio-medical applications.

14 Because I used the first ten characters of the words for the identification in this case, the 105 words were reduced to 104: “manufacture” and “manufacturing” were equated.

The 643 “science-based” patents contain 1,681 unique words of which 107 occur more than 6 times. Figure 8, finally, provides the cosine-map of these co-occurrences in a format similar to the ones above. Note that this map is much more fine-grained than the previous ones because of the much lower level of the threshold for the occurrence frequencies of title words. Because of the smaller range of values in the cells, the structure becomes visible only when the cosine-threshold is raised to ≥ 0.2.

[Figure 8; cluster labels in the map: bio-medical, networks, electro-technical, immobilization, imaging]

Figure 8
Cosine normalized map of the 107 most frequently occurring words in 643 “literature-based” patents with a Dutch address among the assignees or inventors (N Patents = 643; Word frequency > 6; 83 words connected at the threshold level of cosine ≥ 0.2).

Figure 7 above showed that the knowledge base of the Dutch patents, as visible in the U.S. patent database, is integrated by the bio-medical applications, but Figure 8 shows that the latter are not central to the aggregate of these activities.
In this depiction, the industrial structure remains more important than the intellectual organization of these patents. Biomedical terms (e.g., “DNA,” “nucleic”) are relatively peripheral in Figure 8. However, the finding that the knowledge base of this patent set is integrated by a bio-medical network of title words in their NPLR is meaningful because the industrial structure visible at the surface is dominated by electro-technical and chemical applications.

4. Conclusion

The science-based model of university-industry collaborations was shaped in the 1980s with biotechnology as the prime example (Narin & Noma, 1985; OECD, 1988). Our data for 2002 suggest that this pattern has now been established as a dominant pattern (Narin et al., 1997; Owen-Smith et al., 2002). Information and communication technologies, for example, have not led to similar patterns of formalized exchanges between the scientific literature and the patent literature in other fields. Kaghan & Barnett (1997) signaled that the laboratory model tends to work as a “metonymy” because it guides the thinking about new policies in university-industry relations. Universities are active in new fields (e.g., thin films; cf. Bhattacharya et al., 2003), but the relationship with the organized knowledge production system is much less formalized in terms of literature relations. 15

15 Glänzel & Meyer (2003) have studied the “reverse citation” of patents being cited in the scientific literature. Their conclusion is that established fields like “chemistry” are the main contributors to these relations.

In the second case, we turned to The Netherlands as an example of a knowledge-based, but nationally integrated economy. When we raise the question of how this knowledge base is reflected in the U.S. patent data, one can recognize the major industrial players in the patent domain. As noted, the role of the Dutch universities is marginal.
Nevertheless, am ong these patents, only the ones with bio-medical relevanc e contain the noted pattern of science-based references. Thus, this relationship between scien tific literature and pate nts is not specific to universities, but sector specific. In the Dutch case, the patter ns of scientific referencing provide a network that connects the main opera tional areas of knowledge -based industries in the background. These results suggest that one should be aw are that policy-makers tend to think about university-industry relati ons in general terminologies, but tha t these relations are mainly shaped in the knowledge base of the bio-medical sector . Other sectors may contain mechanisms for integration and knowledge-transfe r that are completely different from these bio-medical innovations. Thus, one should not generalize easily from the experience with biotechnology and bio-medicine to other sector s of industry or disc iplines of science. Biotechnology is a specific m ode of interrelation ship between science and industry. From the perspective of the further development of Internet research and inform ation science and technology, I have mainly wished to show that the Internet opens domains beyond the Internet for new scientific i nvestigations. One routine for a ccessing these “hidden” domains was specified. The large amounts of data that can be m ade ava ilable by this technique, can be analyzed by using the visualization to ols and the normalizations indicated. 23 References Ahlgren, P., B. Jarneving, and R. Rousseau (2003). Requirement for a Cocitation Similarity Measure, with Special Reference to Pearson’s Correlation Coefficient, Journal of the American Society for Information Science and Technology 54(6), 550-560. Bergman, M. K. (2001). The Deep Web: Surfacing Hidden Value, Journal of Electronic Publishing, 7(1) (2001); at http://www.press.umich.edu/jep/07-01/bergman.html Bhattacharya, S., H. Kretschmer, and M. Me yer. (2003). 
Characterizing Intellectual Spaces between Science and Technology. Scientometrics 58(2), 369-390.
Black, G. R. (2002). Exploring Inventions and Ideas: Keyword Patent Searching - Online. Southfield, MI: http://www.keypatent.net/
Burt, R. S. (1982). Toward a Structural Theory of Action. New York, etc.: Academic Press.
David, P. A., & D. Foray. (2002). An Introduction to the Economy of the Knowledge Society. International Social Science Journal, 54(171), 9-23.
Fruchterman, T., and E. Reingold (1991). Graph drawing by force-directed placement, Software--Practice and Experience 21, 1129-1166.
Glänzel, W., & M. Meyer. (2003). Patents Cited in the Scientific Literature: An Exploratory Study of ‘Reverse’ Citation Relations. Scientometrics (forthcoming).
Granstrand, O. (1999). The Economics and Management of Intellectual Property: Towards Intellectual Capitalism. Cheltenham, UK: Edward Elgar.
Grupp, H., and U. Schmoch. (1999). Patent Statistics in the Age of Globalisation: New Legal Procedures, New Analytical Methods, New Economic Interpretation. Research Policy, 28, 377-396.
Hamers, L., Y. Hemeryck, G. Herweyers, M. Janssen, H. Keters, R. Rousseau, et al. (1989). Similarity Measures in Scientometric Research: The Jaccard Index Versus Salton's Cosine Formula. Information Processing & Management, 25(3), 315-318.
Henderson, R., A. Jaffe, and M. Trajtenberg (1998). Universities as a Source of Commercial Technology: A Detailed Analysis of University Patenting, 1965-1988. Review of Economics and Statistics, 80(1), 119-127.
Jaffe, A. B., & M. Trajtenberg. (2002). Patents, Citations, and Innovations: A Window on the Knowledge Economy. Cambridge, MA/London: MIT Press.
Kaghan, William N. and Gerald B. Barnett (1997). The Desktop Model of Innovation in Digital Media, in: Henry Etzkowitz and Loet Leydesdorff (eds.), Universities and the Global Knowledge Economy: A Triple Helix of University-Industry-Government Relations. London: Cassell Academic, pp. 71-81.
Kamada, T., and S. Kawai (1989). An algorithm for drawing general undirected graphs, Information Processing Letters 31(1), 7-15.
Larsen, B., & P. Ingwersen. (2002). The Boomerang Effect: Retrieving Scientific Documents Via the Network of References and Citations. Paper presented at SIGIR’02, August 11-15, Tampere, Finland.
Leydesdorff, L. (1995). The Challenge of Scientometrics: The Development, Measurement, and Self-Organization of Scientific Communications. Leiden: DSWO Press, Leiden University; at http://www.upublish.com/books/leydesdorff-sci.htm
Leydesdorff, L. (2001). Indicators of Innovation in a Knowledge-Based Economy. Cybermetrics, 5(Issue 1), Paper 2; at http://www.cindoc.csic.es/cybermetrics/articles/v5i1p2.html
Leydesdorff, L. (2003). The Mutual Information of University-Industry-Government Relations: An Indicator of the Triple Helix Dynamics. Scientometrics, 58(2), 445-467.
Leydesdorff, L. (in preparation). Meaning and Translation at the Interfaces of Science: Mapping the Case of ‘Stem-Cell Research,’ Paper presented at the 27th Annual Meeting of the Society for Social Studies of Science (4S), Atlanta, GA, 15-18 October 2003.
Leydesdorff, L., & R. Zaal (1988). Co-Words and Citations. Relations between Document Sets and Environments. In L. Egghe & R. Rousseau (Eds.), Informetrics 87/88 (pp. 105-119). Amsterdam: Elsevier.
Leydesdorff, L., & M. Meyer. (2003). The Triple Helix of University-Industry-Government Relations: Introduction to the Topical Issue. Scientometrics, 58(2), 191-203.
Leydesdorff, L., & A. Scharnhorst. (2003). Measuring the Knowledge Base: A Program of Innovation Studies. Report to the “Förderinitiative Science Policy Studies” of the German Bundesministerium für Bildung und Forschung. Berlin: Berlin-Brandenburgische Akademie der Wissenschaften; at http://sciencepolicystudies.de/Leydesdorff&Scharnhorst.pdf
Mansfield, E. (1991). Academic Research and Industrial Innovation.
Research Policy, 20(1), 1-12.
Meyer, M. (2000). Does Science Push Technology? Patents Citing Scientific Literature. Research Policy 29, 409-434.
Michelet, B. (1988). L'Analyse des Associations. Unpublished Ph.D. Thesis, Université Paris VII, Paris.
Narin, F., K. S. Hamilton, & D. Olivastro. (1997). The Increasing Link between U.S. Technology and Public Science. Research Policy, 26(3), 317-330.
Narin, F., & E. Noma. (1985). Is Technology Becoming Science? Scientometrics, 7, 369-381.
Narin, F., & D. Olivastro. (1988). Technology Indicators Based on Patents and Patent Citations. In A. F. J. van Raan (Ed.), Handbook of Quantitative Studies of Science and Technology (pp. 465-507). Amsterdam: Elsevier.
Narin, F., & D. Olivastro. (1992). Status Report: Linkage Between Technology and Science. Research Policy, 21, 237-249.
Nelson, R. R. (ed.). (1993). National Innovation Systems: A comparative analysis. New York: Oxford University Press.
Noble, D. (1977). America by Design. New York: Knopf.
OECD. (1988). Biotechnology and the Changing Role of Government. Paris: OECD.
OECD/Eurostat (1997). Proposed Guidelines for Collecting and Interpreting Innovation Data, “Oslo Manual”. Paris: OECD.
Ortega Priego, J. L. (2003). A Vector Space Model as a Methodological Approach to the Triple Helix Dimensionality: A Comparative Study of Biology and Biomedicine Centres of Two European National Councils from a Webometric View. Scientometrics, 58(2), 429-443.
Owen-Smith, J., M. Riccaboni, F. Pammolli, and W. W. Powell (2002). A Comparison of U.S. and European University-Industry Relations in the Life Sciences. Management Science 48(1), 24-43.
Pavitt, K. (1984). Sectoral Patterns of Technical Change: Towards a Theory and a Taxonomy. Research Policy, 13, 343-373.
Reddy, C., P. Wouters, & I. Aguillo. (2003). Invisible Internet in Parts of the EU Research Area. Deliverable of Project WISER to the European Commission. Amsterdam / Madrid: Nerdi, CINDOC.
Salton, G., & M. J.
McGill. (1983). Introduction to Modern Information Retrieval. Auckland, etc.: McGraw-Hill.
Sampat, B. N., D. C. Mowery, A. A. Ziedonis (2003). Changes in University Patent Quality after the Bayh-Dole Act: A Re-Examination. International Journal of Industrial Organization (forthcoming).
Sherman, C., & G. Price. (2001). The Invisible Web. Medford, NJ: Cyberage Books.
Wagner, C. S., & L. Leydesdorff. (2003). Mapping Global Science using International Co-Authorships: A Comparison of 1990 and 2000. In J. Guohua, R. Rousseau, W. Yishan (Eds.), Proceedings of the 9th International Conference on Scientometrics and Informetrics (pp. 330-340). Dalian: Dalian University of Technology Press.
White, H. D. (2003). Author Cocitation Analysis and Pearson’s r. Journal of the American Society for Information Science and Technology, 54(13), 1250-1259.
Whitley, R. D. (1984). The Intellectual and Social Organization of the Sciences. Oxford: Oxford University Press.

Appendix 1

Visual Basic 6 code for collecting the 2,827 patents in 2002 with either an inventor or assignee with a Dutch (NL) address

Private Sub Form_Load()
    Dim strURL As String      ' URL string
    Dim intFile As Integer    ' FreeFile variable
    Dim N As Integer          ' sequence number of the patent in the result set
    Dim P As Integer          ' result page (50 hits per page)

    P = 1
    For N = 1 To 2827
        intFile = FreeFile()
        ' The query string is split over several lines for readability only
        strURL = "http://patft.uspto.gov/netacgi/nph-Parser?" & _
            "Sect1=PTO2&Sect2=HITOFF&u=/netahtml/search-adv.htm&r=" & _
            LTrim(Str(N)) & "&f=G&l=50&d=PTXT&s1=(ISYR-2002+AND+" & _
            "(nl.ASCO.+OR+nl.INCO.))&p=" & LTrim(Str(P)) & _
            "&OS=isd/$/$/2002+and+(acn/nl+or+icn/nl)" & _
            "&RS=(ISD/2002$$+AND+(ACN/nl+OR+ICN/nl))"
        ' Save each patent as C:\temp\p<N>.htm; Inet1 is an Internet Transfer Control on the form
        Open ("C:\temp\p" & LTrim(Str(N)) & ".htm") For Output As #intFile
        Print #intFile, Inet1.OpenURL(strURL)
        Close #intFile
        If N Mod 50 = 0 Then P = P + 1   ' move to the next result page after every 50 records
    Next N
End Sub
