The influence of relatives on the efficiency and error rate of familial searching

Rori V. Rohlfs a,*, Erin Murphy b, Yun S. Song c,d, Montgomery Slatkin a

a Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
b School of Law, New York University, New York, NY 10012, USA
c Computer Science Division, University of California, Berkeley, CA 94720, USA
d Department of Statistics, University of California, Berkeley, CA 94720, USA

Abstract

We investigate the consequences of adopting the criteria used by the state of California, as described by Myers et al. (2011), for conducting familial searches. We carried out a simulation study of randomly generated profiles of related and unrelated individuals with 13-locus CODIS genotypes and YFiler® Y-chromosome haplotypes, on which the Myers protocol for relative identification was carried out. For Y-chromosome-sharing first-degree relatives, the Myers protocol has a high probability (80-99%) of identifying their relationship. For unrelated individuals, there is a low probability that an unrelated person in the database will be identified as a first-degree relative. For more distant Y-haplotype-sharing relatives (half-siblings, first cousins, half-first cousins, or second cousins) there is a substantial probability that the more distant relative will be incorrectly identified as a first-degree relative. For example, there is a 3-18% probability that a first cousin will be identified as a full sibling, with the probability depending on the population background.
Although the California familial search policy is likely to identify a first-degree relative if his profile is in the database, and it poses little risk of falsely identifying an unrelated individual in a database as a first-degree relative, there is a substantial risk of falsely identifying a more distant Y-haplotype-sharing relative in the database as a first-degree relative, with the consequence that their immediate family may become the target for further investigation. This risk falls disproportionately on those ethnic groups that are currently overrepresented in state and federal databases.

Keywords: familial searching, kinship searching, population genetics, distant relatives, likelihood ratio, Y-chromosome

* Corresponding author: rrohlfs@berkeley.edu, +1 (510) 643-0060
Email addresses: rrohlfs@berkeley.edu (Rori V. Rohlfs), eem9@nyu.edu (Erin Murphy), yss@eecs.berkeley.edu (Yun S. Song), slatkin@berkeley.edu (Montgomery Slatkin)
Preprint submitted to arXiv, August 16, 2013

Introduction

DNA databases have unquestionably assumed a vital role in the American criminal justice system. Genetic evidence has served to bolster the evidence in existing cases and to identify suspects through “cold hit” database matches [1]. Typically, investigative queries in DNA databases have been limited to searches intended to find the source of the crime-scene sample. However, increasing attention has been given to the question of whether law enforcement should also be able to search for partial matches, that is, DNA searches intended to find the source by identifying a relative in the database [1, 2]. Such searches are commonly called “kinship” or “familial” searches.

The concept of familial searching is not particularly new. In fact, familial searches fueled some of the earliest illustrations of the investigative power of DNA typing [3].
In 2002, investigators in the United Kingdom identified a serial rapist in part through a database search that led them to the perpetrator via the DNA profile of his son [4]. In another widely cited case, UK investigators recovered DNA from a brick thrown off an overpass that landed on a truck, leading to the driver’s fatal heart attack, and found the source through a database search that located a relative [5]. More recently, California authorities used a familial search to identify the putative son of a serial killer nicknamed the “Grim Sleeper,” and arrested the suspect after a sting operation in which police collected a discarded pizza crust [5].

Familial searches by no means dominate the use patterns of DNA databases, in part because they are difficult to conduct, require access to a sizeable database, are subject to various clear or unclear legal restrictions, and raise ethical concerns [6]. Because familial searches are by design inexact, the most effective methods typically employ several steps beyond a simple database search. To find a lead, one commonly used approach, which may entail additional rounds of testing, relies on examining the Y haplotype of all significant partial matches. Even if a partial match is identified, law enforcement still must investigate the relatives of that person to determine if any are likely to be the crime scene sample source.

Nevertheless, familial searches are presently conducted by a number of jurisdictions and continue to garner interest. The UK is the most prominent and longstanding advocate of the technique: from 2004 to 2011, 179 cases were submitted for search [7]. As of 2009, New Zealand had conducted 12 familial searches in serious cases [8]. The Netherlands recently passed legislation authorizing familial searches [9, 10].
Japan, Australia and Canada have robust DNA collection programs, but only Canada has explicitly rejected familial searches on what appear to be privacy grounds [11]. With regard to Europe, it is worth noting that adoption may be slowed by the December 2008 judgment by the European Court of Human Rights that declared the retention of DNA profiles and samples from unconvicted persons a violation of Article 8 of the Convention for the Protection of Human Rights and Fundamental Freedoms. That opinion (S. & Marper v. United Kingdom) invoked the provision of the Convention that safeguards its members’ “private life” [12]. Although the influence of the ruling beyond its immediate holding is unclear, the Marper decision does represent the first substantial curtailment of DNA expansion programs by a legal entity. Moreover, Marper may be used by privacy advocates and opponents to widespread DNA typing to bolster their legal claims to circumscribe such programs.

In the United States, the push to expand DNA testing has intensified. Originally, United States national database administrators prohibited the disclosure of identifying information for partial matches made across state lines [5]. As a result, although many states either legally authorized or informally permitted partial match (“moderate stringency”) reporting and/or familial searches [13], investigators could not obtain informational leads on profiles generated out of state. In 2006, however, the FBI modified its policy and now permits interstate sharing [14]. As of May 2012, a bill was pending before Congress that would allow the FBI to conduct familial searches in federal and state investigations [15].
Because the rules governing familial search methods in the United States consist of a patchwork of state law, state and local regulation, and even internal laboratory policies [5, 6], it is impossible to relate a precise legal picture. In June of 2013, the U.S. Supreme Court in Maryland v. King [16] upheld DNA collection from arrestees for serious offenses. Although the Court noted that Maryland forbids familial searches, that observation did not seem central to its holding, and no lower courts have ruled on the issue. Assessment is further complicated by the slim line that differentiates unintentional and intentional partial match searches, because some jurisdictions allow the former but not the latter. Nevertheless, some clarity is possible.

At the state level, both Maryland and Washington, D.C. have laws expressly forbidding familial searching [17, 18], although the language of both statutes could be interpreted to permit reporting of unintentional partial matches. As a matter of either written or unwritten policy, roughly nine states expressly forbid both partial matching and familial searching: Alaska, Nevada, Utah, New Mexico, Michigan, Vermont, Massachusetts, and Georgia [19]. At least another seven states prohibit familial searches, but allow reporting of inadvertent partial matches [19]. Fifteen states allow both forms of partial matching, although all of them rely upon formal or informal policies rather than express statutory authorization [19].

The states most actively pursuing familial matches are California, Colorado, Virginia, and Texas; Pennsylvania, Minnesota, and Tennessee are considering legislation. In April 2008, California became the first state to formally endorse and adopt explicit rules for conducting intentional familial searches [20].
The burgeoning interest in familial searching has reignited a national conversation about the propriety of the method that focuses on legal and ethical issues [21, 22]. The major concerns are two-fold: first, is familial searching actually efficacious, and second, does it adequately respect privacy and equality interests?

With regard to efficacy, the challenges of familial searching are reflected in its reported success rates, albeit based on limited data. The UK reports the greatest effectiveness, with an 11-27% success rate [23]. California has conducted 29 searches, with 2 reported successes (approximately 7% success rate) [24].

With regard to the ethical issues, familial searches raise privacy, equality, and democratic accountability concerns [5, 25, 2, 1, 6, 26]. In the United States, the most common critique is that the method is likely to have a discriminatory effect because DNA databases contain the profiles of certain racial minorities in disproportion to their presence in the population. To date there have been only a handful of efforts to quantify the impact of familial searches, and all have been undertaken without reference to a specific search policy [1, 2, 3]. Only one study, by a multidisciplinary team of researchers, attempted to calculate the general discriminatory impact and concluded that roughly “four times as much of the African-American population as the U.S. Caucasian population would be ‘under surveillance’ as a result of family forensic DNA” [2]. It is this estimation that scholars, policymakers and the popular press have latched upon as a means of quantifying the racial impact of familial searching [21, 27, 28, 29], and while helpful, it is nonetheless an approximation reached before any specific policy was in place to be examined.
Answering the efficacy and ethical concerns raised by familial search methods in part requires addressing complex statistical questions. The articulation of the first formal familial search policy by California [30], an American state with the world’s fourth largest DNA database (nearly 2 million profiles) and a large and diverse general population [31], affords an opportunity to gain valuable insight into the question of whether and under what circumstances familial searching should be allowed. The racial and ethnic diversity of the California database roughly mirrors the racial and ethnic diversity of the United States national database [32, 33]. Moreover, as a bellwether of criminal justice policy, California has already wielded influence both nationally and internationally as other jurisdictions contemplate various approaches.

Methods

Here we implement the Myers et al. familial identification procedure used for familial searching in California [30] to estimate power and false positive rate, in addition to estimating the rates of misidentification of distant relatives as first-degree relatives. As detailed further below, in the Myers et al. method, both parent-offspring and sibling relationships are considered by first calculating each likelihood ratio using autosomal data between the unknown sample and each entry in the state database. Of these, the database samples with the highest likelihood ratios are considered in a secondary likelihood ratio analysis using Y-chromosome haplotypes. The cumulative likelihood ratios are calculated under three population genetic assumptions and if they pass particular thresholds, the individual is considered a suspect. This method is detailed in our descriptions below.

Allele frequency data

Autosomal data

In this study, we use allele frequency estimates to investigate identification procedures that are contingent on racially defined population sample allele frequency calculations. For the autosomal STR allele frequencies, we rely upon estimates from a published survey of five population samples consisting of 182-213 individuals each and classified according to socially-identified race [33]. The groups are described in the study as ‘Vietnamese,’ ‘African American,’ ‘Caucasian,’ ‘Hispanic,’ and ‘Navajo.’ Any labeling scheme introduces questions and classifies groups in different ways not independent of the social construction of these groups. In this study we use the labels Vietnamese American, African American, European American, Latino American, and Native American.

The consent and population grouping procedures used to obtain these data are not clear. Since these data were collected, the customary ethical standards regarding informed consent processes have changed considerably, driven by several cases of severe misuse of samples provided by Indigenous communities [34, 35, 36, 37, 38, 39, 40, 41, 42]. We use these data because of their public availability and utility to investigate error rates and efficacy in familial searching. We look forward to working with data collected using transparent informed consent methodology.

The California state database consists of some entries with the 13 core CODIS loci and some with 15 loci [30]. To maintain manageable complexity, in this study we only consider the core 13 loci. Similar analyses can be performed with 15-locus profiles.

Y-chromosome data

For the Y-haplotypes, we consider data released by ABI consisting of YFiler® haplotypes genotyped in individuals grouped according to social labels ‘Vietnamese,’ ‘African American,’ ‘Caucasian,’ ‘Hispanic,’ and ‘Native American’, with sample sizes of 103, 1918, 4102, 1594, and 105 individuals, respectively (Applied Biosystems®, Foster City, CA) [43]. Again, we refer to these groups as Vietnamese American, African American, European American, Latino American, and Native American.

Individuals were genotyped and categorized into population labeling schemes differently for the autosomal and Y-chromosome markers. In this study, we use samples with the same labels in both the autosomal and Y-chromosome data to get our combined population sample allele frequencies for the Vietnamese American, African American, European American, and Latino American groups. The group we call Native American, however, is created from ‘Navajo’ autosomal marker allele frequencies and ‘Native American’ Y-chromosome allele frequencies. This inconsistency brings into question the relevance of these results for highly specified populations. However, this degree of inconsistency in population labeling is not remarkable when considering the wide variation typical of categorizing population groups (social identity-based labels like ‘Hispanic’, ‘African American,’ or ‘Caucasian’). The results of the analysis of these data should be confirmed and augmented by similar analyses of more transparent data.

Simulation scheme

Simulating relatives

To investigate the power and false positive rate of relative identification procedures, pairs of related individuals were simulated.
Specifically, 100,000 pairs of parent-offspring, siblings, half-siblings, cousins, half-cousins (individuals sharing a single grandparent), and second cousins (individuals sharing a set of great-grandparents) were simulated using allele frequency distributions for each of the five populations described above. The relative pairs were simulated to share a Y-haplotype by descent, and we refer to this sort of relationship as Y-sharing. The autosomal markers for all of the individual pairs were simulated with a population background relatedness parameter $\theta = 0.01$, in accordance with the lower recommended correction in identification likelihood ratio estimations [44].

Simulating unrelated individuals

Since unrelated individuals very rarely share enough alleles to resemble genetic relatives, more simulations are needed to accurately estimate the rates of positive relative identification between unrelated individuals. To this end, 200,000,000 pairs of unrelated individuals were simulated based on allele frequencies from each pair of population samples.

Because of the immense polymorphism of Y-chromosome haplotypes, accurate estimates of background Y-chromosome relatedness ($\hat{\theta}_Y$) require greater sample sizes. To simulate Y-haplotypes of unrelated individuals with realistic levels of background relatedness, haplotypes were independently drawn from the data. This way, rates of coincidentally shared Y-haplotypes correspond with those observed in the available data. Note that simulated rates of coincidentally shared Y-chromosome haplotypes are greatly influenced by the available data, which for some population samples is based on small numbers of individuals.
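The simulation of related pairs described above can be illustrated with a minimal Python sketch for a single autosomal locus. It is an illustration under stated assumptions, not the authors' simulation code: the function and variable names are hypothetical, the allele frequencies passed in are placeholders, the IBD-sharing probabilities are the standard values for non-inbred pedigrees, and the $\theta = 0.01$ correction and the Y-haplotype component are omitted for brevity.

```python
import random

# Standard IBD-sharing probabilities (k0, k1, k2) for non-inbred pedigrees.
# The relationship labels follow the text; the dictionary itself is illustrative.
IBD_COEFFICIENTS = {
    "parent-offspring": (0.0, 1.0, 0.0),
    "siblings": (0.25, 0.5, 0.25),
    "half-siblings": (0.5, 0.5, 0.0),
    "cousins": (0.75, 0.25, 0.0),
    "half-cousins": (0.875, 0.125, 0.0),
    "second cousins": (0.9375, 0.0625, 0.0),
}


def draw_allele(freqs, rng):
    """Draw one allele from a {allele: frequency} dictionary."""
    alleles = list(freqs)
    weights = [freqs[a] for a in alleles]
    return rng.choices(alleles, weights=weights, k=1)[0]


def simulate_pair(freqs, relationship, rng):
    """Simulate single-locus genotypes for a pair with the given relationship.

    Assumption: no theta correction; alleles are drawn independently from freqs.
    """
    k0, k1, k2 = IBD_COEFFICIENTS[relationship]
    first = (draw_allele(freqs, rng), draw_allele(freqs, rng))
    # Number of alleles the second individual shares IBD with the first.
    n_ibd = rng.choices([0, 1, 2], weights=[k0, k1, k2], k=1)[0]
    if n_ibd == 2:
        second = first
    elif n_ibd == 1:
        second = (rng.choice(first), draw_allele(freqs, rng))
    else:
        second = (draw_allele(freqs, rng), draw_allele(freqs, rng))
    return tuple(sorted(first)), tuple(sorted(second))
```

A full profile would repeat `simulate_pair` independently at each of the 13 loci with locus-specific frequencies; unrelated pairs correspond to drawing both genotypes independently.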

Relative identification procedure

Parent-offspring and sibling identification protocols were followed with the method implemented in California, which incorporates autosomal and Y-chromosome haplotype data [30]. These calculations were performed on pairs of individuals simulated with different genetic relationships, using the allele frequencies from each population sample.

Autosomal likelihood ratio

Using autosomal data, the standard likelihood ratio (LR) comparing the probabilities of the observed genotypes ($G$) assuming a particular genetic relationship (parent-offspring or sibling) and assuming the individuals are unrelated is defined as [1, 45]

$$\widehat{LR}_A = \frac{P(G \mid k_0, k_1, k_2)}{P(G \mid k_0 = 1, k_1 = 0, k_2 = 0)}, \tag{1}$$

where $k_0$, $k_1$, and $k_2$ are parameters describing the probabilities that individuals with the specified relationship share 0, 1, or 2 alleles identical by descent (IBD) [46]. As specified by Myers et al., this LR is estimated under three conditions using allele frequency distributions from African American, European American, and Latino American population samples with no $\theta$-correction for population substructure, as practiced in California [30].

Y-haplotype likelihood ratio

Ignoring mutation, the probability that two Y-sharing relatives have the same haplotype of population frequency $p$ is $p$. On the other hand, the probability that two unrelated male individuals each have that same haplotype is $p^2$. So, the Y-haplotype likelihood ratio $LR_Y$ is $1/p$. In the Myers et al. procedure [30], $LR_Y$ was estimated as the inverse of the upper 95% confidence limit of the haplotype frequency, obtained using the data pooled across populations excluding the sampled haplotype [47, 30]. Specifically, after exclusion, if the Y-haplotype is observed with sample frequency $\hat{p}$ in the database,

$$\widehat{LR}_Y = \left[ \hat{p} + 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \right]^{-1}, \tag{2}$$

whereas if the Y-haplotype is not observed in the database,

$$\widehat{LR}_Y = \left[ 1 - 0.05^{1/n} \right]^{-1}, \tag{3}$$

where $n$ denotes the total number of Y-haplotypes in the database.

Combined result

The combined test statistic defined by Myers et al. [30] is the product of the autosomal marker and Y-haplotype LR estimates, divided by the database size ($N$):

$$X = \frac{\widehat{LR}_A \cdot \widehat{LR}_Y}{N}. \tag{4}$$

$X$ is calculated for each of the three population samples described above. In this study we consider a database of size $N = 1{,}824{,}085$, the size of the California state database as of January 2012 [48]. An investigative positive identification (called simply a positive identification here) is called when $X$ is greater than 0.1 under all three assumed population samples, and greater than 1.0 for at least one population sample [30].

Results

False positive rates of relative identification

Unrelated individuals were simulated based on allele frequency data from five population samples to investigate false positive rates of parent-offspring and sibling identification. Autosomal and Y-chromosome LRs were estimated using (1)-(3), and the combined test statistic $X$ defined in (4) was calculated for unrelated pairs of individuals simulated from all pairs of population samples. Using the procedure described by Myers et al. [30], false positive rates were estimated for parent-offspring (Table 1) and sibling (Table 2) identifications.

Even though false positive rates are low, on the order of $1 \times 10^{-5}$ to $1 \times 10^{-9}$, across population sample pairs, there is some variation (Tables 1 and 2). In particular, the false positive rates for unrelated pairs of individuals simulated with Vietnamese American and with Native American allele frequencies are relatively high and low, respectively (Tables 1 and 2).
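To make the quantities in (1)-(4) and the threshold rule concrete, the following Python sketch computes a single-locus kinship LR, the Y-haplotype LR estimate, the combined statistic, and the positive-identification call. It is a sketch under stated assumptions rather than the Myers et al. implementation: all function names are hypothetical, the single-locus LR uses the standard IBD decomposition with Hardy-Weinberg genotype probabilities and no $\theta$ correction, and loci are assumed independent so that the profile LR is a product over loci.

```python
import math


def autosomal_lr_locus(g1, g2, freqs, k):
    """Single-locus LR of Eq. (1) for unordered genotypes g1 and g2.

    k = (k0, k1, k2); freqs maps allele -> frequency (assumed, no theta).
    """
    k0, k1, k2 = k
    c, d = g2
    # Population probability of genotype g2 under Hardy-Weinberg proportions.
    p_g2 = freqs[c] ** 2 if c == d else 2 * freqs[c] * freqs[d]

    def transmit(x):
        # P(g2 | one allele is an IBD copy of x, the other is a random draw).
        if c == d:
            return freqs[c] if x == c else 0.0
        return (freqs[d] if x == c else 0.0) + (freqs[c] if x == d else 0.0)

    p_ibd1 = 0.5 * (transmit(g1[0]) + transmit(g1[1]))
    p_ibd2 = 1.0 if sorted(g1) == sorted(g2) else 0.0
    return k0 + k1 * p_ibd1 / p_g2 + k2 * p_ibd2 / p_g2


def autosomal_lr(profile1, profile2, freqs_by_locus, k):
    """Product of single-locus LRs, assuming independent loci."""
    lr = 1.0
    for locus, freqs in freqs_by_locus.items():
        lr *= autosomal_lr_locus(profile1[locus], profile2[locus], freqs, k)
    return lr


def y_lr(count, n):
    """Eqs. (2)-(3): inverse of the upper limit on the haplotype frequency."""
    if count == 0:
        return 1.0 / (1.0 - 0.05 ** (1.0 / n))
    p_hat = count / n
    return 1.0 / (p_hat + 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / n))


def combined_statistic(lr_a, lr_y, database_size):
    """Eq. (4): X = LR_A * LR_Y / N."""
    return lr_a * lr_y / database_size


def is_positive(x_values):
    """Positive call: X > 0.1 under all assumed samples and X > 1.0 for at least one."""
    return all(x > 0.1 for x in x_values) and any(x > 1.0 for x in x_values)
```

In this sketch `x_values` would hold the three values of $X$ obtained with the African American, European American, and Latino American frequency sets. With parent-offspring coefficients (0, 1, 0) the single-locus LR is zero whenever the two genotypes share no allele, whereas with sibling coefficients (0.25, 0.5, 0.25) it is 0.25 in the same situation.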
In sibling identification, the Vietnamese American sample shows a comparatively high false positive rate of $1.1 \times 10^{-5}$, while no false positives are observed in the Native American sample (Table 2). This can be explained by the particular Y-haplotype patterns considered for these population samples. False positive identifications were observed only when unrelated individuals coincidentally share a Y-haplotype. In the available Vietnamese American population sample of Y-haplotypes ($n = 103$), several pairs of individuals share Y-haplotypes, while in the Native American population sample ($n = 105$), no individuals share Y-haplotypes. In the other population samples, Y-haplotypes are shared at frequencies intermediate to those in the Vietnamese American and Native American population samples. Given the small sizes of these population samples, it is not clear whether varying rates of coincidental Y-haplotype sharing are due to population genetic differences or to the stochasticity of small samples.

To examine the validity of the total lack of observed false positive relative identifications for unrelated individuals simulated from the Native American population sample, we consider the possibility of observing complete Y-haplotype diversity (as observed) by chance. Using simulations, 100,000 subsamples of 105 (the Native American sample size) Y-haplotypes were randomly chosen from the larger African American, European American, and Latino American samples. Fractions of 0.67, 0.57, and 0.37 of the subsamples drawn from the African American, European American, and Latino American samples, respectively, consisted entirely of unique haplotypes, as observed in the Native American sample.
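The subsampling check described above can be sketched as follows. This is a hypothetical illustration: the function name is invented, the haplotype list would come from a reference sample that is not reproduced here, and sampling without replacement is an assumption, since the text does not state how the subsamples were drawn.

```python
import random


def fraction_all_unique(haplotypes, subsample_size, n_subsamples, rng):
    """Estimate how often a random subsample contains no repeated haplotype.

    haplotypes: one hashable entry (e.g. a tuple of repeat counts) per
    individual in the larger reference sample, duplicates included.
    """
    hits = 0
    for _ in range(n_subsamples):
        # Assumption: individuals are drawn without replacement.
        subsample = rng.sample(haplotypes, subsample_size)
        if len(set(subsample)) == subsample_size:
            hits += 1
    return hits / n_subsamples
```

Calling `fraction_all_unique(sample, 105, 100_000, random.Random(1))` on each of the three larger samples would mirror the design described in the text, although the resulting fractions depend on the actual reference data.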
These fractions indicate that a small sample from a group with the intermediate degree of Y-haplotype diversity observed in these larger population samples could plausibly consist entirely of unique Y-haplotypes by chance. Larger Y-haplotype samples are required to confidently estimate false positive rates between unrelated individuals across population samples.

False positives in the database context

Our results agree with previous work, showing that with the prescribed methodology, false positive rates of parent-offspring and sibling identification are low, on the order of $1 \times 10^{-5}$ to $1 \times 10^{-9}$ (Tables 1 and 2) [30]. But even with these low false positive rates, differences were observed between population samples, raising the question of how these differences in false positive rates interact with distortions in DNA database representation.

To investigate this question, California census and prison population proportions of Asian, African American, European American, Latino American, and Native American individuals were normalized to fit the assumption that all individuals are described by exactly one of these categories (Table S1 in File S1) [49, 50]. In combining census and population genetic data, groups labeled as ‘Vietnamese’ and ‘Asian’ were equated to each other. Clearly, these simplifications limit the applicability of the population sample-specificity of this analysis; however, it provides a first approach.

Using each of the census and prison demographics, the proportions of false positive parent-offspring and sibling identifications that involve at least one member of each population group were estimated (Tables S2 and S3 in File S1). As expected, in the demographic context of a prison system in which African Americans are drastically over-represented (Table S1 in File S1, exact binomial test $p < 2.2 \times 10^{-16}$), the rate of false identification of individuals in this group is much higher, by roughly two orders of magnitude (Tables S2 and S3 in File S1). Nevertheless, the overall rate of false identification of unrelated individuals remains low.

Spurious identification of distant relatives

The simulations of unrelated individuals showed low false positive rates of parent-offspring and sibling identification. However, distant Y-sharing relatives may be more often mistaken for parent-offspring pairs or siblings. To investigate this, individuals with various Y-sharing relationships (parent-offspring, siblings, half-siblings, cousins, half-cousins, and second cousins) from population sample backgrounds were simulated and used in the same relative identification procedure. Note that when considering Y-sharing relatives, the $\widehat{LR}_Y$ calculation is greatly influenced by the database size, as opposed to the Y-haplotype reference frequency.

The observed distributions of the test statistic $X$ for second-degree and distant relatives are shifted left of those for first-degree relatives, but still have significant mass greater than 1 (Figure 1). So as relatedness decreases, $X$ more effectively distinguishes first-degree from distant Y-sharing relatives. Concordant with a previous study [51], distinguishability is also higher with appropriately-specified allele frequencies in population samples with higher polymorphism at the markers considered. By considering these distributions, it is clear that regardless of the exact decision procedure, distant Y-sharing relatives show elevated $X$ values.

Positive rates vary across true relationships, population samples, and tests of parent-offspring versus sibling relationships (Figure 2, Tables 3 and 4).
The power of the parent-offspring test varies from 0.94 to 0.99 and that of the sibling test varies from 0.68 to 0.85 across population samples. Of course, a different threshold procedure could raise the power of these tests, but would simultaneously raise the false positive rates. Regardless of the particular threshold procedure, the relative trends observed across true relationships, population samples, and tests of parent-offspring versus sibling relationships will hold for LR-based methods.

When implementing the full Myers et al. procedure to call putative relatives, Y-sharing relatives are frequently mistakenly identified as parent-offspring pairs or siblings (Table 4). Second-degree Y-sharing relatives like half-siblings are called as siblings in 5-24% of simulations (Table 4). The frequency of relative identification decreases as relationships become more distant (equivalently, as kinship coefficients decrease), but even Y-sharing half-cousins are called as siblings in 1-10% of simulations, depending on the population sample (Table 4).

Positive identification between distant Y-sharing relatives occurs more often when considering sibling relationships rather than parent-offspring relationships because of the less stringent allele sharing requirements. For example, Y-sharing half-siblings are called as siblings in 5-24% of simulations and called as parent-offspring in 4-10% of simulations (Tables 3 and 4). Sharing at least one allele at each locus, as required for parent-offspring relationships, is less likely by chance than sharing on average one allele at each locus, as expected for sibling relationships.

Higher rates of positive identification are observed for individuals simulated with Native American or Vietnamese American allele frequencies (Figure 2, Tables 3 and 4).
This is likely due to allele frequency misspecification inherent in the method, which calculates the test statistic $X$ under African American, European American, and Latino American allele frequencies only, and due to varying population sample gene diversity, as found in a study of autosomal loci [51]. For relatives simulated from African American, European American, or Latino American population samples, the method correctly specifies their allele frequencies, so they show comparatively lower identification rates (Figures 1 and 2, Tables 3 and 4).

To show that these differences in identification rates across population samples are not driven by differing sample sizes, the same rates were estimated with a reduced Y-haplotype reference of 103 haplotypes per population sample. Again, we see the same trends across population samples, confirming that they are not caused by varying reference Y-haplotype sample sizes (Tables S4 and S5 in File S1). Note that the absolute false identification rates differ between the full and subsample analyses because the estimated Y-haplotype frequency is a function of the pooled sample size.

Discussion

We have investigated by computer simulation the consequences of using a familial search policy similar to that described by Myers et al. [30], which is the policy currently used by the state of California for conducting familial searches. Our simulations assumed that allele frequencies at the 13 CODIS loci and the Y haplotypes for five ethnic groups are as given in Budowle et al. and the ABI reference database [33, 43]. We reach three main conclusions. First, if the profile of a first-degree relative of a randomly generated profile is in the database searched, there is a relatively high probability of identifying the relative as such. Thus we agree with Myers et al. [30], Bieber et al.
[1], and Curran and Buckleton [3] that familial searching can be an effective way to identify first-degree Y-sharing relatives of an individual who left a crime scene sample. However, note that the simulation study of Bieber et al. [1] suggests higher identification efficiency than observed in an empirical study by Curran and Buckleton [3], possibly due to population structure in the empirical dataset [51]. Slooten and Meester [52] have also shown that there may be high variability in power to identify relatives when considering profiles of varying rareness in specific databases.

Second, we found that the probability of identifying an unrelated Y-chromosome-carrying individual as a first-degree relative is quite low, agreeing with the results of Myers et al. [30]. However, our ability to obtain precise estimates of this probability for different ethnic groups is limited by the relatively small sample sizes available to estimate Y haplotype frequencies, especially for the Vietnamese American and Native American samples. For population samples other than those, the probabilities are so low that we could reasonably expect at most one unrelated individual would be incorrectly identified as a first-degree relative even in a database as large as California’s, which is approaching 2 million profiles. The high false positive rate in the Vietnamese American population sample is subject to sampling error with the relatively low number of Y-haplotypes for this group (103 haplotypes), so we hesitate to put great confidence behind that particular rate.

Our third conclusion is that there is a previously unrecognized risk from conducting familial searches created by the possibility that a more distant relative whose profile is in a database will be incorrectly identified as a first-degree relative of the person who left the crime-scene sample.
With the data considered here (13 autosomal loci and 17-locus Y-haplotypes), even with other decision procedures, distinguishability of first-degree and distant genetic relatives may be limited (Figure 1). This is especially troubling when contemplating the possibility that familial searches may be conducted in the national database, which contains over ten million profiles. Widening the geographic scope of a search is likely to result in more of the source’s distant relatives having a presence in the database.

To be clear, our concerns arise only with respect to inadvertent erroneous identification of distant relatives as first-degree leads. Familial searches are ineffective if secondary relatives are intentionally sought. Indeed, the Myers protocol targets first-degree relationships only because actively seeking more remote connections ordinarily returns too many leads to investigate. Yet familial searches also cannot be configured to assure that only first-degree relatives of the crime scene sample source are identified as leads. As our and other research has shown, the tailored approach of the Myers et al. protocol has the advantage of returning few spurious leads: if a lead is generated, it is almost certainly a relative of the crime scene sample source. Our findings, however, suggest that the closeness of the lead to the source is an open question. Significantly, our research does not reveal the percentage of cases in which a lead returned will be a distant relative, as opposed to a first degree relative. Such an estimation requires a different set of simulations including complex demographic estimates.

In our simulations, we set the coancestry coefficient θ = 0.01, which aligns with the less conservative parameter value suggested for direct identification [44]. The currently implemented familial searching methodology in California assumes θ = 0.
This discrepancy contributes to elevated rates of positive identifications observed between both unrelated individuals and distant Y-sharing relatives. In addition, our simulation parameter value θ = 0.01 may be an underestimate for some population samples [44]. For these cases, we have underestimated the amount of coincidental relatedness, and thus also the estimated power and false positive rate. This is particularly relevant for some population samples with higher θ, including some Native American groups.

In our analysis, we estimate the Y-haplotype frequency upper 95% confidence limit asymptotically, rather than exactly, as indicated in the Myers et al. method. This estimate may be sufficient, but has greater error than the exact confidence limit. For very low Y-haplotype frequencies, the asymptotic estimate may be lower than the true confidence limit, which would lead to inflated (anti-conservative) LR_Y. A study of the effect of different confidence limit estimates on final outcomes would inform method choice.

In this study, we have considered only complete genotypes with no errors or allelic dropout. It is not clear how allelic dropout would affect familial searching results, but this must be explored before considering extension to low-template samples.

The probabilities we estimated with our simulations are necessarily approximate. Autosomal allele and Y-haplotype frequencies for various population samples are poorly known because publicly available databases are of limited size and are unavailable for many population groups. Nevertheless, the groups for which we have data include African Americans, who have relatively high genetic diversity at the considered loci, and Native Americans, who have relatively low diversity, which suggests that our results are applicable to other populations for which data are unavailable.
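The contrast between an asymptotic and an exact upper confidence limit, raised above for Y-haplotype frequencies, can be illustrated with a small sketch. This is not the Myers et al. implementation. It assumes a one-sided 95% limit on a binomial proportion k/n, where k is the number of times a haplotype is observed among n reference haplotypes. The function names, the use of a normal-approximation (Wald) limit for the asymptotic case, and the use of a Clopper-Pearson limit for the exact case are illustrative assumptions.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); intended for small counts k."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def upper_limit_asymptotic(k: int, n: int, z: float = 1.645) -> float:
    """Normal-approximation one-sided upper limit (z = 1.645 for 95%).

    Assumption: the point estimate is the raw proportion k / n.
    """
    p_hat = k / n
    return min(1.0, p_hat + z * math.sqrt(p_hat * (1.0 - p_hat) / n))


def upper_limit_exact(k: int, n: int, alpha: float = 0.05) -> float:
    """Exact (Clopper-Pearson) one-sided upper limit.

    Finds the p that solves P(X <= k | n, p) = alpha by bisection.
    """
    if k >= n:
        return 1.0
    lo, hi = k / n, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        # The CDF at k decreases as p grows, so move toward larger p
        # while the tail probability is still above alpha.
        if binom_cdf(k, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    n = 103  # size of the smallest Y-haplotype reference sample in the text
    for k in (0, 1, 5):
        print(k, n, upper_limit_asymptotic(k, n), upper_limit_exact(k, n))
```

Under these assumptions, the normal-approximation limit falls below the exact limit for rare haplotypes (for example, a haplotype seen once among 103), which is the direction of error described in the text.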
A difference between our analysis and the implemented Myers et al. method is the one- or two-stage design. In the Myers et al. method, an analysis is first performed using only autosomal data, then the top 168 matches are genotyped for Y-haplotype, and the cumulative statistic X is computed only for these samples [30]. In our analysis we simply computed the cumulative X for all samples considered. An additional study of positive identification of distant relatives using the two-stage method in the context of a realistic database would provide more realistic rate estimates; however, this sort of analysis is hindered by lack of access to forensic databases [53]. Such a study is unlikely to show substantially different results than those presented here, since the pairs of individuals we positively identify as first degree relatives are likely to appear related and rank above the 168 person threshold.

We also note that in this analysis we only consider the familial searching method of Myers et al. To our knowledge, at the writing of this manuscript, the Myers et al. method is the only explicit protocol available and the current standard in the field [54, 55]. Although the absolute rates of identification will change according to the method used, when considering LR-based approaches, which have been shown to be more effective than allele-sharing methods [56], the trends we observed across population samples and close and distant relatives will hold.

Implications of spurious identification of distant relatives

Our findings confirm that familial searches carried out according to the Myers protocol do a good job of locating a relative if one is in the database. They also affirm that a search is unlikely to return a false lead: in other words, a match that appeared to be related to the crime scene source, but in fact was not.
However, we have shown that if there is a more distant relative in the database, that person may have up to a 42% chance of being returned as a lead and erroneously labeled as a first degree relative of the crime scene source (Table 4).

The possibility that the lead is a more remote relative of the source might not be a concern if investigators could easily ascertain what kind of lead they had been given. But the Myers protocol can do no more than alert investigators that the source may be a relative of the individual in the database; it does not tell investigators which relative or the closeness or kind of relation. In any case, once a search returns a lead, law enforcement must undertake further investigation to locate the actual source. It is the scope and impact of the follow-up investigation that, in light of our results, may be troubling.

Before our research identified the possibility that a familial search might identify distant relatives and erroneously label them as first degree relatives of the source, it may be that law enforcement simply assumed that all leads were to a first degree relative, because that is what the search is structured to find. Accordingly, if further investigation did not identify a source from among the lead’s first degree relatives, then officers likely assumed that the problem was the lead, rather than the depth of their investigation. In light of our results, however, law enforcement may now recognize that a lead that fails to reveal a source among first degree relatives may still be a good lead; it is only that the investigation must extend to more remote branches of the family tree.

To illustrate, suppose that law enforcement conducts a familial search to find a burglar. Following the Myers protocol, the search returns a lead to the profile of K, a known offender in the database.
Conventional wisdom holds that the burglar is likely a brother or the father of K, and so law enforcement officers initiate their investigation accordingly. They ascertain the identity of K’s father and any brothers, and check their ages and criminal records. They determine whether the father or brothers were in the area of the burglary at the time it occurred, used cell phones or credit cards around that area, or otherwise engaged in suspicious behavior. Ultimately, they might surreptitiously attempt to obtain DNA samples for testing from members of K’s immediate family, say by posing as restaurant personnel or collecting a half-eaten lunch. In some number of cases, one of those immediate family members will match, and the burglar will be found.

But if no match is made, then investigators aware of our research may conclude three things: that the familial search was almost certainly effective, that the probability that the lead was a bad lead is low, and that leads that do not initially pan out are likely to have faltered only because the source is a more distant relative than investigators presumed. In other words, the source is not a brother or father, but instead is a cousin, second cousin, uncle, half-sibling, or even half-cousin.

At that point the officers have two choices. They may limit themselves to the follow-up they have already conducted with the first degree relatives and simply stop their investigation or, more likely, they may widen the scope of their investigation and start pursuing all second-degree relatives of the lead.

Our research thus suggests two unanticipated likely outcomes of familial search policies. First, investigations may wrongly target the immediate families of known offenders, because officers mistakenly believe that their lead is a first-degree relative.
Second, investigations may ultimately probe far more deeply than initially imagined, because once officers are convinced that the source cannot be found among first degree relatives, they will widen their net of investigation to include more distant relations. Both of these consequences exacerbate the numerous ethical problems presented by familial searching.

First, familial searches will affect a greater number of persons. There is no way for investigators to know from the start that a lead is a distant, rather than immediate, relative of the source. Thus suspicion may no longer be restricted to a father and small number of siblings, one of whom is likely to be the crime scene sample source, but instead will fall upon innocent immediate family members and a much larger number of second-degree relatives. The greater the number of persons involved, and the less likely that one of them is in fact the perpetrator, the more such investigations may begin to feel like a fishing expedition rather than a reasonable search. This is particularly true given that any investigated family member is, by design, a member of the family whose DNA is not already in the database as a result of wrongdoing.

Second, follow-up investigations may prove more intrusive and yet less effective. Identification of more distant relatives requires more complicated investigation than does determining a lead’s immediate family members. For instance, the known offender will likely have provided information about immediate relatives in the course of the criminal case that is readily available, such as in a bail report, corrections dossier, or probation file. But such sources are much less likely to contain information about secondary relatives, and thus simply composing the list of potential suspects could require more aggressive investigation.
Moreover, the difficulty in accurately mapping more distant familial relations might lower the already low success rate of familial searches. Although a lead may in fact be a relative, it may simply be too difficult to locate the actual source if that person is a half-cousin or other distant relation.

Third, widening the pool intensifies the threat that familial searching poses to our understandings of families as constructions of social, not biological, realities. A person may have hundreds of “cousins” but only a handful of biological cousins. Investigators may either ignore the difference and unnecessarily investigate those non-biological relations, or else engage in potentially intrusive questioning or activity (such as DNA sampling) to differentiate between proffered and actual relations. Probing secondary biological relationships might also dredge up painful family experiences of death, unknown biological ties, or previous partners. And, to the extent that some advocates of familial searching have justified the practice on grounds akin to “crime runs in families,” such arguments may be less defensible when more remote connections are involved.

Finally, to the extent that our findings suggest that familial searches may in fact necessitate investigating a greater number of people with a greater degree of intrusiveness, that consequence is particularly troubling in that it will be specially visited on certain racial groups. It has been well documented that familial searching is apt to disproportionately affect African American families, due to the greater representation of those groups in DNA databases and the high rate of intra-racial procreation. Limiting investigations to the immediate family members of known offenders at least minimizes the intrusion on innocent relatives within those racial groups.
But if more distant relations are included, the web of potential “genetic suspects” becomes still broader, and may effectively encompass entire communities. It takes only one member of a large and varied family tree to render every father, brother, half-brother, cousin, half-cousin, uncle, nephew and so on vulnerable to scrutiny and surreptitious sampling by law enforcement officers.

Of course, it is always possible to limit, for practical or ethical reasons, the range of permissible follow-up investigation to first degree relatives in familial search cases as a matter of policy. Such an approach might be sensible from a practical perspective in light of the difficulty in identifying and investigating more remote relatives, and the heightened ethical concerns. It would also ensure that any spurious leads (of which, granted, there are expected to be few) would not first generate highly invasive and costly investigations. Whatever the case, our research suggests that as states and localities debate the virtues of familial searching and craft policies to govern law enforcement, it would be wise to consider terms delimiting the scope of potential follow-up investigation with regard to degree of relatedness.

Acknowledgments

We are immensely grateful to the individuals whose DNA samples were used in this study, without which none of this work would be possible. We thank Kirk Lohmueller for his valuable discussions on these topics.

Supporting Information Legends

File S1, Tables S1-5

References

[1] F. Bieber, C. Brenner, D. Lazer, Finding criminals through DNA of their relatives, Science 312 (2006) 1315–1316.
[2] H. Greely, D. Riordan, N. Garrison, J. Mountain, Family ties: The use of DNA offender databases to catch offenders’ kin, Journal of Law, Medicine, and Ethics 34 (2006) 248–262.
[3] J. Curran, J.
Buckleton, Effectiveness of familial searches, Science and Justice 48 (2008) 164–167.
[4] R. Williams, P. Johnson, Inclusiveness, effectiveness and intrusiveness: Issues in the developing uses of DNA profiling in support of criminal investigations, The Journal of Law, Medicine, and Ethics 33 (2005) 545–558.
[5] E. Murphy, Relative doubt: Familial searches of DNA databases, Michigan Law Review 109 (2011) 291–349.
[6] C. Gershaw, A. Schweighardt, L. Rourke, M. Wallace, Forensic utilization of familial searches in DNA databases, Forensic Science International: Genetics 5 (2011) 16–20.
[7] K. O’Connor, E. Butts, C. Hill, J. Butler, P. Vallone, Evaluating the effect of additional forensic loci on likelihood ratio values for complex kinship analysis, in: 21st International Symposium on Human Identification, Familial Search Workshop, 2010, citing Chris Maguire, formerly of the Forensic Science Service.
[8] S. Rushton, Familial searching and predictive DNA testing for forensic purposes: A review of laws and practices, July 2010. URL http://dnaproject.co.za/new_dna/wp-content/uploads/2011/03/Report-Familial-Searching-and-
[9] Netherlands Ministry of Security and Justice, Senate agrees to DNA relationship test, 22 Nov 2011. URL http://www.government.nl/documents-and-publications/press-releases/2011/11/23/senate-agre
[10] Netherlands Ministry of Security and Justice, Overview of acts that will enter into effect on 1 April 2012, 3 April 2012. URL http://www.government.nl/documents-and-publications/press-releases/2012/04/02/overview-
[11] Statements of Lisa Campbell & Constable Derek Egan, before Standing Committee on Public Safety and National (Discussing inability to make familial matches, pressure to do so, and privacy-related concerns against) Canadian investigators did solve one case by evaluating two samples offered voluntarily and determining that a relative was the likely perpetrator. There have been some recent efforts to authorize familial searches in Canada. URL http://www2.parl.gc.ca/HousePublications/Publication.aspx?DocId=3702024&Language=E&Mod
[12] S. & Marper v. United Kingdom, Eur. Ct. H.R. 1581 (2008). URL http://www.bailii.org/eu/cases/ECHR/2008/1581.html
[13] N. Ram, Fortuity and forensic familial identification, Stanford Law Review 63 (2011) 751.
[14] Interim plan for release of information in the event of a ‘partial match’ at NDIS (July 20, 2006).
[15] H.R. 3361, Utilizing DNA Technology to Solve Cold Cases Act of 2011.
[16] S.Ct., 2013 WL 2371466, No. 12-207 (June 3, 2013).
[17] Md. Code Ann., Pub. Safety s 2-506(d) (West 2010).
[18] D.C. Code s 22-4151 (2010).
[19] Council for Responsible Genetics, State rules on partial/familial searching, although the map does not show New York in the inadvertent match category, the state has recently authorized only that form of reporting. URL http://www.councilforresponsiblegenetics.org/dnadata/usa/usa2.html
[20] California Department of Justice, Division of Law Enforcement, Information bulletin no. 2008-BFS-01, DNA partial match (crime scene DNA profile to offender) Policy (2008).
[21] J. Rosen, Genetic surveillance for all, Slate.
[22] E. Nakashima, S. Hsu, U.S. to expand collection of crime suspects’ DNA: Policy adds people arrested but not convicted, Washington Post.
[23] J.
Butler, Advanced Topics in Forensic DNA Typing: Methodology, Academic Press, 2012.
[24] K. Konzak, Familial searching: A practical & effective approach: California’s experience, in: 22nd International Symposium on Human Identification, Familial Search Workshop, 2011.
[25] T. Hicks, F. Taroni, J. Curran, J. Buckleton, V. Castella, O. Ribaux, Use of DNA profiles for investigation using a simulated national DNA database: Part II. Statistical and ethical considerations on familial searching, Forensic Science International: Genetics 4 (2010) 316–322.
[26] N. Garrison, R. Rohlfs, S. Fullerton, Forensic familial searching: Scientific and social implications, Nature Reviews Genetics 14, in press.
[27] T. Reid, M. Baird, J. Reid, S. Lee, R. Lee, Use of sibling pairs to determine the familial searching efficiency of forensic databases, Forensic Science International: Genetics 2 (2008) 340–342.
[28] D. Grimm, The demographics of genetic surveillance: Familial DNA testing and the Hispanic community, Columbia Law Review 107 (2007) 1164–1194.
[29] 60 Minutes, A not so perfect match (March 2007). URL http://www.cbsnews.com/stories/2007/03/23/60minutes/main2600721.shtml
[30] S. Myers, M. Timken, M. Piucci, G. Sims, M. Greenwald, J. Weigand, K. Konzak, M. Buoncristiani, Searching for first-degree familial relationships in California’s offender DNA database: Validation of a likelihood ratio-based approach, Forensic Science International: Genetics 5 (2011) 493–500.
[31] E. Steinberger, G. Sims, Finding criminals through the DNA of their relatives – Familial searching of the California offender DNA database, Prosecutor’s Brief 31 28.
[32] H. West, W. Sabol, Prisoners in 2007, Tech. Rep. 3, Bureau of Justice Statistics (2008).
[33] B. Budowle, T. R. Moretti, Examples of STR population databases for CODIS and casework, 9th International Symposium on Human Identification 1 (1998) 64–73.
[34] R. Dalton, Tribe blasts ‘exploitation’ of blood samples, Nature 420 (2002) 111.
[35] D. Wiwchar, Nuu-chah-nulth blood returns to west coast, Ha-Shilth-Sa.
[36] M. Mello, L. Wolf, The Havasupai Indian tribe case – lessons for research involving stored biologic samples, The New England Journal of Medicine 363 (2010) 204–207.
[37] Asociación ANDES, Genographic project hunts the last of the Incas, ANDES Communiqué.
[38] L. Arbour, D. Cook, DNA on loan: Issues to consider when carrying out genetic research with Aboriginal families and communities, Community Genetics 9 (2006) 153–160.
[39] S. Goering, S. Holland, K. Fryer-Edwards, Transforming genetic research practices with marginalized communities: A case for responsive justice, Hastings Center Report 38 (2008) 43–53.
[40] J. Anderson, Commentary on implications of the Genographic Project, International Journal of Cultural Property 16 (2009) 213–217.
[41] J. Kaye, C. Heeney, N. Hawkins, J. de Vries, P. Boddington, Data sharing in genomics – re-shaping scientific practice, Nature Reviews Genetics 10 (2009) 331–335.
[42] R. McInnes, 2010 presidential address: Culture: The silent language geneticists must learn – genetic research with Indigenous populations, American Journal of Human Genetics 88 (2011) 254–261.
[43] Applied Biosystems, YFiler haplotype database. URL http://www6.appliedbiosystems.com/yfilerdatabase/
[44] National Research Council: Committee on DNA forensic science, The evaluation of forensic DNA evidence, National Academy Press, 1996.
[45] J. Buckleton, C. Triggs, Forensic DNA Evidence Interpretation, CRC Press, 2005, Ch. Relatedness.
[46] B. Weir, A. Anderson, A. Hepler, Genetic relatedness analysis: modern data and new challenges, Nature Reviews Genetics 7 (2006) 771–780.
[47] Scientific Working Group on DNA Analysis Methods (SWGDAM), Y-chromosome short tandem repeat (Y-STR) interpretation guidelines, Forensic Science Communications 11 (2009) 1–5.
[48] FBI, CODIS-NDIS statistics (January 2012). URL http://www.fbi.gov/about-us/lab/codis/ndis-statistics
[49] California Department of Corrections and Rehabilitation, Total institution population, offenders by ethnicity and gender, Prison Census Data.
[50] United States Census Bureau, State and county QuickFacts. URL http://quickfacts.census.gov/qfd/states/06000.html
[51] R. Rohlfs, S. Fullerton, B. Weir, Familial identification: Population structure and relationship distinguishability, PLoS Genetics 8 (2012) e1002469.
[52] K. Slooten, R. Meester, Forensic identification: Database likelihood ratios and familial DNA searching, arXiv 1201 (2012) 4261v3.
[53] D. Krane, V. Bahn, D. Balding, B. Barlow, H. Cash, B. Desportes, P. D’Eustachio, K. Devlin, T. Doom, I. Dror, S. Ford, C. Funk, J. Gilder, G. Hampikian, K. Inman, A. Jamieson, P. Kent, R. Koppl, I. Kornfield, S. Krimsky, J. Mnookin, L. Mueller, E. Murphy, D. Paoletti, D. Petrov, M. Raymer, D. Risinger, A. Roth, N. Rudin, W. Shields, J. Siegel, M. Slatkin, Y. Song, T. Speed, C. Spiegelman, P. Sullivan, A. Swienton, T. Tarpey, W. Thompson, E. Ungvarsky, S. Zabell, Time for DNA disclosure, Science 326 (2009) 1631–1632.
[54] K. O’Connor, Introduction to familial searching (2011). URL www.cstl.nist.gov/strbase/pub_pres/OConnor_Promega2011.pdf
[55] Global Privacy and Information Quality Working Group (QPIQWG), An introduction to familial DNA searching for state, local, and tribal justice agencies (2011). URL http://www.it.ojp.gov/docdownloader.aspx?ddid=1698
[56] D. Balding, M. Krawczak, J. Buckleton, J.
Curran, Decision-making in familial database searching: KI alone or not alone?, Forensic Science International: Genetics 7 (2012) 52–54.

Figure Legends

Figure 1: Distributions of the test statistic X, defined in (4), for sibling test for individuals who are siblings (solid red), parent-offsprings (solid black), half-sibs (dashed black), cousins (dashdot black), and second cousins (dotted black). The population sample individuals are sampled from is along the top and the assumed pop sample is along the side.

Figure 2: Positive identification rates across different true relationships of individuals simulated from different sample populations Vietnamese American (red circles), African American (orange triangles), European American (purple pluses), Latino American (blue exes), and Native American (green diamonds); left plot is for sibling test, right for parent test.

Tables

Table 1: False positive parent-offspring identification rates between pairs of unrelated individuals simulated from all pairs of population samples.

                      Vietnamese      African         European        Latino          Native
                      American        American        American        American        American
Vietnamese American   8.2 × 10^-7     5.0 × 10^-9     1.5 × 10^-8     < 5.0 × 10^-9   < 5.0 × 10^-9
African American                      6.5 × 10^-8     < 5.0 × 10^-9   < 5.0 × 10^-9   < 5.0 × 10^-9
European American                                     1.5 × 10^-7     1.5 × 10^-8     1.0 × 10^-8
Latino American                                                       1.3 × 10^-7     5.0 × 10^-9
Native American                                                                       < 5.0 × 10^-9

Table 2: False positive sibling identification rates between pairs of unrelated individuals simulated from all pairs of population samples.

                      Vietnamese      African         European        Latino          Native
                      American        American        American        American        American
Vietnamese American   1.1 × 10^-5     < 5.0 × 10^-9   1.0 × 10^-8     5.0 × 10^-9     < 5.0 × 10^-9
African American                      1.7 × 10^-7     < 5.0 × 10^-9   5.0 × 10^-9     1.0 × 10^-8
European American                                     1.7 × 10^-7     1.5 × 10^-8     4.0 × 10^-8
Latino American                                                       4.3 × 10^-7     2.0 × 10^-8
Native American                                                                       < 5.0 × 10^-9

Table 3: Parent-offspring test identification rates for different Y-sharing relatives and population samples.

                   Vietnamese   African    European   Latino     Native
                   American     American   American   American   American
parent-offspring   0.997458     0.989336   0.987365   0.988130   0.998809
sibling            0.263659     0.244048   0.255853   0.248439   0.348373
half-sib           0.056135     0.045746   0.050528   0.047072   0.105056
cousin             0.009311     0.006451   0.007765   0.007039   0.027139
half-cousin        0.003337     0.002019   0.002506   0.002095   0.012716
second cousin      0.001971     0.000997   0.001423   0.001145   0.008580

Table 4: Sibling test identification rates for different Y-sharing relatives and population samples.

                   Vietnamese   African    European   Latino     Native
                   American     American   American   American   American
sibling            0.891566     0.819365   0.793025   0.798273   0.925786
parent-offspring   0.907037     0.813399   0.767383   0.777360   0.942995
half-sib           0.303888     0.163525   0.138446   0.140161   0.423558
cousin             0.099529     0.033460   0.028376   0.027985   0.181687
half-cousin        0.044457     0.010445   0.009258   0.008978   0.100643
second cousin      0.027139     0.004934   0.004582   0.004332   0.070761

[Figure 2 image: two panels titled "parent-offspring test" and "sibling test"; x-axis: true relationship (second cousins, half-cousins, cousins, half-siblings, siblings, parent-offspring); y-axis: positive identification rate (0.0 to 1.0); series: Vietnamese Am., African Am., European Am., Latino Am., Native Am.]

[Figure 1 image: grid of density plots; x-axis: log(LR) (−10 to 20); y-axis: density; panel columns: Vietnamese Am., African Am., European Am., Latino Am., Native Am.; panel rows: assumed population sample]
Supporting Information File S1

The influence of relatives on the efficiency and error rate of familial searching

Rori V. Rohlfs a, Erin Murphy b, Yun S. Song c,d, Montgomery Slatkin a
a Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
b School of Law, New York University, New York, NY 10012, USA
c Computer Science Division, University of California, Berkeley, CA 94720, USA
d Department of Statistics, University of California, Berkeley, CA 94720, USA

Supporting Tables

Table S1: Normalized California census and prison population frequencies.

simulated population   Asian     African    European   Latino     Native
                                 American   American   American   American
Census population      .133      .0633      .410       .384       .0102
Prison population      .00626    .303       .268       .413       .00940

Note: We make use of these normative racial categorical constructs to estimate relevant population-specific identification rates. However, it’s clear that membership in these categories is not mutually exclusive. By nature, assumptions must be made in the collection and tabulation of this sort of data. Further, as with all collected social data, some groups may be under- or over-represented in data collection, and categorical results are subject to biases of reporting method (self-identified, inferred, etc.) (Spade and Rohlfs, in review).

Table S2: Projected percent of false parent-offspring leads which involve at least one individual from each population sample.

simulated population   total FPR   Vietnamese   African    European   Latino     Native
                                   American     American   American   American   American
Census population      5.73e-08    28.3         0.560      38.5       38.59      0.114
Prison population      3.84e-08    0.182        16.3       25.3       62.9       0.122

Table S3: Estimated percent of false sibling identifications which involve at least one individual from each population sample.

simulated population   total FPR   Vietnamese   African    European   Latino     Native
                                   American     American   American   American   American
Census population      3.01e-07    67.9         0.266      10.6       22.3       0.0849
Prison population      1.06e-07    0.459        15.2       13.5       73.1       0.199

Table S4: Parent-offspring test identification rates for different Y-sharing relatives and population samples where the set of reference Y-haplotypes was down-sampled to 103 haplotypes per population sample. The similarity of these results to those with the full set of Y-haplotypes indicates that the varying reference sample sizes are not driving population-based differences in identification rates.

                   Vietnamese   African    European   Latino     Native
                   American     American   American   American   American
parent-offspring   0.971109     0.917481   0.900600   0.904750   0.981429
sibling            0.260568     0.238175   0.247218   0.240125   0.346432
half-sib           0.052082     0.037330   0.039416   0.037430   0.099234
cousin             0.008184     0.004503   0.005204   0.004543   0.024529
half-cousin        0.002859     0.001183   0.001447   0.001275   0.010976
second cousin      0.001570     0.000522   0.000741   0.000616   0.007245

Table S5: Sibling test identification rates for different Y-sharing relatives and population samples where the set of reference Y-haplotypes was down-sampled to 103 haplotypes per population sample. The similarity of these results to those with the full set of Y-haplotypes indicates that the varying reference sample sizes are not driving population-based differences in identification rates.

                   Vietnamese   African    European   Latino     Native
                   American     American   American   American   American
sibling            0.786012     0.677311   0.636255   0.645179   0.831449
parent-offspring   0.750433     0.595312   0.521906   0.536782   0.815689
half-sib           0.166490     0.070316   0.055062   0.056729   0.240527
cousin             0.043701     0.010769   0.008384   0.008458   0.080298
half-cousin        0.016970     0.002642   0.002165   0.002175   0.038323
second cousin      0.009340     0.001130   0.000979   0.000928   0.024955
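As a rough worked example, the total false positive rates in Tables S2 and S3 can be scaled to a database of about 2 million profiles, the size the Discussion gives for California's database. The sketch assumes that every database profile is unrelated to the source and is compared independently at the same total rate; that simplification, the function names, and the dictionary layout are illustrative assumptions rather than part of the study's simulations.

```python
import math


def expected_false_leads(per_comparison_fpr: float, database_size: int) -> float:
    """Expected number of unrelated profiles flagged as first-degree relatives."""
    return per_comparison_fpr * database_size


def prob_at_least_one(per_comparison_fpr: float, database_size: int) -> float:
    """Probability that at least one unrelated profile is flagged.

    Computes 1 - (1 - fpr)^N via log1p/expm1 for accuracy with tiny rates.
    """
    return -math.expm1(database_size * math.log1p(-per_comparison_fpr))


# "total FPR" values copied from Tables S2 (parent-offspring) and S3 (sibling).
TOTAL_FPR = {
    ("parent-offspring", "census"): 5.73e-08,
    ("parent-offspring", "prison"): 3.84e-08,
    ("sibling", "census"): 3.01e-07,
    ("sibling", "prison"): 1.06e-07,
}

# Assumption: round figure for a database "approaching 2 million profiles".
DATABASE_SIZE = 2_000_000

if __name__ == "__main__":
    for (test, weighting), fpr in TOTAL_FPR.items():
        print(
            test,
            weighting,
            expected_false_leads(fpr, DATABASE_SIZE),
            prob_at_least_one(fpr, DATABASE_SIZE),
        )
```

Under these assumptions, each of the four expected counts is below one, in line with the statement in the Discussion that at most one unrelated individual would be expected to be misidentified.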
