Bayesian Nonparametrics for Gene-Gene and Gene-Environment Interactions in Case-Control Studies: A Synthesis and Extension

Ba y esian Nonparametrics for Gene-Gene and Gene-En vironmen t In teractions in Case-Con trol Studies: A Syn thesis and Extension Durba Bhattac harya 1 and Sourabh Bhattac harya 2 1 Departmen t of Statistics, St. Xa vier’s College (Autonomous), Kolk ata 2 In terdisciplinary Statistical Research Unit, Indian Statistical Institute, Kolk ata Abstract Gene-gene and gene-en vironmen t in teractions are widely b eliev ed to pla y signiﬁcan t roles in explaining the v ariabilit y of complex traits. While substantial research exists in this area, a comprehensiv e statistical framew ork that addresses multiple sources of uncertain ty sim ulta- neously remains lacking. In this article, w e syn thesize and propose extension of a nov el class of Ba y esian nonparametric approaches that accoun t for interactions among genes, lo ci, and en vironmen tal factors while accommo dating uncertaint y about p opulation substructure. Our con tribution is threefold: (1) W e provide a uniﬁed exp osition of hierarc hical Bay esian mod- els driv en b y Dirichlet processes for genetic interactions, clarifying their conceptual adv an tages o v er traditional regression approaches; (2) W e shed light on new computational strategies that com bine transformation-based MCMC with parallel pro cessing for scalable inference; and (3) W e present enhanced hypothesis testing pro cedures for identifying disease- predisp osing loci. Through applications to my o cardial infarction data, we demonstrate ho w these metho ds oﬀer biological insights not readily obtainable from standard approac hes. Our synthesis highlights the adv an tages of Bay esian nonparametric thinking in genetic epidemiology while pro viding practical guidance for implementation. Keyw ords: Diric hlet Pro cess; Disease Predisp osing Lo ci; Epistasis; Mixture Mo del; Parallel Com- puting; T ransformation based Mark ov Chain Monte Carlo. AMS Sub ject Classiﬁcations: 62K05, 62F15, 92D10 Con ten ts 1 In tro duction 3 1.1 The c hallenge of genetic interactions in complex diseases . . . . . . . . . . . . . . . . 3 1.2 Our con tribution: a Ba y esian nonparametric synthesis . . . . . . . . . . . . . . . . . 3 1.3 Article structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2 A uniﬁed Bay esian nonparametric framew ork 4 2.1 Mo deling philosoph y: from regression to conditional genotype mo dels . . . . . . . . 4 2.2 Key adv an tages of our framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.3 Notation and data structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1 3 Gene-gene in teraction mo del 5 3.1 Mo del sp eciﬁcation: a roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3.2 Detailed form ulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.3 Computational strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.4 Hyp othesis testing and DPL iden tiﬁcation . . . . . . . . . . . . . . . . . . . . . . . . 7 4 Gene-en vironment interaction mo del 8 4.1 Extending the framew ork . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4.2 Enhanced h yp othesis testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 5 Hierarc hical Dirichlet pro cess mo del 10 5.1 Motiv ation and limitations of previous models . . . . . . . . . . . . . . . . . . . . . . 10 5.2 HDP form ulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 5.3 Chinese restauran t pro cess analogy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 5.4 Dep endence structure induced b y the HDP mo del . . . . . . . . . . . . . . . . . . . . 12 5.4.1 Dep endence among individuals . . . . . . . . . . . . . . . . . . . . . . . . . . 12 5.4.2 Dep endence among genes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 5.4.3 Case-con trol dep endence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.5 Adv an tages of HDP approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.6 Computational implemen tation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.7 Hyp othesis testing in the HDP framew ork . . . . . . . . . . . . . . . . . . . . . . . . 14 6 Applications to Myocardial Infarction data 15 6.1 Data description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 6.2 Implemen tation details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 6.3 Results and biological insigh ts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 6.3.1 Eﬀect of sex v ariable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 6.3.2 Roles of individual genes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 6.3.3 Gene-gene in teractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 6.3.4 P opulation structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 7 Sensitivit y analysis 16 7.1 Sensitivit y to mixture comp onent speciﬁcations . . . . . . . . . . . . . . . . . . . . . 16 7.2 Sensitivit y to priors on base measure parameters . . . . . . . . . . . . . . . . . . . . 17 7.3 Sensitivit y to priors on cov ariance matrices . . . . . . . . . . . . . . . . . . . . . . . 17 7.4 Sensitivit y to hypothesis testing thresholds . . . . . . . . . . . . . . . . . . . . . . . 17 7.5 Sensitivit y to environmen tal cov ariance structure . . . . . . . . . . . . . . . . . . . . 17 7.6 Sensitivit y to hierarchical Diric hlet pro cess hyperparameters . . . . . . . . . . . . . . 18 7.7 Sensitivit y to link age disequilibrium and lo cus lab el p ermutation . . . . . . . . . . . 18 7.8 Summary of sensitivit y analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 8 Discussion 18 8.1 Comparison with existing metho ds . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 8.2 Limitations and robustness considerations . . . . . . . . . . . . . . . . . . . . . . . . 20 8.2.1 Fixed mixture w eights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 8.2.2 Mo del complexit y and in terpretability . . . . . . . . . . . . . . . . . . . . . . 20 8.3 F uture directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 8.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2 1 In tro duction 1.1 The challenge of genetic in teractions in complex diseases Complex diseases such as cardiov ascular disorders, diab etes, and psychiatric conditions result from in tricate netw orks of genetic and environmen tal factors. While genome-wide association studies (GW AS) hav e identiﬁed n umerous single nucleotide p olymorphisms (SNPs) asso ciated with disease risk, these explain only a small fraction of heritabilit y (Larson and Schaid, 2013). The “missing heritabilit y” problem has spurred interest in gene-gene (epistasis) and gene-environmen t interac- tions as p otential explanations. T raditional approaches to studying these interactions face several signiﬁcan t limitations that our work aims to address. Most existing metho ds rely on linear or additive mo dels that may not adequately capture the complex biological pathw a ys through whic h genetic factors in teract (W ang et al. , 2010). These simpliﬁed mo deling assumptions often fail to represen t the intricate bio chemical net works that c haracterize many complex diseases. F urthermore, the failure to prop erly accoun t for p opulation stratiﬁcation–the presence of genetic substructure within study p opulations–can lead to inﬂated false positive rates in asso ciation studies (Bhattac harjee et al. , 2010). This issue is particularly problematic in genetically div erse p opulations where diﬀerent subgroups may hav e distinct allele frequencies unrelated to disease risk. The computational burden represents another substan tial challenge in studying genetic in terac- tions. T esting all p ossible pairwise SNP-SNP in teractions b ecomes infeasible for genome-wide data, leading researchers to adopt heuristic screening metho ds that may miss imp ortant interactions or iden tify spurious ones. Additionally , many curren t approaches pro vide p oint estimates without adequately characterizing the uncertain ty in mo del structure, particularly regarding the n umber of underlying sub-p opulations or the complexit y of in teraction netw orks. This lack of comprehen- siv e uncertain ty quan tiﬁcation limits the reliabilit y and interpretabilit y of ﬁndings from genetic in teraction studies. 1.2 Our contribution: a Ba y esian nonparametric synthesis This article syn thesizes and extends a series of Ba yesian nonparametric models dev elop ed to address these c hallenges. Unlike previous w orks that presented these mo dels in isolation (Bhattac harya and Bhattac harya 2018, 2020, 2024), we pro vide a uniﬁed framew ork connecting gene-gene interaction mo dels, gene-en vironmen t extensions, and hierarchical Diric hlet pro cess form ulations. Our syn- thesis represen ts a comprehensive ov erview of this methodological approach, highlighting both theoretical foundations and practical implemen tation considerations. W e dev elop enhanced computational strategies that lev erage parallel pro cessing and transformation- based MCMC (Dutta and Bhattachary a, 2014) for practical implementation of these complex mod- els. These computational inno v ations mak e it feasible to apply Ba yesian nonparametric metho ds to realistic genetic datasets of me aningful size. Additionally , w e present new h yp othesis testing pro cedures for iden tifying disease-predisp osing lo ci in the presence of p opulation stratiﬁcation, of- fering more robust alternatives to traditional asso ciation tests. Finally , we pro vide comprehensiv e applications demonstrating biological insights from m y o cardial infarction data, showing how these metho ds can unco v er relationships not readily apparent using standard approaches. Our approac h fundamen tally departs from standard logistic regression by mo deling genotypes conditional on disease status using Dirichlet pro cess mixtures. This inv ersion of the typical mo d- eling relationship allows us to capture several imp ortan t features simultaneously . W e can mo del uncertain ty in p opulation substructure nonparametrically , allowing the data to inform ab out the 3 n umber and characteristics of genetic subgroups. W e capture gene-gene interactions through co- v ariance structures rather than regression co eﬃcients, providing a more ﬂexible represen tation of genetic dep endencies. F urthermore, w e accommo date sub ject-sp eciﬁc environmen tal eﬀects through hierarc hical mo deling, enabling p ersonalized assessmen t of genetic risk factors. 1.3 Article structure The remainder of this article is organized as follo ws. Section 2 introduces our mo deling philosoph y and contrasts it with traditional approaches, explaining the conceptual shift from regression-based to conditional genotype mo deling. Section 3 presents the gene-gene in teraction mo del with compu- tational details, including our parallel implementation strategy . Section 4 extends this framework to incorporate gene-environmen t in teractions, discussing b oth mo deling extensions and enhanced h yp othesis testing pro cedures. Section 5 introduces the hierarchical Diric hlet pro cess form ulation, whic h addresses limitations of previous mo dels b y allowing more ﬂexible sharing patterns. Section 6 presents comprehensiv e applications to my o cardial infarction data, demonstrating the practical utilit y of our metho ds. Section 7 pro vides extensiv e sensitivity analyses to assess the robustness of our results to k e y mo deling c hoices across all three framew orks. Finally , Section 8 discusses com- parisons with existing metho ds, addresses limitations and robustness considerations, and outlines future researc h directions. 2 A uniﬁed Ba y esian nonparametric framew ork 2.1 Mo deling philosoph y: from regression to conditional genot yp e models T raditional GW AS analyze disease status Y given genotypes X using logistic regression mo dels of the form P ( Y = 1 | X ) = logit − 1 ( β 0 + P β j X j + P β j k X j X k ). This approach, while interpretable and widely used, faces the “curse of dimensionalit y” when considering interactions and mak es strong parametric assumptions about the relationship b et ween genot yp es and disease risk. The logistic regression framework requires specifying whic h interactions to include, t ypically limiting consideration to pairwise eﬀects due to computational constrain ts. F urthermore, these models assume that genetic eﬀects are additiv e on the log-odds scale, which ma y not reﬂect the true biological mec hanisms underlying complex diseases. Our approach inv erts this relationship by mo deling genotypes conditional on disease status using ﬁnite mixture mo dels: [ X | Y = k ] = P M m =1 π m f ( X | θ mk ), where the mixture comp onents represen t laten t sub-p opulations. By placing Dirichlet process priors on the parameters { θ mk } , w e allow the num b er of sub-populations to b e inferred from the data rather than sp eciﬁed in adv ance. This mo deling strategy represents a fundamen tal shift in persp ective: instead of asking ho w genotype aﬀects disease risk, we ask how the distribution of genotypes diﬀers b et w een cases and con trols. This approach naturally accommo dates p opulation structure, as mixture comp onents can corresp ond to genetic subgroups with distinct allele frequencies. The conditional mo deling approac h oﬀers sev eral conceptual adv an tages. First, it directly ad- dresses the issue of p opulation stratiﬁcation by explicitly mo deling genetic heterogeneity . Second, it provides a natural framework for identifying disease-asso ciated genetic v ariants through compar- ison of genot yp e distributions b etw een cases and controls. Third, it allo ws for ﬂexible mo deling of dep endencies among genetic loci through the sp eciﬁcation of the mixture comp onen ts. Rather than assuming independence or simple correlation structures, we can incorp orate complex dependence patterns that ma y reﬂect biological relationships among genes. 4 2.2 Key adv an tages of our framew ork Our Ba y esian nonparametric framew ork oﬀers sev eral k ey adv an tages ov er traditional approaches. First, it pro vides ﬂexible mo deling of p opulation structure through Diric hlet pro cesses, which au- tomatically disco ver p opulation substructure without requiring pre-sp eciﬁed ancestry information. This is particularly v aluable in studies of admixed p opulations or when detailed ancestry infor- mation is una v ailable. The nonparametric nature of Diric hlet pro cesses allo ws the num b er of sub-p opulations to b e determined by the data, a voiding b oth underﬁtting and o verﬁtting that can o ccur with ﬁxed ﬁnite mixture mo dels. Second, our approach conceptualizes gene-gene interactions as statistical dep endence rather than additive eﬀects on disease risk. W e model these in teractions through cov ariance structures in hierarc hical priors, capturing complex dep endencies that may not b e well re presen ted b y linear or additiv e mo dels. This represen tation aligns with biological understanding that genetic factors often act in netw orks rather than indep enden tly . By focusing on co v ariance structures, we can identify sets of genes that co-v ary in their eﬀects on disease risk, p oten tially revealing functional pathw a ys or biological mo dules. Third, the conditional indep endence structure of our mo dels enables scalable computation through parallel implemen tation. Diﬀeren t comp onents of the mo del can b e updated indep en- den tly , allo wing us to lev erage mo dern parallel computing arc hitectures. This computational eﬃ- ciency makes it feasible to apply our methods to datasets with large n umbers of genetic v ariants and samples, addressing a k ey limitation of many Bay esian nonparametric approaches. F ourth, our fully Bay esian approac h provides comprehensiv e uncertaint y quantiﬁcation through p osterior distributions for all quan tities of in terest. This includes not only p oint estimates of genetic eﬀects but also measures of uncertaint y ab out p opulation structure, interaction strengths, and mo del complexity . Prop er characterization of uncertaint y is particularly imp ortant in genetic studies, where sample sizes are often limited relativ e to the num b er of genetic v ariants considered. 2.3 Notation and data structure W e consider case-con trol data with N k sub jects in group k ∈ { 0 , 1 } , where k = 0 denotes controls and k = 1 denotes cases. F or each sub ject i , we observ e genotypes represented as x s ij r ∈ { 0 , 1 } for c hromosome s ∈ { 1 , 2 } , gene j ∈ { 1 , . . . , J } , and lo cus r ∈ { 1 , . . . , L j } . Here, x s ij r = 1 indicates the presence of the minor allele on chromosome s at locus r of gene j for individual i , while x s ij r = 0 indicates its absence. Additionally , we may observe environmen tal co v ariates E i for eac h individual, whic h could include factors such as sex, age, smoking status, or other exp osures. Disease status is denoted b y Y i ∈ { 0 , 1 } . The total n umber of genes is J , and L j represen ts the num b er of lo ci within gene j . This notation pro vides a comprehensive framework for represen ting the complex hierarc hical structure of genetic data, with v ariations at the chromosome, lo cus, gene, individual, and group lev els. 3 Gene-gene in teraction mo del 3.1 Mo del speciﬁcation: a roadmap Our mo del for gene-gene in teractions comprises three k ey comp onen ts that w ork together to capture the complex structure of genetic data. First, w e emplo y sub ject-lev el mixture m o dels that represen t genot yp es as arising from ﬁnite mixtures, with eac h mixture comp onent corresp onding to a latent sub-p opulation. This approach allo ws us to account for p opulation stratiﬁcation without requiring prior speciﬁcation of ancestry groups. Second, w e place Diric hlet pro cess priors on the parameters 5 of these mixture mo dels, enabling ﬂexible sharing of mixture comp onents across sub jects and auto- matic determination of the n umber of sub-p opulations. The Diric hlet pro cess framework pro vides a principled Bay esian nonparametric approach to mo deling p opulation structure, with the precision parameter con trolling the degree of sharing among sub jects. Third, w e introduce a hierarc hical in teraction structure through matrix-normal priors that capture dep endencies among genes. This hierarc hical structure allo ws us to model gene-gene in teractions at m ultiple levels, from pairwise correlations to higher-order dep endencies. The com bination of these three comp onents creates a comprehensiv e mo deling framework that addresses several limitations of traditional approac hes. The mixture mo del comp onent handles p opulation stratiﬁcation, the Diric hlet process prior provides ﬂexibility in mo deling the n um b er and characteristics of sub-populations, and the hierarc hical in teraction structure captures complex dep endencies among genetic factors. Imp ortan tly , these comp onents are in tegrated in a coherent Ba yesian framework that allows for uncertain ty quantiﬁcation at all lev els, from individual genot yp e probabilities to o verall population structure. 3.2 Detailed formulation F or each gene j and group k , we mo del the genotype vector X ij k = ( x ij k 1 , . . . , x ij k L j ) using a ﬁnite mixture distribution: [ X ij k ] = P M m =1 π mj k Q L j r =1 Bernoulli( x ij k r | p mj k r ). Here, M represen ts the maximum n umber of mixture comp onents, whic h we set to a suﬃciently large v alue to ensure adequate mo del ﬂexibility . The mixing w eigh ts π mj k are ﬁxed at 1 / M for all m , j , and k . This c hoice of ﬁxed equal weigh ts, while somewhat uncon ven tional, has b een sho wn in previous work (Ma jumdar et al. , 2013) to yield b etter p erformance in estimating the true num b er of mixture comp onen ts compared to using Dirichlet priors. The robustness of this choice is discussed further in Section 8.2. The allele frequency parameters p mj k r follo w a hierarchical structure based on Dirichlet pro cess designed to capture b oth lo cus-sp eciﬁc eﬀects and dep endencies among genes. Sp eciﬁcally , it holds that the marginal prior distribution of the allele frequency parameters is p mj k r ∼ Beta( ν 1 j k r , ν 2 j k r ), where the Beta distribution parameters are themselv es mo deled as ν 1 j k r = exp( u r + λ j k ) and ν 2 j k r = exp( v r + λ j k ). The parameters u r and v r are locus-sp eciﬁc eﬀects assumed to follow indep enden t standard normal distributions: u r , v r ∼ N (0 , 1). Allo wing u r and v r to diﬀer ensures that the mean of the Beta distribution for p mj k r dep ends on the sp eciﬁc lo cus r , capturing v ariation in allele frequencies across the genome. Gene-gene interactions are captured through a matrix-normal prior on the parameter matrix λ = { λ j k } , whic h represents gene- and group-sp eciﬁc eﬀects. W e assume λ ∼ N ( µ , A ⊗ Σ ), where A represen ts the cov ariance matrix capturing dep endencies among genes, and Σ represen ts the co v ariance matrix capturing dep endency b etw een the genot yp e distribution of case and con trol, giv en any gene. The Kroneck er pro duct structure A ⊗ Σ allo ws for separable cov ariance structures across genes and disease status, providing a computationally tractable y et ﬂexible represen tation of dep endencies. This formulation enables us to mo del complex interaction patterns while maintaining computational feasibilit y through the separable cov ariance structure. 3.3 Computational strategy The conditional independence structure of our model enables eﬃcient computational implemen- tation through a com bination of parallel pro cessing and sp ecialized MCMC tec hniques. The key insigh t is that, giv en the interaction parameters λ , the mixture mo dels for diﬀeren t gene-group pairs ( j, k ) are conditionally indep enden t. This conditional independence allows us to up date the 6 mixture parameters for each ( j, k ) pair in parallel across multiple processors. Each pro cessor han- dles a subset of the gene-group pairs, updating the allocation v ariables z ij k , the allele frequency parameters p mj k r , and the other mixture-related parameters indep enden tly of the others. F or updating the in teraction parameters λ , w e emplo y transformation-based MCMC (TM- CMC), which up dates all elements of λ simultaneously through deterministic transformations of a low-dimensional random v ariable. TMCMC is particularly w ell-suited for high-dimensional pa- rameter spaces, as it can prop ose mov es in directions that resp ect the correlation structure of the p osterior distribution. In our implemen tation, we use a single random v ariable to generate prop os- als for all elemen ts of λ , with the transformation designed to maintain reasonable acceptance rate and mixing. The Dirichlet pro cess mixture comp onents are up dated using a fast Gibbs sampling algorithm sp eciﬁcally designed for ﬁnite mixtures with Dirichlet pro cess priors. This algorithm leverages the P olya urn represen tation of the Dirichlet pro cess to eﬃciently up date the allo cation of sub jects to mixture comp onen ts and the parameters asso ciated with eac h comp onen t. The algorithm alternates b et w een up dating the allo cation v ariables giv en the parameters and up dating the parameters given the allo cations, with b oth steps b eing computationally eﬃcient due to conjugacy relationships. Figure 1 displa ys the sc hematic diagram for our gene-gene in teraction mo del. F or each gene and case-con trol status, genot yp e data are modeled using Dirichlet pro cess-based mixtures that cap- ture sub-p opulation structure. SNP-lev el dep endencies and gene-gene in teractions are in tro duced through a matrix-normal prior on laten t in teraction parameters. The mo dular design of the mo del allo ws eﬃcient parallel computation: gene-sp eciﬁc mixture comp onents are up dated indep enden tly across processors, while the in teraction parameters are up dated cen trally using TMCMC. Our parallel co des are written in C, in accordance with the Message P assing Interface (MPI) proto col. Case/Con trol ( k ) Gene j Mixture for ( j, k ): G j k ∼ Diric hlet Pro cess Minor allele at SNP r : x ij k r u r , v r ∼ N (0 , 1) (Lo cus-sp eciﬁc) λ j k ∼ N ( µ , A ⊗ Σ ) (Gene-group) Figure 1: Schematic representation of the Bay esian mo del for gene-gene interactions. 3.4 Hyp othesis testing and DPL iden tiﬁcation W e dev elop comprehensive Bay esian hypothesis testing pro cedures to ev aluate v arious asp ects of genetic asso ciations within our framew ork. F or asses sing gene eﬀects, w e compare clustering pat- terns b etw een cases and controls using metrics based on p osterior distributions of the mixture comp onen ts. Signiﬁcan t diﬀerences in clustering patterns indicate that the genetic structure diﬀers 7 b et w een cases and con trols, suggesting an asso ciation betw een the gene and disease status. This approac h provides a more nuanced assessmen t than traditional asso ciation tests, as it considers the en tire distribution of genotypes rather than just summary statistics. T o test for gene-gene in teractions, w e examine elemen ts of the co v ariance matrix A in the matrix- normal prior on λ . Non-zero oﬀ-diagonal elements in A indicate dep endencies b etw een the eﬀects of diﬀeren t genes, which w e interpret as statistical in teractions. W e compute p osterior probabilities that sp eciﬁc elements of A are non-zero, providing a quantitativ e measure of interaction strength with asso ciated uncertaint y . This approach allo ws us to identify pairs or groups of genes that show co ordinated eﬀects on disease risk, p oten tially revealing biological pathw ays or functional mo dules. Our k ey idea for identifying disease-predisposing lo ci (DPL) ma y b e lik ened to computing p osterior probabilities that sp eciﬁc SNPs show diﬀeren tial distributions b et w een cases and con- trols. Sp eciﬁcally , for each SNP r in gene j , our idea is analogous, in principle, to computing P (SNP r is DPL | Data) = P    p case · j r − p control · j r   > δ | Data  , where δ is a clinically meaningful threshold for allele frequency diﬀerences. The probabilities p case · j r and p control · j r represen t the allele fre- quencies in cases and controls, resp ectiv ely . This approac h provides a principled Bay esian metho d for DPL identiﬁcation that accoun ts for multiple sources of uncertain ty , including p opulation strat- iﬁcation and estimation error. Our h yp othesis testing framework extends b eyond simple signiﬁcance testing to include mea- sures of eﬀect size and uncertaint y . F or each test, we report p osterior probabilities along with credible interv als for relev ant parameters, providing a comprehensiv e picture of the evidence for v arious genetic asso ciations. This Bay esian approach naturally incorp orates multiple testing con- siderations through the prior distributions and p osterior probabilities, av oiding the need for ad ho c corrections that can b e problematic in high-dimensional settings. 4 Gene-en vironmen t in teraction mo del 4.1 Extending the framew ork T o incorp orate environmen tal co v ariates E i , we extend our mo deling framework to allow for sub ject- sp eciﬁc mixture distributions that dep end on en vironmen tal factors. The extended mo del repre- sen ts genot yp es as [ X ij k | E i ] = P M m =1 π mij k Q L j r =1 Bernoulli( x ij k r | p mij k r ), where now the mixing w eights and comp onent parameters can v ary across individuals based on their environmen tal ex- p osures. As in the gene-gene in teraction model, w e ﬁx π mij k = 1 / M for all ( i, j, k , m ), main taining computational simplicit y while allowing ﬂexibilit y through the comp onent parameters. The k ey extension in the gene-environmen t in teraction mo del lies in the parameterization of the Beta distribution parameters for the allele frequencies. W e now model these as ν 1 ij k r = exp( u j r + λ ij k + µ j k + β ⊤ j k E i ) and ν 2 ij k r = exp( v j r + λ ij k + µ j k + β ⊤ j k E i ). This expanded pa- rameterization includes sev eral new terms: u j r and v j r are gene- and lo cus-sp eciﬁc eﬀects; λ ij k represen ts individual-sp eciﬁc genetic eﬀects that capture residual v ariation not explained by other factors; µ j k are gene- and group-sp eciﬁc intercept terms; and β j k are v ectors of co eﬃcients captur- ing the eﬀects of en vironmental co v ariates on the genetic parameters for gene j in group k . The inclusion of environmen tal eﬀects through the term β ⊤ j k E i allo ws the distribution of geno- t yp es to v ary systematically with environmen tal exp osures. When β j k  = 0 , en vironmen tal factors mo dify the genetic parameters for gene j in group k , represen ting gene-environmen t interaction. Imp ortan tly , this formulation allo ws environmen tal factors to aﬀect not only the mean genetic ef- fects but also the co v ariance structure among genes, as the en vironmental terms enter in to the hierarc hical mo del that generates the λ ij k parameters. This enables the mo del to capture ho w 8 en vironmental exp osures might mo dify not just individual genetic eﬀects but also the patterns of in teraction among genes. The co v ariance structure in this extended mo del con tinues to capture how environmen t mo diﬁes gene-gene in teractions. The individual-sp eciﬁc eﬀects λ ij k inherit dep endence structure from the hierarc hical model, with en vironmental factors p otentially inﬂuencing b oth the mean and cov ariance of these eﬀects. This allows for rich patterns of gene-environmen t interaction, including scenarios where environmen tal exp osures alter the strength or pattern of genetic correlations. F or example, an environmen tal factor migh t strengthen the correlation betw een t wo genes in cases but w eaken it in controls, or it might induce correlations among genes that are indep endent in the absence of the exp osure. 4.2 Enhanced hypothesis testing W e extend our hypothesis testing framew ork to address questions sp eciﬁc to gene-en vironment in teractions. After testing for ov erall genetic eﬀect b y extending the previous metho d, we test for the presence of gene-en vironment interaction b y ev aluating the n ull hypothesis H 0 : β j k = 0 for sp eciﬁc genes and en vironmental factors. W e compute posterior probabilities that β j k diﬀers from zero, providing a direct measure of evidence for gene-en vironment in teraction. This approach allo ws us to identify genes whose eﬀects on disease risk are mo diﬁed by environmen tal exp osures, whic h is of particular interest for understanding disease etiology and targeted interv entions. W e also test for join t eﬀects inv olving b oth gene-gene and gene-environmen t interactions. This in volv es ev aluating whether environmen tal factors mo dify the patterns of interaction among genes, whic h we assess by examining how environmen tal cov ariates aﬀect the co v ariance structure in the mo del. Sp eciﬁcally , we test whether parameters go verning the dep endence of the co v ariance structure on environmen tal factors diﬀer from zero. Signiﬁcan t ﬁndings indicate that environmen tal exp osures alter the net work of genetic interactions, potentially revealing mechanisms through whic h en vironment inﬂuences disease risk. Third, we develop metho ds for stratiﬁed DPL iden tiﬁcation that accoun t for environmen tal het- erogeneit y . Rather than identifying DPL that show av erage diﬀerences b etw een cases and con trols, w e identify SNPs whose eﬀects dep end on environmen tal exp osure. This inv olv es computing p os- terior probabilities that the diﬀerence in allele frequencies b etw een cases and controls v aries across lev els of environmen tal factors. F or example, we migh t identify SNPs that sho w strong asso ciations in exp osed individuals but weak or no asso ciations in unexp osed individuals, or vice v ersa. This stratiﬁed approach can reveal genetic factors that are imp ortant only under sp eciﬁc environmen tal conditions, pro viding insights in to the context-dependence of genetic risk. Apart from these, we additionally dev elop visualization to ols to help in terpret complex inter- action patterns, including heatmaps of p osterior in teraction probabilities and netw ork diagrams sho wing how genes and en vironmental factors are connected through interaction eﬀects. Figure 2 provides the schematic diagram for our gene-environmen tal interaction mo del. Envi- ronmen tal co v ariates inﬂuence individual-lev el Diric hlet process mixtures, allo wing the mo del to accoun t for p ersonalized eﬀects of en vironmental exp osures on genot yp e distributions. The prior structure integrates lo cus-sp eciﬁc, gene-speciﬁc, and environmen t-dep enden t parameters. Parallel computation is emplo yed by up dating gene-environmen t sp eciﬁc comp onen ts in parallel and inter- action parameters centrally via TMCMC. As b efore, our parallel co des are written in C, leveraging the MPI proto col. 9 Individual i En vironmental co v ariates E i Gene j Mixture for ( i, j, k ): G ij k ∼ DP Minor allele at SNP r : x ij k r u j r , v j r (Lo cus-gene) λ ij k , µ j k , β ⊤ j k E i (Individual-gene) Figure 2: Diagram of the extended Bay esian framework incorporating gene-en vironmen t in terac- tions. 5 Hierarc hical Diric hlet pro cess mo del 5.1 Motiv ation and limitations of previous mo dels The gene-en vironmen t interaction mo del presen ted in Section 4 makes the simplifying assumption that environmen tal eﬀects act uniformly on gene-gene interactions across all individuals. This as- sumption may not hold in many practical settings for several reasons. First, diﬀeren t individuals ma y exp erience diﬀerent lev els or types of environmen tal exp osure, leading to heterogeneous eﬀects on genetic in teractions. F or example, the eﬀect of smoking on genetic risk factors for cardio v ascular disease ma y dep end on duration and intensit y of smoking, whic h v aries across individuals. Second, en vironmental eﬀects may b e heterogeneous across the p opulation due to unmeasured factors or eﬀect mo diﬁers. Genetic bac kground, other en vironmen tal exp osures, or lifest yle factors might mod- ify ho w a particular en vironmental factor inﬂuences genetic in teractions. Third, gene-en vironment in teractions ma y b e context-dependent, with eﬀects manifesting only under sp eciﬁc conditions or in sp eciﬁc subgroups. The uniform eﬀect assumption fails to capture this context-dependence, p oten tially missing imp ortan t biological relationships. These limitations motiv ate the dev elopment of a more ﬂexible mo deling approach that can ac- commo date heterogeneous and context-dependent gene-environmen t interactions. The hierarchical Diric hlet pro cess (HDP) mo del provides a principled nonparametric framework for capturing com- plex sharing patterns among individuals, genes, and groups in a data-driven manner. Rather than imp osing a parametric structure on how environmen tal factors inﬂuence genetic dep endencies, the HDP model learns these relationships nonparametrically from the data. This approac h can disco v er complex patterns of interaction that might b e missed by more restrictive parametric mo dels, while still pro viding a coherent probabilistic framew ork for inference. 5.2 HDP formulation W e address the limitations of previous mo dels through a three-level hierarchical Diric hlet pro cess form ulation that introduces ﬂexible, nonparametric dep endence structures among genes, en viron- men tal v ariables, and case-control status. Our mo del represents a signiﬁcant extension of the 10 traditional HDP framework (T eh et al. , 2006) b y incorp orating an additional level of hierarc hy that sp eciﬁcally captures case-con trol dep endence while allowing for sub ject-sp eciﬁc gene-gene interac- tions inﬂuenced b y environmen tal factors. F or each individual i in group k , gene j , and mixture comp onent m , we assume that the minor allele frequency vector p mij k = ( p mij k 1 , p mij k 2 , . . . , p mij k L ) is generated from a hierarch y of Dirichlet pro cesses: p 1 ij k , p 2 ij k , . . . , p M ij k iid ∼ G ij k (1) G ij k ∼ DP( α G,ik G 0 ,j k ) (2) where DP( α G,ik G 0 ,j k ) denotes a Dirichlet pro cess with base measure G 0 ,j k and precision pa- rameter α G,ik . The environmen tal dep endence enters through the precision parameter: log( α G,ik ) = µ G + β ⊤ G E ik (3) where E ik is a d -dimensional vector of environmen tal v ariables for individual i in group k , β G is a d -dimensional v ector of regression co eﬃcients, and µ G is an in tercept term. A t the second level of the hierarch y , we assume: G 0 ,j k iid ∼ DP( α G 0 ,k H k ); j = 1 , . . . , J (4) with log( α G 0 ,k ) = µ G 0 + β ⊤ G 0 ¯ E k (5) where ¯ E k = 1 N k P N k i =1 E ik is the a verage en vironmental v ariable in group k . The third lev el of hierarch y creates dep endence b etw een case and control groups: H k iid ∼ DP( α H ˜ H ); k = 0 , 1 (6) with log( α H ) = µ H + β ⊤ H ¯ ¯ E (7) where ¯ ¯ E = ( ¯ E 0 + ¯ E 1 ) / 2 is the o verall a verage environmen tal v ariable. The global base measure ˜ H is sp eciﬁed as: p mij k r iid ∼ Beta( ν 1 , ν 2 ) under ˜ H (8) where ν 1 , ν 2 > 0 are ﬁxed h yp erparameters. This hierarc hical structure has a natural interpretation in terms of genetic architecture. The sub ject-level distributions G ij k capture individual-speciﬁc genetic patterns, whic h ma y diﬀer due to unique genetic backgrounds, environmen tal exp osures, or other individual factors. These sub ject- lev el distributions share comp onen ts through the gene-group lev el distributions G 0 ,j k , which rep- resen t common patterns for eac h gene in each disease group. The group-level distributions H k capture o verall genetic patterns for cases and controls separately , allowing for diﬀerences in ge- netic arc hitecture b etw een aﬀected and unaﬀected individuals. Finally , the global distribution ˜ H represen ts the ov erall genetic background of the p opulation. 11 5.3 Chinese restaurant pro cess analogy The hierarc hical structure can be understo o d through an extended Chinese restauran t process anal- ogy (T eh et al. , 2006). F or eac h group k = 0 , 1, imagine J restauran ts (genes). Eac h individual i visits these restauran ts, where at the j -th restauran t, they are seated at tables (mixture com- p onen ts) that serve dishes (allele frequency parameters). The dishes at diﬀerent tables within the same restauran t are drawn from G 0 ,j k , whic h itself is drawn from H k . This creates sharing of dishes across tables within a restauran t (within a gene across individuals). No w consider that all restauran ts share a global menu of dishes from ˜ H . Diﬀeren t restaurants (genes) may serv e diﬀerent selections from this global men u, with the selections dra wn from H k . This creates sharing of dishes across restaurants (across genes), with the degree of sharing con trolled b y α H . The environmen tal v ariables inﬂuence ho w likely customers (individuals) are to sit at new tables through α G,ik , and ho w likely restauran ts are to oﬀer new dishes through α G 0 ,k and α H . This analogy clariﬁes how our mo del creates dependence: individuals sharing the same table at a restauran t share the same dish (allele frequency parameters) for that gene, creating depen- dence among individuals. Diﬀerent restaurants (genes) serving the same dish creates dep endence among genes. The sharing of dishes b etw een case and control restauran ts ( k = 0 and k = 1) cre- ates case-control dep endence. Environmen tal factors inﬂuence these sharing probabilities, thereby mo dulating the dep endence structure. 5.4 Dep endence structure induced b y the HDP mo del Our HDP model induces three k ey t yp es of dep endence that are crucial for understanding gene-gene and gene-en vironment in teractions: 5.4.1 Dep endence among individuals F rom the P olya urn represen tation, the joint distribution of p M ij k = { p 1 ij k , . . . , p M ij k } shows that individuals share dish parameters ϕ tij k dra wn from G 0 ,j k . The probability of sharing dep ends on α G,ik , whic h in turn dep ends on individual environmen tal v ariables E ik through equation (3). This creates environmen tal-mo dulated dep endence among individuals: individuals with similar en vironmental exposures are more likely to share genetic patterns. Imp ortan tly , the marginal distribution of p mij k is G 0 ,j k , whic h do es not dep end on E ik . This is biologically desirable: p opulation minor allele frequencies should not b e aﬀected b y en vironmen tal v ariables, although environmen tal exp osure may inﬂuence how individuals cluster together in their genetic patterns. 5.4.2 Dep endence among genes Gene-gene dep endence arises through the sharing of dishes (parameters) across restauran ts (genes). The parameters ϕ tij k for diﬀerent genes j share common v alues from H k , creating dep endence among genes. The degree of this dep endence is inﬂuenced by α G 0 ,k , which dep ends on the group a verage en vironmen t ¯ E k through equation (5), and also indirectly through α G,ik whic h aﬀects the n umber of tables τ ij k . This structure ensures that gene-gene interactions are sp eciﬁc to individuals and are inﬂuenced b y b oth individual environmen tal v ariables ( E ik ) and group av erages ( ¯ E k ), while the marginal distributions of individual genes remain unaﬀected b y environmen t. 12 5.4.3 Case-con trol dep endence The sharing of dishes b etw een case and con trol restaurants (through H 0 and H 1 sharing from ˜ H ) creates dep endence b etw een case and con trol groups. This dep endence captures factors that aﬀect b oth cases and controls but may not b e explicitly measured, such as population stratiﬁcation or unmeasured en vironmental factors. The degree of case-control sharing is controlled by α H , which dep ends on the o verall av erage en vironment ¯ ¯ E through equation (7). This allows en vironmen tal factors to mo dulate the similarity b et w een cases and controls in their genetic patterns. 5.5 Adv antages of HDP approac h The hierarc hical Dirichlet pro cess mo del oﬀers sev eral adv antages ov er the parametric mo dels dis- cussed previously . First, it provides ﬂexible, nonparametric dependence structure: unlike previous matrix-normal based mo dels that assume uniform environmen tal eﬀects on cov ariance structures, our HDP mo del allo ws en vironmen t to inﬂuence dep endence structures in a ﬂexible, nonparametric manner that v aries across individuals. Second, it enables sub ject-sp eciﬁc gene-gene in teractions: eac h individual can ha ve their o wn pattern of gene-gene in teractions, with en vironmental factors in- ﬂuencing these individual-sp eciﬁc patterns. This is more realistic than assuming a single correlation structure for all individuals. Third, it main tains biological in terpretabilit y: the mo del preserv es the biologically imp ortant prop erty that en vironmental v ariables inﬂuence dependence structures but not marginal allele frequency distributions. Only dep endence structures (how genes interact) are aﬀected by environmen t, not the genes themselv es. F ourth, it oﬀers automatic complexity control: the Diric hlet process priors automatically determine the appropriate n umber of mixture comp o- nen ts at each level of the hierarch y , av oiding the need to pre-sp ecify the num b er of sub-p opulations or in teraction patterns. Fifth, it achiev es computational eﬃciency through parallelization: the con- ditional indep endence structure of the HDP mo del enables eﬃcient parallel implemen tation, with diﬀeren t genes and individuals up dated indep enden tly given the higher-lev el parameters. 5.6 Computational implementation W e implement the HDP mo del using a no vel parallel MCMC algorithm that com bines Gibbs sam- pling steps with transformation-based MCMC (TMCMC). The algorithm leverages the conditional indep endence structure of the mo del: given the higher-level distributions ( H k and G 0 ,j k ), the sub ject-level parameters can be updated indep enden tly across individuals and genes. This allows for parallel computation across m ultiple processors. Key components of our computational strategy include retrosp ective sampling (Papaspiliopoulos and Rob erts, 2008) for up dating parameters from Diric hlet process mixtures, whic h av oids inﬁnite-dimensional representations by generating param- eters as needed. The allocation v ariables z ij k and allele frequency parameters p mij k r are up dated in parallel across diﬀerent ( i, j, k ) combinations using Gibbs steps that exploit conjugacy . The parameters µ G , β G , µ G 0 , β G 0 , µ H , β H are up dated in a single blo c k using a mixture of additive and m ultiplicative TMCMC (Dey and Bhattach ary a, 2016), whic h eﬃcien tly explores correlated pa- rameter spaces. Our parallel implementation in C, as b efore, uses MPI for communication betw een pro cessors, with careful load balancing to ensure eﬃcien t parallel scaling. The sch ematic diagram of our HDP model is presented in Figure 3. This fully nonparametric framew ork models dep endencies across individuals, genes, and groups through a three-level hier- arc hy of Dirichlet pro cesses. En vironmental cov ariates inﬂuence the precision parameters at each lev el, allowing ﬂexible, individualized representation of in teraction structures. The base distribu- tion is a Beta prior on allele frequencies. This hierarc hy enables ric h mo deling of stratiﬁcation and 13 in teraction while maintaining computational scalabilit y . En vironmental co v ariates E i Individual DP: G ij k log α G,ik = µ G + β ⊤ G E ik Gene DP: G 0 ,j k log α G 0 ,k = µ G 0 + β ⊤ G 0 ¯ E k Group DP: H k log α H = µ H + β ⊤ H ¯ ¯ E Base: p mij k r ∼ Beta( ν 1 , ν 2 ) Figure 3: Schematic of the hierarchical Diric hlet pro cess (HDP) mo del for gene-gene and gene- en vironment in teractions. 5.7 Hyp othesis testing in the HDP framew ork W e extend our h yp othesis testing procedures to the HDP framew ork with speciﬁc tests for: gene eﬀects, testing H 0 : h 0 j = h 1 j for j = 1 , . . . , J , where h 0 j and h 1 j are the marginal distributions of genot yp es for controls and cases resp ectively for gene j ; environmen tal eﬀects, testing H 0 : β G = 0 , H 0 : β G 0 = 0 , and H 0 : β H = 0 for the environmen tal coeﬃcients at diﬀeren t lev els of the hierarc hy; gene-gene in teractions, deﬁning sub ject-sp eciﬁc gene-gene in teraction measures C ( i, j 1 , j 2 , k ) as co v ariances b etw een logit-transformed av erage allele frequencies for genes j 1 and j 2 for individual i in group k , and testing H 0 : C ( i, j 1 , j 2 , k ) = 0; and case-control dep endence, assessing whether the sharing b etw een cases and controls (through α H ) is signiﬁcant, indicating common factors aﬀecting b oth groups. The interpretation of these tests follows a logical framework: if genes hav e no eﬀect but environmen t aﬀects dep endence structures, then environmen t inﬂuences gene-gene interactions without aﬀecting disease status. If genes hav e eﬀects but en vironmen t do esn’t aﬀect interactions, then purely genetic factors determine disease. If b oth genes and environmen t are signiﬁcan t and aﬀect in teractions, then gene-environmen t interactions inﬂuence disease risk. 14 6 Applications to My o cardial Infarction data 6.1 Data description W e apply our Ba yesian nonparametric framew ork to a m yocardial infarction (MI) case-control dataset to demonstrate its practical utility and biological insights. The dataset comprises genotype information from the MI Gen study , obtained from the dbGaP database. This dataset includes m ultiple sub-p opulations: Caucasian, Han Chinese, Japanese, and Y oruban. F or our analysis, we considered a set of SNPs that are found to b e individually asso ciated with diﬀerent cardiov ascular endp oin ts in v arious GW AS, along with SNPs marginally associated with MI in the MIGen study . The data inv olve 32 genes cov ering 1251 lo ci, including 33 previously identiﬁed lo ci asso ciated with my o cardial infarction. The dataset includes b oth cases (individuals who exp erienced m yocar- dial infarction) and con trols. An imp ortant en vironmen tal cov ariate av ailable is sex (male/female), whic h is incorporated to inv estigate gene-environmen t interactions. This provides an opp ortunity to demonstrate ho w our framew ork can unco ver b oth genetic main eﬀects and interactions with en vironmental factors. The genetic data consist of diploid genot yp es , whic h we con v ert to binary indicators for eac h c hromosome for compatibilit y with our mo deling framew ork. This represen tation allows us to mo del the tw o c hromosomes separately while main taining the diploid nature of the data in our likelihoo d function. 6.2 Implemen tation details W e implement our Ba y esian nonparametric mo dels using our MPI-based C co des and parallel com- puting resources. F or the HDP model, w e set the maxim um num b er of mixture components M = 30 based on preliminary analyses to provide suﬃcien t ﬂexibility without excessiv e computational cost. W e set ν 1 = ν 2 = 1, making ˜ H a uniform distribution on [0 , 1]. The precision parameters are sp eciﬁed as: α G,ik = 0 . 1 × exp(100 + µ G + β ⊤ G E ik ) α G 0 ,k = 0 . 1 × exp(100 + µ G 0 + β ⊤ G 0 ¯ E k ) α H = 0 . 1 × exp(100 + µ H + β ⊤ H ¯ ¯ E ) with µ G , µ G 0 , µ H iid ∼ U (0 , 1) and β G , β G 0 , β H iid ∼ U ( − 1 , 1) as priors. This structure ensures adequate n umbers of sub-p opulations and satisfactory MCMC mixing. W e p erform MCMC sampling with 30,000 iterations, discarding the ﬁrst 10,000 as burn-in. Con vergence is assessed primarily using trace plots. The parallel implementation distributes com- putation across 50 cores, with total run time of approximately 7 days for the full analysis. 6.3 Results and biological insights Application of our HDP model to the m y o cardial infarction data yields sev eral imp ortan t insights: 6.3.1 Eﬀect of sex v ariable W e ﬁnd strong evidence that sex inﬂuences genetic patterns: P ( | β G | < ϵ β G | Data) ≈ 0 and P ( | β G 0 | < ϵ β G 0 | Data) ≈ 0, indicating that individual-lev el ( E ik ) and group-a v erage ( ¯ E k ) sex eﬀects are highly signiﬁcant. Ho wev er, P ( | β H | < ϵ β H | Data) ≈ 1, suggesting the ov erall av erage sex 15 eﬀect ( ¯ ¯ E ) is not signiﬁcan t. This pattern indicates that sex plays an imp ortant role in inﬂuencing sub ject-sp eciﬁc and group-level genetic patterns, but not the ov erall case-con trol similarit y . 6.3.2 Roles of individual genes Our clustering-based hypothesis tests indicate signiﬁcant o verall genetic eﬀects. How ever, individual gene tests show that none of the 32 genes are individually signiﬁcan t. This apparent paradox is explained by gene-gene in teractions: when genes are correlated, the maxim um of their individual distances can b e signiﬁcan t even when individual distances are not, similar to ho w max( X 1 , X 2 ) from a biv ariate normal distribution can hav e a non-zero median ev en when X 1 and X 2 ha ve zero medians individually . 6.3.3 Gene-gene in teractions Our HDP mo del reveals sub ject-sp eciﬁc gene-gene in teraction patterns not detected by previous mo dels. Tw o genes, AP006216.10 and C6orf106 , show signiﬁcan t interactions with other genes in most sub jects. In terestingly , the only sub jects with no signiﬁcan t gene-gene interactions inv olving these genes were male cases, suggesting that lac k of protective gene-gene interactions ma y contribute to MI risk in males. The gene-gene interactions appear to ha v e a protectiv e eﬀect: in sub jects where AP006216.10 and/or C6orf106 in teract with other genes, the risk of MI seems reduced. This b eneﬁcial eﬀect of gene-gene in teractions con trasts with the traditional view that interactions primarily increase disease risk. 6.3.4 P opulation structure The p osterior distribution of the n um b er of sub-p opulations sho ws mo des at 3 and 4 comp o- nen ts, supp orting the kno wn four sub-p opulations in the data (Caucasian, Han Chinese, Japanese, Y oruban). The mo del correctly iden tiﬁes that these p opulations cannot b e further subdivided genetically , consisten t with the biological understanding of these p opulation groups. 7 Sensitivit y analysis 7.1 Sensitivit y to mixture comp onen t sp eciﬁcations F or the gene-gene in teraction mo del of Section 3 and the gene-environmen t interaction mo del of Section 4, we examine sensitivit y to the maxim um num b er of mixture comp onents M and the ﬁxed mixing weigh ts π mj k = 1 / M . F ollowing the rule-of-thum b established in prior w ork (Ma jumdar et al. , 2013), our primary analyses set M = 30, which pro vides suﬃcien t capacity while main taining computational eﬃciency . Alternativ e sp eciﬁcations with M = 20 and M = 50 yield qualitativ ely similar results; the p osterior distribution of the eﬀective num b er of mixture comp onents remains stable once M exceeds a threshold (approximately 20 in our applications), as the Diric hlet pro cess prior automatically determines the num b er of distinct comp onen ts through the P´ oly a urn scheme. This robustness renders the exact v alue of M largely inconsequen tial beyond ensuring adequate mo del ﬂexibilit y . 16 7.2 Sensitivit y to priors on base measure parameters F or the Beta base measure parameters u r and v r , which are sp eciﬁed as indep enden t standard nor- mal distributions in our primary analyses, we inv estigate robustness using alternativ es: N (0 , 10 2 ), N (0 , 0 . 1), and Cauc h y(0 , 1). P osterior distributions of the interaction parameters λ j k and the re- sulting gene-gene correlation estimates exhibit remark able stability across these sp eciﬁcations. As noted in Section 3, w e ﬁnd that Gaussian priors on u r and v r with other means and v ariances do not yield signiﬁcan tly diﬀerent results, indicating inherent prior robustness in our mo deling strategy . The Cauch y prior pro duces slightly hea vier tails but do es not alter conclusions of an y h yp othesis tests regarding gene signiﬁcance or gene-gene in teractions. 7.3 Sensitivit y to priors on cov ariance matrices F or the gene-gene interaction matrix A and case-con trol dependence matrix Σ in the matrix-normal prior λ ∼ N ( µ , A ⊗ Σ ), w e sp ecify inv erse-Wishart priors: A ∼ I W ( ξ , A 0 ) with ξ = J + 2, and Σ ∼ I W ( ζ , Σ 0 ) with ζ = 4. Sensitivity is assessed across degrees of freedom ( ξ = J + 1 , J + 2 , J + 5 , J + 10; ζ = 3 , 4 , 5 , 10) and scale matrix sp eciﬁcations (estimated from data versus identit y matrices). Posterior inferences regarding which genes are marginally signiﬁcant and which gene- gene in teractions are presen t pro ve highly robust to v ariations in degrees of freedom. How ever, the magnitudes of posterior correlations sho w some sensitivity: larger degrees of freedom, which corresp ond to stronger prior information, produce shrink age to w ard the prior mean and yield slightly atten uated correlation estimates. The choice of scale matrices has minimal impact when degrees of freedom are set to the minim um v alues ensuring prop er priors ( ξ = J + 2, ζ = 4). 7.4 Sensitivit y to hypothesis testing thresholds A critical component of our Bay esian testing procedure, described in Section 3, is the sp eciﬁcation of thresholds ϵ for distance measures. F ollowing our established approac h, we set ϵ = F − 1 (0 . 55) where F is the p osterior distribution function under the n ull model. The c hoice of the 55th percentile rather than the median is delib erate: for the median, the p osterior probabilit y of the true n ull h yp othesis is 0.5, whereas under zero-one loss the true null will b e accepted only if its p osterior probabilit y exceeds 1 / 2. Using the median results in b orderline decisions in null simulations, while the 55th p ercentile pro vides appropriate op erating characteristics. Sensitivit y analyses using the 50th, 60th, and 75th p ercentiles demonstrate that the 60th and 75th p ercen tiles pro duce more conserv ativ e tests with reduced false p ositives but increased false negatives, while the 50th p ercentile yields unacceptably high false p ositive rates. Our choice of the 55th percentile represents a balanced compromise that main tains p ow er while controlling Type I error. 7.5 Sensitivit y to environmen tal cov ariance structure In our gene-environmen t in teraction model of Section 4, w e examine sensitivity to the sp eciﬁcation of the environmen tal co v ariance structure. The p osterior probability P ( ϕ < ϵ ϕ | Data) is robust across alternative kernel sp eciﬁcations, consistently indicating no diﬀerential eﬀect of sex on ge- netic interactions in the my o cardial infarction application. P osterior estimates of ϕ sho w some sensitivit y to prior v ariance, with more diﬀuse priors yielding wider credible in terv als, but the 95% credible interv al consistently con tains zero. The smo othness parameter in the k ernel is diﬃcult to iden tify from the data, with p osterior distributions largely reﬂecting the prior when sample sizes are mo derate. How ev er, this lack of identiﬁabilit y do es not aﬀect primary conclusions regarding gene signiﬁcance or gene-gene in teractions, as ϕ is consistently estimated to b e negligible. 17 7.6 Sensitivit y to hierarchical Diric hlet pro cess h yp erparameters F or our hierarchical Diric hlet pro cess mo del of Section 5, we assess sensitivit y to the sp eciﬁcation of precision parameters and their asso ciated regression co eﬃcients. Our primary sp eciﬁcation, with α G,ik = 0 . 1 × exp(100 + µ G + β ⊤ G E ik ) and analogous structures for α G 0 ,k and α H , ensures adequate n umbers of sub-p opulations and satisfactory MCMC mixing. Remo ving the constant oﬀset of 100 results in signiﬁcantly smaller α v alues, leading to few er distinct mixture comp onents and p o orer mixing. Alternativ e prior distributions for µ G , µ G 0 , µ H and β G , β G 0 , β H (normal versus uniform) p erform similarly , with uniform priors showing sligh tly b etter conv ergence prop erties. The key conclusions regarding gene-gene interactions and the inﬂuence of environmen tal factors on dep endence structures are robust across all sp eciﬁcations examined. 7.7 Sensitivit y to link age disequilibrium and lo cus lab el permutation A critical concern asso ciated with Section 3 is whether sharing the parameters u r and v r across all genes implies non-exc hangeability of lo cus lab els. T o address this, we conduct p erm utation exp erimen ts wherein the lab els of lo ci within eac h gene are randomly p erm uted prior to re-analysis. The results are en tirely consistent with the original analyses based on non-p erm uted data. F or the gene-gene interaction simulation, the p osterior probabilit y measures of genetic eﬀect remain strongly suggestive of signiﬁcance; for the null simulation, these measures correctly indicate no genetic inﬂuence. These results conﬁrm that our mo del does not imp ose non-exc hangeability of lo cus lab els. 7.8 Summary of sensitivit y analysis Across all sensitivit y in v estigations, consisten t patterns emerge. Hyp othesis test conclusions regard- ing whic h genes are marginally signiﬁcant, which gene-gene interactions are present, and whether en vironmental v ariables aﬀect genetic interactions are highly robust to reasonable v ariations in prior sp eciﬁcations and mo deling c hoices across all three modeling frameworks. Posterior magnitudes of correlation parameters and distance measures sho w some sensitivit y to prior informativeness, but these v ariations do not aﬀect the binary decisions in our Bay esian testing framework. Con ver- gence and mixing of MCMC algorithms are sensitiv e to certain choices, particularly the Dirichlet pro cess precision parameters and the oﬀset parameter in HDP sp eciﬁcations, but our primary sp eciﬁcations consistently provide adequate p erformance. The hierarchical nature of our Bay esian mo dels pro vides natural protection against prior misspeciﬁcation, with data information dominat- ing prior information for key parameters of interest. These ﬁndings collectively demonstrate that our Ba yesian nonparametric approaches are not unduly sensitive to sp eciﬁc prior choices and mo d- eling assumptions, lending credibilit y to the conclusions dra wn from both simulation studies and the real m yocardial infarction data analysis. 8 Discussion 8.1 Comparison with existing metho ds Our Ba y esian nonparametric approach diﬀers fundamen tally from standard GW AS metho ds in m ul- tiple asp ects. T raditional GW AS typically emplo y logistic regression mo dels that examine disease status conditional on genotypes ( P ( Y | X )). In contrast, our approach mo dels genot yp es condi- tional on disease status ( P ( X | Y )), which provides a diﬀeren t p ersp ective on genetic asso ciations. This inv ersion of the mo deling relationship allows us to directly address p opulation stratiﬁcation 18 through mixture mo dels while naturally accommo dating complex dep endence structures among genetic v arian ts. F or handling population structure, traditional metho ds often rely on principal comp onen t analy- sis (PCA) for adjustment or conduct stratiﬁed analyses by presumed ancestry groups. Our approach uses Dirichlet pro cess mixtures to nonparametrically mo del p opulation substructure, allowing the n umber and characteristics of sub-p opulations to b e inferred from the data rather than pre-sp eciﬁed. This can pro vide more accurate adjustmen t for p opulation stratiﬁcation, particularly in admixed p opulations where discrete ancestry categories may not adequately represen t the con tinuum of genetic bac kgrounds. In mo deling in teractions, traditional approaches t ypically use regression co eﬃcients for sp eciﬁc in teraction terms (e.g., pro duct terms in logistic regression). Our HDP approac h represen ts in- teractions through hierarc hical sharing of parameters, capturing dependencies without requiring sp eciﬁcation of particular in teraction forms. This allo ws us to disco ver complex interaction pat- terns that migh t not corresp ond to simple product terms, p otentially revealing ric her biological relationships. Compared to our previous matrix-normal based mo dels (Bhattachary a and Bhattachary a, 2020), the HDP mo del oﬀers several adv an tages: sub ject-sp eciﬁc rather than population-wide gene-gene in teractions, nonparametric rather than parametric dependence on environmen t, and more in ter- pretable sharing structures through the Chinese restauran t pro cess analogy . Uncertain ty quan tiﬁcation diﬀers substantially betw een the approac hes. T raditional metho ds pro vide conﬁdence in terv als based on frequen tist inference, while our fully Ba y esian approach yields p osterior distributions for all quan tities of in terest. This includes not only p oint estimates but also complete c haracterizations of uncertaint y ab out p opulation structure, interaction patterns, and mo del complexit y . The Bay esian approac h naturally incorp orates multiple testing considerations through prior distributions and posterior probabilities, a voiding the need for arbitrary signiﬁcance thresholds or correction pro cedures. Computational scaling presents diﬀerent c hallenges. T raditional pairwise testing scales as O ( p 2 ) where p is the n umber of SNPs, b ecoming prohibitive for genome-wide data. Our gene-lev el analysis scales as O ( J 2 ) where J is the num b er of genes, which is t ypically m uc h smaller than the n umber of SNPs. While our HDP mo del is computationally intensiv e p er test, the reduction in dimensionality through gene-level mo deling and eﬃcien t parallel implementation makes it feasible for realistically sized datasets. T able 1: Comparison of metho dological approaches for genetic asso ciation studies. Our Bay esian nonparametric framework oﬀers complementary strengths to traditional metho ds, particularly for complex in teraction analysis and uncertaint y quantiﬁcation. Asp ect T raditional GW AS Our Bay esian Nonparametric Approac h Mo deling framew ork Logistic regression: P ( Y | X ) Conditional genot yp e: P ( X | Y ) P opulation structure PCA adjustment or stratiﬁed anal- ysis Diric hlet pro cess mixtures In teraction mo deling Regression co eﬃcien ts Co v ariance structures Uncertain ty quantiﬁca- tion Conﬁdence in terv als P osterior distributions Computational scaling O ( p 2 ) for pairwise testing O ( J 2 ) for gene-lev el analysis Key adv an tage In terpretable eﬀect sizes Flexible dep endence structures 19 Our approac h complements rather than replaces traditional methods in several wa ys. Re- searc hers might use logistic regression for initial screening of marginal associations, providing in- terpretable eﬀect sizes for individual v ariants. Our Ba yesian nonparametric metho ds could then b e applied to selected genes or path wa ys to unco ver complex in teraction patterns and popula- tion structure that might modify or explain the marginal asso ciations. The ﬁndings from b oth approac hes could b e in tegrated through meta-analysis or hierarchical mo deling frameworks that com bine estimates from diﬀerent methodological p ersp ectiv es. F or genome-wide disco very , traditional metho ds remain essen tial due to their computational eﬃciency and interpretabilit y . F or fo cused inv estigation of sp eciﬁc biological pathw ays or complex traits where interactions are suspected, our approac h oﬀers additional insigh ts that might b e missed b y marginal asso ciation testing alone. The tw o approac hes can also inform eac h other: ﬁndings from our interaction analyses migh t suggest sp eciﬁc interaction terms to test in traditional mo dels, while signiﬁcant marginal asso ciations from traditional analyses might guide the selection of genes for more detailed in vestigation with our metho ds. 8.2 Limitations and robustness considerations 8.2.1 Fixed mixture w eights The c hoice of ﬁxed mixture weigh ts π m = 1 / M w arrants careful consideration, as it represents a departure from the more common approach of placing a Diric hlet prior on the mixture weigh ts. Our c hoice is empirically justiﬁed by ﬁndings from previous work, which demonstrated that ﬁxed equal weigh ts outp erform Diric hlet priors in estimating the true num b er of mixture comp onents in ﬁnite mixture models with Diric hlet process priors. The Diric hlet prior tends to pro duce man y very small weigh ts, eﬀectively underestimating the num b er of comp onents that con tribute meaningfully to the mixture. Fixed equal weigh ts av oid this shrink age tow ard fewer comp onents, allo wing b etter reco very of the true mixture structure. F rom a computational p ersp ective, ﬁxed weigh ts simplify the Gibbs sampling up dates by elim- inating the need to sample the w eight parameters. This reduces computational complexity and impro ves mixing of the Mark ov chains, particularly in high-dimensional settings with man y mix- ture components. The simpliﬁcation comes with the cost of assuming equal prior weigh t for all comp onen ts, but in practice, the p osterior distribution adapts to the data through the allo cation of sub jects to comp onen ts, eﬀectiv ely learning the relativ e importance of diﬀerent comp onents despite the equal prior w eights. Theoretical support for ﬁxed w eights comes from results on p osterior consistency of Dirichlet pro cess mixture mo dels (Mukhopadhy ay and Bhattachary a, 2021). Under mild conditions, both random w eights (from a Diric hlet prior) and ﬁxed equal weigh ts lead to the same asymptotic p osterior inference as the sample size increases. 8.2.2 Mo del complexit y and interpretabilit y While our HDP mo del oﬀers substan tial ﬂexibility in capturing complex genetic architectures, this ﬂexibilit y comes with challenges related to mo del complexity and interpretabilit y . The three-level hierarc hical structure with Diric hlet pro cesses at eac h lev el inv olv es many parameters with complex dep endencies. The Chinese restaurant pro cess analogy helps with conceptual understanding, but detailed in terpretation of p osterior results requires careful examination of sharing patterns across m ultiple levels. Substan tial computational resources are required to ﬁt the HDP model, particularly for datasets with large num b ers of genes or samples. Our parallel implemen tation helps address this challenge, 20 but users still need access to computing clusters or high-performance workstations. The MCMC algorithms require careful tuning and con vergence assessmen t, with runtimes measured in days for full analyses. While these computational demands are substan tial, they are b ecoming increasingly feasible with mo dern computing infrastructure and algorithmic impro vemen ts. Careful prior sp eciﬁcation is essen tial for obtaining stable and meaningful results from the HDP mo del. The precision parameters α G,ik , α G 0 ,k , α H and their regression co eﬃcients require thoughtful sp eciﬁcation based on domain kno wledge or sensitivity analysis. Inappropriate prior c hoices can lead to po or mixing, conv ergence issues, or biased inference. W e pro vide default settings based on our experience with genetic data, but users should assess sensitivit y to these choices in their sp eciﬁc applications. Exp ertise in Bay esian nonparametrics is v aluable for prop erly implemen ting and in terpreting the HDP mo del. Concepts suc h as Dirichlet pro cesses, Chinese restaurant pro cesses, hierarchical mo deling, and MCMC diagnostics ma y be unfamiliar to researc hers trained in traditional genetic epidemiology . T o improv e accessibility , we shall pro vide softw are with user-friendly in terfaces, detailed do cumentation, and tutorial examples. W e shall also oﬀer workshops and training materials to help researc hers develop the necessary skills to apply these metho ds eﬀectiv ely . Despite these challenges, the in terpretability of the HDP mo del can b e enhanced through appro- priate visualization and summary measures. Posterior distributions of key quantities–suc h as the n umber of mixture comp onents at eac h level, gene-gene correlation patterns for individual sub jects, or sharing probabilities b etw een cases and controls–can be presented in in tuitive graphical formats. Comparativ e analyses showing ho w results diﬀer from simpler models can highlight the v alue added b y the additional complexity . Biological interpretation can b e facilitated by connecting statistical ﬁndings to kno wn pathw ays and functional annotations. 8.3 F uture directions Sev eral promising directions exist for extending and improving our Bay esian nonparametric frame- w ork for genetic in teraction analysis. First, high-dimensional extensions are needed to make the metho ds scalable to whole-genome data. Current implementations fo cus on gene-level analysis with selected genes, but applications to genome-wide data would require further computational enhancemen ts. Developing sparse versions of our mo dels that encourage parsimon y in in teraction structures could also impro ve scalabilit y while maintaining interpretabilit y . Second, integration with functional genomic data could enhance biological interpretation and statistical p o w er. Incorp orating information from gene expression, meth ylation, chromatin ac- cessibilit y , or other omics la yers could help prioritize genes and interactions based on functional relev ance. Hierarc hical mo dels that join tly analyze m ultiple data t yp es could reveal connections b et w een genetic v ariation, regulatory mechanisms, and phenot ypic outcomes. Suc h integrativ e ap- proac hes align with systems biology p ersp ectives that view biological pro cesses as interconnected net works rather than isolated components. Third, longitudinal mo deling approaches could capture dynamics in genetic and environmen tal factors ov er time. Man y complex diseases dev elop through pro cesses that unfold ov er years or decades, with genetic risk factors p otentially in teracting diﬀeren tly with environmen tal exp osures at diﬀeren t life stages. Extending our framew ork to handle repeated measures or time-to-ev ent data w ould enable in v estigation of ho w genetic interactions inﬂuence disease progression and timing. This could pro vide insigh ts in to critical p erio ds for interv ention and personalized risk prediction o ver the life course. F ourth, causal inference extensions could help distinguish correlation from causation in gene- en vironment in teractions. While observ ational data alone cannot establish causation, incorp orating 21 instrumen tal v ariables, Mendelian randomization principles, or structural equation modeling ap- proac hes could strengthen causal interpretations. Ba yesian methods are particularly well-suited for causal inference b ecause they naturally incorp orate uncertain t y ab out causal structures and mec hanisms. Dev eloping causal versions of our mo dels would enhance their utilit y for informing in terven tions and public health decisions. Additional future directions include extending the models to handle more complex p edigree or family data, incorporating spatial information for geographically structured p opulations, dev eloping online learning algorithms for streaming genetic data, and creating user-friendly soft w are pac k ages with graphical in terfaces for non-statisticians. Cross-disciplinary collab oration b et w een statisti- cians, geneticists, and biologists will b e essential for adv ancing these metho dological developmen ts while ensuring their relev ance to substan tive scientiﬁc questions. 8.4 Conclusion W e hav e synthesized a comprehensive Bay esian nonparametric framew ork for studying genetic inter- actions in case-con trol studies. Our HDP mo del represents a signiﬁcan t adv ance ov er previous ap- proac hes by pro viding a ﬂexible, nonparametric represen tation of gene-gene and gene-en vironment in teractions that accommo dates sub ject-sp eciﬁc eﬀects, population structure, and complex dep en- dence patterns. The three-lev el hierarch y with Dirich let pro cesses at each lev el creates a ric h dep endence structure that can capture ho w environmen tal factors mo dulate genetic interactions at m ultiple levels. Key inno v ations of our approach include: sub ject-sp eciﬁc rather than p opulation-wide gene- gene interactions, allo wing for heterogeneit y in how genes in teract across individuals; environmen tal mo dulation of dep endence structures at multiple lev els (individual, gene-group, and case-control), pro viding a nuanced representation of gene-en vironment interactions; automatic determination of p opulation structure through Dirichlet pro cess mixtures, av oiding the need to pre-sp ecify the num- b er of sub-p opulations; and eﬃcient computational implementation through parallel MCMC algo- rithms that lev erage the conditional indep endence structure of the mo del. Applications to my o cardial infarction data demonstrate how these metho ds can uncov er biolog- ical insigh ts not readily obtainable from standard approaches. W e identiﬁed signiﬁcan t gene-gene in teractions with a protective eﬀect, disco v ered sub ject-sp eciﬁc patterns of genetic risk, and re- v ealed how sex mo diﬁes genetic architecture. These ﬁndings illustrate the v alue of mo ving b eyond marginal asso ciation testing to consider interactions and p opulation structure in genetic epidemi- ology . The hierarc hical Dirichlet pro cess mo del represen ts a particularly ﬂexible extension that accom- mo dates heterogeneous and context-dependent gene-en vironment in teractions. By allowing sharing patterns to v ary across individuals, genes, and groups in a data-driv en manner, this mo del can disco ver complex relationships that migh t b e missed by parametric approaches. The application to m yocardial infarction data rev ealed diﬀerences in genetic clustering patterns and latent subgroups with distinct risk proﬁles, suggesting p oten tial etiological heterogeneity . While our metho ds require substan tial computational resources and statistical exp ertise, we b eliev e these barriers are diminishing as computing p o w er increases and statistical training im- pro ves. By making our implemen tation publicly av ailable and providing educational resources, w e hop e to facilitate wider adoption of Ba yesian nonparametric metho ds in genetic epidemiology . The increasing recognition of complexit y in genetic architectures–with interactions, heterogeneity , and con text-dep endence pla ying imp ortan t roles–creates a growing need for metho dological approaches that can address this complexit y in a principled manner. F uture metho dological developmen ts should fo cus on impro ving scalabilit y , integrating m ultiple 22 data types, extending to longitudinal settings, and strengthening causal interpretations. Substan- tiv e applications should explore diverse complex diseases and populations, p otentially revealing new biological insigh ts and therapeutic targets. As genetic data con tin ue to gro w in size and complexit y , Ba yesian nonparametric metho ds oﬀer a ﬂexible framework for disco very that resp ects the inheren t uncertain ty and in terdep endence in biological systems. Ac kno wledgmen ts W e thank the reviewer for constructive comments that improv ed this man uscript. W e also thank DeepSeek for some pro ofreading. Conﬂict of in terest The authors declare no conﬂicts of in terest. References Bhattac harjee, S., W ang, Z., Ciampa, J., Kraft, P ., Chano ck, S., Y u, K., and Chatterjee, N. (2010). Using Principal Comp onents of Genetic Variation for Robust and Pow erful Detection of Gene- Gene In teractions in Case-Con trol and Case-Only Studies. The Americ an Journal of Human Genetics , 86 , 331–342. Bhattac harya, D. and Bhattachary a, S. (2018). A Bay esian Semiparametric Approach to learning Ab out Gene-Gene Interactions in Case-Con trol Studies. Journal of Applie d Statistics , 45 , 1–23. Bhattac harya, D. and Bhattachary a, S. (2020). Eﬀects of Gene-Eevironment and Gene-Gene Inter- actions in Case-Control Studies: A Nov el Ba y esian Semiparametric Approach. Br azilian Journal of Pr ob ability and Statistics , 34 , 71–89. Bhattac harya, D. and Bhattac harya, S. (2024). Gene-Gene and Gene-En vironment Interactions in Case-Con trol Studies Based on Hierarchies of Diric hlet Pro cesses. Statistics and Applic ations , 22 , 327–360. Dey , K. K. and Bhattac harya, S. (2016). On Geometric Ergo dicity of Additiv e and Multiplicative Transformation based Mark ov Chain Monte Carlo in High Dimensions. Br azilian Journal of Pr ob ability and Statistics , 30 , 570–613. Also av ailable at “http://arxiv.org/pdf/1312.0915.p df ”. Dutta, S. and Bhattac hary a, S. (2014). Mark ov Chain Monte Carlo Based on De- terministic T ransformations. Statistic al Metho dolo gy , 16 , 100–116. Also av ailable at h ttp://arxiv.org/abs/1106.5850. Supplement a v ailable at Larson, N. B. and Schaid, D. J. (2013). A Kernel Regression Approach to Gene-Gene In teraction Detection for Case-Con trol Studies. Genetic Epidemiolo gy , 37 , 695–703. Ma jumdar, A., Bhattachary a, S., Basu, A., and Ghosh, S. (2013). A Nov el Bay esian Semiparametric Algorithm for Inferring Population Structure and Adjusting for Case-control Association Tests. Biometrics , 69 , 164–173. 23 Mukhopadh ya y , S. and Bhattac hary a, S. (2021). Bay esian MISE Conv ergence Rates of Mixture Mo dels Based on the Poly a Urn Mo del: Asymptotic Comparisons and Choice of Prior Param- eters. Statistics:.A Journal of The or etic al nd Applie d Statistics , 55 , 120–151. Also av ailable at h P apaspiliop oulos, O. and Rob erts, G. O. (2008). Retrosp ective Marko v Chain Mon te Carlo metho ds for Diric hlet Pro ce sses Hierarc hical Mo dels. Biometrika , 95 , 169–186. T eh, Y. W., Jordan, M. I., Beal, M. J., and Blei, D. M. (2006). Hierarchical Dirichlet Pro cesses. Journal of the A meric an Statistic al Asso ciation , 101 , 1566–1581. W ang, X., Elston, R. C., and Zh u, X. (2010). The Meaning of In teraction. Human Her e dity , 70 , 269–277. 24

Bayesian Nonparametrics for Gene-Gene and Gene-Environment Interactions in Case-Control Studies: A Synthesis and Extension

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment