Enes Causal Discovery


Authors: Alexis Kafantaris

March 2026

Abstract

This paper presents a neuro-evolutionary method for causal discovery. A mixture-of-experts (MoE) network is trained and evaluated against baseline causal discovery algorithms; the approach exploits the MoE's ability to learn both linear and nonlinear distributions of causal relationships among pairs of graph nodes and to classify them. More specifically, the Enes (Edge-node-edge similarity) program uses a mixture-of-experts neural net to determine the class of each edge-node-edge triplet from randomly generated nonlinear SEMs, so that training is carried out with respect to causal patterns. The model takes graphs as input and outputs predictions over triplet patterns, which together form the causal map for the problem; the penalties used are a DAG-enforcement penalty and a Pearson-correlation penalty. The architecture is evaluated against a linear Pearson-coefficient baseline as well as other state-of-the-art solutions. The primary focus is the Sachs dataset, but Michaelis-Menten dynamics are also used to assess efficiency and scalability. Finally, when compared with a statistical and a constraint-programming baseline, the model produces good results that indicate stable and efficient performance.

1 Introduction

This paper describes an architecture that addresses causal discovery with a neural net; the architecture is both robust and well performing. To begin with, causal discovery on the Sachs dataset is an interesting as well as classical problem. Causal discovery has many real-life applications, such as finding the ground truth in service-design processes. Nevertheless, it makes sense to start by generalizing from the Sachs data to other problems, given its complexity and nature.
Determining the model is also important, as there are various tools that solve observational causal discovery and are considered SOTA; most of them are not neural nets. In this particular instance a neural net is used, although it is rarely considered the best alternative due to the lack of formal training data. More specifically, causal discovery is either observational or interventional; here, using random noise seems to suffice. Moreover, the model is a mixture of experts, which has excellent modularizing capabilities. Given interventional data, the model would be trained to recognize patterns within a given context of interventions. That is not the case here: randomly generated data and some Michaelis-Menten (MM) dynamics have been used in order to identify the Sachs protein network and other MM-dynamics data. Furthermore, the model is used to classify the Sachs data, both real and artificial. The results are evaluated and, compared to some baselines, they are good. Lastly, the setup and the results are briefly discussed to explain the technical aspects of the venture. The paper is organized as follows: section two is the literature review; the method and the model are described in section three; the results are presented in section four; and section five is the conclusion.

2 Literature Review

There are many programs used for causal discovery, ranging from a simple Pearson coefficient [6], to more complicated programs [3], to even more sophisticated neural nets [10]. Each time, an attempt is made to determine the causal edges of a difficult and complicated graph [4]. The graph describes a protein network, a biological entity that is astutely organized by nature. Some forms of relationships emerge, i.e. causal relationships, and these can be further examined; moreover, causal relationships can be classified to determine which edges are related and which are not.
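The simple Pearson-coefficient baseline mentioned above can be sketched as scoring every node pair by the absolute value of its correlation. This is a minimal illustration, not the paper's implementation; the function name and the toy data are assumptions.

```python
import numpy as np

def pearson_edge_scores(X):
    """Score every node pair by absolute Pearson correlation.

    X: (n_samples, n_nodes) matrix of observational data.
    Returns a symmetric (n_nodes, n_nodes) score matrix with zero diagonal.
    """
    R = np.corrcoef(X, rowvar=False)  # pairwise Pearson coefficients
    S = np.abs(R)
    np.fill_diagonal(S, 0.0)          # ignore self-edges
    return S

# Toy data: node 1 depends linearly on node 0; node 2 is independent noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
X = np.column_stack([x0,
                     2.0 * x0 + 0.1 * rng.normal(size=500),
                     rng.normal(size=500)])
S = pearson_edge_scores(X)
```

Such a baseline only detects linear association and is symmetric, which is exactly why it is "aggressive" yet limited compared to a model that also captures nonlinear motifs.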
To achieve causal discovery, one needs to provide a graph as input and determine some relationships. The observational approach has been used, as the main focus is the Karen Sachs benchmark [4]. There are other objectives in mind, like discovering the silver-standard ground truth in service-design datasets, but these are secondary. In other words, this model does not know anything about the real data, so it relies on mathematical criteria to determine causal relationships among the nodes and edges. Additionally, many other approaches have been implemented, especially statistical algorithms like PC [9], or mathematical algorithms like NOTEARS [3]. From these and others, it seems that the best alternative is a fast tool with no induction base. Although there are machine-learning models and neural networks for causal discovery, presumably both interventional and observational [10], there are few specific systems [13][14]. Moreover, only recently [14] has the induction base been addressed, using the NOTEARS formula together with a neural net. One of the major issues that arise is the data; the saying "garbage in, garbage out" (GIGO) usually refers to a model trained on poor data. However, in the scheme of observational causal discovery, GIGO is unavoidable. For this reason a point is made to obtain an at least partially useful induction basis, since an induction basis has its merits [14]. Here, training a neuron for data similarity makes sense because, according to the original experiment [4], there are similar motifs to be found. Furthermore, scaling is an issue [3]; although the NOTEARS method works, it struggles due to the mathematical overhead. Methods like PC [9], on the other hand, lack an induction basis and hence scale, but can easily be confused due to their static nature.
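The NOTEARS formula referenced above is the standard acyclicity score of Zheng et al., h(W) = tr(e^{W∘W}) − d, which is zero exactly when the weighted adjacency matrix W encodes a DAG. A minimal sketch, approximating the matrix exponential with a truncated Taylor series (the function name and series length are illustrative choices):

```python
import numpy as np

def notears_h(W, terms=30):
    """NOTEARS acyclicity score h(W) = tr(exp(W ∘ W)) - d.

    h(W) == 0 iff W encodes a DAG. The matrix exponential is
    approximated by a truncated Taylor series sum_k M^k / k!.
    """
    d = W.shape[0]
    M = W * W            # Hadamard square keeps all entries non-negative
    E = np.eye(d)
    P = np.eye(d)
    for k in range(1, terms):
        P = P @ M / k    # running term M^k / k!
        E = E + P
    return float(np.trace(E) - d)

chain = np.array([[0., 1., 0.],
                  [0., 0., 1.],
                  [0., 0., 0.]])   # DAG: 0 -> 1 -> 2
cycle = np.array([[0., 1., 0.],
                  [0., 0., 1.],
                  [1., 0., 0.]])   # directed 3-cycle
```

The "mathematical overhead" noted above comes precisely from this matrix exponential, which costs O(d^3) per evaluation and must be recomputed throughout the optimization.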
Recently, the dagpa system [14] has bridged this gap: it uses the trace objective and can both scale and perform well on real datasets. A more thorough system, however, would be better: one that is explainable, where the MoE architecture modularizes everything, and that is completely transparent. The proposed architecture is a mixture of experts, which allows the model entities, such as causal relationships, to be further parameterized [5]. More specifically, an attempt is made to exploit a neural net, as implementing neurons poses a great challenge for this dataset. To explain, a simple and fast Pearson-coefficient linear model usually achieves good scores; that is an aggressive baseline which requires a really good model to overcome. Moreover, there are major limitations when it comes to causal discovery from observational data. Unlike the Sachs study [4], no interventions are used here, only prior knowledge; the most prohibitive limitation is that of the data, which is addressed below.

3 Methodology

As mentioned above, the proposed architecture is a physics-informed mixture of experts; a physics-informed network [12] is a network that uses physical constraints as its core learning mechanism. To achieve the objective, these constraints are merged into the training matrix. In addition, the network penalizes the constraints, as this was found to produce more stable and robust results [5]. There are three primary similarities which govern the physics of the network, namely cosine similarity, the Pearson coefficient, and an adjacency-matrix constraint. The model is therefore actively trying to minimize the similarities, in both a linear and a nonlinear fashion, over an edge-node-edge probe of the data. One of the key advantages of this method is the induction basis; induction allows faster inference and scalability.
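The three "physics" terms described above can be combined into a single fused penalty. The sketch below is one plausible reading of that description, not the paper's implementation: the function name, the weights `lam_*`, and the interpretation of cosine similarity as an expert-agreement term are all assumptions.

```python
import numpy as np

def fused_penalty(p_edge, r, feat_a, feat_b, h_dag,
                  lam_dag=1.0, lam_corr=0.5, lam_cos=0.1):
    """Illustrative fused "physics" penalty with three terms:

    p_edge : predicted edge probability for a triplet
    r      : Pearson coefficient of the two incident node series
    feat_a, feat_b : feature vectors from the nonlinear / linear experts
    h_dag  : acyclicity (adjacency-constraint) value for the current graph
    The lam_* weights are illustrative, not the paper's tuned values.
    """
    corr_pen = p_edge * (1.0 - abs(r))   # Pearson-correlation penalty
    cos = feat_a @ feat_b / (np.linalg.norm(feat_a) * np.linalg.norm(feat_b))
    cos_pen = 1.0 - cos                  # consistency between the two experts
    return lam_dag * h_dag + lam_corr * corr_pen + lam_cos * cos_pen
```

Treating the constraints as additive penalties (rather than hard constraints) is what makes the training objective a single differentiable loss, which matches the stability argument made above.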
However, there is a saying: garbage in, garbage out. Regarding causal discovery, one is confronted by GIGO when it comes to training Enes. On the one hand, using relevant data is like cheating, because the model then knows where the patterns come from, which is interventional [4]. On the other hand, using other data is confounded; at least that is how observational classification has to be. Therefore, shapes and forms are generated, and a middle ground of some MM kinetics is also employed [15]. These are not fully adopted, in order to maintain the unbiased perspective of the program.

Lastly, a case is made for stability during training: the program is already confronted by GIGO, there are multiple objectives at hand, and the fusion loss is complicated. To operate at peak capacity, stabilization during training would really help. For this reason, Langevin dynamics and simulated-annealing optimization [16] are employed. This way the chaotic gradient-descent landscape becomes more stable and more predictable, allowing for higher precision.

A classifying task is at hand. The model is set up and the hyper-parameters are optimized; the result is a fast and stable model called Enes. Predictions will now be examined for both the Sachs dataset and the MM kinetics, without changing the unbiased data and using the full MM generator. The neural-net architecture is drawn below.

[Figure 1 schematic: a raw observational triplet X_{i,j,k} ∈ R^{3×T} passes through shared feature extraction into Expert A (nonlinear) and Expert B (linear); an expert gating mechanism combines them as Score = w·A + (1−w)·B, followed by softmax classification into a causal-motif prediction. Side constraints: adjacency W^2 penalty, Pearson-correlation penalty P(edge)·(1−|r|), and cosine-similarity consistency cos(θ_{A,B}).]

Figure 1: Final architecture of the Enes model. The model comprises a dual-expert architecture together with graph metrics that are treated as the physics of the network.
It takes graph nodes as input and outputs causal-pattern classifications.

4 Results and Discussion

The Enes model classifies edge-node-edge similarity based on the GIGO data and the MM data it received during training; the GIGO data cannot get better without cheating. For the evaluation, the following metrics are defined.

Evaluation Metrics

TP = \sum_{i,j} I(\hat{W}_{i,j} = W_{i,j} \neq 0)

\mathrm{Prec} = \frac{TP}{\sum I(\hat{W}_{i,j} \neq 0)}, \qquad \mathrm{Rec} = \frac{TP}{\sum I(W_{i,j} \neq 0)}, \qquad F1 = \frac{2 \cdot \mathrm{Prec} \cdot \mathrm{Rec}}{\mathrm{Prec} + \mathrm{Rec}}

SHD = \sum_{i,j} I\!\left( I(\hat{W}_{i,j} \neq 0) \neq I(W_{i,j} \neq 0) \right)
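The metrics above can be computed directly from a predicted and a ground-truth adjacency matrix. A minimal sketch (function name and toy matrices are illustrative):

```python
import numpy as np

def structural_metrics(W_hat, W_true):
    """Precision, recall, F1 and structural Hamming distance (SHD)
    between predicted and ground-truth adjacency matrices,
    where any nonzero entry counts as an edge."""
    pred = W_hat != 0
    true = W_true != 0
    tp = int(np.sum(pred & true))
    prec = tp / max(int(pred.sum()), 1)
    rec = tp / max(int(true.sum()), 1)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    shd = int(np.sum(pred != true))   # entry-wise edge disagreements
    return prec, rec, f1, shd

W_true = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
W_hat  = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])  # one spurious edge
prec, rec, f1, shd = structural_metrics(W_hat, W_true)
```

In the toy example, both true edges are recovered but one extra edge is predicted, so recall is 1, precision is 2/3, and SHD is 1.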
