A statistical perspective on higher-order interactions modeling

Catherine Matias 1,2,3*

1* Sorbonne Université, F-75005, Paris, France.
2 Université Paris Cité.
3 Laboratoire de Probabilités, Statistique et Modélisation, CNRS.

Corresponding author(s). E-mail(s): catherine.matias@math.cnrs.fr;

Abstract

Modeling higher-order interactions (HOI) has emerged as a crucial challenge in complex systems analysis, as many phenomena cannot be fully captured by pairwise relationships alone. Hypergraphs, which generalize graphs by allowing interactions among more than two entities, provide a powerful framework for representing such intricate dependencies. Adopting a statistical and probabilistic perspective on hypergraph modeling, we propose a guided tour through this emerging research area. We begin by illustrating the ubiquity of HOI in real-world systems, where interactions often involve groups of entities rather than isolated pairs. We then introduce the foundational concepts and notations of hypergraphs, discussing their descriptive statistics, graph-based representations, and the challenges associated with their complexity. We further explore a variety of statistical models for hypergraphs and address the critical task of node clustering. We conclude by outlining some open challenges in the field.

Keywords: hypergraph model, node clustering

1 Introduction

The growing interest in modeling higher-order interactions (HOI) arises from the acknowledgement that many phenomena are fundamentally more complex than what pairwise relationships alone can capture. While networks and their mathematical representation as graphs capture interactions between pairs of entities, HOI are inherently of a different nature, as they may involve the interaction of more than two elements.
Taking HOI into account offers a richer and more expressive way to model complex interactions across diverse fields, ranging from social network analysis (acknowledged early in Simmel 1902a,b) or co-authorship relations (Roy and Ravindran 2015) to ecological systems (Muyinda et al. 2020), neurosciences (Chelaru et al. 2021) or chemistry (Restrepo 2026), among others. Recent reviews on HOI include Battiston et al. (2020); Bick et al. (2023); Torres et al. (2021) and mostly focus on the complex systems point of view from physics. We choose to focus on the statistical modeling and probabilistic point of view of HOI (also adopted in Lee et al. 2025) and will mostly focus on hypergraphs.

What this review is not about. HOI analysis comes after a data-collection step, in which HOI could either be directly observed, or inferred from preliminary data (e.g. Lizotte et al. 2023). The construction or the inference of these HOI is not discussed here. Simplicial complexes (Bianconi 2021) are often presented as an alternative to hypergraphs for modeling HOI. These come with node positions in a topological space, a feature that could prove quite useful. However, valid structures impose a nestedness property, where every subset of interacting entities is assumed to be interacting. While this assumption may be appropriate e.g. for proximity interactions (see next section), in most applications it appears too restrictive (the co-authorship example being the most prominent situation where this assumption is not appropriate). Moreover, even in cases where this assumption might not be a strong constraint, one might question the appropriateness of introducing this supplementary information into the modeling (for instance because it might introduce additional noise). An interesting exploration of the level of simpliciality (i.e.
the inclusion structure) of HOI may be found in Landry et al. (2024). To keep our contribution relatively concise, neither dynamics on HOI nor temporal aspects of HOI will be covered here. Finally, Bayesian hypergraphs are probabilistic models where dependencies among a set of random variables are described by HOI, generalizing Bayesian networks (Javidian et al. 2020). This topic is thus not concerned with observed HOI, on which we focus here.

2 Examples of systems showing HOI

Borrowing from the approach of Holme in his review of temporal networks (Holme 2015), we start with a quick guided tour of dataset types and more generally of systems where HOI naturally occur. Interestingly, an important part of the systems cited by Holme appear to be HOI in their raw format, subsequently reduced to pairwise interactions. Rather than providing an exhaustive list of datasets or publications where HOI appear, we stress the potential ubiquity of this type of data. Notice also that any bipartite network naturally produces a HOI, or said differently, many HOI have been considered up to now as bipartite networks. We discuss in Section 3.2 the differences between these two approaches.

Social Sciences and Ethology.
Social interactions are of primary interest and motivated a vast majority of the modeling developments in network science. While dyadic interactions are the simplest, early acknowledgement of the role and importance of larger interactions appeared in the Sociology literature (Simmel 1902a,b). Humans and animals (separately) are the classical entity sets considered in social interactions. Now, most of these interactions are either sampled as raw HOI or may be naturally constructed from raw data in the same way as pairwise interactions were.
This is the case in particular for radio-frequency identification data, where individual positions are recorded and interactions occur between entities lying within a ball of a given radius, and for observations from the field where human or animal gatherings are recorded (footnote 1). Communications may include the classical email exchanges (with multiple receivers) or conference calls (between humans) as well as non-verbal group interactions (Webb et al. 2023). Scientific collaborations (e.g. co-authorship) are probably the most prominent example of HOI (e.g. Battiston et al. 2025; Roy and Ravindran 2015), and are also the perfect toy example to explain the difference between HOI and the clique of all pairwise interactions. Variants of these data include software development as published on web platforms (Schueller et al. 2022). A broader approach to collaboration involves systems where individuals form a HOI when they serve together on the same company board (Aksoy et al. 2020). In the same way, the now classical “Les Misérables” dataset describing how the characters from Victor Hugo’s novel interact in the different scenes of the book has been studied from the HOI point of view (Aksoy et al. 2020). The same can be done for actors playing in movies (e.g. with data extracted from the Internet Movie DataBase).

Natural Sciences.
Neuroscience and connectomics are an important source of HOI data inside the human brain, with recent approaches relying on functional magnetic resonance imaging data (Santoro et al. 2024) or electro- and magneto-encephalogram signals (Bilbao et al. 2026). HOI also appear when considering genetic disorders, with gene mutations implicated in a specific disease (Aksoy et al. 2020), or metabolic pathways where interacting entities are metabolites (Cervellini et al. 2026).
More generally, HOI are used in chemistry to describe components involved in a chemical reaction (Flamm et al. 2015). Ecology has seen a surge in attention towards HOI, around the idea that most pairwise interactions are in fact mediated by additional actors (Bimler and Mayfield 2023; Mayfield and Stouffer 2017).

3 Concepts, notation and representations of HOI

We start this section by providing the basic definitions around the concept of hypergraphs, which will be our standard representation of HOI. We then continue with graph representations of HOI, emphasizing their limitations.

Fig. 1 A hypergraph with 7 nodes and 3 hyperedges: $e_1 = \{1, 2, 3, 4\}$, $e_2 = \{5, 6, 7\}$ and $e_3 = \{3, 6\}$.

3.1 Hypergraphs

A hypergraph (Fig. 1), denoted $\mathcal{H} = (\mathcal{V}, \mathcal{E})$, comprises a set of (indistinguishable) nodes $\mathcal{V} = \{1, \dots, n\}$ and a set of hyperedges $\mathcal{E} \subset \mathcal{P}(\mathcal{V})$, where $\mathcal{P}(\mathcal{V})$ is the set of all subsets of $\mathcal{V}$. In other words, each hyperedge $e \in \mathcal{E}$ is a subset of nodes in $\mathcal{V}$ and represents an interaction between those entities. The order of $\mathcal{H}$ is its number of nodes $|\mathcal{V}| = n$, while its size is its number of hyperedges $|\mathcal{E}| = M$. The simplest hypergraphs are binary (hyperedges record the presence/absence of interactions between subsets of nodes) but may be generalized to multiple (or weighted) interactions. Then the hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E}, w)$ comes with a weight function $w : \mathcal{P}(\mathcal{V}) \to \mathbb{N} \cup \{0\}$ such that $w(e) = 0$ for all $e \notin \mathcal{E}$, and $w(e) \in \mathbb{N}^{\star}$ otherwise. The weight counts how many times a hyperedge appears in the hypergraph. Multiple hypergraphs can be viewed as hypergraphs where the set of hyperedges $\mathcal{E}$ is allowed to be a multiset (i.e. some hyperedges may appear several times).
A binary hypergraph is a particular case of a weighted hypergraph with weight function being the indicator function $w(e) = \mathbb{1}\{e \in \mathcal{E}\}$ (i.e., each hyperedge has multiplicity 1). The incidence matrix $H$ of the hypergraph has dimension $|\mathcal{V}| \times |\mathcal{E}|$ and entries $H(v, e) = \mathbb{1}\{v \in e\}$. A hypergraph is said to be $s$-uniform if it only contains hyperedges of cardinality $s$ (also called the hyperedge size), in which case it can be represented through a tensor $A$ indexed by $\mathcal{V}^s$, with dimension $s$ and entry indexed by $(i_1, \dots, i_s)$ given by $\mathbb{1}\{\{i_1, \dots, i_s\} \in \mathcal{E}\}$. A graph is a particular case of a 2-uniform hypergraph with $A$ being the classical adjacency matrix. Sometimes hyperedges are allowed to be multisets, in which case a same node may be involved several times (i.e. with some multiplicity) in a same hyperedge. We call these multiset hypergraphs. For example a self-loop $\{v, v\} \in \mathcal{E}$ is a multiset hyperedge with size 2.

Footnote 1: see e.g. https://sociopatterns.org/

Descriptive statistics on hypergraphs.
Some of the concepts introduced to describe graphs find a direct generalization in hypergraphs, while others, because of the increased complexity of hypergraphs versus graphs, induce more variability in their definitions. This is the case for the density. A basic definition would simply count the number of hyperedges divided by the maximum number of such, thus introducing

$$ d(\mathcal{H}) = \frac{|\mathcal{E}|}{\sum_{s=2}^{s_{\max}} \binom{n}{s}}, $$

where $s_{\max}$ is the largest hyperedge size observed. Note that such a definition implicitly assumes that hyperedges of size larger than $s_{\max}$ are impossible.
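As a small illustration of the incidence matrix and of the basic density just defined, the following sketch encodes the toy hypergraph of Fig. 1 in plain Python. The function names `incidence_matrix` and `basic_density` are illustrative choices made here, not notation from the text, and the representation of hyperedges as `frozenset` objects is only one convenient assumption.

```python
from math import comb

# Toy hypergraph of Fig. 1: 7 nodes and 3 hyperedges.
nodes = list(range(1, 8))
hyperedges = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7}), frozenset({3, 6})]


def incidence_matrix(nodes, hyperedges):
    """Return the |V| x |E| matrix with entry 1 when node v belongs to hyperedge e."""
    return [[1 if v in e else 0 for e in hyperedges] for v in nodes]


def basic_density(n, hyperedges):
    """d(H) = |E| / sum_{s=2}^{s_max} C(n, s), with s_max the largest observed size."""
    s_max = max(len(e) for e in hyperedges)
    return len(hyperedges) / sum(comb(n, s) for s in range(2, s_max + 1))


H = incidence_matrix(nodes, hyperedges)
d = basic_density(len(nodes), hyperedges)
print(H[2])  # row of node 3, which belongs to e1 and e3
print(d)     # 3 hyperedges out of 21 + 35 + 35 = 91 candidates
```

With $s_{\max} = 4$ observed here, the denominator only counts subsets of sizes 2 to 4, which is exactly the implicit assumption pointed out above.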
A more refined definition would consider that each hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ is a collection of $s$-uniform hypergraphs $\mathcal{H}_s = (\mathcal{V}, \mathcal{E}_s)$ over a common set of nodes $\mathcal{V}$, thus introducing the sequence $(d_s = d_s(\mathcal{H}_s))_{s \ge 2}$ of the frequencies of hyperedges with size $s$, namely

$$ d_s(\mathcal{H}_s) = \frac{|\mathcal{E}_s|}{\binom{n}{s}}. $$

Other variants could be designed, relying on averages over hyperedge sizes and measuring slightly different characteristics of the data.

In contrast to this flexibility and variety, the degree of a node is simply the number of hyperedges it belongs to: $\deg_{\mathcal{H}}(v) = \sum_{e \in \mathcal{E}} \mathbb{1}\{v \in e\}$; while the size of a hyperedge $e$ is the number of nodes it contains: $|e| = \sum_{v \in \mathcal{V}} \mathbb{1}\{v \in e\}$. Node degrees (resp. hyperedge sizes) correspond to the row (resp. column) sums of the incidence matrix $H$. A weighted version with entries $H(v, e) = w(e)\,\mathbb{1}\{v \in e\}$ gives rise to weighted node degrees obtained as row sums, and weighted hyperedge sizes obtained as column sums.

Centrality measures rely on the notion of paths and describe the propensity of a node (or an interaction) to be such that any information flow passing between 2 random nodes in the system will (frequently) pass through that node (or interaction). A $k$-path is a (finite) sequence of hyperedges where 2 successive elements share at least $k$ common nodes, with $k = 1$ being the weakest notion (in force in the context of graphs). Note that introducing a width overlap $k$ is crucial to capture the higher-order aspect of those structures. This further gives rise to $k$-distances between two nodes, defined by the smallest length of any $k$-path between them (Aksoy et al. 2020). Centrality measures can then be defined from these distances.

In the graph statistics literature, an important role is played by the concepts of transitivity or clustering measures.
These are inherently based on the notion of pairwise interactions, as they quantify the propensity that “a friend of your friend is your friend”. Such concepts do not have a natural generalization in the hypergraph world (though some tentative definitions exist, see e.g. Kim et al. 2023). Nonetheless these quantities are also linked with the frequency of “triangles” (i.e. cycles with length 3) and, moving to the more general concept of motif frequencies, one may naturally generalize these to the hypergraph context, with the only limitation of the increasing complexity in the variety of motifs (Juul et al. 2024; Lotito et al. 2022).

Large-scale hypergraphs characteristics.
Whereas in the early 2000s a large body of literature explored the characteristics of real graphs on a large scale, leading to the formulation of general laws such as the scale-free degree distribution or the small-world property, such large-scale exploration of hypergraphs has received little attention up to now. This could be due either to the computational complexity of these data or to a potentially larger diversity of the structures that would prevent the emergence of general rules. On a moderate scale, we mention that Do et al. (2020); Lee et al. (2021) have explored the characteristics of thirteen real-world hypergraphs from various domains, with a focus on the overlaps of hyperedges for the latter reference.

Fig. 2 Graph representations of the hypergraph from Fig. 1. (a) Clique graph; (b) Line graph; (c) Bipartite graph.

Complexity.
While the number of possible edges in a graph grows quadratically with the number of nodes, the number of possible hyperedges in a hypergraph grows exponentially with that number. Indeed, a (simple) hypergraph with $n$ nodes may contain at most $\sum_{s=2}^{n} \binom{n}{s} = 2^n - n - 1$ hyperedges.
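The closed form for the maximal number of hyperedges follows from the binomial theorem. The short derivation below spells it out; the numerical illustration with $n = 7$ (the order of the toy hypergraph of Fig. 1) is an example chosen here.

```latex
\sum_{s=0}^{n} \binom{n}{s} = (1+1)^n = 2^n
\quad\Longrightarrow\quad
\sum_{s=2}^{n} \binom{n}{s}
  = 2^n - \binom{n}{0} - \binom{n}{1}
  = 2^n - 1 - n .
% Example: n = 7 gives 2^7 - 7 - 1 = 120 candidate hyperedges,
% against only \binom{7}{2} = 21 candidate edges in a graph.
```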
This raises nontrivial challenges from the statistical inference point of view, and one possible approach to addressing this issue is mentioned in Section 4.3 when discussing the work by Fritz et al. (2026).

3.2 Graph representations

Due to their complexity, it is tempting to reduce hypergraphs to simpler objects such as graphs (see Fig. 2 for an illustration), which are easier to handle. However, this comes at the cost of either losing information or relaxing some constraints, as we now explain.

Clique graph.
The clique graph of a hypergraph (also called 2-section, clique expansion or clique reduction) has the same set of nodes, and edges between nodes that share a hyperedge. Each hyperedge $e \in \mathcal{E}$ in the hypergraph is in fact reduced into a complete clique in the graph. A weighted version can also be used, transferring partial information about the hyperedge sizes to the projected graph. In any case, this naive representation loses a lot of information and it is impossible to reconstruct hyperedges from the clique graph.

Line graph.
The line graph of a hypergraph has vertices corresponding to the hyperedges of that hypergraph, and edges between overlapping hyperedges (i.e. that share at least one node). Again, this representation loses information (about how many and which nodes are shared) and (unique) recovery of a hypergraph from its line graph is not possible. The line graph is mostly used to summarize adjacency relations between hyperedges (two hyperedges being adjacent when they share a node).

[Figure panels: bipartite graphs space, (a) and (d); hypergraphs space, (b) and (c).]
Fig. 3 (a) A bipartite graph $G$; (b) Projection of $G$ into the space of multiset hypergraphs with self-loops, choosing the top nodes of $G$ as the new set of nodes. Hyperedges are $\{1, 2\}$, $\{1\}$, $\{1, 2, 3\}$ and $\{1, 2\}$.
The applications from (a) to (b) are invertible bijections, one being the inverse of the other; (c) Projection of $G$ on the simple hypergraphs subspace: the multiplicity of hyperedge $\{1, 2\}$ and the self-loop $\{1\}$ have been removed. (d) Embedding of the simple hypergraph from (c) in the bipartite graphs space. Note that (a) and (d) are not the same bipartite graph.

Bipartite graph.
A more elaborate graph representation of a hypergraph consists in considering its bipartite representation (or star-expansion graph), in which the nodes of the hypergraph form a first part, while the set of hyperedges forms the second part. An edge in the bipartite graph is drawn from an original node to an original hyperedge (now a node of the second part) whenever the node belongs to that hyperedge in the hypergraph. Under some conditions, this is a lossless process. More precisely, given a (simple) bipartite graph and the choice of one part as the original set of nodes, one can reconstruct a unique (multiset) hypergraph over this set of nodes, which may possibly contain multiple hyperedges and self-loops (see Fig. 3). In other words, bipartite graphs may be embedded into a general space of hypergraphs and simple hypergraphs may be projected into bipartite graphs.

4 Statistical models of hypergraphs

4.1 Randomness is in the hyperedge: limitations with models on bipartite graphs

From the previous section, it seems natural to use bipartite graph models in order to derive hypergraph models. However, this might be done only at some additional cost, as we now explain.

Random graph models always consider the set of nodes $\mathcal{V}$ as deterministic and focus on the randomness in the links, aka the edges in the graph.
In particular, the number of such links is most often random, except for the Erdős-Rényi variant $G(n, M)$, where the number $M$ of edges is fixed and their locations (among the $\binom{n}{2}$ pairs of nodes) are random. This variant is asymptotically equivalent to the $G(n, p)$ one (where all edges appear independently with probability $p$) in that if $M = M_n$ and $p \in (0, 1)$ satisfy $\left| M_n - \binom{n}{2} p \right| = O\big(n \sqrt{p(1-p)}\big)$, then if the probability of an event $E$ tends to some $c \in [0, 1]$ under the distribution $G(n, M_n)$, it also converges to the same value under the distribution $G(n, p)$ (see Łuczak 1990).

Now, a statistical model over the bipartite representation of a hypergraph will also always consider a fixed set of bipartite nodes, resulting in a fixed number of hyperedges in the hypergraph. The only randomness we can get lies in which nodes are involved in each of the $M$ interactions, corresponding to the randomness in the formation of the links in the bipartite graph. Contrary to the $G(n, M)$ case, fixing the number of hyperedges in a model does not in general lead to an asymptotically equivalent reformulation of another model with a random number of hyperedges. As a consequence, hypergraph models derived from bipartite graphs are not the most general. For instance, a hypergraph stochastic blockmodel (SBM, see Section 5 below) is more general than the corresponding bipartite SBM formulation, as the latter imposes a group structure on the hyperedges (see Section A3 in the Supp. Mat. of Brusa and Matias 2024).

4.2 Uniformly random, configuration, and preferential attachment models

Generalizing the Erdős-Rényi random graph model yields uniformly random hypergraphs. This approach involves uniformly sampling from the set of all $s$-uniform hypergraphs defined over a set of $n$ nodes.
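To make the two Erdős-Rényi flavours concrete in the $s$-uniform setting, here is a minimal sketch written for this purpose. It assumes small $n$, since it enumerates all $\binom{n}{s}$ candidate hyperedges; the function names `sample_fixed_size` and `sample_bernoulli` are illustrative, not taken from any cited work. Choosing $p = 1/2$ in the second function amounts to drawing uniformly from the set of all $s$-uniform hypergraphs on $n$ nodes.

```python
import itertools
import random
from math import comb


def sample_fixed_size(n, s, m, rng):
    """G(n, M)-like: draw m distinct hyperedges of cardinality s uniformly at random."""
    candidates = list(itertools.combinations(range(1, n + 1), s))
    return {frozenset(e) for e in rng.sample(candidates, m)}


def sample_bernoulli(n, s, p, rng):
    """G(n, p)-like: keep each candidate hyperedge independently with probability p."""
    return {
        frozenset(e)
        for e in itertools.combinations(range(1, n + 1), s)
        if rng.random() < p
    }


rng = random.Random(0)
fixed = sample_fixed_size(7, 3, 5, rng)     # exactly 5 hyperedges of size 3
uniform = sample_bernoulli(7, 3, 0.5, rng)  # uniform over all 3-uniform hypergraphs
print(len(fixed), len(uniform), comb(7, 3))
```

In the first sampler the number of hyperedges is fixed and only their locations are random, whereas in the second one the number of hyperedges is itself random, which mirrors the distinction discussed in Section 4.1.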
However, much like the Erdős-Rényi model for graphs, this hypergraph model is overly simplistic and homogeneous for meaningful statistical analysis of real-world datasets.

The exponential random graph approach led to the proposal of a $\beta$-model for hypergraphs (Stasi et al. 2014), where hyperedges occur independently, and the sufficient statistic of the model is the degree sequence (or node degrees specific to the hyperedge sizes). This model was further theoretically studied in Nandy and Bhattacharya (2024).

Configuration models for random graphs consist in uniformly sampling from the set of all possible graphs over $n$ nodes, while adhering to a prescribed degree sequence. For hypergraphs, these were first introduced by Ghoshal et al. (2009), focusing on tripartite and 3-uniform hypergraphs. Later, Chodrow (2020) extended this framework to the non-uniform case. In these works, both node degrees and hyperedge sizes remain fixed, a consequence of relying on bipartite representations of hypergraphs. The configuration model is particularly valuable for sampling graphs (resp. hypergraphs) that match the degree sequence (resp. and hyperedge sizes) of an observed dataset, typically via shuffling algorithms. As such, it is frequently employed as a null model in statistical analyses. However, exact sampling (as opposed to approximate sampling) from this model presents significant challenges, especially for hypergraphs (see Section 4 in Chodrow 2020, for more details).

Preferential attachment (PA) models have been proposed in Wang et al. (2010), where both the idea of hyperedge growth and hyperedge preferential attachment were introduced. Those ideas were later refined in Guo et al. (2016) and more recently in Jung et al. (2026).
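Returning to the shuffling algorithms mentioned for configuration models, the sketch below shows one generic way such a shuffle can preserve both node degrees and hyperedge sizes: a double-edge swap on the bipartite (node, hyperedge) incidences. This is an assumption-laden illustration, not the specific algorithm of Chodrow (2020); the name `shuffle_incidences` is hypothetical, the input is assumed to contain at least two hyperedges, and the chain only yields approximate samples.

```python
import random


def shuffle_incidences(hyperedges, n_swaps, rng):
    """Attempt n_swaps membership swaps between random pairs of hyperedges."""
    edges = [set(e) for e in hyperedges]
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        u = rng.choice(sorted(edges[i]))
        v = rng.choice(sorted(edges[j]))
        # Reject swaps that would put the same node twice in one hyperedge.
        if u in edges[j] or v in edges[i]:
            continue
        edges[i].remove(u)
        edges[i].add(v)
        edges[j].remove(v)
        edges[j].add(u)
    return [frozenset(e) for e in edges]


def degrees(hyperedges, nodes):
    """Number of hyperedges each node belongs to."""
    return {v: sum(v in e for e in hyperedges) for v in nodes}


original = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7}), frozenset({3, 6})]
shuffled = shuffle_incidences(original, 200, random.Random(42))
print(shuffled)
```

Each accepted swap moves node `u` from hyperedge `i` to `j` and node `v` the other way, so every degree and every size is unchanged. The sketch does not prevent two hyperedges from becoming identical, so the output may be a multiple hypergraph.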
Barthelemy (2022) proposes a very general formulation for the probability that a vertex belongs to an edge, which thus boils down to relying on the bipartite graph representation. His approach comprises Erdős-Rényi-like, configuration, (a sort of) PA, and random geometric models. The latter models are designed to be generative and are not readily amenable to statistical inference.

4.3 Latent space and block models

Latent space models (LSM) for hypergraphs raise the issue of constructing a proximity indicator or measure for a subset of more than 2 latent positions. Turnbull et al. propose a random geometric hypergraph model, where hyperedges form between nodes as soon as latent-position balls of some radius intersect. To avoid imposing a simplicial complex structure, the radii differ by node subset size, increasing with it and thus preventing automatic inclusion of smaller subsets. This deterministic framework is then augmented with a random step, though this introduces identifiability challenges, which are mitigated via prior distributions during inference. Lyu et al. (2023) proposed a tensor-based LSM, however limited to 3-uniform hypergraphs. Proximity measures for subsets of nodes may also rely on averages of their (relative) latent positions (e.g. arithmetic, geometric, Hölder, ...). This is the avenue pursued in Fritz et al. (2026), combined with a latent hyperbolic space, taking advantage of a more expressive geometry towards hierarchical and embedded structures. That work also contains a most promising tool for the statistical analysis of hypergraphs: a sample-to-population estimation procedure, which consists in replacing the model likelihood by an approximation where non-occurring interactions are only sampled while the occurring ones (the hyperedges) are all included.
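The general idea of keeping all observed hyperedges and only sampling the non-occurring ones can be sketched on a generic Bernoulli hypergraph likelihood. The code below is a simplified illustration under assumptions made here, not the estimator of Fritz et al. (2026): the probability function `prob` is a placeholder for any model (theirs relies on hyperbolic latent positions), hyperedges are assumed independent, every observed hyperedge is assumed to have a size listed in `sizes`, each size is assumed to have at least one non-occurring subset, and the names `exact_loglik` and `subsampled_loglik` are hypothetical.

```python
import itertools
import math
import random


def exact_loglik(n, sizes, hyperedges, prob):
    """Bernoulli log-likelihood summed over every candidate subset of the given sizes."""
    total = 0.0
    for s in sizes:
        for c in itertools.combinations(range(1, n + 1), s):
            e = frozenset(c)
            p = prob(e)
            total += math.log(p) if e in hyperedges else math.log(1.0 - p)
    return total


def subsampled_loglik(n, sizes, hyperedges, prob, m, rng):
    """Keep all hyperedges; estimate the non-edge part from m sampled non-edges per size."""
    total = sum(math.log(prob(e)) for e in hyperedges)
    for s in sizes:
        n_non = math.comb(n, s) - sum(1 for e in hyperedges if len(e) == s)
        drawn = []
        while len(drawn) < m:  # rejection sampling of non-occurring subsets
            c = frozenset(rng.sample(range(1, n + 1), s))
            if c not in hyperedges:
                drawn.append(c)
        # Reweight the sampled terms so that they stand for all n_non non-edges.
        total += (n_non / m) * sum(math.log(1.0 - prob(c)) for c in drawn)
    return total


observed = {frozenset({1, 2, 3, 4}), frozenset({5, 6, 7}), frozenset({3, 6})}
constant = lambda e: 0.1  # placeholder model: same probability for every subset
exact = exact_loglik(7, [2, 3, 4], observed, constant)
approx = subsampled_loglik(7, [2, 3, 4], observed, constant, 10, random.Random(0))
print(exact, approx)
```

The cost of the approximation grows with the number of hyperedges plus the number `m` of sampled non-edges per size, rather than with the number of candidate subsets. With the constant placeholder model the two quantities coincide up to rounding; with a non-constant `prob` the second one is a random approximation of the first.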
Block models will be discussed in Section 5, as their discrete latent space is directly linked to node clustering. We mention here the work by Ng and Murphy (2022) that proposes a mixture model on the hyperedges and is thus not linked to node clustering. Finally, Balasubramanian (2021) proposes a nonparametric hypergraphon model, but limited to the uniform case.

5 Node clustering on hypergraphs

What cluster types are we looking for?
The simplest type of cluster is a community, characterized in the context of graphs by groups of nodes that are strongly connected internally but weakly connected externally. The first question that arises is: What is a community in a hypergraph? One could consider that hyperedges constitute (overlapping) communities, in which case clusters are directly observed. A more refined definition would state that nodes that often share the same hyperedges form a community. A key challenge here is whether the size of those hyperedges should be taken into account or not. For instance, is there a community structure in the toy hypergraph from Fig. 1? There, node groups $\{1, 2, 3, 4\}$ and $\{5, 6, 7\}$ have as many internal as external hyperedges (one, respectively) but the sizes of the internal hyperedges are larger. In fact, a wide variety of definitions are possible, giving rise to equally diverse proposals in the literature.

Can we hope to detect them?
The node clustering issue is intimately linked to the existence of information-theoretic limits that prevent recovering or detecting those clusters. That question has been initially approached in the context of uniform hypergraphs, thereby limiting the scope of the results. Indeed, though hypergraphs may be seen as a collection of $s$-uniform hypergraphs for varying values of $s$, it is not necessary that all layers be informative to recover the underlying latent structure.
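Both the layer decomposition and the toy question raised about Fig. 1 can be checked by hand with a few lines of code. The sketch below is only an illustration: the helper names `layers` and `internal_external` are choices made here, and counting a hyperedge as "external" to a group when it touches the group without being contained in it is one possible convention among the many definitions mentioned above.

```python
from collections import defaultdict

# Toy hypergraph of Fig. 1 and the two candidate node groups discussed above.
hyperedges = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7}), frozenset({3, 6})]
groups = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7})]


def layers(hyperedges):
    """Group hyperedges by cardinality s, i.e. split the hypergraph into s-uniform layers."""
    by_size = defaultdict(list)
    for e in hyperedges:
        by_size[len(e)].append(e)
    return dict(by_size)


def internal_external(group, hyperedges):
    """Hyperedges contained in the group, and hyperedges touching it but leaving it."""
    touching = [e for e in hyperedges if e & group]
    internal = [e for e in touching if e <= group]
    external = [e for e in touching if not e <= group]
    return internal, external


by_size = layers(hyperedges)
counts = [tuple(len(part) for part in internal_external(g, hyperedges)) for g in groups]
print(sorted(by_size))  # sizes of the non-empty layers
print(counts)           # (internal, external) counts for each group
```

Each group has one internal and one external hyperedge, but the internal ones sit in the layers of sizes 4 and 3 while the external one sits in the layer of size 2, which is precisely why the answer depends on whether hyperedge sizes are taken into account.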
Non-uniform results in sparse hypergraphs include Zhen and Wang (2023), which contains convergence bounds for both the model parameters and the communities, and Dumitriu et al. (2025), which provides a weak consistency result on the communities when model parameters are known. More recently, Ruggeri et al. (2024) established a first detection result valid in a non-uniform hypergraph, however restricted to a particular setting where the probability of a hyperedge is expressed as the sum of pairwise probabilities. This dyadic restriction makes the model more similar to the graph setting.

Model-based approaches - SBM
Beyond communities, the blockmodel approaches simply define clusters as groups of nodes with same (conditional) interaction probabilities. Many proposals have emerged in the literature these past few years, together with degree-corrected variants (Ghoshdastidar and Dukkipati 2014; Chodrow et al. 2021; Yuan et al. 2022; Brusa and Matias 2024).

Other approaches.
Other approaches to node clustering in hypergraphs include modularity-based methods. Modularity definitions heavily rely on the definition of a community and various directions have been followed in that area. The reader will find a comparison of these methods in Poda and Matias (2024). An alternative is provided by spectral clustering. Most existing methods heavily rely on the (weighted) clique graph representation (Ghoshdastidar and Dukkipati 2017), at the cost of losing information, while others are either restricted to simplicial structures or uniform hypergraphs (del Genio 2025). Finally, some approaches based on random walks have been suggested (Swan and Zhan 2021).

6 Conclusions and next challenges

Scalability is certainly one of the most challenging issues in hypergraph modeling.
It comes in two ways: being able to handle potentially large hyperedge sizes on the one hand, and more generally large systems (in the number of individuals and interactions) on the other hand. Approximate inference is certainly a promising avenue in that direction, as initiated for instance in Fritz et al. (2026). Efficient software for statistical analysis needs to be developed, in line with existing libraries such as HyperNetX (Praggastis et al. 2024).

Impossibility results or phase transition thresholds for community detection in non-uniform hypergraphs seem difficult to obtain and are certainly one of the next challenges in this area. As already stressed, the uniform hypergraph results won’t help in that direction as not all layers need to be informative. Moreover, the only available threshold (Ruggeri et al. 2024) heavily relies on a dyadic type modeling assumption.

Synthetic benchmark data for community detection in hypergraphs are urgently needed. These should not rely on hypergraph SBM, so that the model-based methods are not favored in the comparison with the others. Some unconvincing proposals have been made (see the discussion in Poda and Matias 2024), and again, this raises the delicate question of community definition in the hypergraph context.

Acknowledgements. I deeply thank the organizers of the workshop “New Trends in Statistical Network Analysis” during which the idea of this special issue arose, namely Carsten Jentsch, Göran Kauermann and Alexander Kreiss, especially for fostering fruitful and engaging exchanges among all participants.
Statements and Declarations

• Funding: Not applicable
• Competing interests: Not applicable
• Ethics approval and consent to participate: Not applicable
• Consent for publication: Not applicable
• Data availability: Not applicable
• Materials availability: Not applicable
• Code availability: Not applicable
• Author contribution: Not applicable

References

Aksoy, S.G., Joslyn, C., Ortiz Marrero, C., Praggastis, B., Purvine, E.: Hypernetwork science via high-order hypergraph walks. EPJ Data Science 9(1), 16 (2020) https://doi.org/10.1140/epjds/s13688-020-00231-0

Balasubramanian, K.: Nonparametric modeling of higher-order interactions via hypergraphons. J. Mach. Learn. Res. 22(146), 1–35 (2021)

Barthelemy, M.: Class of models for random hypergraphs. Physical Review E 106(6), 064310 (2022) https://doi.org/10.1103/PhysRevE.106.064310

Bilbao, D., Aimar, H., Torterolo, P., Mateos, D.M.: Higher-order interaction analysis via hypergraph models for studying multidimensional neuroscience data. Biomedical Signal Processing and Control 112, 108564 (2026) https://doi.org/10.1016/j.bspc.2025.108564

Battiston, F., Cencetti, G., Iacopini, I., Latora, V., Lucas, M., Patania, A., Young, J.-G., Petri, G.: Networks beyond pairwise interactions: Structure and dynamics. Phys Rep 874, 1–92 (2020) https://doi.org/10.1016/j.physrep.2020.05.004

Battiston, F., Capraro, V., Karimi, F., Lehmann, S., Migliano, A.B., Sadekar, O., Sánchez, A., Perc, M.: Higher-order interactions shape collective human behaviour. Nature Human Behaviour 9(12), 2441–2457 (2025) https://doi.org/10.1038/s41562-025-02373-5

Bick, C., Gross, E., Harrington, H.A., Schaub, M.T.: What are higher-order networks? SIAM Review 65(3), 686–731 (2023) https://doi.org/10.1137/21M1414024

Bianconi, G.: Higher-Order Networks. Elements in the Structure and Dynamics of Complex Networks.
Ca m bridge Univ ersity P ress, Cam bridge (2021) Bimler, M.D., Ma yﬁeld, M.M.: Ecology: Lif ting the curtain on higher-order int eractions. Current Biology 33 (2), 77 –79 (2023) h ttps://doi.org/10.101 6/j.cub.2022.11.051 Brusa, L., Matias, C.: Model-based clustering in simple h yper g raphs through a s to chastic blockmodel. Scand. J. Stat. 51 (4), 1661 – 1684 (2024 ) h ttps://doi.org/10.111 1/sjos.1275 4 Chelaru, M.I., Ea gleman, S., Andrei, A.R., Milton, R., Kharas, N., Drag o i, V.: High-or der correlations explain the collective behavior of cortical p opula- tions in executive, but not sensory ar eas. Neuron 109 (24), 3 9 54–396 1 ( 2021) h ttps://doi.org/10.213 9/ssrn.3803 611 Chodrow, P .S.: Conﬁguration mo dels of random hypergraphs. J. Complex Netw orks 8 (3), 018 (202 0) h ttps://doi.org / 10.1002 /rsa.203 2 6 Cervell ini, M., Sinaimeri, B., Matias, C., Martino, A.: Comparing the abilit y o f embed- ding methods on metab olic hypergraphs for capturing taxonomy-based features. Algo Mol Biol (2026 ) Chodrow, P .S., V eldt, N., Benson, A.R.: Generative hypergraph clustering: F rom blo ckmodels to mo dularit y . Science Adv ances 7 (28), 1303 (2021) h ttps://doi.org/10.112 6/sciadv.abh1303 del Geni o, C.I.: Hyp ermo dularity and commu nit y detection in hypergra phs. Physical Review Research 7 (3), 033045 (202 5) https://doi.org/10.1103 /58dr- wktc Dumitriu, I., W a ng , H.-X., Zhu, Y.: Partial recov er y and weak consistency in the non-uniform hypergra ph sto chastic block mo del. Combinatorics, Probability and Computing 34 (1), 1–51 (20 25) h ttps://doi.org/1 0 .1017/S09 635483 24000166 Do, M.T., Y o on, S.-e., Hooi, B., Shin, K.: Structural Patterns a nd Generativ e Models of Real-world Hyp ergraphs. In: Pr o ceedings of the 26th AC M SIGKDD In ternational Conference on Knowledge Discov ery & Da ta Mining. KD D ’20, pp. 176–186 . Association for Computing Machinery , New Y ork, NY, USA (2020). 
h ttps://doi.org/10.114 5/3394 4 86.3403 060 Flamm, C., Stadler, B.M.R., Stadler, P .F.: Chapter 13 - generalized top ologies: Hyper - graphs, chemic al rea ctions, and biological evolut ion. In: Ba sak, S.C., Restrep o, 12 G., Villa v eces, J.L. (eds.) A dv ances in Mathematical Chem istry and Applica- tions (V ol 2), pp. 300–3 2 8. Bentham Science Publishers, Netherlands (2015). h ttps://doi.org/10.217 4/9781 6 81080529115020017 F ritz, C., Y uan, Y., Sch wein b erger, M. : Scalable Sample-to-Populati on Esti- mation of Hyperb olic Space Models for Hyp ergraphs. a rXiv (2026). h ttps://doi.org/10.485 50/arXiv.250 9 .07031 Ghoshdastidar, D ., Dukkipati, A.: Consistency of sp ectral partitioning of uniform h yperg raphs under plan ted partition mo del. In: A dv a nces in Neural Information Pro cessing Systems, v ol. 27 (2014) Ghoshdastidar, D., Dukkipati, A.: Consistency of sp ectral hypergra ph par ti- tioning under planted partition mo del. Ann. Stat. 45 (1), 289– 315 (2017) h ttps://doi.org/10.121 4/16- AOS1453 Ghoshal, G., Zlatić, V., Caldar elli, G., Newman, M.E.J.: Random h yper- graphs and their applications. Ph ys. Rev. E 79 , 066118 (2009) h ttps://doi.org/10.110 3/PhysRevE.79.066118 Guo, J.-L., Zhu, X.-Y., S uo, Q., F orrest, J.: Non-uniform Ev olving Hyp ergraphs and W eigh ted Evolving Hyperg raphs. Scientiﬁc Rep orts 6 (1), 3 6 648 (2016) h ttps://doi.org/10.103 8/srep3664 8 Holme, P .: Modern temporal netw ork theory: a collo quium. Eur. Phys. J. B 88 (9), 234 (2015) https://doi.org/10.1140 / ep jb/e2015- 60 657- 4 Juul, J.L., Benson, A.R ., Kleinberg, J.: Hypergra ph patterns and collaboration struc- ture. F ron tiers in Physics 11 (2024) h ttps://doi.org/10.3389 /fph y .2023.13 0 1994 Jung, H., Phoa, F.K.H., Kim, S.-H.: Preferent ial A ttac hment Hypergraph Mo del Wit h Randomized Hyp eredge Coun t and Size. 
IEEE T r ansactions on Net w ork Science a nd Engineering 13 , 5145–5157 (202 6) https://doi.org/10.110 9 /TNSE.2025.3643 452 Javidian, M.A., W ang, Z., Lu, L., V altorta, M.: On a hyperg r aph probabilistic graph- ical mo del. Annals of Mathematics and Artiﬁcial Intell igence 88 (9), 100 3–1033 (2020) h ttps://doi.org/1 0.1007/ s 10472- 0 2 0- 09701- 7 Kim, S., Bu, F., Cho e, M., Y o o , J., Shin, K.: Ho w tra nsitiv e are rea l- world group in teractions? - measur ement and repro duction. In: Pro ceedings o f the 29th ACM SIGKDD Conference on Knowl edge Discovery a nd Data Mining. KDD ’2 3 , pp. 1132–11 43. Asso ciation for Computing Machinery , New Y ork, NY, USA (202 3). h ttps://doi.org/10.114 5/3580 3 05.3599382 Lee, G., Bu, F., Eliassi-Rad, T., Shin, K .: A Surv ey on Hyp ergraph Mining: Pat- terns, T o o ls, and Generator s. AC M Comput. Surv. 57 (8), 203–1 20336 (2025) h ttps://doi.org/10.114 5/3719 0 02 13 Lee, G., Cho e, M., Shin, K.: How do hyperedges ov erlap in rea l-world h yp ergra phs? - patt erns, measures, and generators. In: Pro ceedings of the W eb Conference 2021. WWW ’21, pp. 339 6–3407 . Asso cia tion for Computi ng Mach inery , New Y ork, NY, USA (2021). h ttps://doi.org/1 0 .1145/3 4 42381.3 450010 Lotito, Q.F., Musciotto, F., Mon tresor, A., Battiston, F.: Hi gher-order motif a nalysis in hypergra phs. Communications Physics 5 (1), 79 (2022) h ttps://doi.org/10.103 8/s4200 5- 022- 00 858- 7 Łuczak, T.: On the equiv alence of tw o basic mo dels of random graphs. In: Karoński, M., Jaw ors ki, J., Rucinski, A. (eds.) Pro ceedings of Random Graphs’87, pp. 151 – 158. Wiley , Chic hester (1990) Lyu, Z., Xia, D., Zhang, Y. : Laten t Space Mo del for Higher-Order Net works a nd Gen- eralized T ensor Decomposition. Journal of Computational and Graphical Statistics 32 (4), 1320–1336 (20 2 3) h ttps://doi.org/10.108 0/1061 8 600.202 2.2164289 Lizotte, S., Y oung, J.-G., Allard, A.: Hyperg raph reconstruction fro m uncertain pairwise observ ations. 
Scien tiﬁc Reports 13 (1), 213 6 4 ( 2023) h ttps://doi.org/10.103 8/s4159 8- 023- 48 081- w Landry , N.W., Y oung, J.-G., Eikmeier, N.: The simplicialit y of higher-order net w orks. EPJ Data Scien ce 13 (1), 17 (2024) h ttps://doi.org/10.114 0/ep jds/s136 8 8- 024- 00 458- 1 Muyinda, N., De Baets, B., Rao, S.: Non-king elimination, intransitiv e triad interac- tions, and sp ecies co existence in ecological competition netw or ks. Theor Ecol 13 , 385–397 (2020) https://doi.org/10.100 7 /s1208 0 - 020- 00459- 6 Mayﬁ eld, M., Stouﬀer, D.: Higher-order in teractions capture unexplained complexit y in diverse comm unities. Nat Ecol Evol 1 , 00 62 (2017) h ttps://doi.org/10.103 8/s4155 9- 016- 00 62 Nandy , S., Bhattachary a , B.B.: Degree Heterogeneity in Higher-Order Netw orks : Infer- ence in the Hypergraph β -Mo del. IEEE T ransactions on Information Theor y 70 (8), 6000–60 24 (2024) h ttps://doi.org/10.1109 / TIT.2024.3411 523 Ng, T.L.J., Murph y , T.B.: Mo del-based clustering for random hypergr a phs. Adv Data Anal Classif 16 , 691–723 (202 2) https://doi.org/10.1007 /s1163 4 - 021- 00454- 7 Praggas tis, B., Akso y , S., Arendt , D., Bonicillo, M., Jo slyn, C., Purvine, E., Shapiro, M., Y un, J.Y. : HyperNetX: A Python pack age for modeling complex net - w ork data as hypergraphs. Journal of Op en Source Softw are 9 (95), 6016 (202 4) h ttps://doi.org/10.211 05/joss.060 1 6 Poda, V ., Matias, C.: Comparison of mo dularity-based approaches for nodes clustering in hyperg r aphs. Peer Comm unit y Journal 4 (202 4 ) 14 h ttps://doi.org/10.240 72/p cjournal.404 Ruggeri, N., Contisciani, M., Battist on, F. , Bacco, C.D.: Communit y detection in large h yp ergr aphs. Science A dv ances 9 (28), 9159 (2023) h ttps://doi.org/10.112 6/sciadv.adg915 9 Restrepo, G.: Higher o rder structu res in c hemistry: h yp ergr aphs reshape t he molecule and the reaction. 
Digital Discov ery (2026) https://doi.org/10.1039/D 5DD005 33G Ruggeri, N., Lonardi, A., De Ba cco , C.: Message-passing on hypergraphs: detectabilit y , phase transitions and higher-order information. Journal of Statistical Mec ha nics: Theory and Exp eriment 2024 (4), 04 3403 (20 24) h ttps://doi.org/10.108 8/1742 - 5 468/ad343b Roy , S., Ra vindran, B.: Measuring net work centralit y using h yp ergra phs. In: Pro ceed- ings of the Second ACM IKDD Conference on Data Sciences. CoDS ’15, pp. 59–68 (2015). h ttps://doi.org/1 0.1145/2 732587 .2732595 San toro, A., Battiston, F ., Lucas, M., P etri, G., A mico, E.: Higher-or der connectomics of human brain funct ion reveals local topological signatures of task deco ding, indi- vidual iden tiﬁcation, and b ehavior. Nature Communi cations 15 (1), 102 44 (202 4) h ttps://doi.org/10.103 8/s4146 7- 024- 54472- y Simmel, G.: The Num b er of Members as Determining the So ciological F orm of the Group. I. American Journal of Sociology 8 (1), 1–46 (1902) Simmel, G.: The Num b er of Members as Determining the So ciolo gical form of the Group. II. American Journal of So ciology 8 (2), 158–19 6 (1902) Stasi, D., Sadeghi, K., Rinaldo, A., Petro vić, S., Fien berg , S.E.: Beta mo dels for random hypergr aphs with a given degree sequence. In: Pr o ceedings of 21st In ter- national Conference on Computational Statistics. In ternational Statistical Institute (ISI), Genev a, Switzerland (2014) Sc h ueller, W., W ach s, J., Servedio, V.D.P ., Thurner, S., Loreto, V.: Ev olving col- labo ration, dependencies, and use in the Rust Op en Source So ft w are ecosystem. Scien tiﬁc Data 9 (1), 703 (20 22) h ttps://doi.org/10.103 8/s4159 7- 022- 01 819- z Sw an, M., Zhan, J.: Clustering h ypergr aphs via the MapEquation. IEE E Access 9 , 72377–7 2386 (2021) h ttps://doi.org /10.1109 /ACC ESS.2021.3075 6 21 T orres, L., Blevins, A.S., Bassett, D., Eliassi- Rad, T.: The why , how, and when of representations for complex systems. 
SIAM Rev 6 3 (3), 4 35–485 (202 1) h ttps://doi.org/10.113 7/20M135 5 896 T urnbu ll, K., Lunagómez, S., Nemeth, C., Airoldi, E.: Laten t Space Mo d- eling of Hyper graph Data. J. Amer. Stat. Asso c. 119 (548 ), 2 6 34–264 6 15 h ttps://doi.org/10.108 0/0162 1 459.202 3.2270750 W ebb, N., Giuliani, M., Lemaignan, S.: Sogrin: a non-verbal dataset of so cial group-level interactions. In: 20 23 32nd IEE E International Co nference on Robot and Human Interactiv e Communicati on (RO-MA N), pp. 2 632–26 37 (2023). h ttps://doi.org/10.110 9/RO- MAN57019.2023 .10309351 W ang, J.-W., Rong, L.-L., Deng, Q.-H., Zhang , J.-Y.: E volvi ng hypernet- w ork mo del. The Eur o pea n Physical Jo urnal B 77 (4), 493 – 498 (2010) h ttps://doi.org/10.114 0/ep jb/e2010 - 0 0297- 8 Y uan, M., Liu, R., F eng, Y., Shang, Z.: T esting commun it y s truc- ture for h yperg raphs. The Annals of Statistics 5 0 (1), 14 7–169 (2022) h ttps://doi.org/10.121 4/21- AOS2099 Zhen, Y., W ang, J.: Comm unity Detection in General Hypergraph Vi a Graph Em b ed- ding. Jo urnal of the American Statistical Asso ciation 118 (543), 16 20–162 9 (2023) h ttps://doi.org/10.108 0/0162 1 459.202 1.2002157 16
