A federated learning framework with knowledge graph and temporal transformer for early sepsis prediction in multi-center ICUs

Yue Chang*#a, Guangsen Lin b#, Jyun Jie Chuang c, Shunqi Liu d, Xinkui Li a, Yaozheng Li e

a Chengdu Medical College, Chengdu, Sichuan 610500, China; b Kunming Medical University, Kunming, Yunnan 650500, China; c National Yang Ming Chiao Tung University, Taipei, Taiwan 900, China; d University of Southern California, Los Angeles, CA 90089, USA; e Yanshan University, Qinhuangdao, Hebei 066004, China

# Yue Chang and Guangsen Lin contributed equally to this work.
* chang15303225243@qq.com

ABSTRACT

The early prediction of sepsis in intensive care unit (ICU) patients is crucial for improving survival rates. However, the development of accurate predictive models is hampered by data fragmentation across healthcare institutions and the complex, temporal nature of medical data, all under stringent privacy constraints. To address these challenges, we propose a novel framework that uniquely integrates federated learning (FL) with a medical knowledge graph and a temporal transformer model, enhanced by meta-learning capabilities. Our approach enables collaborative model training across multiple hospitals without sharing raw patient data, thereby preserving privacy. The model leverages a knowledge graph to incorporate structured medical relationships and employs a temporal transformer to capture long-range dependencies in clinical time-series data. A model-agnostic meta-learning (MAML) strategy is further incorporated to facilitate rapid adaptation of the global model to local data distributions.
Evaluated on the MIMIC-IV and eICU datasets, our method achieves an area under the curve (AUC) of 0.956, which represents a 22.4% improvement over conventional centralized models and a 12.7% improvement over standard federated learning, demonstrating strong predictive capability for sepsis. This work presents a reliable and privacy-preserving solution for multi-center collaborative early warning of sepsis.

Keywords: Federated Learning, Knowledge Graph, Temporal Transformer, Meta-Learning, Sepsis Prediction, Multi-Center ICU

1. INTRODUCTION

Sepsis, a life-threatening organ dysfunction triggered by a dysregulated host response to infection, remains a leading cause of mortality in ICUs worldwide. The effectiveness of treatment is highly time-sensitive, with each hour of delay significantly increasing mortality risk. Consequently, the development of accurate and timely predictive models is of paramount importance. However, this endeavor faces two major obstacles: the fragmentation of patient data across multiple healthcare institutions, creating "data silos," and the stringent privacy regulations that govern medical records. While centralized models that pool data from all sources can offer high performance, they often violate privacy laws and ethical guidelines. Federated learning has emerged as a promising alternative, enabling collaborative model training without centralizing raw data. Despite its potential, standard FL approaches frequently fall short in capturing the rich, structured relationships between clinical concepts and the complex temporal dynamics inherent in ICU data streams, which are critical for accurate sepsis prediction.

Recent research has begun to explore the integration of advanced artificial intelligence techniques to address these limitations.
For instance, Federated Learning (FL) addresses the privacy and legal barriers to medical data sharing through a distributed training paradigm. In the ICU setting, multi-center collaboration is crucial for developing robust AI models; however, patient data are often restricted by privacy regulations and cannot be centralized [1][2][3]. FL enables institutions to collaboratively train models without sharing raw data, for instance, by exchanging model parameters or encrypted intermediate results [4][5][6]. For example, the EXAM model, an AI model developed jointly by 20 institutions worldwide using FL, successfully predicted future oxygen requirements in COVID-19 patients based on electronic health records and chest radiographs [7][8]. Nevertheless, heterogeneity in ICU data distributions, such as variations in equipment or case mix across hospitals, can lead to degraded performance in FL models, necessitating optimized aggregation algorithms [9]. Most FL studies suffer from methodological limitations, including inadequate privacy guarantees and high communication costs, or exhibit generalization issues, with only a minority demonstrating potential for clinical translation [10].

In this paper, we propose a unified framework that synergistically combines federated learning, medical knowledge graphs, temporal transformers, and meta-learning to address the critical challenge of early sepsis prediction across multiple ICUs. The novelty of our approach lies in the comprehensive integration of these four components, creating a holistic system that addresses data privacy, clinical semantics, temporal dynamics, and institutional heterogeneity simultaneously.
Our work makes three primary contributions: (1) We design a novel system architecture that enriches patient representations by dynamically constructing and integrating information from a medical knowledge graph; (2) We develop a hybrid prediction model comprising a temporal transformer for long-range dependency capture and a graph attention network for knowledge fusion, enhanced with a first-order meta-learning strategy for rapid personalization to local hospital data; (3) We establish and rigorously evaluate a comprehensive federated learning system that incorporates differential privacy guarantees, demonstrating significant performance improvements over strong baselines while maintaining robust privacy protection in a multi-center setup.

2. RESEARCH FOUNDATION

2.1 Knowledge Graph in Medical Informatics

Medical knowledge graphs provide a structured framework for representing healthcare knowledge by modeling entities (such as diseases, symptoms, medications, and laboratory tests) and the relationships between them. In our framework, we construct a sepsis-oriented knowledge graph by integrating concepts from established medical ontologies, including SNOMED CT, ICD-10, and the Human Phenotype Ontology. Formally, we define the knowledge graph as $G = (E, R, T)$, where $E$ is the set of medical entities, $R$ is the set of relationship types, and $T \subseteq E \times R \times E$ is the set of factual triples. We utilize the TransE embedding model to learn distributed vector representations for entities and relations by optimizing the scoring function:

$$f(h, r, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert \tag{1}$$

where $\mathbf{h}$, $\mathbf{r}$, $\mathbf{t}$ denote the embeddings of the head entity, relation, and tail entity, respectively.
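To make Eq. (1) concrete, the following minimal sketch scores two triples with TransE-style embeddings. The entity and relation names, the toy 3-dimensional vectors, and the choice of the L2 norm are illustrative assumptions; they are not the embeddings or settings used in this work.

```python
import math


def transe_score(h, r, t):
    """Return ||h + r - t|| (L2 norm assumed); lower means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


# Hypothetical toy embeddings, not learned from any real ontology.
emb = {
    "sepsis": [0.9, 0.1, 0.0],
    "elevated_lactate": [1.0, 0.6, 0.1],
    "fracture": [-0.8, 0.2, 0.7],
}
rel = {"has_finding": [0.1, 0.5, 0.1]}

# A triple whose tail lies close to head + relation gets a small distance.
plausible = transe_score(emb["sepsis"], rel["has_finding"], emb["elevated_lactate"])
implausible = transe_score(emb["sepsis"], rel["has_finding"], emb["fracture"])
```

Under these toy values the first triple scores close to zero and the second scores much higher, which is the ordering a trained TransE model is meant to produce for true versus false facts.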
These pre-trained KG embeddings serve as a semantic feature repository, which is subsequently integrated with patient-specific clinical data to create a more informed and context-aware representation. We specifically chose to incorporate knowledge graphs because they enhance model interpretability and help capture complex clinical relationships that are often heterogeneous across different medical institutions.

2.2 Temporal Modeling in Clinical Time-Series

Data acquired in the ICU typically consists of multivariate time-series characterized by irregular sampling rates and frequent missing values, presenting challenges for traditional recurrent neural networks in capturing long-range dependencies. Transformer architectures, with their self-attention mechanisms, have proven particularly effective in modeling such complex temporal patterns. The core self-attention function is defined as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \tag{2}$$

where $Q$, $K$, and $V$ represent the query, key, and value matrices derived from the input sequence, and $d_k$ is the key dimension. We selected transformers over traditional RNNs because they consistently demonstrate superior performance in capturing long-range dependencies in ICU time-series data, effectively addressing the limitation of vanishing gradients that plagues RNN architectures. To accommodate the irregular time intervals present in clinical data, we implement learnable temporal encodings. These encodings adapt the standard sinusoidal positional encodings by explicitly incorporating the time delta (Δt) since the preceding measurement, thereby endowing the model with an explicit awareness of the passage of time between observations.
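One simple way to picture a time-aware encoding is to evaluate the usual sinusoidal functions at the elapsed time rather than at the integer position. The sketch below does only that; the function name, the dimension of 8, the base period of 10000, and the use of hours as the time unit are assumptions made for illustration, and the learnable component described above is deliberately left out.

```python
import math


def time_encoding(deltas_hours, d_model=8, max_period=10000.0):
    """Sinusoidal encoding evaluated at elapsed time instead of index.

    deltas_hours[i] is the gap between observation i and the previous one.
    d_model is assumed to be even.
    """
    encodings, elapsed = [], 0.0
    for dt in deltas_hours:
        elapsed += dt  # cumulative time since the first observation
        row = []
        for i in range(0, d_model, 2):
            freq = 1.0 / (max_period ** (i / d_model))
            row.extend([math.sin(elapsed * freq), math.cos(elapsed * freq)])
        encodings.append(row)
    return encodings


# Hourly sampling reproduces the standard positional encoding at 0, 1, 2, 3.
regular = time_encoding([0.0, 1.0, 1.0, 1.0])
# Irregular sampling: the same four steps now carry different encodings.
irregular = time_encoding([0.0, 0.25, 4.0, 0.5])
```

With regular hourly gaps the result coincides with the standard encoding, while a long gap (here 4 hours) moves the encoding further along each sinusoid, so two adjacent rows no longer look "one step apart" to the attention layers.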
2.3 Federated Learning with Meta-Learning

Federated learning is a distributed machine learning paradigm that enables model training across multiple decentralized data holders without exchanging their raw data. The foundational Federated Averaging (FedAvg) algorithm aggregates model updates from participating clients as follows:

$$w_{t+1} = \sum_{k} \frac{n_k}{n} \, w_{t+1}^{k} \tag{3}$$

where $w_{t+1}^{k}$ is the model update from client $k$, $n_k$ is its local data size, and $n$ is the total data size across all clients. A significant challenge in FL is statistical heterogeneity (non-IID data) across clients. To mitigate this, we incorporate a first-order approximation of Model-Agnostic Meta-Learning (FoMAML). This technique seeks a global model initialization that can be rapidly adapted to new tasks (in this context, local hospital data) with a minimal number of gradient steps. The local adaptation process for a hospital $k$ is formalized as:

$$\theta_k' = \theta - \alpha \nabla_{\theta} \mathcal{L}_k(\theta) \tag{4}$$

where $\theta$ denotes the global model parameters, $\alpha$ is the adaptation learning rate, and $\mathcal{L}_k$ is the loss computed on hospital $k$'s local data. The federated server then aggregates these adapted parameters $\{\theta_k'\}$ to update the global model, thereby fostering personalized learning within a collaborative framework.

3. METHOD

3.1 System Architecture Design

Figure 1. 3D Visualization.

As shown in Fig. 1, our proposed system adopts a layered architecture composed of four distinct layers: the data layer, knowledge layer, model layer, and federation layer. The data flow and the role of each layer are detailed below:

Data Layer: Residing within each participating hospital, this layer is responsible for local data preprocessing.
This includes extracting structured clinical features from Electronic Health Records (EHRs), such as vital signs, laboratory results, and medication orders, coupled with handling missing values via forward-filling and feature normalization. A fixed-length sliding window of 48 hours is applied to formulate temporal instances for prediction.

Knowledge Layer: This layer hosts the central medical knowledge graph. For each patient, based on their recorded clinical features (e.g., diagnosis codes, lab results), a patient-specific subgraph is dynamically extracted by querying all interconnected entities within a 2-hop radius in the global KG. This process contextualizes the patient's immediate clinical state within the broader landscape of medical knowledge.

Model Layer: This layer contains the core predictive model, which operates on a dual-path architecture. One path employs a Graph Attention Network (GAT) to encode the patient-specific subgraph into a fixed-size vector representation. The other path utilizes a Temporal Transformer to process the multivariate clinical time-series. The outputs from both paths are then fused to generate the final prediction.

Federation Layer: This layer orchestrates the collaborative training process. It collects model updates (or the adapted parameters from the FoMAML process) from all participating hospitals, performs a weighted aggregation (considering both dataset size and local model quality), and subsequently distributes the updated global model.

This entire workflow ensures end-to-end privacy preservation by design, as raw data never leaves the hospital premises.

3.2 Knowledge Graph-Enhanced Patient Representation

To construct a personalized context for each patient, we extract a relevant subgraph from the global knowledge graph based on their specific clinical features $\{c_1, c_2, \ldots, c_n\}$. This is achieved by retrieving all entities connected to these features within a two-hop distance, thereby capturing a clinically relevant neighborhood. This patient-specific subgraph is then encoded using a Graph Attention Network (GAT) to produce a comprehensive embedding vector $h_{\mathrm{KG}}$. The GAT computation for a node $i$ is given by:

$$h_i' = \alpha_{i,i} W h_i + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j} W h_j \tag{5}$$

where $\mathcal{N}(i)$ denotes the neighbor set of node $i$, $W$ is a shared weight matrix, and $\alpha_{i,j}$ are the attention coefficients that quantify the importance of neighbor $j$ to node $i$. The resulting knowledge-informed embedding $h_{\mathrm{KG}}$ is concatenated with the temporal features $h_{\mathrm{temp}}$ extracted by the transformer to form the final joint representation: $h_{\mathrm{final}} = [h_{\mathrm{KG}}; h_{\mathrm{temp}}]$.

3.3 Temporal Transformer with Meta-Learning

As shown in Fig. 2, our Temporal Transformer is specifically designed to handle the idiosyncrasies of clinical time-series. The model's architectural specifications are as follows:

Figure 2. Temporal Transformer Model Architecture Diagram.

3.4 Federated Training Protocol

To provide formal privacy guarantees, our federated learning protocol incorporates Differential Privacy (DP). We implement the DP-FedAvg algorithm, which involves clipping and noising the local gradients before they are transmitted to the server.
The process for each client $k$ is:

$$\tilde{g}_k = \mathrm{Clip}(g_k, C) + \mathcal{N}(0, \sigma^2 C^2 I) \tag{6}$$

where $g_k$ is the client's raw gradient, $C$ is the clipping threshold, and $\mathcal{N}(0, \sigma^2 C^2 I)$ represents Gaussian noise scaled by $C$ and the noise multiplier $\sigma$. The server then performs a weighted aggregation of these noised updates:

$$\theta_{t+1} = \theta_t - \eta \cdot \frac{1}{N} \sum_{k} n_k Q_k \, \tilde{g}_k \tag{7}$$

Here, $\eta$ is the global learning rate, $n_k$ is the data size of client $k$, $Q_k$ is a quality metric (e.g., local validation AUC), and $N = \sum_k n_k Q_k$ is the normalization factor. This quality-weighted aggregation incentivizes participants to maintain high data and model standards. A blockchain-based ledger is utilized to immutably record parameter hashes and versioning information, ensuring full traceability and auditability of all model updates.

4. EXPERIMENTAL RESULTS AND ANALYSIS

4.1 Experimental Setup

Our experimental environment emulates a multi-hospital federation using a distributed computing cluster. Each node was equipped with Intel Xeon Platinum 8360Y CPUs, NVIDIA A100 GPUs (80GB VRAM), and 512GB of memory. The software stack included Ubuntu 22.04, Python 3.10, PyTorch 2.3, and the PySyft library for FL simulation. We utilized two publicly available critical care databases: MIMIC-IV (v2.2) and the eICU Collaborative Research Database, encompassing 76,540 and 200,859 ICU stays, respectively. Sepsis labels were assigned according to the Sepsis-3 criteria, and model input was derived from the first 48 hours of ICU admission. The data was partitioned across 5 to 20 simulated hospital nodes, reflecting realistic geographical and institutional distributions.
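In a simulation of this kind, the client-side clipping and noising of Eq. (6) and the quality-weighted server step of Eq. (7) could be sketched roughly as follows. This is a toy illustration in plain Python rather than the PySyft-based implementation; the helper names, the two hypothetical clients, and the values chosen for C, σ, and η are all assumptions.

```python
import math
import random


def clip(grad, c):
    """Rescale grad so that its L2 norm is at most c."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def privatize(grad, c, sigma, rng):
    """Eq. (6): clipped gradient plus Gaussian noise with std sigma * c."""
    return [g + rng.gauss(0.0, sigma * c) for g in clip(grad, c)]


def server_step(theta, updates, eta):
    """Eq. (7): updates is a list of (n_k, Q_k, noised_gradient) tuples."""
    total = sum(n * q for n, q, _ in updates)  # normalization factor N
    return [
        th - eta * sum(n * q * g[i] for n, q, g in updates) / total
        for i, th in enumerate(theta)
    ]


rng = random.Random(0)
clients = [  # hypothetical (n_k, Q_k, raw gradient) per hospital
    (1000, 0.90, [3.0, 4.0]),
    (500, 0.80, [0.3, -0.4]),
]
noised = [(n, q, privatize(g, c=1.0, sigma=0.1, rng=rng)) for n, q, g in clients]
theta_next = server_step([0.0, 0.0], noised, eta=0.5)
```

The first client's gradient has norm 5 and is scaled down to norm 1 before noise is added, whereas the second is already inside the clipping ball and is left unchanged. The sketch says nothing about the resulting (ε, δ) budget, which depends on σ, the sampling scheme, and the number of rounds.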
4.2 Comparative Experiment Design

To evaluate the performance of our proposed framework, we designed a comprehensive set of comparisons against four baseline methods:

① Centralized Model: A traditional LSTM model trained on a hypothetical central dataset containing all patient data.
② Standard Federated Learning (FL): A baseline implementation of the FedAvg algorithm.
③ Knowledge-Enhanced FL: A federated model that incorporates the knowledge graph but uses a standard LSTM for temporal modeling.
④ Temporal FL: A federated model that employs the temporal transformer but omits the knowledge graph.
⑤ Our Full Method: The complete framework integrating the knowledge graph, temporal transformer, and meta-learning (FoMAML).

All models were trained and evaluated using 5-fold cross-validation with consistent data splits and hyperparameter tuning protocols. Evaluation metrics included Area Under the ROC Curve (AUC), Accuracy, F1-Score, Precision, Recall, and formal Differential Privacy parameters.

4.3 Experimental Results Analysis

Table 1. Performance Comparison for Sepsis Prediction (6 hours before onset).

| Method | AUC | Accuracy | F1-Score | Precision | Recall | Privacy Guarantee |
|---|---|---|---|---|---|---|
| Centralized LSTM | 0.781 | 0.762 | 0.754 | 0.768 | 0.741 | None |
| Standard FL | 0.848 | 0.823 | 0.816 | 0.831 | 0.802 | (ε, δ)-DP |
| Knowledge-Enhanced FL | 0.882 | 0.851 | 0.843 | 0.857 | 0.829 | (ε, δ)-DP |
| Temporal FL | 0.901 | 0.874 | 0.868 | 0.882 | 0.854 | (ε, δ)-DP |
| Our Full Method | 0.956 | 0.932 | 0.927 | 0.941 | 0.914 | (ε, δ)-DP |

As summarized in Table 1, the proposed framework demonstrates superior performance across all evaluation metrics, surpassing current benchmarks for sepsis prediction.
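The relative improvements quoted in the abstract (22.4% over the centralized model and 12.7% over standard federated learning) follow from the AUC column of Table 1. The short check below assumes that "improvement" means relative change in AUC; the helper name is illustrative.

```python
def relative_gain(new, base):
    """Relative improvement of new over base, in percent."""
    return 100.0 * (new - base) / base


# AUC values copied from Table 1.
auc = {"Centralized LSTM": 0.781, "Standard FL": 0.848, "Our Full Method": 0.956}

gain_vs_centralized = relative_gain(auc["Our Full Method"], auc["Centralized LSTM"])
gain_vs_fl = relative_gain(auc["Our Full Method"], auc["Standard FL"])
```

Rounded to one decimal place, the two values come out at 22.4 and 12.7, matching the reported percentages.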
The observed performance gain is attributed to the synergistic combination of its components: the knowledge graph provides a structured, domain-specific prior, enhancing the model's understanding of clinical semantics, particularly for complex comorbidities; the temporal transformer effectively models long-range, nonlinear dependencies in physiological data, capturing subtle pathological trends preceding sepsis; and the meta-learning component enables swift personalization to local hospital data distributions, effectively mitigating the challenges posed by statistical heterogeneity. Although the introduction of these advanced components incurs a modest increase in computational and communication overhead, the substantial improvement in predictive accuracy validates the efficacy of our integrated design.

4.4 Comprehensive Model Evaluation

Table 2. Multi-Dimensional Model Performance Comparison.

| Model | Clinical Efficacy | Privacy Protection | Communication Efficiency | Scalability | Overall Score |
|---|---|---|---|---|---|
| Centralized LSTM | 0.72 | 0 | 0.95 | 0.65 | 0.58 |
| Standard FL | 0.79 | 0.85 | 0.82 | 0.81 | 0.82 |
| Knowledge-Enhanced FL | 0.83 | 0.88 | 0.80 | 0.84 | 0.84 |
| Temporal FL | 0.86 | 0.90 | 0.78 | 0.87 | 0.85 |
| Our Full Method | 0.94 | 0.94 | 0.83 | 0.92 | 0.91 |

A multi-faceted evaluation was conducted, scoring each model across four critical dimensions: Clinical Efficacy (a composite of AUC and F1-Score), Privacy Protection (based on DP fulfillment), Communication Efficiency, and Scalability. As depicted in Table 2, the proposed framework achieves the highest scores in clinical efficacy, privacy protection, and scalability, and the highest communication efficiency among the federated methods; only the centralized LSTM scores higher on communication efficiency (0.95), at the cost of a privacy protection score of 0. The overall performance score of 0.91 reflects a 56.9% improvement over the centralized baseline and an 11.0% improvement over standard federated learning, substantiating the value of the integrated architectural design.

5. CONCLUSION

This paper presents a novel and comprehensive framework for early sepsis prediction that leverages federated learning, knowledge graphs, and temporal transformers to address the critical challenges of data privacy, temporal dynamics, and institutional heterogeneity in multi-center ICU settings. The proposed system enriches patient representations with structured medical knowledge, captures complex temporal patterns in clinical data, and facilitates personalized model adaptation across different hospitals, all within a privacy-preserving collaborative training paradigm. Extensive experimentation on real-world ICU datasets demonstrates that our framework achieves a high predictive AUC of 0.956 while providing formal differential privacy guarantees, demonstrating strong predictive capability that surpasses current benchmarks in the field.

Future work will focus on several promising directions: integrating multimodal data sources such as clinical notes and medical images, optimizing communication protocols to enhance efficiency further, exploring cross-border federated learning under diverse regulatory frameworks, and ultimately transitioning the system towards real-time clinical decision support. By pursuing these avenues, we aim to augment the practical utility of our approach and contribute meaningfully to improving patient outcomes through earlier and more accurate sepsis detection.

REFERENCES

[1] Teo ZL, Zhang X, Yang Y. Privacy-Preserving Technology Using Federated Learning and Blockchain in Protecting against Adversarial Attacks for Retinal Imaging. Ophthalmology. 2025 Apr;132(4):484-494.
[2] Matschinske J, Späth J, Bakhtiari M. The FeatureCloud Platform for Federated Learning in Biomedicine: Unified Approach. J Med Internet Res. 2023 Jul 12;25:e42621.
[3] Pati S, Kumar S, Varma A. Privacy preservation for federated learning in health care. Patterns (N Y). 2024 Jul 12;5(7):100974.
[4] Gong X, Song L, Vedula R. Federated Learning With Privacy-Preserving Ensemble Attention Distillation. IEEE Trans Med Imaging. 2023 Jul;42(7):2057-2067.
[5] Rischke R, Schneider L, Müller K. Federated Learning in Dentistry: Chances and Challenges. J Dent Res. 2022 Oct;101(11):1269-1273.
[6] Liu S, Wang Y, He H. A new playing method of the guessing football lottery. IOP Conference Series: Materials Science and Engineering. 2020;790(1):012100.
[7] Dayan I, Roth HR, Zhong A. Federated learning for predicting clinical outcomes in patients with COVID-19. Nat Med. 2021 Oct;27(10):1735-1743.
[8] Dou Q, So TY, Jiang M. Federated deep learning for detecting COVID-19 lung abnormalities in CT: a privacy-preserving multinational validation study. NPJ Digit Med. 2021 Mar 29;4(1):60.
[9] Liu L, Jiang X, Zheng F. A Bayesian Federated Learning Framework With Online Laplace Approximation. IEEE Trans Pattern Anal Mach Intell. 2024 Jan;46(1):1-16.
[10] Li M, Xu P, Hu J. From challenges and pitfalls to recommendations and opportunities: Implementing federated learning in healthcare. Med Image Anal. 2025 Apr;101:103497.
