MIDST Challenge at SaTML 2025: Membership Inference over Diffusion-models-based Synthetic Tabular data
Masoumeh Shafieinejad¹, Xi He¹,², Mahshid Alinoori¹, John Jewell¹, Sana Ayromlou³, Wei Pang², Veronica Chatrath⁴, Gauri Sharma⁵, Deval Pandya¹

¹Vector Institute, ²University of Waterloo, ³Google, ⁴University of Toronto, ⁵McGill University

{masoumeh, xi.he, mahshid.alinoori, deval.pandya}@vectorinstitute.ai, jjewell6@uwo.ca, sayromlou@gmail.com, w3pang@uwaterloo.ca, veronica.chatrath@robotics.utias.utoronto.ca, gauri.sharma@mail.mcgill.ca

Abstract—Synthetic data is often perceived as a silver-bullet solution to data anonymization and privacy-preserving data publishing. Drawn from generative models like diffusion models, synthetic data is expected to preserve the statistical properties of the original dataset while remaining resilient to privacy attacks. Recent developments of diffusion models have been effective on a wide range of data types, but their privacy resilience, particularly for tabular formats, remains largely unexplored. The MIDST challenge sought a quantitative evaluation of the privacy gain of synthetic tabular data generated by diffusion models, with a specific focus on its resistance to membership inference attacks (MIAs). Given the heterogeneity and complexity of tabular data, multiple target models were explored for MIAs, including diffusion models for single tables of mixed data types and for multi-relational tables with interconnected constraints. As a key outcome, MIDST inspired the development of novel black-box and white-box MIAs tailored to these target diffusion models, enabling a comprehensive evaluation of their privacy efficacy. The MIDST GitHub repository is available at: https://github.com/VectorInstitute/MIDST
I. INTRODUCTION

Privacy regulations across the globe, such as the European General Data Protection Regulation (GDPR), the Canadian Personal Information Protection and Electronic Documents Act (PIPEDA), and California's Consumer Privacy Act (CCPA), call for data anonymization as a main privacy principle. Industries are exploring privacy-preserving technologies to comply. JP Morgan has published an extensive report [20] stating that "for a highly regulated finance industry, synthetic data is a potential approach for dealing with issues related to privacy, fairness, and explainability". Similarly, the US Department of Health and Human Services regards synthetic data as a powerful tool that can address main challenges in making health care data accessible [8]. The Vector Institute, a national AI institute in Canada, organized a boot camp for industry sponsors to promote diffusion models for synthesizing tabular data [27]. However, without effective measures against privacy attacks, true anonymity may not be achieved. The Vector Institute organized a membership inference attack challenge on synthetic tabular data generated by diffusion models, named the MIDST challenge, as part of the IEEE SaTML 2025 conference. This competition contributes to the broader, real-world challenge of translating industry and regulatory risk assessments into technical terms.

A. Novelty

The use of membership inference attacks for anonymity measurement is not new. However, their application to complex tabular data generated by diffusion models had not been addressed by either the research community or industry prior to MIDST. MIDST also extended this evaluation to multi-table data synthesis, a popular application in industry that had not previously been addressed in research, even outside the diffusion model context.
A relevant membership inference competition was held by Microsoft at SaTML 2023, the MICO project [16], evaluating the effectiveness of differentially private model training as a mitigation against white-box membership inference attacks. However, MIDST focuses on complex tabular data and evaluates the privacy efficacy, or limitations, of synthetic data generators.

II. BACKGROUND

A. Tabular Data Synthesis

Synthetic data has attracted significant interest for its ability to tackle key challenges in accessing high-quality training datasets. These challenges include: i) privacy [1, 9], ii) bias and fairness [26], and iii) data scarcity [5, 30]. The interest in synthetic data has extended to various commercial settings, notably in the healthcare [8] and finance [20] sectors. Among all data modalities, the synthesis of tabular data is a critical task, with approximately 79% of data scientists working with tabular data on a daily basis [23].

B. Diffusion Models

Diffusion models have emerged as powerful tools for data synthesis, demonstrating remarkable success in various domains [21]. These models are particularly noted for their strong capabilities in controlled (conditioned) generation. They have been used for generating synthetic data in both the unconditional setting, for single tables [11, 29, 13, 10], and the conditioned setting, for multiple interconnected tables [17, 14].

C. Membership Inference Attacks (MIAs)

In the tabular synthesis literature, the privacy protection of a generative model is commonly evaluated by a basic distance to the closest record (DCR) metric. However, the generated synthetic data reveals its privacy deficiencies when assessed by stronger privacy measurements such as membership inference attacks [22, 7]. Prior to the MIDST competition, the evaluation of diffusion models by membership inference attacks was largely limited to computer vision [24, 4, 3].
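To make the contrast concrete, the DCR baseline mentioned above can be sketched in a few lines. This is an illustrative implementation only; the exact distance function and the encoding of categorical features vary across the tabular-synthesis literature, and the function name `dcr` is ours:

```python
import math

def dcr(synthetic, training):
    """Distance to Closest Record: for each synthetic row, the minimum
    distance to any training row. Very small values flag near-copies of
    training records, but an acceptable aggregate DCR does not rule out
    membership leakage.

    Rows are tuples of already-normalized numeric features; Euclidean
    distance is one common, simple choice.
    """
    def dist(a, b):
        # Euclidean distance over the feature tuples.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return [min(dist(s, t) for t in training) for s in synthetic]
```

For example, `dcr([(0.0, 0.0)], [(3.0, 4.0)])` returns `[5.0]`, while a synthetic row identical to a training row scores 0. Attacks such as [22, 7] succeed precisely in settings where such aggregate distances look safe.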
With the success of diffusion models in generating tabular synthetic data, MIDST highlighted the need to evaluate these models with proper privacy metrics.

III. MIDST CHALLENGE DESIGN

Generative models are developed on a training dataset to generate synthetic data. They are expected to learn the statistics of the data without memorizing individual records. To evaluate this promise, membership inference attacks assess whether the model distinguishes between the training dataset and a holdout dataset, both derived from the same, larger dataset. For each of the four tasks, a set of models was trained on different splits of a public dataset. For each of these models, m challenge points were provided; exactly half of them are members (i.e., used to train the model) and half are non-members (i.e., from the holdout set; they come from the same public dataset as the training set, but were not used to train the model). The goal of the participants is to determine which challenge points are members and which are non-members.

A. Challenge Tracks

The MIDST challenge was composed of four different tracks, each associated with a separate category. The categories are defined based on the access to the generative models and the type of the tabular data as follows:

1) Access to the models: black-box; Data: single table
2) Access to the models: white-box; Data: single table
3) Access to the models: black-box; Data: multi-table
4) Access to the models: white-box; Data: multi-table

In white-box attacks, the participants had access to the models and their generated synthetic output. Training sets for these models were selected from a public dataset. In black-box attacks, access to the same information as in the white-box setting was granted, except for the models themselves.
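The train/holdout/challenge-point construction described at the start of this section can be sketched as follows. This is a simplified illustration of the protocol, not the challenge's own code; the function and variable names are ours:

```python
import random

def make_challenge(dataset, n_train, m, seed=0):
    """Split a dataset into a training set and a holdout, then draw m
    challenge points: half members (sampled from the training set) and
    half non-members (sampled from the holdout).

    Returns (train_set, challenge_points, membership_labels); in the
    real competition, of course, the labels are hidden from attackers
    for the dev and final phases.
    """
    rng = random.Random(seed)
    rows = list(dataset)
    rng.shuffle(rows)
    train, holdout = rows[:n_train], rows[n_train:]
    members = rng.sample(train, m // 2)
    non_members = rng.sample(holdout, m // 2)
    challenge = members + non_members
    labels = [True] * (m // 2) + [False] * (m // 2)
    # Shuffle challenge points so their order does not leak membership.
    order = list(range(m))
    rng.shuffle(order)
    return train, [challenge[i] for i in order], [labels[i] for i in order]
```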
B. Models and Datasets

The Vector Institute held a boot camp on diffusion-model-based tabular synthesis, covering a variety of applications for industry stakeholders, on September 3-5, 2024 [27]. The plug-and-play diffusion-model-based reference implementations include: TabDDPM (single table) [11], TabSyn (single table) [29], and ClavaDDPM (multi-table) [17]. The repository of the models and the utilized datasets is publicly available at: https://github.com/VectorInstitute/MIDSTModels. The same models were included in the MIDST challenge for privacy evaluation.

To facilitate the participation of all interested researchers regardless of their computing capabilities, MIDST also provided a set of 30 shadow models for each of these three models. The shadow models were the same for the black-box and white-box tasks. The participants were free to use these shadow models and/or generate their own if needed in developing their MIAs. These models were each trained on a 20,000-record subsample of the Transaction¹ table from the publicly available Berka dataset [2]. The Berka dataset is a collection of 8 tables representing financial information from a Czech bank. The dataset covers over 5,300 bank clients with approximately 1,000,000 transactions. Additionally, the bank represented in the dataset has extended close to 700 loans and issued nearly 900 credit cards.

C. Submission, Evaluation and Scoring

MIDST included three phases: train, dev, and final. For the 30 shadow models in the train phase (for each model in each track), the full training dataset, and consequently the ground-truth membership of the challenge points, were revealed. The participants could use these shadow models to develop their attacks. For the 20 models in the dev set and the 20 models in the final set, no ground truth was revealed, and participants needed to submit their membership predictions for the challenge points.
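The shadow models with revealed ground truth enable a simple attack-development loop: compute some per-record signal (e.g., a model loss) on shadow challenge points, fit a decision rule on the known labels, and apply it to the target models. The sketch below is a deliberately minimal stand-in for the classifiers participants actually built; extracting per-record losses from a diffusion model is the hard part and is elided here, and both function names are ours:

```python
import statistics

def fit_threshold(shadow_losses, shadow_membership):
    """Fit a loss threshold on shadow challenge points with known
    membership. Members tend to have lower loss, so we later predict
    'member' iff loss < threshold. Here the threshold is simply the
    midpoint of the two class means; real attacks fit richer models
    over many loss features, noises, and time steps."""
    member_losses = [l for l, m in zip(shadow_losses, shadow_membership) if m]
    non_member_losses = [l for l, m in zip(shadow_losses, shadow_membership) if not m]
    return (statistics.mean(member_losses) + statistics.mean(non_member_losses)) / 2

def attack(target_losses, threshold):
    """Emit a confidence in [0, 1] per challenge point: 1.0 if the
    target model's loss falls below the shadow-fitted threshold, else
    0.0. (Real submissions output calibrated scores, not hard labels,
    since scoring thresholds the confidences.)"""
    return [1.0 if l < threshold else 0.0 for l in target_losses]
```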
During the competition, a live scoreboard showed the results on the dev challenges. The final ranking was decided on the final set; scoring for this set was withheld until the end of the competition.

Submissions were ranked based on their performance in membership inference against the associated models. For each challenge point, the submission provided a value indicating the confidence level with which the challenge point is a member. Each value is a floating-point number in the range [0.0, 1.0], with 1.0 indicating certainty that the challenge point is a member, and 0.0 indicating certainty that it is a non-member. Submissions were evaluated according to their True Positive Rate at 10% False Positive Rate (i.e., TPR @ 0.1 FPR). In this context, positive challenge points are members and negative challenge points are non-members. For each submission, the scoring program concatenated the confidence values for all models (dev and final treated separately) and compared these to the reference ground truth. The scoring program determined the minimum confidence threshold for membership such that at most 10% of the non-member challenge points are incorrectly classified as members. The score captured the True Positive Rate achieved by this threshold (i.e., the proportion of correctly classified member challenge points). The live scoreboard showed additional scores (i.e., TPR at other FPRs, membership inference advantage, accuracy, AUC-ROC score), but these were only informational.

D. The competition timeline

The models and dataset were published by December 1st, 2024. The submission to the live scoreboard was open from December 1st, 2024 until February 27th, 2025. The submissions for the final phase were collected on February 28th.

¹For multi-table tracks, the corresponding records from the other tables in the Berka dataset were also included.
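The scoring rule described above, finding the minimum threshold that misclassifies at most 10% of non-members and then reporting the TPR at that threshold, can be sketched as follows. This is our reconstruction of the stated rule, not the official scoring program, and the function name is ours:

```python
def tpr_at_fpr(confidences, is_member, max_fpr=0.1):
    """TPR at the smallest confidence threshold keeping FPR <= max_fpr.

    A point is classified as a member iff its confidence is strictly
    above the threshold; the threshold is chosen so that at most a
    max_fpr fraction of true non-members end up classified as members.
    """
    non_members = sorted((c for c, m in zip(confidences, is_member) if not m),
                         reverse=True)
    members = [c for c, m in zip(confidences, is_member) if m]
    allowed_fp = int(max_fpr * len(non_members))   # false positives allowed
    # Threshold = highest non-member score that must stay at/below the cut,
    # so at most allowed_fp non-members score strictly above it.
    threshold = non_members[allowed_fp] if allowed_fp < len(non_members) else -1.0
    # TPR: fraction of members classified as members at this threshold.
    return sum(c > threshold for c in members) / len(members)
```

For instance, with four members scored [0.9, 0.8, 0.7, 0.2] and four non-members scored [0.6, 0.5, 0.4, 0.1], no false positives are allowed at 10% FPR, the threshold lands at 0.6, and the score is 0.75.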
Track | Winner | Success | Runner-up | Success
White-box Single Table | Tartan Federer [28] | 46% | Yan Pang [18] | 39%
White-box Multi-table | Tartan Federer | 35% | ** | **
Black-box Single Table | Tartan Federer | 25% | CITADEL & UQAM [12] | 22%
Black-box Multi-table | Tartan Federer | 23% | Cyber@BGU [6] | 20%

TABLE I: Competition results across tracks. **We received several submissions for the white-box multi-table task; however, their performance did not significantly exceed that of random guessing.

The winners of all tasks were announced to the competition participants on March 10th, 2025, as well as officially during the SaTML conference, April 9-11, in Copenhagen.

IV. RESULTS

We received over 700 submissions from 71 participants across the four tracks. The Tartan Federer team placed first in all tracks, and we announced runner-ups in three of the four tracks. For the remaining track, white-box multi-table, we did not announce a runner-up, as the submitted approaches did not significantly outperform random guessing.

A. Winning solutions

The MIDST results are provided in Table I. In the white-box tracks, our top-performing teams used different approaches in their attack development. Tartan Federer [28] used SecMI [4] as a starting point for their attack design. While SecMI has shown success on image-based diffusion models, its original design proved less effective for tabular data, highlighting the effect of the data domain on attack development. Tartan Federer identified noise initialization as a key factor influencing attack efficacy and proposed a machine-learning-driven approach that leverages loss features across different noises and time steps. Inspired by the success of their GSA approach in computer vision [19], Yan Pang leveraged the differences in gradients between member and non-member samples for their attack development [18]. In the black-box tracks, the best-performing submissions employed a diverse set of techniques too.
The Cyber@BGU team [6] leveraged shadow models, auxiliary machine learning models, and an attack classifier to craft their attack. Tartan Federer also used shadow model parameters for their attack development. CITADEL & UQAM [12] performed their MIA through an ensemble technique. In addition to shadow-model-based predictions from RMIA [15] and DOMIAS [25], their meta-classifier takes as inputs the continuous features of the data as well as several measurements of Gower distance between the data points and the synthetic dataset.

B. Directions for future research

1) TabSyn vs TabDDPM: MIDST provided two models with different structures for the single-table tracks: TabSyn and TabDDPM. The competition considered the highest score achieved in attacking either of the models for ranking. Most of the submitted attacks targeted TabDDPM; the few that attacked both achieved higher scores on TabDDPM. It remains an open question whether this preference comes from the fact that latent-space diffusion models like TabSyn are less explored, or whether their structure makes these models more resilient against membership inference attacks. Evidence for the former argument is the SecMI attack, where latent-space diffusion models are considered in an extension of the attack rather than in its default design.

2) Single table vs multi-table: MIDST used the Transaction table from the Berka dataset for the single-table tracks. For the multi-table tracks, the other tables from the Berka dataset were added as well. However, the MIDST challenge points for all tracks were restricted to the Transaction table. An intuitive consequence of this setup would be that attacks designed for single-table models are applicable to multi-table ones, with a similar success rate if they opt not to use the additional information from the other tables, and a higher success rate if they do. However, the submitted results, particularly on the white-box track, do not follow this intuition.
3) Comparison with other generative AI approaches for tabular synthesis: Diffusion models perform exceptionally well for tabular synthesis. The MIDST results show that this synthesis is not free of privacy leakage. However, without further investigation and comparison with other generative AI approaches, it remains unclear whether this privacy leakage is specific to diffusion models.

V. CONCLUSION

The MIDST challenge represents a milestone in assessing the privacy limitations of diffusion-based synthetic tabular data. The competition yielded novel membership inference attacks that expose the vulnerabilities of diffusion models, underscoring that synthetic generation is not necessarily a default privacy guarantee. These results highlight the urgent need for better assessments and audits of the privacy risks in the life cycle of synthetic tabular data.

VI. ACKNOWLEDGMENT

We are deeply grateful to the University of Waterloo Cybersecurity & Privacy Institute (CPI) and the Data Systems Group (DSG) for their generous sponsorship of the MIDST competition. We would also like to thank the MICO organizers for their open-source project and very helpful comments.

REFERENCES

[1] S. A. Assefa, D. Dervovic, M. Mahfouz, R. E. Tillman, P. Reddy, and M. Veloso. Generating synthetic data in finance: opportunities, challenges and pitfalls. In Proceedings of the First ACM International Conference on AI in Finance, pages 1-8, 2020.
[2] P. Berka et al. Guide to the financial data set. PKDD2000 discovery challenge, 2000.
[3] N. Carlini, J. Hayes, M. Nasr, M. Jagielski, V. Sehwag, F. Tramèr, B. Balle, D. Ippolito, and E. Wallace. Extracting training data from diffusion models. In USENIX Security 23, pages 5253-5270, 2023.
[4] J. Duan, F. Kong, S. Wang, X. Shi, and K. Xu. Are diffusion models vulnerable to membership inference attacks? In ICML, 2023.
[5] J. Fonseca and F. Bacao. Tabular and latent space synthetic data generation: a literature review. Journal of Big Data, 10(1):115, 2023.
[6] E. German and D. Samira. MIA-EPT: Membership inference attack via error prediction for tabular data. https://github.com/eyalgerman/MIA-EPT, 2025. GitHub repository.
[7] M. Giomi, F. Boenisch, C. Wehmeyer, and B. Tasnádi. A unified framework for quantifying privacy risk in synthetic data. arXiv preprint arXiv:2211.10459, 2022.
[8] A. Gonzales, G. Guruswamy, and S. R. Smith. Synthetic data in health care: A narrative review. PLOS Digital Health, 2(1):1-16, 2023.
[9] M. Hernandez, G. Epelde, A. Alberdi, R. Cilla, and D. Rankin. Synthetic data generation for tabular health records: A systematic review. Neurocomputing, 493:28-45, 2022.
[10] J. Kim, C. Lee, and N. Park. STaSy: Score-based tabular data synthesis. arXiv preprint arXiv:2210.04018, 2022.
[11] A. Kotelnikov, D. Baranchuk, I. Rubachev, and A. Babenko. TabDDPM: Modelling tabular data with diffusion models. In ICML, pages 17564-17579, 2023.
[12] H. Lautraite, L. Herbault, Y. Qi, J.-F. Rajotte, and S. Gambs. Ensemble MIA: The 2nd place solution to the MIDST black-box MIA on the single-table competition. https://github.com/CRCHUM-CITADEL/ensemble-mia, 2025. GitHub repository, accessed: 2025-12-10.
[13] C. Lee, J. Kim, and N. Park. CoDi: Co-evolving contrastive diffusion models for mixed-type tabular synthesis. In ICML, pages 18940-18956, 2023.
[14] T. Liu, J. Fan, N. Tang, G. Li, and X. Du. Controllable tabular data synthesis using diffusion models. Proc. ACM Manag. Data, 2(1), 2024.
[15] M. Meeus, L. Wutschitz, S. Zanella-Béguelin, S. Tople, and R. Shokri. The canary's echo: Auditing privacy risks of LLM-generated synthetic text. In Proceedings of the 42nd International Conference on Machine Learning, volume 267, pages 43557-43580. PMLR, 2025.
[16] Microsoft. MICO: Membership inference competition. https://github.com/microsoft/MICO, 2023. GitHub repository.
[17] W. Pang, M. Shafieinejad, L. Liu, and X. He. ClavaDDPM: Multi-relational data synthesis with cluster-guided diffusion models. arXiv preprint arXiv:2405.17724, 2024.
[18] Y. Pang. Solution for MIDST. https://github.com/py85252876/MIDST, 2025. GitHub repository.
[19] Y. Pang, T. Wang, X. Kang, M. Huai, and Y. Zhang. White-box membership inference attacks against diffusion models. Proceedings on Privacy Enhancing Technologies, 2025(2):398-415, 2025.
[20] V. K. Potluru, D. Borrajo, A. Coletta, N. Dalmasso, Y. El-Laham, E. Fons, M. Ghassemi, S. Gopalakrishnan, V. Gosai, E. Kreačić, G. Mani, S. Obitayo, D. Paramanand, N. Raman, M. Solonin, S. Sood, S. Vyetrenko, H. Zhu, M. Veloso, and T. Balch. Synthetic data applications in finance. arXiv preprint arXiv:2401.00081, 2024.
[21] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684-10695, 2022.
[22] T. Stadler, B. Oprisanu, and C. Troncoso. Synthetic data – anonymisation groundhog day. In USENIX Security 22, pages 1451-1468, 2022.
[23] B. van Breugel, N. Seedat, F. Imrie, and M. van der Schaar. Can you rely on your model evaluation? Improving model evaluation with synthetic test data. In Advances in Neural Information Processing Systems, 2023.
[24] B. van Breugel, H. Sun, Z. Qian, and M. van der Schaar. Membership inference attacks against synthetic data through overfitting detection. In F. J. R. Ruiz, J. G. Dy, and J. van de Meent, editors, International Conference on Artificial Intelligence and Statistics, volume 206 of Proceedings of Machine Learning Research, pages 3493-3514, 2023.
[25] B. van Breugel, H. Sun, Z. Qian, and M. van der Schaar. Membership inference attacks against synthetic data through overfitting detection. In F. J. R. Ruiz, J. G. Dy, and J. van de Meent, editors, International Conference on Artificial Intelligence and Statistics, 25-27 April 2023, Palau de Congressos, Valencia, Spain, volume 206 of Proceedings of Machine Learning Research, pages 3493-3514. PMLR, 2023.
[26] B. van Breugel and M. van der Schaar. Beyond privacy: Navigating the opportunities and challenges of synthetic data. arXiv preprint arXiv:2304.03722, 2023.
[27] Vector Institute. Diffusion models for tabular and time series bootcamp. https://github.com/VectorInstitute/diffusion-models, 2024. GitHub repository.
[28] X. Wu, Y. Pang, T. Liu, and S. Wu. Winning the MIDST challenge: New membership inference attacks on diffusion models for tabular data synthesis. arXiv preprint, 2025.
[29] H. Zhang, J. Zhang, B. Srinivasan, Z. Shen, X. Qin, C. Faloutsos, H. Rangwala, and G. Karypis. Mixed-type tabular data synthesis with score-based diffusion in latent space. arXiv preprint arXiv:2310.09656, 2023.
[30] S. Zheng and N. Charoenphakdee. Diffusion models for missing value imputation in tabular data. arXiv preprint arXiv:2210.17128, 2022.