Synthetic Tabular Data Generation for Privacy-Preserving Machine Learning

Emory Callahan; Liora MacNeill

Published: Jul 1, 2025

Abstract

The growing demand for machine learning models in sensitive domains such as finance and healthcare has raised significant privacy concerns about training on real-world data. Synthetic tabular data generation offers a promising solution: it creates artificial datasets that preserve the statistical properties of the original data while mitigating privacy risks. In this paper, we present a comprehensive experimental study of privacy-preserving synthetic tabular data generation using three state-of-the-art generative models: CTGAN, TVAE, and Gaussian Copula. Using real-world datasets, including the UCI Adult Income dataset and the U.S. Medical Cost dataset, we compare the generated synthetic data on three criteria: utility (measured by downstream task performance), fidelity (statistical similarity to the original data), and privacy risk (susceptibility to membership inference attacks). Our results show that CTGAN achieves superior utility in classification tasks, while Gaussian Copula offers higher privacy robustness. We also propose a hybrid generation-evaluation pipeline that balances data utility and privacy. These findings provide practical guidance for practitioners seeking to deploy synthetic data in regulated environments.
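To make two of the evaluation criteria named above more concrete, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the abstract does not specify how fidelity or membership inference risk were computed, so the total variation distance, the distance-threshold attack, and all function names here are illustrative assumptions. Utility is not covered, since it would require training a downstream classifier.

```python
from collections import Counter
import math


def total_variation_distance(real_col, synth_col):
    """Fidelity proxy (assumed, not from the paper): total variation
    distance between the category frequencies of two columns.
    0.0 means identical marginals, 1.0 means disjoint categories."""
    real_freq = Counter(real_col)
    synth_freq = Counter(synth_col)
    categories = set(real_freq) | set(synth_freq)
    return 0.5 * sum(
        abs(real_freq[c] / len(real_col) - synth_freq[c] / len(synth_col))
        for c in categories
    )


def distance_to_closest_record(record, synthetic_rows):
    """Euclidean distance from one numeric record to its nearest synthetic row."""
    return min(math.dist(record, row) for row in synthetic_rows)


def membership_attack_accuracy(members, non_members, synthetic_rows, threshold):
    """Privacy proxy (assumed, not from the paper): a simple distance-threshold
    membership inference attack. A record is guessed to be a training member
    if some synthetic row lies within `threshold` of it. Returns the attack's
    accuracy; 0.5 is chance level when both sets have the same size."""
    correct = sum(
        distance_to_closest_record(r, synthetic_rows) <= threshold for r in members
    )
    correct += sum(
        distance_to_closest_record(r, synthetic_rows) > threshold for r in non_members
    )
    return correct / (len(members) + len(non_members))


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    real_education = ["HS", "HS", "BSc", "BSc"]
    synth_education = ["HS", "HS", "HS", "BSc"]
    print("TVD:", total_variation_distance(real_education, synth_education))

    members = [(0.0, 0.0), (1.0, 1.0)]        # records used to fit the generator
    non_members = [(5.0, 5.0), (6.0, 6.0)]    # held-out records
    leaky_synth = [(0.1, 0.0), (1.0, 0.9)]    # near-copies of the members
    print("Attack accuracy:",
          membership_attack_accuracy(members, non_members, leaky_synth, 0.5))
```

In this sketch, an attack accuracy close to 0.5 on balanced member and non-member sets suggests the synthetic rows reveal little about which records were used for training, while values near 1.0 suggest that the generator has reproduced training records too closely.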

How to Cite

Callahan, E., & MacNeill, L. (2025). Synthetic Tabular Data Generation for Privacy-Preserving Machine Learning. Journal of Computer Science and Software Applications, 5(7). Retrieved from https://mfacademia.org/index.php/jcssa/article/view/236

Issue

Vol. 5 No. 7 (2025)

Section

Articles

This work is licensed under a Creative Commons Attribution 4.0 International License.

Mind forge Academia also operates under the Creative Commons Licence CC-BY 4.0. This licence allows anyone to copy and redistribute the material in any medium or format for any purpose, even commercially, provided that appropriate credit is given.
