GANS

Using generative machine learning for synthetic microdata for research in the Social Sciences


Goal

The project's primary goal was to empirically assess the privacy guarantees and the associated privacy risks of synthetic data generators trained with machine learning, ensuring safer deployment of synthetic data technologies in fields like social sciences and healthcare.

Results

This project assessed existing methods for testing the privacy-preserving capabilities of models trained on large datasets, considering real-world adversarial scenarios. We highlighted the trade-off between computational feasibility and realistic threat models in privacy evaluations. We reviewed and listed existing tools and guidelines for practitioners, improving the transparency and usability of privacy evaluations. The work resulted in a perspective paper and a resource repository that discuss privacy audits and membership inference attacks, providing new insights into the vulnerabilities of generative models and predictive systems. Together, these results establish a foundation for standardized privacy evaluations, empowering stakeholders to adopt synthetic data technologies with confidence.

Impact and Future Directions

The findings empower organizations like statistical agencies and healthcare providers to adopt synthetic data technologies with greater confidence, knowing the privacy risks have been thoroughly tested. This work could pave the way for broader access to sensitive datasets, enhancing research capabilities while maintaining stringent privacy protections.

Call-to-Action

Stakeholders are invited to explore our findings and adopt privacy evaluation practices suited to their use cases. We also encourage researchers in privacy-preserving machine learning and synthetic data generation to conduct formal, standardized privacy evaluations.

Participating organisations

Netherlands eScience Center
Maastricht University
Social Sciences & Humanities

Output

Team

Chang Sun
Principal Investigator
Maastricht University
Flavio Hafner
Research Software Engineer
Netherlands eScience Center
Erik Tjong Kim Sang
eScience Research Engineer
Netherlands eScience Center
Jisk Attema
Programme Manager
Netherlands eScience Center