Goal
The project's primary goal was to empirically assess the privacy guarantees and associated risks of synthetic data generators trained with machine learning, supporting safer deployment of synthetic data technologies in fields such as the social sciences and healthcare.
Results
This project assessed existing methods for testing the privacy-preserving capabilities of models trained on large datasets under real-world adversarial scenarios. We highlighted the trade-off between computational feasibility and realistic threat models in privacy evaluations, and we reviewed and catalogued existing tools and guidelines for practitioners, improving the transparency and usability of privacy evaluations. The work resulted in a perspective paper and a resources repository that discuss privacy audits and membership inference attacks, providing new insights into the vulnerabilities of generative models and predictive systems. These results establish a foundation for standardized privacy evaluations and give stakeholders a basis for adopting synthetic data technologies with confidence.
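To illustrate the kind of evaluation such audits involve, the sketch below shows a simple distance-based membership inference attack against synthetic data. It is a minimal illustration only: the function names, the stand-in "generator" output, and the distance-to-closest-record scoring rule are assumptions for this example, not the specific protocol or tooling developed in the project.

```python
# Minimal sketch of a distance-based membership inference attack on synthetic data.
# Illustrative only; names, data, and thresholds here are assumptions, not the
# evaluation protocol used in the project.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import NearestNeighbors

def membership_scores(synthetic: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Score each target record by its distance to the closest synthetic record.

    Records that were in the generator's training data tend to lie closer to the
    synthetic output, so the negated distance serves as a membership score
    (higher = more likely a training member).
    """
    nn = NearestNeighbors(n_neighbors=1).fit(synthetic)
    distances, _ = nn.kneighbors(targets)
    return -distances.ravel()

# Example evaluation: "members" were used to train the generator,
# "non_members" are held-out records from the same population.
rng = np.random.default_rng(0)
members = rng.normal(size=(200, 5))
non_members = rng.normal(size=(200, 5))
# Stand-in for generator output that leaks its training records.
synthetic = members + rng.normal(scale=0.1, size=members.shape)

scores = membership_scores(synthetic, np.vstack([members, non_members]))
labels = np.concatenate([np.ones(len(members)), np.zeros(len(non_members))])
print("Attack AUC:", roc_auc_score(labels, scores))  # ~0.5 would indicate little leakage
```

Reporting an attack metric such as AUC against a clearly stated threat model is one way evaluations of this kind can be made comparable across generators.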
Impact and Future Directions
The findings empower organizations such as statistical agencies and healthcare providers to adopt synthetic data technologies with greater confidence, knowing that the privacy risks have been systematically tested. This work could pave the way for broader access to sensitive datasets, enhancing research capabilities while maintaining stringent privacy protections.
Call-to-Action
Stakeholders are invited to explore our findings and adopt privacy evaluation practices suited to their use cases. To advance these efforts, we encourage researchers in privacy-preserving machine learning and synthetic data generation to conduct formal, standardized privacy evaluations.