Network analysis is an increasingly vital tool in the social sciences. It enables researchers to study how information, behaviours, and attitudes spread through social structures. Statistics Netherlands provides a unique and powerful resource for such analysis: a full-scale population network of the Netherlands.
In parallel, machine learning has introduced tools such as embeddings, which represent complex data (such as text or networks) as low-dimensional numeric vectors. Because similar items end up close to each other in this vector space, embeddings capture meaningful patterns and are commonly used for tasks such as similarity search or attribute prediction. In the NetAudit project, we bring these two worlds together by learning embeddings for the entire Dutch population network.
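To illustrate what "similarity search" means for embeddings, here is a minimal sketch of nearest-neighbour lookup by cosine similarity. The function name `cosine_top_k` and the toy vectors are hypothetical and made up for illustration. They are not the NetAudit embeddings or any interface of the Statistics Netherlands environment.

```python
import numpy as np

def cosine_top_k(embeddings: np.ndarray, query_idx: int, k: int = 3) -> list[int]:
    """Return indices of the k rows most similar (by cosine) to row `query_idx`."""
    # Normalise each row to unit length so dot products equal cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit[query_idx]
    sims[query_idx] = -np.inf  # exclude the query itself from its own neighbours
    # argsort is ascending, so reverse it to put the highest similarity first.
    return [int(i) for i in np.argsort(sims)[::-1][:k]]

# Hypothetical toy data: 5 "persons" embedded in 4 dimensions (not real data).
toy = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.8, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
print(cosine_top_k(toy, query_idx=0, k=2))  # [1, 3]
```

Cosine similarity is used here because it compares the direction of vectors and ignores their length. Other distance measures are equally possible.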
However, one challenge remains: interpretability. Unlike traditional social science variables, embedding dimensions usually lack a clear meaning. To address this, we applied a transformation that makes the dimensions sparse and orthogonal, so that each dimension tends to capture a distinct and more interpretable aspect of the population network. This makes the embeddings useful not only for prediction tasks but also for exploratory research and hypothesis generation.
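The text does not specify which transformation was used. As a hedged illustration of the general idea, the sketch below uses varimax, a well-known orthogonal rotation that pushes each dimension towards a few large values and many near-zero ones. We are assuming varimax here as a stand-in. It is not necessarily the NetAudit method. The names `varimax`, `varimax_criterion`, and the synthetic data are hypothetical.

```python
import numpy as np

def varimax(X: np.ndarray, gamma: float = 1.0, max_iter: int = 100, tol: float = 1e-6):
    """Find an orthogonal rotation R that (locally) maximises the varimax criterion of X @ R.

    Returns (X @ R, R). Because R is orthogonal, dot products and distances between
    rows of X are preserved; only the axes (dimensions) are rotated.
    """
    n, k = X.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        L = X @ R
        # Standard SVD-based varimax update: project the criterion's gradient onto rotations.
        target = L**3 - (gamma / n) * L @ np.diag(np.sum(L**2, axis=0))
        u, s, vt = np.linalg.svd(X.T @ target)
        R = u @ vt
        d = float(np.sum(s))
        if d_old != 0 and d / d_old < 1 + tol:
            break  # converged: the objective proxy stopped increasing
    return X @ R, R

def varimax_criterion(L: np.ndarray) -> float:
    # Sum over dimensions of the variance of squared values; higher means sparser dimensions.
    return float(np.sum(np.var(L**2, axis=0)))

# Hypothetical synthetic data: each of 300 rows is active on exactly one of 3 dimensions...
rng = np.random.default_rng(0)
S = np.zeros((300, 3))
S[np.arange(300), rng.integers(0, 3, size=300)] = rng.uniform(1.0, 2.0, size=300)
# ...then mixed by a random orthogonal matrix, so the observed axes look dense.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
X = S @ Q
X_rot, R = varimax(X)
print(varimax_criterion(X), varimax_criterion(X_rot))
```

Because the rotation is orthogonal, it changes only how the embedding space is described and not the relationships between people encoded in it. Tasks such as similarity search therefore give the same results before and after the transformation.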
The untransformed and transformed population network embeddings for the years 2020, 2021, and 2022 are available in the secure remote access environment of Statistics Netherlands, through the Storage Facility (in collaboration with ODISSEI).
This project is funded by the NWA ODISSEI Roadmap grant (task 4.4).