The increasing adoption of Artificial Intelligence in scientific research demands models that are not only powerful but also interpretable and trustworthy. Foundation models (FMs), pre-trained on broad datasets and fine-tuned for specialized tasks, offer tremendous potential for accelerating discovery across disciplines—from life sciences and health to materials research and environmental studies. However, their "black-box" nature poses significant challenges for scientific adoption, where understanding how and why a model produces specific outputs is as crucial as the outputs themselves.
This project investigates the explainability of foundation models by examining their latent space structures and the mechanisms underlying their generative processes. We focus on understanding how different model architectures, training methodologies, and loss functions influence the disentanglement and interpretability of learned representations. Central to this investigation is the exploration of how these models progressively transform noise into meaningful patterns—a fundamental process that underpins modern generative AI.
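To make the noise-to-pattern transformation concrete, the sketch below illustrates the forward (noising) process of a DDPM-style diffusion model, the family of generative models this kind of analysis typically targets. The linear schedule, array shapes, and timestep values are illustrative assumptions, not details of any specific model examined in this project.

```python
import numpy as np

def linear_beta_schedule(timesteps: int, beta_start=1e-4, beta_end=0.02) -> np.ndarray:
    """Linear noise schedule, as used in many DDPM-style models (illustrative values)."""
    return np.linspace(beta_start, beta_end, timesteps)

def forward_diffuse(x0: np.ndarray, t: int, alpha_bars: np.ndarray, rng) -> np.ndarray:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

if __name__ == "__main__":
    T = 1000
    betas = linear_beta_schedule(T)
    alpha_bars = np.cumprod(1.0 - betas)   # cumulative signal-retention factor

    rng = np.random.default_rng(0)
    x0 = rng.standard_normal((8, 8))       # stand-in for a data sample
    for t in (0, 250, 500, 999):
        xt = forward_diffuse(x0, t, alpha_bars, rng)
        # As t grows, the signal weight shrinks and x_t approaches pure noise.
        print(f"t={t:4d}  signal weight={np.sqrt(alpha_bars[t]):.3f}")
```

Generation reverses this process: a learned denoiser walks back from pure noise toward a sample, and it is the structure of those intermediate states that the project seeks to interpret.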
Our work examines state-of-the-art generative models and their latent space dynamics, with particular attention to how information is encoded, transformed, and decoded during the generation process. By analyzing the relationship between noise patterns, intermediate representations, and final outputs, we aim to uncover interpretable structures that can guide scientific applications. This understanding is essential in domains where explainability is paramount, such as medical imaging, drug discovery, and materials design.
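One simple way to study the relationship between intermediate representations and final outputs is to record a sampler's trajectory and measure how quickly each state converges toward the generated result. The sketch below assumes such a trajectory has already been captured (for instance via hooks on a sampler); the toy interpolated trajectory and the cosine-similarity probe are illustrative assumptions rather than the project's actual analysis pipeline.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened states."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def trace_to_final(trajectory: list) -> list:
    """Similarity of each intermediate state x_T, ..., x_0 to the final output.

    `trajectory` is assumed to hold states captured during sampling; this
    function only analyzes them.
    """
    final = trajectory[-1]
    return [cosine_similarity(x, final) for x in trajectory]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal((8, 8))   # stand-in for a generated sample
    noise = rng.standard_normal((8, 8))    # stand-in for the initial noise

    # Toy trajectory: linear interpolation from noise to target, purely for illustration.
    steps = 10
    trajectory = [(1 - s / (steps - 1)) * noise + (s / (steps - 1)) * target
                  for s in range(steps)]

    for step, sim in enumerate(trace_to_final(trajectory)):
        print(f"step {step:2d}  similarity to final output: {sim:+.3f}")
```

Probes of this kind help reveal at which stage of generation semantically meaningful structure first emerges, which is precisely the sort of interpretable signal sought here.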
The insights gained from this research will provide the scientific community with practical guidance for developing more transparent and interpretable AI systems. By bridging the gap between powerful generative capabilities and human-understandable explanations, we aim to foster greater trust in AI-assisted scientific discovery and enable researchers to leverage foundation models more effectively across diverse domains. The methodologies and findings from this work have direct applications to ongoing projects in neuroimaging, biomarker discovery, and personalized medicine, while establishing broader principles applicable to any scientific field requiring explainable AI.