Event
DEXTER: Diffusion-Guided EXplanations with TExtual Reasoning for Vision Models
Location
Date
Abstract
Understanding and explaining the behavior of machine learning models is a critical step toward advancing explainable and trustworthy AI. We introduce DEXTER, a data-free framework that combines the generative power of diffusion models with large language models to provide global, textual explanations of visual classifier behavior. By optimizing text prompts in an activation maximization framework, DEXTER generates visual samples that align with model predictions and enables detailed textual reasoning about a classifier's decision-making process. We validate DEXTER across three use cases: 1) activation maximization, to visually uncover what features the model has learned; 2) slice discovery and debiasing, to identify and characterize subpopulations within datasets and to debias pre-trained classifiers without using either training data or ground-truth annotations; and 3) bias explanation through natural language descriptions. The generated explanations are validated quantitatively and qualitatively, including through a user study, demonstrating the method's ability to produce meaningful and interpretable results. Experiments on SalientImageNet, Waterbirds, CelebA, and FairFace show that DEXTER outperforms state-of-the-art methods on multiple tasks and, for the first time, enables global textual reasoning about model behavior.
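For readers curious about the mechanics, below is a minimal, hypothetical sketch of the prompt-optimization loop the abstract describes: a learnable prompt embedding is tuned so that images produced by a frozen, differentiable generator maximize a frozen classifier's score for a target class. ToyGenerator and ToyClassifier are illustrative placeholders for a text-conditioned diffusion model and the classifier under inspection, not DEXTER's actual components.

    # Sketch of prompt-based activation maximization (illustrative, not the authors' code).
    import torch
    import torch.nn as nn

    class ToyGenerator(nn.Module):
        """Placeholder for a text-conditioned diffusion model (assumption)."""
        def __init__(self, embed_dim=64, img_size=32):
            super().__init__()
            self.img_size = img_size
            self.net = nn.Sequential(
                nn.Linear(embed_dim, 256), nn.ReLU(),
                nn.Linear(256, 3 * img_size * img_size), nn.Tanh(),
            )

        def forward(self, prompt_embed):
            x = self.net(prompt_embed)
            return x.view(-1, 3, self.img_size, self.img_size)

    class ToyClassifier(nn.Module):
        """Placeholder for the frozen visual classifier being explained."""
        def __init__(self, num_classes=10, img_size=32):
            super().__init__()
            self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * img_size * img_size, num_classes))

        def forward(self, images):
            return self.net(images)

    def activation_maximization(generator, classifier, target_class, embed_dim=64, steps=200, lr=0.05):
        # Only the prompt embedding is trainable; generator and classifier stay frozen.
        prompt_embed = torch.randn(1, embed_dim, requires_grad=True)
        for p in list(generator.parameters()) + list(classifier.parameters()):
            p.requires_grad_(False)
        optimizer = torch.optim.Adam([prompt_embed], lr=lr)
        for _ in range(steps):
            optimizer.zero_grad()
            images = generator(prompt_embed)
            logits = classifier(images)
            # Maximize the target-class logit (the activation maximization objective).
            loss = -logits[:, target_class].mean()
            loss.backward()
            optimizer.step()
        return prompt_embed.detach()

    if __name__ == "__main__":
        gen, clf = ToyGenerator(), ToyClassifier()
        optimized_prompt = activation_maximization(gen, clf, target_class=3)
        print(optimized_prompt.shape)  # torch.Size([1, 64])

In the full method, the optimized prompts also feed a large language model to produce the textual explanations; this sketch covers only the activation maximization step.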
Bio
Simone Carnemolla is a second-year PhD student in Artificial Intelligence at the Perceive Lab, University of Catania. His research focuses on Multimodal Learning, Explainable AI, and Natural Language Understanding, particularly through text and speech. His recent work explores techniques such as diffusion-based generation conditioned on audio-visual cues aligned with textual prompts, textual reasoning to explain visual classifier behavior using activation maximization and prompt tuning, and speech segmentation through frame classification.