Teaching

Fall 2025: Explainable AI (CS 485/698)

This course introduces technical methods for making machine learning models more transparent and understandable. Topics include intrinsically interpretable models, post hoc explanations (e.g., Shapley values, saliency maps), visualization of model internals (e.g., attention maps, neuron activations), surrogate modeling, mechanistic interpretability, and communication bottlenecks. Visualization is a central theme throughout the course, both as a practical tool and as a key research frontier.
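To give a flavor of the post hoc methods covered in weeks 3–5, here is a minimal sketch of permutation feature importance: shuffle one feature at a time and measure how much the model's error grows. The synthetic data, the linear stand-in for a black-box model, and all variable names below are illustrative assumptions, not course materials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y depends strongly on x0, weakly on x1, not at all on x2.
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Fit a linear model by least squares (stand-in for any black-box predictor).
w, *_ = np.linalg.lstsq(X, y, rcond=None)
predict = lambda X: X @ w

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

baseline = mse(y, predict(X))

# Permutation importance: break the association between one feature and the
# target by shuffling that column; a larger error increase means the model
# relied more on that feature.
importances = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    importances.append(mse(y, predict(Xp)) - baseline)

print([round(v, 3) for v in importances])
```

As expected, the shuffled-x0 error increase dwarfs the others, while the irrelevant x2 scores near zero. This model-agnostic recipe works unchanged for any predictor, which is what makes it a standard baseline before moving to Shapley-value methods.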

Course Schedule

Week  Topic
 1    Introduction; linear models
 2    Decision trees, ensembles
 3    Feature importance: partial dependence, permutation, Shapley values
 4    Following information usage (SAGE, information bottleneck)
 5    Surrogate modeling, exemplars, counterfactuals
 6    Gradient-based methods for saliency, adversarial examples
 7    CNNs and feature visualization
 8    CLIP + Concept Bottleneck Models; midterm
 9    Attention: the good and the bad
10    Mechanistic interpretability
11    Jailbreaking, intervening, chain of thought
12    Automated interpretability, evaluation of explanations
13    Zoom out: current trends in research and policy
14    Student presentations
15    Final exam


The syllabus can be found here.