teaching
Fall 2025: Explainable AI (CS 485/698)
This course introduces technical methods for making machine learning models more transparent and understandable. Topics include intrinsically interpretable models, post hoc explanations (e.g., Shapley values, saliency maps), visualization of model internals (e.g., attention maps, neuron activations), surrogate modeling, mechanistic interpretability, and concept bottlenecks. Visualization is a central theme throughout the course, both as a practical tool and as a key research frontier.
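To give a flavor of the post hoc methods above, here is a minimal sketch of a vanilla gradient saliency map in PyTorch. The tiny linear model and random input are placeholder stand-ins for illustration only; any differentiable classifier works the same way.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in classifier (not a course-provided model).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
model.eval()

x = torch.rand(1, 3, 32, 32, requires_grad=True)  # placeholder "image" batch
logits = model(x)
target = logits.argmax(dim=1).item()  # explain the model's predicted class

logits[0, target].backward()  # gradient of the class logit w.r.t. each pixel

# Saliency map: gradient magnitude, taking the max over color channels.
saliency = x.grad.abs().max(dim=1).values  # shape: (1, 32, 32)
```

Pixels with large gradient magnitude are those the prediction is most sensitive to, which is the core idea behind the saliency methods covered in week 6.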
Course Schedule
| Week | Topic |
|---|---|
| 1 | Introduction; linear models |
| 2 | Decision trees, ensembles |
| 3 | Feature importance: partial dependence, permutation, Shapley values (see the sketch after this schedule) |
| 4 | Tracing information usage (SAGE, information bottleneck) |
| 5 | Surrogate modeling, exemplars, counterfactuals |
| 6 | Gradient-based saliency methods, adversarial examples |
| 7 | CNNs and feature visualization |
| 8 | CLIP + Concept Bottleneck Models; midterm |
| 9 | Attention: the good and the bad |
| 10 | Mechanistic interpretability |
| 11 | Jailbreaking, intervening, chain of thought |
| 12 | Automated interpretability; evaluation of explanations |
| 13 | Zooming out: current trends in research and policy |
| 14 | Student presentations |
| 15 | Final exam |
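As a preview of the week 3 material, below is a minimal permutation-importance sketch using scikit-learn. The synthetic dataset and random-forest model are illustrative assumptions, not course materials.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: only a few of the features carry signal.
X, y = make_classification(n_samples=500, n_features=8,
                           n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure the drop in held-out accuracy.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: {imp:.3f}")
```

Features whose permutation causes a large accuracy drop are the ones the model relies on; the informative features should stand out clearly here.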
The syllabus can be found here.