On this page, you will find an overview of the lectures and the materials that we plan to cover in the seminar.
For each week, we also list recommended reading that you should complete before the lecture.
In case you want to add the lectures to your calendar, we are providing an ICS file to which you can subscribe (e.g., File → New Calendar Subscription in the default Calendar app on macOS).
— Lecture 0 —Abstract:
Brief meeting, discussion of course schedule, exam.
— Lecture 1 —Abstract:
The lecture will summarize the main ideas of statistical learning theory.
We will revisit the standard generalization bounds that characterize the difference between true and empirical risk.
We will critically discuss the underlying assumptions and show examples where these are violated.
We will also discuss the dependence of the bounds on the number of parameters, which is important for understanding the success of overparametrization in today’s machine learning practice.
Recommended reading:
- von Luxburg, U., & Schölkopf, B. (2011). Statistical Learning Theory: Models, Concepts, and Results. In: Handbook of the History of Logic, Volume 10: Inductive Logic (pp. 651–706). Amsterdam: Elsevier. DOI: 10.1016/b978-0-444-52936-7.50016-1.
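As a small illustration of the kind of generalization bound revisited in this lecture, the sketch below evaluates the textbook Hoeffding plus union bound for a finite hypothesis class. It assumes i.i.d. samples and a loss bounded in [0, 1]; the function name and the example numbers are illustrative only and are not taken from the lecture material.

```python
import math


def hoeffding_gap(n_samples, n_hypotheses, delta):
    """Uniform bound on |true risk - empirical risk| for a finite class.

    Assumes i.i.d. samples and a loss with values in [0, 1]. With
    probability at least 1 - delta, the bound holds simultaneously for
    all n_hypotheses hypotheses (Hoeffding's inequality + union bound).
    """
    return math.sqrt(
        (math.log(n_hypotheses) + math.log(2.0 / delta)) / (2.0 * n_samples)
    )


# The gap shrinks with more data and grows (slowly) with the class size.
gap_small_sample = hoeffding_gap(n_samples=1_000, n_hypotheses=100, delta=0.05)
gap_large_sample = hoeffding_gap(n_samples=100_000, n_hypotheses=100, delta=0.05)
```

The size of the hypothesis class plays the role of a capacity term here. Bounds of this type become vacuous when the capacity is large relative to the sample size, which is one reason the overparametrized regime needs a separate discussion.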
— Lecture 2 —Abstract:
The lecture will summarize key ideas in game theory.
Game theory provides a means for modelling interactions between machine learning algorithms and their environment.
We will revisit zero-sum games and von Neumann’s minimax theorem and introduce the concept of Nash equilibria.
We will then discuss repeated games and adaptive decision-making algorithms (follow the leader, follow a random leader, multiplicative weights).
Recommended reading:
- Karlin, A. R., & Peres, Y. (2017). Game theory, alive. American Mathematical Society. ISBN: 978-1-4704-1982-0. PDF version available online. [Chapter 2: Section 2.1–2.3; Chapter 18: Section 18.1–18.3]
— Lecture 3 —Abstract:
The two fields of machine learning and graphical causality arose and developed separately.
However, there is now cross-pollination and increasing interest in both fields to benefit from the advances of the other.
In the present paper, we review fundamental concepts of causal inference and relate them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research.
This also applies in the opposite direction: we note that most work in causality starts from the premise that the causal variables are given.
A central problem for AI and causality is, thus, causal representation learning, the discovery of high-level causal variables from low-level observations.
Recommended reading:
- Schölkopf, B. et al. (2021). Towards causal representation learning. arXiv:2102.11107.
— Lecture 4 —Abstract:
This lecture will provide an introduction to (non-statistical) online learning and multi-armed bandits.
We will discuss the multiplicative weights algorithm Hedge, and its partial information counterpart EXP3, as well as some applications to learning in games.
Recommended reading:
- Hazan, E. (2019). Introduction to Online Convex Optimization. arXiv:1909.05207v1. [Chapter 6.2]
- Sessa, P. G., et al. (2019). No-Regret Learning in Unknown Games with Correlated Payoffs. Available online.
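For orientation, here is a minimal sketch of the Hedge (multiplicative weights) update in the full-information setting. It assumes losses in [0, 1] and a known horizon T; the function name, the random loss sequence, and the step-size choice are illustrative assumptions rather than the lecture's own notation.

```python
import numpy as np


def hedge(losses, eta):
    """Run Hedge on a (T, K) array of losses in [0, 1].

    Returns the distribution played in each round and the learner's
    total expected loss.
    """
    n_rounds, n_experts = losses.shape
    log_weights = np.zeros(n_experts)
    played = np.empty((n_rounds, n_experts))
    total_loss = 0.0
    for t in range(n_rounds):
        # Normalize in log space for numerical stability.
        probs = np.exp(log_weights - log_weights.max())
        probs /= probs.sum()
        played[t] = probs
        total_loss += float(probs @ losses[t])
        # Multiplicative update: weight_i <- weight_i * exp(-eta * loss_i).
        log_weights -= eta * losses[t]
    return played, total_loss


rng = np.random.default_rng(0)
T, K = 500, 4
loss_sequence = rng.random((T, K))       # hypothetical loss sequence
eta = np.sqrt(8.0 * np.log(K) / T)       # standard tuning for known T
played, total_loss = hedge(loss_sequence, eta)
regret = total_loss - loss_sequence.sum(axis=0).min()
```

With this step size, the regret against the best fixed expert is at most sqrt(T ln(K) / 2) for any loss sequence in [0, 1]. EXP3 uses the same update but replaces the unobserved losses with importance-weighted estimates.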
— Lecture 5 —Abstract:
The lecture will summarize the basics of dynamical systems and control theory.
We will discuss discrete-time and continuous-time dynamical systems and introduce the concepts of equilibria and Lyapunov stability.
An important aspect of the lecture will be to emphasize the difference between noise and structural (epistemic) uncertainty and show how uncertainty can be reduced with feedback.
We will also discuss connections to game theory and generalization (Lecture 1).
Recommended reading:
- Strogatz, S. H. (2015). Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. 2nd edition. Boca Raton: CRC Press. DOI: 10.1201/9780429492563. [Chapter 2]
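To make the notion of a stable equilibrium concrete, the sketch below studies a one-dimensional flow dx/dt = f(x). The example f(x) = x(1 - x) is a hypothetical choice (a normalized logistic equation), not necessarily the one used in the lecture. A fixed point x* with f(x*) = 0 is classified by the sign of f'(x*), and a forward-Euler simulation checks the prediction.

```python
def f(x):
    """Right-hand side of dx/dt = x * (1 - x); fixed points at 0 and 1."""
    return x * (1.0 - x)


def derivative(func, x, h=1e-6):
    """Central finite-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


def classify_fixed_point(func, x_star):
    """Linear stability of a fixed point of a 1-D flow."""
    slope = derivative(func, x_star)
    if slope < 0:
        return "stable"
    if slope > 0:
        return "unstable"
    return "inconclusive"  # linearization does not decide this case


def simulate(func, x0, dt=0.01, steps=2000):
    """Forward-Euler integration of dx/dt = func(x) starting from x0."""
    x = x0
    for _ in range(steps):
        x += dt * func(x)
    return x


labels = {x_star: classify_fixed_point(f, x_star) for x_star in (0.0, 1.0)}
x_final = simulate(f, x0=0.1)
```

Trajectories started near 0 move away from it, while trajectories started anywhere in (0, 1) approach 1, in line with the linear stability test.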
— Lecture 6 —Abstract:
Counterfactual Explanations are a relatively recent form of explanation designed to explicitly meet the needs of non-technical users.
Unlike methods such as Shapley values that provide measures of the relative importance of features, counterfactual explanations offer simple, direct explanations of the form:
You were not offered a loan because your salary was $30k; if it had been $45k instead, you would have been offered the loan.
These forms of explanation are very popular in legal and governance areas of AI, and are cited in the guidelines to the GDPR.
We will discuss the development of counterfactual explanations, including why previous forms of explanation are often unsuitable for end-users. We will also cover the limitations of counterfactual explanations, what they can and cannot tell you, and the challenges of extending them to high-dimensional problems in computer vision.
Recommended reading:
- Wachter, S., et al. (2018). Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harvard Journal of Law & Technology, 31(2). Available online. [Chapters I–IV]
- Elliott, A. et al. (2021). Explaining Classifiers using Adversarial Perturbations on the Perceptual Ball. Available online.
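The idea of a counterfactual explanation can be illustrated on a toy linear classifier. The sketch below is a deliberate simplification and not the formulation from the recommended reading: it assumes a linear score w·x + b and plain Euclidean distance, for which the closest counterfactual has a closed form (a projection just across the decision boundary). The weights, features, and applicant values are hypothetical.

```python
import numpy as np


def closest_counterfactual(x, w, b, margin=1e-6):
    """Smallest L2 change to x that flips the sign of the score w @ x + b.

    The result is placed `margin` beyond the decision boundary so that
    the decision actually changes.
    """
    score = float(w @ x + b)
    if score == 0.0:
        raise ValueError("x lies exactly on the decision boundary")
    target = -np.sign(score) * margin
    return x - ((score - target) / float(w @ w)) * w


# Hypothetical features: salary (in units of $10k) and years of employment.
w = np.array([1.0, 0.5])
b = -5.0
applicant = np.array([3.0, 1.0])  # score < 0, so the loan is refused
counterfactual = closest_counterfactual(applicant, w, b)
```

Reading off the difference between `applicant` and `counterfactual` yields a statement of the form "had your salary and employment history been this much higher, the loan would have been offered". For non-linear models no closed form exists in general, and the counterfactual is typically found by numerical optimization.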
— Lecture 7 —Abstract:
Machine learning has recently made significant advances in single-agent learning challenges, much of that progress being fueled by the empirical success of gradient descent-based methods in computing local optima of non-convex optimization problems.
In multi-agent learning challenges, the role of single-objective optimization is played by equilibrium computation.
On this front, however, optimization methods have remained less successful in settings, such as adversarial training and multi-agent reinforcement learning, motivated by deep learning applications.
Gradient-descent-based methods commonly fail to identify equilibria, and even computing local approximate equilibria has remained daunting.
We discuss equilibrium computation challenges motivated by machine learning applications through a combination of learning-theoretic, complexity-theoretic, game-theoretic and topological techniques, presenting obstacles and opportunities for machine learning and game theory going forward.
No deep learning / complexity theory knowledge will be assumed for this talk.
Recommended reading:
- Daskalakis, C. et al. (2018). Training GANs with Optimism. arXiv:1711.00141. [Pages 1–10].
- Daskalakis, C. et al. (2020). The Complexity of Constrained Min-Max Optimization. arXiv:2009.09623. [Pages 1–8; optional!].
- Daskalakis, C. et al. (2021). Near-Optimal No-Regret Learning in General Games. arXiv:2108.06924. [Optional!]
— Lecture 8 —Abstract:
A major challenge in the study of dynamical systems is that of model discovery: turning data into reduced order models that are not just predictive, but provide insight into the nature of the underlying dynamical system that generated the data.
We introduce a number of data-driven strategies for discovering nonlinear multiscale dynamical systems and their embeddings from data.
We consider two canonical cases:
(i) systems for which we have full measurements of the governing variables, and
(ii) systems for which we have incomplete measurements.
For systems with full state measurements, we show that the recent sparse identification of nonlinear dynamical systems (SINDy) method can discover governing equations with relatively little data and introduce a sampling method that allows SINDy to scale efficiently to problems with multiple time scales, noise and parametric dependencies.
For systems with incomplete observations, we show that the Hankel alternative view of Koopman (HAVOK) method, based on time-delay embedding coordinates and the dynamic mode decomposition, can be used to obtain linear models and Koopman-invariant measurement systems that nearly perfectly capture the dynamics of nonlinear quasiperiodic systems.
Neural networks are used in targeted ways to aid in the model reduction process.
Together, these approaches provide a suite of mathematical strategies for reducing the data required to discover and model nonlinear multiscale systems.
Recommended reading:
- Champion, K. et al. (2019). Data-driven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences, 116(45), 22445–22451. DOI:10.1073/pnas.1906995116.
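The sparse-regression core of SINDy for the full-measurement case can be sketched in a few lines. This is an illustrative reimplementation of sequentially thresholded least squares under simplifying assumptions: a one-dimensional state, exact (noise-free) derivatives, and a hypothetical candidate library [1, x, x^2]. It covers neither the sampling strategy nor the neural-network components mentioned in the abstract.

```python
import numpy as np


def stlsq(library, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares.

    library: (n_samples, n_terms) matrix of candidate functions.
    dxdt:    (n_samples,) vector of time derivatives.
    Returns a sparse coefficient vector xi with library @ xi ~ dxdt.
    """
    xi = np.linalg.lstsq(library, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                      # prune negligible terms
        active = ~small
        if not active.any():
            break
        # Refit using only the terms that survived the threshold.
        xi[active] = np.linalg.lstsq(library[:, active], dxdt, rcond=None)[0]
    return xi


# Synthetic data from the (assumed) system dx/dt = -2 x.
x = np.linspace(-1.0, 1.0, 50)
dxdt = -2.0 * x
library = np.column_stack([np.ones_like(x), x, x**2])  # terms: 1, x, x^2
coefficients = stlsq(library, dxdt)
```

Only the coefficient of the x term survives, so the recovered model is dx/dt = -2x. With measured data the derivatives have to be estimated numerically, and the threshold becomes a tuning parameter that trades sparsity against fit.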
— Lecture 9 —Abstract:
In classical machine learning, regression is treated as a black box process of identifying a suitable function without attempting to gain insight into the mechanism connecting inputs and outputs.
In the natural sciences, however, finding an interpretable function for a phenomenon is the prime goal, as it allows one to understand and generalize results.
Following the theme of the lecture by Nathan Kutz, we will consider the search for parsimonious models.
In this lecture we will consider non-linear models represented by concise analytical expressions.
The problem of finding such expressions is generally called symbolic regression.
Traditionally, this problem is solved with evolutionary search.
In recent years, machine learning methods have been proposed for this task. One of the first works in this direction was the “equation learner” (EQL), which I will cover in some detail.
Very recently, a different approach was presented that uses pretraining and language models to predict likely equations (NeSymReS).
I will also cover this method and compare the two approaches.
An interesting aspect of the search for the most compact description of data is its connection to causal models and the identification of the true underlying relationships.
In general, prior knowledge is required to make this work, but if successful, we can enjoy great generalization capabilities.
— Lecture 10 —Abstract:
While humans often draw causal conclusions from putting observations into the broader context of causal knowledge, AI still needs to develop these techniques.
I show how causal insights can be obtained from the synergy of datasets referring to different sets of variables and argue that causal hypotheses then predict joint properties of variables that have never been observed together.
This way, causal discovery becomes a prediction task in which additional variable sets play the role of additional data points in traditional iid learning.
For instance, a causal DAG can be seen as a binary classifier that tells us which conditional independences are valid, which then enables a statistical learning theory for learning DAGs.
I describe “Causal MaxEnt” (a modified version of MaxEnt that is asymmetric with respect to causal directions) as one potential approach to infer DAGs and properties of the joint distribution from a set of marginal distributions of subsets of variables and derive causal conclusions for toy examples.
Recommended reading:
- Janzing, D. (2018). Merging joint distributions via causal model classes with low VC dimension. arXiv:1804.03206.
- Garrido Mejia, S. et al. (2021). Obtaining causal information by merging data sets with MaxEnt. arXiv:2107.07640. [Optional]
- Janzing, D. (2021). Causal versions of Maximum Entropy and Principle of Insufficient Reason. arXiv:2102.03906. [Optional]
— Lecture 11 —Abstract:
Under algorithmic triage, a machine learning model does not predict all instances but instead defers some of them to human experts.
The motivation that underpins learning under algorithmic triage is the observation that, while there are high-stake tasks where machine learning models have matched, or even surpassed, the average performance of human experts, they are still less accurate than human experts on some instances, where they make far more errors than average.
The main promise is that, by working together, human experts and machine learning models are likely to achieve a considerably better performance than each of them would achieve on their own.
In this talk, I will present several algorithms to learn under algorithmic triage that we have developed in recent years, discuss their theoretical properties, and present a variety of experimental results demonstrating their potential in improving medical diagnosis, content moderation and scientific discovery.
Recommended reading:
- Okati, N. et al. (2021). Differentiable Learning Under Triage. arXiv:2103.08902.
— Lecture 12 —Abstract:
Stein’s method is a powerful tool from probability theory for bounding the distance between probability distributions.
In this talk, I will describe how this tool designed to prove central limit theorems can be adapted to assess and improve the quality of practical inference procedures.
Along the way, I will highlight applications to Markov chain Monte Carlo sampler selection, goodness-of-fit testing, variational inference, de novo sampling, post-selection inference, and non-convex optimization, and close with several opportunities for future work.
Recommended reading:
- Anastasiou, A. et al. (2021). Stein’s Method Meets Statistics: A Review of Some Recent Developments. arXiv:2105.03481. [Sections 1, 2, 4, and 5]
- Gorham, J. and Mackey, L. (2017). Measuring Sample Quality with Kernels. arXiv:1703.01717. [Optional]
- Gorham, J. and Mackey, L. (2015). Measuring Sample Quality with Stein’s Method. arXiv:1506.03039. [Optional]