Schedule for: 22w5092 - Advances in Stein’s method and its applications in Machine Learning and Optimization

Beginning on Sunday, April 10 and ending on Friday, April 15, 2022

All times in Banff, Alberta time, MDT (UTC-6).

Sunday, April 10
16:00 - 17:30 Check-in begins at 16:00 on Sunday and is open 24 hours (Front Desk - Professional Development Centre)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building.
(Vistas Dining Room)
20:00 - 22:00 Informal gathering (TCPL Foyer)
Monday, April 11
07:00 - 08:45 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
08:45 - 09:00 Introduction and Welcome by BIRS Staff
A brief introduction to BIRS with important logistical information, technology instruction, and opportunity for participants to ask questions.
(TCPL 201)
09:00 - 10:00 Adil Salim: Stein Variational Gradient Descent, an optimization algorithm for sampling.
Sampling and optimization are fundamental tasks in machine learning. While the literature on optimization for machine learning has developed widely over the past decade, with sharp convergence rates for some methods, the literature on sampling remained mainly asymptotic until very recently. The Stein Variational Gradient Descent (SVGD) algorithm is a sampling algorithm derived in 2016 by Liu & Wang by taking advantage of a "kernelized" version of Stein's method. Moreover, SVGD can be seen as an optimization algorithm over a space of probability measures that minimizes the Kullback-Leibler divergence to the target distribution. I will review these two points of view on SVGD and show how they are equivalent through the prism of differential calculus over the Wasserstein space. In particular, this analogy yields a quantitative convergence rate for SVGD in the so-called population limit.
(TCPL 201)
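As a complement to the abstract above, here is a minimal numpy sketch of the particle update introduced by Liu & Wang; the RBF kernel, fixed bandwidth, step size, and the standard-normal example target are illustrative assumptions, not details from the talk.

import numpy as np

def svgd_step(x, score, step=0.1, h=1.0):
    """One SVGD update on particles x of shape (n, d); score(z) returns grad log p row-wise."""
    n, _ = x.shape
    diff = x[:, None, :] - x[None, :, :]                     # (n, n, d): x_i - x_j
    k = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h ** 2))   # RBF kernel matrix k(x_i, x_j)
    drive = k @ score(x)                                      # kernel-smoothed score: pulls particles toward high density
    repulse = np.einsum("ij,ijd->id", k, diff) / h ** 2       # sum_j grad_{x_j} k(x_j, x_i): keeps particles spread out
    return x + step * (drive + repulse) / n

# Illustrative usage: move 100 particles toward a standard 2-d normal target (score(z) = -z).
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(100, 2))
for _ in range(500):
    x = svgd_step(x, score=lambda z: -z, step=0.05)

In practice the bandwidth h is often set by the median heuristic; a fixed value is used here only to keep the sketch short.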
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:25 Giovanni Peccati: An introduction to the Malliavin-Stein method
Introduced by I. Nourdin and G. Peccati in 2009, the "Malliavin-Stein method" is a collection of probabilistic techniques, allowing one to derive explicit analytic bounds on the normal and non-normal approximation of smooth functionals of Gaussian fields or point processes, by means of infinite-dimensional integration by parts formulae. Originally introduced in order to quantitatively study the fluctuations of Gaussian-subordinated sequences, the scope of applications of the Malliavin-Stein method has never ceased to grow - and now touches domains as diverse as computer sciences, stochastic geometry, mathematical statistics, mathematical physics, and compressed sensing. In this talk, I will introduce the main elements of the Malliavin-Stein method, and describe some distinguished applications - with special emphasis on functional estimates related to the classical Poincaré and log-Sobolev inequalities.
(Online)
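For orientation alongside the abstract above, one central estimate of the method, as established by Nourdin and Peccati, bounds the total variation distance between a centred, Malliavin-differentiable functional $F$ of a Gaussian field and a standard normal $N$ by
$$ d_{TV}(F, N) \le 2\, \mathbb{E}\bigl| 1 - \langle DF, -DL^{-1}F \rangle_{\mathfrak{H}} \bigr|, $$
where $D$ is the Malliavin derivative and $L^{-1}$ is the pseudo-inverse of the Ornstein-Uhlenbeck generator; this is stated here only as background, not as part of the abstract.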
11:25 - 11:30 Lester Mackey: Virtual Chair walkthrough
VC cheatsheet: https://docs.google.com/document/d/1vv9wFidXrbOuP2gdrOV_zujbBp7qjZTvz3YruMUtsio/edit VC link: https://link.virtualchair.net/birs/QXKvhx69xSvAA2Syg3qj/join
(Online)
11:30 - 13:00 Lunch
Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
13:00 - 14:00 Chris Oates: Sampling and Stein’s method
As a statistical paradigm, Bayesian inference is conceptually simple and elegant. However, the computational challenge of sampling from the posterior distribution represents a major practical restriction on the class of models that can be analysed. Stein's method has recently emerged as a powerful tool in computational statistics, being used to construct novel sampling methods that have, in certain situations, outperformed the state-of-the-art. This tutorial will explain how Stein's method can be used to transform a sampling problem into an optimisation problem, before introducing several different optimisation algorithms that each give rise to a practical computational method for Bayesian inference.
(Online)
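To make the sampling-to-optimisation idea in the abstract above concrete, one commonly used construction (a sketch, not necessarily the exact formulation of the tutorial) is the Langevin Stein operator
$$ (\mathcal{A}_p g)(x) = \langle \nabla \log p(x), g(x) \rangle + \nabla \cdot g(x), \qquad \mathbb{E}_{x \sim p}[(\mathcal{A}_p g)(x)] = 0 \ \text{under mild conditions}, $$
which depends on the posterior $p$ only through $\nabla \log p$ and hence not on the normalising constant. The induced Stein discrepancy $\sup_{g \in \mathcal{G}} \mathbb{E}_{x \sim q}[(\mathcal{A}_p g)(x)]$ then quantifies how well a candidate $q$, for instance an empirical measure on a finite point set, approximates $p$, and choosing or moving the points to minimise it is an optimisation problem.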
14:00 - 14:20 Group Photo
Meet in foyer of TCPL to participate in the BIRS group photo. The photograph will be taken outdoors, so dress appropriately for the weather. Please don't be late, or you might not be in the official group photo!
(TCPL Foyer)
14:15 - 15:15 Guided Tour of The Banff Centre
Meet at the PDC front desk for a guided tour of The Banff Centre campus.
(PDC Front Desk)
15:00 - 15:30 Coffee Break (TCPL Foyer)
15:45 - 17:00 Krishnakumar Balasubramanian: Working group (Online)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building.
(Vistas Dining Room)
Tuesday, April 12
07:00 - 08:45 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
09:00 - 10:00 Max Fathi: Stein's method for stability of variational problems in spaces of probability measures
In this talk, I will explain how approximate optimizers in certain variational problems involving measures satisfy approximate integration-by-parts formulas. As a consequence, we can use Stein's method to show that approximate minimizers are close to actual minimizers in a suitable sense. This method will be illustrated by several examples of applications to sharp concentration inequalities and to Riemannian geometry.
(Online)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:30 Ye He: Regularized Stein Variational Gradient Descent
In this talk, we will introduce a regularized formulation of the SVGD algorithm and present several results in the mean-field limit setting. We start with a motivation for the proposed regularization and highlight how the regularized SVGD formulation relates to the Langevin diffusion. Next, we provide convergence results for the regularized SVGD in both the continuous-time and discrete-time settings (with the space being continuous in both cases). Finally, we provide existence and stability results for the non-linear PDE that arises as the mean-field limit of the regularized SVGD formulation.
(Online)
11:30 - 13:00 Lunch
Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
12:30 - 15:00 Krishnakumar Balasubramanian: Working Group (TCPL 201)
15:00 - 15:30 Coffee Break (TCPL Foyer)
15:30 - 17:00 Krishnakumar Balasubramanian: Working Group (Online)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building.
(Vistas Dining Room)
Wednesday, April 13
07:00 - 08:45 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
09:00 - 10:00 Joseph Yukich: Multivariate Second Order Poincaré Inequalities for Statistics in Geometric Probability
We establish multivariate second order Poincaré inequalities for Poisson functionals. The bounds are expressed as integrated moments of first and second order difference operators. These general results are shown to provide rates of multivariate normal convergence for a large class of vectors $(H_s^{(1)},\ldots,H_s^{(m)})$, $s \geq 1$, of statistics of marked Poisson processes on $\mathbb{R}^d$, $d \geq 2$, as the intensity parameter $s$ tends to infinity. The results are applicable whenever the functionals $H_s^{(i)}$, $i\in\{1,\ldots,m\}$, are expressible as sums of exponentially stabilizing score functions satisfying a moment condition. The rates are for the $d_2$-, $d_3$-, and $d_{\mathrm{convex}}$-distances and are in general unimprovable.
(Online)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:30 Jiaxin Shi: Sampling with Mirrored Stein Operators
Accurately approximating an unnormalized distribution with a discrete sample is a fundamental challenge in machine learning and statistical inference. Particle evolution methods like Stein variational gradient descent tackle this challenge by applying deterministic updates to particles to sequentially minimize Kullback-Leibler divergence. However, these methods break down for constrained targets and fail to exploit informative non-Euclidean geometry. In this talk, I will introduce a new family of particle evolution samplers suitable for constrained domains and non-Euclidean geometries. These samplers are derived from a new class of Stein operators and have deep connections with Riemannian Langevin diffusion, mirror descent, and natural gradient descent. We demonstrate that these new samplers yield accurate approximations to distributions on the simplex, deliver valid confidence intervals in post-selection inference, and converge more rapidly than prior methods in large-scale unconstrained posterior inference. Finally, we establish the convergence of our new procedures under verifiable conditions on the target distribution.
(TCPL 201)
11:30 - 13:00 Lunch
Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
13:30 - 17:30 Free Afternoon (Banff National Park)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building.
(Vistas Dining Room)
Thursday, April 14
07:00 - 08:45 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
09:00 - 10:00 Arthur Gretton: A Kernel Stein Test for Comparing Latent Variable Models
The kernel Stein discrepancy (KSD) is a measure of fit between model and data, based on a Stein-modified kernel integral probability metric. The KSD is widely used in statistical tests of goodness-of-fit, since it does not require the normalising constant of the model to be known. I will describe a generalisation of the KSD, which incorporates models with latent variables, even when marginalisation over the latents is intractable. This generalised KSD is used in a relative goodness-of-fit test, where two alternative models are compared against the data: this reflects our understanding that "all models are wrong," but that some models are better than others. The relative test will be demonstrated in two settings: probabilistic PCA, and a Latent Dirichlet Allocation topic model. I will compare against the relative Maximum Mean Discrepancy test, which is based on samples from the models, and does not exploit the latent structure.
(Online)
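As a complement to the abstract above, the following numpy sketch estimates the squared (Langevin) kernel Stein discrepancy between a sample and an unnormalised target via the usual V-statistic; the RBF kernel, fixed bandwidth, and standard-normal example are illustrative assumptions, not details from the talk.

import numpy as np

def ksd_squared(x, score, h=1.0):
    """V-statistic estimate of KSD^2 for a sample x of shape (n, d); score(z) returns grad log p row-wise."""
    n, d = x.shape
    s = score(x)                                               # score evaluations, shape (n, d)
    diff = x[:, None, :] - x[None, :, :]                       # (n, n, d) pairwise differences x_i - x_j
    sqdist = np.sum(diff ** 2, axis=-1)
    k = np.exp(-sqdist / (2 * h ** 2))                         # RBF kernel matrix
    term1 = (s @ s.T) * k                                      # <s(x), s(y)> k(x, y)
    term2 = np.einsum("id,ijd->ij", s, diff) / h ** 2 * k      # <s(x), grad_y k(x, y)>
    term3 = -np.einsum("jd,ijd->ij", s, diff) / h ** 2 * k     # <s(y), grad_x k(x, y)>
    term4 = (d / h ** 2 - sqdist / h ** 4) * k                  # trace(grad_x grad_y k(x, y))
    return np.mean(term1 + term2 + term3 + term4)

# Illustrative usage: a sample drawn from the target itself should give a small value.
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 2))
print(ksd_squared(x, score=lambda z: -z))

In goodness-of-fit testing a U-statistic version (dropping the diagonal) and IMQ kernels are common choices; the V-statistic and RBF kernel here simply keep the sketch minimal.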
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:30 Denny Wu: High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
We study the first gradient step on the first-layer parameters $\boldsymbol{W}\in\mathbb{R}^{d \times N}$ of a width-$N$ two-layer neural network, where the training objective is the empirical MSE loss $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$. In the proportional asymptotic limit, where $n,d,N\to\infty$ at the same rate, and in an idealized student-teacher setting, we show that the first gradient update contains a rank-1 "spike", which results in an alignment between the first-layer weights and the linear component of the teacher model $f^*$. To characterize the impact of this alignment, we compute the prediction risk of ridge regression on the conjugate kernel after one gradient step on $\boldsymbol{W}$ with learning rate $\eta$, when $f^*$ is a single-index model. We consider two scalings of the first-step learning rate $\eta$. For small $\eta$, based on a recently established Stein-CLT in the feature space, we prove a Gaussian equivalence property for the trained feature map; this allows us to show that the learned kernel improves upon the initial random features model, but cannot defeat the best linear model on the input. For sufficiently large $\eta$, in contrast, we prove that for certain $f^*$ the same ridge estimator on trained features can go beyond this "linear regime" and outperform a wide range of random features and rotationally invariant kernel models.
(TCPL 201)
11:30 - 13:00 Lunch
Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
12:30 - 14:30 Krishnakumar Balasubramanian: Working group (TCPL 201)
15:00 - 15:30 Coffee Break (TCPL Foyer)
15:30 - 17:00 Krishnakumar Balasubramanian: Working group (TCPL 201)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building.
(Vistas Dining Room)
Friday, April 15
07:00 - 08:45 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:00 Checkout by 11AM
5-day workshop participants are welcome to use BIRS facilities (TCPL) until 3 pm on Friday, although participants are still required to check out of the guest rooms by 11AM.
(Front Desk - Professional Development Centre)
12:00 - 13:30 Lunch from 11:30 to 13:30 (Vistas Dining Room)