Probability and Statistics Seminar 2024-2025

NEW [May 2021]: Our YOUTUBE CHANNEL contains videos of some recent talks.

Fridays 3:30pm in KAP 414; tea is usually provided at 3:00pm

NEW [Aug 2023]: To sign up for our MAILING LIST, email: sympa@mymaillists.usc.edu with subject: Subscribe probstat-l Firstname Lastname   [aside from the subject line, the body of the email should be empty]

Organizers: Xiaohui Chen <xiaohuic@usc.edu>, Evgeni Dimitrov <edimitro@usc.edu>, Steven Heilman <stevenmheilman@gmail.com>, Stanislav Minsker <minsker@usc.edu>, Yizhe Zhu <yizhezhu@usc.edu>

For seminars/colloquia on other topics, see the Department of Mathematics webpage.

Older seminars: Fall 2018, Spring 2018, Fall 2017, Spring 2017, Fall 2016, Spring 2016, Fall 2015, Spring 2015, Fall 2014, 2013-2014


Spring 2025:


January 17: Bradley Rava (University of Sydney’s Business School)

Ask for More Than Bayes Optimal: A Theory of Indecisions for Classification

Selective classification frameworks are useful tools for automated decision making in high-risk scenarios: they allow a classifier to make only highly confident decisions, while abstaining from making a decision when it is not confident enough to do so, an outcome known as an indecision. For a given level of classification accuracy, we aim to make as many decisions as possible. For many problems this can be achieved without abstaining at all. But when the problem is hard enough, we show that we can still control the misclassification rate of a classifier up to any user-specified level, while abstaining from only the minimum necessary number of decisions, even if this level of misclassification is smaller than the Bayes optimal error rate. In many problem settings, the user can obtain a dramatic decrease in misclassification while paying a comparatively small price in terms of indecisions.
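
As a minimal illustration of the indecision idea (a generic threshold rule, not the procedure analyzed in the talk; names and the threshold value below are hypothetical), a classifier can abstain whenever its top class probability falls short of a chosen confidence level:

    import numpy as np

    def classify_with_indecision(probs, threshold=0.9):
        # probs: (n_samples, n_classes) array of posterior class probabilities.
        # Return the argmax class when the top probability clears `threshold`;
        # otherwise return -1 to signal an indecision (abstention).
        labels = probs.argmax(axis=1)
        labels[probs.max(axis=1) < threshold] = -1
        return labels

    probs = np.array([[0.97, 0.03],   # confident: decide class 0
                      [0.55, 0.45]])  # borderline: abstain
    print(classify_with_indecision(probs))  # [ 0 -1]

Raising the threshold trades more indecisions for a lower misclassification rate among the decisions actually made, which is the trade-off the talk quantifies.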

February 14 (tentative): Po-Ling Loh (University of Cambridge)

February 21: Lijun Ding (UCSD)

March 7: Pedro Abdalla Teixeira (UCI)

March 28: Nikita Zhivotovskiy (UC Berkeley)

April 4: Bohan Zhou (UCSB)


Fall 2024:


August 30: Yiyun He (UC Irvine)

Differentially Private Algorithms for Synthetic Data

We present a highly effective algorithmic approach, PMM, for generating differentially private synthetic data in a bounded metric space with near-optimal utility guarantees under the 1-Wasserstein distance. In particular, for a dataset in the hypercube [0,1]^d, our algorithm generates a synthetic dataset such that the expected 1-Wasserstein distance between the empirical measures of the true and synthetic datasets is O(n^{-1/d}) for d>1. Our accuracy guarantee is optimal up to a constant factor for d>1, and up to a logarithmic factor for d=1. PMM is also time-efficient, with a fast running time of O(\epsilon d n). Building on the PMM algorithm, further variants of the synthetic data publishing problem can be studied under different settings.
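
The abstract does not spell out PMM itself; purely as a toy illustration of the task it solves, here is a standard baseline instead, a Laplace-perturbed histogram on [0,1] (all parameter choices are hypothetical; this is not the PMM algorithm):

    import numpy as np

    rng = np.random.default_rng(0)

    def dp_synthetic_1d(data, eps=1.0, bins=32, n_out=None):
        # Toy eps-DP synthetic data on [0,1]: add Laplace noise to histogram
        # counts (L1 sensitivity <= 2 when one data point is swapped), then
        # resample from the cleaned, normalized noisy histogram.
        counts, edges = np.histogram(data, bins=bins, range=(0.0, 1.0))
        noisy = counts + rng.laplace(scale=2.0 / eps, size=bins)
        probs = np.clip(noisy, 0, None)
        probs = probs / probs.sum()
        n_out = len(data) if n_out is None else n_out
        # Sample bins, then uniform locations within each sampled bin.
        idx = rng.choice(bins, size=n_out, p=probs)
        return rng.uniform(edges[idx], edges[idx + 1])

    true = rng.beta(2, 5, size=1000)
    synth = dp_synthetic_1d(true, eps=1.0)
    print(synth[:5])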

September 6: Omer Tamuz (Caltech)

Asymptotic Rényi Entropies of Random Walks on Groups

We introduce asymptotic Rényi entropies as a parameterized family of invariants for random walks on groups. These invariants interpolate between various well-studied properties of the random walk, including the growth rate of the group, the Shannon entropy, and the spectral radius. They furthermore offer large deviation counterparts of the Shannon-McMillan-Breiman Theorem. We prove some basic properties of asymptotic Rényi entropies that apply to all groups, and discuss their analyticity and positivity for the free group and lamplighter groups.

Joint with Kimberly Golubeva and Minghao Pan.

September 20: Weixin Yao (UC Riverside)

New Regression Model: Modal Regression

Built on the ideas of the mean and quantiles, mean regression and quantile regression are extensively investigated and widely used to model the relationship between a dependent variable Y and covariates x. However, research on regression models built on the mode is rather limited. In this talk, we propose a new regression tool, named modal regression, that aims to find the most probable conditional value (mode) of a dependent variable Y given covariates x, rather than the mean used by traditional mean regression. Modal regression can reveal interesting new data structures that are possibly missed by the conditional mean or quantiles. In addition, modal regression is resistant to outliers and heavy-tailed data, and can provide shorter prediction intervals when the data are skewed. Furthermore, unlike traditional mean regression, modal regression can be applied directly to truncated data. Modal regression could be a very useful regression tool that complements the traditional mean and quantile regressions.
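
A naive sketch of the idea, estimating the conditional mode by maximizing a kernel density estimate over a grid (the bandwidth and estimator here are illustrative assumptions, not the method proposed in the talk):

    import numpy as np

    def modal_regression(x, y, x0, h=0.3, grid=None):
        # Estimate the conditional mode m(x0) = argmax_t f(t | x0) by
        # maximizing a Gaussian-kernel density estimate in y, with each
        # observation weighted by a kernel in x centered at x0.
        grid = np.linspace(y.min(), y.max(), 200) if grid is None else grid
        wx = np.exp(-0.5 * ((x - x0) / h) ** 2)  # kernel weights in x
        dens = [(wx * np.exp(-0.5 * ((y - t) / h) ** 2)).sum() for t in grid]
        return grid[int(np.argmax(dens))]

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 1, 500)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_t(df=2, size=500)  # heavy tails
    print(modal_regression(x, y, x0=0.25))  # mode near sin(pi/2) = 1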

October 4: Richard Y. Zhang (UIUC)

Rank Overparameterization and Global Optimality Certification for Large-Scale Low-rank Optimization

Numerous important problems across applied statistics reduce to nonconvex estimation/optimization over a low-rank matrix. In principle, these can be reliably solved to global optimality via convex relaxation, but the computational costs can become prohibitive at large scale. In practice, it is much more common to optimize over the low-rank matrices directly, as in the Burer-Monteiro approach, but their nonconvexity can cause failure by getting stuck at a spurious local minimum. For safety-critical societal applications, such as the operation and planning of an electricity grid, our inability to reliably achieve global optimality can have significant real-world consequences.

In the first part of this talk, we investigate how overparameterizing the low-rank factorization can render its nonconvexity increasingly benign. In the smooth and strongly-convex setting, we rigorously show that, as the rank is increased, spurious local minima become increasingly rare in a step-wise fashion: rank-2 has fewer spurious local minima than rank-1, rank-3 has fewer than rank-2, and so on. Once the rank exceeds an O(1) threshold, every remaining local minimum is a global minimum, and every saddle point can be escaped. In the second part of this talk, we use the rank deficiency brought on by rank overparameterization to certify convergence to global optimality after the fact. The certification is an a posteriori guarantee that is valid under much weaker assumptions than typical “no spurious local minima” guarantees. However, rank deficiency significantly slows down the convergence of gradient descent, from a linear rate to a sublinear rate. We propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case.

Main related papers:
https://arxiv.org/abs/2207.01789
https://arxiv.org/abs/2206.03345 (joint work with Gavin Zhang and Salar Fattahi)

October 11: [no classes]

October 18: Camilo Hernández (USC ISE)

The mean field Schrödinger problem: a mean field control perspective

The mean field Schrödinger problem (MFSP) is the problem of finding the most likely path of a McKean-Vlasov type particle with constrained initial and final configurations. It was first introduced by Backhoff et al. (2020), who studied its existence and long-time behavior. This talk aims to show how ideas from mean field control theory allow us to derive new interesting results on the MFSP. In particular, we study its existence, characterization, and the so-called convergence problem. The method rests upon studying suitably penalized problems and stochastic control techniques. This talk is based on a joint work with Ludovic Tangpi (Princeton).

October 25: Yizhe Zhu (USC)

Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity

For the problem of reconstructing a low-rank matrix from a few linear measurements, two classes of algorithms have been widely studied in the literature: convex approaches based on nuclear norm minimization, and non-convex approaches that use factorized gradient descent. Under certain statistical model assumptions, it is known that nuclear norm minimization recovers the ground truth as soon as the number of samples scales linearly with the number of degrees of freedom of the ground truth. In contrast, while non-convex approaches are computationally less expensive, existing recovery guarantees assume that the number of samples scales at least quadratically with the rank. In this talk, we consider the problem of reconstructing a positive semidefinite matrix from a few Gaussian measurements. We improve the previous rank-dependence in the sample complexity of non-convex matrix factorization from quadratic to linear. Our proof relies on a probabilistic decoupling argument, where we show that the gradient descent iterates are only weakly dependent on the individual entries of the measurement matrices. Joint work with Dominik Stöger (KU Eichstätt-Ingolstadt).
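
A bare-bones sketch of the non-convex approach in question, factorized gradient descent for PSD matrix sensing with Gaussian measurements (problem sizes, initialization scale, and step size below are illustrative assumptions, not the paper's prescriptions):

    import numpy as np

    rng = np.random.default_rng(0)
    d, r, m = 30, 2, 600                       # dimension, rank, measurements

    U_true = rng.standard_normal((d, r)) / np.sqrt(d)
    M_true = U_true @ U_true.T                 # ground-truth PSD matrix
    A = rng.standard_normal((m, d, d))
    A = (A + A.transpose(0, 2, 1)) / 2         # symmetric Gaussian sensing matrices
    y = np.einsum('kij,ij->k', A, M_true)      # measurements y_k = <A_k, M>

    U = 0.01 * rng.standard_normal((d, r))     # small random initialization
    for _ in range(1000):
        resid = np.einsum('kij,ij->k', A, U @ U.T) - y
        U -= 0.1 * (2.0 / m) * np.einsum('k,kij->ij', resid, A) @ U

    # Relative recovery error of the factorized gradient descent iterate.
    print(np.linalg.norm(U @ U.T - M_true) / np.linalg.norm(M_true))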

November 1: Wojciech Ozanski (Florida State University)
[Joint with PDE seminar]

Instantaneous continuous loss of regularity for the SQG equation

The issue of loss of regularity of unique solutions to the 3D incompressible Euler equations is an important open question of fluid mechanics, and is closely related to the emergence of turbulence. We will discuss recent results regarding loss of regularity of solutions of the 2D and 3D Euler equations, and of the surface quasi-geostrophic (SQG) equation, which is a well-established 2D model of the 3D Euler equations. We will discuss a result of continuous-in-time loss of Sobolev regularity of solutions to the SQG equation. Namely, given $s\in (3/2,2)$ and $\varepsilon >0$, we will describe a construction of compactly supported initial data $\theta_0$ such that $\| \theta_0 \|_{H^s}\leq \varepsilon$ and there exist $T>0$, $c>0$ and a local-in-time solution $\theta$ of the SQG equation such that $\theta (\cdot ,t )$ belongs to $H^{s/(1+ct)}$ but does not belong to $H^\beta$ for any $\beta > s/(1+ct)$. Moreover, $\theta$ is continuous and differentiable on $\mathbb{R}^2\times [0,T]$, and is unique among all solutions with initial condition $\theta_0$ which belong to $C([0,T];H^{1+\alpha })$ for any $\alpha >0$.
This is the first result of this kind in incompressible fluid mechanics. It is also the first ill-posedness result in the supercritical regime which has compact support in space.

November 8: Johannes Wiesel (CMU)
[Joint with Mathematical Finance seminar]

Bounding adapted Wasserstein metrics

The Wasserstein distance $\mathcal{W}_p$ is an important instance of an optimal transport cost. Its numerous mathematical properties as well as applications to various fields such as mathematical finance and statistics have been well studied in recent years. The adapted Wasserstein distance $\mathcal{A}\mathcal{W}_p$ extends this theory to laws of discrete time stochastic processes in their natural filtrations, making it particularly well suited for analyzing time-dependent stochastic optimization problems.

While the topological differences between $\mathcal{A}\mathcal{W}_p$ and $\mathcal{W}_p$ are well understood, their differences as metrics remain largely unexplored beyond the trivial bound $\mathcal{W}_p\lesssim \mathcal{A}\mathcal{W}_p$. This paper closes this gap by providing upper bounds of $\mathcal{A}\mathcal{W}_p$ in terms of $\mathcal{W}_p$ through investigation of the smooth adapted Wasserstein distance. Our upper bounds are explicit and are given by a sum of $\mathcal{W}_p$, Eder’s modulus of continuity and a term characterizing the tail behavior of measures. As a consequence, upper bounds on $\mathcal{W}_p$ automatically hold for $\mathcal{AW}_p$ under mild regularity assumptions on the measures considered. A particular instance of our findings is the inequality $\mathcal{A}\mathcal{W}_1\le C\sqrt{\mathcal{W}_1}$ on the set of measures that have Lipschitz kernels.

Our work also reveals how smoothing of measures affects the adapted weak topology. In fact, we find that the topology induced by the smooth adapted Wasserstein distance exhibits a non-trivial interpolation property, which we characterize explicitly: it lies in between the adapted weak topology and the weak topology, and the inclusion is governed by the decay of the smoothing parameter.

This talk is based on joint work with Jose Blanchet, Martin Larsson and Jonghwa Park.

November 15: Greta Panova (USC)

Algebra meets probability: permutons from pipe dreams via integrable probability

Pipe dreams are tiling models originally introduced to study objects related to the Schubert calculus and K-theory of the Grassmannian. They can also be viewed as ensembles of lattice walks with various interaction constraints. We determine the typical permutations and reveal new interesting limiting objects (permutons), which is proved via the theory of the Totally Asymmetric Simple Exclusion Process (TASEP). Deeper connections with free fermion six-vertex models (Alternating Sign Matrices) and domino tilings of the Aztec diamond allowed us to describe the extreme cases of the original algebraic problem. As a byproduct we determine the permutations maximizing the principal specialization of the Grothendieck polynomials at beta=1. This sheds further light on the original question of Stanley to determine the maximal principal specialization of Schubert polynomials, which corresponds to a statistical mechanics model with long-range interactions.
Based on joint work with A. H. Morales, L. Petrov, D. Yeliussizov.

November 22: Gil Kur (ETH Zürich)  

Note the non-standard time and location: 1pm in KAP 265

Minimum Norm Interpolation Meets The Local Theory of Banach Spaces

Minimum-norm interpolators have recently gained attention as an analyzable model to shed light on the double descent phenomenon observed for neural networks.  Most of the work has focused on analyzing interpolators in Hilbert spaces, where, typically, an effectively low-rank structure of the feature covariance prevents a large bias. This work takes a first step towards establishing a general framework that connects generalization properties of the interpolators to well-known concepts from high-dimensional geometry, specifically, from the local theory of Banach spaces.

November 29: [no classes]

December 6: Elliot Paquette (McGill) [2PM! Nonstandard time!]

Random matrix theory for high dimensional optimization, and an application to scaling laws

We describe a program of analysis of stochastic gradient methods on high-dimensional random objectives. We illustrate some assumptions under which the loss curves are universal, in that they can be completely described in terms of an underlying covariance structure of the problem setup. Furthermore, we give a description of these loss curves that can be analyzed precisely.

As a motivating application, we show how this can be applied to the power-law-random-features model.  This is a simple two-hyperparameter family of optimization problems, which displays 5 distinct phases of SGD loss curves; these phases are determined by the relative complexities of the target, data distribution, and whether these are ‘high-dimensional’ or not (which in context can be precisely defined).  In each phase, we can also give, for a given compute budget, the optimal random-feature dimensionality.

Joint work with Courtney Paquette (McGill & Google Deepmind), Jeffrey Pennington (Google Deepmind), and Lechao Xiao  (Google Deepmind).


Spring 2024:


[CANCELLED] January 12: Roman Vershynin (UC Irvine) [Undergraduate lecture]

January 19: Grigory Franguridi, 2PM, (USC Center for Economic and Social Research)

Estimation of panels with attrition and refreshment samples

It has long been established that, if a panel dataset suffers from attrition, auxiliary (refreshment) sampling may restore full identification under weak nonparametric assumptions on attrition. Despite their generality, these identification strategies have not been employed in empirical research due to the intractability of the induced estimation procedures. We show that practical estimation is nevertheless possible without parametric approximations. Our two-step estimation algorithm utilizes the well-known iterative proportional fitting procedure, which does not require optimization and exhibits fast convergence even with continuous data. We show that our estimator is consistent and asymptotically normal under smoothness assumptions and an appropriate choice of kernel and bandwidth. We also demonstrate its excellent performance in simulations and provide an empirical illustration using the Understanding America Study panel.
(with Jinyong Hahn, Pierre Hoonhout, Arie Kapteyn, and Geert Ridder)
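
For readers unfamiliar with it, iterative proportional fitting alternately rescales the rows and columns of a nonnegative table to match target marginals; a minimal sketch with made-up numbers (the paper applies this idea in a more general setting):

    import numpy as np

    def ipf(table, row_targets, col_targets, iters=100):
        # Rescale `table` so its row and column sums match the targets.
        t = table.astype(float).copy()
        for _ in range(iters):
            t *= (row_targets / t.sum(axis=1))[:, None]   # match row sums
            t *= (col_targets / t.sum(axis=0))[None, :]   # match column sums
        return t

    seed = np.array([[40.0, 30.0], [35.0, 45.0]])  # observed joint counts
    fitted = ipf(seed, row_targets=np.array([80.0, 70.0]),
                 col_targets=np.array([60.0, 90.0]))
    print(fitted, fitted.sum(axis=1), fitted.sum(axis=0))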

January 26: Mahdi Soltanolkotabi (USC)

Foundations for feature learning via gradient descent

One of the key mysteries in modern learning is that a variety of models, such as deep neural networks, when trained via (stochastic) gradient descent can extract useful features and learn high-quality representations directly from data simultaneously with fitting the labels. This feature learning capability is also at the forefront of the recent success of a variety of contemporary paradigms such as transformer architectures, self-supervised and transfer learning. Despite a flurry of exciting activity over the past few years, existing theoretical results are often too crude and/or pessimistic to explain feature/representation learning in practical regimes of operation or serve as a guiding principle for practitioners. Indeed, existing literature often requires unrealistic hyperparameter choices (e.g. very small step sizes, large initialization or wide models). In this talk I will focus on demystifying this feature/representation learning phenomenon for a variety of problems spanning single index models, low-rank factorization, matrix reconstruction, and neural networks. Our results are based on an intriguing spectral bias phenomenon for gradient descent that puts the iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well by simultaneously finding good features/representations of the data while fitting the labels. The proofs combine ideas from high-dimensional probability/statistics, optimization and nonlinear control to develop a precise analysis of model generalization along the trajectory of gradient descent. Time permitting, I will explain the implications of these theoretical results for more contemporary use cases including transfer learning, self-attention, prompt-tuning via transformers and simple self-supervised learning settings.

February 2: Timothy Armstrong (USC)

Robust Estimation and Inference in Panels with Interactive Fixed Effects

We consider estimation and inference for a regression coefficient in panels with interactive fixed effects (i.e., with a factor structure). We show that previously developed estimators and confidence intervals (CIs) might be heavily biased and size-distorted when some of the factors are weak. We propose estimators with improved rates of convergence and bias-aware CIs that are uniformly valid regardless of whether the factors are strong or not. Our approach applies the theory of minimax linear estimation to form a debiased estimate using a nuclear norm bound on the error of an initial estimate of the interactive fixed effects. We use the obtained estimate to construct a bias-aware CI taking into account the remaining bias due to weak factors. In Monte Carlo experiments, we find a substantial improvement over conventional approaches when factors are weak, with little cost to estimation error when factors are strong.

Paper link: Timothy B. Armstrong, Martin Weidner, Andrei Zeleneev. https://arxiv.org/abs/2210.06639

February 9: Yuehao Bai (USC)

On the Efficiency of Finely Stratified Experiments

This paper studies the efficient estimation of a large class of treatment effect parameters that arise in the analysis of experiments. Here, efficiency is understood to be with respect to a broad class of treatment assignment schemes for which the marginal probability that any unit is assigned to treatment equals a pre-specified value, e.g., one half. Importantly, we do not require that treatment status is assigned in an i.i.d. fashion, thereby accommodating complicated treatment assignment schemes that are used in practice, such as stratified block randomization and matched pairs. The class of parameters considered are those that can be expressed as the solution to a restriction on the expectation of a known function of the observed data, including possibly the pre-specified value for the marginal probability of treatment assignment. We show that this class of parameters includes, among other things, average treatment effects, quantile treatment effects, local average treatment effects as well as the counterparts to these quantities in experiments in which the unit is itself a cluster. In this setting, we establish two results. First, we derive a lower bound on the asymptotic variance of estimators of the parameter of interest in the form of a convolution theorem. Second, we show that the naïve method of moments estimator achieves this bound on the asymptotic variance quite generally if treatment is assigned using a “finely stratified” design. By a “finely stratified” design, we mean experiments in which units are divided into groups of a fixed size and a proportion within each group is assigned to treatment uniformly at random so that it respects the restriction on the marginal probability of treatment assignment. In this sense, “finely stratified” experiments lead to efficient estimators of treatment effect parameters “by design” rather than through ex post covariate adjustment.

Paper link: Yuehao Bai, Jizhou Liu, Azeem M. Shaikh, Max Tabord-Meehan
https://arxiv.org/abs/2307.15181
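
A toy instance of a "finely stratified" design, matched pairs with exactly one treated unit per pair, followed by the naive difference-in-means estimator (all numbers and the data-generating process are hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000                                  # even number of units
    x = rng.normal(size=n)                    # covariate used for pairing
    y0 = x + rng.normal(size=n)               # potential outcome under control
    y1 = y0 + 2.0                             # potential outcome under treatment

    # Pair adjacent units after sorting on the covariate, then randomize
    # within each pair: exactly one treated unit per pair, so the marginal
    # treatment probability is one half.
    order = np.argsort(x)
    treat = np.zeros(n, dtype=bool)
    for pair in order.reshape(-1, 2):
        treat[rng.permutation(pair)[0]] = True

    y = np.where(treat, y1, y0)
    print(y[treat].mean() - y[~treat].mean())  # close to the true effect 2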

February 23: Tryphon Georgiou (UC Irvine)

Stochastic Control meets Non-equilibrium Thermodynamics: Fundamental limits of power generation in thermodynamic engines

Thermodynamics was born in the 19th century in quest of a way to quantify efficiency of steam engines at the dawn of the industrial age. In the time since, thermodynamics has impacted virtually all other areas in science, from chemistry and biology to the physics of black holes, and yet, progress beyond the classical quasi-static limit towards finite-time thermodynamic transitions has been slow; finite-time is of essence for non-vanishing power generation. In recent years a deeper understanding of non-equilibrium processes has been achieved based on stochastic models with degrees of freedom (state variables) that are subject to Brownian excitation that models heat baths. Within this framework we will explain energy transduction, we will give insights on how anisotropy in thermal or chemical potentials can be tapped for power generation in engineered and physical processes, and we will highlight fundamental bounds on the amount of power that can be drawn during finite-time thermodynamic transitions.

The talk is based on joint works with Rui Fu (UCI), Olga Movilla (UCI), Amir Taghvaei (UCI) and Yongxin Chen (GaTech). Research funding by AFOSR, ARO and NSF is gratefully acknowledged.

Monday February 26, 3:30PM, KAP 427: Yongtao Guan (Chinese University of Hong Kong, Shenzhen)

Group Network Hawkes Process

In this work, we study the event occurrences of individuals interacting in a network. To characterize the dynamic interactions among the individuals, we propose a group network Hawkes process (GNHP) model whose network structure is observed and fixed. In particular, we introduce a latent group structure among individuals to account for the heterogeneous user-specific characteristics. A maximum likelihood approach is proposed to simultaneously cluster individuals in the network and estimate model parameters. A fast EM algorithm is subsequently developed by utilizing the branching representation of the proposed GNHP model. Theoretical properties of the resulting estimators of group memberships and model parameters are investigated under both settings when the number of latent groups G is over-specified or correctly specified. A data-driven criterion that can consistently identify the true G under mild conditions is derived. Extensive simulation studies and an application to a data set collected from Sina Weibo are used to illustrate the effectiveness of the proposed methodology.

March 1: Morris Yau (MIT)

Are Neural Networks Optimal Approximation Algorithms?

In this talk, we discuss the power of neural networks to compute solutions to NP-hard optimization problems, focusing on the class of constraint satisfaction problems (Boolean SAT, Sudoku, etc.). We find there is a graph neural network architecture (OptGNN) that captures the optimal approximation algorithm for constraint satisfaction, up to complexity-theoretic assumptions, via tools in semidefinite programming. Furthermore, OptGNN can act as a convex program solver and hence output a dual (a bound) on the optimality of a combinatorial problem. Evaluating OptGNN on benchmarks in the neural combinatorial optimization literature, we find our approach is competitive with the state-of-the-art unsupervised neural baselines. We discuss further connections between neural networks and computation, and point to directions for future work.

March 8: Annie Qu (UC Irvine)

A Model-Agnostic Graph Neural Network for Integrating Local and Global Information

Graph neural networks (GNNs) have achieved promising performance in a variety of graph-focused tasks. Despite their success, the two major limitations of existing GNNs are the capability of learning various-order representations and providing interpretability of such deep learning-based black-box models. To tackle these issues, we propose a novel Model-agnostic Graph Neural Network (MaGNet) framework. The proposed framework is able to extract knowledge from high-order neighbors, sequentially integrates information of various orders, and offers explanations for the learned model by identifying influential compact graph structures. In particular, MaGNet consists of two components: an estimation model for the latent representation of complex relationships under graph topology, and an interpretation model that identifies influential nodes, edges, and important node features. Theoretically, we establish the generalization error bound for MaGNet via empirical Rademacher complexity and showcase its power to represent layer-wise neighborhood mixing. We conduct comprehensive numerical studies using both simulated data and a real-world case study on investigating the neural mechanisms of the rat hippocampus, demonstrating that the performance of MaGNet is competitive with state-of-the-art methods.

March 15: no talk [spring break]

March 22: Xiaowu Dai (UCLA)

Kernel ordinary differential equations

The ordinary differential equation (ODE) is widely used in modelling biological and physical processes in science. A new reproducing kernel-based approach is proposed for the estimation and inference of ODEs given noisy observations. The functional forms in the ODE are not assumed to be known or restricted to be linear or additive, and pairwise interactions are allowed. Sparse estimation is performed to select individual functionals and construct confidence intervals for the estimated signal trajectories. The estimation optimality and selection consistency of kernel ODE are established under both the low-dimensional and high-dimensional settings, where the number of unknown functionals can be smaller or larger than the sample size. The proposal tackles several important problems that are not yet fully addressed in the smoothing spline analysis of variance (SS-ANOVA) framework, and extends the existing methods of dynamic causal modeling.

March 29: Kengo Kato (Cornell)

Entropic optimal transport: limit theorems and algorithms

In this talk, I will discuss my recent work on entropic optimal transport (EOT). In the first part, I will discuss limit theorems for EOT maps, dual potentials, and the Sinkhorn divergence. The key technical tool we use is a first and second-order Hadamard differentiability analysis of EOT potentials with respect to the marginals, from which the limit theorems, bootstrap consistency, and asymptotic efficiency of the empirical estimators follow. The second part concerns the entropic Gromov-Wasserstein (EGW) distance, which serves as a computationally efficient proxy for the Gromov-Wasserstein distance. By leveraging a variational representation that ties the EGW problem with a series of EOT problems, we derive stability results of EGW with respect to the auxiliary matrix, which enables us to develop efficient algorithms for solving the EGW problem. This talk is based on joint work with Ziv Goldfeld, Gabriel Rioux, and Ritwik Sadhu.
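
For context, the Sinkhorn iteration that makes EOT computationally attractive alternately rescales a Gibbs kernel to match the two marginals (a standard textbook sketch, not the estimators studied in the talk; all sizes and the regularization level are illustrative):

    import numpy as np

    def sinkhorn(a, b, C, reg=0.1, iters=500):
        # Entropic OT between histograms a, b with cost matrix C: returns
        # the optimal coupling of the entropy-regularized problem.
        K = np.exp(-C / reg)                 # Gibbs kernel
        u = np.ones_like(a)
        for _ in range(iters):
            v = b / (K.T @ u)                # match second marginal
            u = a / (K @ v)                  # match first marginal
        return u[:, None] * K * v[None, :]

    x = np.linspace(0, 1, 50)
    a = np.ones(50) / 50                               # uniform source
    b = np.exp(-((x - 0.7) ** 2) / 0.02); b /= b.sum() # peaked target
    C = (x[:, None] - x[None, :]) ** 2                 # squared-distance cost
    P = sinkhorn(a, b, C)
    print(P.sum(axis=1)[:3], P.sum(axis=0)[:3])        # marginals match a, b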

April 5: Weixuan Xia (USC)

Set-Valued Stochastic Integrals and Convoluted Lévy Processes

In this talk, I will discuss set-valued Volterra-type stochastic integrals driven by Lévy processes. I will explain the definition of set-valued convoluted stochastic integrals by taking the closed decomposable hull of integral functionals over time, thereby extending classical definitions to convoluted integrals with square-integrable kernels. Two key insights include: (1) Aside from established results for set-valued Itô integrals, while set-valued stochastic integrals with respect to a finite-variation Poisson random measure are guaranteed to be integrally bounded for bounded integrands, this is not true when the random measure represents infinite variation; (2) It is a mutual effect of kernel singularity and jumps that the set-valued convoluted integrals are possibly explosive and can take extended vector values. These findings carry significant implications for the construction of set-valued fractional dynamical systems. Additionally, I will explore two classes of set-monotone processes for practical interests in economic and financial modeling.

April 12: Yosi Rinott (The Hebrew University of Jerusalem)

On the behavior of posterior probabilities with additional data: monotonicity and nonmonotonicity, asymptotic rates, log-concavity, and Turán’s inequality

Given a parametric model, a prior, and data, Bayesian statisticians quantify their belief that the true parameter is ϑ0 by its posterior probability. The starting question of this paper is whether the posterior at ϑ0 increases when the data are generated under ϑ0, and how it behaves when the data come from ϑ ≠ ϑ0. Can it decrease and then increase, so that additional data may mislead Bayesian statisticians?

For data arriving sequentially, we consider monotonicity properties of the posterior probabilities as a function of the sample size with respect to certain stochastic orders, specifically starting with likelihood ratio dominance.
When the data are generated by ϑ ≠ ϑ0, Doob’s consistency theorem says that the posterior at ϑ0 converges a.s. to zero and therefore its expectation converges to zero. We obtain precise asymptotic rates of the latter convergence for observations from an exponential family and show that the expectation of the ϑ0-posterior under ϑ ≠ ϑ0 is eventually strictly decreasing. Finally, we show that in a number of interesting cases this expectation is a log-concave function of the sample size, and thus unimodal. In the Bernoulli case we obtain this result by developing an inequality that is related to Turán’s inequality for Legendre polynomials.

The talk is based on a joint work with Sergiu Hart.

April 19: Andrew Holbrook (UCLA, Department of Biostatistics)

Quantum Markov chain Monte Carlo(s)

I discuss how one can use quantum circuits to accelerate multiproposal MCMC and point to promising avenues of future research, including quantum HMC,  quantum-accelerated nonreversible MCMC and quantum-accelerated locally-balanced MCMC.

April 26: Sui Tang (UCSB)

Learning interaction kernels in particle and agent based systems

Interacting particle systems showcase a variety of collective behaviors and are fundamental to many scientific and engineering domains, such as the flocking of birds and the milling of fish. These systems are typically modeled using differential equations to elucidate how individual behaviors drive collective dynamics, an essential inquiry across multiple disciplines. Although recent theoretical and numerical studies have successfully replicated many qualitative collective patterns seen in nature, there is still a notable deficiency in quantitatively matching these models with empirical data.

We explore the data-driven discovery of latent interaction kernels from observed trajectory data in particle and agent-based systems. We discuss recent findings in stochastic systems where interaction kernels are derived from pairwise distances and introduce a nonparametric inference strategy using a regularized maximum likelihood estimator for precise kernel estimation. We show this approach can achieve near-optimal convergence rates, regardless of the state space dimensionality, when dealing with multiple trajectory data. Additionally, we conduct error analysis related to discrete-time observations and validate our methodology through numerical experiments on models such as stochastic opinion dynamics and the Lennard-Jones potential. Moreover, we also consider microscopic models and advance our approach to estimate nonlocal interaction potentials in aggregation-diffusion equations from noisy data, using sparsity-promoting techniques. This research is conducted in collaboration with Fei Lu, Mauro Maggioni, Jose A. Carrillo, Gissell Estrada-Rodriguez, and Laszlo Mikolas.

 


Fall 2023:


Wednesday Aug 23 2PM, KAP 245: Wei Biao Wu (U Chicago)

Fast Algorithms for Estimating Covariance Matrices of Stochastic Gradient Descent Solutions

Stochastic gradient descent (SGD), an important optimization method in machine learning, is widely used for parameter estimation, especially in the online setting where data arrive in a stream. While this recursive algorithm is popular for its computational and memory efficiency, its solutions are random. In this talk we shall estimate the asymptotic covariance matrices of the averaged SGD iterates (ASGD) in a fully online fashion. Based on the recursive estimator and classic asymptotic normality results for ASGD, we can conduct online statistical inference for SGD estimators and construct asymptotically valid confidence intervals for model parameters. The algorithm for the recursive estimator is efficient and only uses SGD iterates: upon receiving new observations, we update the confidence intervals at the same time as the ASGD solutions, without extra computational or memory cost. This approach fits the online setting even when the total number of observations is unknown, and takes full advantage of SGD's computational and memory efficiency. This work is joint with Wanrong Zhu and Xi Chen.
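
To fix ideas, here is ASGD for streaming linear regression with the Polyak-Ruppert average maintained recursively in O(d) memory (the paper's recursive covariance estimator is omitted; the step-size schedule is an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 5
    theta_star = rng.normal(size=d)

    theta = np.zeros(d)       # SGD iterate
    theta_bar = np.zeros(d)   # running (Polyak-Ruppert) average
    for n in range(1, 100_001):            # data arriving in a stream
        x = rng.normal(size=d)
        y = x @ theta_star + rng.normal()
        grad = (x @ theta - y) * x         # squared-loss stochastic gradient
        theta -= 0.2 * n ** -0.7 * grad    # step size gamma_n = 0.2 n^{-0.7}
        theta_bar += (theta - theta_bar) / n   # recursive average, O(d) memory

    print(np.linalg.norm(theta_bar - theta_star))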

Wednesday Aug 23 330PM, KAP 245: Danna Zhang (UCSD)

Test of Independence Based on Generalized Distance Correlation

We study the fundamental statistical inference concerning the testing of independence between two random vectors. Existing asymptotic theories for test statistics based on distance covariance can only apply to either low dimensional or high dimensional settings. In this work we develop a novel unified distributional theory of the sample generalized distance covariance that works for random vectors of arbitrary dimensions. In particular, a non-asymptotic error bound on its Gaussian approximation is derived. Under fairly mild moment conditions, the asymptotic null distribution of the sample generalized distance covariance is shown to be distributed as a linear combination of independent and identically distributed chi-squared random variables. We also show high dimensionality is necessary for the null distribution to be asymptotically normal. To estimate the asymptotic null distribution practically, we propose an innovative Half-Permutation procedure and provide the theoretical justification for its validity. The exact asymptotic distribution of the resampling distribution is derived under general marginal moment conditions and the proposed procedure is shown to be asymptotically equivalent to the oracle procedure with known marginal distributions.
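
For reference, the classical (non-generalized) sample distance covariance is computed from double-centered pairwise distance matrices; a plain O(n^2) sketch:

    import numpy as np

    def dcov2(x, y):
        # Squared sample distance covariance between samples x (n x p) and
        # y (n x q): mean entrywise product of double-centered distance
        # matrices. Near zero iff the samples look independent.
        def centered(z):
            D = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
            return D - D.mean(0) - D.mean(1)[:, None] + D.mean()
        A, B = centered(x), centered(y)
        return (A * B).mean()

    rng = np.random.default_rng(0)
    x = rng.normal(size=(500, 3))
    y_indep = rng.normal(size=(500, 2))
    y_dep = np.c_[x[:, :1] ** 2, rng.normal(size=(500, 1))]  # nonlinear dependence
    print(dcov2(x, y_indep), dcov2(x, y_dep))  # near 0 vs clearly positive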

Sep 1: Omer Tamuz (Caltech)

On the origin of the Boltzmann distribution

The Boltzmann distribution is used in statistical mechanics to describe the distribution of states in systems with a given temperature. We give a novel characterization of this distribution as the unique one satisfying independence for uncoupled systems. The theorem boils down to a statement about symmetries of the convolution semigroup of finitely supported probability measures on the natural numbers, or, alternatively, about symmetries of the multiplicative semigroup of polynomials with non-negative coefficients.  (Joint with Fedor Sandomirskiy.)

Sep 8: Xiaohui Chen (USC)

Statistical optimality and computational guarantee for K-means clustering

K-means clustering is a widely used machine learning method to group data points based on their similarity. Solving the K-means problem is computationally hard, and various tractable (or even scalable) approximation algorithms (Lloyd, spectral, nonnegative matrix factorization, semidefinite programming) have been explored in theory and practice. In this talk, I will discuss some recent developments from my research group on the statistical optimality and computational guarantees for relaxed formulations of K-means. To break the computational bottleneck, we leverage convex and non-convex optimization techniques to tackle the clustering problem, and show that the resulting methods achieve an information-theoretic sharp threshold for exact recovery of cluster labels under the standard Gaussian mixture model. Joint work with Yun Yang (UIUC), Yubo Zhuang (UIUC), Richard Y. Zhang (UIUC).
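
For context, the baseline Lloyd iteration that such relaxations are compared against (a bare-bones sketch on synthetic two-cluster data; initialization and data are illustrative):

    import numpy as np

    def lloyd(X, k, iters=50, seed=0):
        # Plain Lloyd's algorithm: alternate nearest-center assignment
        # and centroid updates. Returns labels and centers.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(axis=1)
            # Keep the old center if a cluster happens to empty out.
            centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        return labels, centers

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(+2, 1, (100, 2))])
    labels, centers = lloyd(X, k=2)
    print(centers)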

Sep 22: Anna Ma (UCI)

The Kaczmarz algorithm and doubly noisy linear systems

Large-scale linear systems, Ax=b, frequently arise in data science and scientific computing at massive scales, thus demanding effective iterative methods to solve them. Often, these systems are noisy due to operational errors or faulty data-collection processes. In the past decade, the randomized Kaczmarz algorithm (RK) has been studied extensively as an efficient iterative solver for such systems. However, the convergence study of RK in the noisy regime is limited and only considers measurement noise in the right-hand side vector, b. Practically, that is only sometimes the case, as the coefficient matrix A can also be noisy. But is noise inherently bad? In this talk, we motivate and discuss doubly noisy linear systems and the performance of Kaczmarz-type algorithms applied to such systems, highlighting that noise can potentially be exploited. The presented work is joint work with El Houcine Bergou, Soumia Boucherouite, Aritra Dutta, and Xin Li.
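
The randomized Kaczmarz update projects the current iterate onto the hyperplane defined by one randomly sampled row; a minimal sketch for a system with a noisy right-hand side (the doubly noisy setting of the talk would also perturb A; sizes and noise level here are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    m, d = 500, 50
    A = rng.normal(size=(m, d))
    x_true = rng.normal(size=d)
    b = A @ x_true + 0.01 * rng.normal(size=m)   # noisy right-hand side

    p = (A ** 2).sum(axis=1)
    p = p / p.sum()                              # sample rows prop. to squared norm
    x = np.zeros(d)
    for _ in range(20_000):
        i = rng.choice(m, p=p)
        a = A[i]
        x += (b[i] - a @ x) / (a @ a) * a        # project onto row i's hyperplane

    print(np.linalg.norm(x - x_true))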

Sep 29: Mladen Kolar (USC & U Chicago)

Latent multimodal functional graphical model estimation

Joint multimodal functional data acquisition, where multiple modes of functional data are measured from the same subject simultaneously, has emerged as an exciting modern approach enabled by recent engineering breakthroughs in the neurological and biological sciences. One prominent motivation for acquiring such data is to enable new discoveries of the underlying connectivity by combining signals from multiple modalities. Yet, despite scientific interest in this problem, there remains a gap in principled statistical methods for estimating the graph underlying joint multimodal functional data. To this end, we propose a new integrative framework to estimate a single latent graphical model from multimodal functional data. We take the generative perspective by modeling the data generation process, from which we identify the inverse mapping from the observation space to the latent space as transformation operators. We then develop an estimator that simultaneously estimates the transformation operators and the latent graph via functional neighborhood regression. The approach is motivated and applied to analyzing simultaneously acquired multimodal brain imaging data where the graph indicates the underlying brain functional connectivity.

Joint work with Katherine Tsai, Boxin Zhao, Sanmi Koyejo.

Oct 6: Vatsal Sharan (USC)

Sample Amplification: Increasing Dataset Size even when Learning is Impossible

Is learning a distribution always necessary for generating new samples from the distribution? To study this, we introduce the problem of “sample amplification”: given n independent draws from an unknown distribution, D, to what extent is it possible to output a set of m > n datapoints that are indistinguishable from m i.i.d. draws from D? Curiously, we show that nontrivial amplification is often possible in the regime where the number of datapoints n is too small to learn D to any nontrivial accuracy. We prove upper bounds and matching lower bounds on sample amplification for a number of distribution families including discrete distributions, Gaussians, and any continuous exponential family.
This is based on joint work with Brian Axelrod, Yanjun Han, Shivam Garg and Greg Valiant. Most of the talk will be based on this work, but we will also touch on this one.
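
A caricature of the Gaussian case (return the original draws together with fresh draws from a fitted Gaussian; the actual amplifiers and their guarantees are considerably more delicate, so this is illustrative only):

    import numpy as np

    rng = np.random.default_rng(0)

    def amplify_gaussian(samples, m):
        # Toy amplifier: keep the n original draws and append m - n draws
        # from the Gaussian fitted to those n draws.
        n, d = samples.shape
        mu = samples.mean(axis=0)
        cov = np.cov(samples, rowvar=False)
        extra = rng.multivariate_normal(mu, cov, size=m - n)
        return np.vstack([samples, extra])

    X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=100)
    print(amplify_gaussian(X, m=120).shape)   # (120, 3)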

Oct 13: [No talk] [Fall Recess]

Oct 20: Çağın Ararat (Bilkent)

Path-regularity and martingale properties of set-valued stochastic integrals

We study the path-regularity and martingale properties of set-valued stochastic integrals defined in our previous work A., Ma, Wu (AAP, 2023). Such integrals are fundamentally different from the well-known Aumann-Itô stochastic integrals and more suitable for representing set-valued martingales. However, like the Aumann-Itô integral, the new integral is only a set-valued submartingale in general, and there is very limited knowledge about its path-regularity. We first establish the existence of right- and left-continuous modifications of set-valued submartingales in continuous time and apply these results to set-valued stochastic integrals. We also show that a set-valued stochastic integral yields a martingale if and only if the set of terminal values of the stochastic integrals associated to the integrand is closed and decomposable. As a special case, we study the set-valued martingale in the form of the conditional expectation of a convex set-valued random variable. When this random variable is a convex random polytope, we show that the conditional expectation of a vertex stays as a vertex of the set-valued conditional expectation if and only if the random polytope has a deterministic normal fan. This is a joint work with Jin Ma.

 

Oct 27: Yier Lin (U Chicago)

The atypical growth in a random interface

Random interface growth is all around us: tumors, bacterial colonies, infections, and propagating flame fronts. The KPZ equation is a stochastic PDE central to a class of random growth phenomena. In this talk, I will explain how to combine tools from probability, partial differential equations, and integrable systems to understand the behavior of the KPZ equation when it exhibits unusual growth. This talk is based in part on joint works with Pierre Yves Gaudreau Lamarre and Li-Cheng Tsai.

 

Nov 3: Yuhua Zhu (UCSD)

Continuous-in-time Limit for Bandits

In this talk, I will build the connection between Hamilton-Jacobi-Bellman equations and the multi-armed bandit (MAB) problems. MAB is a widely used paradigm for studying the exploration-exploitation trade-off in sequential decision making under uncertainty. This is the first work that establishes this connection in a general setting. I will present an efficient algorithm for solving MAB problems based on this connection and demonstrate its practical applications.

Nov 10: [No talk] [Veterans’ Day]

Nov 17: Peter Kagey (Harvey Mudd) [Undergraduate lecture]

Expected value of letters of permutations with a given number of $k$-cycles

The probability that a uniformly random permutation on $2n$ letters has exactly $\alpha$ transpositions is equal to the probability that a uniformly random isometry of the $n$-cube has exactly $\alpha$ pairs of faces that are fixed. In this talk, I will describe generalizations of this relationship, show how this gives information about the expected value of letters of permutations with a given number of $k$-cycles, and propose several open questions related to families of permutation statistics.

Nov 24: [No talk] [Thanksgiving]

Dec 1: Larry Goldstein (USC)

Classical and Free Zero Bias for Infinite Divisibility

https://dornsife.usc.edu/probability-and-statistics-seminars/wp-content/uploads/sites/244/2023/11/abstract.pdf


Spring 2023:


January 13: Zachary Bezemek (Boston University)

Large Deviations and Importance Sampling for Weakly Interacting Diffusions

We consider an ensemble of N interacting particles modeled by a system of N stochastic differential equations (SDEs). The coefficients of the SDEs are taken to be such that as N approaches infinity, the system undergoes Kac’s propagation of chaos, and hence is well-approximated by the solution to a McKean-Vlasov Equation. Rare but possible deviations of the behavior of the particles from this limit may reflect a catastrophe, and computing the probability of such rare events is of high interest in many applications. In this talk, we design an importance sampling scheme which allows us to numerically compute statistics related to these rare events with high accuracy and efficiency for any N. Standard Monte Carlo methods behave exponentially poorly as N increases for such problems. Our scheme is based on subsolutions of a Hamilton-Jacobi-Bellman (HJB) Equation on Wasserstein Space which arises in the theory of mean-field control. This HJB Equation is seen to be connected to the large deviations rate function for the empirical measure on the ensemble of particles. We identify conditions under which our scheme is provably asymptotically optimal in N in the sense of log-efficiency. We also provide evidence, both analytical and numerical, that with sufficient regularity of the solution to the HJB Equation, our scheme can have vanishingly small relative error as N increases.

February 3: Mei Yin (Denver)

Probabilistic parking functions

We extend the notion of classical parking functions by introducing randomness via a probabilistic parking protocol. We explore the properties of a preference vector given that it is a parking function and discuss the effect (and lack thereof) of the probabilistic parameter p. Of special interest is when p=1/2, where we demonstrate a sharp transition in some parking statistics. We connect our result to other phenomena in combinatorics. Partially based on joint work with Irfan Durmic, Alex Han, Pamela E. Harris, and Rodrigo Ribeiro.

Time permitting, I will discuss newer thoughts after the paper is written and further directions for research.
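
For readers new to parking functions, here is the classical deterministic protocol that the probabilistic version modifies with a coin flip at each collision (only the classical check is sketched here; the randomized rule is as in the paper). The simulation also estimates the probability (n+1)^(n-1)/n^n that a uniform preference vector parks:

    import numpy as np

    def parks(pref):
        # Classical protocol (0-indexed spots): car i drives to spot pref[i]
        # and takes the first empty spot at or after it. Returns True iff
        # every car parks, i.e., pref is a parking function.
        n = len(pref)
        occupied = [False] * n
        for p in pref:
            spot = p
            while spot < n and occupied[spot]:
                spot += 1                 # roll forward past occupied spots
            if spot == n:
                return False              # drives off the end: not a PF
            occupied[spot] = True
        return True

    print(parks([0, 1, 1]), parks([2, 2, 0]))  # True False

    rng = np.random.default_rng(0)
    n = 5
    hits = sum(parks(rng.integers(0, n, size=n)) for _ in range(20_000))
    print(hits / 20_000, (n + 1) ** (n - 1) / n ** n)  # Monte Carlo vs exact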

February 10: Ying Tan (USC)

A Generalized Kyle-Back Insider Trading Model with Dynamic Information

In this project, we consider a class of generalized Kyle-Back strategic insider trading models in which the insider is able to use the dynamic information obtained by observing the instantaneous movement of an underlying asset that is allowed to be influenced by its market price. Since such a model will be largely outside the Gaussian paradigm, we shall try to Markovize it by introducing an auxiliary (factor) diffusion process, in the spirit of the weighted total order process, as a part of the “pricing rule”. As the main technical tool in solving the Kyle-Back equilibrium in such a setting, we study a class of Stochastic Two-Point Boundary Value Problem (STPBVP), which resembles the dynamic Markov bridge in the literature, but without insisting on its local martingale requirement. In the case when the solution of the STPBVP has an affine structure, we show that the pricing rule functions, whence the Kyle-Back equilibrium, can be determined by the decoupling field of a forward-backward SDE obtained via a non-linear filtering approach, along with a set of compatibility conditions. This is a joint work with Jin Ma.

February 17: Mateo Díaz (Caltech)

Clustering a mixture of Gaussians with unknown covariance

Clustering is a fundamental data scientific task with broad applications. This talk investigates a simple clustering problem with data from a mixture of Gaussians that share a common but unknown and potentially ill-conditioned covariance matrix. We consider Gaussian mixtures with two equally-sized components and derive a Max-Cut integer program based on maximum likelihood estimation. We show its solutions achieve the optimal misclassification rate when the number of samples grows linearly in the dimension, up to a logarithmic factor. However, solving the Max-cut problem appears to be computationally intractable. To overcome this, we develop an efficient iterative algorithm that attains the optimal rate but requires a quadratic sample size. Although this sample complexity is worse than that of the Max-cut problem, we conjecture that no polynomial-time method can perform better. Furthermore, we present numerical and theoretical evidence that supports the existence of a statistical-computational gap.

March 3: Paromita Dubey (USC Marshall)

Geometric EDA for Random Objects

In this talk I will propose new tools for the exploratory data analysis of data objects taking values in a general separable metric space. First, I will introduce depth profiles, where the depth profile of a point ω in the metric space refers to the distribution of the distances between ω and the data objects. I will describe how depth profiles can be harnessed to define transport ranks, which capture the centrality of each element in the metric space with respect to the data cloud. Next, I will discuss the properties of transport ranks and show how they can be an effective device for detecting and visualizing patterns in samples of random objects. Together with practical illustrations I will establish the theoretical guarantees for the estimation of the depth profiles and the transport ranks for a wide class of metric spaces. Finally, I will describe a new two sample test geared towards populations of random objects by utilizing the depth profiles corresponding to the data objects.  I will demonstrate the efficacy of this new approach on distributional data comprising of a sample of age-at-death distributions for various countries, for compositional data through energy usage for the U.S. states and for neuroimaging network data. This talk is based on joint work with Yaqing Chen and Hans-Georg Müller.

March 10: Robi Bhattacharjee (UCSD)

Online k-means Clustering on Arbitrary Data Streams

We consider online k-means clustering where each new point is assigned to the nearest cluster center, after which the algorithm may update its centers. The loss incurred is the sum of squared distances from new points to their assigned cluster centers. The goal over a data stream X is to achieve a loss within a constant factor of L(X, OPT_k), the best possible loss using k fixed points in hindsight.

We propose a data parameter, Λ(X), such that for any algorithm maintaining O(k poly(log n)) centers at time n, there exists a data stream X for which a loss of Ω(Λ(X)) is inevitable. We then give a randomized algorithm that achieves clustering loss O(Λ(X) + L(X, OPT_k)). Our algorithm uses O(k poly(log n)) memory and maintains O(k poly(log n)) cluster centers. It also enjoys a running time of O(k poly(log n)) and is the first algorithm to achieve polynomial space and time complexity in this setting, as well as the first with provable guarantees that make no assumptions on the input data.

arXiv link: https://arxiv.org/abs/2102.09101
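
The basic online loop, nearest-center assignment followed by a running-mean center update, looks as follows (this sketch keeps exactly k centers seeded at random; the paper's algorithm adaptively maintains O(k poly(log n)) of them, which is omitted here):

    import numpy as np

    rng = np.random.default_rng(0)
    k, d = 3, 2
    centers = rng.normal(size=(k, d))   # in practice, seeded from the stream
    counts = np.zeros(k)
    loss = 0.0

    for _ in range(10_000):             # points arriving one at a time
        x = rng.normal(size=d) + rng.choice([-4, 0, 4]) * np.array([1.0, 0.0])
        j = ((centers - x) ** 2).sum(axis=1).argmin()  # nearest center
        loss += ((centers[j] - x) ** 2).sum()          # pay before updating
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]     # running mean of cluster j

    print(centers, loss)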

March 17: [spring break]

March 24: Yian Ma (UCSD)

MCMC vs. variational inference — the rivalry and synergy between two approaches

I will introduce some recent progress towards understanding the scalability of Markov chain Monte Carlo (MCMC) methods and their comparative advantage with respect to variational inference. I will fact-check the folklore that “variational inference is fast but biased, MCMC is unbiased but slow”. I will follow an optimization perspective on the infinite dimensional probability space, where variational inference projects the probabilities onto a finite dimensional parameter space while MCMC leverages stochastic sample paths. Three ingredients will be the focus of this discussion: accuracy, dimension dependence, and stochasticity. I will then apply the MCMC approach to a sequential decision making problem.

March 31: Masoud Zargar (USC)

Probability on moduli spaces of bundles

I will discuss different probability spaces of moduli spaces of bundles over surfaces, and analytic, representation-theoretic, and probabilistic methods used in analyzing the geometry of random bundles. I will also discuss how these lead to probabilistic equidistribution results for images of geodesics under random unitary surface representations of cusped hyperbolic surfaces. If there is time, I will discuss relevant ideas and open questions in random matrix theory and integration over different moduli spaces of bundles.

April 7: Jiongmin Yong (U Central Florida) [joint with Math Finance]

Turnpike Properties for Stochastic Linear-Quadratic Optimal Control Problems 

For deterministic optimal control problems over very large time horizons (either finite dimensional or infinite dimensional), under proper conditions, the optimal pair will stay near a solution of a proper static optimization problem. Such a phenomenon is called the “turnpike” property. The proper static optimization problem usually is the one with the objective function being the running cost rate function and with the constraint being the equilibria of the vector field of the state equation. However, for stochastic problems, mimicking the idea of the deterministic problems will lead to some incorrect conclusions. In this talk, we will look at stochastic linear-quadratic optimal control problems over a large duration. We will correctly formulate the proper static optimization problem and establish the turnpike properties of the corresponding stochastic linear-quadratic problem.

April 14: Mikhail Belkin (UCSD)

Why do neural models need so many parameters?

One of the striking aspects of modern neural networks is their extreme size, reaching billions or even trillions of parameters. Why are so many parameters needed? To attempt an answer to this question, I will discuss an algorithm- and distribution-independent non-asymptotic trade-off between the model size, excess test loss, and training loss for linear predictors. Specifically, we show that models that perform well on the test data (have low excess loss) are either “classical” — have training loss close to the noise level, or are “modern” — have a much larger number of parameters compared to the minimum needed to fit the training data exactly.

Furthermore, I will provide empirical evidence that optimal performance of realistic models is typically achieved in the “modern” regime, when they are trained well below the noise level.

Collaborators: Nikhil Ghosh and Like Hui.

April 21: Andrew Lowy (USC)

Private Stochastic Optimization with Large Worst-Case Lipschitz Parameter: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses

We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data points may be extremely large. To date, the vast majority of work on DP SO assumes that the loss is uniformly Lipschitz continuous over data (i.e. stochastic gradients are uniformly bounded over all data points). While this assumption is convenient, it often leads to pessimistic excess risk bounds. In many practical problems, the worst-case (uniform) Lipschitz parameter of the loss over all data points may be extremely large due to outliers. In such cases, the error bounds for DP SO, which scale with the worst-case Lipschitz parameter of the loss, are vacuous. To address these limitations, this work provides near-optimal excess risk bounds that do not depend on the uniform Lipschitz parameter of the loss. Building on a recent line of work (Wang et al., 2020; Kamath et al., 2022), we assume that stochastic gradients have bounded 𝑘-th order moments for some 𝑘≥2. Compared with works on uniformly Lipschitz DP SO, our excess risk scales with the 𝑘-th moment bound instead of the worst-case Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers and/or heavy-tailed data. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). In contrast to (Wang et al., 2020; Kamath et al., 2022), our bounds do not require the loss function to be differentiable/smooth. We also devise an accelerated algorithm for smooth losses that runs in linear time and has excess risk that is tight in certain practical parameter regimes. Additionally, our work is the first to address non-convex non-uniformly Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some practical machine learning models. Our Proximal-PL algorithm has near-optimal excess risk.

April 28: Lenny Fukshansky (CMC) [on zoom only]

Geometric constructions for sparse integer signal recovery

We investigate the problem of constructing m x d integer matrices with small entries and d large compared to m, so that for all vectors x in Z^d with at most s ≤ m nonzero coordinates, the image vector Ax is nonzero. Such constructions allow for robust recovery of the original vector x from its image Ax. This problem is motivated by the compressed sensing paradigm and has numerous potential applications ranging from wireless communications to medical imaging. We discuss existence of such matrices for appropriate choices of d as a function of m. In addition to some probabilistic arguments, we demonstrate a deterministic construction of a family of such matrices stemming from a certain geometric covering problem. We also discuss a connection of our constructions to a simple variant of the Tarski plank problem. This talk is based on joint works with B. Sudakov and D. Needell, as well as with A. Hsu.
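
A brute-force sanity check of the kernel condition is easy to code for tiny parameters; the entry bound B below is an artificial truncation (the condition in the abstract is over all of Z^d), so this is only a finite proxy:

    import itertools
    import numpy as np

    def no_sparse_kernel_vector(A, s, B=3):
        """Check that no nonzero x in Z^d with at most s nonzero coordinates,
        each bounded by B in absolute value, satisfies Ax = 0."""
        m, d = A.shape
        nonzero_vals = [v for v in range(-B, B + 1) if v != 0]
        for k in range(1, s + 1):
            for support in itertools.combinations(range(d), k):
                for entries in itertools.product(nonzero_vals, repeat=k):
                    x = np.zeros(d, dtype=int)
                    x[list(support)] = entries
                    if not np.any(A @ x):
                        return False
        return True

    rng = np.random.default_rng(1)
    A = rng.integers(-2, 3, size=(4, 8))       # m=4, d=8, entries in {-2,...,2}
    print(no_sparse_kernel_vector(A, s=2))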

June 6: Jasper Lee (UW Madison) [in person]

Mean and Location Estimation with Optimal Constants

Mean and location estimation are two of the most fundamental problems in statistics. How do we most accurately estimate the mean of a distribution, given i.i.d. samples? What if we additionally know the shape of the distribution, and we only need to estimate its translation/location?

Both problems have well-established asymptotic theories: use the sample mean for mean estimation and maximum likelihood for location estimation. Yet, asymptotic guarantees can sometimes be misleading in practice: it may take arbitrarily many samples to converge to the (law of large numbers) limit, depending on how bad the underlying distribution is, especially if we want high confidence in our estimate.

In this talk, I will discuss recent progress on developing finite sample and high probability theories for these problems that yield optimal constants. Results include new mean estimators in 1-dimensional and very-high dimensional regimes with error tight up to a 1+o(1) factor, as well as new characterizations of location estimation and improvements over the MLE. I will end by describing some directions for future work in this area.

This talk is based on joint works with Paul Valiant, Shivam Gupta and Eric Price.
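
To make the finite-sample point concrete, here is the classical median-of-means estimator, which already gives high-probability guarantees for heavy-tailed data, though without the optimal constants developed in the talk:

    import numpy as np

    def median_of_means(x, k):
        """Split the sample into k groups, average each group, and return the
        median of the group means; robust to heavy tails, constants not optimal."""
        groups = np.array_split(np.asarray(x), k)
        return np.median([g.mean() for g in groups])

    rng = np.random.default_rng(0)
    sample = rng.standard_t(df=2.5, size=10_000)   # heavy-tailed, true mean 0
    print("sample mean:", sample.mean())
    print("median of means:", median_of_means(sample, k=50))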


Fall 2022:


Aug 26: [No seminar]

Sep 2: [No seminar]

Sep 9: Daniel Kane (UCSD) [zoom]

A_k Testing of Distributions

We introduce the theory of distribution testing with a focus on tests for one-dimensional, structured distributions. In order to produce efficient testers, we introduce the theory of A_k distance and discuss the known results for optimal testers with respect to A_k distance, showing that it produces near-optimal testers for many distribution families.
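
For discrete distributions on an ordered finite domain, the A_k distance (the maximum, over at most k disjoint intervals, of the total discrepancy they capture) can be computed with a small dynamic program; the sketch below is an illustration under that definition, not code from the talk:

    import numpy as np

    def ak_distance(p, q, k):
        """Max over <= k disjoint intervals I of sum_I |p(I) - q(I)|, by DP.
        State: (intervals opened so far, outside / inside a +interval /
        inside a -interval); each step either extends, closes, or opens."""
        d = np.asarray(p, float) - np.asarray(q, float)
        NEG = -np.inf
        best = [[NEG] * 3 for _ in range(k + 1)]   # best[j][state]
        best[0][0] = 0.0
        for v in d:
            new = [[NEG] * 3 for _ in range(k + 1)]
            for j in range(k + 1):
                for s in range(3):
                    cur = best[j][s]
                    if cur == NEG:
                        continue
                    new[j][0] = max(new[j][0], cur)            # close / stay outside
                    if s == 1:
                        new[j][1] = max(new[j][1], cur + v)    # extend a + interval
                    if s == 2:
                        new[j][2] = max(new[j][2], cur - v)    # extend a - interval
                    if j < k:                                  # open a new interval
                        new[j + 1][1] = max(new[j + 1][1], cur + v)
                        new[j + 1][2] = max(new[j + 1][2], cur - v)
            best = new
        return max(max(row) for row in best)

    p = np.array([0.5, 0.0, 0.5, 0.0])
    q = np.array([0.25, 0.25, 0.25, 0.25])
    print(ak_distance(p, q, k=2))   # 0.5 here (= total variation); k=1 gives 0.25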

Sep 16: Ken Alexander (USC) [in-person]

Disjointness of geodesics for first passage percolation in the plane

First passage percolation may be viewed as a way of creating a random metric on the integer lattice.  Random passage times (iid) are assigned to the edges of the lattice, and the distance from $x$ to $y$ is defined to be the smallest sum of passage times among all paths from $x$ to $y$.  We consider geodesics (shortest paths) in this context; specifically, what is the probability that two close-by, nearly-parallel geodesics are disjoint?  More precisely, if the geodesics have end-to-end distance $L$ and the endpoints are separated by distance $a$ at one end and $b$ at the other, how does the disjointness probability vary with $L,a,b$?  We answer this question, at least at the level of the proper exponents for $L,a,$ and $b$.
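
A quick simulation of the setup (using networkx's Dijkstra on a finite box with i.i.d. exponential passage times; purely illustrative, and boundary effects are ignored):

    import networkx as nx
    import numpy as np

    rng = np.random.default_rng(0)
    n, L, a, b = 60, 50, 3, 2                   # box size, length, endpoint gaps
    G = nx.grid_2d_graph(n, n)
    for u, v in G.edges:
        G.edges[u, v]["t"] = rng.exponential()  # iid passage times

    g1 = nx.shortest_path(G, (5, 5), (5 + L, 5), weight="t")
    g2 = nx.shortest_path(G, (5, 5 + a), (5 + L, 5 + b), weight="t")
    print("geodesics share", len(set(g1) & set(g2)), "vertices")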

MONDAY Sep 19 [Joint with CAMS]: Nathan Glatt-Holtz (Tulane)

A unified framework for Metropolis-Hastings type Monte Carlo methods.

The efficient generation of random samples is a central task within many of the quantitative sciences. The workhorse of Bayesian statistics and statistical physics, Markov chain Monte Carlo (MCMC) comprises a large class of algorithms for sampling from arbitrarily complex and/or high-dimensional probability distributions. The Metropolis-Hastings method (MH) stands as the seminal MCMC algorithm, and its basic operation underlies most of the MCMC techniques developed to this day.

We provide an all-encompassing, measure theoretic mathematical formalism that describes essentially any Metropolis-Hastings algorithm using three ingredients: a random proposal, an involution on an extended phase space and an accept-reject mechanism. This unified framework illuminates under-appreciated relationships between a variety of known algorithms while yielding a means for deriving new methods.

As an immediate application we identify several novel algorithms including a multiproposal version of the popular preconditioned Crank-Nicolson (pCN) sampler suitable for infinite-dimensional target measures which are absolutely continuous with respect to a Gaussian base measure. We also develop a new class of ‘extended phase space’ methods, based on Hamiltonian mechanics. These methods provide a versatile approach to bypass expensive gradient computations through skillful reduced order modeling and/or data driven approaches. A selection of case studies will be presented that use our multiproposal pCN algorithm (mpCN) to resolve a selection of problems in Bayesian statistical inversion for partial differential equations motivated by fluid flow measurement.

This is joint work with Andrew J. Holbrook (UCLA), Justin Krometis (Virginia Tech) and Cecilia Mondaini (Drexel).
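
The three ingredients are easy to see in the simplest member of the family, random-walk Metropolis-Hastings; in the extended-phase-space view described above, the Gaussian proposal move can (roughly) be encoded by the involution (x, v) -> (x + v, -v). A minimal sketch, with an arbitrary bimodal target:

    import numpy as np

    def rw_metropolis(log_target, x0, steps=20_000, scale=0.8, seed=0):
        """Random-walk Metropolis-Hastings: Gaussian proposal + accept-reject."""
        rng = np.random.default_rng(seed)
        x, lp = x0, log_target(x0)
        chain = np.empty(steps)
        for t in range(steps):
            y = x + scale * rng.normal()            # random proposal
            ly = log_target(y)
            if np.log(rng.uniform()) < ly - lp:     # accept w.p. min(1, pi(y)/pi(x))
                x, lp = y, ly
            chain[t] = x
        return chain

    # Unnormalized bimodal target exp(-(x^2 - 4)^2 / 4)
    chain = rw_metropolis(lambda x: -((x**2 - 4.0) ** 2) / 4.0, x0=0.0)
    print("mean:", chain.mean(), "fraction in right mode:", (chain > 0).mean())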

Sep 23: Tom Hutchcroft (Caltech) [in-person]

Critical behaviour in hierarchical percolation

Statistical mechanics models undergoing a phase transition often exhibit rich, fractal-like behaviour at their critical points, which are described in part by critical exponents – the indices governing the power-law growth or decay of various quantities of interest. These exponents are expected to depend on the dimension but not on the microscopic details of the model such as the choice of lattice. After much progress over the last thirty years we now understand two-dimensional and high-dimensional models rather well, but intermediate dimensions such as three remain mysterious. I will discuss these issues in the context of hierarchical percolation, a simplified version of percolation on Z^d for which, in forthcoming work, I establish a fairly good description of the critical behaviour in all dimensions.

THURSDAY Sep 29: Lisa Sauermann (MIT), 12PM NOON [zoom]  [note the NONSTANDARD DAY/TIME]

Anticoncentration in Ramsey graphs and a proof of the Erdős-McKay Conjecture

This talk will discuss recent joint work with Matthew Kwan, Ashwin Sah, and Mehtaab Sawhney, proving an old conjecture of Erdős and McKay (for which Erdős offered $100). This conjecture concerns Ramsey graphs, which are (roughly speaking) graphs without large complete or empty subgraphs. In order to prove the conjecture, we study edge-statistics in Ramsey graphs, i.e. we study the distribution of the number of edges in a random vertex subset of a Ramsey graph. After discussing some background on Ramsey graphs, the talk will explain our results and give an overview of our proof approach.

Sep 30: Simo Ndaoud (ESSEC) [in-person]

Adaptive robustness and sub-Gaussian deviations in sparse linear regression through Pivotal Double SLOPE

In this talk we first review the framework of robust estimation where robustness can be with respect to outliers or heavy-tailed noise. Then, we consider the sparse linear model where some of the observations can be corrupted and the noise heavy-tailed.  After deriving the minimax quadratic risk for estimation of the signal, we propose a practical and fully adaptive procedure that is optimal. Our procedure corresponds to solving a novel penalized pivotal estimation problem. As a result, we develop a new method that is not only minimax optimal and robust but also enjoys sub-Gaussian deviations even in the presence of heavy-tailed noise.
Joint with S. Minsker and L. Wang.

October 7: Guang Cheng (UCLA)  [in-person]

A Statistical Journey through Trustworthy AI

Our lab believes that the next generation of AI is mainly driven by trustworthiness, beyond performance. This talk attempts to offer statistical perspectives to embrace three challenges in trustworthy AI: privacy, robustness and fairness. Specifically, we consider privacy protection by machine un-learning, enhance adversarial robustness by utilizing artificially generated data, and establish fair Bayes-optimal classifiers. These results demonstrate the unique value of statisticians in studying trustworthy AI from empirical, methodological or theoretical aspects.

October 14: [Fall break]

October 21: Wen (Rick) Zhou (Colorado State)  [in-person]

High dimensional clustering via latent semiparametric mixture models.

Cluster analysis is a fundamental task in machine learning. Several clustering algorithms have been extended to handle high-dimensional data by incorporating a sparsity constraint in the estimation of a mixture of Gaussian models. Though it makes some neat theoretical analysis possible, this type of approach is arguably restrictive for many applications. In this work we propose a novel latent variable transformation mixture model for clustering in which we assume that after some unknown monotone transformations the data follows a mixture of Gaussians. Under the assumption that the optimal clustering admits a sparsity structure, we develop a new clustering algorithm named CESME for high-dimensional clustering. The use of unspecified transformation makes the model far more flexible than the classical mixture of Gaussians. On the other hand, the transformation also brings quite a few technical challenges to the model estimation as well as the theoretical analysis of CESME. We offer a comprehensive analysis of CESME including identifiability, initialization, algorithmic convergence, and statistical guarantees on clustering. In addition, the convergence analysis has revealed an interesting algorithmic phase transition for CESME, which has also been noted for the EM algorithm in the literature. Leveraging such a transition, a data-adaptive procedure is developed and substantially improves the computational efficiency of CESME. Extensive numerical study and real data analysis show that CESME outperforms the existing high-dimensional clustering algorithms including CHIME, sparse spectral clustering, sparse K-means, sparse convex clustering, and IF-PCA.

Oct 28: Çağın Ararat (Bilkent) [in-person]

Set-valued martingales and backward stochastic differential equations

Motivated by the connection between univariate dynamic risk measures and backward stochastic differential equations, we start building a theory for set-valued backward stochastic differential equations (SV-BSDE). As a first step for this purpose, we formulate a simple SV-BSDE with a compact-valued driver function and study the well-posedness of this SV-BSDE. A key tool in establishing well-posedness is the availability of a stochastic integral representation for set-valued martingales. We prove a new martingale representation theorem which, in contrast to the available literature, allows the initial value of the martingale to be nontrivial. This is a joint work with Jin Ma and Wenqian Wu.

November 4: Qiyang Han (Rutgers University)  [in-person]

Universality of regularized regression estimators in high dimensions

The Convex Gaussian Min-Max Theorem (CGMT) has emerged as a prominent theoretical tool for analyzing the precise stochastic behavior of various statistical estimators in the so-called high dimensional proportional regime, where the sample size and the signal dimension are of the same order. However, a well recognized limitation of the existing CGMT machinery rests in its stringent requirement on the exact Gaussianity of the design matrix, therefore rendering the obtained precise high dimensional asymptotics largely a specific Gaussian theory in various important statistical models. This work provides a structural universality framework for a broad class of regularized regression estimators that is particularly compatible with the CGMT machinery. Here universality means that if a ‘structure’ is satisfied by the regression estimator $\hat{\mu}_G$ for a standard Gaussian design $G$, then it will also be satisfied by $\hat{\mu}_A$ for a general non-Gaussian design $A$ with independent entries. In particular, we show that with a good enough $\ell_\infty$ bound for the regression estimator $\hat{\mu}_A$, any ‘structural property’ that can be detected via the CGMT for $\hat{\mu}_G$ also holds for $\hat{\mu}_A$ under a general design $A$ with independent entries. As a proof of concept, we demonstrate our new universality framework in three key examples of regularized regression estimators: the Ridge, Lasso and regularized robust regression estimators, where new universality properties of risk asymptotics and/or distributions of regression estimators and other related quantities are proved. As a major statistical implication of the Lasso universality results, we validate inference procedures using the degrees-of-freedom adjusted debiased Lasso under general design and error distributions. This talk is based on joint work with Yandi Shen (Chicago).

Nov 18: Melih Iseri (USC)

Set Valued HJB Equations

In this talk we introduce a notion of set valued PDEs. The set values have been introduced for many applications, such as time inconsistent stochastic optimization problems, multivariate dynamic risk measures, and nonzero sum games with multiple equilibria. One crucial property they enjoy is the dynamic programming principle (DPP). Together with the set valued Itô formula, which is a key component, DPP induces the PDE. In the context of multivariate optimization problems, we introduce the set valued Hamilton-Jacobi-Bellman equations and establish their wellposedness. In the standard scalar case, our set valued PDE reduces back to the standard HJB equation.

Our approach is intrinsically connected to the existing theory of surface evolution equations, where a well-known example is mean curvature flows. Roughly speaking, those equations can be viewed as first order set valued ODEs, and we extend them to second order PDEs. Another difference is that, due to different applications, those equations are forward in time (with initial conditions), while we consider backward equations (with terminal conditions).

November 25: [Thanksgiving break]


December 2: Uwe Schmock (TU Wien)

Existence and Properties of Optimal Strategies for Distribution-Constrained Optimization Problems

We consider stochastic optimization problems in discrete time under distributional constraints. These problems are motivated by actuarial science, in particular the treatment of unit-linked insurance products with guarantees. They can also serve as benchmark models to account for customer behaviour when treatment as an American option is not appropriate. The basic mathematical set-up is an adapted stochastic process (interpreted as pay-outs) and (possibly randomized) stopping times to optimize the expected pay-out. The difference from classical optimal stopping problems is our constraint concerning the distribution of the stopping times or, more generally, the adapted random probability measures. For these distribution-constrained optimization problems we prove the existence of an optimal strategy; the basic assumptions are suitable moment conditions. In special cases, optimal strategies are identified explicitly. Time permitting, a continuous-time variant will also be discussed. (The main part of this talk is based on joint work


Spring 2022:


January 14:

February 4: Allan Sly (Princeton)

Spread of infections in random walkers

We consider a class of interacting particle systems with two types, A and B, which perform independent random walks at different speeds. Type A particles turn into type B when they meet another type B particle. This class of systems includes models of infections and multi-particle diffusion limited aggregation. I will describe new methods to understand growth processes in such models.

Joint work with Duncan Dauvergne, Danny Nam and Dor Elboim.

February 11: Thibaut Mastrolia (Berkeley)

Market Making and incentives design in the presence of a dark pool

We consider the issue of a market maker acting at the same time in the lit and dark pools of an exchange. The exchange wishes to establish a suitable fees policy to attract transactions on its venues. We first solve the stochastic control problem of the market maker without the intervention of the exchange. Then we derive the equations defining the optimal contract to be set between the market maker and the exchange. This contract depends on the trading flows generated by the market maker’s activity on the two venues. In both cases, we show existence and uniqueness, in the viscosity sense, of the solutions of the Hamilton-Jacobi-Bellman equations associated with the market maker and exchange’s problems. We finally design deep reinforcement learning algorithms enabling us to efficiently approximate the optimal controls of the market maker and the optimal incentives to be provided by the exchange.

Thursday, February 17: Nike Sun (MIT), 1PM PST

NOTE THE NONSTANDARD DAY AND TIME

Generalized Ising perceptron models

The perceptron is a toy model of a single-layer neural network that “stores” a collection of given patterns. We consider the model on the N-dimensional discrete cube with M=N*alpha patterns, with a general activation function U that is bounded above. For U bounded away from zero, or U a one-sided threshold function, it was shown by Talagrand (2000, 2011) that for small alpha, the free energy of the model converges in the large-N limit to the replica symmetric formula conjectured in the physics literature (Krauth and Mezard, 1989). We give a new proof of this result, which covers the more general class of all functions U that are bounded above and satisfy a certain variance bound.
Based on joint works with Jian Ding, Erwin Bolthausen, Shuta Nakajima, and Changji Xu.
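
For very small N the free energy can be computed by brute force, which makes the model easy to play with numerically; the threshold activation below (U(z) = 1 if z >= kappa, else 0) is the one-sided example mentioned above, and all parameter values are arbitrary:

    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    N, alpha, kappa = 14, 0.3, 0.0
    M = max(1, int(alpha * N))                   # number of patterns
    g = rng.normal(size=(M, N))                  # Gaussian patterns

    Z = 0.0
    for sigma in itertools.product([-1.0, 1.0], repeat=N):
        s = np.asarray(sigma)
        Z += float(np.all(g @ s / np.sqrt(N) >= kappa))  # U = one-sided threshold
    # Z > 0 with high probability for small alpha; guard against log(0)
    print("free energy per spin:", np.log(Z) / N if Z > 0 else "-inf (no solutions)")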

 

Thursday, February 24: Josh Frisch (ENS), 12PM PST

NOTE THE NONSTANDARD DAY AND TIME

Harmonic functions on Linear Groups

The Poisson Boundary of a group is a probabilistic object which serves a dual role. It represents the space of asymptotic trajectories a random walk might take, and it represents the possible space of bounded harmonic functions on a group. In this talk I will give an introduction to the Poisson Boundary with emphasis on the question of when it is trivial or not. I will emphasize how this question depends on properties of the acting group. In particular I will discuss the case of linear groups. This is joint work with Anna Erschler.

March 4: Shirshendu Ganguly (Berkeley)

Fractal geometry in models of random growth 

In last passage percolation models predicted to lie in the Kardar-Parisi-Zhang (KPZ) universality class, geodesics are oriented paths moving through random noise accruing maximum weight. The weights of such geodesics, as their endpoints are varied, give rise to an intricate random energy field expected to converge to a rich universal object known as the Directed Landscape constructed by Dauvergne, Ortmann and Virag.
The Directed Landscape satisfies various scale invariance properties making it a source of intricate random fractal behavior which has been investigated rather intensely since its construction. In this talk, we will report recent progress in this direction and discuss results about the coupling structure of the geodesic weights as the endpoints are varied.
The talk will be based on various works jointly with subsets of R. Basu, E. Bates, A. Hammond, M. Hegde and L. Zhang.

 

March 11: Andrew Ahn (Cornell)

Universality for Lyapunov Exponents of Random Matrix Products

Consider the discrete-time process formed by the singular values of products of random matrices, where time corresponds to the number of matrix factors. It is known due to Oseledets’ theorem that under general assumptions, the Lyapunov exponents converge as the number of matrix factors tends to infinity. In this talk, we consider random matrices with distributional invariance under right multiplication by unitary matrices, which include Ginibre matrices and truncated unitary matrices. The corresponding singular value process is Markovian with additional structure that admits study via integrable probability techniques. In this talk, I will discuss recent results, including joint work with Roger Van Peski, on the Lyapunov exponents in the setting where the number M of matrix factors tends to infinity simultaneously with the matrix size N. When this limit is tuned so that M and N grow on the same order, the limiting Lyapunov exponents can be described in terms of Dyson Brownian motion with a special drift vector, which in turn can be linked to a matrix-valued diffusion on the complex general linear group. We find that this description is universal, under general assumptions on the spectrum of the matrix factors.
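
The finite-(N, M) Lyapunov exponents are easy to approximate with the standard QR re-orthogonalization scheme (a numerical sketch, unrelated to the integrable techniques in the talk):

    import numpy as np

    def lyapunov_exponents(N, M, seed=0):
        """Lyapunov exponents of a product of M iid complex Ginibre matrices,
        estimated by accumulating log |diag(R)| from repeated QR factorizations."""
        rng = np.random.default_rng(seed)
        Q = np.eye(N, dtype=complex)
        lam = np.zeros(N)
        for _ in range(M):
            A = (rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))) / np.sqrt(2)
            Q, R = np.linalg.qr(A @ Q)
            lam += np.log(np.abs(np.diag(R)))
        return lam / M

    print(lyapunov_exponents(N=5, M=2000))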

March 18 [no talk, spring break]

March 25: Sangchul Lee (USC)

Pollution-sensitivity of bootstrap percolation

We discuss the terminal configuration in polluted bootstrap percolation on an infinite square lattice, which is the cellular automaton started from a random configuration where a vertex is initially declared occupied with probability $p$, polluted with probability $q$, and vacant otherwise, independently of other vertices. The update rule makes a vacant site occupied whenever it has at least two occupied neighbors; polluted vertices stay polluted forever. Setting $q=\lambda p^2$, we define two thresholds, $\lambda_1$ and $\lambda_2$, such that, in the limit $p\downarrow 0$, the terminal occupation is asymptotically full when $\lambda<\lambda_1$ and asymptotically empty when $\lambda>\lambda_2$. This formally reproduces the main result of Gravner and McDonald; the novelty is that the thresholds are now defined using the continuum percolation model of blocking contours — with $\lambda_1$ being the threshold for exponential decay of connectivities and $\lambda_2$ for the emergence of an infinite connected component.
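
The pollution-sensitivity is easy to see in simulation on a finite torus (a finite-volume caricature of the infinite-lattice statement; the grid size and the value of p are arbitrary):

    import numpy as np

    def polluted_bootstrap(n, p, lam, seed=0):
        """Polluted 2-neighbor bootstrap percolation on an n x n torus with
        q = lam * p^2; returns the final density of occupied sites."""
        rng = np.random.default_rng(seed)
        u = rng.uniform(size=(n, n))
        occ = u < p
        pol = (u >= p) & (u < p + lam * p * p)
        while True:
            nbrs = (np.roll(occ, 1, 0).astype(int) + np.roll(occ, -1, 0)
                    + np.roll(occ, 1, 1) + np.roll(occ, -1, 1))
            new = occ | ((nbrs >= 2) & ~pol)
            if np.array_equal(new, occ):
                return occ.mean()
            occ = new

    for lam in (0.1, 10.0):
        print(lam, polluted_bootstrap(n=400, p=0.05, lam=lam))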

April 1: Rachel Wang (Sydney)

Global community detection using individual-centered partial networks

In statistical network modeling and inference, we often assume either the full network is available or multiple subgraphs can be sampled to estimate various global properties of the full network. However, in a real social network, people frequently make decisions based on their local view of the network alone. In this paper, we consider a partial information framework that characterizes the local network centered at a given individual by path length (or knowledge depth $L$) and gives rise to a partial adjacency matrix. Under $L=2$, we focus on the problem of (global) community detection using the popular stochastic block model (SBM) and its degree-corrected variant (DCSBM). We derive general properties of the eigenvalues and eigenvectors from the major term of the partial adjacency matrix and propose new spectral-based community detection algorithms for these two types of models, which can achieve almost exact recovery under appropriate conditions. Our settings in the DCSBM also allow us to interpret the efficiency of clustering using neighborhood features of the central node. Using simulated and real networks, we demonstrate the performance of our algorithms in inferring global community memberships using a partial network. In particular, we show that the clustering accuracy indicates that different global structures are visible to different individuals.
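
For contrast with the partial-information setting of the talk, here is the textbook full-information baseline: spectral clustering of a two-block SBM from the global adjacency matrix (illustrative only; the algorithms of the talk operate on the partial adjacency matrix instead):

    import numpy as np

    rng = np.random.default_rng(0)
    n, a, b = 600, 0.10, 0.02                  # within/between block probabilities
    z = np.repeat([0, 1], n // 2)              # true community labels
    P = np.where(z[:, None] == z[None, :], a, b)
    A = np.triu(rng.uniform(size=(n, n)) < P, 1).astype(float)
    A = A + A.T                                # symmetric adjacency, no self-loops

    _, vecs = np.linalg.eigh(A)
    labels = (vecs[:, -2] > 0).astype(int)     # sign of the second eigenvector
    acc = max((labels == z).mean(), ((1 - labels) == z).mean())
    print("clustering accuracy:", acc)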

Thursday, April 7: Ivan Corwin (Columbia), 12PM PST

NOTE THE NONSTANDARD DAY AND TIME

The ASEP speed process

Consider the asymmetric simple exclusion process (ASEP) on Z started with particles at negative integer sites, holes at positive integer sites and a lone second class particle at the origin. We prove that almost surely the second class particle achieves a limiting velocity as time goes to infinity which is uniform on the interval [-1,1]. This enables us to construct the ASEP speed process. Our work positively resolves a conjecture of Amir, Angel and Valko and extends the TASEP result of Mountford and Guiol. We rely on a combination of probabilistic couplings (in particular one due to Rezakhanlou) and effective hydrodynamic results (that we prove using tools from integrable probability). This is joint work with Amol Aggarwal and Promit Ghosal.

Thursday, April 14: Amol Aggarwal (Columbia) 12PM PST

NOTE THE NONSTANDARD DAY AND TIME

Universality Results in Random Lozenge Tilings

The statistical behavior of random tilings of large domains has been an intense topic of mathematical research for decades, partly since they highlight a central phenomenon in physics: local behaviors of highly correlated systems can be very sensitive to boundary conditions. Indeed, a salient feature of random tiling models is that the local densities of tiles can differ considerably in different regions of the domain, depending on the shape of the domain. Thus, a question of interest is how the shape of the domain affects the statistics of a random tiling. A general prediction in this context is that, while the family of possible domains is infinite-dimensional, the limiting statistics should only depend on a few parameters. In this talk, we will explain this universality phenomenon in the context of random lozenge tilings (equivalently, the dimer model on the hexagonal lattice). Various possible limiting local statistics can arise, depending on whether one probes the bulk of the domain; the edge of a facet; or the boundary of the domain. We will describe recent works that verify the universality of these statistics in each of these regimes.

April 22: J. E. Paguyo (USC)

Cycle structure of random parking functions

Parking functions were introduced by Konheim and Weiss in their study of the hash storage structure, and have since found many applications in combinatorics, probability, and computer science. Consider $n$ cars, each with a preferred parking spot in a one-way street with $n$ spots. Each car drives to its preferred spot and parks if possible, and otherwise takes the next available spot if it exists. A sequence of preferences is a parking function if all cars are able to park. In this talk, we discuss various combinatorial and probabilistic properties of random parking functions. In particular, we show that the joint distribution of cycle counts converges to a process of independent Poisson random variables. This partially answers a question of Diaconis and Hicks. Our proof uses a multivariate Stein’s method via exchangeable pairs, which also provides rates of convergence.
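
The parking rule itself takes a few lines to simulate, and Monte Carlo reproduces the classical count (n+1)^{n-1} of parking functions among all n^n preference sequences:

    import numpy as np

    def is_parking(pref):
        """Drive each car to its preferred spot (0-indexed), then forward to
        the next free spot; return True if every car parks."""
        n = len(pref)
        taken = [False] * n
        for p in pref:
            while p < n and taken[p]:
                p += 1
            if p == n:
                return False
            taken[p] = True
        return True

    rng = np.random.default_rng(0)
    n, trials = 8, 100_000
    hits = sum(is_parking(rng.integers(0, n, size=n)) for _ in range(trials))
    print(hits / trials, "vs exact", (n + 1) ** (n - 1) / n ** n)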

April 29: Robert Webber (Caltech)

Randomized low-rank approximation with higher accuracy and reduced costs

Randomized low-rank matrix approximation is one of the great success stories of randomized numerical linear algebra. It is a fast, scalable approach that has been widely adopted in scientific computing. However, the approach is most successful at approximating matrices with quickly decaying singular values, since there can be significant random errors when the singular values decay slowly. Our ongoing work improves the accuracy of randomized low-rank matrix approximation for matrices with slow singular value decay. We study an emerging approach called ‘randomized block Krylov iteration’, which builds a rich approximation space from the output of repeated matrix-matrix multiplications. Our contributions are two-fold: first, we derive an efficient implementation of randomized block Krylov iteration that reduces the standard runtime by half, and second, we establish error bounds that quantify the rapid convergence of randomized block Krylov iteration, demonstrating its advantage over alternative methods.
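
A schematic version of randomized block Krylov iteration, written for clarity rather than the optimized implementation and error bounds of the talk:

    import numpy as np

    def block_krylov_lowrank(A, r, q=3, seed=0):
        """Rank-r approximation of A from the block Krylov space spanned by
        [A G, (A A^T) A G, ..., (A A^T)^{q-1} A G] with Gaussian test matrix G."""
        rng = np.random.default_rng(seed)
        Z = A @ rng.normal(size=(A.shape[1], r))
        blocks = []
        for _ in range(q):
            blocks.append(Z)
            Z = A @ (A.T @ Z)                    # one more Krylov block
        Q, _ = np.linalg.qr(np.hstack(blocks))   # orthonormal basis
        U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
        return (Q @ U[:, :r]) * s[:r] @ Vt[:r]   # truncated SVD, lifted back

    rng = np.random.default_rng(1)
    U0, _ = np.linalg.qr(rng.normal(size=(300, 200)))
    A = U0 * (1.0 / np.sqrt(np.arange(1, 201)))  # slowly decaying singular values
    err = np.linalg.norm(A - block_krylov_lowrank(A, r=10)) / np.linalg.norm(A)
    print("relative error:", err)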


Fall 2021:


August 27:

September 3: Mehtaab Sawhney (MIT)

Approximate counting and sampling via local central limit theorems

We give deterministic approximate counting algorithms for the number of matchings and independent sets of a given size in bounded degree graphs. For matchings, the algorithm applies to any size bounded away from the maximum by a constant factor; for independent sets the algorithm applies to sizes up to the NP-hardness threshold. Our results are based on exploiting local central limit theorems as an algorithmic tool. For our results for independent sets, we prove a new local central limit theorem for the hard-core model.  (with Vishesh Jain, Will Perkins, and Ashwin Sah)

September 24: Sourav Chatterjee (Stanford)

Local KPZ growth in a class of random surfaces

One of the main difficulties in proving convergence of discrete models of surface growth to the Kardar-Parisi-Zhang (KPZ) equation in dimensions higher than one is that the correct way to take a scaling limit, so that the limit is nontrivial, is not known in a rigorous sense. The same problem has so far prevented the construction of nontrivial solutions of the KPZ equation in dimensions higher than one. To understand KPZ growth without being hindered by this issue, I will introduce a new concept in this talk, called “local KPZ behavior”, which roughly means that the instantaneous growth of the surface at a point decomposes into the sum of a Laplacian term, a gradient squared term, a noise term, and a remainder term that is negligible compared to the other three terms and their sum. The main result is that for a general class of surfaces, which contains the model of directed polymers in a random environment as a special case, local KPZ behavior occurs under arbitrary scaling limits, in any dimension.

October 1: Xinyu Wu (CMU)

Separations between classical and quantum computational complexity through stochastic calculus

The Forrelation problem is: given two random Boolean vectors x and y, decide if x and y are uniformly random or if x is uniformly random and y is the Walsh-Fourier transform of x. Variants of this problem show up as examples of functions hard for some classical complexity classes but easy for the corresponding quantum complexity classes. In this talk, I will present an approach to analyze the complexity of the Forrelation problem using techniques from stochastic calculus. Using this technique, I will give some simplified proofs of recent results by Raz-Tal and Girish-Raz-Zhan.

October 8: Ankur Moitra (MIT)

Rethinking Robustness: From Classification to Contextual Bandits

What does it mean for a learning algorithm to be robust? We could hope to tolerate adversarial corruptions, but this makes many simple learning tasks computationally intractable. We could assume a particular stochastic model for noise, but then our algorithms might be overtuned to this one distribution, which was only meant to be a simplifying assumption.

In this talk we will revisit some classic learning problems, like learning halfspaces, online regression and contextual bandits, in models that blend worst-case and average-case noise. We will give simple twists on old algorithms that allow them to enjoy strong provable robustness guarantees. Finally we explore some intriguing connections between robustness and fairness.

October 15: [no talk]  [Fall break]

October 22: Song Mei (UC Berkeley)

Local convexity of the TAP free energy and AMP convergence for Z2-synchronization

We study mean-field variational Bayesian inference using the TAP approach, for Z2-synchronization as a prototypical example of a high-dimensional Bayesian model. We show that for any signal strength lambda > 1 (the weak-recovery threshold), there exists a unique local minimizer of the TAP free energy functional near the mean of the Bayes posterior law. Furthermore, the TAP free energy in a local neighborhood of this minimizer is strongly convex. Consequently, a natural-gradient/mirror-descent algorithm achieves linear convergence to this minimizer from a local initialization, which may be obtained by a finite number of iterates of Approximate Message Passing (AMP). This provides a rigorous foundation for variational inference in high dimensions via minimization of the TAP free energy. We also analyze the finite-sample convergence of AMP, showing that AMP is asymptotically stable at the TAP minimizer for any lambda > 1, and is linearly convergent to this minimizer from a spectral initialization for sufficiently large lambda. Such a guarantee is stronger than results obtainable by state evolution analyses, which only describe a fixed number of AMP iterations in the infinite-sample limit.
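
A schematic AMP iteration for Z2-synchronization (spiked Wigner form Y = (lambda/n) x x^T + W), with tanh denoiser, Onsager correction, and a spectral initialization as mentioned in the abstract; this is a standard textbook-style sketch, not the analysis of the talk, and the constants should be treated as indicative:

    import numpy as np

    rng = np.random.default_rng(0)
    n, lam = 2000, 1.5
    x = rng.choice([-1.0, 1.0], size=n)                  # Z2 signal
    W = rng.normal(size=(n, n)) / np.sqrt(n)
    Y = lam * np.outer(x, x) / n + (W + W.T) / np.sqrt(2)

    _, vecs = np.linalg.eigh(Y)
    m = np.sqrt(n) * vecs[:, -1]                         # spectral initialization
    m_prev, onsager = np.zeros(n), 0.0
    for _ in range(30):
        z = Y @ m - onsager * m_prev                     # Onsager-corrected step
        m_prev = m
        m = np.tanh(lam * z)                             # posterior-mean denoiser
        onsager = lam * np.mean(1.0 - m**2)
    print("overlap with truth:", abs(m @ x) / (np.linalg.norm(m) * np.sqrt(n)))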

October 29: Deeparnab Chakrabarty (Dartmouth)

A polynomial lower bound on the number of rounds for efficient submodular function minimization

Submodular function minimization (SFM)  is a fundamental discrete optimization problem generalizing problems such as minimum cut problems in graphs and hypergraphs, and the matroid intersection problem. It is well-known and remarkable that a submodular function on an N-element universe can be minimized using only poly(N) evaluation queries. However, all such efficient algorithms have poly(N) rounds of adaptivity. A natural question is whether one can get efficient parallel SFM algorithms with much fewer rounds of adaptivity. Recently, in STOC 2020, Balkanski and Singer proved an Omega(log N/log log N) lower bound on the rounds of adaptivity. However, it left open the possibility of a polylogarithmic depth efficient algorithm; such algorithms are indeed known for efficient (approximate) submodular function *maximization* versions.

In this talk, I will show that submodular function minimization is not friendly to parallelization. In particular, I’ll sketch a proof that any SFM algorithm making at most poly(N) calls must proceed in at least (roughly) N^{1/3} rounds. This is joint work with Yu Chen and Sanjeev Khanna, both of whom are at the University of Pennsylvania. No prior knowledge of submodular functions will be assumed.

November 5: Gabor Lugosi (ICREA)

Learning the structure of graphical models by covariance queries

The dependence structure of high-dimensional distributions is often modeled by graphical models. The problem of learning the graph underlying such distributions has received a lot of attention in statistics and machine learning. In problems of very high dimension, it is often too costly even to store the sample covariance matrix. We propose a new model in which one can query single entries of the covariance matrix. We propose computationally efficient algorithms for structure recovery in Gaussian graphical models with computational complexity that is quasi-linear in the dimension. We present algorithms that work for trees and, more generally, for graphs of small treewidth. The talk is based on joint  work with Jakub Truszkowski, Vasiliki Velona, and Piotr Zwiernik.

Thursday, November 11, 10AM: Afonso Bandeira (ETH Zurich)

NOTE THE NONSTANDARD DAY AND TIME

Noncommutative Matrix Concentration Inequalities

Matrix concentration inequalities for the spectrum of random matrices, such as Matrix Bernstein, have played an important role in many areas of pure and applied mathematics. These inequalities are intimately related to the celebrated noncommutative Khintchine inequality of Lust-Piquard and Pisier. In this talk we leverage ideas from Free Probability to remove the dimensional dependence in the noncommutative Khintchine inequality in a range of instances, yielding sharp bounds in many settings of interest. As a byproduct we develop non-asymptotic matrix concentration inequalities that capture non-commutativity (or, to be more precise, “freeness”).
Joint work with March Boedihardjo and Ramon van Handel, more information at arXiv:2108.06312 [math.PR].

November 19: Nina Holden (ETH Zurich/ NYU) [talk via Zoom]

Conformal welding in Liouville quantum gravity: recent results and applications

Liouville quantum gravity (LQG) is a natural model for a random fractal surface with origins in conformal field theory and string theory. A powerful tool in the study of LQG is conformal welding, where multiple LQG surfaces are combined into a single LQG surface. The interfaces between the original LQG surfaces are typically described by variants of the random fractal curves known as Schramm-Loewner evolutions (SLE). We will present a few recent conformal welding results for LQG surfaces and applications to the integrability of SLE and LQG partition functions. Based on joint work with Ang and Sun and with Lehmkuehler.

November 26: [no talk] [Thanksgiving]

December 3: Melih Iseri (USC)

Set Values for Mean Field Games

In this talk we study mean field games with possibly multiple mean field equilibria. Instead of focusing on the individual equilibria, we propose to study the set of values over all possible equilibria, which we call the set value of the mean field game. When the mean field equilibrium is unique, typically under certain monotonicity conditions, our set value reduces to the singleton of the standard value function which solves the master equation. The set value is by nature unique, and we shall establish two crucial properties: (i) the dynamic programming principle, also called time consistency; and (ii) the convergence of the set values of the corresponding N-player games. To the best of our knowledge, this is the first work in the literature which studies the dynamic value of mean field games without requiring the uniqueness of mean field equilibria. We emphasize that the set value is very sensitive to the choice of the admissible controls. In particular, for the convergence one has to restrict to the same type of equilibria for the N-player game and for the mean field game. We shall illustrate this point by investigating three cases, two in finite state space models and the other in a diffusion model.

Previous Talks:


Spring 2021:


January 15: Steven Heilman (USC)

Three Candidate Plurality is Stablest for Small Correlations

Suppose we model n votes in an election between two candidates as n i.i.d. uniform random variables in {-1,1}, so that 1 represents a vote for the first candidate, and -1 represents a vote for the other candidate. Then, for each vote, we flip a biased coin (with fixed probability larger than 1/2 of landing heads). If the coin lands tails, the vote is changed to the other candidate. In an election where each voter has a small influence on the election’s outcome, and each candidate has an equal chance of winning, the majority function best preserves the election’s outcome, when comparing the original election vote to the corrupted vote. This Majority is Stablest Theorem was proven in 2005 by Mossel-O’Donnell-Oleszkiewicz. The corresponding statement for elections between three or more candidates has remained open since its formulation in 2004 by Khot-Kindler-Mossel-O’Donnell. We show that Plurality is Stablest for elections with three candidates, when the original and corrupted votes have a sufficiently small correlation. In fact, this result is a corollary of a more general structure theorem that applies for elections with any number of candidates and any correlation parameter.

(joint with Alex Tarter)
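
The two-candidate statement is easy to check by simulation: for rho-correlated uniform sign vectors, the agreement probability of majority approaches Sheppard's value 1 - arccos(rho)/pi, the benchmark that Majority is Stablest says cannot be beaten by low-influence functions (an illustration only, not part of the talk):

    import numpy as np

    def majority_agreement(n, rho, trials=200_000, seed=0):
        """Estimate P(maj(x) = maj(y)) where y_i = x_i with prob (1+rho)/2."""
        rng = np.random.default_rng(seed)
        x = rng.choice([-1, 1], size=(trials, n))
        flip = rng.uniform(size=(trials, n)) < (1 - rho) / 2
        y = np.where(flip, -x, x)
        return np.mean(np.sign(x.sum(1)) == np.sign(y.sum(1)))

    rho = 0.8
    print("simulated:", majority_agreement(n=101, rho=rho))   # n odd: no ties
    print("Sheppard limit:", 1 - np.arccos(rho) / np.pi)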

January 22: Jimmy He (Stanford University)

Random walks on finite fields with deterministic jumps

Recently, Chatterjee and Diaconis showed that most bijections, if applied between steps of a Markov chain, cause the resulting chain to mix much faster. However, explicit examples of this speedup phenomenon are rare. I will discuss recent work studying such walks on finite fields where the bijection is algebraically defined, and give a near-linear mixing time. This work gives a large collection of examples where this speedup phenomenon occurs. These walks can be seen as a non-linear analogue of the Chung-Diaconis-Graham process, where the bijection is multiplication by a non-zero element of the finite field.
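
The linear prototype, the Chung-Diaconis-Graham process X_{t+1} = 2 X_t + eps_t mod p, is simple to simulate; note that the empirical total-variation estimate below has a sampling noise floor of roughly sqrt(p/trials), so it should be read qualitatively:

    import numpy as np

    def cdg_tv_to_uniform(p, steps, trials=200_000, seed=0):
        """Run X_{t+1} = 2 X_t + eps_t (mod p), eps_t uniform in {-1, 0, 1},
        from X_0 = 0, and estimate the TV distance to uniform at time `steps`."""
        rng = np.random.default_rng(seed)
        X = np.zeros(trials, dtype=np.int64)
        for _ in range(steps):
            X = (2 * X + rng.integers(-1, 2, size=trials)) % p
        freq = np.bincount(X, minlength=p) / trials
        return 0.5 * np.abs(freq - 1.0 / p).sum()

    p = 10_007
    for t in (5, 10, 20, 40):
        print(t, round(cdg_tv_to_uniform(p, t), 3))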

January 29: Hoi Nguyen (Ohio State University)

Universality results for the cokernels of random integral matrices

For various general random matrix models with integral entries we discuss the probability that the cokernels are isomorphic to a given finite abelian group, or when they are cyclic. We will show that these statistics are asymptotically universal, depending only on the symmetry class of the matrices.

Based on joint works with M. M. Wood.
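
Cokernel statistics are easy to sample: the Smith normal form of A exhibits Z^n / A Z^n as a direct sum of cyclic groups Z/d_i over the invariant factors d_i. A small experiment with sympy (parameter choices are illustrative):

    import numpy as np
    from sympy import Matrix
    from sympy.matrices.normalforms import smith_normal_form

    rng = np.random.default_rng(0)
    n, trials, cyclic = 5, 50, 0
    for _ in range(trials):
        A = Matrix(rng.integers(-10, 11, size=(n, n)).tolist())
        D = smith_normal_form(A)
        invariant_factors = [abs(D[i, i]) for i in range(n)]
        # the cokernel is cyclic iff at most one invariant factor exceeds 1
        cyclic += sum(1 for d in invariant_factors if d > 1) <= 1
    print("fraction cyclic:", cyclic / trials)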

February 5: Dmitrii Ostrovskii (USC)

Near-Optimal Model Discrimination

In the standard setup of parametric M-estimation with convex loss, we consider two population risk minimizers associated with two possible distributions of a random observation, and then ask the following question:

Given the value of one of the two minimizers and access to i.i.d. sampling from both distributions, what sample sizes allow one to identify the “right” distribution, i.e., the one corresponding to the given value?

Making the first steps towards answering this question in full generality, we first consider the case of a well-specified linear model with squared loss. Here we provide nearly matching upper and lower bounds on the sample complexity, showing it to be min{1/Δ^2, √r/Δ} up to a constant factor, where Δ is a measure of separation between the two distributions and r is the maximum (resp., minimum) of the ranks of the design covariance matrices in the case of the upper (resp., lower) bound. This bound is dimension-independent and rank-independent for large enough separation. We then extend this result in two directions: (i) for the general parametric setup in the asymptotic regime; (ii) for generalized linear models in the small-sample regime n≤r and under weak moment assumptions. In both cases, we derive sample complexity bounds of a similar form, even under misspecification. Our testing procedures only access the “known” value through a certain functional of empirical risk. In addition, the number of observations that allows one to reach statistical confidence for testing does not allow one to “resolve” the two models — that is, recover both minimizers up to O(Δ) prediction accuracy. These two properties open the prospect of applying our testing framework in practical tasks, where one would like to identify a prediction model, which can be proprietary, while guaranteeing that the model cannot be actually inferred by the identifying agent.

February 12: Tomasz Tkocz (Carnegie Mellon University)

Khinchin inequalities and sections of convex bodies

I shall discuss some recent results on sharp moment comparison inequalities for weighted sums of i.i.d. random variables, a.k.a. Khinchin inequalities, as well as highlight how this relates to the problem of finding sections of convex bodies of extremal volume.

February 19: Nick Cook (Duke)

Universality for the minimum modulus of random trigonometric polynomials

We consider the restriction to the unit circle of random degree-n polynomials with iid normalized coefficients (Kac polynomials). Recent work of Yakir and Zeitouni shows that for Gaussian coefficients, the minimum modulus (suitably rescaled) follows a limiting exponential distribution. We show this is a universal phenomenon, extending their result to arbitrary sub-Gaussian coefficients, such as Rademacher signs. Our approach relates the joint distribution of small values at several angles to that of a random walk in high-dimensional phase space, for which we obtain strong central limit theorems. For discrete coefficients this requires dealing with arithmetic structure among the angles. Based on joint work with Hoi Nguyen.

February 26: Anderson Ye Zhang (Wharton School of the University of Pennsylvania)

Optimal Ranking Recovery from Pairwise Comparisons 

Ranking from pairwise comparisons is a central problem in a wide range of learning and social contexts. Researchers in various disciplines have made significant methodological and theoretical contributions to it. However, many fundamental statistical properties remain unclear especially for the recovery of ranking structure. This talk presents two recent projects towards optimal ranking recovery, under the Bradley-Terry-Luce (BTL) model.
In the first project, we study the problem of top-k ranking. That is, to optimally identify the set of top-k players. We derive the minimax rate and show that it can be achieved by MLE. On the other hand, we show another popular algorithm, the spectral method, is in general suboptimal.
In the second project, we study the problem of full ranking among all players. The minimax rate exhibits a transition between an exponential rate and a polynomial rate depending on the magnitude of the signal-to-noise ratio of the problem. To the best of our knowledge, this phenomenon is unique to full ranking and has not been seen in any other statistical estimation problem. A divide-and-conquer ranking algorithm is proposed to achieve the minimax rate.

March 5: Paata Ivanisvili (NC State)

Exponential integrability for Gaussian measures

Let X be the standard Gaussian random vector. In this talk I will discuss conditions on a real-valued test function f such that the exponential integral E exp(f(X)) is finite. Talagrand showed that if the exponential of the square of the gradient of f, divided by 2, is integrable with respect to the standard Gaussian measure, then E exp(f(X)) is finite. In joint work with Ryan Russell we sharpen this result, and we also provide quantitative dimension-independent bounds relating these two quantities.

March 12: Movie Screening: Secrets of the Surface, The Mathematical Vision of Maryam Mirzakhani [Wellness Day]

March 19: Jianfeng Zhang (USC)

Mean Field Game Master Equations with Non-separable Hamiltonians and Displacement Monotonicity

It is well known that the monotonicity condition is crucial for the global wellposedness of mean field game master equations. In the literature the monotonicity condition is proposed only for separable Hamiltonians. In this talk, we propose a structural condition on non-separable Hamiltonians, in the spirit of the so-called displacement monotonicity condition, and then establish the global wellposedness for such master equations. The talk is based on a joint work with Gangbo, Meszaros, and Mou.

April 9: Arash A. Amini (UCLA)

Adjusted chi-square test for degree-corrected block models

We propose a goodness-of-fit test for degree-corrected stochastic block models (DCSBM). The test is based on an adjusted chi-square statistic for measuring equality of means among groups of $n$ multinomial distributions with $d_1,\dots,d_n$ observations. In the context of network models, the number of multinomials, $n$, grows much faster than the number of observations, $d_i$, corresponding to the degree of node $i$, hence the setting deviates from classical asymptotics. We show that a simple adjustment allows the statistic to converge in distribution, under null, as long as the harmonic mean of $\{d_i\}$ grows to infinity.

When applied sequentially, the test can also be used to determine the number of communities. The test operates on a (row) compressed version of the adjacency matrix, conditional on the degrees, and as a result is highly scalable to large sparse networks. We incorporate a novel idea of compressing the rows based on a $(K+1)$-community assignment when testing for $K$ communities. This approach increases the power in sequential applications without sacrificing computational efficiency, and we prove its consistency in recovering the number of communities. Since the test statistic does not rely on a specific alternative, its utility goes beyond sequential testing and can be used to simultaneously test against a wide range of alternatives outside the DCSBM family.

The test can also be easily applied to Poisson count arrays in clustering or biclustering applications, as well as bipartite and directed networks. We show the effectiveness of the approach by extensive numerical experiments with simulated and real data. In particular, applying the test to the Facebook-100 dataset, a collection of one hundred social networks, we find that a DCSBM with a small number of communities (say $ < 25$) is far from a good fit in almost all cases. Despite the lack of fit, we show that the statistic itself can be used as an effective tool for exploring community structure, allowing us to construct a community profile for each network.

April 16: Ramon van Handel (Princeton University)

Vector concentration inequalities

How do classical concentration inequalities extend to functions taking values in normed spaces? Such questions arise in various settings (in functional analysis, random matrix theory, etc.), and are generally studied on a case by case basis. In the Gaussian case, however, Pisier discovered a powerful principle that explains in complete generality how vector-valued functions concentrate. Unfortunately, Pisier’s argument appears to be very specific to the Gaussian distribution, and until recently this phenomenon was not known to hold in any other setting.

In this talk, I will explain that Pisier’s principle is much more universal than expected: a completely analogous result holds under the classical (Bakry-Emery) conditions used in the theory of scalar concentration inequalities. Moreover, a modified form of this principle holds also in discrete settings (as was first observed, in the special case of the discrete cube, in joint work with Ivanisvili and Volberg that settled an old conjecture of Enflo in Banach space theory). These new inequalities provide very general tools for establishing concentration properties of vector-valued functions.

April 23: Tselil Schramm (Stanford University)

Computational Barriers to Estimation from Low-Degree Polynomials

One fundamental goal of high-dimensional statistics is to detect and recover structure from noisy data. But even for simple settings (e.g. a planted low-rank matrix perturbed by noise), the computational complexity of estimation is sometimes poorly understood. A growing body of work studies low-degree polynomials as a proxy for computational complexity: it has been demonstrated in various settings that low-degree polynomials of the data can match the statistical performance of the best known polynomial-time algorithms for detection. But prior work has failed to address settings in which there is a “detection-recovery gap” and detection is qualitatively easier than recovery. In this talk, I’ll describe a recent result in which we extend the method of low-degree polynomials to address recovery problems. As applications, we resolve (in the low-degree framework) open problems about the computational complexity of recovery for the planted submatrix and planted dense subgraph problems.
Based on joint work with Alex Wein.

April 30: Mark Rychnovsky (Columbia)   [Wellness Day]

Extreme particles for sticky Brownian motions

Sticky Brownian motions behave as independent Brownian motions when they are separated and interact when they touch, so that the coincidence time has positive Lebesgue measure with positive probability. For a specific sticky interaction, we show that as the number of particles and the time the particles are allowed to evolve are simultaneously sent to infinity, the fluctuations of the extremal particle are given by the GUE Tracy-Widom distribution. The proof involves realizing a specific sticky interaction as the limit of an exactly solvable statistical mechanics model, the Beta random walk in random environment, and using this to derive exact integral formulas.

This is joint work with Guillaume Barraquand.


Fall 2020:


Aug 28: Varun Jog (U. Wisconsin-Madison)

Reverse Euclidean and Gaussian isoperimetric inequalities for parallel sets with applications
The $r$-parallel set of a measurable set $A$ is the set of all points whose distance from $A$ is at most $r$. In this talk, we discuss some recent results that establish upper bounds on the Euclidean and Gaussian surface areas of $r$-parallel sets. We also discuss a reverse form of the Brunn-Minkowski inequality for $r$-parallel sets, and as an aside a reverse entropy power inequality for Gaussian-smoothed random variables. We will conclude by presenting applications of our results to some problems of interest in theoretical machine learning.

Sep 04: Zhou Fan (Yale University)

Likelihood landscape and maximum likelihood estimation for the discrete orbit recovery model
We study the non-convex optimization landscape for maximum likelihood estimation in the discrete orbit recovery model with Gaussian noise. This is a statistical model motivated by applications in molecular microscopy, where each measurement of an unknown object is subject to an independent random rotation from a known rotational group.
We show that fundamental properties of the likelihood landscape depend on the signal-to-noise ratio and the group structure. At low noise, this landscape is “benign” for any discrete group, possessing no spurious local optima and only strict saddle points. At high noise, this landscape may develop spurious local optima, depending on the specific group. We discuss several positive and negative examples. We show that the Fisher information matrix transitions from resembling that of a single Gaussian distribution in low noise to having a graded eigenvalue structure in high noise, which is determined by the graded algebra of invariant polynomials under the group action. In a local neighborhood of the true object, the landscape is strongly convex in a reparametrized system of variables given by a transcendence basis of this polynomial algebra. We discuss implications for optimization algorithms, including slow convergence of expectation-maximization, and possible advantages of momentum-based acceleration and variable reparametrization for first- and second-order descent methods.
This is joint work with Yi Sun, Tianhao Wang, and Yihong Wu.

Sep 11: Katherine Bouman (Caltech)

Capturing the First Image of a Black Hole & Designing the Future of Black Hole Imaging

This talk will present the methods and procedures used to produce the first image of a black hole from the Event Horizon Telescope, as well as discuss future developments for black hole imaging. It had been theorized for decades that a black hole would leave a “shadow” on a background of hot gas. Taking a picture of this black hole shadow would help to address a number of important scientific questions, both on the nature of black holes and the validity of general relativity. Unfortunately, due to its small size, traditional imaging approaches require an Earth-sized radio telescope. In this talk, I discuss techniques the Event Horizon Telescope Collaboration has developed to photograph a black hole using the Event Horizon Telescope, a network of telescopes scattered across the globe. Imaging a black hole’s structure with this computational telescope required us to reconstruct images from sparse measurements, heavily corrupted by atmospheric error. The talk will also discuss future developments, including new imaging techniques and how we are developing machine learning methods to help design future telescope arrays.

Sep 18: Michael Cranston (UC Irvine)

The Riemann zeta distribution

We examine statistical properties of integers when they are sampled using the Riemann zeta distribution and compare them to similar properties when integers are sampled according to “uniform” or harmonic distributions.  For example, as the variable in the Riemann zeta function approaches 1, a central limit theorem and large and moderate deviation principles for the number of distinct prime factors of the sampled integer can be readily derived.
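
Sampling from the zeta distribution P(N = n) = n^{-s}/ζ(s) is built into numpy (its Zipf sampler), which makes the statistics of the number of distinct prime factors easy to explore; the s values below are arbitrary, and the growth as s approaches 1 is slow:

    import numpy as np
    from sympy import primefactors

    def distinct_prime_factor_counts(s, size, seed=0):
        """Draw integers from P(n) = n^{-s} / zeta(s) and count their
        distinct prime factors."""
        rng = np.random.default_rng(seed)
        ns = rng.zipf(s, size=size)
        return np.array([len(primefactors(int(n))) for n in ns])

    for s in (2.0, 1.5, 1.2):
        w = distinct_prime_factor_counts(s, size=1000, seed=1)
        print(s, w.mean(), w.var())    # both grow (slowly) as s decreases to 1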

Sep 25: Philippe Sosoe (Cornell University)

Central moments of the free energy of the O’Connell-Yor polymer
We estimate the central moments of the stationary semi-discrete polymer in a Brownian environment, also known as the O’Connell-Yor polymer. From previous work of Seppalainen and Valko, it is known that for certain choices of the parameters, the variance growth is governed by the exponent 2/3, characteristic of fluctuations of models in the KPZ class.
We develop formulas based on Gaussian integration by parts to relate higher cumulants of the free energy to expectations of products of quenched cumulants of the first jump from the boundary into the system. We then use these formulas to obtain estimates for all central moments and the moments of the first jump (corresponding to the transversal fluctuations). The bounds are of nearly optimal order $1/3 + \epsilon$, resp. $2/3+\epsilon$, with $\epsilon$ arbitrary.
I will also discuss an extension of this work to certain discrete polymers in random environment.
Joint work with Christian Noack.

Oct 2: James Lee (University of Washington)

Scaling exponents in stationary random media

A long line of work in anomalous diffusion on fractals and self-similar random media confirms that in the “strongly recurrent” regime, certain spectral and geometric notions satisfy the “Einstein relation” for diffusion.  These are relations between scaling exponents that connect the mean displacement and return probabilities of the random walk to the density and conductivity of the underlying medium.
I will show that, in stationary random environments, these relations hold somewhat more generally.  In particular, we handle the case of spectral dimension = 2, capturing a rich class of random networks that are not strongly recurrent.  This includes many models of random planar networks like the uniform infinite planar triangulation (UIPT).
Our main technical tool is the theory of Markov type and metric embedding theory, combined with a well-chosen change of metric on the underlying graph.

Oct 9: Josh Starmer (StatQuest)

The Elements of StatQuest: How to clearly explain complicated sounding things.

Communicating statistical concepts and results to non-statisticians is hard. Unfortunately, it is also the most important thing that statisticians do. The good news is that there are relatively simple rules that we can follow to improve our ability to communicate. In this seminar, I describe the rules that I created and follow when making videos for my YouTube channel, StatQuest, which aims to clearly explain statistics, machine learning and data science topics.

Oct 16: Yosi Rinott (Hebrew University of Jerusalem)

2:15PM START. NON-STANDARD TIME

Statistical Aspects of Quantum Supremacy Demonstration

Quantum supremacy, as presented by Google’s team in 2019, and as it is expected to be pursued in the near future, consists of demonstrating that a quantum circuit is capable of generating bitstrings from a distribution (which is itself random) that is considered hard to simulate on classical computers. This is called a sampling task (which is impractical for now). Quantum computers perform such a task with a great deal of noise, and efficient statistical analysis is needed in order to verify that indeed the right distribution was sampled, and to estimate the noise level or fidelity. Some interesting statistical questions arise. (joint with Gil Kalai)

Oct 23: Xin Tong (USC Marshall)

Individual-centered partial information in social networks

Most existing statistical network analysis literature assumes a global view of the network, under which community detection, testing, and other statistical procedures are developed. Yet in the real world, people frequently make decisions based on their partial understanding of the network information. As individuals barely know beyond friends’ friends, we assume that an individual of interest knows all paths of length up to L = 2 that originate from her. As a result, this individual’s perceived adjacency matrix B differs significantly from the usual adjacency matrix A based on the global information. The new individual-centered partial information framework sparks an array of interesting endeavors from theory to practice. Key general properties on the eigenvalues and eigenvectors of BE, a major term of B, are derived. These general results, coupled with the classic stochastic block model, lead to a new theory-backed spectral approach to detecting the community memberships based on an anchored individual’s partial information. Real data analysis delivers interesting insights that cannot be obtained from global network analysis. (joint with Xiao Han)
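
A toy version of the construction can be coded directly. The sketch below is my own simplification and may differ from the paper’s exact definition of B: the anchored individual sees exactly the edges lying on some path of length at most 2 starting from her, and we run an exploratory spectral step on the resulting masked adjacency matrix.

    import numpy as np

    rng = np.random.default_rng(0)

    def sbm(n, p_in, p_out):
        """Two-block stochastic block model adjacency matrix."""
        labels = rng.integers(0, 2, size=n)
        P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
        A = np.triu(rng.random((n, n)) < P, 1).astype(float)
        return A + A.T, labels

    def perceived(A, i):
        """Keep only edges on some path of length <= 2 from vertex i,
        i.e. edges with an endpoint in {i} union N(i)."""
        known = np.append(np.flatnonzero(A[i]), i)
        mask = np.zeros(A.shape, dtype=bool)
        mask[known, :] = mask[:, known] = True
        return A * mask

    A, labels = sbm(300, 0.10, 0.02)
    B = perceived(A, 0)
    observed = np.flatnonzero(B.sum(1) > 0)   # vertices visible to the anchor
    vals, vecs = np.linalg.eigh(B)
    guess = (vecs[:, -2] > 0).astype(int)     # sign of the 2nd eigenvector
    agree = (guess[observed] == labels[observed]).mean()
    print(len(observed), max(agree, 1 - agree))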

Oct 30: Andrea Montanari (Stanford University)

Self-induced regularization from linear regression to neural networks 

Modern machine learning methods, most noticeably multi-layer neural networks, require fitting highly non-linear models comprising tens of thousands to millions of parameters. Despite this, little attention is paid to regularization mechanisms that control model complexity. Indeed, the resulting models are often so complex as to achieve vanishing training error: they interpolate the data. Nevertheless, these models generalize well to unseen data: they have small test error. I will discuss several examples of this phenomenon, beginning with a simple linear regression model, and ending with two-layer neural networks in the so-called lazy regime. For these examples precise asymptotics can be determined mathematically, using tools from random matrix theory. I will try to extract a unifying picture. A common feature is the fact that a complex unregularized nonlinear model becomes essentially equivalent to a simpler model, which is however regularized in a non-trivial way.

[Based on joint papers with: Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Feng Ruan, Youngtak Sohn, Jun Yan, Yiqiao Zhong]
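
A toy simulation of the interpolation phenomenon in the simplest of these settings, linear regression (my own illustration, not taken from the papers above): the minimum-norm interpolator $\hat\beta = X^+ y$ fits the training data exactly once p >= n, yet its test error can decrease again as p grows past n.

    import numpy as np

    rng = np.random.default_rng(1)
    n, noise = 100, 0.5
    beta_full = np.zeros(500); beta_full[:5] = 1.0   # 5 signal coordinates

    def test_error(p, n_test=2000):
        b = beta_full[:p]
        X = rng.normal(size=(n, p))
        y = X @ b + noise * rng.normal(size=n)
        bhat = np.linalg.pinv(X) @ y        # min-norm least squares / interpolator
        Xt = rng.normal(size=(n_test, p))
        return np.mean((Xt @ bhat - Xt @ b) ** 2)

    for p in [20, 80, 100, 120, 400]:
        print(p, round(test_error(p), 3))   # error typically peaks near p = n, then drops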

Nov 6: Xiaohui Chen (University of Illinois at Urbana-Champaign)

A diffusion perspective of manifold clustering

We introduce the diffusion K-means clustering method on Riemannian submanifolds, which maximizes the within-cluster connectedness based on the diffusion distance. The diffusion K-means constructs a random walk on the similarity graph with vertices as data points randomly sampled on the manifolds and edges as similarities given by a kernel that captures the local geometry of manifolds. The diffusion K-means is a multi-scale clustering tool that is suitable for data with non-linear and non-Euclidean geometric features in mixed dimensions. Given the number of clusters, we propose a polynomial-time convex relaxation algorithm via semidefinite programming (SDP) to solve the diffusion K-means. In addition, we propose a nuclear norm regularized SDP that is adaptive to the number of clusters. In both cases, we show that exact recovery of the SDPs for diffusion K-means can be achieved under suitable between-cluster separability and within-cluster connectedness of the submanifolds, which together quantify the hardness of the manifold clustering problem. We further propose the localized diffusion K-means by using the local adaptive bandwidth estimated from the nearest neighbors. We show that exact recovery of the localized diffusion K-means is fully adaptive to the local probability density and geometric structures of the underlying submanifolds.
[Based on joint work with Yun Yang (UIUC).]
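
For intuition, here is a much simpler spectral relative of the method (a sketch of the diffusion-map idea with off-the-shelf k-means, not the SDP relaxation studied in the talk; the bandwidth and the number of diffusion steps are ad hoc choices of mine):

    import numpy as np
    from scipy.cluster.vq import kmeans2

    rng = np.random.default_rng(2)
    # two noisy concentric circles: clusters with non-linear geometry
    ang = rng.uniform(0, 2 * np.pi, 400)
    rad = np.repeat([1.0, 3.0], 200)
    X = np.c_[rad * np.cos(ang), rad * np.sin(ang)] + 0.1 * rng.normal(size=(400, 2))

    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / (2 * 0.3 ** 2))      # Gaussian kernel, bandwidth 0.3
    P = K / K.sum(1, keepdims=True)       # random-walk transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    emb = vecs.real[:, order[1:3]] * vals.real[order[1:3]] ** 3   # 3-step diffusion map
    _, lab = kmeans2(emb, 2, minit="++")
    print(np.bincount(lab[:200]), np.bincount(lab[200:]))  # each circle should come out pure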

Nov 13: Sumit Mukherjee (Columbia University)

Motif Counting via Subgraph sampling: A fourth moment phenomenon
Consider the subgraph sampling model, where we observe a random subgraph of a given (possibly non random) large graph $G_n$, by choosing vertices of $G_n$ independently at random with probability $p_n$. In this setting, we study the question of estimating the number of copies $N(H,G_n)$ of a fixed motif/small graph (think of $H$ as edges, two stars, triangles) in the big graph $G_n$. We derive necessary and sufficient conditions for the consistency and the asymptotic normality of a natural Horvitz-Thompson (HT) type estimator.
As it turns out, the asymptotic normality of the HT estimator exhibits an interesting fourth-moment phenomenon, which asserts that the HT estimator (appropriately centered and rescaled) converges in distribution to the standard normal whenever its fourth-moment converges to 3. We apply our results to several natural graph ensembles, such as sparse graphs with bounded degree, Erdős-Renyi random graphs, random regular graphs, and dense graphons.
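
The estimator itself is easy to state in code. A minimal sketch for triangle counts (my own toy implementation): under vertex sampling with probability p, a triangle survives with probability p^3, so the HT estimator divides the sampled count by p^3.

    import itertools, random

    random.seed(0)
    n, q, p = 200, 0.1, 0.3   # vertices, edge density, vertex-sampling probability
    adj = [set() for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        if random.random() < q:
            adj[i].add(j); adj[j].add(i)

    def triangles(vertices):
        return sum(1 for i, j, k in itertools.combinations(sorted(vertices), 3)
                   if j in adj[i] and k in adj[i] and k in adj[j])

    true_count = triangles(range(n))
    estimates = []
    for _ in range(50):
        S = [v for v in range(n) if random.random() < p]
        estimates.append(triangles(S) / p ** 3)   # Horvitz-Thompson correction
    print(true_count, sum(estimates) / len(estimates))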

 


Spring 2020:


January 17: Tianyi Zheng (UCSD)

Invariant random subgroups of groups acting on trees

A group is called permutation stable if almost homomorphisms to the symmetric groups must asymptotically be close to actual homomorphisms. Recently Becker, Lubotzky and Thom gave a characterization of permutation stability for amenable groups in terms of invariant random subgroups (IRSs). We will discuss some examples of amenable groups acting on the rooted tree, including Grigorchuk groups, whose IRSs have special properties. The information on IRSs allows one to verify the Becker-Lubotzky-Thom criterion, thus providing a large collection of permutation stable groups.


January 24: Asaf Ferber (UC Irvine)

Resilience of the rank of random matrices

Let M be an n × m matrix of independent Rademacher (±1) random variables. It is well known that if n ≤ m, then M is of full rank with high probability. We show that this property is resilient to adversarial changes to M. More precisely, if m ≥ n+n^{1-ε}, then even after changing the sign of (1 − ε)m/2 entries, M is still of full rank with high probability. Note that this is asymptotically best possible as one can easily make any two rows proportional with at most m/2 changes. Moreover, this theorem gives an asymptotic solution to a slightly weakened version of a conjecture made by Van Vu.

Joint work with Kyle Luh and Gwen McKinley


January 31: Carina Betken (Ruhr University Bochum)

Limit theorems in dynamic random networks via Stein’s method

Many real world phenomena can be modelled by dynamic random networks. We will focus on preferential attachment models where the networks grow in time and new vertices connect to older ones with probability depending on a sublinear function of the degree of the older vertex. Developing Stein’s method for a class of limiting degree distributions provides rates of convergence for the total variation distance as the number of vertices tends to infinity. Stein’s method also yields a central limit theorem for the number of isolated vertices for networks with specific attachment functions.

This is joint work with Hanna Döring and Marcel Ortgiese.


February 7: Aditi Dandapani (Ecole Polytechnique/Columbia University) [Joint with Math Finance Colloquium]

From Quadratic Hawkes to Rough Volatility and Zumbach Effect

Using microscopic price models based on Hawkes processes, it has been shown that, under some no-arbitrage condition, the high degree of endogeneity of markets together with the phenomenon of metaorder splitting generates rough Heston-type volatility at the macroscopic scale. One additional important feature of financial dynamics, at the heart of several influential works in econophysics, is the so-called feedback or Zumbach effect. This essentially means that past trends in returns convey significant information on future volatility. A natural way to reproduce this property in microstructure modeling is to use quadratic versions of Hawkes processes. We show that after suitable rescaling, the long term limits of these processes are refined versions of rough Heston models where the volatility coefficient is enhanced compared to the square root characterizing Heston-type dynamics. Furthermore, the Zumbach effect remains explicit in these limiting rough volatility models.


February 14: Moritz Voss (UCSB)  [Joint with Math Finance Colloquium]

A two-player price impact game

We study the competition of two strategic agents for liquidity in the benchmark portfolio tracking setup of Bank, Soner, Voss (2017), both facing common aggregated temporary and permanent price impact à la Almgren and Chriss (2001). The resulting stochastic linear quadratic differential game with terminal state constraints allows for an explicitly available open-loop Nash equilibrium in feedback form. Our results reveal how the equilibrium strategies of the two players take into account the other agent’s trading targets: either in an exploitative intent or by providing liquidity to the competitor, depending on the ratio between temporary and permanent price impact. As a consequence, different behavioral patterns can emerge as optimal in equilibrium. These insights complement existing studies in the literature on predatory trading models examined in the context of optimal portfolio liquidation problems.

Preprint available on arXiv: arXiv:1911.05122.

February 21: Ken Alexander (USC)

Geodesics, bigeodesics, and coalescence in first passage percolation in general dimension

In first passage percolation, iid passage times are attached to the bonds of the lattice $\mathbb{Z}^d$. The geodesic from x to y is the path which minimizes the sum of passage times; a θ-ray is a path to ∞ with asymptotic direction θ for which every finite segment is a geodesic, and a bigeodesic is an analogous doubly infinite geodesic path. It is known that under mild hypotheses, for each starting point x and direction θ, there is a θ-ray from x. In d = 2 it is a.s. unique, and furthermore, for all x, y the θ-rays from x and y eventually coalesce, and there are no bigeodesics with θ as either asymptotic direction. We show that in general dimension, under somewhat stronger hypotheses, a weak form of coalescence holds: we take all θ-rays starting next to a hyperplane H, translate the hyperplane forward by distance R to give $H_R$, and consider the density of sites in $H_R$ where one of the θ-rays first crosses $H_R$. We show this density approaches 0 as R → ∞, with near-optimal bounds on the rate. Essentially as a consequence, we show that there are no bigeodesics in any direction.
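
Finite-size versions of these objects are easy to simulate (a sketch on a box of $\mathbb{Z}^2$ with exponential weights, my own illustration): geodesics are shortest paths for the random edge weights, computable by Dijkstra's algorithm, and geodesics from nearby starting points tend to share long terminal segments.

    import heapq, random

    random.seed(1)
    n = 60
    wt = {}
    def w(a, b):
        e = (min(a, b), max(a, b))
        if e not in wt:
            wt[e] = random.expovariate(1.0)   # iid passage times on bonds
        return wt[e]

    def dijkstra(src):
        dist, parent, pq = {src: 0.0}, {}, [(0.0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float("inf")):
                continue
            x, y = u
            for v in ((x+1, y), (x-1, y), (x, y+1), (x, y-1)):
                if 0 <= v[0] < n and 0 <= v[1] < n:
                    nd = d + w(u, v)
                    if nd < dist.get(v, float("inf")):
                        dist[v], parent[v] = nd, u
                        heapq.heappush(pq, (nd, v))
        return dist, parent

    def geodesic(parent, src, tgt):
        path = [tgt]
        while path[-1] != src:
            path.append(parent[path[-1]])
        return path

    tgt = (n - 1, n // 2)
    _, p1 = dijkstra((0, 20)); _, p2 = dijkstra((0, 40))
    g1 = set(geodesic(p1, (0, 20), tgt)); g2 = set(geodesic(p2, (0, 40), tgt))
    print(len(g1 & g2))   # vertices shared by the two geodesics (a hint of coalescence)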


February 28: Jianfeng Zhang (USC)

Dynamic Set Values for Nonzero Sum Game Problems
Nonzero sum games typically have multiple Nash equilibria (or no equilibrium), and unlike the zero sum case, they may have different values at different equilibria. Instead of focusing on the existence of individual equilibria, we study the set of values over all equilibria, which we call the set value of the game. The set value is unique by nature and always exists (with possible value $\emptyset$). Similar to the standard value function in the control literature, it enjoys many nice properties such as regularity, stability, and more importantly the dynamic programming principle. We shall also discuss the possible extension to mean field games. The talk is based on joint work with Feinstein and Rudloff as well as an ongoing joint work with Iseri.

March 6: Pierre Baldi (UC Irvine)

Mathematical Foundations of Deep Learning

Deep Learning is at the center of AI and a myriad of applications ranging from medical imaging to games. I will demonstrate some of these applications, briefly review the history of the field, and then discuss the mathematical foundations of deep learning, with an emphasis on recent progress (capacity theory, deep learning channel theory) and open questions.

March 13: [no talk]


March 20: [no talk]

[TBA]


March 25: Yaniv Plan [UBC] [CANCELLED]


April 3: [TBA]


April 10: Wuchen Li (UCLA)

Information Newton’s flow in probability space

We introduce a framework for Newton’s flows in probability space with information metrics, named information Newton’s flows. Two information metrics are considered: the Fisher-Rao metric and the Wasserstein-2 metric. Several examples of information Newton’s flows for learning objective/loss functions are provided, such as the Kullback-Leibler (KL) divergence, maximum mean discrepancy (MMD), and cross-entropy. Asymptotic convergence results for the proposed Newton’s methods are provided. A known fact is that overdamped Langevin dynamics corresponds to the Wasserstein gradient flow of the KL divergence. Extending this fact to the Wasserstein Newton’s flow of the KL divergence, we derive Newton’s Langevin dynamics. We provide examples of Newton’s Langevin dynamics in both one-dimensional space and Gaussian families. In numerics, we design a sampling-efficient variational method to approximate information Newton’s directions, restricting the dual variable of the Newton’s direction to an affine function space. Several numerical examples in Gaussian families and Bayesian logistic regression demonstrate the effectiveness of the proposed method.
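
The “known fact” above is easy to test numerically (a minimal sketch, my own illustration, not the Newton flow of the talk): the unadjusted Langevin iteration x <- x - h∇V(x) + sqrt(2h)ξ drives an ensemble of particles toward π ∝ exp(-V), the minimizer of the KL divergence.

    import numpy as np

    rng = np.random.default_rng(4)
    V = lambda x: (x ** 2 - 1) ** 2          # double-well potential
    gradV = lambda x: 4 * x * (x ** 2 - 1)

    x = rng.normal(size=5000)                # ensemble of particles
    h = 1e-3
    for _ in range(5000):
        x = x - h * gradV(x) + np.sqrt(2 * h) * rng.normal(size=x.size)

    grid = np.linspace(-3, 3, 2001)
    wts = np.exp(-V(grid)); wts /= wts.sum()
    print((x ** 2).mean(), (grid ** 2 * wts).sum())   # second moments should roughly agree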


April 17: [TBA]


April 24: Qi Feng (USC)

Sub-Riemannian Ricci curvature via generalized Gamma z calculus and functional inequalities.

A drift-diffusion process with a non-degenerate diffusion coefficient matrix possesses good properties: convergence to equilibrium, an entropy dissipation rate, etc. A degenerate drift-diffusion process has a diffusion coefficient matrix that fails to be positive definite, which makes it difficult to control the convergence and the entropy dissipation rate through the drift-diffusion coefficients alone. In general, a degenerate drift-diffusion is intrinsically equipped with a sub-Riemannian structure defined by the diffusion coefficients. We propose a new methodology to systematically study general drift-diffusion processes through sub-Riemannian geometry and Wasserstein geometry. We generalize the Bakry-Émery calculus and Gamma z calculus to define a new notion of sub-Riemannian Ricci curvature tensor. With the new Ricci curvature tensor, we are able to establish generalized curvature-dimension bounds on sub-Riemannian manifolds which go beyond the step-two condition. As an application, for the first time, we establish analytical bounds for logarithmic Sobolev inequalities for the weighted measure on the displacement group and the Engel group. Our result also provides an entropy dissipation rate for Langevin dynamics with gradient drift and variable temperature matrix. The talk is based on joint works with Wuchen Li.


May 1: Galyna Livshyts (Georgia Tech)

A discussion on the Log-Brunn-Minkowski conjecture and related questions

We shall discuss the Log-Brunn-Minkowski conjecture, a conjectured strengthening of the Brunn-Minkowski inequality proposed by Boroczky, Lutwak, Yang and Zhang. The discussion will involve an introduction and an explanation of how the local version of the conjecture arises naturally, a collection of “hands-on” examples and elementary geometric tricks leading to various related partial results, statements of related questions, as well as a discussion of more technically involved approaches and results. Based on work with Johannes Hosle and Alexander Kolesnikov, as well as on previous joint results with Colesanti, Marsiglietti, Nayar, Zvavitch.


Fall 2019:


Sep 6: Sarah Cannon (Department of Mathematical Sciences, Claremont McKenna College)

Counting independent sets in unbalanced bipartite graphs

We study the hard-core model (independent sets) on bipartite graphs using the cluster expansion from statistical physics. When there is a sufficient imbalance in the degrees or fugacities between the sides (L,R) of the bipartition, we can rewrite the hard-core partition function in terms of deviations from independent sets that are empty on one side of the bipartition and show this expression has a convergent cluster expansion. This has interesting algorithmic and probabilistic consequences. On the algorithmic side, we address an open problem in approximate counting and give a polynomial-time algorithm for approximating the partition function for a large class of bounded degree bipartite graphs; this includes, among others, the unweighted biregular case where the degrees satisfy  d_R >= 7 d_L log d_L. Our approximation algorithm is based on truncating the cluster expansion. As a consequence of the method, we also prove that the hard-core model on such graphs exhibits exponential decay of correlations by utilizing connections between the cluster expansion and joint cumulants.


Sep 13: Joint seminar with CAMS. Speaker: Sunčica Čanić (Department of Mathematics, UC Berkeley and Department of Mathematics, University of Houston)

Weak solutions to fluid-mesh-shell interaction problem

We give an overview of the recent developments in the design of constructive existence proofs for nonlinear moving boundary problems involving 3D incompressible, viscous fluids and various elastic structures. Special attention will be paid to interactions involving elastic mesh-supported shells. Real-life examples of such problems are many, including the interaction between blood flow and vascular walls treated with mesh-like devices called stents. Examples of applications to vascular procedures will be shown.


September 20: Qing Zhou (Department of Statistics, UCLA)

Score-based learning of DAGs in high-dimensions
We analyze a popular score-based DAG estimator that has been the subject of extensive empirical inquiry in recent years and is known to achieve state-of-the-art results. We give explicit finite-sample bounds that are valid in the high-dimensional regime. The estimator is based around a difficult nonconvex optimization problem, and its analysis may be of independent interest given recent developments in nonconvex optimization in machine learning. Our analysis introduces a novel strategy for reducing the nonconvex program to a polynomial number of tractable neighborhood regression problems by exploiting a useful lattice property of conditional independence in the Gaussian case. This strategy overcomes the drawbacks of existing approaches, which either fail to guarantee structure consistency in high dimensions (i.e., learning the correct graph with high probability), or rely on strong assumptions. We will discuss some related methods and algorithms developed recently for structure learning of DAGs.

 


September 27: Samuel B. Hopkins (Department of Electrical Engineering and Computer Sciences, UC Berkeley)

How to Estimate the Mean of a (Heavy-Tailed) Random Vector

We study polynomial time algorithms for estimating the mean of a multivariate random vector under very mild assumptions: we assume only that the random vector X has finite mean and covariance. This allows for X to be heavy-tailed. In this setting, the radius of the confidence intervals achieved by the empirical mean is exponentially larger than in the case that X is Gaussian or sub-Gaussian. That is, the empirical mean is poorly concentrated.
We offer the first polynomial time algorithm to estimate the mean of X with sub-Gaussian-size confidence intervals under such mild assumptions. That is, our estimators are exponentially better-concentrated than the empirical mean. Our algorithm is based on a new semidefinite programming relaxation of a high-dimensional median. Previous estimators which assumed only existence of finitely-many moments of X either sacrifice sub-Gaussian performance or are only known to be computable via brute-force search procedures requiring time exponential in the dimension.

Based on https://arxiv.org/abs/1809.07425 to appear in Annals of Statistics.
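
The speaker’s estimator involves a semidefinite program, but the flavor of “better-than-empirical-mean” concentration can already be seen in one dimension with the classical median-of-means estimator (a sketch for illustration, not the algorithm of the talk):

    import numpy as np

    rng = np.random.default_rng(5)

    def median_of_means(x, k):
        """Split the sample into k blocks, average each block, return the median."""
        return np.median([b.mean() for b in np.array_split(rng.permutation(x), k)])

    true_mean = 1 / (2.5 - 1)   # mean of numpy's Pareto(2.5): finite variance, heavy tail
    dev_mean, dev_mom = [], []
    for _ in range(500):
        x = rng.pareto(2.5, size=2000)
        dev_mean.append(abs(x.mean() - true_mean))
        dev_mom.append(abs(median_of_means(x, 20) - true_mean))
    # compare confidence-interval radii: 99th percentile of the deviations
    print(np.quantile(dev_mean, 0.99), np.quantile(dev_mom, 0.99))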


October 4: Elizaveta Rebrova (UCLA, Department of Mathematics)

Sketching for iterative linear solvers: strengths, weaknesses, and tools from random matrix theory.

Projection-based approaches for solving large-scale linear systems are among the most efficient in their simplicity. For example, the well-known Kaczmarz algorithm iteratively projects the previous approximation x_k onto the solution space of the next equation in the system (an elegant proof of the exponential convergence of this method using a correct randomization of the process was given in 2009 by Strohmer and Vershynin, and followed by many extensions and generalizations in the works of Needell, Tropp, Ward, Srebro, Tan, etc.). The “sketch-and-project” framework is an interesting unifying view on a number of iterative solvers (introduced by Gower and Richtarik in 2016). One can observe that a random selection of the next equation (or a subset of equations) can be represented as a sketch, that is, left multiplication by a random vector (or matrix). I will give an overview of some of these methods and discuss the role that random matrix theory plays in proving their convergence. I will then discuss our recent work with Deanna Needell on block Gaussian sketch-and-project methods.
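
For concreteness, the Strohmer-Vershynin randomized Kaczmarz iteration fits in a few lines (a minimal sketch of the method described above):

    import numpy as np

    rng = np.random.default_rng(6)
    m, n = 500, 50
    A = rng.normal(size=(m, n))
    x_true = rng.normal(size=n)
    b = A @ x_true

    row_norms2 = (A ** 2).sum(1)
    probs = row_norms2 / row_norms2.sum()   # sample rows proportionally to squared norm
    x = np.zeros(n)
    for _ in range(3000):
        i = rng.choice(m, p=probs)
        x += (b[i] - A[i] @ x) / row_norms2[i] * A[i]   # project onto row i's hyperplane
    print(np.linalg.norm(x - x_true))   # error decays exponentially in expectation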


October 11: Simo Ndaoud (USC, Department of Mathematics)

Scaled minimax optimality in high-dimensional linear regression: A non-convex algorithmic regularization approach

In this talk, we introduce the notion of scaled minimaxity for sparse estimation in the high-dimensional linear regression model. Fixing the scale of the signal-to-noise ratio, we prove that the estimation error can be much smaller than the global minimax error. Taking advantage of the interplay between estimation and support recovery we achieve optimal performance for both problems simultaneously under orthogonal designs. We also construct a new optimal estimator for scaled minimax sparse estimation. Sharp results for the classical minimax risk are recovered as a consequence of our study.
For general designs, we first connect scaled minimax sparse estimation to the problem of de-biasing convex penalized estimators. Then, we introduce a new analysis framework based on non-convex algorithmic regularization where previous sharp results hold. The procedure we present achieves optimal statistical accuracy faster than, for instance, classical algorithms for the Lasso, and eliminates the usual bias term whenever possible. As a consequence, we present a new iterative algorithm for high-dimensional linear regression that is scaled minimax optimal, fast, and adaptive.
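
As a generic example of non-convex algorithmic regularization in this model (iterative hard thresholding, a standard method used here purely as an illustration, not the authors’ procedure):

    import numpy as np

    rng = np.random.default_rng(7)
    n, p, s = 100, 400, 5
    X = rng.normal(size=(n, p)) / np.sqrt(n)   # columns roughly unit norm
    beta = np.zeros(p); beta[:s] = 5.0
    y = X @ beta + 0.1 * rng.normal(size=n)

    def iht(X, y, s, steps=300):
        """Gradient step on the least-squares loss, then keep the s largest entries.
        Unit step size is a common choice under RIP-type conditions; it may need tuning."""
        b = np.zeros(X.shape[1])
        for _ in range(steps):
            b = b + X.T @ (y - X @ b)
            b[np.argsort(np.abs(b))[:-s]] = 0.0   # hard threshold to sparsity s
        return b

    bhat = iht(X, y, s)
    print(sorted(np.flatnonzero(bhat)), np.linalg.norm(bhat - beta))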

Oct 18: [No talk]


October 25: Richard Arratia (USC, Department of Mathematics)

Proof of a 60-year-old conjecture of Sol Golomb, on the longest cycle for a random feedback shift register

In joint work with Rod Canfield and Al Hales, a 1959 conjecture by Sol Golomb has been proved.  In simplest form, writing N=2^n, the length L(n) of the longest cycle, for a random feedback shift register of width n, with all 2^{N/2} feedback logics equally likely, has the same limit distribution for L(n)/N, as it would in the case of a random permutation of the N binary n-tuples with all N! permutations equally likely.   The proof starts with a deterministic study of shotgun versus sequential editing, to avoid n-tuple repeats, and makes heavy use of the process version of Chen-Stein Poisson approximation, from Arratia, Goldstein, and Gordon 1992, applied to the indicators of leftmost (n-1)-tuple repeats in fair coin tossing sequences.  This use of Chen-Stein is not to get a Poisson approximation for the probability of having no such repeats, but rather, in a regime where the expected number of such repeats is huge, to conclude that the process of all indicators of such repeats is close, in total variation distance, to the independent process with the same marginals.  This work is available at https://arxiv.org/abs/1903.09183.  Lots of background info is at https://writings.stephenwolfram.com/2016/05/solomon-golomb-19322016/
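
The statement is simple to probe numerically for small widths (a sketch with my own encoding conventions): a nonsingular feedback logic f(x_0,...,x_{n-1}) = x_0 XOR g(x_1,...,x_{n-1}) makes the shift map a permutation of the N = 2^n states, and L(n)/N can be compared with the longest-cycle fraction of a uniform random permutation, whose mean is the Golomb-Dickman constant, roughly 0.624.

    import random

    random.seed(8)
    n = 12
    N = 2 ** n

    def longest_cycle_fraction(nxt):
        """nxt is a permutation of range(N); return (longest cycle length)/N."""
        seen = [False] * N
        best = 0
        for s0 in range(N):
            length, s = 0, s0
            while not seen[s]:
                seen[s] = True; length += 1; s = nxt(s)
            best = max(best, length)
        return best / N

    trials = []
    for _ in range(20):
        g = [random.randrange(2) for _ in range(N // 2)]   # random feedback logic
        def nxt(s):
            rest = s >> 1                    # bits x_1 ... x_{n-1}
            bit = (s & 1) ^ g[rest]          # feedback x_0 XOR g(x_1 ... x_{n-1})
            return rest | (bit << (n - 1))
        trials.append(longest_cycle_fraction(nxt))
    print(sum(trials) / len(trials))         # compare with ~0.624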


Nov 1: Pierre Bellec (Rutgers University)

Degrees-of-freedom in high-dimensional statistics
In classical statistics, degrees-of-freedom adjustments arise naturally in the construction of unbiased estimators; for instance, in linear regression with p parameters and p<<<n, the residual sum of squares divided by (n-p) estimates the noise level without bias. I will present in this talk two examples of degrees-of-freedom adjustments in high dimensions (i.e., when p is of the same order as n or larger), and explain the mathematics that justify applying these degrees-of-freedom adjustments for estimators commonly used to analyze high-dimensional data.
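
The classical adjustment is a two-line simulation (a minimal sketch for illustration):

    import numpy as np

    rng = np.random.default_rng(9)
    n, p, sigma = 200, 10, 2.0
    X = rng.normal(size=(n, p))
    beta = rng.normal(size=p)

    estimates = []
    for _ in range(1000):
        y = X @ beta + sigma * rng.normal(size=n)
        resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        estimates.append((resid ** 2).sum() / (n - p))   # divide by n-p, not n
    print(np.mean(estimates), sigma ** 2)   # unbiased for the noise variance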

A well-understood degrees-of-freedom adjustment appears in Stein’s Unbiased Risk Estimate (SURE) to construct an unbiased estimate of the mean square risk of almost any estimator $\hat\mu$; here the divergence of $\hat\mu$ plays the role of the degrees-of-freedom of the estimator. Thanks to Stein’s formula, not only can unbiased estimates be constructed for the risk of $\hat\mu$, but also for the risk of SURE itself (SURE for SURE): a simple unbiased estimate provides information about the squared distance between SURE and the squared estimation error of $\hat\mu$, again involving degrees-of-freedom adjustments.

A second area where degrees-of-freedom adjustments appear is the construction of confidence intervals. A novel analysis reveals that degrees-of-freedom adjustments play a major role in de-biasing methodologies used to construct confidence intervals in high dimensions. We will see that in sparse linear regression with the Lasso and Gaussian designs, existing de-biasing schemes need to be modified with an adjustment that accounts for the degrees-of-freedom of the Lasso. This degrees-of-freedom adjustment is necessary for statistical efficiency in the regime s>>>n^{2/3}. Here, the necessity of the degrees-of-freedom adjustment is explained using the interpolation path between Gaussian vectors typically used to derive Slepian’s lemma.


Nov 8: Sangchul Lee (UCLA, Department of Mathematics)

Exceptional points of discrete-time random walks in planar domains
Given a sequence of lattice approximations $D_N \subset \mathbb{Z}^2$ of a domain $D \subset \mathbb{R}^2$ with wired outer boundary $\varrho$, we consider discrete-time simple random walks in $D_N \cup \{\varrho\}$ that are run for a time proportional to the expected cover time. In this talk, we show that the scaling limit of the exceptional level sets of the local time can be described by the zero-average Liouville Quantum Gravity measures in $D$, up to multiplication by a log-normal field. Two main tools used are the Second Ray-Knight Theorem, which describes the elegant connection between the local time and an associated Discrete Gaussian Free Field (DGFF), and a novel factorization concerning the Liouville Quantum Gravity measures arising from exceptional level sets of the DGFF. This is joint work in progress with Yoshihiro Abe and Marek Biskup.

November 15: Deanna Needell (UCLA)

Simple approaches to complicated data analysis

Recent advances in technology have led to a monumental increase in large-scale data across many platforms. One mathematical model that has gained a lot of recent attention is the use of sparsity. Sparsity captures the idea that high dimensional signals often contain a very small amount of intrinsic information. Using this notion, one may design efficient low-dimensional representations of large-scale data as well as robust reconstruction methods for those representations. Binary, or one-bit, representations of data for example, arise naturally in many applications, and are appealing in both hardware implementations and algorithm design. In this talk, we provide a brief background to sparsity and 1-bit measurements, and present new results on the problem of data classification with low computation and resource costs. We illustrate the utility of the proposed approach on recently acquired data about Lyme disease.   


November 22: Steven Heilman (USC)

Independent Sets in Random Graphs and Random Trees

An independent set of size k in a finite undirected graph is a set of k vertices of the graph, no two of which are connected by an edge.  The structure of independent sets of size k as k varies is of interest in probability, statistical physics, combinatorics, and computer science.  In 1987, Alavi, Malde, Schwenk and Erdos conjectured that the number of independent sets of size k in a tree is a unimodal sequence (this number goes up and then it goes down), and this problem is still open.  A variation on this question is: what does the number of independent sets of size k look like in an Erdos-Renyi random graph, or random tree, as k varies?  By adapting an argument of Coja-Oghlan and Efthymiou, we show unimodality for Erdos-Renyi random graphs (with high probability as the number of vertices in the graph goes to infinity, when the expected degree of a single vertex is large).  Time permitting, we will discuss a completely different problem: finding the maximum Gaussian perimeter of a convex set in the plane.


Nov 29: [No talk]


Spring 2019:


January 11: Vlad Vicol (Princeton University) [joint with CAMS Colloquium]

Convex integration on thin sets

I will discuss the construction of wild weak solutions to the Navier-Stokes equation which are smooth on the complement of a thin set of times (with Hausdorff dimension strictly less than 1). This is based on joint work with T. Buckmaster and M. Colombo.


January 18: Omer Tamuz (Caltech Mathematics)

The Poisson boundary and the infinite conjugacy class property

Joint work with Joshua Frisch, Yair Hartman and Pooya Vahidi Ferdowsi.

The Poisson boundary of a random walk on a group captures the uncertainty in the walk’s asymptotic behavior. It has long been known that all commutative groups are Choquet-Deny groups: namely, they have trivial Poisson boundaries for every random walk. More generally, it has been shown that all virtually nilpotent groups are Choquet-Deny. I will present a recent result showing that in the class of finitely generated groups, only virtually nilpotent groups are Choquet-Deny. This proves a conjecture of Kaimanovich and Vershik (1983), who suggested that groups of exponential growth are not Choquet-Deny. Our proof does not use the superpolynomial growth property of non-virtually nilpotent groups, but rather that they have quotients with the infinite conjugacy class property (ICC). Indeed, we show that a countable discrete group is Choquet-Deny if and only if it has no ICC quotients.

January 25: Farruh Shahidi (Penn State)

Thermodynamics via inducing

We consider continuous maps f: X to X on compact metric spaces admitting inducing schemes of hyperbolic type  as well as the induced maps \tilde{f} : \tilde{X}\to\tilde{X} and the associated tower maps \hat{f}:\hat{X}\to\hat{X}. For a certain class of potential functions on X, which includes all Holder continuous functions, we establish thermodynamic formalism for each of the above three systems and we describe some relations between the corresponding equilibrium measures.  Furthermore we study ergodic properties of these equilibrium measures including the Bernoulli property, decay of correlations, and the Central Limit Theorem (CLT).  Finally, we prove analyticity of the pressure function for the three systems.


Feb 1: Jun Yin (UCLA)

Anisotropic law and eigenvector empirical spectral distribution 

We develop a new method for deriving local laws for a large class of random matrices. It is applicable to many matrix models built from sums and products of deterministic or independent random matrices. In particular, it may be used to obtain local laws for matrix ensembles that are anisotropic in the sense that their resolvents are well approximated by deterministic matrices that are not multiples of the identity. With this tool, we study the convergence rate of the eigenvector empirical spectral distribution (VESD) of sample covariance matrices to the deformed Marchenko-Pastur (MP) distribution. Consider sample covariance matrices of the form $\Sigma^{1/2} X X^* \Sigma^{1/2}$, where $X = (x_{ij})$ is an $M \times N$ random matrix whose entries are independent random variables with mean zero and variance $N^{-1}$, and $\Sigma$ is a deterministic positive-definite matrix. We prove that the Kolmogorov distance between the expected VESD and the deformed MP distribution is bounded by $N^{-1+\epsilon}$ for any fixed $\epsilon > 0$, provided that the rescaled entries $N^{1/2} x_{ij}$ have uniformly bounded 6th moments and $|N/M - 1| \ge \tau$ for some constant $\tau > 0$.
Joint work with Antti Knowles, Haokai Xi, Fan Yang

Feb 8: Georg Menz (UCLA)

Ergodicity of the infinite swapping algorithm at low temperature

This talk is about joint work with André Schlichting, Wenpin Tang, and Tianqi Wu. Sampling Gibbs measures at low temperature is a very important task, but computationally very challenging. Numerical evidence suggests that the infinite-swapping algorithm (isa) is a promising method. The isa can be seen as an improvement of replica methods, which are very popular. We rigorously analyze the ergodic properties of the isa in the low temperature regime, deducing Eyring-Kramers formulas for the spectral gap (or Poincaré constant) and the log-Sobolev constant. Our main result shows that the effective energy barrier can be reduced drastically using the isa compared to the classical over-damped Langevin dynamics. As a corollary we derive a deviation inequality showing that sampling is also improved by an exponential factor. Furthermore, we analyze simulated annealing for the isa and show that the isa is again superior to the over-damped Langevin dynamics.


Feb 15: Qi Feng (USC)

Stochastic Analysis on Totally Geodesic Riemannian Foliations

In this talk, we will present stochastic analysis on totally geodesic Riemannian foliations equipped with a sub-Riemannian structure. We first show that the horizontal Wiener measure is quasi-invariant under the action of flows generated by suitable tangent processes. On the path space of totally geodesic foliations, we derive various Driver integration-by-parts (IBP) formulas, as well as a concentration property derived from improved log-Sobolev inequalities on the path space. I will use the Heisenberg group as an example to illustrate the idea. In the end, I shall comment on the connections between stochastic analysis and Ricci flow.


Feb 22: Todd Kemp (UCSD)

Eigenvalues of Brownian Motion on GL(N), in the Large-N Limit

   Two of the cornerstone models of random matrix theory are the Gaussian Unitary Ensemble and the Ginibre Ensemble; both are matrices with Gaussian and (mostly) independent entries.  The analysis of the former began the modern field of random matrix theory with Wigner’s work in the 1950s.  The latter took 10 years longer, due to the far more subtle relationship between the entries and eigenvalues of a non-normal matrix.
   From a dynamical viewpoint, these two ensembles can (and should!) be viewed as endpoints of Brownian motions: the former on the space of Hermitian matrices, the latter on the space of all matrices.  Both of these linear spaces happen to be Lie algebras: of the unitary group U(N) and the general linear group GL(N).  The Lie structure is one of the main reasons the eigenvalues have “nice” behavior.
   It is natural to study the corresponding question of the behavior of eigenvalues of Brownian motion on the Lie groups U(N) and GL(N).  Bulk (and some finer) statistics are now well-understood in the unitary case, but the general linear Brownian motion has proved a greater challenge.  In this talk, I will focus on my very recent joint work with Driver and Hall, identifying the large-N limit of the empirical eigenvalue distribution of the Brownian motion on GL(N).
   This talk will feature lots of pictures.

Mar 1: Nicolai Haydn (USC)

Local escape rates in dynamics

We consider a dynamical system with an invariant measure and the probability distribution of the time it takes to exit through a hole. This distribution is typically exponential where the rate depends on the measure of the hole. This rate normalised by the measure of the hole yields in the limit the localised escape rate which, for $\phi$-mixing measures, we show to be equal to 1 at non-periodic points and equal to the extremal index at periodic points. We also give some examples including equilibrium states where we obtain the `Pitskel value’ at periodic points.


Mar 8: Daniel Lacker (Columbia) (joint with Math Finance Colloquium)

Beyond mean field limits: Local dynamics for large sparse networks of interacting processes

We study large systems of stochastic processes (particles), in discrete or continuous time, in which each particle is associated with a vertex in a graph and interacts only with its neighbors. It is known that when the graph is complete and the numbers of particles grows to infinity, the system is well-described by a nonlinear (McKean-Vlasov) process, which describes the behavior of one typical particle. For general (sparse) graphs, however, the system is no longer exchangeable, and the mean field approximation fails. Nevertheless, if the underlying graph is locally tree-like, we show that a single particle and its nearest neighbors are characterized by an autonomous evolution we call the “local dynamics.” These local dynamics are peculiar in that they are inevitably non-Markovian even if the original system is Markovian; the dynamics depend on the conditional law of the solution given the histories of certain particles. Joint work with Kavita Ramanan and Ruoyu Wu.


Mar 15: no seminar (spring break)


Mar 22: Krishnakumar Balasubramanian (UC Davis)

Applications of Stein’s Identity to Non-Convex Optimization and Sampling  
Optimization and sampling are arguably the computational backbones of Frequentist and Bayesian statistics respectively. In this talk the use of Stein’s identity for performing stochastic Zeroth-Order (ZO) non-convex optimization and sampling will be discussed. Specifically, we consider the case when the function (or density) under consideration is not available to us analytically, rather we are able to only obtain noisy evaluations. Using Stein’s identity, techniques for estimating the gradient and Hessian of a function (or density) will be introduced. Based on this, the following algorithms and the corresponding theoretical results will be discussed: (i) ZO stochastic gradient method in high-dimensions (ii) ZO Newton’s method for escaping saddle points (iii) ZO Euler and Ozaki discretized (Kinetic) Langevin Monte Carlo sampling. Some open questions about the proposed ZO Hessian matrix estimator and its connections to random matrix theory will also be discussed. References: https://arxiv.org/pdf/1809.06474.pdf and https://arxiv.org/pdf/1902.01373.pdf
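
The gradient-estimation step based on Stein’s identity is short enough to sketch (my own minimal version; for u ~ N(0, I), E[(f(x + σu) − f(x))u]/σ approximates ∇f(x) for small σ):

    import numpy as np

    rng = np.random.default_rng(10)
    f = lambda x: np.sum(x ** 2) + np.sin(x[0])   # treated as a black box
    true_grad = lambda x: 2 * x + np.array([np.cos(x[0]), 0.0, 0.0])

    def zo_grad(f, x, sigma=1e-2, m=4000):
        """Zeroth-order gradient estimate from function evaluations only."""
        u = rng.normal(size=(m, x.size))
        vals = np.array([f(x + sigma * ui) for ui in u]) - f(x)
        return (vals[:, None] * u).mean(0) / sigma

    x = np.array([0.5, -1.0, 2.0])
    print(zo_grad(f, x))
    print(true_grad(x))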

Mar 29: Daniel Kane (UCSD)

Computationally Efficient Robust Statistics

Robust statistics studies what can be learned from data even when some of it may have been corrupted. For example, the median is a robust estimator of the mean of a Gaussian. While a few wrong samples can throw off the mean significantly, their effect on the median is limited. However, when trying to generalize this simple estimator to many dimensions, things become substantially harder. Surprisingly, until recently all known techniques either produced much larger error than information-theoretically attainable, or required estimators that were computationally intractable to find. Only in the last few years were algorithms discovered that overcame both of these difficulties. We discuss these new algorithms and some of the related theory that has developed over the past few years.
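
The one-dimensional example from the abstract, in code (for illustration only; the hard part discussed in the talk is the computationally efficient multivariate analogue):

    import numpy as np

    rng = np.random.default_rng(11)
    x = rng.normal(loc=1.0, size=1000)
    x[:50] = 1e6                  # corrupt 5% of the samples adversarially
    print(np.mean(x))             # ruined by the corruption
    print(np.median(x))           # still close to the true mean 1.0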


Apr 5: Hanbaek Lyu (UCLA)

Phase transition in random contingency tables with non-uniform margins

Contingency tables are matrices with nonnegative integer entries with fixed row and column margins. Understanding the structure of a uniformly chosen contingency table with given margins is an important problem, especially in statistics. For parameters $n,\delta,B,$ and $C$, let $X=(X_{k\ell})$ be the random uniform contingency table whose first $\lfloor n^{\delta} \rfloor $ rows and columns have margin $\lfloor BCn \rfloor$ and whose other $n$ rows and columns have margin $\lfloor Cn \rfloor$. For any $0<\delta<1$, we establish a sharp phase transition of the limiting distribution of each entry of $X$ at the critical value $B_{c}=1+\sqrt{1+1/C}$. One of our main results shows that, for $1/2<\delta<1$, all entries have uniformly bounded expectation for $B<B_{c}$, but the mass concentrates at the smallest block and grows in the order of $n^{1-\delta}$ for $B>B_{c}$. We also establish a strong law of large numbers for the row sums within blocks. Joint work with Igor Pak and Sam Dittmer.


Apr 12: [no speaker]


Apr 19: Bruce Driver (UCSD)

Global Existence of RDEs on Manifolds

In this talk we will discuss a theorem guaranteeing the existence of global (in time) solutions to rough path differential equations (RDEs) on a smooth manifold. These results will depend on quantitative estimates of the quality of the truncated Baker-Campbell-Hausdorff-Dynkin formula for vector fields on the manifold.


Apr 26: Ilias Diakonikolas (USC CS)

The Degree-d Chow Parameters Problem

The Degree-d Chow Parameters of a Boolean function are its degree at most d Fourier coefficients. A degree-d Polynomial Threshold Function (PTF) is a Boolean function of the form f(x) = sgn(p(x)), where p is a degree at most d real polynomial. A classical result, dating back to the 1960s, shows that degree-d PTFs are uniquely specified by their degree-d Chow parameters. The algorithmic problem of reconstructing degree-d PTFs from their degree-d Chow parameters is of fundamental interest in several scientific disciplines, including learning theory, computational complexity, and game theory. In this talk, we will provide an overview of the recent progress on this problem.
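
For small n the degree-d Chow parameters can be computed by brute force (a sketch over a hypothetical degree-2 PTF of my own choosing; the parameters are the Fourier coefficients f_hat(S) = E_x[f(x) prod_{i in S} x_i] over the uniform hypercube):

    import itertools
    from math import prod

    n, d = 5, 2
    poly = lambda x: x[0] * x[1] + x[2] + 0.5 * x[3] * x[4]   # degree-2 polynomial
    f = lambda x: 1 if poly(x) > 0 else -1                    # the PTF sgn(p(x))

    cube = list(itertools.product([-1, 1], repeat=n))
    def chow(S):
        """Fourier coefficient of f on the monomial indexed by S."""
        return sum(f(x) * prod(x[i] for i in S) for x in cube) / len(cube)

    for k in range(d + 1):
        for S in itertools.combinations(range(n), k):
            c = chow(S)
            if abs(c) > 1e-12:
                print(S, round(c, 4))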

Based on joint works with Daniel Kane (UCSD) and Chrystalla Pavlou (Edinburgh).

Fall 2018:


August 31: Johannes Muhle-Karbe (Carnegie Mellon University)

Equilibrium Asset Pricing with Transaction Costs

We study a risk-sharing equilibrium where heterogeneous agents trade subject to quadratic transaction costs. The corresponding equilibrium asset prices and trading strategies are characterized by a system of fully-coupled forward-backward stochastic differential equations. We show that a unique solution exists provided that the agents’ preferences are sufficiently similar.  (Joint work in progress with Martin Herdegen and Dylan Possamai)

September 7: Roman Vershynin (UC Irvine)

From number theory to machine learning: hunting for smooth Boolean functions

The most fundamental kind of functions studied in computer science are Boolean functions. Most Boolean functions oscillate a lot, which is analogous to the fact that “most” continuous functions on R are nowhere differentiable. A “smooth” Boolean function can be generated by taking the sign of some polynomial of low degree in n variables. Such functions are called polynomial threshold functions, and they are widely used in machine learning as classification devices. Surprisingly, we do not know how many low-degree polynomial threshold functions there exist! An approximate answer to this question has been known only for polynomials of degree 1, i.e. for linear functions. In a recent joint work with Pierre Baldi, we found a way to approximately count polynomial threshold functions of any fixed degree. This solves a problem of M. Saks that goes back to 1993 and much earlier. Our argument draws insights from analytical number theory, additive combinatorics, enumerative combinatorics, probability and discrete geometry. I will describe some of these connections, focusing particularly on a beautiful interplay of zeta and Mobius functions in number theory, hyperplane arrangements in enumerative combinatorics and random tensors in probability theory.

September 14: Xin Tong (USC Marshall School of Business)

Neyman-Pearson classification 

In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (that is, the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, alpha, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than alpha do not satisfy the type I error control objective because the resulting classifiers are still likely to have type I errors much larger than alpha.  This talk introduces the speaker’s work on NP classification algorithms and their applications and raises current challenges under the NP paradigm.
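
A rank-based threshold rule in this spirit (my simplified sketch, loosely following the order-statistics idea behind the speaker’s NP umbrella algorithm): pick the k-th order statistic of held-out class-0 scores so that the type I error exceeds alpha with probability at most delta.

    import numpy as np
    from scipy.stats import binom

    def np_threshold(scores0, alpha=0.05, delta=0.05):
        """Threshold t so that P(type I error of 'predict 1 if score > t' > alpha) <= delta.
        Uses P(R0(s_(k)) > alpha) = P(Binomial(m, 1 - alpha) >= k) for the k-th order statistic."""
        s = np.sort(scores0)
        m = len(s)
        for k in range(1, m + 1):
            if binom.sf(k - 1, m, 1 - alpha) <= delta:
                return s[k - 1]
        raise ValueError("not enough class-0 data for this (alpha, delta)")

    rng = np.random.default_rng(12)
    scores0 = rng.normal(size=1000)      # held-out class-0 scores
    t = np_threshold(scores0)
    fresh = rng.normal(size=10 ** 6)     # future class-0 scores
    print(t, (fresh > t).mean())         # realized type I error, typically below alpha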


September 21: Wei-Min Shen (USC Information Sciences Institute)

Self-Reconfigurable “SuperBot” Robots and Surprise-Based Learning (SBL)

Self-reconfigurable robots/UAVs would autonomously adapt their logical or physical configurations and locomotion/manipulation capabilities (such as shape, size, and function) to the environments and tasks at hand. With modularity, versatility, self-healing ability, and low cost for mass production, such systems could provide powerful solutions for complex tasks in unstructured, uncertain, and dynamic environments. This talk first describes the major challenges for such research, including dynamic network topology, limited individual resources, lack of reliable global communication and synchronization, and constraints on centralized control. We then present a particular robotic system called “SuperBot” for self-assembly, self-healing, and self-reconfiguration, as well as coordinated precision locomotion and manipulation. Individual SuperBot units are modeled as “robotic stem cells” and use a biologically inspired mechanism called “digital hormones” to physically connect and disconnect with each other and collaborate via content-based messages without any globally unique identifiers. Featured demonstration movies will show their modular designs, distributed control, topology-triggered discovery and behaviors, and their potential applications. A non-parametric machine learning technology called “Surprise-Based Learning (SBL)” has been developed over 30 years to enable individual or global systems to autonomously learn from their own experiences in both deterministic and uncertain environments. Based on logic and probability, SBL can detect and rectify model deficiencies or anomalies, build and refine predictive models (such as POMDPs), and recover from local failures. We will discuss some details of SBL from theoretical and practical aspects, including developmental psychology, cognitive architecture, system reliability, detectability and learnability of surprises, model convergence, correctness, and scalability, with collected simulation and physical experimental results.


September 28:

Zachary Feinstein (Washington University Saint Louis)

Moving Scalarizations for Time Consistency in Dynamic Multivariate Problems in Finance

In this talk we will consider the time consistency problem for dynamic multivariate risk measures.  Such risk measures arise in markets with transaction costs and, separately, when studying systemic risk.  We will introduce the family of all market-consistent scalar multivariate risk measures.  Duality of such risk measures will be presented.  These results will be used to provide a recursive formulation of multivariate scalar risk measures.  We relate this recursive formulation to a notion of a “moving scalarization”, i.e. where the consistent risk-weights applied to the assets (in markets with transaction costs) or banks (in systemic risk) must change dynamically in time.


October 5: Jianfeng Zhang (USC)

Viscosity Solutions to Parabolic Master Equations 

The master equation is a type of PDE whose state variable involves the distribution of certain underlying state process, and thus is by nature an infinite dimensional problem.  It is a powerful tool for studying limit behavior of large interacting systems, including mean field games and systemic risk. It also appears naturally in stochastic control problems with partial information and in time inconsistent problems. In this talk we propose a novel notion of viscosity solution for parabolic master equations, arising mainly from control problems, and investigate its wellposedness. Our main innovation is to restrict the involved measures to certain set of semimartingale measures which satisfies the desired compactness.


October 12: no seminar


October 19: Matthew Junge (Department of Mathematics, Duke University)

Diffusion-Limited Annihilating Systems

We study a two-type annihilating system in which particles are placed with equal density on the integer lattice. Particles perform simple random walk and annihilate when they contact a particle of different type. The occupation probability of the origin was known to exhibit anomalous behavior in low-dimension when particles have equal speeds. Describing the setting with asymmetric speeds has been open for over 20 years. We prove a lower bound that matches physicists’ conjectures and discuss partial progress towards an upper bound. Joint with Michael Damron, Hanbaek Lyu, Tobias Johnson, and David Sivakoff.

October 26: Steven Heilman (USC)

Symmetric Sets with Minimal Gaussian Surface Area

It is well known that a Euclidean set with fixed Gaussian volume and minimal Gaussian surface area is a half-space.  On the other hand, finding the symmetric Euclidean set of fixed Gaussian volume and minimal Gaussian surface area has remained elusive since at least a work of Barthe from 2001.  We show that if this set or its complement is convex and if a certain integral of its curvature is not close to 1, then the set must be a round cylinder. That is, with a small range of exceptions, we resolve the convex case of Barthe’s conjecture. Time permitting, we will discuss applications to theoretical computer science and to finding Euclidean partitions of fixed Gaussian volume and minimal Gaussian surface area.


November 2: Aaditya Ramdas (Department of Statistics and Data Science, Carnegie Mellon University)

Exponential line-crossing inequalities

This paper develops a class of exponential bounds for the probability that a martingale sequence crosses a time-dependent linear threshold. Our key insight is that it is both natural and fruitful to formulate exponential concentration inequalities in this way. We illustrate this point by presenting a single assumption and a single theorem that together strengthen many tail bounds for martingales, including classical inequalities (1960-80) by Bernstein, Bennett, Hoeffding, and Freedman; contemporary inequalities (1980-2000) by Shorack and Wellner, Pinelis, Blackwell, van de Geer, and de la Pena; and several modern inequalities (post-2000) by Khan, Tropp, Bercu and Touati, Delyon, and others. In each of these cases, we give the strongest and most general statements to date, quantifying the time-uniform concentration of scalar, matrix, and Banach-space-valued martingales, under a variety of nonparametric assumptions in discrete and continuous time. In doing so, we bridge the gap between existing line-crossing inequalities, the sequential probability ratio test, the Cramer-Chernoff method, self-normalized processes, and other parts of the literature.

(joint work with Steve Howard, Jas Sekhon and Jon McAuliffe, preprint https://arxiv.org/abs/1808.03204)


November 9: Joe Neeman (Department of Mathematics, UT-Austin)

The Gaussian double bubble
The Gaussian isoperimetric inequality states that if we want to partition R^n into two sets with prescribed Gaussian measure while minimizing the Gaussian surface area of the interface between the sets, then the optimal partition is obtained by cutting R^n with a hyperplane. We prove an extension to more than two parts. For example, the optimal way to partition R^3 into three parts involves cutting along three rays that meet at 120-degree angles at a common point. This is the Gaussian analogue of the famous Double Bubble Theorem in Euclidean space, which describes the shape that results from blowing two soap bubbles and letting them stick together.

November 16: no seminar (USC mathematical finance conference)


Monday November 19 [Note the nonstandard day]: Cesar Cuenca (Department of Mathematics, MIT)

Interlacing Particle Systems and Special Functions
I will talk about the beta orbital corners process, a natural interpolation (on the dimension of the base field) of orbital measures from random matrix theory. The new result is the convergence in probability of finitely many levels of the orbital corners process to a Gaussian process. Our approach is based on the study of asymptotics of a class of special functions (called the multivariate Bessel functions) via explicit formulas.

November 23: no seminar (Thanksgiving)


Wednesday November 28 [Math colloquium]: Tim Austin (Department of Mathematics, UCLA)

The Weak Pinsker Property

This talk is about the structure theory of measure-preserving systems: transformations of a finite measure space that preserve the measure.  Many important examples arise from stationary processes in probability, and simplest among these are the i.i.d. processes.  In ergodic theory, i.i.d. processes are called Bernoulli shifts.  Some of the main results of ergodic theory concern an invariant of systems called their entropy, which turns out to be intimately related to the existence of maps from a general system to Bernoulli shifts.
I will give an overview of this area and its history, ending with a recent advance in this direction.  A measure-preserving system has the weak Pinsker property if it can be split, in a natural sense, into a direct product of a Bernoulli shift and a system of arbitrarily low entropy.  The recent result is that all ergodic measure-preserving systems have this property.
This talk will assume some familiarity with measure theory and the basic language of random variables and their distributions.  Past exposure to entropy or ergodic theory will not be essential.


Spring 2018:


January 12: Pierre-François Rodriguez (UCLA, Department of Mathematics)

On random walks and local limits in dimension 2

We will discuss recent developments that use Poissonian “loop soups” à la Lawler-Werner, Le Jan and Sznitman to study random walks on large two-dimensional tori. We will show how the underlying correspondence can be used to effectively describe certain critical phenomena in such systems, which occur when the walk is run up to suitable timescales while forced to avoid a given point.


January 19: Andrew Stuart (Caltech)

Large Graph Limits of Learning Algorithms

Many problems in machine learning require the classification of high dimensional data. One methodology to approach such problems is to construct a graph whose vertices are identified with data points, with edges weighted according to some measure of affinity between the data points. Algorithms such as spectral clustering, probit classification and the Bayesian level set method can all be applied in this setting. The goal of the talk is to describe these algorithms for classification, and analyze them in the limit of large data sets. Doing so leads to interesting problems in the calculus of variations, in stochastic partial differential equations and in Markov chain Monte Carlo, all of which will be highlighted in the talk. These limiting problems give insight into the structure of the classification problem and into algorithms for it.

The talk is based on collaboration with Matt Dunlop (Caltech), Dejan Slepcev (CMU) and Matt Thorpe (Cambridge).
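
As a rough illustration of the setup (a generic sketch, not the speakers' code; the Gaussian affinity, the bandwidth eps, and the SciPy calls are illustrative choices):

    import numpy as np
    from scipy.linalg import eigh
    from scipy.sparse.csgraph import laplacian

    def affinity_graph(X, eps=1.0):
        """Dense weight matrix W_ij = exp(-||x_i - x_j||^2 / eps) over data points X (n x d)."""
        sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        W = np.exp(-sq_dists / eps)
        np.fill_diagonal(W, 0.0)
        return W

    def spectral_embedding(W, k=2):
        """Coordinates for clustering: first k nontrivial eigenvectors of the graph Laplacian."""
        L = laplacian(W)
        _, vecs = eigh(L)          # eigenvalues in ascending order
        return vecs[:, 1:k + 1]    # skip the constant eigenvector

Spectral clustering then applies k-means to the rows of the embedding; the limits discussed in the talk concern the behavior of such objects as the number of data points grows.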


February 2: Jacob Bien (USC, Marshall School of Business)

Rare Feature Selection in High Dimensions

Many prediction problems include a large number of features that count the frequency of various events. In several common application areas, nearly all of these events are rare, leading to design matrices that are highly sparse. The challenge posed by such “rare features” has received very little attention despite its prevalence. We show, both theoretically and empirically, that not explicitly accounting for the rareness of features can greatly reduce the effectiveness of an analysis. We next propose a framework for aggregating rare features into denser features in a flexible manner that creates better predictors of the response. Our strategy leverages side information in the form of a tree that encodes feature similarity.
We apply our method to data from Trip Advisor, in which we predict the numerical rating of a hotel based on the text of the associated review. Our method achieves high accuracy by making effective use of rare words; by contrast, the lasso is unable to identify highly predictive words if they are too rare.
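
A toy version of the aggregation idea (the feature names, the fixed grouping, and the Poisson counts below are invented for illustration; the actual method selects the merge level in a data-driven way using the tree):

    import numpy as np

    # Two groups of sibling leaves in a toy similarity tree over review words.
    groups = {"comfort": ["cozy", "snug"], "dirt": ["dirty", "filthy"]}

    def aggregate(counts, groups):
        """Sum the count vectors of rare sibling features into denser group features."""
        return {g: sum(counts[f] for f in feats) for g, feats in groups.items()}

    rng = np.random.default_rng(0)
    counts = {w: rng.poisson(0.05, size=100) for w in ["cozy", "snug", "dirty", "filthy"]}
    dense = aggregate(counts, groups)   # each aggregated count is roughly twice as dense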

February 16: Timothy Cannings (USC, Marshall School of Business)

Local nearest neighbour classification with applications to semi-supervised learning

We derive a new asymptotic expansion for the global excess risk of a local $k$-nearest neighbour classifier, where the choice of $k$ may depend upon the test point.  We prove that, provided the $d$-dimensional marginal distribution of the features has a finite $\rho$th moment for some $\rho > 4$ (as well as other regularity conditions), a local choice of $k$ can yield a rate of convergence of the excess risk of $O(n^{-4/(d+4)})$, where $n$ is the sample size, whereas for the standard $k$-nearest neighbour classifier, our theory would require $d \geq 5$ and $\rho > 4d/(d-4)$ finite moments to achieve this rate.  Our results motivate a new $k$-nearest neighbour classifier for semi-supervised learning problems, where the unlabelled data are used to obtain an estimate of the marginal feature density, and fewer neighbours are used for classification when this density estimate is small.  The potential improvements over the standard $k$-nearest neighbour classifier are illustrated both through our theory and via a simulation study.
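
A minimal sketch of the semi-supervised idea (the bandwidth, the linear rule k(x) proportional to the density, and the scikit-learn calls are placeholder choices, not the paper's tuning):

    import numpy as np
    from sklearn.neighbors import KernelDensity, KNeighborsClassifier

    def local_knn_predict(X_lab, y_lab, X_unlab, X_test, k_max=50):
        """Estimate the marginal feature density from unlabelled data, then use
        fewer neighbours at test points where that density estimate is small."""
        kde = KernelDensity(bandwidth=0.5).fit(X_unlab)
        dens = np.exp(kde.score_samples(X_test))   # pointwise density estimates
        dens /= dens.max()
        preds = []
        for x, d in zip(X_test, dens):
            k = max(1, min(int(k_max * d), len(y_lab)))   # smaller k in sparse regions
            clf = KNeighborsClassifier(n_neighbors=k).fit(X_lab, y_lab)
            preds.append(clf.predict(x.reshape(1, -1))[0])
        return np.array(preds)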


February 23: Wen Sun (USC, Marshall School of Business)

A General Framework for Information Pooling in Large-Scale Multiple Testing

This talk discusses a general framework for exploiting the sparsity information in two-sample multiple testing problems. We propose to first construct a covariate sequence, in addition to the usual primary test statistics, to capture the sparsity structure, and then incorporate the auxiliary covariates in inference via a three-step algorithm consisting of grouping, adjusting and pooling (GAP). The GAP procedure provides a simple and effective framework for information pooling. An important advantage of GAP is its capability of handling various dependence structures such as those arising from multiple testing for high-dimensional linear regression, differential correlation analysis, and differential network analysis. We establish general conditions under which GAP is asymptotically valid for false discovery rate control, and show that these conditions are fulfilled in a range of applications. Numerical results demonstrate that existing methods can be much improved by the proposed framework. An application to a breast cancer study for identifying gene-gene interactions will be discussed if time permits.
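
A toy version of the grouping step only (ordinary Benjamini-Hochberg within covariate-defined groups; GAP's adjusting and pooling steps use data-driven weights that this sketch omits):

    import numpy as np
    from statsmodels.stats.multitest import multipletests

    def groupwise_bh(pvals, groups, alpha=0.05):
        """pvals: array of p-values; groups: array of auxiliary-covariate labels.
        Runs Benjamini-Hochberg separately within each group."""
        reject = np.zeros(len(pvals), dtype=bool)
        for g in np.unique(groups):
            idx = np.where(groups == g)[0]
            reject[idx] = multipletests(pvals[idx], alpha=alpha, method="fdr_bh")[0]
        return reject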


March 2: Akihiko Nishimura (UCLA)

Discontinuous Hamiltonian Monte Carlo for models with discrete parameters and discontinuous likelihoods

Hamiltonian Monte Carlo (HMC) is a powerful sampling algorithm employed by several probabilistic programming languages. Its fully automatic implementations have made HMC a standard tool for applied Bayesian modeling. While its performance is often superior to alternatives under a wide range of models, one prominent weakness of HMC is its inability to handle discrete parameters. In this talk, I present discontinuous HMC, an extension that can efficiently explore discrete spaces involving ordinal parameters as well as target distributions with discontinuous densities. The proposed algorithm is based on two key ideas: embedding of discrete parameters into a continuous space and simulation of Hamiltonian dynamics on a piecewise smooth density function. The latter idea has been explored under special cases in the literature, but the extensions introduced here are critical in turning the idea into a general and practical sampling algorithm. When properly tuned, discontinuous HMC is guaranteed to outperform a Metropolis-within-Gibbs algorithm as the two algorithms coincide under a specific (and sub-optimal) implementation of discontinuous HMC. We apply our algorithm to challenging posterior inference problems to demonstrate its wide applicability and competitive performance.
To make the talk more accessible, I will begin with a review of the essential ideas and notions behind HMC. A brief review of Bayesian inference and Markov chain Monte Carlo will also be provided.
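
For readers wanting the background the last paragraph refers to: standard (smooth) HMC augments the target \pi(\theta) with a momentum p and simulates the Hamiltonian dynamics by leapfrog steps (the discontinuous variant modifies the momentum distribution and the integrator near discontinuities):

    H(\theta, p) = U(\theta) + \tfrac{1}{2}\|p\|^2, \qquad U(\theta) = -\log \pi(\theta);
    p \leftarrow p - \tfrac{\epsilon}{2} \nabla U(\theta), \qquad
    \theta \leftarrow \theta + \epsilon\, p, \qquad
    p \leftarrow p - \tfrac{\epsilon}{2} \nabla U(\theta).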

March 23: Joseph Salmon (TELECOM ParisTech)

Generalized Concomitant Multi-Task Lasso for sparse multimodal regression

For standard Lasso theory to hold, the regularization parameter should be proportional to the noise level, which is generally unknown in practice. A remedy is to consider estimators, such as the Concomitant Lasso, which jointly optimize over the regression coefficients and the noise level. However, when data from different sources are pooled to increase sample size, or when dealing with multimodal data, noise levels differ and new dedicated estimators are needed. We provide new statistical and computational solutions to perform heteroscedastic regression, with an emphasis on functional brain imaging with magneto- and electroencephalographic (M/EEG) signals. This joint work with M. Massias, O. Fercoq and A. Gramfort is to appear in AISTATS 2018. Preprint <https://arxiv.org/abs/1705.09778>. Python code <https://github.com/mathurinm/SHCL>.
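
For concreteness, one standard formulation of the Concomitant Lasso mentioned above is the joint problem

    \min_{\beta \in \mathbb{R}^p,\ \sigma > 0} \; \frac{\|y - X\beta\|_2^2}{2 n \sigma} + \frac{\sigma}{2} + \lambda \|\beta\|_1,

whose penalty level \lambda can be chosen free of the unknown noise scale.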


March 30: Yaming Yu (UC Irvine, Department of Statistics)

Successive sampling, monotonicity, and Bayesian bandit problems

Stochastic orders appear naturally in many problems in statistics and applied probability and can be used to derive useful inequalities. We discuss monotonicity in classical limit theorems, structural results in Bayesian bandit problems, and comparisons between successive sampling and conditional Poisson sampling in sample surveys.
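
For reference, the most basic stochastic order referred to here is the usual one:

    X \le_{\mathrm{st}} Y \;\Longleftrightarrow\; \mathbb{P}(X > t) \le \mathbb{P}(Y > t) \ \text{for all } t \;\Longleftrightarrow\; \mathbb{E} f(X) \le \mathbb{E} f(Y) \ \text{for all bounded increasing } f.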


April 6: Leila Setayeshgar (Providence College)

Large Deviations for a Class of Stochastic Semilinear Partial Differential Equations

Standard approaches to large deviations analysis for stochastic partial differential equations (SPDEs) are often based on approximations. These approximations are mostly technical and often onerous to carry out. In 2008, Budhiraja, Dupuis and Maroulas employed the weak convergence approach and showed that these approximations can be avoided for many infinite-dimensional models. Large deviations analysis for such systems instead relies on demonstrating existence, uniqueness and tightness properties of certain perturbations of the original process. In this talk, we use the weak convergence approach and establish the large deviation principle for the law of the solutions to a class of semilinear SPDEs. Our family of semilinear SPDEs contains, as special cases, both the stochastic Burgers’ equation and the stochastic reaction-diffusion equation.
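
For reference, a family (X^\epsilon) satisfies a large deviation principle with rate function I if, for every Borel set A,

    -\inf_{x \in A^\circ} I(x) \;\le\; \liminf_{\epsilon \to 0} \epsilon \log \mathbb{P}(X^\epsilon \in A) \;\le\; \limsup_{\epsilon \to 0} \epsilon \log \mathbb{P}(X^\epsilon \in A) \;\le\; -\inf_{x \in \overline{A}} I(x).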


April 13: Henry Schellhorn (Claremont Graduate University, Department of Mathematics)

Density formula for functionals of compound Poisson processes using Malliavin calculus

We extend the work of Nourdin and Viens (Electronic Journal of Probability 2009) to obtain a new exact formula for the density of the law of a random variable Z, which is measurable and differentiable with respect to a compound Poisson process. The main restriction is that the Lévy measure must charge the whole real line.

This talk is work in progress, and not all results have been exploited. For this reason, time will probably allow me to describe another project.


April 20: Guido Montufar (UCLA, Department of Mathematics)

Mixtures and Products in Two Graphical Models

This talk is about two graphical models with hidden variables. One is a mixture model and the other is a product of mixtures called restricted Boltzmann machine. We derive relations between theoretical properties of restricted Boltzmann machines and several natural notions from discrete mathematics and convex geometry. We take a closer look at the first non-trivial case, with three observed binary variables. Although the mixture and the product of mixtures look different from their parametrizations, we show that in this case they represent the same set of distributions on the interior of the probability simplex, and are equal up to closure. We give a semi-algebraic description of this model in terms of six binomial inequalities and obtain closed form expressions for the maximum likelihood estimates.
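
For reference, the restricted Boltzmann machine in question has the standard parametrization (with v the visible and h the hidden binary variables)

    p(v) \;=\; \frac{1}{Z} \sum_{h \in \{0,1\}^m} \exp\big( v^\top W h + b^\top v + c^\top h \big)
          \;=\; \frac{e^{b^\top v}}{Z} \prod_{j=1}^{m} \Big( 1 + e^{\,c_j + (W^\top v)_j} \Big),

which exhibits it as a product of m two-component mixtures.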

This talk is based on joint work with Jason Morton and Anna Seigal.


April 27: Leonard Wong (USC, Department of Mathematics)

Information geometry and optimal transport

Information geometry studies spaces of probability distributions using differential geometry. On the other hand, optimal transport is about finding efficient methods of transporting one distribution to another. Both fields apply geometric ideas in probability and have found numerous applications in statistics and machine learning. In this talk we explain some new connections between the two fields. We study a family of logarithmic divergences (distance-like quantities) which generalizes the Bregman divergence (of which the relative entropy is a prime example). These divergences have a dual structure in terms of an optimal transport map, satisfy a generalized Pythagorean theorem, and correspond to natural extensions of the exponential family. Geometrically, they characterize (locally) statistical manifolds of constant curvature.
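
For reference, the Bregman divergence generated by a convex function \varphi is

    D_\varphi(x, y) \;=\; \varphi(x) - \varphi(y) - \langle \nabla \varphi(y),\, x - y \rangle,

and \varphi(x) = \sum_i x_i \log x_i on the probability simplex recovers the relative entropy (Kullback-Leibler divergence).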