Probability and Statistics Seminar 2023-2024
NEW [May 2021]: Our YOUTUBE CHANNEL contains videos of some recent talks.
Fridays 3:30pm in KAP 414; tea is usually provided at 3:00pm
NEW [Aug 2023]: To sign up for our MAILING LIST, email: sympa@mymaillists.usc.edu with subject: Subscribe probstat-l Firstname Lastname [besides the subject line, the email itself should be empty]
Organizers: Xiaohui Chen <xiaohuic@usc.edu> Evgeni Dimitrov <edimitro@usc.edu> Steven Heilman <stevenmheilman@gmail.com> Stanislav Minsker <minsker@usc.edu>
For seminars/colloquia on other topics, see the Department of Mathematics webpage.
Older seminars: Fall 2018, Spring 2018, Fall 2017, Spring 2017, Fall 2016, Spring 2016, Fall 2015, Spring 2015, Fall 2014, 2013-2014
Spring 2025:
February 21: Lijun Ding (UCSD)
Fall 2024:
August 30: Yiyun He (UC Irvine)
Differentially Private Algorithms for Synthetic Data
September 6: Omer Tamuz (Caltech)
Asymptotic Renyi Entropies of Random Walks on Groups
We introduce asymptotic Renyi entropies as a parameterized family of invariants for random walks on groups. These invariants interpolate between various well-studied properties of the random walk, including the growth rate of the group, the Shannon entropy, and the spectral radius. They furthermore offer large deviation counterparts of the Shannon-McMillan-Breiman Theorem. We prove some basic properties of asymptotic Renyi entropies that apply to all groups, and discuss their analyticity and positivity for the free group and lamplighter groups.
Joint with Kimberly Golubeva and Minghao Pan.
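A small Monte Carlo sketch (not from the talk; walk length and trial count are illustrative): the Rényi-2 (collision) entropy rate of the simple random walk on the free group F_2 can be estimated from the identity that the collision probability of two independent n-step walks equals the return probability of a 2n-step walk.

```python
import numpy as np

rng = np.random.default_rng(0)
GENS = [1, -1, 2, -2]  # generators a, a^-1, b, b^-1 of the free group F_2

def reduced_word(steps, rng):
    """Simple random walk on F_2, tracked as a reduced word."""
    word = []
    for _ in range(steps):
        g = GENS[rng.integers(4)]
        if word and word[-1] == -g:
            word.pop()       # the new letter cancels the previous one
        else:
            word.append(g)
    return word

# Collision probability of two independent n-step walks equals the return
# probability of a 2n-step walk, and the Renyi-2 rate is -(1/n) log P(collision).
n, trials = 10, 100_000
returns = sum(len(reduced_word(2 * n, rng)) == 0 for _ in range(trials))
print("estimated Renyi-2 rate:", -np.log(returns / trials) / n)
# The limit is -2*log(sqrt(3)/2) ~ 0.288 (from the spectral radius of SRW
# on F_2), up to polynomial corrections at finite n.
```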
September 20: Weixin Yao (UC Riverside)
October 4: Richard Y. Zhang (UIUC)
Rank Overparameterization and Global Optimality Certification for Large-Scale Low-rank Optimization
Numerous important problems across applied statistics reduce to nonconvex estimation/optimization over a low-rank matrix. In principle, these can be reliably solved to global optimality via convex relaxation, but the computational costs can become prohibitive at large scale. In practice, it is much more common to optimize over the low-rank matrices directly, as in the Burer-Monteiro approach, but their nonconvexity can cause failure by getting stuck at a spurious local minimum. For safety-critical societal applications, such as the operation and planning of an electricity grid, our inability to reliably achieve global optimality can have significant real-world consequences.
In the first part of this talk, we investigate how overparameterizing the low-rank factorization can render its nonconvexity increasingly benign. In the smooth and strongly-convex setting, we rigorously show that, as the rank is increased, spurious local minima become increasingly rare in a step-wise fashion. In other words, rank-2 has fewer spurious local minima than rank-1, and rank-3 has fewer than rank-2, etc. Once the rank exceeds an O(1) threshold, every remaining local minimum is a global minimum, and every saddle point can be escaped. In the second part of this talk, we use the rank deficiency brought on by rank overparameterization to certify convergence to global optimality after the fact. The certification is an a posteriori guarantee that is valid under much weaker assumptions than typical “no spurious local minima” guarantees. However, rank deficiency significantly slows down the convergence of gradient descent, from a linear rate to a sublinear rate. We propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case.
Main related papers:
https://arxiv.org/abs/2207.01789
https://arxiv.org/abs/2206.03345 (joint work with Gavin Zhang and Salar Fattahi)
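A minimal numerical sketch of the Burer-Monteiro idea discussed above (not the authors' code; sizes, step size, and iteration counts are illustrative): factorized gradient descent on ||UU^T - M||_F^2 with the search rank r swept past the true rank, printing the fit error and the singular values of the converged factor, whose approximate rank deficiency is the kind of a posteriori signal used for certification.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r_star = 30, 2
U_star = rng.standard_normal((d, r_star)) / np.sqrt(d)
M = U_star @ U_star.T            # ground-truth PSD matrix of rank r_star

def burer_monteiro(M, r, steps=3000, lr=0.05):
    """Gradient descent on f(U) = ||U U^T - M||_F^2 over d x r factors U."""
    U = 0.1 * rng.standard_normal((M.shape[0], r))
    for _ in range(steps):
        U -= lr * 4 * (U @ U.T - M) @ U   # grad f(U) = 4 (UU^T - M) U
    return U

for r in (1, 2, 4):                       # under-, exactly-, over-parameterized
    U = burer_monteiro(M, r)
    err = np.linalg.norm(U @ U.T - M)
    svals = np.linalg.svd(U, compute_uv=False)
    print(f"r={r}: fit error {err:.1e}, factor singular values {np.round(svals, 3)}")
```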
October 11: [no classes]
October 18: Camilo Hernández (USC ISE)
The mean field Schrödinger problem: a mean field control perspective
The mean field Schrödinger problem (MFSP) is the problem of finding the most likely path of a McKean-Vlasov type particle with constrained initial and final configurations. It was first introduced by Backhoff et al. (2020), who studied its existence and long-time behavior. This talk aims to show how ideas from mean field control theory allow us to derive new interesting results on the MFSP. In particular, we study its existence, characterization, and the so-called convergence problem. The method rests upon studying suitably penalized problems and stochastic control techniques. This talk is based on a joint work with Ludovic Tangpi (Princeton).
October 25: Yizhe Zhu (USC)
Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity
For the problem of reconstructing a low-rank matrix from a few linear measurements, two classes of algorithms have been widely studied in the literature: convex approaches based on nuclear norm minimization, and non-convex approaches that use factorized gradient descent. Under certain statistical model assumptions, it is known that nuclear norm minimization recovers the ground truth as soon as the number of samples scales linearly with the number of degrees of freedom of the ground truth. In contrast, while non-convex approaches are computationally less expensive, existing recovery guarantees assume that the number of samples scales at least quadratically with the rank. In this talk, we consider the problem of reconstructing a positive semidefinite matrix from a few Gaussian measurements. We improve the previous rank-dependence in the sample complexity of non-convex matrix factorization from quadratic to linear. Our proof relies on a probabilistic decoupling argument, where we show that the gradient descent iterates are only weakly dependent on the individual entries of the measurement matrices. Joint work with Dominik Stöger (KU Eichstätt-Ingolstadt).
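A hedged sketch of the non-convex approach described above (illustrative problem sizes, not the paper's experiments): factorized gradient descent for recovering a PSD matrix from Gaussian measurements.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 20, 2
m = 8 * d * r                              # illustrative number of measurements
U_star = rng.standard_normal((d, r)) / np.sqrt(d)
X_star = U_star @ U_star.T

A = rng.standard_normal((m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2         # symmetrized Gaussian sensing matrices
y = np.einsum('kij,ij->k', A, X_star)      # y_k = <A_k, X_star>

U = 0.01 * rng.standard_normal((d, r))     # small random initialization
for _ in range(4000):
    resid = np.einsum('kij,ij->k', A, U @ U.T) - y
    grad = 2 / m * np.einsum('k,kij->ij', resid, A) @ U
    U -= 0.1 * grad
print("relative error:",
      np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star))
```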
November 1: Wojciech Ozanski (Florida State University)
[Joint with PDE seminar]
Instantaneous continuous loss of regularity for the SQG equation
November 8: Johannes Wiesel (CMU)
[Joint with Mathematical Finance seminar]
Bounding adapted Wasserstein metrics
The Wasserstein distance $\mathcal{W}_p$ is an important instance of an optimal transport cost. Its numerous mathematical properties as well as applications to various fields such as mathematical finance and statistics have been well studied in recent years. The adapted Wasserstein distance $\mathcal{A}\mathcal{W}_p$ extends this theory to laws of discrete time stochastic processes in their natural filtrations, making it particularly well suited for analyzing time-dependent stochastic optimization problems.
While the topological differences between $\mathcal{A}\mathcal{W}_p$ and $\mathcal{W}_p$ are well understood, their differences as metrics remain largely unexplored beyond the trivial bound $\mathcal{W}_p\lesssim \mathcal{A}\mathcal{W}_p$. This paper closes this gap by providing upper bounds of $\mathcal{A}\mathcal{W}_p$ in terms of $\mathcal{W}_p$ through investigation of the smooth adapted Wasserstein distance. Our upper bounds are explicit and are given by a sum of $\mathcal{W}_p$, Eder’s modulus of continuity and a term characterizing the tail behavior of measures. As a consequence, upper bounds on $\mathcal{W}_p$ automatically hold for $\mathcal{AW}_p$ under mild regularity assumptions on the measures considered. A particular instance of our findings is the inequality $\mathcal{A}\mathcal{W}_1\le C\sqrt{\mathcal{W}_1}$ on the set of measures that have Lipschitz kernels.
Our work also reveals how smoothing of measures affects the adapted weak topology. In fact, we find that the topology induced by the smooth adapted Wasserstein distance exhibits a non-trivial interpolation property, which we characterize explicitly: it lies in between the adapted weak topology and the weak topology, and the inclusion is governed by the decay of the smoothing parameter.
This talk is based on joint work with Jose Blanchet, Martin Larsson and Jonghwa Park.
November 15: Greta Panova (USC)
Algebra meets probability: permutons from pipe dreams via integrable probability
November 22: Gil Kur (ETH Zürich)
Note the non-standard time and location: 1pm in KAP 265
Minimum Norm Interpolation Meets The Local Theory of Banach Spaces
Minimum-norm interpolators have recently gained attention as an analyzable model to shed light on the double descent phenomenon observed for neural networks. Most of the work has focused on analyzing interpolators in Hilbert spaces, where, typically, an effectively low-rank structure of the feature covariance prevents a large bias. This work takes a first step towards establishing a general framework that connects generalization properties of the interpolators to well-known concepts from high-dimensional geometry, specifically, from the local theory of Banach spaces.
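A toy sketch of minimum-norm interpolation (a generic double-descent illustration with Gaussian features, not the speaker's Banach-space setting): the pseudoinverse gives the least-squares solution for p < n and the minimum-norm interpolator for p > n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_true = 100, 10                          # training size, signal dimension
beta = rng.standard_normal(d_true)

def test_mse(p, n_test=2000):
    """Test error of the min-norm interpolator using the first p features."""
    Z = rng.standard_normal((n + n_test, max(p, d_true)))
    y = Z[:, :d_true] @ beta + 0.5 * rng.standard_normal(n + n_test)
    beta_hat = np.linalg.pinv(Z[:n, :p]) @ y[:n]   # min-norm solution
    return np.mean((Z[n:, :p] @ beta_hat - y[n:]) ** 2)

for p in (10, 50, 90, 100, 110, 200, 1000):
    print(f"p={p:5d}  test MSE={test_mse(p):.2f}")
# The error typically spikes near the interpolation threshold p = n and
# comes back down as p grows past it (double descent).
```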
November 29: [no classes]
December 6: Elliot Paquette (McGill)
Spring 2024:
[CANCELLED] January 12: Roman Vershynin (UC Irvine) [Undergraduate lecture]
January 19: Grigory Franguridi, 2PM, (USC Center for Economic and Social Research)
Estimation of panels with attrition and refreshment samples
January 26: Mahdi Soltanolkotabi (USC)
Foundations for feature learning via gradient descent
One of the key mysteries in modern learning is that a variety of models such as deep neural networks when trained via (stochastic) gradient descent can extract useful features and learn high quality representations directly from data simultaneously with fitting the labels. This feature learning capability is also at the forefront of the recent success of a variety of contemporary paradigms such as transformer architectures, self-supervised and transfer learning. Despite a flurry of exciting activity over the past few years, existing theoretical results are often too crude and/or pessimistic to explain feature/representation learning in practical regimes of operation or serve as a guiding principle for practitioners. Indeed, existing literature often requires unrealistic hyperparameter choices (e.g. very small step sizes, large initialization or wide models). In this talk I will focus on demystifying this feature/representation learning phenomenon for a variety of problems spanning single index models, low-rank factorization, matrix reconstruction, and neural networks. Our results are based on an intriguing spectral bias phenomenon for gradient descent that puts the iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well by simultaneously finding good features/representations of the data while fitting to the labels. The proofs combine ideas from high-dimensional probability/statistics, optimization and nonlinear control to develop a precise analysis of model generalization along the trajectory of gradient descent. Time permitting, I will explain the implications of these theoretical results for more contemporary use cases including transfer learning, self-attention, prompt-tuning via transformers and simple self-supervised learning settings.
February 2: Timothy Armstrong (USC)
Robust Estimation and Inference in Panels with Interactive Fixed Effects
We consider estimation and inference for a regression coefficient in panels with interactive fixed effects (i.e., with a factor structure). We show that previously developed estimators and confidence intervals (CIs) might be heavily biased and size-distorted when some of the factors are weak. We propose estimators with improved rates of convergence and bias-aware CIs that are uniformly valid regardless of whether the factors are strong or not. Our approach applies the theory of minimax linear estimation to form a debiased estimate using a nuclear norm bound on the error of an initial estimate of the interactive fixed effects. We use the obtained estimate to construct a bias-aware CI taking into account the remaining bias due to weak factors. In Monte Carlo experiments, we find a substantial improvement over conventional approaches when factors are weak, with little cost to estimation error when factors are strong.
Paper link: Timothy B. Armstrong, Martin Weidner, Andrei Zeleneev. https://arxiv.org/abs/2210.06639
February 9: Yuehao Bai (USC)
On the Efficiency of Finely Stratified Experiments
This paper studies the efficient estimation of a large class of treatment effect parameters that arise in the analysis of experiments. Here, efficiency is understood to be with respect to a broad class of treatment assignment schemes for which the marginal probability that any unit is assigned to treatment equals a pre-specified value, e.g., one half. Importantly, we do not require that treatment status is assigned in an i.i.d. fashion, thereby accommodating complicated treatment assignment schemes that are used in practice, such as stratified block randomization and matched pairs. The class of parameters considered consists of those that can be expressed as the solution to a restriction on the expectation of a known function of the observed data, including possibly the pre-specified value for the marginal probability of treatment assignment. We show that this class of parameters includes, among other things, average treatment effects, quantile treatment effects, local average treatment effects as well as the counterparts to these quantities in experiments in which the unit is itself a cluster. In this setting, we establish two results. First, we derive a lower bound on the asymptotic variance of estimators of the parameter of interest in the form of a convolution theorem. Second, we show that the naïve method of moments estimator achieves this bound on the asymptotic variance quite generally if treatment is assigned using a “finely stratified” design. By a “finely stratified” design, we mean experiments in which units are divided into groups of a fixed size and a proportion within each group is assigned to treatment uniformly at random so that it respects the restriction on the marginal probability of treatment assignment. In this sense, “finely stratified” experiments lead to efficient estimators of treatment effect parameters “by design” rather than through ex post covariate adjustment.
Paper link: Yuehao Bai, Jizhou Liu, Azeem M. Shaikh, Max Tabord-Meehan
https://arxiv.org/abs/2307.15181
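A quick simulation of a "finely stratified" design with groups of size two, i.e., matched pairs (the data-generating process below is illustrative, not from the paper): each unit has marginal treatment probability one half, and the simple difference-in-means estimator is computed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 1000, 1.0                          # sample size, true treatment effect

x = rng.uniform(size=n)                     # covariate driving the outcome
y0 = 2 * x + 0.3 * rng.standard_normal(n)   # potential outcome, control
y1 = y0 + tau                               # potential outcome, treated

# Matched pairs: sort on the covariate, pair adjacent units,
# and treat exactly one unit per pair uniformly at random.
treat = np.zeros(n, dtype=bool)
for pair in np.argsort(x).reshape(-1, 2):
    treat[rng.permutation(pair)[0]] = True

y_obs = np.where(treat, y1, y0)
print("difference in means:", y_obs[treat].mean() - y_obs[~treat].mean())
# Repeating this and comparing with i.i.d. Bernoulli(1/2) assignment shows
# the variance reduction that the convolution-theorem bound formalizes.
```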
February 23: Tryphon Georgiou (UC Irvine)
Stochastic Control meets Non-equilibrium Thermodynamics: Fundamental limits of power generation in thermodynamic engines
Thermodynamics was born in the 19th century in quest of a way to quantify efficiency of steam engines at the dawn of the industrial age. In the time since, thermodynamics has impacted virtually all other areas in science, from chemistry and biology to the physics of black holes, and yet, progress beyond the classical quasi-static limit towards finite-time thermodynamic transitions has been slow; finite-time is of essence for non-vanishing power generation. In recent years a deeper understanding of non-equilibrium processes has been achieved based on stochastic models with degrees of freedom (state variables) that are subject to Brownian excitation that models heat baths. Within this framework we will explain energy transduction, we will give insights on how anisotropy in thermal or chemical potentials can be tapped for power generation in engineered and physical processes, and we will highlight fundamental bounds on the amount of power that can be drawn during finite-time thermodynamic transitions.
The talk is based on joint works with Rui Fu (UCI), Olga Movilla (UCI), Amir Taghvaei (UCI) and Yongxin Chen (GaTech). Research funding by AFOSR, ARO and NSF is gratefully acknowledged.
Monday February 26, 3:30PM, KAP 427: Yongtao Guan (Chinese University of Hong Kong, Shenzhen)
Group Network Hawkes Process
In this work, we study the event occurrences of individuals interacting in a network. To characterize the dynamic interactions among the individuals, we propose a group network Hawkes process (GNHP) model whose network structure is observed and fixed. In particular, we introduce a latent group structure among individuals to account for the heterogeneous user-specific characteristics. A maximum likelihood approach is proposed to simultaneously cluster individuals in the network and estimate model parameters. A fast EM algorithm is subsequently developed by utilizing the branching representation of the proposed GNHP model. Theoretical properties of the resulting estimators of group memberships and model parameters are investigated under both settings when the number of latent groups G is over-specified or correctly specified. A data-driven criterion that can consistently identify the true G under mild conditions is derived. Extensive simulation studies and an application to a data set collected from Sina Weibo are used to illustrate the effectiveness of the proposed methodology.
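A minimal sketch of simulating a single univariate Hawkes process by Ogata's thinning, assuming the usual exponential kernel (all parameters illustrative); the GNHP model of the talk layers a network and latent group structure on top of this basic mechanism.

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, T, rng):
    """Ogata thinning for intensity lambda(t) = mu + sum alpha*exp(-beta*(t-t_i))."""
    events, t = [], 0.0
    while True:
        # Intensity decays between events, so its value at the current time
        # upper-bounds it until the next accepted event.
        lam_bar = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        t += rng.exponential(1.0 / lam_bar)      # candidate event time
        if t >= T:
            return np.array(events)
        lam_t = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        if rng.uniform() <= lam_t / lam_bar:     # accept w.p. lambda(t)/lam_bar
            events.append(t)

rng = np.random.default_rng(0)
ev = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, T=100.0, rng=rng)
print(len(ev), "events; branching ratio =", 0.8 / 1.2)   # < 1: subcritical
```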
March 1: Morris Yau (MIT)
Are Neural Networks Optimal Approximation Algorithms?
In this talk, we discuss the power of neural networks to compute solutions to NP-hard optimization problems focusing on the class of constraint satisfaction problems (boolean sat, Sudoku, etc.). We find there is a graph neural network architecture (OptGNN) that captures the optimal approximation algorithm for constraint satisfaction, up to complexity theoretic assumptions via tools in semidefinite programming. Furthermore, OptGNN can act as a convex program solver and hence output a dual (a bound) on the optimality of a combinatorial problem. Evaluating OptGNN on benchmarks in the neural combinatorial optimization literature, we find our approach is competitive with the state of the art unsupervised neural baselines. We discuss further connections between neural networks and computation, and point to directions for future work.
March 8: Annie Qu (UC Irvine)
A Model-Agnostic Graph Neural Network for Integrating Local and Global Information
Graph neural networks (GNNs) have achieved promising performance in a variety of graph focused tasks. Despite their success, the two major limitations of existing GNNs are the capability of learning various-order representations and providing interpretability of such deep learning-based black-box models. To tackle these issues, we propose a novel Model-agnostic Graph Neural Network (MaGNet) framework. The proposed framework is able to extract knowledge from high-order neighbors, sequentially integrates information of various orders, and offers explanations for the learned model by identifying influential compact graph structures. In particular, MaGNet consists of two components: an estimation model for the latent representation of complex relationships under graph topology, and an interpretation model that identifies influential nodes, edges, and important node features. Theoretically, we establish the generalization error bound for MaGNet via empirical Rademacher complexity and showcase its power to represent the layer-wise neighborhood mixing. We conduct comprehensive numerical studies using both simulated data and a real-world case study on investigating the neural mechanisms of the rat hippocampus, demonstrating that the performance of MaGNet is competitive with state-of-the-art methods.
March 15: no talk [spring break]
March 22: Xiaowu Dai (UCLA)
Kernel ordinary differential equations
The ordinary differential equation (ODE) is widely used in modelling biological and physical processes in science. A new reproducing kernel-based approach is proposed for the estimation and inference of ODE given noisy observations. The functional forms in ODE are not assumed to be known or restricted to be linear or additive, and pairwise interactions are allowed. Sparse estimation is performed to select individual functionals and construct confidence intervals for the estimated signal trajectories. The estimation optimality and selection consistency of kernel ODE are established under both the low-dimensional and high-dimensional settings, where the number of unknown functionals can be smaller or larger than the sample size. The proposal tackles several important problems that are not yet fully addressed in the smoothing spline analysis of variance (SS-ANOVA) framework, and extends the existing methods of dynamic causal modeling.
March 29: Kengo Kato (Cornell)
Entropic optimal transport: limit theorems and algorithms
In this talk, I will discuss my recent work on entropic optimal transport (EOT). In the first part, I will discuss limit theorems for EOT maps, dual potentials, and the Sinkhorn divergence. The key technical tool we use is a first and second-order Hadamard differentiability analysis of EOT potentials with respect to the marginals, from which the limit theorems, bootstrap consistency, and asymptotic efficiency of the empirical estimators follow. The second part concerns the entropic Gromov-Wasserstein (EGW) distance, which serves as a computationally efficient proxy for the Gromov-Wasserstein distance. By leveraging a variational representation that ties the EGW problem with a series of EOT problems, we derive stability results of EGW with respect to the auxiliary matrix, which enables us to develop efficient algorithms for solving the EGW problem. This talk is based on joint work with Ziv Goldfeld, Gabriel Rioux, and Ritwik Sadhu.
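For concreteness, a minimal Sinkhorn sketch for the EOT problem underlying the talk (sizes and the regularization parameter are illustrative):

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.05, iters=500):
    """Sinkhorn matrix scaling for entropic OT; returns the optimal coupling."""
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)        # alternate the two marginal constraints
        u = mu / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
x, y = rng.uniform(size=8), rng.uniform(size=8)
mu = nu = np.full(8, 1 / 8)                      # uniform marginals
C = (x[:, None] - y[None, :]) ** 2               # squared-distance cost
P = sinkhorn(mu, nu, C)
print("entropic OT cost:", float(np.sum(P * C)))
```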
April 5: Weixuan Xia (USC)
Set-Valued Stochastic Integrals and Convoluted Lévy Processes
In this talk, I will discuss set-valued Volterra-type stochastic integrals driven by Lévy processes. I will explain the definition of set-valued convoluted stochastic integrals by taking the closed decomposable hull of integral functionals over time, thereby extending classical definitions to convoluted integrals with square-integrable kernels. Two key insights include: (1) Aside from established results for set-valued Itô integrals, while set-valued stochastic integrals with respect to a finite-variation Poisson random measure are guaranteed to be integrally bounded for bounded integrands, this is not true when the random measure represents infinite variation; (2) It is a mutual effect of kernel singularity and jumps that the set-valued convoluted integrals are possibly explosive and can take extended vector values. These findings carry significant implications for the construction of set-valued fractional dynamical systems. Additionally, I will explore two classes of set-monotone processes of practical interest in economic and financial modeling.
April 12: Yosi Rinott (The Hebrew University of Jerusalem)
On the behavior of posterior probabilities with additional data: monotonicity and nonmonotonicity, asymptotic rates, log-concavity, and Turán’s inequality
Given a parametric model, a prior, and data, Bayesian statisticians quantify their belief that the true parameter is ϑ0 by its posterior probability. The starting question of this paper is whether the posterior at ϑ0 increases when the data are generated under ϑ0, and how it behaves when the data come from ϑ ≠ ϑ0. Can it decrease and then increase, and thus additional data may mislead Bayesian statisticians?
For data arriving sequentially, we consider monotonicity properties of the posterior probabilities as a function of the sample size with respect to certain stochastic orders, specifically starting with likelihood ratio dominance.
When the data is generated by ϑ ≠ ϑ0, Doob’s consistency theorem says that the posterior at ϑ0 converges a.s. to zero and therefore its expectation converges to zero. We obtain precise asymptotic rates of the latter convergence for observations from an exponential family and show that the expectation of the ϑ0-posterior under ϑ ≠ ϑ0 is eventually strictly decreasing. Finally, we show that in a number of interesting cases this expectation is a log-concave function of the sample size, and thus unimodal. In the Bernoulli case we obtain this result by developing an inequality that is related to Turán’s inequality for Legendre polynomials.
The talk is based on a joint work with Sergiu Hart.
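A tiny numerical companion (a two-point model chosen here for illustration, not from the paper): for Bernoulli data the expected posterior at ϑ0 under a misspecified truth can be computed exactly and watched as the sample size grows.

```python
import numpy as np
from scipy.stats import binom

theta0, theta1, theta_true = 0.5, 0.6, 0.6   # two-point model, uniform prior

def expected_posterior_at_theta0(n):
    """E[ P(theta0 | data) ] with n Bernoulli(theta_true) observations."""
    k = np.arange(n + 1)
    lik0, lik1 = binom.pmf(k, n, theta0), binom.pmf(k, n, theta1)
    post0 = lik0 / (lik0 + lik1)             # the uniform prior cancels
    return float(np.sum(binom.pmf(k, n, theta_true) * post0))

for n in (1, 2, 5, 10, 50, 200, 1000):
    print(n, round(expected_posterior_at_theta0(n), 4))
# By Doob's theorem the values tend to zero; the talk gives the precise
# rate and conditions for eventual monotonicity and log-concavity.
```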
April 19: Andrew Holbrook (UCLA, Department of Biostatistics)
Quantum Markov chain Monte Carlo(s)
I discuss how one can use quantum circuits to accelerate multiproposal MCMC and point to promising avenues of future research, including quantum HMC, quantum-accelerated nonreversible MCMC and quantum-accelerated locally-balanced MCMC.
April 26: Sui Tang (UCSB)
Learning interaction kernels in particle and agent based systems
Interacting particle systems showcase a variety of collective behaviors and are fundamental to many scientific and engineering domains, such as the flocking of birds and the milling of fish. These systems are typically modeled using differential equations to elucidate how individual behaviors drive collective dynamics, an essential inquiry across multiple disciplines. Although recent theoretical and numerical studies have successfully replicated many qualitative collective patterns seen in nature, there is still a notable deficiency in quantitatively matching these models with empirical data.
We explore the data-driven discovery of latent interaction kernels from observed trajectory data in particle and agent-based systems. We discuss recent findings in stochastic systems where interaction kernels are derived from pairwise distances and introduce a nonparametric inference strategy using a regularized maximum likelihood estimator for precise kernel estimation. We show this approach can achieve near-optimal convergence rates, regardless of the state space dimensionality when dealing with multiple trajectory data. Additionally, we conduct error analysis related to discrete-time observations and validate our methodology through numerical experiments on models such as stochastic opinion dynamics and the Lennard-Jones potential. Moreover, we also consider microscopic models and advance our approach to estimate nonlocal interaction potentials in aggregation-diffusion equations from noisy data, using sparsity-promoting techniques. This research is conducted in collaboration with Fei Lu, Mauro Maggioni, Jose A. Carrillo, Gissell Estrada-Rodriguez, and Laszlo Mikolas.
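A compact sketch of the kernel-estimation idea (a simplified 1-d first-order model with a piecewise-constant basis and plain least squares, chosen here for illustration; the talk's estimator is a regularized MLE):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, dt = 30, 400, 0.01
phi = lambda r: np.exp(-r)                  # true interaction kernel

x = rng.uniform(-2, 2, N)                   # 1-d opinion-dynamics particles
traj = [x.copy()]
for _ in range(T):
    diff = x[None, :] - x[:, None]          # diff[i, j] = x_j - x_i
    drift = (phi(np.abs(diff)) * diff).mean(axis=1)
    x = x + drift * dt + 0.02 * np.sqrt(dt) * rng.standard_normal(N)
    traj.append(x.copy())
traj = np.array(traj)

# Least squares for phi on distance bins: velocity_i ~ sum_k phi_k F[i, k].
bins = np.linspace(0, 4, 17)
G = np.zeros((len(bins) - 1, len(bins) - 1))
b = np.zeros(len(bins) - 1)
for t in range(T):
    diff = traj[t][None, :] - traj[t][:, None]
    idx = np.clip(np.digitize(np.abs(diff), bins) - 1, 0, len(bins) - 2)
    F = np.stack([np.where(idx == k, diff, 0.0).mean(axis=1)
                  for k in range(len(bins) - 1)], axis=1)
    v = (traj[t + 1] - traj[t]) / dt        # observed velocities
    G += F.T @ F
    b += F.T @ v
phi_hat = np.linalg.lstsq(G, b, rcond=None)[0]   # min-norm LS (some bins may be empty)
mid = (bins[:-1] + bins[1:]) / 2
print("estimated:", np.round(phi_hat[:5], 2), " true:", np.round(phi(mid[:5]), 2))
```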
Fall 2023:
Wednesday Aug 23 2PM, KAP 245: Wei Biao Wu (U Chicago)
Fast Algorithms for Estimating Covariance Matrices of Stochastic Gradient Descent Solutions
Stochastic gradient descent (SGD), an important optimization method in machine learning, is widely used for parameter estimation, especially in online settings where data arrive in a stream. While this recursive algorithm is popular for its computational and memory efficiency, it suffers from the randomness of its solutions. In this talk we shall estimate the asymptotic covariance matrices of the averaged SGD iterates (ASGD) in a fully online fashion. Based on the recursive estimator and classic asymptotic normality results for ASGD, we can conduct online statistical inference for SGD estimators and construct asymptotically valid confidence intervals for model parameters. The algorithm for the recursive estimator is efficient and uses only the SGD iterates: upon receiving new observations, we update the confidence intervals at the same time as the ASGD solutions, without extra computational or memory cost. This approach fits the online setting even when the total number of observations is unknown, and takes full advantage of SGD's computational and memory efficiency. This work is joint with Wanrong Zhu and Xi Chen.
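A rough illustration of ASGD and its asymptotic covariance for streaming linear regression. This is not the authors' recursive estimator: as a simplified stand-in, plain (offline) batch means is applied to the stored iterates, and the step-size schedule is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 100_000
theta_star = rng.standard_normal(d)

theta, avg = np.zeros(d), np.zeros(d)
iterates = np.zeros((n, d))
for t in range(1, n + 1):
    x = rng.standard_normal(d)                        # streaming observation
    y = x @ theta_star + rng.standard_normal()
    theta -= 0.5 * t ** -0.7 * (x @ theta - y) * x    # SGD step, decaying rate
    avg += (theta - avg) / t                          # Polyak-Ruppert average (ASGD)
    iterates[t - 1] = theta

B, L = 200, n // 200                 # batch-means covariance of sqrt(n)*(avg - theta*)
bm = iterates.reshape(B, L, d).mean(axis=1)
Sigma_hat = L / (B - 1) * (bm - bm.mean(0)).T @ (bm - bm.mean(0))
# For this model the sandwich limit is the identity, so the diagonal
# below should come out roughly all ones.
print(np.round(np.diag(Sigma_hat), 2))
```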
Wednesday Aug 23 3:30PM, KAP 245: Danna Zhang (UCSD)
Test of Independence Based on Generalized Distance Correlation
We study the fundamental statistical inference concerning the testing of independence between two random vectors. Existing asymptotic theories for test statistics based on distance covariance can only apply to either low dimensional or high dimensional settings. In this work we develop a novel unified distributional theory of the sample generalized distance covariance that works for random vectors of arbitrary dimensions. In particular, a non-asymptotic error bound on its Gaussian approximation is derived. Under fairly mild moment conditions, the asymptotic null distribution of the sample generalized distance covariance is shown to be distributed as a linear combination of independent and identically distributed chi-squared random variables. We also show high dimensionality is necessary for the null distribution to be asymptotically normal. To estimate the asymptotic null distribution practically, we propose an innovative Half-Permutation procedure and provide the theoretical justification for its validity. The exact asymptotic distribution of the resampling distribution is derived under general marginal moment conditions and the proposed procedure is shown to be asymptotically equivalent to the oracle procedure with known marginal distributions.
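For reference, a short sketch of the (biased, V-statistic) sample distance covariance that underlies these tests, using the standard double-centering formula on illustrative data:

```python
import numpy as np

def dcov2(x, y):
    """Squared sample distance covariance between x (n x p) and y (n x q)."""
    def centered(z):
        D = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
        return D - D.mean(0) - D.mean(1)[:, None] + D.mean()
    return float((centered(x) * centered(y)).mean())

rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal((n, 3))
y_dep = x[:, :1] ** 2 + 0.1 * rng.standard_normal((n, 1))  # uncorrelated but dependent
y_ind = rng.standard_normal((n, 1))                        # truly independent
print("dependent pair:  ", dcov2(x, y_dep))
print("independent pair:", dcov2(x, y_ind))
```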
Sep 1: Omer Tamuz (Caltech)
On the origin of the Boltzmann distribution
The Boltzmann distribution is used in statistical mechanics to describe the distribution of states in systems with a given temperature. We give a novel characterization of this distribution as the unique one satisfying independence for uncoupled systems. The theorem boils down to a statement about symmetries of the convolution semigroup of finitely supported probability measures on the natural numbers, or, alternatively, about symmetries of the multiplicative semigroup of polynomials with non-negative coefficients. (Joint with Fedor Sandomirskiy.)
Sep 8: Xiaohui Chen (USC)
Statistical optimality and computational guarantee for K-means clustering
K-means clustering is a widely used machine learning method for grouping data points based on their similarity. Solving the K-means problem is computationally hard, and various tractable (or even scalable) approximation algorithms (Lloyd, spectral, nonnegative matrix factorization, semidefinite programming) have been explored in theory and practice. In this talk, I will discuss some recent developments from my research group on the statistical optimality and computational guarantees for relaxed formulations of K-means. To break the computational bottleneck, we leverage convex and non-convex optimization techniques to tackle the clustering problem and show that they achieve an information-theoretic sharp threshold for exact recovery of cluster labels under the standard Gaussian mixture model. Joint work with Yun Yang (UIUC), Yubo Zhuang (UIUC), Richard Y. Zhang (UIUC).
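A small sketch of one relaxed formulation of the kind mentioned above, the Peng-Wei semidefinite relaxation of K-means (toy data; assumes cvxpy with its default SDP solver is available):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
k, n_per = 2, 15
X = np.vstack([rng.standard_normal((n_per, 2)) + c for c in ([3, 0], [-3, 0])])
n = len(X)
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances

Z = cp.Variable((n, n), PSD=True)                    # relaxed membership matrix
constraints = [Z >= 0, cp.sum(Z, axis=1) == 1, cp.trace(Z) == k]
cp.Problem(cp.Minimize(cp.trace(D @ Z) / 2), constraints).solve()

# With enough separation (the sharp-threshold regime of the talk), Z is
# essentially the block membership matrix; read off point 0's cluster.
labels = (Z.value[0] > Z.value[0].mean()).astype(int)
print(labels)
```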
Sep 22: Anna Ma (UCI)
The Kaczmarz algorithm and doubly noisy linear systems
Large-scale linear systems, Ax=b, frequently arise in data science and scientific computing at massive scales, thus demanding effective iterative methods to solve them. Often, these systems are noisy due to operational errors or faulty data-collection processes. In the past decade, the randomized Kaczmarz algorithm (RK) was studied extensively as an efficient iterative solver for such systems. However, the convergence study of RK in the noisy regime is limited and only considers measurement noise in the right-hand side vector, b. Practically, that is only sometimes the case, as the coefficient matrix A can also be noisy. But is noise inherently bad? In this talk, we motivate and discuss doubly noisy linear systems and the performance of the Kaczmarz-type algorithms applied to such systems, highlighting that noise can potentially be exploited. The presented work is joint work with El Houcine Bergou, Soumia Boucherouite, Aritra Dutta, and Xin Li.
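A compact randomized Kaczmarz sketch on a doubly noisy system (noise levels are illustrative, not the paper's experiments):

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=5000, rng=np.random.default_rng(0)):
    """Project the iterate onto a random row's hyperplane a_i^T x = b_i,
    sampling rows with probability proportional to ||a_i||^2."""
    x = np.zeros(A.shape[1])
    row_norms = (A ** 2).sum(axis=1)
    probs = row_norms / row_norms.sum()
    for _ in range(iters):
        i = rng.choice(A.shape[0], p=probs)
        x += (b[i] - A[i] @ x) / row_norms[i] * A[i]
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((300, 50))
x_star = rng.standard_normal(50)
systems = {
    "clean":        (A, A @ x_star),
    "doubly noisy": (A + 0.01 * rng.standard_normal(A.shape),      # noise in A
                     A @ x_star + 0.1 * rng.standard_normal(300)), # noise in b
}
for label, (AA, bb) in systems.items():
    err = np.linalg.norm(randomized_kaczmarz(AA, bb) - x_star)
    print(f"{label:13s} error: {err:.4f}")   # noisy case stalls at a noise floor
```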
Sep 29: Mladen Kolar (USC & U Chicago)
Latent multimodal functional graphical model estimation
Joint multimodal functional data acquisition, where multiple modes of functional data are measured from the same subject simultaneously, has emerged as an exciting modern approach enabled by recent engineering breakthroughs in the neurological and biological sciences. One prominent motivation for acquiring such data is to enable new discoveries of the underlying connectivity by combining signals from multiple modalities. Yet, despite scientific interest in this problem, there remains a gap in principled statistical methods for estimating the graph underlying joint multimodal functional data. To this end, we propose a new integrative framework to estimate a single latent graphical model from multimodal functional data. We take the generative perspective by modeling the data generation process, from which we identify the inverse mapping from the observation space to the latent space as transformation operators. We then develop an estimator that simultaneously estimates the transformation operators and the latent graph via functional neighborhood regression. The approach is motivated and applied to analyzing simultaneously acquired multimodal brain imaging data where the graph indicates the underlying brain functional connectivity.
Joint work with Katherine Tsai, Boxin Zhao, Sanmi Koyejo.
Oct 6: Vatsal Sharan (USC)
Sample Amplification: Increasing Dataset Size even when Learning is Impossible
Oct 13: [No talk] [Fall Recess]
Oct 20: Çağın Ararat (Bilkent)
Path-regularity and martingale properties of set-valued stochastic integrals
We study the path-regularity and martingale properties of set-valued stochastic integrals defined in our previous work A., Ma, Wu (AAP, 2023). Such integrals are fundamentally different from the well-known Aumann-Itô stochastic integrals and more suitable for representing set-valued martingales. However, like the Aumann-Itô integral, the new integral is only a set-valued submartingale in general, and there is very limited knowledge about its path-regularity. We first establish the existence of right- and left-continuous modifications of set-valued submartingales in continuous time and apply these results to set-valued stochastic integrals. We also show that a set-valued stochastic integral yields a martingale if and only if the set of terminal values of the stochastic integrals associated to the integrand is closed and decomposable. As a special case, we study the set-valued martingale in the form of the conditional expectation of a convex set-valued random variable. When this random variable is a convex random polytope, we show that the conditional expectation of a vertex stays as a vertex of the set-valued conditional expectation if and only if the random polytope has a deterministic normal fan. This is a joint work with Jin Ma.
Oct 27: Yier Lin (U Chicago)
The atypical growth in a random interface
Random interface growth is all around us: tumors, bacterial colonies, infections, and propagating flame fronts. The KPZ equation is a stochastic PDE central to a class of random growth phenomena. In this talk, I will explain how to combine tools from probability, partial differential equations, and integrable systems to understand the behavior of the KPZ equation when it exhibits unusual growth.
This talk is based in part on joint works with Pierre Yves Gaudreau Lamarre and Li-Cheng Tsai.
Nov 3: Yuhua Zhu (UCSD)
Continuous-in-time Limit for Bandits
In this talk, I will build the connection between Hamilton-Jacobi-Bellman equations and the multi-armed bandit (MAB) problems. MAB is a widely used paradigm for studying the exploration-exploitation trade-off in sequential decision making under uncertainty. This is the first work that establishes this connection in a general setting. I will present an efficient algorithm for solving MAB problems based on this connection and demonstrate its practical applications.
Nov 10: [No talk] [Veterans’ Day]
Nov 17: Peter Kagey (Harvey Mudd) [Undergraduate lecture]
Expected value of letters of permutations with a given number of $k$-cycles
The probability that a uniformly random permutation on $2n$ letters has exactly $\alpha$ transpositions is equal to the probability that a uniformly random isometry of the $n$-cube has exactly $\alpha$ pairs of faces that are fixed. In this talk, I will describe generalizations of this relationship, show how this gives information about the expected value of letters of permutations with a given number of $k$-cycles, and propose several open questions related to families of permutation statistics.
Nov 24: [No talk] [Thanksgiving]
Dec 1: Larry Goldstein (USC)
Classical and Free Zero Bias for Infinite Divisibility
Spring 2023:
January 13: Zachary Bezemek (Boston University)
Large Deviations and Importance Sampling for Weakly Interacting Diffusions
We consider an ensemble of N interacting particles modeled by a system of N stochastic differential equations (SDEs). The coefficients of the SDEs are taken to be such that as N approaches infinity, the system undergoes Kac’s propagation of chaos, and hence is well-approximated by the solution to a McKean-Vlasov Equation. Rare but possible deviations of the behavior of the particles from this limit may reflect a catastrophe, and computing the probability of such rare events is of high interest in many applications. In this talk, we design an importance sampling scheme which allows us to numerically compute statistics related to these rare events with high accuracy and efficiency for any N. Standard Monte Carlo methods behave exponentially poorly as N increases for such problems. Our scheme is based on subsolutions of a Hamilton-Jacobi-Bellman (HJB) Equation on Wasserstein Space which arises in the theory of mean-field control. This HJB Equation is seen to be connected to the large deviations rate function for the empirical measure on the ensemble of particles. We identify conditions under which our scheme is provably asymptotically optimal in N in the sense of log-efficiency. We also provide evidence, both analytical and numerical, that with sufficient regularity of the solution to the HJB Equation, our scheme can have vanishingly small relative error as N increases.
February 3: Mei Yin (Denver)
Probabilistic parking functions
We extend the notion of classical parking functions by introducing randomness via a probabilistic parking protocol. We explore the properties of a preference vector given that it is a parking function and discuss the effect (and lack thereof) of the probabilistic parameter p. Of special interest is when p=1/2, where we demonstrate a sharp transition in some parking statistics. We connect our result to other phenomena in combinatorics. Partially based on joint work with Irfan Durmic, Alex Han, Pamela E. Harris, and Rodrigo Ribeiro.
Time permitting, I will discuss newer thoughts after the paper is written and further directions for research.
February 10: Ying Tan (USC)
A Generalized Kyle-Back Insider Trading Model with Dynamic Information
In this project, we consider a class of generalized Kyle-Back strategic insider trading models in which the insider is able to use the dynamic information obtained by observing the instantaneous movement of an underlying asset that is allowed to be influenced by its market price. Since such a model will be largely outside the Gaussian paradigm, we shall try to Markovize it by introducing an auxiliary (factor) diffusion process, in the spirit of the weighted total order process, as a part of the “pricing rule”. As the main technical tool in solving the Kyle-Back equilibrium in such a setting, we study a class of Stochastic Two-Point Boundary Value Problem (STPBVP), which resembles the dynamic Markov bridge in the literature, but without insisting on its local martingale requirement. In the case when the solution of the STPBVP has an affine structure, we show that the pricing rule functions, whence the Kyle-Back equilibrium, can be determined by the decoupling field of a forward-backward SDE obtained via a non-linear filtering approach, along with a set of compatibility conditions. This is a joint work with Jin Ma.
February 17: Mateo Díaz (Caltech)
Clustering a mixture of Gaussians with unknown covariance
Clustering is a fundamental data scientific task with broad applications. This talk investigates a simple clustering problem with data from a mixture of Gaussians that share a common but unknown and potentially ill-conditioned covariance matrix. We consider Gaussian mixtures with two equally-sized components and derive a Max-Cut integer program based on maximum likelihood estimation. We show its solutions achieve the optimal misclassification rate when the number of samples grows linearly in the dimension, up to a logarithmic factor. However, solving the Max-Cut problem appears to be computationally intractable. To overcome this, we develop an efficient iterative algorithm that attains the optimal rate but requires a quadratic sample size. Although this sample complexity is worse than that of the Max-Cut problem, we conjecture that no polynomial-time method can perform better. Furthermore, we present numerical and theoretical evidence that supports the existence of a statistical-computational gap.
March 3: Paromita Dubey (USC Marshall)
Geometric EDA for Random Objects
In this talk I will propose new tools for the exploratory data analysis of data objects taking values in a general separable metric space. First, I will introduce depth profiles, where the depth profile of a point ω in the metric space refers to the distribution of the distances between ω and the data objects. I will describe how depth profiles can be harnessed to define transport ranks, which capture the centrality of each element in the metric space with respect to the data cloud. Next, I will discuss the properties of transport ranks and show how they can be an effective device for detecting and visualizing patterns in samples of random objects. Together with practical illustrations I will establish the theoretical guarantees for the estimation of the depth profiles and the transport ranks for a wide class of metric spaces. Finally, I will describe a new two sample test geared towards populations of random objects by utilizing the depth profiles corresponding to the data objects. I will demonstrate the efficacy of this new approach on distributional data comprising of a sample of age-at-death distributions for various countries, for compositional data through energy usage for the U.S. states and for neuroimaging network data. This talk is based on joint work with Yaqing Chen and Hans-Georg Müller.
March 10: Robi Bhattacharjee (UCSD)
Online k-means Clustering on Arbitrary Data Streams
We consider online k-means clustering where each new point is assigned to the nearest cluster center, after which the algorithm may update its centers. The loss incurred is the sum of squared distances from new points to their assigned cluster centers. The goal over a data stream X is to achieve loss that is a constant factor of L(X, OPT_k), the best possible loss using k fixed points in hindsight.
We propose a data parameter, Λ(X), such that for any algorithm maintaining O(k poly(log n)) centers at time n, there exists a data stream X for which a loss of Ω(Λ(X)) is inevitable. We then give a randomized algorithm that achieves clustering loss O(Λ(X) + L(X, OPT_k)). Our algorithm uses O(k poly(log n)) memory and maintains O(k poly(log n)) cluster centers. It also enjoys a running time of O(k poly(log n)) and is the first algorithm to achieve polynomial space and time complexity in this setting, as well as the first with provable guarantees that makes no assumptions on the input data.
arXiv link: https://arxiv.org/abs/2102.09101
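For contrast with the guarantees above, here is the classic sequential k-means heuristic (not the paper's algorithm, which maintains O(k poly(log n)) centers); on adversarial streams its loss can be far from L(X, OPT_k).

```python
import numpy as np

def sequential_kmeans(stream, k):
    """Assign each arriving point to its nearest center, pay the squared
    distance, then move that center toward the point (running mean)."""
    centers, counts, loss = [], [], 0.0
    for x in stream:
        if len(centers) < k:                 # seed with the first k points
            centers.append(x.copy()); counts.append(1)
            continue
        C = np.array(centers)
        j = int(np.argmin(((C - x) ** 2).sum(axis=1)))
        loss += float(((C[j] - x) ** 2).sum())      # cost is paid on arrival
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]  # running-mean update
    return loss

rng = np.random.default_rng(0)
means = rng.uniform(-10, 10, size=(5, 2))
stream = means[rng.integers(5, size=2000)] + 0.3 * rng.standard_normal((2000, 2))
print("online loss:", sequential_kmeans(stream, k=5))
```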
March 17: [spring break]
March 24: Yian Ma (UCSD)
MCMC vs. variational inference — the rivalry and synergy between two approaches
I will introduce some recent progress towards understanding the scalability of Markov chain Monte Carlo (MCMC) methods and their comparative advantage with respect to variational inference. I will fact-check the folklore that “variational inference is fast but biased, MCMC is unbiased but slow”. I will follow an optimization perspective on the infinite dimensional probability space, where variational inference projects the probabilities onto a finite dimensional parameter space while MCMC leverages stochastic sample paths. Three ingredients will be the focus of this discussion: accuracy, dimension dependence, and stochasticity. I will then apply the MCMC approach to a sequential decision making problem.
March 31: Masoud Zargar (USC)
Probability on moduli spaces of bundles
I will discuss different probability spaces of moduli spaces of bundles over surfaces, and analytic, representation-theoretic, and probabilistic methods used in analyzing the geometry of random bundles. I will also discuss how these lead to probabilistic equidistribution results for images of geodesics under random unitary surface representations of cusped hyperbolic surfaces. If there is time, I will discuss relevant ideas and open questions in random matrix theory and integration over different moduli spaces of bundles.
April 7: Jiongmin Yong (U Central Florida) [joint with Math Finance]
Turnpike Properties for Stochastic Linear-Quadratic Optimal Control Problems
For deterministic optimal control problems in very large time horizons (either finite dimensional or infinite dimensional), under proper conditions, the optimal pair will stay near a solution of a proper static optimization problem. Such a phenomenon is called the “turnpike” property. The proper static optimization problem usually is the one with the objective function being the running cost rate function and with the constraint being the equilibria of the vector field of the state equation. However, for stochastic problems, mimicking the idea of the deterministic problems will lead to some incorrect conclusions. In this talk, we will look at stochastic linear-quadratic optimal control problem in large duration. We will correctly formulate the proper static optimization problem and establish the turnpike properties of the corresponding stochastic linear-quadratic problem.
April 14: Mikhail Belkin (UCSD)
Why do neural models need so many parameters?
One of the striking aspects of modern neural networks is their extreme size, reaching billions or even trillions of parameters.
Why are so many parameters needed? To attempt an answer to this question, I will discuss an algorithm and distribution independent non-asymptotic trade-off between the model size, excess test loss, and training loss for linear predictors. Specifically, we show that models that perform well on the test data (have low excess loss) are either “classical” — have training loss close to the noise level, or are “modern” — have a much larger number of parameters compared to the minimum needed to fit the training data exactly.
Furthermore, I will provide empirical evidence that optimal performance of realistic models is typically achieved in the “modern” regime, when they are trained well below the noise level.
Collaborators: Nikhil Ghosh and Like Hui.
April 21: Andrew Lowy (USC)
Private Stochastic Optimization with Large Worst-Case Lipschitz Parameter: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses
April 28: Lenny Fukshansky (CMC) [on zoom only]
June 6: Jasper Lee (UW Madison) [in person]
Mean and location estimation are two of the most fundamental problems in statistics. How do we most accurately estimate the mean of a distribution, given i.i.d. samples? What if we additionally know the shape of the distribution, and we only need to estimate its translation/location?
Both problems have well-established asymptotic theories: use the sample mean for mean estimation and maximum likelihood for location estimation. Yet, asymptotic guarantees can sometimes be misleading in practice: it may take arbitrarily many samples to converge to the (law of large numbers) limit, depending on how bad the underlying distribution is, especially if we want high confidence in our estimate.
In this talk, I will discuss recent progress on developing finite sample and high probability theories for these problems that yield optimal constants. Results include new mean estimators in 1-dimensional and very-high dimensional regimes with error tight up to a 1+o(1) factor, as well as new characterizations of location estimation and improvements over the MLE. I will end by describing some directions for future work in this area.
This talk is based on joint works with Paul Valiant, Shivam Gupta and Eric Price.
Fall 2022:
Aug 26: [No seminar]
Sep 2: [No seminar]
Sep 9: Daniel Kane (UCSD) [zoom]
A_k Testing of Distributions
We introduce the theory of distribution testing with a focus on tests for one-dimensional, structured distributions. In order to produce efficient testers, we introduce the theory of A_k distance and discuss the known results for optimal testers with respect to A_k distance, showing that it produces near-optimal testers for many distribution families.
Sep 16: Ken Alexander (USC) [in-person]
Disjointness of geodesics for first passage percolation in the plane
First passage percolation may be viewed as a way of creating a random metric on the integer lattice. Random passage times (iid) are assigned to the edges of the lattice, and the distance from $x$ to $y$ is defined to be the smallest sum of passage times among all paths from $x$ to $y$. We consider geodesics (shortest paths) in this context; specifically, what is the probability that two close-by, nearly-parallel geodesics are disjoint? More precisely, if the geodesics have end-to-end distance $L$ and the endpoints are separated by distance $a$ at one end and $b$ at the other, how does the disjointness probability vary with $L,a,b$? We answer this question, at least at the level of the proper exponents for $L,a,$ and $b$.
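A hands-on sketch (illustrative: exponential passage times on a modest grid, not the lattice-scale asymptotics of the talk): compute two geodesics with nearby endpoints via Dijkstra and count shared vertices; coalescence makes overlap typical, and disjointness is the rare event whose probability the talk quantifies.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.csgraph import dijkstra

rng = np.random.default_rng(0)
W = 60
def node(i, j): return i * W + j

G = lil_matrix((W * W, W * W))        # grid with i.i.d. Exp(1) passage times
for i in range(W):
    for j in range(W):
        if i + 1 < W: G[node(i, j), node(i + 1, j)] = rng.exponential()
        if j + 1 < W: G[node(i, j), node(i, j + 1)] = rng.exponential()
G = G.tocsr()

def geodesic(src, dst):
    """Vertex set of the shortest path, read off Dijkstra's predecessor array."""
    _, pred = dijkstra(G, directed=False, indices=src, return_predecessors=True)
    path, v = [src], dst
    while v != src:
        path.append(v)
        v = pred[v]
    return set(path)

g1 = geodesic(node(0, 0), node(W - 1, W - 1))
g2 = geodesic(node(0, 3), node(W - 1, W - 4))
print("shared vertices:", len(g1 & g2))
```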
MONDAY Sep 19 [Joint with CAMS]: Nathan Glatt-Holtz (Tulane)
A unified framework for Metropolis-Hastings Type Monte Carlo methods.
The efficient generation of random samples is a central task within many of the quantitative sciences. The workhorse of Bayesian statistics and statistical physics, Markov chain Monte Carlo (MCMC) comprises a large class of algorithms for sampling from arbitrarily complex and/or high-dimensional probability distributions. The Metropolis-Hastings method (MH) stands as the seminal MCMC algorithm, and its basic operation underlies most of the MCMC techniques developed to this day.
We provide an all-encompassing, measure theoretic mathematical formalism that describes essentially any Metropolis-Hastings algorithm using three ingredients: a random proposal, an involution on an extended phase space and an accept-reject mechanism. This unified framework illuminates under-appreciated relationships between a variety of known algorithms while yielding a means for deriving new methods.
As an immediate application we identify several novel algorithms including a multiproposal version of the popular preconditioned Crank-Nicolson (pCN) sampler suitable for infinite-dimensional target measures which are absolutely continuous with respect to a Gaussian base measure. We also develop a new class of ‘extended phase space’ methods, based on Hamiltonian mechanics. These methods provide a versatile approach to bypass expensive gradient computations through skillful reduced order modeling and/or data driven approaches. A selection of case studies will be presented that use our multiproposal pCN algorithm (mpCN) to resolve a selection of problems in Bayesian statistical inversion for partial differential equations motivated by fluid flow measurement.
This is joint work with Andrew J. Holbrook (UCLA), Justin Krometis (Virginia Tech) and Cecilia Mondaini (Drexel).
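A minimal instance of the framework: the textbook random-walk Metropolis special case, where the three ingredients specialize to a Gaussian proposal, the swap involution (x, y) -> (y, x), and the standard accept-reject step (target and step size below are illustrative).

```python
import numpy as np

def random_walk_metropolis(log_target, x0, n_steps, step=0.5,
                           rng=np.random.default_rng(0)):
    """Random-walk Metropolis sampler for an unnormalized log-density."""
    x, samples = x0, []
    for _ in range(n_steps):
        y = x + step * rng.standard_normal()      # symmetric Gaussian proposal
        if np.log(rng.uniform()) < log_target(y) - log_target(x):
            x = y                                 # accept; otherwise keep x
        samples.append(x)
    return np.array(samples)

# Standard Gaussian target: check the first two moments of the chain.
s = random_walk_metropolis(lambda z: -0.5 * z ** 2, 0.0, 50_000)
print(round(s.mean(), 3), round(s.var(), 3))     # should be near 0 and 1
```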
Sep 23: Tom Hutchcroft (Caltech) [in-person]
Critical behaviour in hierarchical percolation
Statistical mechanics models undergoing a phase transition often exhibit rich, fractal-like behaviour at their critical points, which are described in part by critical exponents – the indices governing the power-law growth or decay of various quantities of interest. These exponents are expected to depend on the dimension but not on the microscopic details of the model such as the choice of lattice. After much progress over the last thirty years we now understand two-dimensional and high-dimensional models rather well, but intermediate dimensions such as three remain mysterious. I will discuss these issues in the context of hierarchical percolation, a simplified version of percolation on Z^d for which, in forthcoming work, I establish a fairly good description of the critical behaviour in all dimensions.
THURSDAY Sep 29: Lisa Sauermann (MIT), 12PM NOON [zoom] [note the NONSTANDARD DAY/TIME]
Anticoncentration in Ramsey graphs and a proof of the Erdős-McKay Conjecture
This talk will discuss recent joint work with Matthew Kwan, Ashwin Sah, and Mehtaab Sawhney, proving an old conjecture of Erdős and McKay (for which Erdős offered $100). This conjecture concerns Ramsey graphs, which are (roughly speaking) graphs without large complete or empty subgraphs. In order to prove the conjecture, we study edge-statistics in Ramsey graphs, i.e. we study the distribution of the number of edges in a random vertex subset of a Ramsey graph. After discussing some background on Ramsey graphs, the talk will explain our results and give an overview of our proof approach.
Sep 30: Simo Ndaoud (ESSEC) [in-person]
Adaptive robustness and sub-Gaussian deviations in sparse linear regression through Pivotal Double SLOPE
October 7: Guang Cheng (UCLA) [in-person]
A Statistical Journey through Trustworthy AI
Our lab believes that the next generation of AI is mainly driven by trustworthiness, beyond performance. This talk attempts to offer statistical perspectives to embrace three challenges in trustworthy AI: privacy, robustness and fairness. Specifically, we consider privacy protection by machine unlearning, enhance adversarial robustness by utilizing artificially generated data, and establish fair Bayes-optimal classifiers. These results demonstrate the unique value of statisticians in studying trustworthy AI from empirical, methodological or theoretical aspects.
October 14: [Fall break]
October 21: Wen (Rick) Zhou (Colorado State) [in-person]
High dimensional clustering via latent semiparametric mixture models.
Cluster analysis is a fundamental task in machine learning. Several clustering algorithms have been extended to handle high-dimensional data by incorporating a sparsity constraint in the estimation of a mixture of Gaussian models. Though it makes some neat theoretical analysis possible, this type of approach is arguably restrictive for many applications. In this work we propose a novel latent variable transformation mixture model for clustering in which we assume that after some unknown monotone transformations the data follows a mixture of Gaussians. Under the assumption that the optimal clustering admits a sparsity structure, we develop a new clustering algorithm named CESME for high-dimensional clustering. The use of unspecified transformation makes the model far more flexible than the classical mixture of Gaussians. On the other hand, the transformation also brings quite a few technical challenges to the model estimation as well as the theoretical analysis of CESME. We offer a comprehensive analysis of CESME including identifiability, initialization, algorithmic convergence, and statistical guarantees on clustering. In addition, the convergence analysis has revealed an interesting algorithmic phase transition for CESME, which has also been noted for the EM algorithm in literature. Leveraging such a transition, a data-adaptive procedure is developed and substantially improves the computational efficiency of CESME. Extensive numerical study and real data analysis show that CESME outperforms the existing high-dimensional clustering algorithms including CHIME, sparse spectral clustering, sparse K-means, sparse convex clustering, and IF-PCA.
Oct 28: Çağın Ararat (Bilkent) [in-person]
Set-valued martingales and backward stochastic differential equations
Motivated by the connection between univariate dynamic risk measures and backward stochastic differential equations, we start building a theory for set-valued backward stochastic differential equations (SV-BSDE). As a first step for this purpose, we formulate a simple SV-BSDE with a compact-valued driver function and study the well-posedness of this SV-BSDE. A key tool in establishing well-posedness is the availability of a stochastic integral representation for set-valued martingales. We prove a new martingale representation theorem which, in contrast to the available literature, allows the initial value of the martingale to be nontrivial. This is a joint work with Jin Ma and Wenqian Wu.
November 4: Qiyang Han (Rutgers University) [in-person]
Universality of regularized regression estimators in high dimensions
The Convex Gaussian Min-Max Theorem (CGMT) has emerged as a prominent theoretical tool for analyzing the precise stochastic behavior of various statistical estimators in the so-called high dimensional proportional regime, where the sample size and the signal dimension are of the same order. However, a well recognized limitation of the existing CGMT machinery lies in its stringent requirement on the exact Gaussianity of the design matrix, therefore rendering the obtained precise high dimensional asymptotics largely a specific Gaussian theory in various important statistical models. This work provides a structural universality framework for a broad class of regularized regression estimators that is particularly compatible with the CGMT machinery. Here universality means that if a ‘structure’ is satisfied by the regression estimator $\hat{\mu}_G$ for a standard Gaussian design $G$, then it will also be satisfied by $\hat{\mu}_A$ for a general non-Gaussian design $A$ with independent entries. In particular, we show that with a good enough $\ell_\infty$ bound for the regression estimator $\hat{\mu}_A$, any ‘structural property’ that can be detected via the CGMT for $\hat{\mu}_G$ also holds for $\hat{\mu}_A$ under a general design $A$ with independent entries. As a proof of concept, we demonstrate our new universality framework in three key examples of regularized regression estimators: the Ridge, Lasso and regularized robust regression estimators, where new universality properties of risk asymptotics and/or distributions of regression estimators and other related quantities are proved. As a major statistical implication of the Lasso universality results, we validate inference procedures using the degrees-of-freedom adjusted debiased Lasso under general design and error distributions. This talk is based on joint work with Yandi Shen (Chicago).
November 18: Melih Iseri (USC)
Set Valued HJB Equations
In this talk we introduce a notion of set-valued PDEs. Set values have been introduced for many applications, such as time-inconsistent stochastic optimization problems, multivariate dynamic risk measures, and nonzero-sum games with multiple equilibria. One crucial property they enjoy is the dynamic programming principle (DPP). Together with the set-valued Itô formula, which is a key component, the DPP induces the PDE. In the context of multivariate optimization problems, we introduce set-valued Hamilton-Jacobi-Bellman equations and establish their wellposedness. In the standard scalar case, our set-valued PDE reduces back to the standard HJB equation.
Our approach is intrinsically connected to the existing theory of surface evolution equations, a well-known example being mean curvature flow. Roughly speaking, those equations can be viewed as first-order set-valued ODEs, and we extend them to second-order PDEs. Another difference is that, due to the different applications, those equations are forward in time (with initial conditions), while we consider backward equations (with terminal conditions).
November 25: [Thanksgiving break]
December 2: Uwe Schmock (TU Wien)
Existence and Properties of Optimal Strategies for Distribution-Constrained Optimization Problems
We consider stochastic optimization problems in discrete time under distributional constraints. These problems are motivated by actuarial science, in particular to treat unit-linked insurance products with guarantees. They can also serve as benchmark models to account for customer behaviour when the treatment as an American option is not appropriate. The basic mathematical set-up is an adapted stochastic process (interpreted as pay-outs) and (possibly randomized) stopping times to optimize the expected pay-out. The difference from classical optimal stopping problems is our constraint concerning the distribution of the stopping times or, more generally, the adapted random probability measures. For these distribution-constrained optimization problems we prove the existence of an optimal strategy; the basic assumptions are suitable moment conditions. In special cases, optimal strategies are identified explicitly. Time permitting, a continuous-time variant will also be discussed. (The main part of this talk is based on joint work.)
Spring 2022:
January 14:
February 4: Allan Sly (Princeton)
Spread of infections in random walkers
We consider a class of interacting particle systems with two types, A and B, which perform independent random walks at different speeds. Type A particles turn into type B when they meet a type B particle. This class of systems includes models of infections and multi-particle diffusion-limited aggregation. I will describe new methods for understanding growth processes in such models.
February 11: Thibaut Mastrolia (Berkeley)
Market Making and incentives design in the presence of a dark pool
We consider the issue of a market maker acting at the same time in the lit and dark pools of an exchange. The exchange wishes to establish a suitable fee policy to attract transactions on its venues. We first solve the stochastic control problem of the market maker without the intervention of the exchange. Then we derive the equations defining the optimal contract to be set between the market maker and the exchange. This contract depends on the trading flows generated by the market maker’s activity on the two venues. In both cases, we show existence and uniqueness, in the viscosity sense, of the solutions of the Hamilton-Jacobi-Bellman equations associated with the market maker and exchange’s problems. We finally design deep reinforcement learning algorithms enabling us to efficiently approximate the optimal controls of the market maker and the optimal incentives to be provided by the exchange.
Thursday, February 17: Nike Sun (MIT), 1PM PST
NOTE THE NONSTANDARD DAY AND TIME
Generalized Ising perceptron models
The perceptron is a toy model of a single-layer neural network that “stores” a collection of given patterns. We consider the model on the N-dimensional discrete cube with M=N*alpha patterns, with a general activation function U that is bounded above. For U bounded away from zero, or U a one-sided threshold function, it was shown by Talagrand (2000, 2011) that for small alpha, the free energy of the model converges in the large-N limit to the replica symmetric formula conjectured in the physics literature (Krauth and Mezard, 1989). We give a new proof of this result, which covers the more general class of all functions U that are bounded above and satisfy a certain variance bound.
Based on joint works with Jian Ding, Erwin Bolthausen, Shuta Nakajima, and Changji Xu.
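For orientation, the free energy in question is that of the partition function below, written in one common normalization (the talk's precise conventions may differ); here the $g_a$ denote the iid pattern vectors:

$$ Z_N \;=\; \sum_{\sigma \in \{-1,1\}^N} \; \prod_{a=1}^{M} U\!\left(\frac{\langle g_a, \sigma\rangle}{\sqrt{N}}\right), \qquad M = \lfloor \alpha N \rfloor, \qquad F_N = \frac{1}{N}\,\log Z_N. $$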
Thursday, February 24: Josh Frisch (ENS), 12PM PST
NOTE THE NONSTANDARD DAY AND TIME
Harmonic functions on Linear Groups
The Poisson boundary of a group is a probabilistic object which serves a dual role: it represents the space of asymptotic trajectories a random walk might take, and it represents the space of bounded harmonic functions on the group. In this talk I will give an introduction to the Poisson boundary, with emphasis on the question of when it is trivial or not. I will emphasize how this question depends on properties of the acting group. In particular I will discuss the case of linear groups. This is joint work with Anna Erschler.
March 4: Shirshendu Ganguly (Berkeley)
Fractal geometry in models of random growth
March 11: Andrew Ahn (Cornell)
Universality for Lyapunov Exponents of Random Matrix Products
Consider the discrete-time process formed by the singular values of products of random matrices, where time corresponds to the number of matrix factors. It is known, due to Oseledets’ theorem, that under general assumptions the Lyapunov exponents converge as the number of matrix factors tends to infinity. In this talk, we consider random matrices with distributional invariance under right multiplication by unitary matrices, which include Ginibre matrices and truncated unitary matrices. The corresponding singular value process is Markovian with additional structure that admits study via integrable probability techniques. In this talk, I will discuss recent results, including joint work with Roger Van Peski, on the Lyapunov exponents in the setting where the number M of matrix factors tends to infinity simultaneously with the matrix size N. When this limit is tuned so that M and N grow on the same order, the limiting Lyapunov exponents can be described in terms of Dyson Brownian motion with a special drift vector, which in turn can be linked to a matrix-valued diffusion on the complex general linear group. We find that this description is universal, under general assumptions on the spectrum of the matrix factors.
March 18 [no talk, spring break]
March 25: Sangchul Lee (USC)
Pollution-sensitivity of bootstrap percolation
April 1: Rachel Wang (Sydney)
Global community detection using individual-centered partial networks
In statistical network modeling and inference, we often assume either the full network is available or multiple subgraphs can be sampled to estimate various global properties of the full network. However, in a real social network, people frequently make decisions based on their local view of the network alone. In this paper, we consider a partial information framework that characterizes the local network centered at a given individual by path length (or knowledge depth $L$) and gives rise to a partial adjacency matrix. Under $L=2$, we focus on the problem of (global) community detection using the popular stochastic block model (SBM) and its degree-corrected variant (DCSBM). We derive general properties of the eigenvalues and eigenvectors from the major term of the partial adjacency matrix and propose new spectral-based community detection algorithms for these two types of models, which can achieve almost exact recovery under appropriate conditions. Our settings in the DCSBM also allow us to interpret the efficiency of clustering using neighborhood features of the central node. Using simulated and real networks, we demonstrate the performance of our algorithms in inferring global community memberships using a partial network. In particular, we show that the clustering accuracy indicates that different global structure is visible to different individuals.
Thursday, April 7: Ivan Corwin (Columbia), 12PM PST
NOTE THE NONSTANDARD DAY AND TIME
The ASEP speed process
Consider the asymmetric simple exclusion process (ASEP) on Z started with particles at negative integer sites, holes at positive integer sites and a lone second class particle at the origin. We prove that almost surely the second class particle achieves a limiting velocity as time goes to infinity which is uniform on the interval [-1,1]. This enables us to construct the ASEP speed process. Our work positively resolves a conjecture of Amir, Angel and Valko and extends the TASEP result of Mountford and Guiol. We rely on a combination of probabilistic couplings (in particular one due to Rezakhanlou) and effective hydrodynamic results (that we prove using tools from integrable probability). This is joint work with Amol Aggarwal and Promit Ghosal.
Thursday, April 14: Amol Aggarwal (Columbia) 12PM PST
NOTE THE NONSTANDARD DAY AND TIME
Universality Results in Random Lozenge Tilings
The statistical behavior of random tilings of large domains has been an intense topic of mathematical research for decades, partly since they highlight a central phenomenon in physics: local behaviors of highly correlated systems can be very sensitive to boundary conditions. Indeed, a salient feature of random tiling models is that the local densities of tiles can differ considerably in different regions of the domain, depending on the shape of the domain. Thus, a question of interest is how the shape of the domain affects the statistics of a random tiling. A general prediction in this context is that, while the family of possible domains is infinite-dimensional, the limiting statistics should only depend on a few parameters. In this talk, we will explain this universality phenomenon in the context of random lozenge tilings (equivalently, the dimer model on the hexagonal lattice). Various possible limiting local statistics can arise, depending on whether one probes the bulk of the domain; the edge of a facet; or the boundary of the domain. We will describe recent works that verify the universality of these statistics in each of these regimes.
April 22: J. E. Paguyo (USC)
Cycle structure of random parking functions
Parking functions were introduced by Konheim and Weiss in their study of the hash storage structure, and have since found many applications in combinatorics, probability, and computer science. Consider $n$ cars, each with a preferred parking spot in a one-way street with $n$ spots. Each car drives to its preferred spot and parks if possible, and otherwise takes the next available spot if it exists. A sequence of preferences is a parking function if all cars are able to park. In this talk, we discuss various combinatorial and probabilistic properties of random parking functions. In particular, we show that the joint distribution of cycle counts converges to a process of independent Poisson random variables. This partially answers a question of Diaconis and Hicks. Our proof uses a multivariate Stein’s method via exchangeable pairs, which also provides rates of convergence.
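The parking rule in the abstract is easy to simulate; below is a short illustrative checker (the names are ours), cross-checked against the classical criterion that a sequence is a parking function iff its sorted version satisfies $a_{(i)} \le i$:

```python
import random

def is_parking_function(prefs):
    """Simulate the parking rule: car i drives to spot prefs[i] (1-indexed)
    and takes the first free spot at or after it; the sequence is a parking
    function iff every car parks."""
    n = len(prefs)
    occupied = [False] * n
    for p in prefs:
        spot = p - 1
        while spot < n and occupied[spot]:
            spot += 1
        if spot == n:                     # drove past the last spot
            return False
        occupied[spot] = True
    return True

# Cross-check against the sorted-sequence criterion on random preferences.
rng = random.Random(0)
n = 6
for _ in range(1000):
    prefs = [rng.randint(1, n) for _ in range(n)]
    assert is_parking_function(prefs) == all(
        v <= i + 1 for i, v in enumerate(sorted(prefs)))
print("checks passed")
```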
April 29: Robert Webber (Caltech)
Randomized low-rank approximation with higher accuracy and reduced costs
Randomized low-rank matrix approximation is one of the great success stories of randomized numerical linear algebra. It is a fast, scalable approach that has been widely adopted in scientific computing. However, the approach is most successful at approximating matrices with quickly decaying singular values since there can be significant random errors when the singular values decay slowly. Our ongoing work improves the accuracy of randomized low-rank matrix approximation for matrices with slow singular value decay. We study an emerging approach called ‘randomized block Krylov iteration’, which builds a rich approximation space from the output of repeated matrix-matrix multiplications. Our contributions are twofold: first, we derive an efficient implementation of randomized block Krylov iteration that reduces the standard runtime by half, and second, we establish error bounds that quantify the rapid convergence of randomized block Krylov iteration, demonstrating its advantage over alternative methods.
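As a rough illustration of the method being discussed, here is a textbook-style randomized block Krylov sketch in NumPy; this is a generic variant, not the speaker's optimized implementation, and the parameter choices are ours.

```python
import numpy as np

def randomized_block_krylov(A, rank, depth=4, rng=None):
    """Generic randomized block Krylov low-rank approximation: build the
    space spanned by [A G, (A A^T) A G, ..., (A A^T)^{depth-1} A G] for a
    Gaussian test matrix G, orthonormalize it, and project A onto it."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    G = rng.standard_normal((n, rank))
    Y, blocks = A @ G, []
    for _ in range(depth):
        blocks.append(Y)
        Y = A @ (A.T @ Y)                      # repeated matrix-matrix products
    Q, _ = np.linalg.qr(np.hstack(blocks))     # orthonormal Krylov basis
    U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return Q @ U[:, :rank], s[:rank], Vt[:rank]

# Example: a matrix with slowly decaying singular values.
rng = np.random.default_rng(0)
A = rng.standard_normal((600, 400)) * (1.0 / np.sqrt(np.arange(1, 401)))
U, s, Vt = randomized_block_krylov(A, rank=10)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))
```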
Fall 2021:
August 27:
September 3: Mehtaab Sawhney (MIT)
Approximate counting and sampling via local central limit theorems
We give deterministic approximate counting algorithms for the number of matchings and independent sets of a given size in bounded degree graphs. For matchings, the algorithm applies to any size bounded away from the maximum by a constant factor; for independent sets the algorithm applies to sizes up to the NP-hardness threshold. Our results are based on exploiting local central limit theorems as an algorithmic tool. For our results for independent sets, we prove a new local central limit theorem for the hard-core model. (with Vishesh Jain, Will Perkins, and Ashwin Sah)
September 24: Sourav Chatterjee (Stanford)
Local KPZ growth in a class of random surfaces
One of the main difficulties in proving convergence of discrete models of surface growth to the Kardar-Parisi-Zhang (KPZ) equation in dimensions higher than one is that the correct way to take a scaling limit, so that the limit is nontrivial, is not known in a rigorous sense. The same problem has so far prevented the construction of nontrivial solutions of the KPZ equation in dimensions higher than one. To understand KPZ growth without being hindered by this issue, I will introduce a new concept in this talk, called “local KPZ behavior”, which roughly means that the instantaneous growth of the surface at a point decomposes into the sum of a Laplacian term, a gradient squared term, a noise term, and a remainder term that is negligible compared to the other three terms and their sum. The main result is that for a general class of surfaces, which contains the model of directed polymers in a random environment as a special case, local KPZ behavior occurs under arbitrary scaling limits, in any dimension.
October 1: Xinyu Wu (CMU)
Separations between classical and quantum computational complexity through stochastic calculus
The Forrelation problem is: given two random Boolean vectors x and y, decide if x and y are uniformly random or if x is uniformly random and y is the Walsh-Fourier transform of x. Variants of this problem show up as examples of functions hard for some classical complexity classes but easy for the corresponding quantum complexity classes. In this talk, I will present an approach to analyze the complexity of the Forrelation problem using techniques from stochastic calculus. Using this technique, I will give some simplified proofs of recent results by Raz-Tal and Girish-Raz-Zhan.
October 8: Ankur Moitra (MIT)
Rethinking Robustness: From Classification to Contextual Bandits
What does it mean for a learning algorithm to be robust? We could hope to tolerate adversarial corruptions, but this makes many simple learning tasks computationally intractable. We could assume a particular stochastic model for noise, but then our algorithms might be overtuned to this one distribution, which was only meant to be a simplifying assumption.
In this talk we will revisit some classic learning problems, like learning halfspaces, online regression, and contextual bandits, in models that blend worst-case and average-case noise. We will give simple twists on old algorithms that allow them to enjoy strong provable robustness guarantees. Finally, we will explore some intriguing connections between robustness and fairness.
October 15: [no talk] [Fall break]
October 22: Song Mei (UC Berkeley)
Local convexity of the TAP free energy and AMP convergence for Z2-synchronization
October 29: Deeparnab Chakrabarty (Dartmouth)
A polynomial lower bound on the number of rounds for efficient submodular function minimization
Submodular function minimization (SFM) is a fundamental discrete optimization problem generalizing problems such as minimum cut problems in graphs and hypergraphs, and the matroid intersection problem. It is well-known and remarkable that a submodular function on an N-element universe can be minimized using only poly(N) evaluation queries. However, all such efficient algorithms have poly(N) rounds of adaptivity. A natural question is whether one can get efficient parallel SFM algorithms with far fewer rounds of adaptivity. Recently, in STOC 2020, Balkanski and Singer proved an Omega(log N/log log N) lower bound on the rounds of adaptivity. However, it left open the possibility of a polylogarithmic-depth efficient algorithm; such algorithms are indeed known for efficient (approximate) submodular function *maximization* versions.
November 5: Gabor Lugosi (ICREA)
Learning the structure of graphical models by covariance queries
The dependence structure of high-dimensional distributions is often modeled by graphical models. The problem of learning the graph underlying such distributions has received a lot of attention in statistics and machine learning. In problems of very high dimension, it is often too costly even to store the sample covariance matrix. We propose a new model in which one can query single entries of the covariance matrix. We propose computationally efficient algorithms for structure recovery in Gaussian graphical models with computational complexity that is quasi-linear in the dimension. We present algorithms that work for trees and, more generally, for graphs of small treewidth. The talk is based on joint work with Jakub Truszkowski, Vasiliki Velona, and Piotr Zwiernik.
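To fix ideas about the query model, here is a naive baseline (ours, not the speakers' algorithm): it queries all O(d^2) covariance entries, which is exactly the cost their quasi-linear algorithms avoid, and recovers a Gaussian tree model as the maximum-weight spanning tree of absolute correlations (valid because correlations multiply, hence decay, along tree paths). The oracle `query_cov` is a hypothetical name for a single-entry covariance query.

```python
import numpy as np

def recover_tree_naive(query_cov, d):
    """Baseline tree recovery: query every covariance entry, then build the
    maximum-weight spanning tree of |correlation| with Prim's algorithm."""
    C = np.array([[query_cov(i, j) for j in range(d)] for i in range(d)])
    W = np.abs(C) / np.sqrt(np.outer(np.diag(C), np.diag(C)))
    in_tree, edges = {0}, []
    while len(in_tree) < d:
        i, j = max(((a, b) for a in in_tree
                    for b in range(d) if b not in in_tree),
                   key=lambda e: W[e[0], e[1]])
        edges.append((i, j))
        in_tree.add(j)
    return edges
```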
Thursday, November 11, 10AM: Afonso Bandeira (ETH Zurich)
NOTE THE NONSTANDARD DAY AND TIME
Noncommutative Matrix Concentration Inequalities
Matrix concentration inequalities for the spectrum of random matrices, such as Matrix Bernstein, have played an important role in many areas of pure and applied mathematics. These inequalities are intimately related to the celebrated noncommutative Khintchine inequality of Lust-Piquard and Pisier. In this talk we leverage ideas from free probability to remove the dimensional dependence in the noncommutative Khintchine inequality in a range of instances, yielding sharp bounds in many settings of interest. As a byproduct we develop non-asymptotic matrix concentration inequalities that capture non-commutativity (or, to be more precise, “freeness”).
Joint work with March Boedihardjo and Ramon van Handel, more information at arXiv:2108.06312 [math.PR].
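For context, one standard form of the noncommutative Khintchine bound referred to above, for fixed self-adjoint $d \times d$ matrices $A_1, \dots, A_n$ and iid standard Gaussians $g_1, \dots, g_n$ (stated up to universal constants), is

$$ \Big\|\Big(\sum_{i=1}^n A_i^2\Big)^{1/2}\Big\| \;\lesssim\; \mathbb{E}\,\Big\|\sum_{i=1}^n g_i A_i\Big\| \;\lesssim\; \sqrt{\log d}\;\Big\|\Big(\sum_{i=1}^n A_i^2\Big)^{1/2}\Big\|, $$

and it is the removal of the $\sqrt{\log d}$ factor, in suitable regimes, that is the subject of the talk.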
November 19: Nina Holden (ETH Zurich/ NYU) [talk via Zoom]
Conformal welding in Liouville quantum gravity: recent results and applications
November 26: [no talk] [Thanksgiving]
December 3: Melih Iseri (USC)
Set Values for Mean Field Games
Previous Talks:
Spring 2021:
January 15: Steven Heilman (USC)
Three Candidate Plurality is Stablest for Small Correlations
Suppose we model n votes in an election between two candidates as n i.i.d. uniform random variables in {-1,1}, so that 1 represents a vote for the first candidate, and -1 represents a vote for the other candidate. Then, for each vote, we flip a biased coin (with fixed probability larger than 1/2 of landing heads). If the coin lands tails, the vote is changed to the other candidate. In an election where each voter has a small influence on the election’s outcome, and each candidate has an equal chance of winning, the majority function best preserves the election’s outcome, when comparing the original election vote to the corrupted vote. This Majority is Stablest Theorem was proven in 2005 by Mossel-O’Donnell-Oleszkiewicz. The corresponding statement for elections between three or more candidates has remained open since its formulation in 2004 by Khot-Kindler-Mossel-O’Donnell. We show that Plurality is Stablest for elections with three candidates, when the original and corrupted votes have a sufficiently small correlation. In fact, this result is a corollary of a more general structure theorem that applies for elections with any number of candidates and any correlation parameter.
(joint with Alex Tarter)
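The voting model in the abstract is straightforward to simulate; the following Monte Carlo sketch (ours, with illustrative parameter values) estimates the probability that independent vote flips preserve the majority outcome:

```python
import numpy as np

rng = np.random.default_rng(0)

def majority_stability(n, flip_prob, trials=20000):
    """Estimate P(majority outcome unchanged) when each of n uniform +/-1
    votes is independently flipped with probability flip_prob (n odd,
    so no ties occur)."""
    votes = rng.choice([-1, 1], size=(trials, n))
    flips = np.where(rng.random((trials, n)) < flip_prob, -1, 1)
    return np.mean(np.sign(votes.sum(axis=1))
                   == np.sign((votes * flips).sum(axis=1)))

print(majority_stability(101, flip_prob=0.1))
```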
January 22: Jimmy He (Stanford University)
Random walks on finite fields with deterministic jumps
Recently, Chatterjee and Diaconis showed that most bijections, if applied between steps of a Markov chain, cause the resulting chain to mix much faster. However, explicit examples of this speedup phenomenon are rare. I will discuss recent work studying such walks on finite fields where the bijection is algebraically defined, and give a near-linear mixing time. This work gives a large collection of examples where this speedup phenomenon occurs. These walks can be seen as a non-linear analogue of the Chung-Diaconis-Graham process, where the bijection is multiplication by a non-zero element of the finite field.
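For reference, the Chung-Diaconis-Graham process mentioned at the end can be simulated in a few lines; this sketch (our parameter choices) tracks the empirical total-variation distance to uniformity on Z/pZ:

```python
import numpy as np

def cdg_distribution(p, steps, reps=200000, seed=0):
    """Empirical distribution of the Chung-Diaconis-Graham walk
    X_{k+1} = 2 X_k + eps_k (mod p), with eps_k uniform on {-1, 0, 1}."""
    rng = np.random.default_rng(seed)
    x = np.zeros(reps, dtype=np.int64)
    for _ in range(steps):
        x = (2 * x + rng.integers(-1, 2, size=reps)) % p
    return np.bincount(x, minlength=p) / reps

p = 101
dist = cdg_distribution(p, steps=15)
print(0.5 * np.abs(dist - 1.0 / p).sum())   # TV distance to uniform
```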
January 29: Hoi Nguyen (Ohio State University)
Universality results for the cokernels of random integral matrices
For various general random matrix models with integral entries, we discuss the probability that the cokernels are isomorphic to a given finite abelian group, or are cyclic. We will show that these statistics are asymptotically universal, depending only on the symmetry class of the matrices.
Based on joint works with M. M. Wood.
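Concretely, the cokernel of an integer matrix M is Z^n / M Z^n, and its invariant factors can be read off the Smith normal form; a small illustrative computation (assuming SymPy's `smith_normal_form`; the helper name is ours) is:

```python
import random
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form

def cokernel_invariants(n, seed=0):
    """Invariant factors of coker(M) = Z^n / M Z^n for a random n x n
    matrix with iid uniform {-1, +1} entries, via Smith normal form.
    A zero factor means M is singular (infinite cokernel); the cokernel
    is cyclic iff at most one factor differs from 1."""
    rng = random.Random(seed)
    M = Matrix(n, n, lambda i, j: rng.choice([-1, 1]))
    D = smith_normal_form(M, domain=ZZ)
    return [D[i, i] for i in range(n)]

print(cokernel_invariants(6))
```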
February 5: Dmitrii Ostrovskii (USC)
Near-Optimal Model Discrimination
In the standard setup of parametric M-estimation with convex loss, we consider two population risk minimizers associated with two possible distributions of a random observation, and then ask the following question:
Given the value of one of the two minimizers and access to i.i.d. sampling from both distributions, what sample sizes allow one to identify the “right” distribution, i.e., the one corresponding to the given value?
February 12: Tomasz Tkocz (Carnegie Mellon University)
Khinchin inequalities and sections of convex bodies
I shall discuss some recent results on sharp moment comparison inequalities for weighted sums of i.i.d. random variables, a.k.a. Khinchin inequalities, as well as highlight how this relates to the problem of finding sections of convex bodies of extremal volume.
February 19: Nick Cook (Duke)
Universality for the minimum modulus of random trigonometric polynomials
We consider the restriction to the unit circle of random degree-n polynomials with iid normalized coefficients (Kac polynomials). Recent work of Yakir and Zeitouni shows that for Gaussian coefficients, the minimum modulus (suitably rescaled) follows a limiting exponential distribution. We show this is a universal phenomenon, extending their result to arbitrary sub-Gaussian coefficients, such as Rademacher signs. Our approach relates the joint distribution of small values at several angles to that of a random walk in high-dimensional phase space, for which we obtain strong central limit theorems. For discrete coefficients this requires dealing with arithmetic structure among the angles. Based on joint work with Hoi Nguyen.
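A quick numerical illustration of the universality claim (our sketch, approximating the circle by a fine angular grid): compare the minimum modulus for Gaussian versus Rademacher coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_modulus(n, rademacher=False, grid=1 << 13):
    """Approximate min over the unit circle of |f(e^{it})| for a random
    degree-n Kac polynomial f, by evaluating on a uniform angular grid."""
    a = (rng.choice([-1.0, 1.0], size=n + 1) if rademacher
         else rng.standard_normal(n + 1))
    t = np.linspace(0.0, 2 * np.pi, grid, endpoint=False)
    return np.abs(np.polynomial.polynomial.polyval(np.exp(1j * t), a)).min()

n, reps = 300, 100
gauss = [min_modulus(n) for _ in range(reps)]
rade = [min_modulus(n, rademacher=True) for _ in range(reps)]
print(np.median(gauss), np.median(rade))  # comparable, as universality suggests
```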
February 26: Anderson Ye Zhang (Wharton School of the University of Pennsylvania)
Optimal Ranking Recovery from Pairwise Comparisons
Ranking from pairwise comparisons is a central problem in a wide range of learning and social contexts. Researchers in various disciplines have made significant methodological and theoretical contributions to it. However, many fundamental statistical properties remain unclear, especially for the recovery of ranking structure. This talk presents two recent projects towards optimal ranking recovery under the Bradley-Terry-Luce (BTL) model.
In the first project, we study the problem of top-k ranking, that is, optimally identifying the set of top-k players. We derive the minimax rate and show that it can be achieved by the MLE. On the other hand, we show that another popular algorithm, the spectral method, is in general suboptimal.
In the second project, we study the problem of full ranking among all players. The minimax rate exhibits a transition between an exponential rate and a polynomial rate depending on the magnitude of the signal-to-noise ratio of the problem. To the best of our knowledge, this phenomenon is unique to full ranking and has not been seen in any other statistical estimation problem. A divide-and-conquer ranking algorithm is proposed to achieve the minimax rate.
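For concreteness, here is a minimal simulation of the BTL model together with a plain gradient-ascent MLE (our illustrative code, not the speakers' estimators or rate analysis):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_btl(theta, L):
    """L comparisons per pair; P(i beats j) = sigmoid(theta_i - theta_j)."""
    n = len(theta)
    W = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            p_ij = 1.0 / (1.0 + np.exp(theta[j] - theta[i]))
            W[i, j] = rng.binomial(L, p_ij)
            W[j, i] = L - W[i, j]
    return W

def btl_mle(W, L, steps=2000, lr=0.5):
    """Gradient ascent on the BTL log-likelihood, scores centered at zero."""
    n = W.shape[0]
    theta = np.zeros(n)
    for _ in range(steps):
        P = 1.0 / (1.0 + np.exp(theta[None, :] - theta[:, None]))
        np.fill_diagonal(P, 0.0)
        grad = (W - L * P).sum(axis=1)       # d(log-likelihood)/d(theta_i)
        theta += lr * grad / (L * n)
        theta -= theta.mean()
    return theta

theta = np.linspace(-1, 1, 8)
theta_hat = btl_mle(simulate_btl(theta, L=200), L=200)
print(np.argsort(-theta_hat))                # recovered ranking, best first
```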
March 5: Paata Ivanisvili (NC State)
Exponential integrability for Gaussian measures
Let X be a standard Gaussian random vector. In this talk I will discuss conditions on a real-valued test function f such that the exponential integral E exp(f(X)) is finite. Talagrand showed that if the exponential of the square of the gradient of f, divided by 2, is integrable with respect to the standard Gaussian measure, then this guarantees finiteness of E exp(f(X)). In joint work with Ryan Russell, we sharpen this result and also provide quantitative, dimension-independent bounds relating these two quantities.
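In symbols, the implication described in the abstract reads (with $X \sim N(0, I_n)$):

$$ \mathbb{E}\, e^{\,|\nabla f(X)|^2/2} < \infty \quad \Longrightarrow \quad \mathbb{E}\, e^{\,f(X)} < \infty, $$

and the joint work sharpens this and quantifies the relation between the two expectations.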
March 12: Movie Screening: Secrets of the Surface, The Mathematical Vision of Maryam Mirzakhani [Wellness Day]
March 19: Jianfeng Zhang (USC)
Mean Field Game Master Equations with Non-separable Hamiltonians and Displacement Monotonicity
It is well known that the monotonicity condition is crucial for the global wellposedness of mean field game master equations. In the literature the monotonicity condition is proposed only for separable Hamiltonians. In this talk, we propose a structural condition on non-separable Hamiltonians, in the spirit of the so-called displacement monotonicity condition, and then establish the global wellposedness for such master equations. The talk is based on a joint work with Gangbo, Meszaros, and Mou.
April 9: Arash A. Amini (UCLA)
Adjusted chi-square test for degree-corrected block models
We propose a goodness-of-fit test for degree-corrected stochastic block models (DCSBM). The test is based on an adjusted chi-square statistic for measuring equality of means among groups of $n$ multinomial distributions with $d_1,\dots,d_n$ observations. In the context of network models, the number of multinomials, $n$, grows much faster than the number of observations, $d_i$, corresponding to the degree of node $i$, hence the setting deviates from classical asymptotics. We show that a simple adjustment allows the statistic to converge in distribution, under null, as long as the harmonic mean of $\{d_i\}$ grows to infinity.
When applied sequentially, the test can also be used to determine the number of communities. The test operates on a (row) compressed version of the adjacency matrix, conditional on the degrees, and as a result is highly scalable to large sparse networks. We incorporate a novel idea of compressing the rows based on a $(K+1)$-community assignment when testing for $K$ communities. This approach increases the power in sequential applications without sacrificing computational efficiency, and we prove its consistency in recovering the number of communities. Since the test statistic does not rely on a specific alternative, its utility goes beyond sequential testing and can be used to simultaneously test against a wide range of alternatives outside the DCSBM family.
The test can also be easily applied to Poisson count arrays in clustering or biclustering applications, as well as bipartite and directed networks. We show the effectiveness of the approach by extensive numerical experiments with simulated and real data. In particular, applying the test to the Facebook-100 dataset, a collection of one hundred social networks, we find that a DCSBM with a small number of communities (say $ < 25$) is far from a good fit in almost all cases. Despite the lack of fit, we show that the statistic itself can be used as an effective tool for exploring community structure, allowing us to construct a community profile for each network.
April 16: Ramon van Handel (Princeton University)
Vector concentration inequalities
How do classical concentration inequalities extend to functions taking values in normed spaces? Such questions arise in various settings (in functional analysis, random matrix theory, etc.), and are generally studied on a case-by-case basis. In the Gaussian case, however, Pisier discovered a powerful principle that explains in complete generality how vector-valued functions concentrate. Unfortunately, Pisier’s argument appears to be very specific to the Gaussian distribution, and until recently this phenomenon was not known to hold in any other setting.
In this talk, I will explain that Pisier’s principle is much more universal than expected: a completely analogous result holds under the classical (Bakry-Emery) conditions used in the theory of scalar concentration inequalities. Moreover, a modified form of this principle holds also in discrete settings (as was first observed, in the special case of the discrete cube, in joint work with Ivanisvili and Volberg that settled an old conjecture of Enflo in Banach space theory). These new inequalities provide very general tools for establishing concentration properties of vector-valued functions.
April 23: Tselil Schramm (Stanford University)
Computational Barriers to Estimation from Low-Degree Polynomials
April 30: Mark Rychnovsky (Columbia) [Wellness Day]
Extreme particles for sticky Brownian motions
Sticky Brownian motions behave as independent Brownian motions when they are separated and interact when they touch, so that the coincidence time has positive Lebesgue measure with positive probability. For a specific sticky interaction we show that as the number of particles and the time the particles are allowed to evolve are simultaneously sent to infinity, the fluctuations of the extremal particle are given by the GUE Tracy-Widom distribution. The proof involves realizing a specific sticky interaction as the limit of an exactly solvable statistical mechanics model, the Beta random walk in random environment, and using this to derive exact integral formulas.
This is joint work with Guillaume Barraquand.
Fall 2020:
Aug 28: Varun Jog (U. Wisconsin-Madison)
Sep 04: Zhou Fan (Yale University)
Sep 11: Katherine Bouman (Caltech)
Capturing the First Image of a Black Hole & Designing the Future of Black Hole Imaging
This talk will present the methods and procedures used to produce the first image of a black hole from the Event Horizon Telescope, as well as discuss future developments for black hole imaging. It had been theorized for decades that a black hole would leave a “shadow” on a background of hot gas. Taking a picture of this black hole shadow would help to address a number of important scientific questions, both on the nature of black holes and the validity of general relativity. Unfortunately, due to its small size, traditional imaging approaches require an Earth-sized radio telescope. In this talk, I discuss techniques the Event Horizon Telescope Collaboration has developed to photograph a black hole using the Event Horizon Telescope, a network of telescopes scattered across the globe. Imaging a black hole’s structure with this computational telescope required us to reconstruct images from sparse measurements, heavily corrupted by atmospheric error. The talk will also discuss future developments, including new imaging techniques and how we are developing machine learning methods to help design future telescope arrays.
Sep 18: Michael Cranston (UC Irvine)
The Riemann zeta distribution
Sep 25: Philippe Sosoe (Cornell University)
Oct 2: James Lee (University of Washington)
Scaling exponents in stationary random media
Oct 9: Josh Starmer (StatQuest)
The Elements of StatQuest: How to clearly explain complicated sounding things.
Oct 16: Yosi Rinott (Hebrew University of Jerusalem)
2:15PM START. NOTE THE NONSTANDARD TIME
Statistical Aspects of Quantum Supremacy Demonstration
Quantum supremacy as presented by Google’s team in 2019, and as it is expected to be in the near future, consists of demonstrating that a quantum circuit is capable of generating bitstrings from a distribution (which is itself random) that is considered hard to simulate on classical computers. This is called a sampling task (which is impractical for now). Quantum computers perform such a task with a great deal of noise, and efficient statistical analysis is needed in order to verify that the right distribution was indeed sampled, and to estimate the noise level or fidelity. Some interesting statistical questions arise. (joint with Gil Kalai)
Oct 23: Xin Tong (USC Marshall)
Individual-centered partial information in social networks
Most existing statistical network analysis literature assumes a global view of the network, under which community detection, testing, and other statistical procedures are developed. Yet in the real world, people frequently make decisions based on their partial understanding of the network information. As individuals barely know beyond friends’ friends, we assume that an individual of interest knows all paths of length up to L = 2 that originate from her. As a result, this individual’s perceived adjacency matrix B differs significantly from the usual adjacency matrix A based on the global information. The new individual-centered partial information framework sparks an array of interesting endeavors from theory to practice. Key general properties of the eigenvalues and eigenvectors of BE, a major term of B, are derived. These general results, coupled with the classic stochastic block model, lead to a new theory-backed spectral approach to detecting community memberships based on an anchored individual’s partial information. Real data analysis delivers interesting insights that cannot be obtained from global network analysis. (joint with Xiao Han)
Oct 30: Andrea Montanari (Stanford University)
Self-induced regularization from linear regression to neural networks
Modern machine learning methods –most noticeably multi-layer neural networks– require fitting highly non-linear models comprising tens of thousands to millions of parameters. Despite this, little attention is paid to the regularization mechanism that controls the model’s complexity. Indeed, the resulting models are often so complex as to achieve vanishing training error: they interpolate the data. Despite this, these models generalize well to unseen data: they have small test error. I will discuss several examples of this phenomenon, beginning with a simple linear regression model, and ending with two-layer neural networks in the so-called lazy regime. For these examples precise asymptotics could be determined mathematically, using tools from random matrix theory. I will try to extract a unifying picture. A common feature is the fact that a complex unregularized nonlinear model becomes essentially equivalent to a simpler model, which is however regularized in a non-trivial way.
[Based on joint papers with: Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Feng Ruan, Youngtak Sohn, Jun Yan, Yiqiao Zhong]
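The simplest instance mentioned above, minimum-norm ("ridgeless") linear regression that interpolates the training data, is easy to reproduce; this sketch (our parameter choices) fits more parameters than samples yet keeps the excess risk bounded:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 400                     # overparameterized: d > n
X = rng.standard_normal((n, d))
beta = np.zeros(d)
beta[:5] = 1.0                      # a simple planted signal
y = X @ beta + 0.1 * rng.standard_normal(n)

beta_hat = np.linalg.pinv(X) @ y    # minimum-l2-norm interpolator
print(np.linalg.norm(X @ beta_hat - y))      # ~0: training error vanishes
print(np.linalg.norm(beta_hat - beta) ** 2)  # excess risk for isotropic test
                                             # features, yet it stays bounded
```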
Nov 6: Xiaohui Chen (University of Illinois at Urbana-Champaign)
A diffusion perspective of manifold clustering
Nov 13: Sumit Mukherjee (Columbia University)
Spring 2020:
January 17: Tianyi Zheng (UCSD)
Invariant random subgroups of groups acting on trees
A group is called permutation stable if almost homomorphisms to the symmetric groups must asymptotically be close to actual homomorphisms. Recently Becker, Lubotzky, and Thom gave a characterization of permutation stability for amenable groups in terms of invariant random subgroups (IRSs). We will discuss some examples of amenable groups acting on the rooted tree, including Grigorchuk groups, whose IRSs have special properties. The information on IRSs allows one to verify the Becker-Lubotzky-Thom criterion, thus providing a large collection of permutation stable groups.
January 24: Asaf Ferber (UC Irvine)
Resilience of the rank of random matrices
Let M be an n × m matrix of independent Rademacher (±1) random variables. It is well known that if n ≤ m, then M is of full rank with high probability. We show that this property is resilient to adversarial changes to M. More precisely, if m ≥ n+n^{1-ε}, then even after changing the sign of (1 − ε)m/2 entries, M is still of full rank with high probability. Note that this is asymptotically best possible as one can easily make any two rows proportional with at most m/2 changes. Moreover, this theorem gives an asymptotic solution to a slightly weakened version of a conjecture made by Van Vu.
Joint work with Kyle Luh and Gwen McKinley
January 31: Carina Betken (Ruhr University Bochum)
Limit theorems in dynamic random networks via Stein’s method
Many real-world phenomena can be modelled by dynamic random networks. We will focus on preferential attachment models where the networks grow in time and new vertices connect to older ones with probability depending on a sublinear function of the degree of the older vertex. Developing Stein’s method for a class of limiting degree distributions provides rates of convergence in total variation distance as the number of vertices tends to infinity. Stein’s method also yields a central limit theorem for the number of isolated vertices for networks with specific attachment functions.
This is joint work with Hanna Döring and Marcel Ortgiese.
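A minimal growth simulation of the model class in the abstract (our code; the attachment function here, d -> d^{1/2}, is just one example of a sublinear choice):

```python
import random

rng = random.Random(0)

def sublinear_pa_tree(T, weight=lambda d: d ** 0.5):
    """Grow a tree for T steps: each new vertex attaches to an existing
    vertex v with probability proportional to weight(deg(v)), a sublinear
    function of the degree."""
    deg, edges = [1, 1], [(0, 1)]          # start from a single edge 0--1
    for new in range(2, T + 2):
        w = [weight(d) for d in deg]
        target = rng.choices(range(len(deg)), weights=w)[0]
        edges.append((new, target))
        deg[target] += 1
        deg.append(1)
    return edges, deg

edges, deg = sublinear_pa_tree(10000)
print(max(deg))   # hubs grow much more slowly than under linear attachment
```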
February 7: Aditi Dandapani (Ecole Polytechnique/Columbia University) [Joint with Math Finance Colloquium]
From Quadratic Hawkes to Rough Volatility and Zumbach Effect
Using microscopic price models based on Hawkes processes, it has been shown that under some no-arbitrage condition, the high degree of endogeneity of markets together with the phenomenon of metaorders splitting generate rough Heston-type volatility at the macroscopic scale. One additional important feature of financial dynamics, at the heart of several influential works in econophysics, is the so-called feedback or Zumbach effect. This essentially means that past trends in returns convey significant information on future volatility. A natural way to reproduce this property in microstructure modeling is to use quadratic versions of Hawkes processes. We show that after suitable rescaling, the long term limits of these processes are refined versions of rough Heston models where the volatility coefficient is enhanced compared to the square root characterizing Heston-type dynamics. Furthermore, the Zumbach effect remains explicit in these limiting rough volatility models.
February 14: Moritz Voss (UCSB) [Joint with Math Finance Colloquium]
A two-player price impact game
We study the competition of two strategic agents for liquidity in the benchmark portfolio tracking setup of Bank, Soner, Voss (2017), both facing common aggregated temporary and permanent price impact à la Almgren and Chriss (2001). The resulting stochastic linear quadratic differential game with terminal state constraints allows for an explicitly available open-loop Nash equilibrium in feedback form. Our results reveal how the equilibrium strategies of the two players take into account the other agent’s trading targets: either in an exploitative intent or by providing liquidity to the competitor, depending on the ratio between temporary and permanent price impact. As a consequence, different behavioral patterns can emerge as optimal in equilibrium. These insights complement existing studies in the literature on predatory trading models examined in the context of optimal portfolio liquidation problems.
February 21: Ken Alexander (USC)
Geodesics, bigeodesics, and coalescence in first passage percolation in general dimension
In first passage percolation, iid passage times are attached to the bonds of the lattice Z^d. The geodesic from x to y is the path which minimizes the sum of passage times; a θ-ray is a path to ∞ with asymptotic direction θ for which every finite segment is a geodesic, and a bigeodesic is an analogous doubly infinite geodesic path. It is known that under mild hypotheses, for each starting point x and direction θ, there is a θ-ray from x. In d = 2 it is a.s. unique, and furthermore, for all x, y the θ-rays from x and y eventually coalesce, and there are no bigeodesics with θ as either asymptotic direction. We show that in general dimension, under somewhat stronger hypotheses, a weak form of coalescence holds: we take all θ-rays starting next to a hyperplane H_0, translate the hyperplane forward by distance R to give H_R, and consider the density of sites in H_R where one of the θ-rays first crosses H_R. We show this density approaches 0 as R → ∞, with near-optimal bounds on the rate. Essentially as a consequence, we show that there are no bigeodesics in any direction.
February 28: Jianfeng Zhang (USC)
March 6: Pierre Baldi (UC Irvine)
Mathematical Foundations of Deep Learning
March 13: [no talk]
March 20: [no talk]
[TBA]
March 25: Yaniv Plan (UBC) [CANCELLED]
April 3: [TBA]
April 10: Wuchen Li (UCLA)
Information Newton’s flow in probability space
We introduce a framework for Newton’s flows in probability space with information metrics, named information Newton’s flows. Here two information metrics are considered: the Fisher-Rao metric and the Wasserstein-2 metric. Several examples of information Newton’s flows for learning objective/loss functions are provided, such as the Kullback-Leibler (KL) divergence, maximum mean discrepancy (MMD), and cross-entropy. Asymptotic convergence results for the proposed Newton’s methods are provided. A known fact is that overdamped Langevin dynamics corresponds to the Wasserstein gradient flow of the KL divergence. Extending this fact to the Wasserstein Newton’s flow of the KL divergence, we derive Newton’s Langevin dynamics. We provide examples of Newton’s Langevin dynamics in both one-dimensional space and Gaussian families. In numerics, we design a sampling-efficient variational method to approximate information Newton’s directions. We restrict the dual variable of the Newton’s direction to an affine function space. Several numerical examples in Gaussian families and Bayesian logistic regression demonstrate the effectiveness of the proposed method.
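The "known fact" invoked above can be written as follows: the Fokker-Planck equation of overdamped Langevin dynamics is the Wasserstein-2 gradient flow of the KL divergence with respect to the target $\pi \propto e^{-V}$,

$$ dX_t = -\nabla V(X_t)\,dt + \sqrt{2}\,dB_t \quad\Longleftrightarrow\quad \partial_t \rho_t = \nabla \cdot (\rho_t \nabla V) + \Delta \rho_t = \nabla \cdot \Big( \rho_t\, \nabla\, \tfrac{\delta}{\delta \rho} \mathrm{KL}(\rho_t \,\|\, \pi) \Big). $$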
April 17: [TBA]
April 24: Qi Feng (USC)
Sub-Riemannian Ricci curvature via generalized Gamma z calculus and functional inequalities.
A drift-diffusion process with non-degenerate diffusion coefficient matrix possesses good properties: convergence to equilibrium, an entropy dissipation rate, etc. A degenerate drift-diffusion process has a diffusion coefficient matrix that is not positive definite, which makes it difficult to control the convergence and the entropy dissipation rate through the drift-diffusion coefficients alone, because of the lack of control of the system. In general, a degenerate drift-diffusion is intrinsically equipped with a sub-Riemannian structure defined by the diffusion coefficients. We propose a new methodology to systematically study general drift-diffusion processes through sub-Riemannian geometry and Wasserstein geometry. We generalize the Bakry-Émery calculus and Gamma z calculus to define a new notion of sub-Riemannian Ricci curvature tensor. With the new Ricci curvature tensor, we are able to establish generalized curvature-dimension bounds on sub-Riemannian manifolds which go beyond the step-two condition. As an application, for the first time, we establish analytical bounds for logarithmic Sobolev inequalities for the weighted measure on the displacement group and the Engel group. Our result also provides an entropy dissipation rate for Langevin dynamics with gradient drift and variable temperature matrix. The talk is based on joint works with Wuchen Li.
May 1: Galyna Livshyts (Georgia Tech)
A discussion on the Log-Brunn-Minkowski conjecture and related questions
We shall discuss the Log-Brunn-Minkowski conjecture, a conjectured strengthening of the Brunn-Minkowski inequality proposed by Boroczky, Lutwak, Yang and Zhang. The discussion will involve an introduction and explanation of how the local version of the conjecture arises naturally, a collection of ‘hands-on’ examples and elementary geometric tricks leading to various related partial results, statements of related questions, as well as a discussion of more technically involved approaches and results. Based on work with Johannes Hosle and Alexander Kolesnikov, as well as on previous joint results with Colesanti, Marsiglietti, Nayar, and Zvavitch.
Fall 2019:
Sep 6: Sarah Cannon (Department of Mathematical Sciences, Claremont McKenna College)
Counting independent sets in unbalanced bipartite graphs
We study the hard-core model (independent sets) on bipartite graphs using the cluster expansion from statistical physics. When there is a sufficient imbalance in the degrees or fugacities between the sides (L,R) of the bipartition, we can rewrite the hard-core partition function in terms of deviations from independent sets that are empty on one side of the bipartition and show this expression has a convergent cluster expansion. This has interesting algorithmic and probabilistic consequences. On the algorithmic side, we address an open problem in approximate counting and give a polynomial-time algorithm for approximating the partition function for a large class of bounded degree bipartite graphs; this includes, among others, the unweighted biregular case where the degrees satisfy d_R >= 7 d_L log d_L. Our approximation algorithm is based on truncating the cluster expansion. As a consequence of the method, we also prove that the hard-core model on such graphs exhibits exponential decay of correlations by utilizing connections between the cluster expansion and joint cumulants.
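For reference, the hard-core partition function being approximated is the independence polynomial of the graph at fugacity $\lambda$ (with per-side fugacities in the weighted bipartite setting):

$$ Z_G(\lambda) \;=\; \sum_{\substack{I \subseteq V(G)\\ I\ \text{independent}}} \lambda^{|I|}. $$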
Sep 13: Joint seminar with CAMS. Speaker: Sunčica Čanić (Department of Mathematics, UC Berkeley and Department of Mathematics, University of Houston)
Weak solutions to fluid-mesh-shell interaction problem
We give an overview of recent developments in the design of constructive existence proofs for nonlinear moving boundary problems involving 3D incompressible, viscous fluids and various elastic structures. Special attention will be paid to interactions involving elastic mesh-supported shells. Real-life examples of such problems are many, including the interaction between blood flow and vascular walls treated with mesh-like devices called stents. Examples of applications to vascular procedures will be shown.
September 20: Qing Zhou (Department of Statistics, UCLA)
September 27: Samuel B. Hopkins (Department of Electrical Engineering and Computer Sciences, UC Berkeley)
How to Estimate the Mean of a (Heavy-Tailed) Random Vector
We study polynomial-time algorithms for estimating the mean of a multivariate random vector under very mild assumptions: we assume only that the random vector X has finite mean and covariance. This allows for X to be heavy-tailed. In this setting, the radius of the confidence intervals achieved by the empirical mean is exponentially larger than in the case that X is Gaussian or sub-Gaussian. That is, the empirical mean is poorly concentrated.
We offer the first polynomial time algorithm to estimate the mean of X with sub-Gaussian-size confidence intervals under such mild assumptions. That is, our estimators are exponentially better-concentrated than the empirical mean. Our algorithm is based on a new semidefinite programming relaxation of a high-dimensional median. Previous estimators which assumed only existence of finitely-many moments of X either sacrifice sub-Gaussian performance or are only known to be computable via brute-force search procedures requiring time exponential in the dimension.
Based on https://arxiv.org/abs/1809.07425, to appear in the Annals of Statistics.
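As a point of comparison, the classical coordinatewise median-of-means baseline, which the SDP-based estimator in the talk improves upon in the multivariate setting, takes only a few lines (our illustrative code):

```python
import numpy as np

def median_of_means(X, n_blocks):
    """Split the sample into blocks, average each block, and take the
    coordinatewise median of the block means (a classical baseline; it
    does not attain the sub-Gaussian confidence radius in high dimension)."""
    blocks = np.array_split(X, n_blocks)
    return np.median(np.stack([b.mean(axis=0) for b in blocks]), axis=0)

rng = np.random.default_rng(0)
X = rng.standard_t(df=2.5, size=(10000, 20))   # heavy-tailed sample, mean 0
print(np.abs(median_of_means(X, 50)).max(),    # typically beats ...
      np.abs(X.mean(axis=0)).max())            # ... the empirical mean
```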
October 4: Elizaveta Rebrova (UCLA, Department of Mathematics)
Sketching for iterative linear solvers: strengths, weaknesses, and tools from random matrix theory.
A group of projection-based approaches for solving large-scale linear systems is among the most efficient in its simplicity. For example, the well-known Kaczmarz algorithm iteratively projects the previous approximation x_k onto the solution space of the next equation in the system (an elegant proof of the exponential convergence of this method under a correct randomization of the process was given in 2009 by Strohmer and Vershynin, and followed by many extensions and generalizations in the works of Needell, Tropp, Ward, Srebro, Tan, etc.). The “sketch-and-project” framework, introduced by Gower and Richtarik in 2016, is an interesting unifying view of a number of iterative solvers: one can observe that a random selection of the next equation (or a subset of equations) can be represented as a sketch, that is, left multiplication by a random vector (or matrix). I will give an overview of some of these methods and discuss the role that random matrix theory plays in proving their convergence, and then talk about our recent work with Deanna Needell on block Gaussian sketch-and-project methods.
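To make the Kaczmarz step concrete, here is a minimal randomized Kaczmarz implementation in the Strohmer-Vershynin style (rows sampled proportionally to their squared norms); it is a generic sketch, not the block Gaussian method from the talk:

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=5000, seed=0):
    """At each step, pick row i with probability ||a_i||^2 / ||A||_F^2 and
    project the iterate onto the hyperplane {x : a_i . x = b_i}."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms2 = (A ** 2).sum(axis=1)
    probs = row_norms2 / row_norms2.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x += (b[i] - A[i] @ x) / row_norms2[i] * A[i]
    return x

# Consistent overdetermined system: the error decays linearly in expectation.
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 50))
x_true = rng.standard_normal(50)
print(np.linalg.norm(randomized_kaczmarz(A, A @ x_true) - x_true))
```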
October 11: Simo Ndaoud (USC, Department of Mathematics)
Scaled minimax optimality in high-dimensional linear regression: A non-convex algorithmic regularization approach
Oct 18: [No talk]
October 25: Richard Arratia (USC, Department of Mathematics)
Proof of a 60-year-old conjecture of Sol Golomb, on the longest cycle for a random feedback shift register
In joint work with Rod Canfield and Al Hales, a 1959 conjecture by Sol Golomb has been proved. In simplest form, writing N=2^n, the length L(n) of the longest cycle, for a random feedback shift register of width n, with all 2^{N/2} feedback logics equally likely, has the same limit distribution for L(n)/N, as it would in the case of a random permutation of the N binary n-tuples with all N! permutations equally likely. The proof starts with a deterministic study of shotgun versus sequential editing, to avoid n-tuple repeats, and makes heavy use of the process version of Chen-Stein Poisson approximation, from Arratia, Goldstein, and Gordon 1992, applied to the indicators of leftmost (n-1)-tuple repeats in fair coin tossing sequences. This use of Chen-Stein is not to get a Poisson approximation for the probability of having no such repeats, but rather, in a regime where the expected number of such repeats is huge, to conclude that the process of all indicators of such repeats is close, in total variation distance, to the independent process with the same marginals. This work is available at https://arxiv.org/abs/1903.09183. Lots of background info is at https://writings.stephenwolfram.com/2016/05/solomon-golomb-19322016/
Nov 1: Pierre Bellec (Rutgers University)
A well-understood degrees-of-freedom adjustment appears in Stein’s Unbiased Risk Estimate (SURE) to construct an unbiased estimate of the mean square risk of almost any estimator μ͂; here the divergence of μ͂ plays the role of the degrees of freedom of the estimator. Thanks to Stein’s formula, unbiased estimates can be constructed not only for the risk of μ͂, but also for the risk of SURE itself (SURE for SURE): a simple unbiased estimate provides information about the squared distance between SURE and the squared estimation error of μ͂, again involving degrees-of-freedom adjustments.
A second area where degrees of freedom appear is the construction of confidence intervals. A novel analysis reveals that degrees-of-freedom adjustments play a major role in de-biasing methodologies for constructing confidence intervals in high dimension. We will see that in sparse linear regression with the Lasso and Gaussian designs, existing de-biasing schemes need to be modified with an adjustment that accounts for the degrees of freedom of the Lasso. This degrees-of-freedom adjustment is necessary for statistical efficiency in the regime s >> n^{2/3}. Here, the necessity of the degrees-of-freedom adjustment is explained using the interpolation path between Gaussian vectors typically used to derive Slepian’s lemma.
Nov 8: Sangchul Lee (UCLA, Department of Mathematics)
November 15: Deanna Needell (UCLA)
Simple approaches to complicated data analysis
Recent advances in technology have led to a monumental increase in large-scale data across many platforms. One mathematical model that has gained a lot of recent attention is the use of sparsity. Sparsity captures the idea that high dimensional signals often contain a very small amount of intrinsic information. Using this notion, one may design efficient low-dimensional representations of large-scale data as well as robust reconstruction methods for those representations. Binary, or one-bit, representations of data for example, arise naturally in many applications, and are appealing in both hardware implementations and algorithm design. In this talk, we provide a brief background to sparsity and 1-bit measurements, and present new results on the problem of data classification with low computation and resource costs. We illustrate the utility of the proposed approach on recently acquired data about Lyme disease.
November 22: Steven Heilman (USC)
Independent Sets in Random Graphs and Random Trees
An independent set of size k in a finite undirected graph is a set of k vertices of the graph, no two of which are connected by an edge. The structure of independent sets of size k as k varies is of interest in probability, statistical physics, combinatorics, and computer science. In 1987, Alavi, Malde, Schwenk and Erdos conjectured that the number of independent sets of size k in a tree is a unimodal sequence (this number goes up and then it goes down), and this problem is still open. A variation on this question is: what does the number of independent sets of size k look like in an Erdos-Renyi random graph, or a random tree, as k varies? By adapting an argument of Coja-Oghlan and Efthymiou, we show unimodality for Erdos-Renyi random graphs (with high probability as the number of vertices in the graph goes to infinity, when the expected degree of a single vertex is large). Time permitting, we will discuss a completely different problem: finding the maximum Gaussian perimeter of a convex set in the plane.
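The unimodality in question is easy to probe by brute force on tiny graphs (our illustrative code; the enumeration is exponential in n, so this is only feasible for very small graphs):

```python
import itertools
import random

def independent_set_counts(n, p, seed=0):
    """For a sample of G(n, p), count independent sets of each size k."""
    rng = random.Random(seed)
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < p}
    counts = [1] + [0] * n            # the empty set has size 0
    for k in range(1, n + 1):
        for S in itertools.combinations(range(n), k):
            if all((S[a], S[b]) not in edges
                   for a in range(k) for b in range(a + 1, k)):
                counts[k] += 1
    return counts

def is_unimodal(seq):
    peak = seq.index(max(seq))
    return (all(seq[i] <= seq[i + 1] for i in range(peak)) and
            all(seq[i] >= seq[i + 1] for i in range(peak, len(seq) - 1)))

counts = independent_set_counts(14, 0.5)
counts = counts[:max(k for k, c in enumerate(counts) if c > 0) + 1]
print(counts, is_unimodal(counts))
```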
Nov 29: [No talk]
Spring 2019:
January 11: Vlad Vicol (Princeton University) [joint with CAMS Colloquium]
Convex integration on thin sets
I will discuss the construction of wild weak solutions to the Navier-Stokes equation which are smooth on the complement of a thin set of times (with Hausdorff dimension strictly less than 1). This is based on joint work with T. Buckmaster and M. Colombo.
January 18: Omer Tamuz (Caltech Mathematics)
The Poisson boundary and the infinite conjugacy class property
Joint work with Joshua Frisch, Yair Hartman and Pooya Vahidi Ferdowsi.
January 25: Farruh Shahidi (Penn State)
Thermodynamics via inducing
We consider continuous maps f: X to X on compact metric spaces admitting inducing schemes of hyperbolic type as well as the induced maps \tilde{f} : \tilde{X}\to\tilde{X} and the associated tower maps \hat{f}:\hat{X}\to\hat{X}. For a certain class of potential functions on X, which includes all Holder continuous functions, we establish thermodynamic formalism for each of the above three systems and we describe some relations between the corresponding equilibrium measures. Furthermore we study ergodic properties of these equilibrium measures including the Bernoulli property, decay of correlations, and the Central Limit Theorem (CLT). Finally, we prove analyticity of the pressure function for the three systems.
Feb 1: Jun Yin (UCLA)
Anisotropic law and eigenvector empirical spectral distribution
Feb 8: Georg Menz (UCLA)
Ergodicity of the infinite swapping algorithm at low temperature
This talk is about joint work with André Schlichting, Wenpin Tang, and Tianqi Wu. Sampling Gibbs measures at low temperature is an important task but computationally very challenging. Numerical evidence suggests that the infinite-swapping algorithm (isa) is a promising method. The isa can be seen as an improvement of replica methods, which are very popular. We rigorously analyze the ergodic properties of the isa in the low-temperature regime, deducing Eyring-Kramers formulas for the spectral gap (or Poincaré constant) and the log-Sobolev constant. Our main result shows that the effective energy barrier can be reduced drastically using the isa compared to the classical over-damped Langevin dynamics. As a corollary, we derive a deviation inequality showing that sampling is also improved by an exponential factor. Furthermore, we analyze simulated annealing for the isa and show that the isa is again superior to the over-damped Langevin dynamics.
Feb 15: Qi Feng (USC)
Stochastic Analysis on Totally Geodesic Riemannian Foliations
In this talk, we will present stochastic analysis on totally geodesic Riemannian foliations equipped with a sub-Riemannian structure. We first show that the horizontal Wiener measure is quasi-invariant under the action of flows generated by suitable tangent processes. On the path space of totally geodesic foliations, we derive various Driver-type IBP (integration by parts) formulas, as well as a concentration property derived from improved log-Sobolev inequalities on the path space. I will use the Heisenberg group as an example to illustrate the ideas. In the end, I shall comment on the connections between stochastic analysis and Ricci flow.
Feb 22: Todd Kemp (UCSD)
Eigenvalues of Brownian Motion on GL(N), in the Large-N Limit
Mar 1: Nicolai Haydn (USC)
Local escape rates in dynamics
We consider a dynamical system with an invariant measure and the probability distribution of the time it takes to exit through a hole. This distribution is typically exponential, with a rate that depends on the measure of the hole. This rate, normalised by the measure of the hole, yields in the limit the localised escape rate, which, for $\phi$-mixing measures, we show to be equal to 1 at non-periodic points and equal to the extremal index at periodic points. We also give some examples, including equilibrium states, where we obtain the 'Pitskel value' at periodic points.
Mar 8: Daniel Lacker (Columbia) (joint with Math Finance Colloquium)
Beyond mean field limits: Local dynamics for large sparse networks of interacting processes
We study large systems of stochastic processes (particles), in discrete or continuous time, in which each particle is associated with a vertex in a graph and interacts only with its neighbors. It is known that when the graph is complete and the number of particles grows to infinity, the system is well-described by a nonlinear (McKean-Vlasov) process, which describes the behavior of one typical particle. For general (sparse) graphs, however, the system is no longer exchangeable, and the mean field approximation fails. Nevertheless, if the underlying graph is locally tree-like, we show that a single particle and its nearest neighbors are characterized by an autonomous evolution we call the “local dynamics.” These local dynamics are peculiar in that they are inevitably non-Markovian even if the original system is Markovian; the dynamics depend on the conditional law of the solution given the histories of certain particles. Joint work with Kavita Ramanan and Ruoyu Wu.
Mar 15: no seminar (spring break)
Mar 22: Krishnakumar Balasubramanian (UC Davis)
Mar 29: Daniel Kane (UCSD)
Computationally Efficient Robust Statistics
Robust statistics studies what can be learned from data even when some of it may have been corrupted. For example, the median is a robust estimator of the mean of a Gaussian. While a few wrong samples can throw off the mean significantly, their effect on the median is limited. However, when trying to generalize this simple estimator to many dimensions, things become substantially harder. Surprisingly, until recently all known techniques either produced much larger error than information-theoretically attainable, or required estimators that were computationally intractable to find. Only in the last few years were algorithms discovered that overcame both of these difficulties. We discuss these new algorithms and some of the related theory that has developed over the past few years.
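A two-line illustration of the opening example (our own toy data, assuming standard Gaussian samples): a small fraction of adversarial points moves the empirical mean far from the truth, while the median barely moves.

import numpy as np

rng = np.random.default_rng(0)
clean = rng.standard_normal(1000)                         # true mean 0
corrupted = np.concatenate([clean, np.full(50, 100.0)])   # 5% corruption

print(np.mean(corrupted))    # dragged to roughly 4.8 by the outliers
print(np.median(corrupted))  # still close to 0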
Apr 5: Hanbaek Lyu (UCLA)
Phase transition in random contingency tables with non-uniform margins
Contingency tables are matrices with nonnegative integer entries and fixed row and column margins. Understanding the structure of a uniformly chosen contingency table with given margins is an important problem, especially in statistics. For parameters $n,\delta,B,$ and $C$, let $X=(X_{k\ell})$ be the uniformly random contingency table whose first $\lfloor n^{\delta} \rfloor$ rows and columns have margin $\lfloor BCn \rfloor$ and whose other $n$ rows and columns have margin $\lfloor Cn \rfloor$. For any $0<\delta<1$, we establish a sharp phase transition of the limiting distribution of each entry of $X$ at the critical value $B_{c}=1+\sqrt{1+1/C}$. One of our main results shows that, for $1/2<\delta<1$, all entries have uniformly bounded expectation for $B<B_{c}$, but the mass concentrates at the smallest block and grows at the order of $n^{1-\delta}$ for $B>B_{c}$. We also establish a strong law of large numbers for the row sums within blocks. Joint work with Igor Pak and Sam Dittmer.
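To make the threshold concrete: taking $C = 1$ in the formula above gives $B_{c} = 1 + \sqrt{1 + 1/1} = 1 + \sqrt{2} \approx 2.414$, so the transition occurs once the heavy margins $\lfloor BCn \rfloor$ exceed roughly $2.414$ times the light margins $\lfloor Cn \rfloor$.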
Apr 12: [no speaker]
Apr 19: Bruce Driver (UCSD)
Global Existence of RDEs on Manifolds
In this talk we will discuss a theorem guaranteeing the existence of global (in time) solutions to rough path differential equations (RDEs) on a smooth manifold. These results depend on quantitative estimates of the quality of the truncated Baker-Campbell-Hausdorff-Dynkin formula for vector fields on the manifold.
Apr 26: Ilias Diakonikolas (USC CS)
The Degree-d Chow Parameters Problem
The degree-d Chow parameters of a Boolean function are its Fourier coefficients of degree at most d. A degree-d Polynomial Threshold Function (PTF) is a Boolean function of the form f(x) = sgn(p(x)), where p is a real polynomial of degree at most d. A classical result, dating back to the 1960s, shows that degree-d PTFs are uniquely specified by their degree-d Chow parameters. The algorithmic problem of reconstructing degree-d PTFs from their degree-d Chow parameters is of fundamental interest in several scientific disciplines, including learning theory, computational complexity, and game theory. In this talk, we will provide an overview of the recent progress on this problem.
Based on joint works with Daniel Kane (UCSD) and Chrystalla Pavlou (Edinburgh).
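For orientation, here is a brute-force Python sketch (our own, feasible only for small n) of what the Chow parameters are: the Fourier coefficients fhat(S) = E[f(x) * prod_{i in S} x_i] over all coordinate sets S with |S| <= d, for f mapping {-1,1}^n to {-1,1}.

from itertools import combinations, product
from math import prod

def chow_parameters(f, n, d):
    """All Fourier coefficients fhat(S) with |S| <= d, by exact enumeration."""
    points = list(product([-1, 1], repeat=n))
    return {
        S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
        for size in range(d + 1)
        for S in combinations(range(n), size)
    }

# Majority on 3 bits is a degree-1 PTF: sgn(x1 + x2 + x3).
maj3 = lambda x: 1 if sum(x) > 0 else -1
print(chow_parameters(maj3, n=3, d=1))
# {(): 0.0, (0,): 0.5, (1,): 0.5, (2,): 0.5}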
Fall 2018:
August 31: Johannes Muhle-Karbe (Carnegie Mellon University)
Equilibrium Asset Pricing with Transaction Costs
September 7: Roman Vershynin (UC Irvine)
From number theory to machine learning: hunting for smooth Boolean functions
September 14: Xin Tong (USC Marshall School of Business)
Neyman-Pearson classification
In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (that is, the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, alpha, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than alpha do not satisfy the type I error control objective because the resulting classifiers are still likely to have type I errors much larger than alpha. This talk introduces the speaker’s work on NP classification algorithms and their applications and raises current challenges under the NP paradigm.
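One way to see why matching the empirical type I error to alpha is not enough is through order statistics. The following Python sketch is in the spirit of the NP umbrella approach of Tong, Feng and Li (the score function and the targets alpha, delta are our assumptions, not the speaker's exact algorithm): rather than thresholding at the empirical (1 - alpha)-quantile of class-0 scores, it picks the order statistic whose probability of violating the type I error bound is at most delta, via a binomial tail.

import numpy as np
from scipy.stats import binom

def np_threshold(class0_scores, alpha=0.05, delta=0.05):
    """Classify as class 1 when score > returned threshold."""
    s = np.sort(class0_scores)
    m = len(s)
    for k in range(1, m + 1):
        # Thresholding at the k-th smallest class-0 score gives type I error
        # exceeding alpha with probability at most P(Binomial(m, 1-alpha) >= k).
        if binom.sf(k - 1, m, 1 - alpha) <= delta:
            return s[k - 1]
    raise ValueError("not enough class-0 scores for this (alpha, delta)")

rng = np.random.default_rng(0)
t = np_threshold(rng.standard_normal(1000))
print(t)  # noticeably larger than the naive 95% empirical quantile (about 1.64)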
September 21: Wei-Min Shen (USC Information Sciences Institute)
Self-Reconfigurable “SuperBot” Robots and Surprise-Based Learning (SBL)
Self-reconfigurable robots/UAVs autonomously adapt their logical or physical configurations and locomotion/manipulation capabilities (such as shape, size, and function) to the environments and tasks at hand. With modularity, versatility, self-healing ability, and low cost for mass production, such systems could provide powerful solutions for complex tasks in unstructured, uncertain, and dynamic environments. This talk first describes the major challenges for such research, including dynamic network topology, limited individual resources, lack of reliable global communication and synchronization, and constraints on centralized control. We then present a particular robotic system called “SuperBot” for self-assembly, self-healing, and self-reconfiguration, as well as coordinated precision locomotion and manipulation. Individual SuperBot units are modeled as “robotic stem cells” and use a biologically inspired mechanism called “digital hormones” to physically connect and disconnect with each other and to collaborate via content-based messages without any globally unique identifiers. Featured demonstration movies will show their modular designs, distributed control, topology-triggered discovery and behaviors, and their potential applications. A non-parametric machine learning technology called “Surprise-Based Learning (SBL)” has been developed over 30 years to enable individual or global systems to autonomously learn from their own experiences in both deterministic and uncertain environments. Based on logic and probability, SBL can detect and rectify model deficiencies or anomalies, build and refine predictive models (such as POMDPs), and recover from local failures. We will discuss some details of SBL from theoretical and practical aspects, including developmental psychology, cognitive architecture, system reliability, detectability and learnability of surprises, model convergence, correctness, and scalability, with collected simulation and physical experimental results.
September 28:
Zachary Feinstein (Washington University Saint Louis)
Moving Scalarizations for Time Consistency in Dynamic Multivariate Problems in Finance
In this talk we will consider the time consistency problem for dynamic multivariate risk measures. Such risk measures arise in markets with transaction costs and, separately, when studying systemic risk. We will introduce the family of all market-consistent scalar multivariate risk measures. Duality of such risk measures will be presented. These results will be used to provide a recursive formulation of multivariate scalar risk measures. We relate this recursive formulation to a notion of a “moving scalarization”, i.e. where the consistent risk-weights applied to the assets (in markets with transaction costs) or banks (in systemic risk) must change dynamically in time.
October 5: Jianfeng Zhang (USC)
Viscosity Solutions to Parabolic Master Equations
The master equation is a type of PDE whose state variable involves the distribution of a certain underlying state process, and is thus by nature an infinite dimensional problem. It is a powerful tool for studying the limit behavior of large interacting systems, including mean field games and systemic risk. It also appears naturally in stochastic control problems with partial information and in time inconsistent problems. In this talk we propose a novel notion of viscosity solution for parabolic master equations, arising mainly from control problems, and investigate its wellposedness. Our main innovation is to restrict the involved measures to a certain set of semimartingale measures that satisfies the desired compactness.
October 12: no seminar
October 19: Matthew Junge (Department of Mathematics, Duke University)
Diffusion-Limited Annihilating Systems
October 26: Steven Heilman (USC)
Symmetric Sets with Minimal Gaussian Surface Area
It is well known that a Euclidean set with fixed Gaussian volume and minimal Gaussian surface area is a half-space. On the other hand, finding the symmetric Euclidean set of fixed Gaussian volume and minimal Gaussian surface area has remained elusive since at least a work of Barthe from 2001. We show that if this set or its complement is convex and if a certain integral of its curvature is not close to 1, then the set must be a round cylinder. That is, with a small range of exceptions, we resolve the convex case of Barthe’s conjecture. Time permitting, we will discuss applications to theoretical computer science and to finding Euclidean partitions of fixed Gaussian volume and minimal Gaussian surface area.
November 2: Aaditya Ramdas (Department of Statistics and Data Science, Carnegie Mellon University)
Exponential line-crossing inequalities
This paper develops a class of exponential bounds for the probability that a martingale sequence crosses a time-dependent linear threshold. Our key insight is that it is both natural and fruitful to formulate exponential concentration inequalities in this way. We illustrate this point by presenting a single assumption and a single theorem that together strengthen many tail bounds for martingales, including classical inequalities (1960-80) by Bernstein, Bennett, Hoeffding, and Freedman; contemporary inequalities (1980-2000) by Shorack and Wellner, Pinelis, Blackwell, van de Geer, and de la Pena; and several modern inequalities (post-2000) by Khan, Tropp, Bercu and Touati, Delyon, and others. In each of these cases, we give the strongest and most general statements to date, quantifying the time-uniform concentration of scalar, matrix, and Banach-space-valued martingales, under a variety of nonparametric assumptions in discrete and continuous time. In doing so, we bridge the gap between existing line-crossing inequalities, the sequential probability ratio test, the Cramer-Chernoff method, self-normalized processes, and other parts of the literature.
(joint work with Steve Howard, Jas Sekhon and Jon McAuliffe, preprint https://arxiv.org/abs/1808.03204)
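To give the flavor of such a bound in the simplest setting (a standard special case we include for orientation, not the general theorem of the paper): if $(S_t)$ is a martingale with $S_0 = 0$ and $\sigma$-sub-Gaussian increments, then $M_t = \exp(\lambda S_t - \lambda^2 \sigma^2 t/2)$ is a supermartingale for each $\lambda > 0$; choosing $\lambda = 2b/\sigma^2$ makes the event $\{S_t \ge a + bt\}$ imply $\{M_t \ge e^{\lambda a}\}$, and Ville's maximal inequality then gives the time-uniform line-crossing bound
$$\Pr\left[\exists\, t : S_t \ge a + bt\right] \;\le\; e^{-\lambda a} \;=\; \exp\left(-\frac{2ab}{\sigma^2}\right).$$
Optimizing the line $(a,b)$ at a single fixed time recovers a Hoeffding-type bound, while the statement above holds simultaneously over all times.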
November 9: Joe Neeman (Department of Mathematics, UT-Austin)
November 16: no seminar (USC mathematical finance conference)
Monday November 19 [Note the nonstandard day]: Cesar Cuenca (Department of Mathematics, MIT)
November 23: no seminar (Thanksgiving)
Wednesday November 28 [Math colloquium]: Tim Austin (Department of Mathematics, UCLA)
The Weak Pinsker Property
This talk is about the structure theory of measure-preserving systems: transformations of a finite measure space that preserve the measure. Many important examples arise from stationary processes in probability, and simplest among these are the i.i.d. processes. In ergodic theory, i.i.d. processes are called Bernoulli shifts. Some of the main results of ergodic theory concern an invariant of systems called their entropy, which turns out to be intimately related to the existence of maps from a general system to Bernoulli shifts.
I will give an overview of this area and its history, ending with a recent advance in this direction. A measure-preserving system has the weak Pinsker property if it can be split, in a natural sense, into a direct product of a Bernoulli shift and a system of arbitrarily low entropy. The recent result is that all ergodic measure-preserving systems have this property.
This talk will assume some familiarity with measure theory and the basic language of random variables and their distributions. Past exposure to entropy or ergodic theory will not be essential.
Spring 2018:
January 12: Pierre-François Rodriguez (UCLA, Department of Mathematics)
On random walks and local limits in dimension 2
We will discuss recent developments relating Poissonian “loop soups,” à la Lawler-Werner, Le Jan and Sznitman, to random walks on large two-dimensional tori. We will show how the underlying correspondence can be used to effectively describe certain critical phenomena in such systems, which occur when the walk is run up to suitable timescales while forced to avoid a given point.
January 19: Andrew Stuart (Caltech)
Large Graph Limits of Learning Algorithms
Many problems in machine learning require the classification of high dimensional data. One methodology for approaching such problems is to construct a graph whose vertices are identified with data points, with edges weighted according to some measure of affinity between the data points. Algorithms such as spectral clustering, probit classification and the Bayesian level set method can all be applied in this setting. The goal of the talk is to describe these algorithms for classification and to analyze them in the limit of large data sets. Doing so leads to interesting problems in the calculus of variations, in stochastic partial differential equations and in Markov chain Monte Carlo, all of which will be highlighted in the talk. These limiting problems give insight into the structure of the classification problem, and into algorithms for it.
The talk is based on collaboration with Matt Dunlop (Caltech), Dejan Slepcev (CMU) and Matt Thorpe (Cambridge).
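A minimal Python sketch of the graph construction underlying spectral clustering (our own toy setup, assuming a Gaussian affinity and two well-separated clusters):

import numpy as np

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs in the plane.
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])

# Affinity matrix: w_ij = exp(-|x_i - x_j|^2 / (2 eps^2)).
eps = 0.5
d2 = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
W = np.exp(-d2 / (2 * eps**2))

# Unnormalized graph Laplacian L = D - W; the sign of the second
# eigenvector (the Fiedler vector) recovers the two clusters.
L = np.diag(W.sum(1)) - W
eigvals, eigvecs = np.linalg.eigh(L)
labels = (eigvecs[:, 1] > 0).astype(int)
print(labels[:50].mean(), labels[50:].mean())  # close to 0 and 1 (or 1 and 0)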
February 2: Jacob Bien (USC, Marshall School of Business)
Rare Feature Selection in High Dimensions
February 16: Timothy Cannings (USC, Marshall School of Business)
Local nearest neighbour classification with applications to semi-supervised learning
We derive a new asymptotic expansion for the global excess risk of a local $k$-nearest neighbour classifier, where the choice of $k$ may depend upon the test point. We prove that, provided the $d$-dimensional marginal distribution of the features has a finite $\rho$th moment for some $\rho > 4$ (as well as other regularity conditions), a local choice of $k$ can yield a rate of convergence of the excess risk of $O(n^{-4/(d+4)})$, where $n$ is the sample size, whereas for the standard $k$-nearest neighbour classifier, our theory would require $d \geq 5$ and $\rho > 4d/(d-4)$ finite moments to achieve this rate. Our results motivate a new $k$-nearest neighbour classifier for semi-supervised learning problems, where the unlabelled data are used to obtain an estimate of the marginal feature density, and fewer neighbours are used for classification when this density estimate is small. The potential improvements over the standard $k$-nearest neighbour classifier are illustrated both through our theory and via a simulation study.
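A hedged Python sketch of the local-k idea (an illustration, not the paper's exact tuning): estimate the marginal density from the unlabelled data and use fewer neighbours where the density estimate is small. The scaling $k(x) \sim (n \hat{f}(x))^{4/(d+4)}$ below is one natural choice matching rates of this type, and should be treated as our assumption.

import numpy as np
from scipy.stats import gaussian_kde

def local_knn_predict(X_train, y_train, X_unlab, x, d):
    n = len(X_train)
    fhat = gaussian_kde(X_unlab.T)(x.reshape(-1, 1))[0]   # marginal density estimate
    k = int(np.clip((n * fhat) ** (4 / (d + 4)), 1, n))   # local choice of k
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dist)[:k]
    return int(y_train[nearest].mean() > 0.5)             # majority vote

# Example: d = 2, labels depend on the first coordinate.
rng = np.random.default_rng(0)
Xtr = rng.standard_normal((500, 2)); ytr = (Xtr[:, 0] > 0).astype(int)
Xun = rng.standard_normal((5000, 2))
print(local_knn_predict(Xtr, ytr, Xun, np.array([1.0, 0.0]), d=2))  # likely 1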
February 23: Wen Sun (USC, Marshall School of Business)
A General Framework for Information Pooling in Large-Scale Multiple Testing
This talk discusses a general framework for exploiting sparsity information in two-sample multiple testing problems. We propose to first construct a covariate sequence, in addition to the usual primary test statistics, to capture the sparsity structure, and then to incorporate the auxiliary covariates in inference via a three-step algorithm consisting of grouping, adjusting and pooling (GAP). The GAP procedure provides a simple and effective framework for information pooling. An important advantage of GAP is its capability of handling various dependence structures, such as those arising in multiple testing for high-dimensional linear regression, differential correlation analysis, and differential network analysis. We establish general conditions under which GAP is asymptotically valid for false discovery rate control, and show that these conditions are fulfilled in a range of applications. Numerical results demonstrate that existing methods can be much improved by the proposed framework. An application to a breast cancer study for identifying gene-gene interactions will be discussed if time permits.
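The following Python sketch illustrates the grouping / adjusting / pooling pattern in a crude form (our own illustration of the idea, not the paper's GAP procedure): split hypotheses on an auxiliary covariate, adjust p-values by a per-group null-proportion estimate, then pool and run Benjamini-Hochberg.

import numpy as np

def bh_reject(pvals, q):
    """Benjamini-Hochberg at level q; returns a boolean rejection mask."""
    m = len(pvals)
    order = np.argsort(pvals)
    below = np.sort(pvals) <= q * np.arange(1, m + 1) / m
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

def gap_style(pvals, covariate, q=0.1):
    # Grouping: split on the median of the auxiliary covariate.
    groups = covariate > np.median(covariate)
    adjusted = np.array(pvals, dtype=float)
    for g in (False, True):
        mask = groups == g
        # Adjusting: Storey-style null-proportion estimate per group; groups
        # that look sparser in signal (pi0 near 1) keep larger p-values.
        pi0 = min(1.0, 2.0 * np.mean(pvals[mask] > 0.5))
        adjusted[mask] *= pi0
    # Pooling: one BH pass on the adjusted p-values.
    return bh_reject(adjusted, q)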
March 2: Akihiko Nishimura (UCLA)
Discontinuous Hamiltonian Monte Carlo for models with discrete parameters and discontinuous likelihoods
March 23: Joseph Salmon (TELECOM ParisTech)
Generalized Concomitant Multi-Task Lasso for sparse multimodal regression
For standard Lasso theory to hold, the regularization parameter should be proportional to the noise level, which is generally unknown in practice. A remedy is to consider estimators, such as the Concomitant Lasso, that jointly optimize over the regression coefficients and the noise level. However, when data from different sources are pooled to increase the sample size, or when dealing with multimodal data, the noise levels differ and new dedicated estimators are needed. We provide new statistical and computational solutions to perform heteroscedastic regression, with an emphasis on functional brain imaging with magneto- and electroencephalographic (M/EEG) signals. This joint work with M. Massias, O. Fercoq and A. Gramfort is to appear at AISTATS 2018. Pdf <https://arxiv.org/abs/1705.09778>. Python code <https://github.com/mathurinm/SHCL>.
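For orientation (the standard single-task formulation, not the generalized multi-task estimator of the talk), the Concomitant Lasso jointly minimizes over the coefficients $\beta$ and the noise level $\sigma$:
$$\min_{\beta \in \mathbb{R}^p,\ \sigma > 0}\ \frac{\|y - X\beta\|_2^2}{2 n \sigma} + \frac{\sigma}{2} + \lambda \|\beta\|_1,$$
under which the tuning parameter $\lambda$ can be chosen independently of the unknown noise level; the generalized concomitant multi-task estimator of the talk adapts the noise term to the different modalities.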
March 30: Yaming Yu (UC Irvine, Department of Statistics)
Successive sampling, monotonicity, and Bayesian bandits problems
Stochastic orders appear naturally in many problems in statistics and applied probability and can be used to derive useful inequalities. We discuss monotonicity in classical limit theorems, structural results in Bayesian bandit problems, and comparisons between successive sampling and conditional Poisson sampling in sample surveys.
April 6: Leila Setayeshgar (Providence College)
Large Deviations for a Class of Stochastic Semilinear Partial Differential Equations
Standard approaches to large deviations analysis for stochastic partial differential equations (SPDEs) are often based on approximations. These approximations are mostly technical and often onerous to carry out. In 2008, Budhiraja, Dupuis and Maroulas employed the weak convergence approach and showed that these approximations can be avoided for many infinite dimensional models. Large deviations analysis for such systems instead relies on demonstrating existence, uniqueness and tightness properties of certain perturbations of the original process. In this talk, we use the weak convergence approach to establish the large deviation principle for the law of the solutions to a class of semilinear SPDEs. Our family of semilinear SPDEs contains, as special cases, both the stochastic Burgers' equation and the stochastic reaction-diffusion equation.
April 13: Henry Schellhorn (Claremont Graduate University, Department of Mathematics)
Density formula for functionals of compound Poisson processes using Malliavin calculus
We extend the work of Nourdin and Viens (Electronic Journal of Probability, 2009) to obtain a new exact formula for the density of the law of a random variable Z which is measurable and differentiable with respect to a compound Poisson process. The main restriction is that the Levy measure must charge the whole real line.
This talk is work in progress, and not all results have been fully worked out. For this reason, time will probably allow me to describe another project.
April 20: Guido Montufar (UCLA, Department of Mathematics)
Mixtures and Products in Two Graphical Models
This talk is about two graphical models with hidden variables. One is a mixture model and the other is a product of mixtures called a restricted Boltzmann machine. We derive relations between theoretical properties of restricted Boltzmann machines and several natural notions from discrete mathematics and convex geometry. We take a closer look at the first non-trivial case, with three observed binary variables. Although the mixture and the product of mixtures look different in terms of their parametrizations, we show that in this case they represent the same set of distributions on the interior of the probability simplex and are equal up to closure. We give a semi-algebraic description of this model in terms of six binomial inequalities and obtain closed form expressions for the maximum likelihood estimates.
This talk is based on joint work with Jason Morton and Anna Seigal.
April 27: Leonard Wong (USC, Department of Mathematics)
Information geometry and optimal transport
Information geometry studies spaces of probability distributions using differential geometry. On the other hand, optimal transport is about finding efficient methods of transporting one distribution to another. Both fields apply geometric ideas in probability and have found numerous applications in statistics and machine learning. In this talk we explain some new connections between the two fields. We study a family of logarithmic divergences (distance-like quantities) which generalizes the Bregman divergence (of which the relative entropy is a prime example). These divergences have a dual structure in terms of an optimal transport map, satisfy a generalized Pythagorean theorem, and correspond to natural extensions of exponential family. Geometrically, they characterize (locally) statistical manifolds with constant curvatures.
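As a concrete anchor (the standard definition, included for orientation): for a smooth convex function $\varphi$, the Bregman divergence is
$$D_{\varphi}(p, q) = \varphi(p) - \varphi(q) - \langle \nabla \varphi(q),\, p - q \rangle,$$
and taking $\varphi(p) = \sum_i p_i \log p_i$ on the probability simplex recovers the relative entropy $D(p\,\|\,q) = \sum_i p_i \log(p_i/q_i)$; the logarithmic divergences of the talk generalize this construction.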