Probability and Statistics Seminar 2025-2026
NEW [May 2021]: Our YOUTUBE CHANNEL contains videos of some recent talks.
(Starting from Spring 2026) Wednesdays 2-3 pm in KAP 414; tea is usually provided at 3:00 pm
NEW [Aug 2023]: To sign up for our MAILING LIST, email: sympa@mymaillists.usc.edu with subject: Subscribe probstat-l Firstname Lastname [besides the subject line, the email itself should be empty]
Organizers: Xiaohui Chen <xiaohuic@usc.edu> Evgeni Dimitrov <edimitro@usc.edu> Steven Heilman <stevenmheilman@gmail.com> Stanislav Minsker <minsker@usc.edu> Yizhe Zhu <yizhezhu@usc.edu>
For seminars/colloquia on other topics, see the Department of Mathematics webpage.
Older seminars: Spring 2018-Fall 2024, Fall 2018, Spring 2018, Fall 2017, Spring 2017, Fall 2016, Spring 2016, Fall 2015, Spring 2015, Fall 2014, 2013-2014
Spring 2026
Jan 14:
Jan 21: Jason Fulman (USC)
Three applications of Stein’s method
Stein’s method is a remarkable technique for proving limit theorems in problems with dependence or limited information, and moreover sometimes yields results with error terms. In this talk we use a single framework to obtain results for three examples: one about random permutations, one about random walk, and one about random matrices.
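For context, the classical normal case works as follows (standard background, not specific to the talk): $W$ has the standard normal law if and only if $\mathbb{E}[f'(W) - W f(W)] = 0$ for all sufficiently smooth $f$. Given a test function $h$, one solves the Stein equation
$$ f'(w) - w f(w) = h(w) - \mathbb{E}h(Z), \qquad Z \sim N(0,1), $$
so that $|\mathbb{E}h(W) - \mathbb{E}h(Z)| = |\mathbb{E}[f'(W) - W f(W)]|$, and the distributional comparison reduces to direct estimates on $W$, which is how explicit error terms arise.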
Jan 28:
Feb 4:
Feb 11:
Feb 18: Benjamin McKenna (Georgia Tech)
Feb 25:
Mar 4:
Mar 11:
Mar 18: [no classes]
Mar 25:
Apr 1:
Apr 8:
Apr 15:
Apr 22:
Apr 29:
Fall 2025
August 29: Simone Bombari (Institute of Science and Technology Austria)
Privacy for Free in the Overparameterized Regime
Differentially private gradient descent (DP-GD) is a popular algorithm to train deep learning models with provable guarantees on the privacy of the training data. In the last decade, the problem of understanding its performance cost with respect to standard GD has received remarkable attention from the research community, which has led to upper bounds on the excess population risk in different learning settings. However, such bounds typically degrade with over-parameterization, i.e., as the number of parameters p gets larger than the number of training samples n — a regime which is ubiquitous in current deep-learning practice. As a result, the lack of theoretical insights leaves practitioners without clear guidance, leading some to reduce the effective number of trainable parameters to improve performance, while others use larger models to achieve better results through scale. In this work, we show that in the popular random features model with quadratic loss, for any sufficiently large p, privacy can be obtained for free, i.e., with vanishing excess population risk, not only when the privacy parameter ε has constant order, but also in the strongly private setting ε = o(1). This challenges the common wisdom that over-parameterization inherently hinders performance in private learning.
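For readers unfamiliar with DP-GD, the following is a minimal numpy sketch of one full-batch step under the standard clip-and-noise recipe; the function name and the calibration of noise_mult to a privacy budget ε are illustrative assumptions, not taken from the talk.

    import numpy as np

    def dp_gd_step(theta, per_example_grads, clip_norm, noise_mult, lr, rng):
        # Illustrative sketch: clip each per-example gradient to l2 norm <= clip_norm.
        clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
                   for g in per_example_grads]
        # Privatize the summed gradient with Gaussian noise scaled to the clip norm.
        noise = rng.normal(0.0, noise_mult * clip_norm, size=theta.shape)
        noisy_mean = (np.sum(clipped, axis=0) + noise) / len(clipped)
        return theta - lr * noisy_mean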
September 5: Manuel Fernandez (USC)
Distance theorems and the smallest singular value of random matrices
In recent years, significant progress has been made in our understanding of the quantitative behavior of random matrices. One research direction of continued interest has been the estimation of the smallest singular value. As a measure of a matrix’s “invertibility”, quantitative bounds on the smallest singular value are important for a variety of tasks including establishing a circular law for a non-Hermitian random matrix and for proving stability of numerical methods. In view of the universality phenomena of random matrices, one tries to prove these estimates for more general matrix ensembles satisfying weaker assumptions.
In the geometric approach to proving smallest singular value estimates, a key ingredient is the use of a ‘distance theorem’, which is a small ball estimate for the distance between a random vector and a subspace. In this talk we will discuss a new distance theorem and its application to proving smallest singular value estimates for inhomogeneous random rectangular matrices with independent entries. We will also discuss how the recent resolution of the Slicing Conjecture, due to Klartag, Lehec, and Guan, implies smallest singular value estimates for a number of log-concave random matrix ensembles. In some cases, independent entries are no longer necessary!
The results above include joint work with Max Dabagia and with Galyna Livshyts and Stephanie Mui.
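For orientation (standard definitions, not the talk’s new results): the smallest singular value of a matrix $A$ with $n$ columns is
$$ s_{\min}(A) = \min_{\|x\|_2 = 1} \|Ax\|_2, $$
and for a square matrix one has $s_{\min}(A) \ge n^{-1/2} \min_i \mathrm{dist}(X_i, H_i)$, where $X_i$ is the $i$-th column and $H_i$ is the span of the remaining columns. A distance theorem is then a small ball estimate of the form $\mathbb{P}(\mathrm{dist}(X_i, H_i) \le t) \lesssim t + (\text{error term})$, which feeds directly into lower bounds on $s_{\min}$.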
September 12: Kabir Verchand (USC)
The dynamics of iterative algorithms with random data: Beyond first-order methods
In this talk, I will present a toolbox to analyze a broad class of iterative algorithms for high-dimensional (non)-convex optimization with random data. This class is rich enough to include general first-order methods with non-separable, Lipschitz updates (such as gradient descent and approximate message passing) as well as commonly used higher-order updates such as the proximal point method, prox-linear methods, and variants of alternating minimization and expectation maximization. For this class of algorithms, I will provide an exact, deterministic description of their dynamics as well as finite-sample guarantees bounding the deviation of the empirical iterates from their deterministic counterparts. Our techniques are based on a sequential variant of Gordon’s Gaussian comparison inequalities applied in conjunction with Bolthausen’s Gaussian conditioning technique. Joint work with Michael Celentano, Chen Cheng, and Ashwin Pananjady.
September 19: Mihai Cucuringu (UCLA)
Spectral methods for clustering signed/directed networks and heterogeneous group synchronization
Graph clustering problems typically arise in settings where there exists a discrepancy in the edge density within different parts of the graph. In this work, we consider problem instances where the underlying cluster structure arises as a consequence of a signal present on the edges or nodes, and is not driven by edge density. We first consider the problem of clustering in two families of networks, signed (with edge weights taking positive or negative values) and directed, both solvable by exploiting the spectrum of certain graph Laplacian matrices. We consider a generalized eigenvalue problem involving graph Laplacians, and provide performance guarantees under the setting of a Signed Stochastic Block Model, along with regularized versions to handle very sparse graphs (below the connectivity threshold), a regime where standard spectral methods are known to underperform. We also propose a spectral clustering algorithm for directed graphs based on a complex-valued representation of the adjacency matrix, which is able to capture the underlying cluster structures, for which the information encoded in the direction of the edges is crucial. We evaluate the proposed algorithm in terms of a cut flow imbalance-based objective function, which, for a given pair of clusters, captures the propensity of the edges to flow in a given direction. We analyze its theoretical performance on a Directed Stochastic Block Model. Finally, we discuss an extension of the classical angular synchronization problem that aims to recover unknown angles from a collection of noisy pairwise difference measurements. We consider a generalization to the heterogeneous setting where there exist k unknown groups of angles, and the measurement graph has an unknown edge-disjoint decomposition, where the subgraphs of noisy edge measurements correspond to each group. We propose a probabilistic generative model for this problem, along with a spectral algorithm that comes with theoretical guarantees in terms of robustness against sampling sparsity and noise. Time permitting, we discuss extensions to the temporal setting, and approaches based on graph neural networks.
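A minimal sketch of a complex-valued spectral method for directed graphs, assuming the Hermitian representation $H = i(A - A^\top)$ (one standard choice; the talk’s exact construction and regularization are not reproduced here):

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_directed(A, k):
        # Sketch: skew-symmetrize the adjacency matrix and multiply by i,
        # so H is Hermitian and edge direction is encoded in the phase.
        H = 1j * (A - A.T)
        vals, vecs = np.linalg.eigh(H)              # real eigenvalues
        top = np.argsort(np.abs(vals))[::-1][:k]    # largest |eigenvalue| pairs
        V = vecs[:, top]
        features = np.hstack([V.real, V.imag])      # embed C^k into R^{2k}
        return KMeans(n_clusters=k, n_init=10).fit_predict(features)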
September 26: Alexander Clay (USC)
[Undergrad Focused Talk]
Modern Approaches to Card Shuffling
We will explain some exciting connections between card shuffling and probability. Card shuffling is an old subject that combines probability with modern algebra. We will explore different shuffles through group theoretic and linear-algebraic points of view. Finally, we will investigate a model for casino card shuffling machines, and present some new research in this area.
October 3: Arthur Schichl (ETH Zürich)
Non-linear Degenerate Parabolic Flow Equations and a Finer Differential Structure on Wasserstein Spaces
We define new differential structures on the Wasserstein spaces $W_p(M)$ for $p > 2$ and a Riemannian manifold $(M,g)$. We consider a very general and possibly degenerate second-order partial differential flow equation with measure-dependent coefficients to expand the notion of smooth curves and to ensure that the new differential structure is finer than the classical one. The theory allows for higher-order calculus on Wasserstein spaces and admits numerical approximations in $W_p(M)$. We prove a generalized version of the Central Limit Theorem without requiring independence. We shall also present some of its economic applications.
October 10: [no classes]
October 17: Yeganeh Alimohammadi (USC)
How to Measure Differences in Rankings: A Data-Driven Extension of the Mallows Model
Ranking problems appear across domains, from consumer preferences and product recommendations to sports performance and hiring decisions. Probabilistic ranking models such as the Mallows model provide a principled way to capture uncertainty and infer a consensus order, assuming that observed rankings are noisy versions of an underlying truth. A key modeling challenge, however, lies in choosing the distance function that measures how “far apart” two rankings are, since different distances imply different ways of penalizing ranking swaps.
In this talk, I introduce a generalized framework based on the Mallows model that learns this distance metric directly from data. I will show that the model forms an exponential-family distribution on permutations, and that its parameters (the central ranking, dispersion, and learned distance) can be estimated consistently via maximum likelihood with asymptotic normality guarantees. On the algorithmic side, I present a polynomial-time approximation scheme (PTAS) for efficient sampling and partition-function estimation.
Finally, I will discuss empirical validation on real datasets, demonstrating how learning the distance metric leads to more accurate predictions and interpretable insights about ranking behavior.
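As background, the classical Mallows model is the exponential-family distribution on permutations
$$ P_{\sigma_0, \beta}(\sigma) = \frac{1}{Z(\beta)}\, \exp\big(-\beta\, d(\sigma, \sigma_0)\big), $$
with central ranking $\sigma_0$, dispersion $\beta > 0$, and a fixed distance $d$ (commonly Kendall’s tau). The framework in this talk instead treats $d$ itself as a parameter to be learned from data.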
October 24: Alexander Tartakovsky (AGTStatConsult, Los Angeles) [Joint with CAMS Colloquium]
Optimal Sequential Tests of Multiple Composite Hypotheses for General Stochastic Models with Non-i.i.d. Observations
In many applications—such as monitoring sensor networks, detecting epidemics, or ensuring cybersecurity—it is necessary to test multiple composite hypotheses under strict error constraints. Sequential methods aim to minimize detection delay or sample size while controlling false identification risks. This talk presents a general framework for sequential multi-hypothesis testing in models with dependent and non-identically distributed observations. The resulting procedures achieve near-optimal performance in regimes of small misidentification probabilities. Applications include rapid epidemic detection (e.g., COVID-19), network intrusion monitoring, and tracking faint space objects. In particular, our methods identified regions of COVID outbreaks several days before official quarantine measures were imposed.
October 31: Yizhe Zhu (USC)
Extreme singular values of sparse random rectangular matrices
The bi-adjacency matrix of an Erdős–Rényi random bipartite graph with bounded aspect ratio is a rectangular random matrix with Bernoulli entries. Depending on the sparsity parameter $p$, its spectral behavior may either resemble that of a classical Wishart matrix or depart from this universal regime. In this talk, we study the extreme singular values at the critical density $np=c\log n$. We present the first quantitative characterization of the emergence of outlier singular values outside the Marčenko–Pastur law and determine their precise locations as functions of the largest and smallest degree vertices in the underlying random graph, which can be seen as an analogue of the Bai–Yin theorem in the sparse setting. These results uncover a clear mechanism by which combinatorial structures in sparse graphs generate spectral outliers.
Based on joint work with Ioana Dumitriu, Haixiao Wang and Zhichao Wang.
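For comparison, the classical Bai–Yin theorem in the dense i.i.d. setting: if $A$ is $n \times m$ with i.i.d. centered, unit-variance entries with finite fourth moment and $m/n \to \gamma \in (0,1]$, then almost surely
$$ \frac{s_{\max}(A)}{\sqrt{n}} \to 1 + \sqrt{\gamma}, \qquad \frac{s_{\min}(A)}{\sqrt{n}} \to 1 - \sqrt{\gamma}, $$
the edges of the Marčenko–Pastur support. The talk concerns the sparse analogue at $np = c\log n$, where degree fluctuations push singular values outside these edges.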
November 7: Yun Yang (University of Maryland, College Park)
Simulation-based Inference via Structured Score Matching
Simulation-based inference (SBI) provides an effective framework for statistical analysis when the likelihood is intractable but model simulations are available. This talk presents a unified SBI framework that leverages structured score matching for both frequentist and Bayesian inference. On the frequentist side, we develop a likelihood-free approach that combines score matching with gradient-based optimization and bootstrap procedures for parameter estimation and uncertainty quantification. On the Bayesian side, we integrate score matching with Langevin dynamics to efficiently explore complex posterior landscapes in moderate- to high-dimensional settings. In both cases, we design tailored score-matching estimators and architectural regularizations that embed the statistical structure of log-likelihood scores, which in turn improves estimation accuracy and scalability.
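As a reference point, classical (Hyvärinen) score matching fits a model $s_\theta \approx \nabla_x \log p$ by minimizing
$$ J(\theta) = \mathbb{E}_{x \sim p}\Big[\tfrac{1}{2}\|s_\theta(x)\|_2^2 + \nabla \cdot s_\theta(x)\Big], $$
which equals $\tfrac{1}{2}\,\mathbb{E}\|s_\theta(x) - \nabla_x \log p(x)\|_2^2$ up to an additive constant, so no normalizing constant is required; how this objective is structured and adapted to log-likelihood scores in the parameters is the subject of the talk.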
November 14: Bixing Qiao (USC)
A New Approach for the Continuous Time Kyle-Back Strategic Insider Equilibrium Problem
In this talk, we consider a continuous-time Kyle-Back model which is a game between an insider and a market maker. The existing literature typically focuses on constructing equilibria with a PDE approach, which requires certain Markovian structures. We characterize all equilibria through a coupled system of forward-backward SDEs. In particular, when the time duration is small, we show that the FBSDE is well-posed, and therefore the game has a unique equilibrium. Moreover, this unique equilibrium may be non-Markovian and thus not attainable via the PDE approach. We next study the set value of the game, which roughly speaking is the set of insider’s values over all equilibria and thus is by nature unique. Finally, we characterize the set value through a level set of a certain standard HJB equation.
In the second part of the talk, we apply the new approach to the Kyle-Back model with dynamic legal risk. In the presence of an active regulator and a large number of noise traders, the insider chooses a strategy that conceals his identity within a large volume of surrounding trades and concentrates on medium-sized trades. We establish an intensity-based mathematical framework for explaining the interconnections between insider trading and stealth trading. It turns out that when the number of noise traders becomes large, the price impact of the insider asymptotically vanishes, and consequently the stochastic game reduces to a deterministic optimization, which also carries implications for regulatory investigations and sanctions.
The results above include joint work with Weixuan Xia and with Jianfeng Zhang.
November 21: Xiaocong Xu (USC)
A leave-one-out approach to approximate message passing
Approximate message passing (AMP) has emerged both as a popular class of iterative algorithms and as a powerful analytic tool in a wide range of statistical estimation problems and statistical physics models. A well-established line of AMP theory proves Gaussian approximations for the empirical distributions of the AMP iterate in the high dimensional limit, under the GOE random matrix model and other rotationally invariant ensembles. This work provides a non-asymptotic, leave-one-out representation for the AMP iterate that holds under a broad class of Gaussian random matrix models with general variance profiles. In contrast to the typical AMP theory that describes the first-order behavior for the empirical distributions of the AMP iterate via a low dimensional state evolution, our leave-one-out representation yields an intrinsically high dimensional state evolution formula, which provides a second-order, non-asymptotic characterization for the possibly heterogeneous, entrywise behavior of the AMP iterate under the prescribed random matrix models. To exemplify some distinct features of our AMP theory in applications, we analyze, in the context of regularized linear estimation, the precise stochastic behavior of the Ridge estimator for independent and non-identically distributed observations whose covariates exhibit general variance profiles. We find that its finite-sample distribution is characterized via a weighted Ridge estimator in a heterogeneous Gaussian sequence model. Notably, in contrast to the i.i.d. sampling scenario, the effective noise and regularization are now full dimensional vectors determined via a high dimensional system of equations. Our leave-one-out method of proof differs significantly from the widely adopted conditioning approach for rotationally invariant ensembles, and relies instead on an inductive method that uses almost solely integration by parts and concentration techniques. This talk is based on joint work with Zhigang Bao and Qiyang Han.
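For readers new to AMP, here is a sketch of the canonical symmetric iteration $x^{t+1} = A f(x^t) - b_t f(x^{t-1})$ with Onsager coefficient $b_t = \frac{1}{n}\sum_i f'(x_i^t)$; this is textbook background, not the leave-one-out representation developed in the talk.

    import numpy as np

    def amp(A, f, fprime, x0, T):
        # Generic symmetric AMP sketch; f and fprime are the denoiser
        # and its derivative, applied entrywise.
        x, f_prev = x0.copy(), np.zeros_like(x0)
        for _ in range(T):
            fx = f(x)
            onsager = fprime(x).mean() * f_prev   # b_t * f(x^{t-1})
            x, f_prev = A @ fx - onsager, fx
        return x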
November 28: [no classes]
December 5: Zebin Wang (Harvard)
DANIEL: A Distributed and Scalable Approach for Global Representation Learning with EHR Applications
Classical probabilistic graphical models face fundamental challenges in modern data environments, which are characterized by high dimensionality, source heterogeneity, and stringent data-sharing constraints. In this work, we revisit the Ising model, a well-established member of the Markov Random Field (MRF) family, and develop a distributed framework that enables scalable and privacy-preserving representation learning from large-scale binary data with inherent low-rank structure. Our approach optimizes a non-convex surrogate loss function via bi-factored gradient descent, offering substantial computational and communication advantages over conventional convex approaches. We evaluate our algorithm on multi-institutional electronic health record (EHR) datasets from 58,248 patients across the University of Pittsburgh Medical Center (UPMC) and Mass General Brigham (MGB), demonstrating superior performance in global representation learning and downstream clinical tasks, including relationship detection, patient phenotyping, and patient clustering. These results highlight a broader potential for statistical inference in federated, high-dimensional settings while addressing the practical challenges of data complexity and multi-institutional integration.
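For concreteness, the Ising model in question is the distribution on $x \in \{-1,+1\}^p$ with
$$ p_\Theta(x) \propto \exp\Big(\sum_{j<k} \theta_{jk}\, x_j x_k\Big), $$
and the inherent low-rank structure can be pictured as a factorization $\Theta \approx U U^\top$ with $U \in \mathbb{R}^{p \times r}$, $r \ll p$, which is what makes bi-factored gradient descent natural; the paper’s exact parameterization and surrogate loss may differ from this sketch.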
Spring 2025:
January 17: Bradley Rava (University of Sydney’s Business School)
Ask for More Than Bayes Optimal: A Theory of Indecisions for Classification
January 31: Paata Ivanisvili (UCI)
February 7: Nikita Gladkov (UCLA)
February 14 : Po-Ling Loh (University of Cambridge)
Differentially private M-estimation via noisy optimization
February 21: Lijun Ding (UCSD)
Flat minima generalize for low-rank matrix recovery
Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima — those around which the loss grows slowly — appear to generalize well. This work takes a step towards understanding this phenomenon by focusing on the simplest class of overparameterized nonlinear models: those arising in low-rank matrix recovery. We analyze overparameterized matrix and bilinear sensing, robust PCA, covariance matrix estimation, and single hidden layer neural networks with quadratic activation functions. In all cases, we show that flat minima, measured by the trace of the Hessian, exactly recover the ground truth under standard statistical assumptions. For matrix completion, we establish weak recovery, although empirical evidence suggests exact recovery holds here as well.
February 28: Nils Detering (Heinrich-Heine University)
joint with Math Finance Colloquium
A class of point-wise operating SPDE coefficients for HJM models
We present a dynamic model for forward curves within the Heath-Jarrow-Morton framework under the Musiela parametrization. The forward curves take values in a function space $\h$, and their dynamics follows a stochastic partial differential equation with state-dependent coefficients. In particular, the coefficients are defined through point-wise operating maps on $\h$, resulting in a locally state-dependent structure. We first explore conditions under which these point-wise operators are well defined on $\h$. Next, we determine conditions to ensure that the resulting coefficient functions satisfy local growth and Lipschitz properties, so to guarantee the existence and uniqueness of mild solutions. The proposed model captures the behavior of the entire forward curve through a single equation, yet retains remarkable simplicity. Notably, we demonstrate that certain one-dimensional projections of the model are Markovian and satisfy a one-dimensional stochastic differential equation. This connects our Hilbert-space approach to well established models for forward contracts with fixed delivery times, for which existing formulas and numerical techniques can be applied. This link allows us to examine also conditions for maintaining positivity of the solutions. As concrete examples, we analyze Hilbert-space valued variants of an exponential model and of a constant elasticity of variance model.
March 7: Pedro Abdalla Teixeira (UCI)
Covariance Estimation through Empirical Process Theory
In this talk, we revisit the classical problem of covariance estimation from the perspective of empirical process theory. In particular, we explore the relationship between various covariance estimation problems and their associated empirical processes. This talk is based on a series of works by the author in collaboration with Nikita Zhivotovskiy and Shahar Mendelson.
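The basic link is easy to state: for mean-zero vectors $X_1, \dots, X_n$ with covariance $\Sigma$, the operator-norm error of the sample covariance $\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^n X_i X_i^\top$ is itself the supremum of an empirical process,
$$ \|\hat{\Sigma} - \Sigma\|_{\mathrm{op}} = \sup_{\|v\|_2 = 1}\, \Big|\frac{1}{n}\sum_{i=1}^n \langle X_i, v\rangle^2 - \mathbb{E}\langle X, v\rangle^2\Big|, $$
so bounds on suprema of such processes translate directly into covariance estimation guarantees.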
March 28: Nikita Zhivotovskiy (UC Berkeley)
Sharper Risk Bounds for Statistical Aggregation
In this talk, we revisit classical results in the theory of statistical aggregation, focusing on the transition from global complexity to a more manageable local one. The goal of aggregation is to combine several base predictors to achieve a prediction nearly as accurate as the best one, without assumptions on the class structure or target. Though aggregation has been studied in both sequential and statistical settings, both lines of work traditionally use the same “global” complexity measure. We highlight the lesser-known PAC-Bayes localization, which enables us to prove a localized bound for the exponential weights estimator by Leung and Barron, and a deviation-optimal localized bound for Q-aggregation. Finally, we demonstrate that our improvements allow us to obtain bounds based on the number of near-optimal functions in the class, and achieve polynomial improvements in sample size in certain nonparametric situations. This is contrary to the common belief that localization doesn’t benefit nonparametric classes. Joint work with Jaouad Mourtada and Tomas Vaškevičius.
April 4: Bohan Zhou (UCSB)
The Generalized Wasserstein Barycenter Problem
The Wasserstein barycenter problem, introduced by Agueh and Carlier, has gained widespread interest in machine learning, statistics, computer graphics, and engineering. Given the prominence of the Wasserstein distance, computing its barycenter is often a foundational step in understanding the properties of probability spaces equipped with this metric. In this talk, we explore the generalized Wasserstein barycenter, allowing for negative weights. This extension naturally arises when transitioning from interpolation to extrapolation, and it has immediate implications for high-order schemes in gradient flow (by Han, Esedoglu, and Garikipati), adversarial multiclass classification (by Garcia Trillos, Jacobs, and Kim), and Wasserstein Regression (by Chen, Lin, and Müller). We will discuss the theoretical results (existence, optimality conditions, and uniqueness) of the problem, analyze special cases where all weights are positive except one or all are negative except one, and introduce new numerical algorithms applicable to both generalized and classical settings, concluding with experimental results.
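For reference, the Agueh–Carlier barycenter of $\mu_1, \dots, \mu_k$ with weights $\lambda_1, \dots, \lambda_k$ solves
$$ \min_{\mu}\; \sum_{i=1}^{k} \lambda_i\, W_2^2(\mu, \mu_i), \qquad \sum_{i=1}^{k} \lambda_i = 1, \quad \lambda_i > 0; $$
the generalized problem of this talk keeps the normalization but allows some $\lambda_i < 0$, which is what turns interpolation into extrapolation.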
April 11: Abiy Tasissa (Tufts)
From missing distances to structures: Theory, algorithms and applications
The advancement of technology has significantly enhanced our capacity to collect data. However, in many real-world applications, certain inherent limitations—such as the precision of measurement devices, environmental conditions, or operating costs—can result in missing data. In this talk, we focus on the setting where the available data consist of pairwise distances between a set of points, with the goal of estimating the configuration of the underlying points from incomplete distance measurements. This is known as the Euclidean distance geometry (EDG) problem and is central to many applications.
We first start by describing the solution when all distances are given using the classical multidimensional scaling (MDS) technique and then discuss a constructive approach to interpret the key mathematical objects in MDS. Next, we introduce a mathematical framework to address the EDG problem under two sampling models of the distance matrix: global sampling (uniform sampling of the entries of the distance matrix) and structured local sampling, where the measurements are limited to a subset of rows and columns. We discuss the conditions required for the exact recovery of the point configuration and the associated algorithms. The last part of the talk will illustrate the algorithms using synthetic and real data and discuss ongoing work.
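As a warm-up for the complete-distance case mentioned above, a minimal sketch of classical MDS, which recovers a configuration (up to rigid motion) from the full matrix of squared pairwise distances:

    import numpy as np

    def classical_mds(D2, d):
        # Double-center the squared distance matrix to obtain the Gram
        # matrix of the centered points, then read off coordinates from
        # the top-d eigenpairs.
        n = D2.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n
        G = -0.5 * J @ D2 @ J
        vals, vecs = np.linalg.eigh(G)
        vals, vecs = vals[::-1], vecs[:, ::-1]      # sort descending
        return vecs[:, :d] * np.sqrt(np.clip(vals[:d], 0.0, None))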
April 18: Xiaohui Chen (USC)
Recent progress in statistical and computational optimal transport barycenters
The Wasserstein barycenter plays a fundamental role in averaging measure-valued data under the framework of optimal transport (OT). However, there are tremendous challenges in computing and estimating the Wasserstein barycenter for high-dimensional distributions. In this talk, we will discuss some recent progress in advancing the statistical and computational frontiers of optimal transport barycenters. We first introduce a multimarginal Schrödinger barycenter (MSB) based on the entropy regularized multimarginal optimal transport problem that admits general-purpose fast algorithms for computation. By recognizing a proper dual geometry, we derive sharp non-asymptotic rates of convergence for estimating several key MSB quantities (cost functional, Schrödinger coupling and barycenter) from point clouds randomly sampled from the input marginal distributions. We will also consider the computation of the exact (i.e., unregularized) Wasserstein barycenter, which can be recast into a nonconvex-concave minimax optimization. By alternating between the primal Wasserstein and dual potential Sobolev optimization geometries, we introduce a linear-time and linear-space Wasserstein-Descent H-Ascent (WDHA) algorithm and prove its algorithmic convergence to a stationary point.
April 25: Mo Zhou (UCLA)
Score-based neural ordinary differential equations and normalizing flow for computing mean field control problems
Mean Field Control (MFC) provides a mathematical framework for decision-making in large-scale systems and has strong connections to modern artificial intelligence, particularly generative models. In this talk, I will introduce a novel approach that computes MFC problems using score-based neural ordinary differential equations (ODEs) and normalizing flows. Our method formulates a system of ODEs that computes both first- and second-order score functions along trajectories, transforming MFC into an unconstrained optimization problem. To improve accuracy, we introduce a regularization technique inspired by the Hamilton–Jacobi–Bellman (HJB) equations. I will show applications, including probability flow matching and Wasserstein proximal operators, explaining how this approach enhances both theoretical understanding and practical computation in control problems.
Fall 2024:
August 30: Yiyun He (UC Irvine)
Differentially Private Algorithms for Synthetic Data
September 6: Omer Tamuz (Caltech)
Asymptotic Rényi Entropies of Random Walks on Groups
We introduce asymptotic Rényi entropies as a parameterized family of invariants for random walks on groups. These invariants interpolate between various well-studied properties of the random walk, including the growth rate of the group, the Shannon entropy, and the spectral radius. They furthermore offer large deviation counterparts of the Shannon-McMillan-Breiman Theorem. We prove some basic properties of asymptotic Rényi entropies that apply to all groups, and discuss their analyticity and positivity for the free group and lamplighter groups.
Joint with Kimberly Golubeva and Minghao Pan.
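Recall that the Rényi entropy of order $\alpha \neq 1$ of a distribution $\mu$ is $H_\alpha(\mu) = \frac{1}{1-\alpha}\log \sum_x \mu(x)^\alpha$, which recovers the Shannon entropy as $\alpha \to 1$. The asymptotic invariants normalize this along the walk, plausibly as
$$ h_\alpha = \lim_{n \to \infty} \frac{1}{n}\, H_\alpha(\mu^{*n}), $$
with $\mu^{*n}$ the distribution of the walk at time $n$ (this normalization is our reading of the abstract, not a quoted definition); the extreme orders then connect to the spectral radius and to growth, matching the interpolation described above.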
September 20: Weixin Yao (UC Riverside)
October 4: Richard Y. Zhang (UIUC)
Rank Overparameterization and Global Optimality Certification for Large-Scale Low-rank Optimization
Numerous important problems across applied statistics reduce to nonconvex estimation/optimization over a low-rank matrix. In principle, these can be reliably solved to global optimality via convex relaxation, but the computational costs can become prohibitive on a large scale. In practice, it is much more common to optimize over the low-rank matrices directly, as in the Burer-Monteiro approach, but their nonconvexity can cause failure by getting stuck at a spurious local minimum. For safety-critical societal applications, such as the operation and planning of an electricity grid, our inability to reliably achieve global optimality can have significant real-world consequences.
In the first part of this talk, we investigate how overparameterizing the low-rank factorization can render its nonconvexity increasingly benign. In the smooth and strongly-convex setting, we rigorously show that, as the rank is increased, spurious local minima become increasingly rare in a step-wise fashion. In other words, rank-2 has fewer spurious local minima than rank-1, and rank-3 has fewer than rank-2, etc. Once the rank exceeds an O(1) threshold, every remaining local minimum is a global minimum, and every saddle point can be escaped. In the second part of this talk, we use the rank deficiency brought on by rank overparameterization to certify convergence to global optimality after the fact. The certification is an a posteriori guarantee that is valid under much weaker assumptions than typical “no spurious local minima” guarantees. However, rank deficiency significantly slows down the convergence of gradient descent, from a linear rate to a sublinear rate. We propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case.
Main related papers:
https://arxiv.org/abs/2207.01789
https://arxiv.org/abs/2206.03345 (joint work with Gavin Zhang and Salar Fattahi)
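For context on the two routes contrasted in the abstract: the convex route solves a relaxation such as $\min_{X \succeq 0} f(X)$ over full $n \times n$ matrices, while the Burer-Monteiro route fixes a rank $k$ and solves the nonconvex problem
$$ \min_{U \in \mathbb{R}^{n \times k}} f(U U^\top), $$
which needs only $O(nk)$ storage but may have spurious local minima; the first part of the talk quantifies how those minima disappear as $k$ grows.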
October 11: [no classes]
October 18: Camilo Hernández (USC ISE)
The mean field Schrödinger problem: a mean field control perspective
The mean field Schrödinger problem (MFSP) is the problem of finding the most likely path of a McKean-Vlasov type particle with constrained initial and final configurations. It was first introduced by Backhoff et al. (2020), who studied its existence and long-time behavior. This talk aims to show how ideas from mean field control theory allow us to derive new interesting results on the MFSP. In particular, we study its existence, characterization, and the so-called convergence problem. The method rests upon studying suitably penalized problems and stochastic control techniques. This talk is based on a joint work with Ludovic Tangpi (Princeton).
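For orientation, the classical (non-mean-field) Schrödinger problem is the entropy minimization
$$ \inf\big\{ H(P \,|\, R) \;:\; P \circ X_0^{-1} = \mu_0,\; P \circ X_T^{-1} = \mu_T \big\} $$
over path measures $P$, where $R$ is a reference law (e.g., Wiener measure) and $H$ is relative entropy; in the mean-field version the reference dynamics is of McKean-Vlasov type, so the law of the particle enters its own drift.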
October 25: Yizhe Zhu (USC)
Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity
For the problem of reconstructing a low-rank matrix from a few linear measurements, two classes of algorithms have been widely studied in the literature: convex approaches based on nuclear norm minimization, and non-convex approaches that use factorized gradient descent. Under certain statistical model assumptions, it is known that nuclear norm minimization recovers the ground truth as soon as the number of samples scales linearly with the number of degrees of freedom of the ground truth. In contrast, while non-convex approaches are computationally less expensive, existing recovery guarantees assume that the number of samples scales at least quadratically with the rank. In this talk, we consider the problem of reconstructing a positive semidefinite matrix from a few Gaussian measurements. We improve the previous rank-dependence in the sample complexity of non-convex matrix factorization from quadratic to linear. Our proof relies on a probabilistic decoupling argument, where we show that the gradient descent iterates are only weakly dependent on the individual entries of the measurement matrices. Joint work with Dominik Stöger (KU Eichstätt-Ingolstadt).
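A minimal numpy sketch of the non-convex approach in question: factorized gradient descent for PSD matrix sensing from measurements $y_i = \langle A_i, X^\star\rangle$. The initialization and step size are illustrative, and the talk’s guarantees attach to a specific analysis rather than this toy loop.

    import numpy as np

    def factorized_gd(As, y, n, k, lr=0.01, T=500, seed=0):
        # Minimize (1/2m) * sum_i (<A_i, U U^T> - y_i)^2 over U in R^{n x k}.
        rng = np.random.default_rng(seed)
        U = 0.01 * rng.standard_normal((n, k))    # small random init (illustrative)
        m = len(y)
        for _ in range(T):
            X = U @ U.T
            r = np.array([np.sum(A * X) for A in As]) - y   # residuals
            G = sum(ri * A for ri, A in zip(r, As)) / m     # gradient w.r.t. X
            U -= lr * (G + G.T) @ U                         # chain rule through U U^T
        return U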
November 1: Wojciech Ozanski (Florida State University)
[Joint with PDE seminar]
Instantaneous continuous loss of regularity for the SQG equation
November 8: Johannes Wiesel (CMU)
[Joint with Mathematical Finance seminar]
Bounding adapted Wasserstein metrics
The Wasserstein distance $\mathcal{W}_p$ is an important instance of an optimal transport cost. Its numerous mathematical properties as well as applications to various fields such as mathematical finance and statistics have been well studied in recent years. The adapted Wasserstein distance $\mathcal{A}\mathcal{W}_p$ extends this theory to laws of discrete time stochastic processes in their natural filtrations, making it particularly well suited for analyzing time-dependent stochastic optimization problems.
While the topological differences between $\mathcal{A}\mathcal{W}_p$ and $\mathcal{W}_p$ are well understood, their differences as metrics remain largely unexplored beyond the trivial bound $\mathcal{W}_p\lesssim \mathcal{A}\mathcal{W}_p$. This paper closes this gap by providing upper bounds of $\mathcal{A}\mathcal{W}_p$ in terms of $\mathcal{W}_p$ through investigation of the smooth adapted Wasserstein distance. Our upper bounds are explicit and are given by a sum of $\mathcal{W}_p$, Eder’s modulus of continuity and a term characterizing the tail behavior of measures. As a consequence, upper bounds on $\mathcal{W}_p$ automatically hold for $\mathcal{AW}_p$ under mild regularity assumptions on the measures considered. A particular instance of our findings is the inequality $\mathcal{A}\mathcal{W}_1\le C\sqrt{\mathcal{W}_1}$ on the set of measures that have Lipschitz kernels.
Our work also reveals how smoothing of measures affects the adapted weak topology. In fact, we find that the topology induced by the smooth adapted Wasserstein distance exhibits a non-trivial interpolation property, which we characterize explicitly: it lies in between the adapted weak topology and the weak topology, and the inclusion is governed by the decay of the smoothing parameter.
This talk is based on joint work with Jose Blanchet, Martin Larsson and Jonghwa Park.
November 15: Greta Panova (USC)
Algebra meets probability: permutons from pipe dreams via integrable probability
November 22: Gil Kur (ETH Zürich)
Note the non-standard time and location: 1pm in KAP 265
Minimum Norm Interpolation Meets The Local Theory of Banach Spaces
Minimum-norm interpolators have recently gained attention as an analyzable model to shed light on the double descent phenomenon observed for neural networks. Most of the work has focused on analyzing interpolators in Hilbert spaces, where, typically, an effectively low-rank structure of the feature covariance prevents a large bias. This work takes a first step towards establishing a general framework that connects generalization properties of the interpolators to well-known concepts from high-dimensional geometry, specifically, from the local theory of Banach spaces.
November 29: [no classes]
December 6: Elliot Paquette (McGill) [2PM! Nonstandard time!]
Random matrix theory for high dimensional optimization, and an application to scaling laws
We describe a program of analysis of stochastic gradient methods on high-dimensional random objectives. We illustrate some assumptions under which the loss curves are universal, in that they can be completely described in terms of some underlying covariance structure of the problem setup. Furthermore, we give a description of these loss curves that can be analyzed precisely.
As a motivating application, we show how this can be applied to the power-law-random-features model. This is a simple two-hyperparameter family of optimization problems, which displays 5 distinct phases of SGD loss curves; these phases are determined by the relative complexities of the target, data distribution, and whether these are ‘high-dimensional’ or not (which in context can be precisely defined). In each phase, we can also give, for a given compute budget, the optimal random-feature dimensionality.
Joint work with Courtney Paquette (McGill & Google Deepmind), Jeffrey Pennington (Google Deepmind), and Lechao Xiao (Google Deepmind).
Spring 2024:
[CANCELLED] January 12: Roman Vershynin (UC Irvine) [Undergraduate lecture]
January 19: Grigory Franguridi, 2PM, (USC Center for Economic and Social Research)
Estimation of panels with attrition and refreshment samples
January 26: Mahdi Soltanolkotabi (USC)
Foundations for feature learning via gradient descent
One of the key mysteries in modern learning is that a variety of models such as deep neural networks when trained via (stochastic) gradient descent can extract useful features and learn high quality representations directly from data simultaneously with fitting the labels. This feature learning capability is also at the forefront of the recent success of a variety of contemporary paradigms such as transformer architectures, self-supervised and transfer learning. Despite a flurry of exciting activity over the past few years, existing theoretical results are often too crude and/or pessimistic to explain feature/representation learning in practical regimes of operation or serve as a guiding principle for practitioners. Indeed, existing literature often requires unrealistic hyperparameter choices (e.g. very small step sizes, large initialization or wide models). In this talk I will focus on demystifying this feature/representation learning phenomena for a variety of problems spanning single index models, low-rank factorization, matrix reconstruction, and neural networks. Our results are based on an intriguing spectral bias phenomena for gradient descent, that puts the iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well by simultaneously finding good features/representations of the data while fitting to the labels. The proofs combine ideas from high-dimensional probability/statistics, optimization and nonlinear control to develop a precise analysis of model generalization along the trajectory of gradient descent. Time permitting, I will explain the implications of these theoretical results for more contemporary use cases including transfer learning, self-attention, prompt-tuning via transformers and simple self-supervised learning settings.
February 2: Timothy Armstrong (USC)
Robust Estimation and Inference in Panels with Interactive Fixed Effects
We consider estimation and inference for a regression coefficient in panels with interactive fixed effects (i.e., with a factor structure). We show that previously developed estimators and confidence intervals (CIs) might be heavily biased and size-distorted when some of the factors are weak. We propose estimators with improved rates of convergence and bias-aware CIs that are uniformly valid regardless of whether the factors are strong or not. Our approach applies the theory of minimax linear estimation to form a debiased estimate using a nuclear norm bound on the error of an initial estimate of the interactive fixed effects. We use the obtained estimate to construct a bias-aware CI taking into account the remaining bias due to weak factors. In Monte Carlo experiments, we find a substantial improvement over conventional approaches when factors are weak, with little cost to estimation error when factors are strong.
Paper link: Timothy B. Armstrong, Martin Weidner, Andrei Zeleneev. https://arxiv.org/abs/2210.06639
February 9: Yuehao Bai (USC)
On the Efficiency of Finely Stratified Experiments
This paper studies the efficient estimation of a large class of treatment effect parameters that arise in the analysis of experiments. Here, efficiency is understood to be with respect to a broad class of treatment assignment schemes for which the marginal probability that any unit is assigned to treatment equals a pre-specified value, e.g., one half. Importantly, we do not require that treatment status is assigned in an i.i.d. fashion, thereby accommodating complicated treatment assignment schemes that are used in practice, such as stratified block randomization and matched pairs. The class of parameters considered are those that can be expressed as the solution to a restriction on the expectation of a known function of the observed data, including possibly the pre-specified value for the marginal probability of treatment assignment. We show that this class of parameters includes, among other things, average treatment effects, quantile treatment effects, local average treatment effects as well as the counterparts to these quantities in experiments in which the unit is itself a cluster. In this setting, we establish two results. First, we derive a lower bound on the asymptotic variance of estimators of the parameter of interest in the form of a convolution theorem. Second, we show that the naïve method of moments estimator achieves this bound on the asymptotic variance quite generally if treatment is assigned using a “finely stratified” design. By a “finely stratified” design, we mean experiments in which units are divided into groups of a fixed size and a proportion within each group is assigned to treatment uniformly at random so that it respects the restriction on the marginal probability of treatment assignment. In this sense, “finely stratified” experiments lead to efficient estimators of treatment effect parameters “by design” rather than through ex post covariate adjustment.
Paper link: Yuehao Bai, Jizhou Liu, Azeem M. Shaikh, Max Tabord-Meehan
https://arxiv.org/abs/2307.15181
February 23: Tryphon Georgiou (UC Irvine)
Stochastic Control meets Non-equilibrium Thermodynamics: Fundamental limits of power generation in thermodynamic engines
Thermodynamics was born in the 19th century in quest of a way to quantify efficiency of steam engines at the dawn of the industrial age. In the time since, thermodynamics has impacted virtually all other areas in science, from chemistry and biology to the physics of black holes, and yet, progress beyond the classical quasi-static limit towards finite-time thermodynamic transitions has been slow; finite time is of the essence for non-vanishing power generation. In recent years a deeper understanding of non-equilibrium processes has been achieved based on stochastic models with degrees of freedom (state variables) that are subject to Brownian excitation that models heat baths. Within this framework we will explain energy transduction, we will give insights on how anisotropy in thermal or chemical potentials can be tapped for power generation in engineered and physical processes, and we will highlight fundamental bounds on the amount of power that can be drawn during finite-time thermodynamic transitions.
The talk is based on joint works with Rui Fu (UCI), Olga Movilla (UCI), Amir Taghvaei (UCI) and Yongxin Chen (GaTech). Research funding by AFOSR, ARO and NSF is gratefully acknowledged.
Monday February 26, 3:30PM, KAP 427: Yongtao Guan (Chinese University of Hong Kong, Shenzhen)
Group Network Hawkes Process
In this work, we study the event occurrences of individuals interacting in a network. To characterize the dynamic interactions among the individuals, we propose a group network Hawkes process (GNHP) model whose network structure is observed and fixed. In particular, we introduce a latent group structure among individuals to account for the heterogeneous user-specific characteristics. A maximum likelihood approach is proposed to simultaneously cluster individuals in the network and estimate model parameters. A fast EM algorithm is subsequently developed by utilizing the branching representation of the proposed GNHP model. Theoretical properties of the resulting estimators of group memberships and model parameters are investigated under both settings when the number of latent groups G is over-specified or correctly specified. A data-driven criterion that can consistently identify the true G under mild conditions is derived. Extensive simulation studies and an application to a data set collected from Sina Weibo are used to illustrate the effectiveness of the proposed methodology.
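As background, a Hawkes process is a point process whose conditional intensity is excited by past events; a generic network version (not the paper’s exact specification) has individual $i$’s intensity
$$ \lambda_i(t) = \mu_i + \sum_j a_{ij} \sum_{t_{jk} < t} \phi(t - t_{jk}), $$
where $\mu_i$ is a baseline rate, $a_{ij}$ encodes the observed network influence of $j$ on $i$, $t_{jk}$ are the event times of individual $j$, and $\phi$ is a decay kernel; in the GNHP model these parameters are additionally tied to latent group memberships.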
March 1: Morris Yau (MIT)
Are Neural Networks Optimal Approximation Algorithms?
In this talk, we discuss the power of neural networks to compute solutions to NP-hard optimization problems, focusing on the class of constraint satisfaction problems (Boolean SAT, Sudoku, etc.). We find there is a graph neural network architecture (OptGNN) that captures the optimal approximation algorithm for constraint satisfaction, up to complexity-theoretic assumptions, via tools in semidefinite programming. Furthermore, OptGNN can act as a convex program solver and hence output a dual (a bound) on the optimality of a combinatorial problem. Evaluating OptGNN on benchmarks in the neural combinatorial optimization literature, we find our approach is competitive with state-of-the-art unsupervised neural baselines. We discuss further connections between neural networks and computation, and point to directions for future work.
March 8: Annie Qu (UC Irvine)
A Model-Agnostic Graph Neural Network for Integrating Local and Global Information
Graph neural networks (GNNs) have achieved promising performance in a variety of graph-focused tasks. Despite their success, the two major limitations of existing GNNs are the capability of learning various-order representations and providing interpretability of such deep learning-based black-box models. To tackle these issues, we propose a novel Model-agnostic Graph Neural Network (MaGNet) framework. The proposed framework is able to extract knowledge from high-order neighbors, sequentially integrates information of various orders, and offers explanations for the learned model by identifying influential compact graph structures. In particular, MaGNet consists of two components: an estimation model for the latent representation of complex relationships under graph topology, and an interpretation model that identifies influential nodes, edges, and important node features. Theoretically, we establish the generalization error bound for MaGNet via empirical Rademacher complexity and showcase its power to represent the layer-wise neighborhood mixing. We conduct comprehensive numerical studies using both simulated data and a real-world case study on investigating the neural mechanisms of the rat hippocampus, demonstrating that the performance of MaGNet is competitive with state-of-the-art methods.
March 15: no talk [spring break]
March 22: Xiaowu Dai (UCLA)
Kernel ordinary differential equations
The ordinary differential equation (ODE) is widely used in modelling biological and physical processes in science. A new reproducing kernel-based approach is proposed for the estimation and inference of ODEs given noisy observations. The functional forms in the ODE are not assumed to be known or restricted to be linear or additive, and pairwise interactions are allowed. Sparse estimation is performed to select individual functionals and construct confidence intervals for the estimated signal trajectories. The estimation optimality and selection consistency of kernel ODE are established under both the low-dimensional and high-dimensional settings, where the number of unknown functionals can be smaller or larger than the sample size. The proposal tackles several important problems that are not yet fully addressed in the smoothing spline analysis of variance (SS-ANOVA) framework, and extends the existing methods of dynamic causal modeling.
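Schematically (our reading of the setup, with details hedged): the system is $\frac{dx_j}{dt} = F_j(x(t))$ for $j = 1, \dots, p$, with each unknown $F_j$ modeled in a reproducing kernel Hilbert space through a functional-ANOVA expansion
$$ F_j(x) = \sum_k f_{jk}(x_k) + \sum_{k < l} f_{jkl}(x_k, x_l), $$
so that main effects and pairwise interactions can be selected individually by sparse estimation.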
March 29: Kengo Kato (Cornell)
Entropic optimal transport: limit theorems and algorithms
In this talk, I will discuss my recent work on entropic optimal transport (EOT). In the first part, I will discuss limit theorems for EOT maps, dual potentials, and the Sinkhorn divergence. The key technical tool we use is a first and second-order Hadamard differentiability analysis of EOT potentials with respect to the marginals, from which the limit theorems, bootstrap consistency, and asymptotic efficiency of the empirical estimators follow. The second part concerns the entropic Gromov-Wasserstein (EGW) distance, which serves as a computationally efficient proxy for the Gromov-Wasserstein distance. By leveraging a variational representation that ties the EGW problem with a series of EOT problems, we derive stability results of EGW with respect to the auxiliary matrix, which enables us to develop efficient algorithms for solving the EGW problem. This talk is based on joint work with Ziv Goldfeld, Gabriel Rioux, and Ritwik Sadhu.
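For reference, the EOT problem regularizes optimal transport by relative entropy,
$$ \mathrm{OT}_\varepsilon(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int c\, d\pi + \varepsilon\, \mathrm{KL}(\pi \,\|\, \mu \otimes \nu), $$
and the Sinkhorn divergence removes the entropic bias via $S_\varepsilon(\mu, \nu) = \mathrm{OT}_\varepsilon(\mu, \nu) - \tfrac{1}{2}\mathrm{OT}_\varepsilon(\mu, \mu) - \tfrac{1}{2}\mathrm{OT}_\varepsilon(\nu, \nu)$; the limit theorems in the talk concern empirical estimators of such quantities and the associated dual potentials.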
April 5: Weixuan Xia (USC)
Set-Valued Stochastic Integrals and Convoluted Lévy Processes
In this talk, I will discuss set-valued Volterra-type stochastic integrals driven by Lévy processes. I will explain the definition of set-valued convoluted stochastic integrals by taking the closed decomposable hull of integral functionals over time, thereby extending classical definitions to convoluted integrals with square-integrable kernels. Two key insights include: (1) Aside from established results for set-valued Itô integrals, while set-valued stochastic integrals with respect to a finite-variation Poisson random measure are guaranteed to be integrally bounded for bounded integrands, this is not true when the random measure represents infinite variation; (2) It is a mutual effect of kernel singularity and jumps that the set-valued convoluted integrals are possibly explosive and can take extended vector values. These findings carry significant implications for the construction of set-valued fractional dynamical systems. Additionally, I will explore two classes of set-monotone processes of practical interest in economic and financial modeling.
April 12: Yosi Rinott (The Hebrew University of Jerusalem)
On the behavior of posterior probabilities with additional data: monotonicity and nonmonotonicity, asymptotic rates, log-concavity, and Turán’s inequality
Given a parametric model, a prior, and data, Bayesian statisticians quantify their belief that the true parameter is $\vartheta_0$ by its posterior probability. The starting question of this paper is whether the posterior at $\vartheta_0$ increases when the data are generated under $\vartheta_0$, and how it behaves when the data come from $\vartheta \neq \vartheta_0$. Can it decrease and then increase, so that additional data may mislead Bayesian statisticians?
For data arriving sequentially, we consider monotonicity properties of the posterior probabilities as a function of the sample size with respect to certain stochastic orders, specifically starting with likelihood ratio dominance.
When the data are generated by $\vartheta \neq \vartheta_0$, Doob’s consistency theorem says that the posterior at $\vartheta_0$ converges a.s. to zero, and therefore its expectation converges to zero. We obtain precise asymptotic rates of the latter convergence for observations from an exponential family and show that the expectation of the $\vartheta_0$-posterior under $\vartheta \neq \vartheta_0$ is eventually strictly decreasing. Finally, we show that in a number of interesting cases this expectation is a log-concave function of the sample size, and thus unimodal. In the Bernoulli case we obtain this result by developing an inequality that is related to Turán’s inequality for Legendre polynomials.
The talk is based on a joint work with Sergiu Hart.
April 19: Andrew Holbrook (UCLA, Department of Biostatistics)
Quantum Markov chain Monte Carlo(s)
I discuss how one can use quantum circuits to accelerate multiproposal MCMC and point to promising avenues of future research, including quantum HMC, quantum-accelerated nonreversible MCMC and quantum-accelerated locally-balanced MCMC.
April 26: Sui Tang (UCSB)
Learning interaction kernels in particle and agent-based systems
Interacting particle systems showcase a variety of collective behaviors and are fundamental to many scientific and engineering domains, such as the flocking of birds and the milling of fish. These systems are typically modeled using differential equations to elucidate how individual behaviors drive collective dynamics, an essential inquiry across multiple disciplines. Although recent theoretical and numerical studies have successfully replicated many qualitative collective patterns seen in nature, there is still a notable deficiency in quantitatively matching these models with empirical data.
We explore the data-driven discovery of latent interaction kernels from observed trajectory data in particle and agent-based systems. We discuss recent findings in stochastic systems where interaction kernels are derived from pairwise distances and introduce a nonparametric inference strategy using a regularized maximum likelihood estimator for precise kernel estimation. We show this approach can achieve near-optimal convergence rates, regardless of the state space dimensionality when dealing with multiple trajectory data. Additionally, we conduct error analysis related to discrete-time observations and validate our methodology through numerical experiments on models such as stochastic opinion dynamics and the Lennard-Jones potential. Moreover, we also consider microscopic models and advance our approach to estimate nonlocal interaction potentials in aggregation-diffusion equations from noisy data, using sparsity-promoting techniques. This research is conducted in collaboration with Fei Lu, Mauro Maggioni, Jose A. Carrillo, Gissell Estrada-Rodriguez, and Laszlo Mikolas.
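A standard instance of the models in question (stated generically; the talk covers stochastic and nonlocal variants): $N$ agents $x_1, \dots, x_N$ evolve by
$$ \dot{x}_i(t) = \frac{1}{N} \sum_{j=1}^{N} \phi\big(\|x_j(t) - x_i(t)\|\big)\,\big(x_j(t) - x_i(t)\big), $$
optionally with additive noise $\sigma\, dW_i(t)$, and the inference problem is to recover the unknown interaction kernel $\phi$ from observed trajectories.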