Fridays 3:30pm in KAP 414; tea is usually provided at 3:00pm

Organizer: Stanislav Minsker

For seminars/colloquia on other topics, see the Department of Mathematics webpage.

Older seminars: Fall 2016, Spring 2016, Fall 2015, Spring 2015, Fall 2014, 2013-2014

  • Spring 2018 seminars

    January 12: Pierre-François Rodriguez (UCLA, Department of Mathematics)

    On random walks and local limits in dimension 2

    We will discuss recent developments relating Poissonian “loop soups” à la Lawler-Werner, Le Jan and Sznitman to the study of random walks on large two-dimensional tori. We will show how the underlying correspondence can be used to effectively describe certain critical phenomena in such systems, which occur when the walk is run up to suitable timescales while forced to avoid a given point.


    January 19: Andrew Stuart (Caltech)

    Large Graph Limits of Learning Algorithms

    Many problems in machine learning require the classification of high-dimensional data. One methodology to approach such problems is to construct a graph whose vertices are identified with data points, with edges weighted according to some measure of affinity between the data points. Algorithms such as spectral clustering, probit classification and the Bayesian level set method can all be applied in this setting. The goal of the talk is to describe these algorithms for classification, and to analyze them in the limit of large data sets. Doing so leads to interesting problems in the calculus of variations, in stochastic partial differential equations and in Markov chain Monte Carlo, all of which will be highlighted in the talk. These limiting problems give insight into the structure of the classification problem, and into algorithms for it.

    The talk is based on collaboration with Matt Dunlop (Caltech), Dejan Slepcev (CMU) and Matt Thorpe (Cambridge).
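
    For background, the graph construction underlying these algorithms can be sketched in a few lines. The following toy spectral clustering example (my own illustration, not the speaker's code, with an arbitrary Gaussian affinity and bandwidth) builds the weighted data graph, forms the normalized Laplacian, and clusters the low-lying eigenvectors.

    ```python
    # Toy spectral clustering on a weighted data graph (illustration only).
    import numpy as np
    from sklearn.cluster import KMeans

    def spectral_cluster(X, n_clusters=2, sigma=1.0):
        # Affinity matrix: W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
        sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        W = np.exp(-sq_dists / (2 * sigma ** 2))
        np.fill_diagonal(W, 0.0)
        # Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.
        d = W.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        L = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
        # Embed each point via the eigenvectors of the smallest eigenvalues, then cluster.
        _, eigvecs = np.linalg.eigh(L)
        embedding = eigvecs[:, :n_clusters]
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedding)

    # Two well-separated Gaussian blobs as a sanity check.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
    print(spectral_cluster(X))
    ```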


    February 2: Jacob Bien (USC, Marshall School of Business)

    Rare Feature Selection in High Dimensions

    Many prediction problems include a large number of features that count the frequency of various events. In several common application areas, nearly all of these events are rare, leading to design matrices that are highly sparse. The challenge posed by such “rare features” has received very little attention despite its prevalence. We show, both theoretically and empirically, that not explicitly accounting for the rareness of features can greatly reduce the effectiveness of an analysis. We next propose a framework for aggregating rare features into denser features in a flexible manner that creates better predictors of the response. Our strategy leverages side information in the form of a tree that encodes feature similarity.
    We apply our method to data from Trip Advisor, in which we predict the numerical rating of a hotel based on the text of the associated review. Our method achieves high accuracy by making effective use of rare words; by contrast, the lasso is unable to identify highly predictive words if they are too rare.
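
    As a toy illustration of the aggregation idea (my own sketch, not the authors' method; the `feature_group` map below is a hypothetical stand-in for the similarity tree), rare count columns can be pooled into coarser groups before fitting an ordinary lasso:

    ```python
    # Toy pooling of rare count features into coarser groups (illustration only).
    import numpy as np
    from sklearn.linear_model import Lasso

    def aggregate_rare_features(X, feature_group, min_count=10):
        """Pool columns whose total count is below min_count into their group column."""
        X = np.asarray(X, dtype=float)
        rare = X.sum(axis=0) < min_count
        X_agg = np.zeros((X.shape[0], feature_group.max() + 1))
        for j in np.where(rare)[0]:
            X_agg[:, feature_group[j]] += X[:, j]   # rare column: add to its group
        # Keep the common columns as-is and append the pooled group columns.
        return np.hstack([X[:, ~rare], X_agg])

    # Hypothetical usage: X is a document-term count matrix, y the hotel ratings,
    # and feature_group assigns each word to a node of the similarity tree.
    # X_new = aggregate_rare_features(X, feature_group)
    # model = Lasso(alpha=0.1).fit(X_new, y)
    ```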

    February 16: Timothy Cannings (USC, Marshall School of Business)

    Local nearest neighbour classification with applications to semi-supervised learning

    We derive a new asymptotic expansion for the global excess risk of a local $k$-nearest neighbour classifier, where the choice of $k$ may depend upon the test point.  We prove that, provided the $d$-dimensional marginal distribution of the features has a finite $\rho$th moment for some $\rho > 4$ (as well as other regularity conditions), a local choice of $k$ can yield a rate of convergence of the excess risk of $O(n^{-4/(d+4)})$, where $n$ is the sample size, whereas for the standard $k$-nearest neighbour classifier, our theory would require $d \geq 5$ and $\rho > 4d/(d-4)$ finite moments to achieve this rate.  Our results motivate a new $k$-nearest neighbour classifier for semi-supervised learning problems, where the unlabelled data are used to obtain an estimate of the marginal feature density, and fewer neighbours are used for classification when this density estimate is small.  The potential improvements over the standard $k$-nearest neighbour classifier are illustrated both through our theory and via a simulation study.
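
    As a rough illustration of the semi-supervised idea (my own sketch, not the authors' procedure; the scaling of $k$ below is ad hoc), one can estimate the marginal feature density from unlabelled data and use fewer neighbours at test points where that estimate is small:

    ```python
    # Toy "local k" nearest-neighbour classifier driven by a density estimate
    # built from unlabelled data (illustration only).
    import numpy as np
    from sklearn.neighbors import KernelDensity, NearestNeighbors

    def local_knn_predict(X_train, y_train, X_unlabelled, X_test, k_max=25, k_min=3):
        # Estimate the marginal feature density from the unlabelled sample.
        kde = KernelDensity(bandwidth=0.5).fit(X_unlabelled)
        dens = np.exp(kde.score_samples(X_test))
        # Use fewer neighbours where the estimated density is small.
        k_local = np.clip((k_max * dens / dens.max()).astype(int), k_min, k_max)
        _, idx = NearestNeighbors(n_neighbors=k_max).fit(X_train).kneighbors(X_test)
        preds = np.empty(len(X_test), dtype=y_train.dtype)
        for i, k in enumerate(k_local):
            votes = y_train[idx[i, :k]]              # labels of the k nearest neighbours
            preds[i] = np.bincount(votes).argmax()   # majority vote (integer labels)
        return preds
    ```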


    February 23: Wen Sun (USC, Marshall School of Business)

    A General Framework for Information Pooling in Large-Scale Multiple Testing

    This talk discusses a general framework for exploiting the sparsity information in two-sample multiple testing problems. We propose to first construct a covariate sequence, in addition to the usual primary test statistics, to capture the sparsity structure, and then incorporate the auxiliary covariates in inference via a three-step algorithm consisting of grouping, adjusting and pooling (GAP). The GAP procedure provides a simple and effective framework for information pooling. An important advantage of GAP is its capability of handling various dependence structures such as those arising from multiple testing for high-dimensional linear regression, differential correlation analysis, and differential network analysis. We establish general conditions under which GAP is asymptotically valid for false discovery rate control, and show that these conditions are fulfilled in a range of applications. Numerical results demonstrate that existing methods can be much improved by the proposed framework. An application to a breast cancer study for identifying gene-gene interactions will be discussed if time permits.
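
    For readers new to the area, the following toy sketch shows only the generic idea of grouping hypotheses by an auxiliary covariate before applying a standard Benjamini-Hochberg step within each group; it is not the GAP procedure and carries none of its guarantees.

    ```python
    # Generic covariate-grouped Benjamini-Hochberg sketch (not GAP; illustration only).
    import numpy as np

    def benjamini_hochberg(pvals, alpha=0.05):
        """Boolean mask of rejections for the standard BH step-up procedure."""
        m = len(pvals)
        order = np.argsort(pvals)
        below = pvals[order] <= alpha * np.arange(1, m + 1) / m
        reject = np.zeros(m, dtype=bool)
        if below.any():
            k = np.nonzero(below)[0].max()           # largest index passing the BH line
            reject[order[:k + 1]] = True
        return reject

    def grouped_bh(pvals, covariate, alpha=0.05, n_groups=3):
        """Group hypotheses by quantiles of the covariate, then run BH within groups."""
        edges = np.quantile(covariate, np.linspace(0, 1, n_groups + 1))
        groups = np.clip(np.searchsorted(edges, covariate, side="right") - 1, 0, n_groups - 1)
        reject = np.zeros(len(pvals), dtype=bool)
        for g in range(n_groups):
            mask = groups == g
            if mask.any():
                reject[mask] = benjamini_hochberg(pvals[mask], alpha)
        return reject
    ```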


    March 2: Akihiko Nishimura (UCLA)

    Discontinuous Hamiltonian Monte Carlo for models with discrete parameters and discontinuous likelihoods

    Hamiltonian Monte Carlo (HMC) is a powerful sampling algorithm employed by several probabilistic programming languages. Its fully automatic implementations have made HMC a standard tool for applied Bayesian modeling. While its performance is often superior to alternatives under a wide range of models, one prominent weakness of HMC is its inability to handle discrete parameters. In this talk, I present discontinuous HMC, an extension that can efficiently explore discrete spaces involving ordinal parameters as well as target distributions with discontinuous densities. The proposed algorithm is based on two key ideas: embedding of discrete parameters into a continuous space and simulation of Hamiltonian dynamics on a piecewise smooth density function. The latter idea has been explored under special cases in the literature, but the extensions introduced here are critical in turning the idea into a general and practical sampling algorithm. When properly tuned, discontinuous HMC is guaranteed to outperform a Metropolis-within-Gibbs algorithm as the two algorithms coincide under a specific (and sub-optimal) implementation of discontinuous HMC. We apply our algorithm to challenging posterior inference problems to demonstrate its wide applicability and competitive performance.
    To make the talk more accessible, I will start with a review of the essential ideas and notions behind HMC. A brief review of Bayesian inference and Markov chain Monte Carlo will also be provided.
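
    For those who would like to see the basic algorithm being reviewed, here is a minimal sketch of standard (continuous) HMC with a leapfrog integrator and Metropolis correction; the discontinuous variant discussed in the talk is not shown.

    ```python
    # Minimal standard HMC: leapfrog integration plus a Metropolis correction
    # (the continuous algorithm being reviewed, not the discontinuous variant).
    import numpy as np

    def hmc_step(x, log_prob, grad_log_prob, step_size=0.1, n_leapfrog=20, rng=None):
        rng = rng or np.random.default_rng()
        p = rng.normal(size=x.shape)                 # resample the momentum
        x_new, p_new = x.copy(), p.copy()
        # Leapfrog integration of H(x, p) = -log_prob(x) + |p|^2 / 2.
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        for _ in range(n_leapfrog - 1):
            x_new += step_size * p_new
            p_new += step_size * grad_log_prob(x_new)
        x_new += step_size * p_new
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        # Accept or reject to correct for the discretization error.
        log_accept = (log_prob(x_new) - 0.5 * p_new @ p_new) - (log_prob(x) - 0.5 * p @ p)
        return x_new if np.log(rng.uniform()) < log_accept else x

    # Sample from a standard two-dimensional Gaussian.
    x, samples = np.zeros(2), []
    for _ in range(1000):
        x = hmc_step(x, lambda z: -0.5 * z @ z, lambda z: -z)
        samples.append(x)
    ```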

    March 23: Joseph Salmon (TELECOM ParisTech)

    Generalized Concomitant Multi-Task Lasso for sparse multimodal regression

    For standard Lasso theory to hold, though, the regularization parameter should be proportional to the noise level, which is generally unknown in practice. A remedy is to consider estimators, such as the Concomitant Lasso, which jointly optimize over the regression coefficients and the noise level. However, when data from different sources are pooled to increase sample size, or when dealing with multimodal data, noise levels differ and new dedicated estimators are needed. We provide new statistical and computational solutions to perform heteroscedastic regression, with an emphasis on functional brain imaging with magneto- and electroencephalographic (M/EEG) signals. This joint work with M. Massias, O. Fercoq and A. Gramfort is to appear in AISTATS 2018. PDF: <https://arxiv.org/abs/1705.09778>. Python code: <https://github.com/mathurinm/SHCL>.
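
    As a rough illustration of the joint optimization over coefficients and noise level (a sketch of the generic concomitant/scaled-Lasso idea, not the authors' generalized multi-task estimator; see the linked repository for the real implementation), one can alternate a noise-level update with a Lasso step whose penalty is scaled by the current noise estimate:

    ```python
    # Sketch of the concomitant (scaled) Lasso idea via alternating minimization
    # (illustration only; see the SHCL repository above for the real implementation).
    import numpy as np
    from sklearn.linear_model import Lasso

    def concomitant_lasso(X, y, lam=0.1, n_iter=20):
        n, p = X.shape
        beta, sigma = np.zeros(p), np.std(y)
        for _ in range(n_iter):
            # Noise-level update for fixed coefficients.
            sigma = max(np.linalg.norm(y - X @ beta) / np.sqrt(n), 1e-8)
            # Lasso step with penalty scaled by the current noise estimate.
            beta = Lasso(alpha=lam * sigma, fit_intercept=False).fit(X, y).coef_
        return beta, sigma
    ```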


    March 30: Yaming Yu (UC Irvine, Department of Statistics)

    Successive sampling, monotonicity, and Bayesian bandit problems

    Stochastic orders appear naturally in many problems in statistics and applied probability and can be used to derive useful inequalities. We discuss monotonicity in classical limit theorems, structural results in Bayesian bandit problems, and comparisons between successive sampling and conditional Poisson sampling in sample surveys.


    April 6: Leila Setayeshgar (Providence College)

    Large Deviations for a Class of Stochastic Semilinear Partial Differential Equations

    Standard approaches to large deviations analysis for stochastic partial differential equations (SPDEs) are often based on approximations. These approximations are mostly technical and often onerous to carry out. In 2008, Budhiraja, Dupuis and Maroulas employed the weak convergence approach and showed that these approximations can be avoided for many infinite-dimensional models. Large deviations analysis for such systems instead relied on demonstrating existence, uniqueness and tightness properties of certain perturbations of the original process. In this talk, we use the weak convergence approach and establish the large deviation principle for the law of the solutions to a class of semilinear SPDEs. Our family of semilinear SPDEs contains, as special cases, both the stochastic Burgers’ equation and the stochastic reaction-diffusion equation.
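
    For readers unfamiliar with the terminology, recall that a family of random variables $\{X^\varepsilon\}$ satisfies a large deviation principle with rate function $I$ if, for every Borel set $\Gamma$,
    $$ -\inf_{x \in \Gamma^\circ} I(x) \;\le\; \liminf_{\varepsilon \to 0} \varepsilon \log P(X^\varepsilon \in \Gamma) \;\le\; \limsup_{\varepsilon \to 0} \varepsilon \log P(X^\varepsilon \in \Gamma) \;\le\; -\inf_{x \in \bar{\Gamma}} I(x). $$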


    April 13: Henry Schellhorn (Claremont Graduate University, Department of Mathematics)

    Density formula for functionals of compound Poisson processes using Malliavin calculus

    We extend the work of Nourdin and Viens (Electronic Journal of Probability, 2009) to obtain a new exact formula for the density of the law of a random variable Z, which is measurable and differentiable with respect to a compound Poisson process. The main restriction is that the Lévy measure must charge the whole real line.
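
    For context, the Gaussian-setting formula of Nourdin and Viens that is being extended expresses (roughly) the density of a centered, Malliavin-differentiable random variable $Z$ as
    $$ \rho(z) \;=\; \frac{E|Z|}{2\, g_Z(z)} \exp\!\left( -\int_0^z \frac{x\, dx}{g_Z(x)} \right), \qquad g_Z(z) \;=\; E\big[ \langle DZ, -DL^{-1}Z \rangle_H \,\big|\, Z = z \big], $$
    where $D$ denotes the Malliavin derivative and $L$ the Ornstein-Uhlenbeck generator; here the underlying Gaussian structure is replaced by a compound Poisson one.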

    This talk presents work in progress, and not all results have been exploited yet. For this reason, time will probably allow me to describe another project as well.


    April 20: Guido Montufar (UCLA, Department of Mathematics)

    Mixtures and Products in Two Graphical Models

    This talk is about two graphical models with hidden variables. One is a mixture model and the other is a product of mixtures called restricted Boltzmann machine. We derive relations between theoretical properties of restricted Boltzmann machines and several natural notions from discrete mathematics and convex geometry. We take a closer look at the first non-trivial case, with three observed binary variables. Although the mixture and the product of mixtures look different from their parametrizations, we show that in this case they represent the same set of distributions on the interior of the probability simplex, and are equal up to closure. We give a semi-algebraic description of this model in terms of six binomial inequalities and obtain closed form expressions for the maximum likelihood estimates.

    This talk is based on joint work with Jason Morton and Anna Seigal.
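
    To make the "product of mixtures" model concrete, the following small sketch (with arbitrary parameters, chosen only for illustration) enumerates the distribution of a restricted Boltzmann machine with three visible and two hidden binary units, i.e. a point in the interior of the 7-dimensional probability simplex:

    ```python
    # Enumerate the visible distribution of a tiny restricted Boltzmann machine
    # (arbitrary parameters; illustration only).
    import itertools
    import numpy as np

    def rbm_distribution(W, b, c):
        """p(v) for all visible configurations, marginalizing the hidden units."""
        n_vis, n_hid = W.shape
        vis = [np.array(v) for v in itertools.product([0, 1], repeat=n_vis)]
        hid = [np.array(h) for h in itertools.product([0, 1], repeat=n_hid)]
        p = np.array([sum(np.exp(v @ W @ h + b @ v + c @ h) for h in hid) for v in vis])
        return p / p.sum()

    rng = np.random.default_rng(1)
    probs = rbm_distribution(rng.normal(size=(3, 2)), rng.normal(size=3), rng.normal(size=2))
    print(probs)   # eight probabilities: a point in the 7-dimensional simplex
    ```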


    April 27: Leonard Wong (USC, Department of Mathematics)

    Information geometry and optimal transport

    Information geometry studies spaces of probability distributions using differential geometry. On the other hand, optimal transport is about finding efficient methods of transporting one distribution to another. Both fields apply geometric ideas in probability and have found numerous applications in statistics and machine learning. In this talk we explain some new connections between the two fields. We study a family of logarithmic divergences (distance-like quantities) which generalizes the Bregman divergence (of which the relative entropy is a prime example). These divergences have a dual structure in terms of an optimal transport map, satisfy a generalized Pythagorean theorem, and correspond to natural extensions of the exponential family. Geometrically, they characterize (locally) statistical manifolds with constant curvature.
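
    To fix notation, the Bregman divergence generated by a differentiable convex function $\varphi$ is
    $$ D_\varphi(x, y) \;=\; \varphi(x) - \varphi(y) - \langle \nabla \varphi(y),\, x - y \rangle, $$
    and the choice $\varphi(p) = \sum_i p_i \log p_i$ on the probability simplex recovers the relative entropy $\sum_i p_i \log(p_i/q_i)$; the logarithmic divergences discussed in the talk generalize this construction.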