This talk will draw a few perspectives on the broad topic of Machine Learning, with non-specialists in mind. We will go through major subfields like supervised, unsupervised, or active learning, never forgetting the emergent reinforcement learning. We will cover a few different trends over recent years, like the mathematically inclined Support Vector Machine, or the empirical Deep Learning.

In 2010, when the LHC started colliding proton pairs in earnest, multi-variate analyses were newfangled methods starting to make inroads in experimental particle physics. These methods faced widespread skepticism as to their performance and biases, reflecting a winter of suspicion over overtrained neural networks that set in in the late 1990s. Thanks to more robust techniques, like boosted decision trees, it became possible to make better and more extensive use of the full information recorded in particle collisions at the Tevatron and LHC colliders.

The Higgs boson discovery by the CMS and ATLAS collaborations in 2012 was only possible because of the use of multi-variate techniques that enhanced the sensitivity by up to the equivalent of having 50% more collision data available for analysis.

We will review the use of classification and regression in the Higgs to diphoton search and subsequent discovery, a concrete example of a decade-old ML-based analysis in high-energy particle physics. Particular emphasis will be placed on the modular design of the analysis and its inherent explainability advantages, used to great effect in assuaging concerns raised by hundreds of initially skeptical colleagues in the CMS collaboration.

Finally, we'll quickly highlight some particle physics challenges that have contributed to, and made use of, the last decade of graph, adversarial, and deep ML developments.

Stochastic optimal control theory deals with the problem of computing an optimal set of actions to attain some future goal. Examples are found in many contexts, such as motor control tasks for robotics, planning and scheduling tasks, or managing a financial portfolio. The computation of the optimal control is typically very difficult due to the size of the state space and the stochastic nature of the problem. Special cases for which the computation is tractable are linear dynamical systems with quadratic cost and deterministic control problems. For a special class of non-linear stochastic control problems, the solution can be mapped onto a statistical inference problem. For these so-called path integral control problems, the optimal cost-to-go solution of the Bellman equation is given by the minimum of a free energy. I will give a high-level introduction to the underlying theory and illustrate it with some examples from robotics and other areas.
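
As a rough illustration of the mapping from control to inference, the sketch below estimates a cost-to-go by sampling uncontrolled paths. Everything specific in it is an assumption made for the example rather than taken from the talk: a one-dimensional system dx = u dt + dxi with noise variance nu per unit time, control cost u^2 / 2 (so that the temperature lam equals nu), no running state cost, and terminal cost x^2 / 2. Under these assumptions the cost-to-go takes the log-partition form J(x) = -lam * log E[exp(-S / lam)], where S is the cost of an uncontrolled path, and the toy problem has a closed form to compare against.

```python
import math
import random


def cost_to_go_mc(x0, horizon, nu, n_paths=20_000, n_steps=10, seed=0):
    """Monte Carlo estimate of J(x0) = -lam * log E[exp(-S / lam)].

    Assumed toy problem: uncontrolled dynamics are Brownian motion with
    variance nu per unit time, and the only path cost is x_T^2 / 2.
    """
    rng = random.Random(seed)
    lam = nu  # lam = nu * R with control-cost weight R = 1 (assumption)
    step_std = math.sqrt(nu * horizon / n_steps)
    total = 0.0
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x += rng.gauss(0.0, step_std)  # one step of the uncontrolled path
        path_cost = 0.5 * x * x  # terminal cost only
        total += math.exp(-path_cost / lam)
    return -lam * math.log(total / n_paths)


def cost_to_go_exact(x0, horizon, nu):
    """Closed form for the same toy problem (Gaussian integral)."""
    lam = nu
    scale = 1.0 + nu * horizon / lam
    return 0.5 * lam * math.log(scale) + 0.5 * x0 * x0 / scale


estimate = cost_to_go_mc(1.0, 1.0, 0.5)
exact = cost_to_go_exact(1.0, 1.0, 0.5)
print(f"Monte Carlo: {estimate:.4f}, closed form: {exact:.4f}")
```

The path loop is redundant for this particular cost, since only the end point matters, but it is kept so that a running state cost could be accumulated along the path in the same way.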

When faced with a data analysis, learning, or statistical inference problem, the amount and quality of data available fundamentally determines whether such tasks can be performed with certain levels of accuracy. Indeed, many theoretical disciplines study limits of such tasks by investigating whether a dataset effectively contains the information of interest. With the growing size of datasets, however, it is crucial not only that the underlying statistical task is possible, but also that it is doable by means of efficient algorithms. In this talk we will discuss methods aiming to establish limits of when statistical tasks are possible with computationally efficient methods, or when there is a fundamental Statistical-to-Computational gap in which an inference task is statistically possible but inherently computationally hard.

This is intimately related to understanding the geometry of random functions, with connections to statistical physics, the study of spin glasses, random geometry, and, in an important example, algebraic invariant theory.

This talk summarises some new developments in Bayesian statistical methodology for performing inference in high-dimensional inverse problems with an underlying convex geometry. We pay particular attention to problems related to imaging sciences and to new stochastic computation methods that tightly combine proximal convex optimisation and Markov chain Monte Carlo sampling techniques. The new computation methods are illustrated with a range of imaging experiments, where they are used to perform uncertainty quantification analyses, automatically adjust regularisation parameters, and objectively compare alternative models in the absence of ground truth.

Data are increasingly measured, in ever tinier minutiae, by networks of spatially distributed agents. Illustrative examples include a team of robots searching a large region, a collection of sensors overseeing a critical infrastructure, or a swarm of drones policing a wide area.

How can we learn from these large, spatially distributed datasets? In the centralized approach, each agent forwards its dataset to a fusion center, which then carries out the learning from the pile of amassed datasets. This approach, however, prevents the number of agents from scaling up: as more and more agents ship data to the center, not only do the communication channels near the center quickly swell to congestion, but the computational power of the center is also rapidly outpaced.

In this seminar, I describe the alternative approach of distributed learning. Here, no fusion center exists, and the agents themselves recreate the centralized computation by exchanging short messages (not data) between network neighbors. To illustrate, I describe two learning algorithms: one solves convex learning problems via a token that randomly roams through the network, and the other solves a classification problem via random meetings between agents (e.g., gossip), each agent measuring only its own stream of features.

This seminar is aimed at non-specialists. Rather than trying to impart the latest developments of the field, I hope to open a welcoming door to those wishing to have a peek at this bubbling field of research, where optimization, control, probability, and machine learning mingle happily.
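
To make the idea of agents recreating a centralized computation through short neighbor-to-neighbor messages concrete, here is a minimal sketch of randomized pairwise gossip averaging. It is a standard textbook building block chosen for illustration, not either of the two algorithms described in the seminar; the ring topology, the local values, and the function name `gossip_average` are all assumptions of the example.

```python
import random


def gossip_average(values, edges, n_rounds=2000, seed=0):
    """Randomized pairwise gossip: at each round one random pair of
    neighbors meets and both replace their value with the pair's mean.

    The sum of all values is preserved, so on a connected network every
    agent drifts towards the global average without any fusion center.
    """
    rng = random.Random(seed)
    x = list(values)
    for _ in range(n_rounds):
        i, j = rng.choice(edges)  # a random meeting between two neighbors
        mean = 0.5 * (x[i] + x[j])
        x[i] = x[j] = mean
    return x


# Hypothetical network: six agents on a ring, each holding one number.
n_agents = 6
ring = [(i, (i + 1) % n_agents) for i in range(n_agents)]
local_values = [3.0, -1.0, 4.0, 1.0, 5.0, 0.0]

final = gossip_average(local_values, ring)
print(final)
```

Each message carries a single number rather than a dataset, which is the sense in which the agents exchange short messages instead of data.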

Off-policy evaluation is the problem of predicting the value of a policy given some batch of data. In the language of statistics, this is also called counterfactual estimation. Batch policy optimization refers to the problem of finding a good policy, again, given some logged data.

In this talk, I will consider the case of contextual bandits, give a brief (and incomplete) review of the approaches proposed in the literature, and explain why this problem is difficult. Then, I will describe a new approach based on self-normalized importance weighting. In this approach, a semi-empirical Efron-Stein concentration inequality is combined with Harris' inequality to arrive at non-vacuous high-probability value lower bounds, which can then be used in a policy selection phase. On a number of synthetic and real datasets, this new approach is found to be significantly superior to its main competitors, both in terms of the tightness of the confidence intervals and the quality of the policies chosen.

The talk is based on joint work with Ilja Kuzborskij, Claire Vernade and Andras Gyorgy.
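
For readers unfamiliar with importance weighting, the sketch below contrasts the plain importance-weighted value estimate with its self-normalized variant on logged bandit data. It only shows the point estimators; the confidence bounds discussed in the talk are not implemented. The function names and the four logged interactions are made up for illustration.

```python
def ips_estimate(rewards, target_probs, logging_probs):
    """Plain importance-weighted estimate: mean of w_i * r_i, where
    w_i = pi_target(a_i | x_i) / pi_logging(a_i | x_i)."""
    weights = [p / q for p, q in zip(target_probs, logging_probs)]
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


def snips_estimate(rewards, target_probs, logging_probs):
    """Self-normalized estimate: divide by the sum of the weights
    instead of the sample size."""
    weights = [p / q for p, q in zip(target_probs, logging_probs)]
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


# Hypothetical log: observed rewards, and the probability that the target
# and logging policies each assign to the logged action.
rewards = [1.0, 0.0, 1.0, 1.0]
target_probs = [0.9, 0.1, 0.5, 0.2]
logging_probs = [0.5, 0.5, 0.5, 0.5]

print(ips_estimate(rewards, target_probs, logging_probs))
print(snips_estimate(rewards, target_probs, logging_probs))
```

Because the self-normalized estimate is a weighted average of the observed rewards, it always stays within their range, which is one reason it tends to be better behaved when some weights are large.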

The interplay between physics and deep learning is typically divided into two themes.

The first is “physics for deep learning,” where techniques from physics are brought to bear on understanding the dynamics of learning. The second is “deep learning for physics,” which focuses on the application of deep learning techniques to physics problems. I will present a more nuanced view of this interplay, with examples of how the structure of physics problems has inspired advances in deep learning and how it yields insights on topics such as inductive bias, interpretability, and causality.

When attempting to avoid global warming, individuals often face a social dilemma in which, besides securing future benefits, it is also necessary to reduce the chances of future losses. In this talk, I will resort to game theory and populations of adaptive agents to offer a theoretical analysis of this type of dilemma, in which the risk of failure plays a central role in individual decisions. I will discuss both deterministic dynamics in large populations and stochastic social learning dynamics in finite populations. This class of models can be shown to capture some of the essential features discovered in recent key experiments while allowing one to extend in non-trivial ways the experimental conditions to regions of practical interest. Moreover, this approach leads us to identify useful parallels between ecological and socio-economic systems, particularly in what concerns the evolution and self-organization of their institutions. In particular, our results suggest that global coordination for a common good should be attempted through a polycentric structure of multiple small-scale agreements, in which perception of risk is high and uncertainty in collective goals is minimized. Whenever the perception of risk is low, our results indicate that sanctioning institutions may significantly enhance the chances of coordinating to tame the planet's climate, as long as they are implemented in a bottom-up manner. I will discuss the impact on public goods dilemmas of heterogeneous political networks and wealth inequality, including distribution of wealth representative of existing inequalities among nations. Finally, I will briefly discuss the impact of scientific uncertainty (both in what concerns the collective targets and the time window available for action) on individuals' strategies and polarization of preferences.

The aim of this seminar is to explain, to a wide audience, how to combine optimal control techniques with reinforcement learning, by using approximate dynamic programming and artificial neural networks, to obtain adaptive optimal controllers. Although its roots date back to the end of the 20th century, this problem has been the subject of increasing attention. In addition to the promising tools that it offers to tackle difficult nonlinear problems of major engineering importance (ranging from robotics to biomedical engineering and beyond), it has the charm of creating a meeting point between the control and machine learning research communities.

Understanding the remarkable performance of deep neural networks is a very active direction of research, with contributions coming from a wide variety of fields. The statistical mechanics of learning is a theoretical framework dating back to the 1980s that studies learning problems from a physicist's viewpoint, using tools from the physics of disordered systems. In this talk, I will first go over this traditional framework, which relies on the teacher-student scenario, Bayesian analysis, and mean-field approximations. Then I will discuss some recent advances in the corresponding analysis of modern deep neural networks and highlight remaining challenges.

In this talk, I introduce TensorFlow Quantum (TFQ), an open source library that was launched by Google in March 2020 for the rapid prototyping of hybrid quantum-classical models for classical or quantum data. This framework offers high-level abstractions for the design, training, and testing of both discriminative and generative quantum models under TensorFlow and supports high-performance quantum circuit simulators. I provide an overview of the software architecture and building blocks through several examples and illustrate TFQ functionalities by constructing hybrid quantum-classical convolutional neural networks for quantum state classification.

Deep Learning is a powerful collection of techniques for statistical learning, which has shown dramatic applications in many different directions, including the study of data sets of images, text, and time series. It uses neural networks, specifically convolutional neural networks (CNNs), to produce these results. What we have observed recently is that methods of topology can contribute to this effort, in diagnosing behavior within the CNNs, in the design of neural networks with excellent computational properties, and in improving generalization, i.e. the transfer of results of one neural network from one data set to another of similar type. We'll discuss topological methods in data science, as well as their application to this interesting set of techniques.

Neural network-based deep learning is capable of approximating functions in very high dimension with unprecedented efficiency and accuracy. This has opened up many exciting new possibilities, not just in traditional areas of artificial intelligence, but also in scientific computing and computational science. At the same time, deep learning has also acquired the reputation of being a set of “black box” type of tricks, without fundamental principles. This has been a real obstacle for making further progress in machine learning.

In this talk, I will try to address the following two questions:

How will machine learning impact computational mathematics and computational science?

How can computational mathematics, particularly numerical analysis, impact machine learning?

We describe some of the most important progress that has been made on these issues so far. Our hope is to put things into a perspective that will help to integrate machine learning with computational science.

Modern particle physics detectors generate copious amounts of data packed with meaning that provides the means for high-quality measurements in demanding experimental environments. To achieve these measurements there is a trend towards finer granularity in these detectors, which implies that the data read out has less intrinsic structure. Accurate pattern recognition is required to define the signatures of particles within those detectors and simultaneously extract physical parameters for the particles. Typically, algorithms to achieve these goals are written using well-known unsupervised algorithms, but recent advances in machine learning on graph structures, "Graph Neural Networks" (GNNs), provide powerful new methodologies for designing pattern recognition algorithms. In particular, methodologies for predicting the link structure between pieces of data from detectors are well suited to the particle physics pattern recognition task. Furthermore, there are interesting avenues for enforcing known symmetries of the data in the output of such networks, and there is ongoing research in this direction. This talk will discuss the challenges of pattern recognition, the advent of GNNs and the connections to particle physics, and the paths of research ahead for fully utilizing this powerful new tool.

Interacting agent-based systems are ubiquitous in science, from modeling of particles in Physics to prey-predator and colony models in Biology, to opinion dynamics in economics and social sciences. Oftentimes the laws of interaction between the agents are quite simple; for example, they depend only on pairwise interactions, and only on pairwise distance in each interaction. We consider the following inference problem for a system of interacting particles or agents: given only observed trajectories of the agents in the system, can we learn what the laws of interaction are? We would like to do this without assuming any particular form for the interaction laws, i.e. they might be "any" function of pairwise distances. We consider this problem both in the mean-field limit (i.e. the number of particles going to infinity) and in the case of a finite number of agents with an increasing number of observations, although in this talk we will mostly focus on the latter case. We cast this as an inverse problem, and study it in the case where the interaction is governed by an (unknown) function of pairwise distances. We discuss when this problem is well-posed, and we construct estimators for the interaction kernels with provably good statistical and computational properties. We measure their performance on various examples, which include extensions to agent systems with different types of agents, second-order systems, and families of systems with parametric interaction kernels. We also conduct numerical experiments to test the large-time behavior of these systems, especially in the cases where they exhibit emergent behavior.

This is joint work with F. Lu, J. Miller, S. Tang and M. Zhong.
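
The inference problem above can be illustrated with a heavily simplified least-squares sketch. It is not the estimator from the talk. It assumes a one-dimensional first-order system dx_i/dt = (1/N) * sum_j phi(|x_j - x_i|) (x_j - x_i), a piecewise-constant kernel phi on fixed distance bins, and noise-free velocities generated from a made-up kernel; with real trajectory data the velocities would have to be approximated, for instance by finite differences. All names and numbers are hypothetical.

```python
import random


def bin_index(r, edges):
    """Index of the distance bin [edges[b], edges[b + 1]) containing r."""
    for b in range(len(edges) - 1):
        if edges[b] <= r < edges[b + 1]:
            return b
    return len(edges) - 2  # r sits on the last edge


def features(positions, edges):
    """Row i holds (1/N) * sum_j (x_j - x_i), split by distance bin."""
    n = len(positions)
    rows = []
    for i in range(n):
        row = [0.0] * (len(edges) - 1)
        for j in range(n):
            if j != i:
                diff = positions[j] - positions[i]
                row[bin_index(abs(diff), edges)] += diff / n
        rows.append(row)
    return rows


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_kernel(snapshots, velocities, edges):
    """Least-squares kernel coefficients via the normal equations."""
    k = len(edges) - 1
    gram = [[0.0] * k for _ in range(k)]
    rhs = [0.0] * k
    for positions, vel in zip(snapshots, velocities):
        for row, v in zip(features(positions, edges), vel):
            for p in range(k):
                rhs[p] += row[p] * v
                for q in range(k):
                    gram[p][q] += row[p] * row[q]
    return solve(gram, rhs)


# Synthetic data from a made-up kernel with one value per distance bin.
rng = random.Random(0)
edges = [0.0, 0.25, 0.5, 1.0]
true_coeffs = [1.0, 0.5, 0.1]
snapshots = [[rng.random() for _ in range(6)] for _ in range(40)]
velocities = [
    [sum(c * f for c, f in zip(true_coeffs, row)) for row in features(pos, edges)]
    for pos in snapshots
]

print(fit_kernel(snapshots, velocities, edges))
```

Because the velocities are exact and the true kernel lies in the piecewise-constant hypothesis space, the fit recovers the coefficients up to rounding; with noisy or finite-difference velocities one would expect an error that depends on the number of observations and on the bin width.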

The increasing dimensionality of data in the modern machine learning age presents new challenges and opportunities. The high-dimensional settings allow one to use powerful asymptotic methods from probability theory and statistical physics to obtain precise characterizations and develop new algorithmic approaches. There is indeed a decades-long tradition in statistical physics of building and solving such simplified models of neural networks.

I will give examples of recent works that build on powerful methods of physics of disordered systems to analyze different problems in machine learning and neural networks, including overparameterization, kernel methods, and the gradient descent algorithm in a high dimensional non-convex setting.

High-dimensional learning remains an outstanding phenomenon where experimental evidence outpaces our current mathematical understanding, mostly due to the recent empirical successes of Deep Learning algorithms. Neural Networks provide a rich yet intricate class of functions with statistical abilities to break the curse of dimensionality, and where physical priors can be tightly integrated into the architecture to improve sample efficiency. Despite these advantages, an outstanding theoretical challenge in these models is computational, i.e., providing an analysis that explains successful optimization and generalization in the face of existing worst-case computational hardness results.

In this talk, I will focus on the framework that lifts parameter optimization to an appropriate measure space. I will cover existing results that guarantee global convergence of the resulting Wasserstein gradient flows, as well as recent results that study typical fluctuations of the dynamics around their mean field evolution. We will also discuss extensions of this framework beyond vanilla supervised learning, to account for symmetries in the function, as well as for competitive optimization.

Deep learning continues to dominate machine learning and has been successful in computer vision, natural language processing, etc. Its impact has now expanded to many research areas in science and engineering. In this talk, I will mainly focus on some recent impact of deep learning on computational mathematics. I will present our recent work on bridging deep neural networks with numerical differential equations. On the one hand, I will show how to design transparent deep convolutional networks to uncover hidden PDE models from observed dynamical data. On the other hand, I will present our preliminary attempt to establish a deep reinforcement learning based framework to solve 1D scalar conservation laws, and a meta-learning approach for solving linear parameterized PDEs based on the multigrid method.

Inverse problems in imaging range from tomographic reconstruction (CT, MRI, etc.) to image deconvolution, segmentation, and classification, just to name a few. In this talk I will discuss approaches to inverse imaging problems which have both a mathematical modelling (knowledge-driven) and a machine learning (data-driven) component. Mathematical modelling is crucial in the presence of ill-posedness, making use of information about the imaging data to narrow down the search space. Such an approach results in highly generalizable reconstruction and analysis methods which come with desirable solution guarantees. Machine learning, on the other hand, is a powerful tool for customising methods to individual data sets. Highly parametrised models, such as deep neural networks in particular, are powerful tools for accurately modelling prior information about solutions. The combination of these two paradigms, getting the best from both of these worlds, is the topic of this talk, furnished with examples for image classification under minimal supervision and for tomographic image reconstruction.

I will discuss the impact of nuisance parameters on the effectiveness of supervised classification in high energy physics problems, and techniques that may mitigate or remove their effect in the search for optimal selection criteria and variable transformations. The approaches discussed include nuisance-parametrized models, modified or adversarial losses, semi-supervised learning approaches, and inference-aware techniques.

Pure model-based approaches are today often insufficient for solving complex inverse problems in imaging. At the same time, we witness the tremendous success of data-based methodologies, in particular, deep neural networks for such problems. However, pure deep learning approaches often neglect known and valuable information from physics.

In this talk, we will provide an introduction to this complex of problems and then discuss a general conceptual approach to inverse problems in imaging, which combines deep learning and physics. This hybrid approach is based on shearlet-based sparse regularization and deep learning and is guided by a microlocal analysis viewpoint to pay particular attention to the singularity structures of the data. Finally, we will present several applications such as tomographic reconstruction and show that our approach outperforms previous methodologies, including methods entirely based on deep learning.

The collection of massive observational datasets has led to unprecedented opportunities for causal inference, such as using electronic health records to identify risk factors for disease. However, our ability to understand these complex data sets has not grown at the same pace as our ability to collect them. While causal inference has traditionally focused on pairwise relationships between variables, biological systems are highly complex, and knowing when events may happen is often as important as knowing whether they will. In the first half of this talk I discuss new methods that allow causal relationships to be reliably inferred from complex observational data, motivated by analysis of intensive care unit and other medical data. Causes are useful because they allow us to take action, but there is a gap between the output of machine learning and what helps people make decisions. In the second half of this talk I discuss our recent findings in testing just how people fare when using the output of machine learning and how we can go from data to knowledge to decisions.

Recent work has shown that tools from dynamical systems can be used to analyze accelerated optimization algorithms. For example, it has been shown that the continuous limit of Nesterov’s accelerated gradient (NAG) gives an ODE whose convergence rate matches that of NAG for convex, unconstrained, and smooth problems. Conversely, it has been shown that NAG can be obtained as the discretization of an ODE; however, since different discretizations lead to different algorithms, the choice of discretization becomes important. The first part of this talk will extend this type of analysis to convex, constrained, and non-smooth problems by using Lyapunov stability theory to analyze continuous limits of the Alternating Direction Method of Multipliers (ADMM). The second part of this talk will show that many existing and new optimization algorithms can be obtained by suitably discretizing a dissipative Hamiltonian. As an example, we will present a new method called Relativistic Gradient Descent (RGD), which empirically outperforms momentum, RMSprop, Adam, and AdaGrad on several non-convex problems.

This is joint work with Guilherme França, Daniel Robinson and Jeremias Sulam.
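
As background for the NAG example mentioned in the abstract, the following sketch compares plain gradient descent with one common form of Nesterov's accelerated gradient for convex problems, using momentum coefficient (k - 1) / (k + 2), on a small ill-conditioned quadratic. This is the form whose small-step limit is usually written as the ODE X'' + (3/t) X' + grad f(X) = 0. The test problem, step size, and iteration count are assumptions of the example, and neither ADMM nor RGD is implemented here.

```python
def grad(x, eigs):
    """Gradient of f(x) = 0.5 * sum(eigs[i] * x[i]**2)."""
    return [lam * xi for lam, xi in zip(eigs, x)]


def objective(x, eigs):
    return 0.5 * sum(lam * xi * xi for lam, xi in zip(eigs, x))


def gradient_descent(x0, eigs, step, n_iters):
    x = list(x0)
    for _ in range(n_iters):
        g = grad(x, eigs)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def nesterov(x0, eigs, step, n_iters):
    """NAG: a gradient step from the look-ahead point y, followed by an
    extrapolation with momentum (k - 1) / (k + 2)."""
    x_prev = list(x0)
    y = list(x0)
    for k in range(1, n_iters + 1):
        g = grad(y, eigs)
        x = [yi - step * gi for yi, gi in zip(y, g)]
        momentum = (k - 1) / (k + 2)
        y = [xi + momentum * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev = x
    return x_prev


# Hypothetical test problem: curvatures 1 and 0.01, step size 1 / L with L = 1.
eigs = [1.0, 0.01]
x0 = [1.0, 1.0]

f_gd = objective(gradient_descent(x0, eigs, 1.0, 100), eigs)
f_nag = objective(nesterov(x0, eigs, 1.0, 100), eigs)
print(f"gradient descent: {f_gd:.3e}, NAG: {f_nag:.3e}")
```

Changing only the extrapolation line yields a different algorithm from what looks like the same underlying dynamics, which is a small-scale version of the point that the choice of discretization matters.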