Deep learning has transformed Machine Learning and Artificial Intelligence in the past decade. It raises fundamental questions for mathematics and the theory of computer science, since it relies upon solving large-scale nonconvex problems via gradient descent and its variants. This talk will be an introduction to the mathematical questions raised by deep learning, and to the partial understanding obtained in recent years with respect to optimization, generalization, self-supervised learning, and privacy, among other topics.

Given a set of distances amongst points, determining the metric representation most "consistent" with the input distances, or the metric that best captures the relevant geometric features of the data, is a key step in many machine learning algorithms. In this talk, we focus on three specific problems from the class of optimization problems with metric constraints: metric nearness (Brickell et al. (2008)), weighted correlation clustering on general graphs (Bansal et al. (2004)), and metric learning (Bellet et al. (2013); Davis et al. (2007)).

Because of the large number of constraints in these problems, however, researchers have been forced to restrict either the kinds of metrics learned or the size of the problems that can be solved. We provide an algorithm, PROJECT AND FORGET, that uses Bregman projections with cutting planes to solve metric-constrained problems with many (possibly exponentially many) inequality constraints. We prove that our algorithm converges to the globally optimal solution, and we show that the optimality error decays asymptotically at an exponential rate. Using our method, we solve large instances of all three types of metric-constrained problems, outperforming state-of-the-art methods with respect to CPU time and problem size.
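To make the projection idea concrete, here is a minimal numpy sketch of the simplest member of this family: cyclic Euclidean (Bregman) projections onto the triangle-inequality constraints, with Hildreth/Dykstra dual corrections, applied to the metric nearness problem. It is a toy stand-in for PROJECT AND FORGET (no cutting planes and no forgetting of inactive constraints); sizes and sweep counts are illustrative.

```python
import itertools
import numpy as np

def metric_nearness(D, n_sweeps=500):
    """Find a nearby matrix satisfying all triangle inequalities by
    cyclically projecting onto each constraint's halfspace, with
    Hildreth/Dykstra dual corrections so the limit is the Euclidean
    projection. A toy stand-in for PROJECT AND FORGET."""
    X = D.astype(float).copy()
    n = X.shape[0]
    lam = {}  # one nonnegative dual variable per constraint
    for _ in range(n_sweeps):
        for i, j in itertools.combinations(range(n), 2):
            for k in range(n):
                if k == i or k == j:
                    continue
                # constraint: X[i,j] - X[i,k] - X[k,j] <= 0
                t = (X[i, j] - X[i, k] - X[k, j]) / 3.0
                theta = max(t, -lam.get((i, j, k), 0.0))
                X[i, j] -= theta; X[j, i] = X[i, j]
                X[i, k] += theta; X[k, i] = X[i, k]
                X[k, j] += theta; X[j, k] = X[k, j]
                lam[(i, j, k)] = lam.get((i, j, k), 0.0) + theta
    return X
```

With one dissimilarity artificially inflated, the output satisfies every triangle inequality up to numerical tolerance.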

Finally, we discuss the adaptation of PROJECT AND FORGET to specific types of metric constraints, namely tree and hyperbolic metrics.

In this talk I will review essentials of quantum field theory (QFT) and demonstrate how the function-space distribution of many neural networks (NNs) shares similar properties. This allows, for instance, the computation of correlators of neural network outputs in terms of Feynman diagrams, and yields a direct analogy between non-Gaussian corrections in NN distributions and particle interactions. Some cases yield divergences in perturbation theory, requiring the introduction of regularization and renormalization. Potential advantages of this perspective will be discussed, including a duality between function-space and parameter-space descriptions of neural networks.
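One concrete manifestation of the free (Gaussian) limit and its interaction corrections can be checked numerically: for a randomly initialized one-hidden-layer network, the connected four-point function (excess kurtosis) of the output distribution decays as the width grows. The following sketch is purely illustrative (ReLU activations and unit-variance initializations are my choices, not the talk's):

```python
import numpy as np

def random_relu_net_outputs(width, n_samples, rng):
    """Draw n_samples outputs f(x) = sum_i v_i relu(w_i x) / sqrt(width)
    at the fixed input x = 1, over random initializations w, v ~ N(0, 1)."""
    w = rng.standard_normal((n_samples, width))
    v = rng.standard_normal((n_samples, width))
    return (v * np.maximum(w, 0.0)).sum(axis=1) / np.sqrt(width)

def excess_kurtosis(f):
    """Normalized connected four-point function: zero for a Gaussian
    (free theory); nonzero values play the role of interactions."""
    f = f - f.mean()
    return (f ** 4).mean() / ((f ** 2).mean() ** 2) - 3.0
```

Narrow networks give strongly non-Gaussian outputs; wide ones approach the Gaussian (free-theory) limit, with the deviation shrinking roughly like the inverse width.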

Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. As the field grows, it becomes critical to identify key architectures and validate new ideas that generalize to larger, more complex datasets. Unfortunately, it has been increasingly difficult to gauge the effectiveness of new models in the absence of a standardized benchmark with consistent experimental settings. In this work, we introduce a reproducible GNN benchmarking framework that makes it convenient for researchers to add new models for arbitrary datasets. We demonstrate the usefulness of our framework by presenting a principled investigation of the recent Weisfeiler-Lehman GNNs (WL-GNNs) compared to message-passing graph convolutional networks (GCNs) on a variety of graph tasks with medium-scale datasets.

Algorithmic decisions are now made on a daily basis, often based on Machine Learning (ML) processes that may be complex and biased. This raises several concerns, given the critical impact that biased decisions may have on individuals and on society as a whole. Not only do unfair outcomes affect human rights, they also undermine public trust in ML and AI. In this talk, we will address fairness issues of ML models based on decision outcomes, and we will show how the simple idea of feature dropout followed by an ensemble approach can improve model fairness without compromising accuracy. To illustrate, we will present a general workflow that relies on explainers to tackle process fairness, which essentially measures a model's reliance on sensitive or discriminatory features. We will present different applications and empirical settings that show improvements not only in process fairness but also in other fairness metrics.
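As an illustration of the feature-dropout-plus-ensemble idea, the following numpy sketch trains several logistic-regression members, each blind to the sensitive columns and to a random subset of the remaining features, then averages their predicted probabilities; by construction the aggregate cannot rely directly on a sensitive feature. All names and parameters here are hypothetical choices for the sketch, not the exact workflow of the talk.

```python
import numpy as np

def fit_logreg(X, y, n_iter=500, lr=0.1):
    """Minimal logistic regression by gradient descent; a stand-in for
    any base learner."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def feature_dropout_ensemble(X, y, sensitive, n_members=5, drop=0.3, seed=0):
    """Train n_members models, each on a copy of the data with the
    sensitive columns (always) and a random subset of the other columns
    (with probability `drop`) zeroed out; predict by averaging member
    probabilities. No member ever sees a sensitive column."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    others = [j for j in range(d) if j not in sensitive]
    members = []
    for _ in range(n_members):
        keep = [j for j in others if rng.random() > drop]
        keep = keep or others  # never train on an all-zero design
        mask = np.zeros(d)
        mask[keep] = 1.0
        members.append((mask, fit_logreg(X * mask, y)))

    def predict_proba(X_new):
        probs = [1.0 / (1.0 + np.exp(-(X_new * m) @ w)) for m, w in members]
        return np.mean(probs, axis=0)

    return predict_proba
```

On synthetic data whose label partly depends on a sensitive column, the ensemble's predictions are exactly invariant to that column while retaining predictive power from the remaining features.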

Massive data collection holds the promise of a better understanding of complex phenomena and, ultimately, of better decisions. An exciting opportunity in this regard stems from the growing availability of perturbation / intervention data (drugs, knockouts, overexpression, etc.) in biology. In order to obtain mechanistic insights from such data, a major challenge is the development of a framework that integrates observational and interventional data and allows predicting the effect of yet-unseen interventions, or transporting the effect of interventions observed in one context to another. I will present a framework for causal structure discovery based on such data and highlight the role of overparameterized autoencoders. I will end by demonstrating how these ideas can be applied to drug repurposing in the current SARS-CoV-2 crisis.

Linear (and generalized linear) regression (LR) is an old, but still essential, statistical tool: its goal is to learn to predict a (response) variable from a linear combination of other (explanatory) variables. A central problem in LR is the selection of relevant variables, because using fewer variables tends to yield better generalization and because this identification may be meaningful (e.g., which genes are relevant to predict a certain disease). In the past quarter-century, variable selection (VS) based on sparsity-inducing regularizers has been a central paradigm, the most famous example being the LASSO, which has been intensively studied, extended, and applied.
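For concreteness, the LASSO estimate can be computed with a few lines of proximal gradient descent (ISTA); the following self-contained numpy sketch uses synthetic data and illustrative parameters:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||X w - y||^2 + lam * ||w||_1 by proximal
    gradient descent (ISTA) with step size 1/L."""
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the smooth part
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = soft_threshold(w - X.T @ (X @ w - y) / L, lam / L)
    return w
```

On data generated from a sparse linear model, the estimate recovers the relevant variables and zeroes out the rest, which is the variable-selection behavior described above.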

In many contexts, it is natural to have highly correlated variables (e.g., several genes that are strongly co-regulated), which are thus simultaneously relevant as predictors. In this case, sparsity-based VS may fail: it may select an arbitrary subset of these variables, and it is unstable. Moreover, it is often desirable to identify all the relevant variables, not just an arbitrary subset thereof, a goal for which several approaches have been proposed. This talk will be devoted to a recent class of such approaches, called ordered weighted l1 (OWL) regularization. The key feature of OWL is that it is provably able to explicitly identify (i.e., cluster) sufficiently correlated features, without having to compute these correlations. Several theoretical results characterizing OWL will be presented, including connections to the mathematics of economic inequality. Computational and optimization aspects will also be addressed, as well as recent applications in subspace clustering, learning Gaussian graphical models, and deep neural networks.
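The OWL regularizer itself is simple to state: it applies a nonincreasing weight vector to the sorted absolute values of the coefficients. A minimal sketch (including the OSCAR weights as a special case; parameter names are mine) shows why ties between coefficients are encouraged: for a fixed l1 mass, the penalty is lower when magnitudes are equal.

```python
import numpy as np

def owl_norm(x, w):
    """Ordered weighted l1 norm: sum_i w[i] * |x|_(i), where |x|_(0) is
    the largest magnitude; w must be nonnegative and nonincreasing."""
    return np.sort(np.abs(x))[::-1] @ w

def oscar_weights(p, lam1, lam2):
    """OSCAR special case of OWL: w_i = lam1 + lam2 * (p - i), i = 1..p."""
    return lam1 + lam2 * np.arange(p - 1, -1, -1)
```

With all weights equal, OWL reduces to a scaled l1 norm (LASSO); with strictly decreasing weights, spreading a given l1 mass evenly across coordinates is cheaper than concentrating it, which is the mechanism behind the clustering of correlated features.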

Identifying the relevant coarse-grained degrees of freedom in a complex physical system is a key stage in developing effective theories. The renormalization group (RG) provides a framework for this task, but its practical execution in unfamiliar systems is fraught with ad hoc choices. Machine learning approaches, on the other hand, though promising, often lack formal interpretability: it is unclear what relation, if any, the architecture- and training-dependent learned "relevant" features bear to standard objects of physical theory. I will present recent results addressing both issues. We develop a fast algorithm, RSMI-NE, employing state-of-the-art results in machine-learning-based estimation of information-theoretic quantities to construct the optimal coarse-graining. We use it to develop a new approach to identifying the most relevant field-theory operators describing a statistical system, which we validate on the example of the interacting dimer model. I will also discuss formal results underlying the method: we establish an equivalence between the information-theoretic notion of relevance defined in the Information Bottleneck (IB) formalism of compression theory and the field-theoretic relevance of the RG. We show analytically that for statistical physical systems the "relevant" degrees of freedom found using IB compression indeed correspond to operators with the lowest scaling dimensions, providing a dictionary connecting two distinct theoretical toolboxes.

The past few decades have witnessed a significant research effort in the field of Lyapunov model-based control design. In parallel, optimal control and optimization model-based design have also expanded their range of applications, and nowadays receding-horizon approaches can be considered a mature field for particular classes of control systems.

In this talk, I will argue that Lyapunov-based techniques play an important role in the analysis of model-based optimization methodologies; moreover, the two approaches can be combined for control design, resulting in powerful frameworks with formal guarantees of robustness, stability, performance, and safety. Illustrative examples in the area of motion control of autonomous robotic vehicles will be presented, covering Autonomous Underwater Vehicles (AUVs), Autonomous Surface Vehicles (ASVs), and Unmanned Aerial Vehicles (UAVs).
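A minimal numerical illustration of the Lyapunov side of this story, for a discrete-time linear system x+ = A x: solving the Lyapunov equation yields a quadratic certificate V(x) = x'Px that provably decreases along trajectories. The Kronecker-product solve below is a self-contained stand-in for library routines such as scipy.linalg.solve_discrete_lyapunov; the matrix A is an arbitrary stable example.

```python
import numpy as np

def discrete_lyapunov(A, Q):
    """Solve P - A.T @ P @ A = Q using the identity
    vec(A.T P A) = kron(A.T, A.T) vec(P). For Schur-stable A and
    Q > 0, the solution P defines a Lyapunov function V(x) = x'Px."""
    n = A.shape[0]
    P = np.linalg.solve(np.eye(n * n) - np.kron(A.T, A.T), Q.flatten())
    return P.reshape(n, n)
```

The decrease V(Ax) - V(x) = -x'Qx is exact by construction, which is precisely the kind of formal guarantee that can then be carried into optimization-based (receding-horizon) designs.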

We compare the complexity of training classical and quantum machine learning (ML) models for predicting outcomes of physical experiments. The experiments depend on an input parameter $x$ and involve the execution of a (possibly unknown) quantum process $E$. Our figure of merit is the number of runs of $E$ needed during training, disregarding other measures of complexity. A classical ML model performs a measurement and records the classical outcome after each run of $E$, while a quantum ML model can access $E$ coherently to acquire quantum data; the classical or quantum data is then used to predict outcomes of future experiments. We prove that, for any input distribution $D(x)$, a classical ML model can provide accurate predictions on average by accessing $E$ a number of times comparable to the optimal quantum ML model. In contrast, for achieving accurate predictions on all inputs, we show that an exponential quantum advantage exists for certain tasks. For example, to predict expectation values of all Pauli observables in an $n$-qubit system, we present a quantum ML model using only $O(n)$ data and prove that any classical ML model requires $2^{\Omega(n)}$ data.

In the last two decades, the field of nonequilibrium quantum many-body physics has seen rapid development, driven in particular by the remarkable progress in quantum simulators, which today provide access to dynamics in quantum matter with unprecedented control. However, the efficient numerical simulation of nonequilibrium real-time evolution in isolated quantum matter remains a key challenge for current computational methods, especially beyond one spatial dimension. In this talk I will present a versatile and efficient machine-learning-inspired approach. I will first introduce the general idea of encoding quantum many-body wave functions into artificial neural networks. I will then identify and resolve key challenges for the simulation of real-time evolution, which previously imposed significant limitations on the accurate description of large systems and long-time dynamics. As a concrete example, I will consider the dynamics of the paradigmatic two-dimensional transverse-field Ising model, where we observe collapse-and-revival oscillations of ferromagnetic order and demonstrate that the reachable time scales are comparable to or exceed the capabilities of state-of-the-art tensor network methods.
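To make the idea of encoding wave functions in neural networks concrete, here is a toy restricted-Boltzmann-machine (RBM) ansatz evaluated by brute-force enumeration on a few spins; real simulations replace the enumeration with variational Monte Carlo sampling and optimize the parameters. Sizes and parameter choices are purely illustrative.

```python
import itertools
import numpy as np

def rbm_amplitude(s, a, b, W):
    """Unnormalized RBM wave-function amplitude for a spin configuration
    s in {-1,+1}^n: exp(a.s) * prod_j 2 cosh(b_j + (s @ W)_j)."""
    return np.exp(a @ s) * np.prod(2.0 * np.cosh(b + s @ W))

def expectation_sz(a, b, W, site=0):
    """<sigma^z_site> in the Born distribution |psi|^2, computed by
    brute-force enumeration (small systems only); variational Monte
    Carlo replaces this sum with sampling for large systems."""
    n = len(a)
    num = den = 0.0
    for bits in itertools.product([-1.0, 1.0], repeat=n):
        s = np.array(bits)
        p = rbm_amplitude(s, a, b, W) ** 2
        num += s[site] * p
        den += p
    return num / den
```

With all parameters zero, the ansatz is the uniform superposition and the magnetization vanishes; a strong visible bias polarizes the corresponding spin, showing how the network parameters shape the encoded state.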

Many tasks in fluid mechanics, such as design optimization and control, are challenging because fluids are nonlinear and exhibit a large range of scales in both space and time. This range of scales necessitates exceedingly high-dimensional measurements and computational discretization to resolve all relevant features, resulting in vast data sets and time-intensive computations. Indeed, fluid dynamics is one of the original big data fields, and many high-performance computing architectures, experimental measurement techniques, and advanced data processing and visualization algorithms were driven by decades of research in fluid mechanics. Machine learning constitutes a growing set of powerful techniques to extract patterns and build models from this data, complementing the existing theoretical, numerical, and experimental efforts in fluid mechanics. In this talk, we will explore current goals and opportunities for machine learning in fluid mechanics, and we will highlight a number of recent technical advances. Because fluid dynamics is central to transportation, health, and defense systems, we will emphasize the importance of machine learning solutions that are interpretable, explainable, generalizable, and that respect known physics.
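As one classic example of pattern extraction from flow data, proper orthogonal decomposition (POD) finds energy-ranked spatial modes of a snapshot matrix via the SVD. The sketch below uses synthetic rank-2 "flow" data of my own construction, purely for illustration.

```python
import numpy as np

def pod_modes(snapshots, n_modes):
    """Proper orthogonal decomposition of a snapshot matrix
    (rows = spatial points, columns = time snapshots): subtract the
    temporal mean, take the SVD, and return the leading spatial modes
    with the fraction of variance ('energy') each captures."""
    fluctuations = snapshots - snapshots.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(fluctuations, full_matrices=False)
    energy = s ** 2 / (s ** 2).sum()
    return U[:, :n_modes], energy[:n_modes]
```

On data composed of two coherent structures plus noise, two modes capture nearly all the energy, which is why low-dimensional models built from such modes can be both accurate and interpretable.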

In this presentation, I will introduce some traditional Reinforcement Learning problems and algorithms, and analyze how certain difficulties can be avoided and convergence results obtained using a two-time-scale variation of the usual stochastic approximation approach.

This variation was inspired by the practical success of Deep Q-Learning, which attained superhuman performance on several classic Atari games in work by DeepMind's research team in 2015. Practical successes in Machine Learning like this often have no corresponding explanatory theory; the work presented here aims to contribute to that goal.
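The flavor of a two-time-scale scheme can be seen in a toy tabular setting: a fast iterate chases the Bellman target computed from a slowly tracking copy, decoupling the "target" from the rapidly changing estimate. This is a generic sketch with constant, separated step sizes on a one-state MDP, not the exact algorithm of the talk.

```python
import numpy as np

def two_timescale_q(n_steps=20000, gamma=0.9, alpha=0.5, beta=0.05, seed=0):
    """Two-time-scale Q-learning on a one-state, two-action MDP with
    deterministic rewards r = [0, 1]: the fast iterate w chases the
    Bellman target built from the slow iterate theta, while theta
    tracks w on a slower schedule (beta << alpha). The fixed point is
    Q*(a) = r(a) + gamma * max_a' Q*(a'), i.e. [9, 10] for gamma = 0.9."""
    rng = np.random.default_rng(seed)
    r = np.array([0.0, 1.0])
    w = np.zeros(2)        # fast time scale
    theta = np.zeros(2)    # slow time scale ("target" estimate)
    for _ in range(n_steps):
        a = rng.integers(2)                  # uniform exploration
        target = r[a] + gamma * theta.max()  # target uses the slow iterate
        w[a] += alpha * (target - w[a])      # fast update
        theta += beta * (w - theta)          # slow update
    return theta
```

Freezing the max operator on the slow iterate is what makes the coupled system amenable to standard two-time-scale stochastic approximation arguments, in contrast to plain Q-learning with function approximation.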

Joint work with Diogo Carvalho and Francisco Melo from INESC-ID.

Optimal transport (OT) has recently attracted a lot of interest in machine learning. It is a natural tool for comparing probability distributions in a geometrically faithful way. It finds applications in both supervised learning (using geometric loss functions) and unsupervised learning (to perform generative model fitting). OT is, however, plagued by the curse of dimensionality, since it may require a number of samples that grows exponentially with the dimension. In this talk, I will explain how to leverage entropic regularization methods to define computationally efficient loss functions, approximating OT with a better sample complexity.
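Computationally, the entropic approach reduces to Sinkhorn's algorithm: alternate diagonal rescalings of the Gibbs kernel until both marginals match. A minimal numpy sketch (small histograms and an illustrative regularization strength):

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iter=500):
    """Entropic-regularized OT between histograms a and b with cost
    matrix C: alternately rescale the Gibbs kernel K = exp(-C / eps)
    so that the coupling's marginals match a and b."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # approximate optimal coupling
    return P, (P * C).sum()           # coupling and its transport cost
```

Each iteration costs only matrix-vector products, which is what makes these regularized losses practical in high-dimensional learning pipelines.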

More information and references can be found on the website of our book "Computational Optimal Transport", https://optimaltransport.github.io/