# 2020 seminars

## 06/01/2021, Wednesday, 18:00–19:00 Europe/Lisbon — Online

Sanjeev Arora, Computer Science Department, Princeton University
The quest for mathematical understanding of deep learning

Deep learning has transformed Machine Learning and Artificial Intelligence in the past decade. It raises fundamental questions for mathematics and theory of computer science, since it relies upon solving large-scale nonconvex problems via gradient descent and its variants. This talk will be an introduction to mathematical questions raised by deep learning, and some partial understanding obtained in recent years with respect to optimization, generalization, self-supervised learning, privacy etc.

## 13/01/2021, Wednesday, 18:00–19:00 Europe/Lisbon — Online

Anna C. Gilbert, Yale University
Metric representations: Algorithms and Geometry

Given a set of distances amongst points, determining what metric representation is most "consistent" with the input distances or the metric that best captures the relevant geometric features of the data is a key step in many machine learning algorithms. In this talk, we focus on 3 specific metric constrained problems, a class of optimization problems with metric constraints: metric nearness (Brickell et al. (2008)), weighted correlation clustering on general graphs (Bansal et al. (2004)), and metric learning (Bellet et al. (2013); Davis et al. (2007)).

Because of the large number of constraints in these problems, however, these and other researchers have been forced to restrict either the kinds of metrics learned or the size of the problem that can be solved. We provide an algorithm, PROJECT AND FORGET, that uses Bregman projections with cutting planes, to solve metric constrained problems with many (possibly exponentially) inequality constraints. We also prove that our algorithm converges to the global optimal solution. Additionally, we show that the optimality error decays asymptotically at an exponential rate. We show that using our method we can solve large problem instances of three types of metric constrained problems, out-performing all state of the art methods with respect to CPU times and problem sizes.

Finally, we discuss the adaptation of PROJECT AND FORGET to specific types of metric constraints, namely tree and hyperbolic metrics.

## 20/01/2021, Wednesday, 18:00–19:00 Europe/Lisbon — Online

James Halverson, Northeastern University
Neural Networks and Quantum Field Theory

In this talk I will review essentials of quantum field theory (QFT) and demonstrate how the function-space distribution of many neural networks (NNs) shares similar properties. This allows, for instance, computation of correlators of neural network outputs in terms of Feynman diagrams and a direct analogy between non-Gaussian corrections in NN distributions and particle interactions. Some cases yield divergences in perturbation theory, requiring the introduction of regularization and renormalization. Potential advantages of this perspective will be discussed, including a duality between function-space and parameter-space descriptions of neural networks.

### Additional file

Halverson slides.pdf

## 27/01/2021, Wednesday, 11:00–12:00 Europe/Lisbon — Online

Xavier Bresson, Nanyang Technological University
Benchmarking Graph Neural Networks

Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. As the field grows, it becomes critical to identify key architectures and validate new ideas that generalize to larger, more complex datasets. Unfortunately, it has been increasingly difficult to gauge the effectiveness of new models in the absence of a standardized benchmark with consistent experimental settings. In this work, we introduce a reproducible GNN benchmarking framework, with the facility for researchers to add new models conveniently for arbitrary datasets. We demonstrate the usefulness of our framework by presenting a principled investigation into the recent Weisfeiler-Lehman GNNs (WL-GNNs) compared to message passing-based graph convolutional networks (GCNs) for a variety of graph tasks with medium-scale datasets.

### Additional file

Bresson slides.pdf

## 03/02/2021, Wednesday, 18:00–19:00 Europe/Lisbon — Online

Miguel Couceiro, Université de Lorraine
Making ML Models fairer through explanations, feature dropout, and aggregation

Algorithmic decisions are now being used on a daily basis, and based on Machine Learning (ML) processes that may be complex and biased. This raises several concerns given the critical impact that biased decisions may have on individuals or on society as a whole. Not only unfair outcomes affect human rights, they also undermine public trust in ML and AI. In this talk, we will address fairness issues of ML models based on decision outcomes, and we will show how the simple idea of feature dropout followed by an ensemble approach can improve model fairness without compromising its accuracy. To illustrate we will present a general workflow that relies on explainers to tackle process fairness, which essentially measures a model's reliance on sensitive or discriminatory features. We will present different applications and empirical settings that show improvements not only with respect to process fairness but also other fairness metrics.

### Additional file

Couceiro slides.pdf

## 10/02/2021, Wednesday, 18:00–19:00 Europe/Lisbon — Online

Caroline Uhler, MIT and Institute for Data, Systems and Society
Causal Inference and Overparameterized Autoencoders in the Light of Drug Repurposing for SARS-CoV-2

Massive data collection holds the promise of a better understanding of complex phenomena and ultimately, of better decisions. An exciting opportunity in this regard stems from the growing availability of perturbation / intervention data (drugs, knockouts, overexpression, etc.) in biology. In order to obtain mechanistic insights from such data, a major challenge is the development of a framework that integrates observational and interventional data and allows predicting the effect of yet unseen interventions or transporting the effect of interventions observed in one context to another. I will present a framework for causal structure discovery based on such data and highlight the role of overparameterized autoencoders. We end by demonstrating how these ideas can be applied for drug repurposing in the current SARS-CoV-2 crisis.

Uhler slides.pdf

## 17/02/2021, Wednesday, 18:00–19:00 Europe/Lisbon — Online

Mário Figueiredo, Instituto Superior Técnico and IT
Dealing with Correlated Variables in Supervised Learning

Linear (and generalized linear) regression (LR) is an old, but still essential, statistical tool: its goal is to learn to predict a (response) variable from a linear combination of other (explanatory) variables. A central problem in LR is the selection of relevant variables, because using fewer variables tends to yield better generalization and because this identification may be meaningful (e.g., which genes are relevant to predict a certain disease). In the past quarter-century, variable selection (VS) based on sparsity-inducing regularizers has been a central paradigm, the most famous example being the LASSO, which has been intensively studied,
extended, and applied.

In many contexts, it is natural to have highly-correlated variables (e.g., several genes that are strongly co-regulated), thus simultaneously relevant as predictors. In this case, sparsity-based VS may fail: it may select an arbitrary subset of these variables and it is unstable. Moreover, it is often desirable to identify all the relevant variables, not just an arbitrary subset thereof, a goal for which several approaches have been proposed. This talk will be devoted to a recent class of such approaches, called ordered weighted l1 (OWL). The key feature of OWL is that it is provably able to explicitly identify (i.e. cluster) sufficiently-correlated features, without having to compute these correlations. Several theoretical results characterizing OWL will be presented, including connections to the mathematics of economic inequality. Computational and optimization aspects will also be addressed, as well as recent applications in subspace clustering, learning Gaussian graphical models, and deep neural networks.

### Additional file

Figueiredo slides.pdf

## 22/02/2021, Monday, 17:00–18:00 Europe/Lisbon — Online

Maciej Koch-Janusz, University of Zurich
Statistical physics through the lens of real-space mutual information

Identifying the relevant coarse-grained degrees of freedom in a complex physical system is a key stage in developing effective theories. The renormalization group (RG) provides a framework for this task, but its practical execution in unfamiliar systems is fraught with ad hoc choices. Machine learning approaches, on the other hand, though promising, often lack formal interpretability: it is unclear what relation, if any, the architecture- and training-dependent learned "relevant" features bear to standard objects of physical theory.
I will present recent results addressing both issues. We develop a fast algorithm, the RSMI-NE, employing state-of-art results in machine-learning-based estimation of information-theoretic quantities to construct the optimal coarse-graining. We use it to develop a new approach to identifying the most relevant field theory operators describing a statistical system, which we validate on the example of interacting dimer model. I will also discuss formal results underlying the method: we establish equivalence between the information-theoretic notion of relevance defined in the Information Bottleneck (IB) formalism of compression theory, and the field-theoretic relevance of the RG. We show analytically that for statistical physical systems the "relevant" degrees of freedom found using IB compression indeed correspond to operators with the lowest scaling dimensions, providing a dictionary connecting two distinct theoretical toolboxes.

### Additional file

Koch-Janusz slides.pdf

## 03/03/2021, Wednesday, 18:00–19:00 Europe/Lisbon — Online

A. Pedro Aguiar, Faculdade de Engenharia, Universidade do Porto
Model based control design combining Lyapunov and optimization tools: Examples in the area of motion control of autonomous robotic vehicles

The past few decades have witnessed a significant research effort in the field of Lyapunov model based control design. In parallel, optimal control and optimization model based design have also expanded their range of applications, and nowadays, receding horizon approaches can be considered a mature field for particular classes of control systems.

In this talk, I will argue that Lyapunov based techniques play an important role for analysis of model based optimization methodologies and moreover, both approaches can be combined for control design resulting in powerful frameworks with formal guarantees of robustness, stability, performance, and safety. Illustrative examples in the area of motion control of autonomous robotic vehicles will be presented for Autonomous Underwater Vehicles (AUVs), Autonomous Surface Vehicles (ASVs) and Unmanned Aerial Vehicles (UAVs).

### Additional file

Aguiar slides.pdf

## 17/03/2021, Wednesday, 18:00–19:00 Europe/Lisbon — Online

We compare the complexity of training classical and quantum machine learning (ML) models for predicting outcomes of physical experiments. The experiments depend on an input parameter x and involve the execution of a (possibly unknown) quantum process $E$. Our figure of merit is the number of runs of $E$ needed during training, disregarding other measures of complexity. A classical ML performs a measurement and records the classical outcome after each run of $E$, while a quantum ML can access $E$ coherently to acquire quantum data; the classical or quantum data is then used to predict outcomes of future experiments. We prove that, for any input distribution $D(x)$, a classical ML can provide accurate predictions on average by accessing $E$ a number of times comparable to the optimal quantum ML. In contrast, for achieving accurate prediction on all inputs, we show that exponential quantum advantage exists in certain tasks. For example, to predict expectation values of all Pauli observables in an $n-$qubit system, we present a quantum ML using only $O(n)$ data and prove that a classical ML requires $2^{\Omega(n)}$ data.

Huang slides.pdf

## 22/03/2021, Monday, 17:00–18:00 Europe/Lisbon — Online

Markus Heyl, Max-Planck Institute for the Physics of Complex Systems, Dresden
Quantum many-body dynamics in two dimensions with artificial neural networks

In the last two decades the field of nonequilibrium quantum many-body physics has seen a rapid development driven, in particular, by the remarkable progress in quantum simulators, which today provide access to dynamics in quantum matter with an unprecedented control. However, the efficient numerical simulation of nonequilibrium real-time evolution in isolated quantum matter still remains a key challenge for current computational methods especially beyond one spatial dimension. In this talk I will present a versatile and efficient machine learning inspired approach. I will first introduce the general idea of encoding quantum many-body wave functions into artificial neural networks. I will then identify and resolve key challenges for the simulation of real-time evolution, which previously imposed significant limitations on the accurate description of large systems and long-time dynamics. As a concrete example, I will consider the dynamics of the paradigmatic two-dimensional transverse field Ising model, where we observe collapse and revival oscillations of ferromagnetic order and demonstrate that the reached time scales are comparable to or exceed the capabilities of state-of-the-art tensor network methods.

Heyl slides.pdf

## 31/03/2021, Wednesday, 18:00–19:00 Europe/Lisbon — Online

Steve Brunton, University of Washington
Machine learning for Fluid Mechanics

Many tasks in fluid mechanics, such as design optimization and control, are challenging because fluids are nonlinear and exhibit a large range of scales in both space and time. This range of scales necessitates exceedingly high-dimensional measurements and computational discretization to resolve all relevant features, resulting in vast data sets and time-intensive computations. Indeed, fluid dynamics is one of the original big data fields, and many high-performance computing architectures, experimental measurement techniques, and advanced data processing and visualization algorithms were driven by decades of research in fluid mechanics. Machine learning constitutes a growing set of powerful techniques to extract patterns and build models from this data, complementing the existing theoretical, numerical, and experimental efforts in fluid mechanics. In this talk, we will explore current goals and opportunities for machine learning in fluid mechanics, and we will highlight a number of recent technical advances. Because fluid dynamics is central to transportation, health, and defense systems, we will emphasize the importance of machine learning solutions that are interpretable, explainable, generalizable, and that respect known physics.

### Additional file

Brunton slides.pdf

## 09/04/2021, Friday, 14:00–15:00 Europe/Lisbon — Online

Pedro A. Santos, Instituto Superior Técnico and INESC-ID
Two-time scale stochastic approximation for reinforcement learning with linear function approximation

In this presentation, I will introduce some traditional Reinforcement Learning problems and algorithms, and analyze how some problems can be avoided and convergence results obtained using a two-time scale variation of the usual stochastic approximation approach.

This variation was inspired by the practical successes of Deep Q-Learning in attaining superhuman performance at some classical Atari games by Deepmind's research team in 2015. Machine Learning practical successes like this often have no corresponding explaining theory. The work that will be presented intends to contribute to that goal.

Joint work with Diogo Carvalho and Francisco Melo from INESC-ID.

### Additional file

Santos PA slides.pdf

## 14/04/2021, Wednesday, 18:00–19:00 Europe/Lisbon — Online

Gabriel Peyré, École Normale Supérieure
Scaling Optimal Transport for High dimensional Learning

Optimal transport (OT) has recently gained lot of interest in machine learning. It is a natural tool to compare in a geometrically faithful way probability distributions. It finds applications in both supervised learning (using geometric loss functions) and unsupervised learning (to perform generative model fitting). OT is however plagued by the curse of dimensionality, since it might require a number of samples which grows exponentially with the dimension. In this talk, I will explain how to leverage entropic regularization methods to define computationally efficient loss functions, approximating OT with a better sample complexity.

More information and references can be found on the website of our book
"Computational Optimal Transport", https://optimaltransport.github.io/

## 23/04/2021, Friday, 14:00–15:00 Europe/Lisbon — Online

Jan Peters, Technische Universitaet Darmstadt
Robot Learning - Quo Vadis?

Autonomous robots that can assist humans in situations of daily life have been a long standing vision of robotics, artificial intelligence, and cognitive sciences. A first step towards this goal is to create robots that can learn tasks triggered by environmental context or higher level instruction. However, learning techniques have yet to live up to this promise as only few methods manage to scale to high-dimensional manipulator or humanoid robots. In this talk, we investigate a general framework suitable for learning motor skills in robotics which is based on the principles behind many analytical robotics approaches. It involves generating a representation of motor skills by parameterized motor primitive policies acting as building blocks of movement generation, and a learned task module that transforms these movements into motor commands. We discuss learning on three different levels of abstraction, i.e., learning for accurate control is needed to execute, learning of motor primitives is needed to acquire simple movements, and learning of the task-dependent „hyperparameters“ of these motor primitives allows learning complex tasks. We discuss task-appropriate learning approaches for imitation learning, model learning and reinforcement learning for robots with many degrees of freedom. Empirical evaluations on a several robot systems illustrate the effectiveness and applicability to learning control on an anthropomorphic robot arm. These robot motor skills range from toy examples (e.g., paddling a ball, ball-in-a-cup, juggling) to playing robot table tennis against a human being and manipulation of various objects.

## 28/04/2021, Wednesday, 18:00–19:00 Europe/Lisbon — Online

Mikhail Belkin, Halicioğlu Data Science Institute, University of California San Diego
Two mathematical lessons of deep learning

Recent empirical successes of deep learning have exposed significant gaps in our fundamental understanding of learning and optimization mechanisms. Modern best practices for model selection are in direct contradiction to the methodologies suggested by classical analyses. Similarly, the efficiency of SGD-based local methods used in training modern models, appeared at odds with the standard intuitions on optimization.

First, I will present evidence, empirical and mathematical, that necessitates revisiting classical notions, such as over-fitting. I will continue to discuss the emerging understanding of generalization, and, in particular, the "double descent" risk curve, which extends the classical U-shaped generalization curve beyond the point of interpolation.

Second, I will discuss why the landscapes of over-parameterized neural networks are generically never convex, even locally. Instead, as I will argue, they satisfy the Polyak-Lojasiewicz condition across most of the parameter space instead, which allows SGD-type methods to converge to a global minimum.

A key piece of the puzzle remains - how does optimization align with statistics to form the complete mathematical picture of modern ML?

## 07/05/2021, Friday, 14:00–15:00 Europe/Lisbon — Online

Rebecca Willett, University of Chicago
Machine Learning and Inverse Problems: Deeper and More Robust

Many challenging image processing tasks can be described by an ill-posed linear inverse problem: deblurring, deconvolution, inpainting, compressed sensing, and superresolution all lie in this framework. Recent advances in machine learning and image processing have illustrated that it is often possible to learn a regularizer from training data that can outperform more traditional approaches by large margins. In this talk, I will describe the central prevailing themes of this emerging area and a taxonomy that can be used to categorize different problems and reconstruction methods. We will also explore mechanisms for model adaptation; that is, given a network trained to solve an initial inverse problem with a known forward model, we propose novel procedures that adapt the network to a perturbed forward model, even without full knowledge of the perturbation. Finally, I will describe a new class of approaches based on "infinite-depth networks" that can yield up to a 4dB PSNR improvement in reconstruction accuracy above state-of-the-art alternatives and where the computational budget can be selected at test time to optimize context-dependent trade-offs between accuracy and computation.

### Additional file

Willett slides.pdf

## 21/05/2021, Friday, 14:00–15:00 Europe/Lisbon — Online

Kyriakos Vamvoudakis, Georgia Institute of Technology
Learning-Based Actuator Placement and Receding Horizon Control for Security against Actuation Attacks

Cyber-physical systems (CPS) comprise interacting digital, analog, physical, and human components engineered for function through integrated physics and logic. Incorporating intelligence in CPS, however, makes their physical components more exposed to adversaries that can potentially cause failure or malfunction through actuation attacks. As a result, augmenting CPS with resilient control and design methods is of grave significance, especially if an actuation attack is stealthy. Towards this end, in the first part of the talk, I will present a receding horizon controller, which can deal with undetectable actuation attacks by solving a game in a moving horizon fashion. In fact, this controller can guarantee stability of the equilibrium point of the CPS, even if the attackers have an information advantage. The case where the attackers are not aware of the decision-making mechanism of one another is also considered, by exploiting the theory of bounded rationality. In the second part of the talk, and for CPS that have partially unknown dynamics, I will present an online actuator placement algorithm, which chooses the actuators of the CPS that maximize an attack security metric. It can be proved that the maximizing set of actuators is found in finite time, despite the CPS having uncertain dynamics.

### Additional file

Vamvoudakis slides.pdf

## 28/05/2021, Friday, 14:00–15:00 Europe/Lisbon — Online

Gustau Camps-Valls, Universitat de València
Physics Aware Machine Learning for the Earth Sciences

Most problems in Earth sciences aim to do inferences about the system, where accurate predictions are just a tiny part of the whole problem. Inferences mean understanding variables relations, deriving models that are physically interpretable, that are simple parsimonious, and mathematically tractable. Machine learning models alone are excellent approximators, but very often do not respect the most elementary laws of physics, like mass or energy conservation, so consistency and confidence are compromised. I will review the main challenges ahead in the field, and introduce several ways to live in the Physics and machine learning interplay that allows us (1) to encode differential equations from data, (2) constrain data-driven models with physics-priors and dependence constraints, (3) improve parameterizations, (4) emulate physical models, and (5) blend data-driven and process-based models. This is a collective long-term AI agenda towards developing and applying algorithms capable of discovering knowledge in the Earth system.

### Additional file

Camps-Valls slides.pdf

## 04/06/2021, Friday, 14:00–15:00 Europe/Lisbon — Online

Mathieu Blondel, Google Research, Brain team, Paris
Efficient and Modular Implicit Differentiation

Automatic differentiation (autodiff) has revolutionized machine learning. It allows expressing complex computations by composing elementary ones in creative ways and removes the burden of computing their derivatives by hand. More recently, differentiation of optimization problem solutions has attracted widespread attention with applications such as optimization as a layer, and in bi-level problems such as hyper-parameter optimization and meta-learning. However, the formulas for these derivatives often involve case-by-case tedious mathematical derivations. In this work, we propose a unified, efficient and modular approach for implicit differentiation of optimization problems. In our approach, the user defines (in Python in the case of our implementation) a function F capturing the optimality conditions of the problem to be differentiated. Once this is done, we leverage autodiff of F and implicit differentiation to automatically differentiate the optimization problem. Our approach thus combines the benefits of implicit differentiation and autodiff. It is efficient as it can be added on top of any state-of-the-art solver and modular as the optimality condition specification is decoupled from the implicit differentiation mechanism. We show that seemingly simple principles allow to recover many recently proposed implicit differentiation methods and create new ones easily. We demonstrate the ease of formulating and solving bi-level optimization problems using our framework. We also showcase an application to the sensitivity analysis of molecular dynamics.

### Additional file

Blondel slides.pdf

## 11/06/2021, Friday, 14:00–15:00 Europe/Lisbon — Online

Ulugbek Kamilov, University of Washington
Computational Imaging: Reconciling Physical and Learned Models

Computational imaging is a rapidly growing area that seeks to enhance the capabilities of imaging instruments by viewing imaging as an inverse problem. There are currently two distinct approaches for designing computational imaging methods: model-based and learning-based. Model-based methods leverage analytical signal properties and often come with theoretical guarantees and insights. Learning-based methods leverage data-driven representations for best empirical performance through training on large datasets. This talk presents Regularization by Artifact Removal (RARE), as a framework for reconciling both viewpoints by providing a learning-based extension to the classical theory. RARE relies on pre-trained “artifact-removing deep neural nets” for infusing learned prior knowledge into an inverse problem, while maintaining a clear separation between the prior and physics-based acquisition model. Our results indicate that RARE can achieve state-of-the-art performance in different computational imaging tasks, while also being amenable to rigorous theoretical analysis. We will focus on the applications of RARE in biomedical imaging, including magnetic resonance and tomographic imaging.

This talk will be based on the following references

• J. Liu, Y. Sun, C. Eldeniz, W. Gan, H. An, and U. S. Kamilov, “RARE: Image Reconstruction using Deep Priors Learned without Ground Truth,” IEEE J. Sel. Topics Signal Process., vol. 14, no. 6, pp. 1088-1099, October 2020.

• Z. Wu, Y. Sun, A. Matlock, J. Liu, L. Tian, and U. S. Kamilov, “SIMBA: Scalable Inversion in Optical Tomography using Deep Denoising Priors,” IEEE J. Sel. Topics Signal Process., vol. 14, no. 6, pp. 1163-1175, October 2020.

• J. Liu, Y. Sun, W. Gan, X. Xu, B. Wohlberg, and U. S. Kamilov, “SGD-Net: Efficient Model-Based Deep Learning with Theoretical Guarantees,” IEEE Trans. Comput. Imag., in press.

### Additional file

Kamilov slides.pdf

## 18/06/2021, Friday, 14:00–15:00 Europe/Lisbon — Online

This work develops a class of relaxations in between the big-M and convex hull formulations of disjunctions, drawing advantages from both. We show that this class leads to mixed-integer formulations for trained ReLU neural networks. The approach balances model size and tightness by partitioning node inputs into a number of groups and forming the convex hull over the partitions via disjunctive programming. At one extreme, one partition per input recovers the convex hull of a node, i.e., the tightest possible formulation for each node. For fewer partitions, we develop smaller relaxations that approximate the convex hull, and show that they outperform existing formulations. Specifically, we propose strategies for partitioning variables based on theoretical motivations and validate these strategies using extensive computational experiments. Furthermore, the proposed scheme complements known algorithmic approaches, e.g., optimization-based bound tightening captures dependencies within a partition.

This joint work with Calvin Tsay, Jan Kronqvist, Alexander Thebelt is based on two papers (https://arxiv.org/abs/2102.04373, https://arxiv.org/abs/2101.12708).

### Additional file

Misener slides.pdf

## 25/06/2021, Friday, 14:00–15:00 Europe/Lisbon — Online

Yuejie Chi, Carnegie Mellon University
Policy Optimization in Reinforcement Learning: A Tale of Preconditioning and Regularization

Policy optimization, which learns the policy of interest by maximizing the value function via large-scale optimization techniques, lies at the heart of modern reinforcement learning (RL). In addition to value maximization, other practical considerations arise commonly as well, including the need of encouraging exploration, and that of ensuring certain structural properties of the learned policy due to safety, resource and operational constraints. These considerations can often be accounted for by resorting to regularized RL, which augments the target value function with a structure-promoting regularization term, such as Shannon entropy, Tsallis entropy, and log-barrier functions. Focusing on an infinite-horizon discounted Markov decision process, this talk first shows that entropy-regularized natural policy gradient methods converge globally at a linear convergence that is near independent of the dimension of the state-action space. Next, a generalized policy mirror descent algorithm is proposed to accommodate a general class of convex regularizers beyond Shannon entropy. Encouragingly, this general algorithm inherits similar convergence guarantees, even when the regularizer lacks strong convexity and smoothness. Our results accommodate a wide range of learning rates, and shed light upon the role of regularization in enabling fast convergence in RL.

Chi slides.pdf

## 02/07/2021, Friday, 14:00–15:00 Europe/Lisbon — Online

Ard Louis, University of Oxford
Deep neural networks have an inbuilt Occam's razor

One of the most surprising properties of deep neural networks (DNNs) is that they perform best in the overparameterized regime. We are taught early on that having more parameters than data points is a terrible idea. So why do DNNs work so well in a regime where classical learning theory predicts they should heavily overfit? By adapting the coding theorem from algorithmic information theory (which every physicist should learn about!) we show that DNNs are exponentially biased at initialisation to functions that have low descriptional (Kolmogorov) complexity. In other words, DNNs have an inbuilt Occam's razor, a bias towards simple functions. We next show that stochastic gradient descent (SGD), the most popular optimisation method for DNNs, follows the same bias, and so does not itself explain the good generalisation of DNNs. Our approach naturally leads to a marginal-likelihood PAC-Bayes generalisation bound which performs better than any other bounds on the market. Finally, we discuss why this bias towards simplicity allows DNNs to perform so well, and speculate on what this may tell us about the natural world.

Louis slides.pdf

## 09/07/2021, Friday, 14:00–15:00 Europe/Lisbon — Online

In many emerging applications, it is of paramount interest to learn hidden parameters from data. For example, self-driving cars may use onboard cameras to identify pedestrians, highway lanes, or traffic signs in various light and weather conditions. Problems such as these can be framed as classification, regression, or risk minimization in general, at the heart of which lies stochastic optimization and machine learning. In many practical scenarios, distributed and decentralized learning methods are preferable as they benefit from a divide-and-conquer approach towards data at the expense of local (short-range) communication. In this talk, I will present our recent work that develops a novel algorithmic framework to address various aspects of decentralized stochastic first-order optimization methods for non-convex problems. A major focus will be to characterize regimes where decentralized solutions outperform their centralized counterparts and lead to optimal convergence guarantees. Moreover, I will characterize certain desirable attributes of decentralized methods in the context of linear speedup and network independent convergence rates. Throughout the talk, I will demonstrate such key aspects of the proposed methods with the help of provable theoretical results and numerical experiments on real data.

Khan slides.pdf

## 28/07/2021, Wednesday, 17:00–18:00 Europe/Lisbon — Online

Simon Du, University of Washington
Provable Representation Learning

Representation learning has been widely used in many applications. In this talk, I will present our work, which uncovers when and why representation learning provably improves the sample efficiency, from a statistical learning point of view. I will show

1. the existence of a good representation among all tasks, and
2. the diversity of tasks are key conditions that permit improved statistical efficiency via multi-task representation learning.

These conditions provably improve the sample efficiency for functions with certain complexity measures as the representation. If time permits, I will also talk about leveraging the theoretical insights to improve practical performance.

Du slides.pdf

## 16/09/2021, Thursday, 17:00–18:00 Europe/Lisbon — Online

J. Nathan Kutz, University of Washington
Deep learning for the discovery of parsimonious physics models

A major challenge in the study of dynamical systems is that of model discovery: turning data into reduced order models that are not just predictive, but provide insight into the nature of the underlying dynamical system that generated the data. We introduce a number of data-driven strategies for discovering nonlinear multiscale dynamical systems and their embeddings from data. We consider two canonical cases: (i) systems for which we have full measurements of the governing variables, and (ii) systems for which we have incomplete measurements. For systems with full state measurements, we show that the recent sparse identification of nonlinear dynamical systems (SINDy) method can discover governing equations with relatively little data and introduce a sampling method that allows SINDy to scale efficiently to problems with multiple time scales, noise and parametric dependencies. For systems with incomplete observations, we show that the Hankel alternative view of Koopman (HAVOK) method, based on time-delay embedding coordinates and the dynamic mode decomposition, can be used to obtain a linear models and Koopman invariant measurement systems that nearly perfectly captures the dynamics of nonlinear quasiperiodic systems. Neural networks are used in targeted ways to aid in the model reduction process. Together, these approaches provide a suite of mathematical strategies for reducing the data required to discover and model nonlinear multiscale systems.

Kutz slides.pdf

## 23/09/2021, Thursday, 10:00–11:00 Europe/Lisbon — Online

Leong Chuan Kwek, Nanyang Technological University, Singapore
Machine Learning and Quantum Technology

The rise of machine learning in recent times has remarkably transformed science and society. The goal of machine learning is to get computers to act without being explicitly programmed. Machine learning with deep reinforcement learning (RL) was recently recognized as a powerful tool to engineer dynamics in quantum system. Also, recently there has been some interest to exploit and leverage the limited available quantum resources for performing classically challenging tasks with noisy intermediate-scale quantum (NISQ) computers. Here, we discuss some of our recent work on the applications of machine learning to quantum systems.

## 30/09/2021, Thursday, 17:00–18:00 Europe/Lisbon — Online

Volkan Cevher, Laboratory for Information and Inference Systems – LIONS, EPFL
Optimization Challenges in Adversarial Machine Learning

Thanks to neural networks (NNs), faster computation, and massive datasets, machine learning (ML) is under increasing pressure to provide automated solutions to even harder real-world tasks beyond human performance with ever faster response times due to potentially huge technological and societal benefits. Unsurprisingly, the NN learning formulations present a fundamental challenge to the back-end learning algorithms despite their scalability, in particular due to the existence of traps in the non-convex optimization landscape, such as saddle points, that can prevent algorithms from obtaining “good” solutions.

In this talk, we describe our recent research that has demonstrated that the non-convex optimization dogma is false by showing that scalable stochastic optimization algorithms can avoid traps and rapidly obtain locally optimal solutions. Coupled with the progress in representation learning, such as over-parameterized neural networks, such local solutions can be globally optimal.

Unfortunately, this talk will also demonstrate that the central min-max optimization problems in ML, such as generative adversarial networks (GANs), robust reinforcement learning (RL), and distributionally robust ML, contain spurious attractors that do not include any stationary points of the original learning formulation. Indeed, we will describe how algorithms are subject to a grander challenge, including unavoidable convergence failures, which could explain the stagnation in their progress despite the impressive earlier demonstrations.

### Additional file

Cevher slides.pdf

## 14/10/2021, Thursday, 17:00–18:00 Europe/Lisbon — Online

The Neural Tangent Kernel is a new way to understand the gradient descent in deep neural networks, connecting them with kernel methods. In this talk, I'll introduce this formalism and give a number of results on the Neural Tangent Kernel and explain how they give us insight into the dynamics of neural networks during training and into their generalization features.

Based off joint works with Arthur Jacot and Franck Gabriel.

## 21/10/2021, Thursday, 17:00–18:00 Europe/Lisbon — Online

Constantino Tsallis, Group of Statistical Physics, CBPF and Santa Fe Institute
Statistical mechanics for complex systems

Together with Newtonian mechanics, Maxwell electromagnetism, Einstein relativity and quantum mechanics, Boltzmann-Gibbs (BG) statistical mechanics constitutes one of the pillars of contemporary theoretical physics, with uncountable applications in science and technology. This theory applies formidably well to a plethora of physical systems. Still, it fails in the realm of complex systems, characterized by generically strong space-time entanglement of their elements. On the basis of a nonadditive entropy (defined by an index $q$, which recovers, for $q=1$, the celebrated Boltzmann-Gibbs-von Neumann-Shannon entropy), it is possible to generalize the BG theory. We will briefly review the foundations and applications in natural, artificial and social systems.

A Bibliography is available at http://tsallis.cat.cbpf.br/biblio.htm

### Additional file

Tsallis slides.pdf

## 04/11/2021, Thursday, 17:00–18:00 Europe/Lisbon — Online

We will review physics-informed neural network and summarize available theoretical results. We will also introduce new NNs that learn functionals and nonlinear operators from functions and corresponding responses for system identification. The universal approximation theorem of operators is suggestive of the potential of NNs in learning from scattered data any continuous operator or complex system. We first generalize the theorem to deep neural networks, and subsequently we apply it to design a new composite NN with small generalization error, the deep operator network (DeepONet), consisting of a NN for encoding the discrete input function space (branch net) and another NN for encoding the domain of the output functions (trunk net). We demonstrate that DeepONet can learn various explicit operators, e.g., integrals, Laplace transforms and fractional Laplacians, as well as implicit operators that represent deterministic and stochastic differential equations. More generally, DeepOnet can learn multiscale operators spanning across many scales and trained by diverse sources of data simultaneously.

## 11/11/2021, Thursday, 17:00–18:00 Europe/Lisbon — Online

Michael Arbel, INRIA Grenoble Rhône-Alpes
Annealed Flow Transport Monte Carlo

Annealed Importance Sampling (AIS) and its Sequential Monte Carlo (SMC) extensions are state-of-the-art methods for estimating normalizing constants of probability distributions. We propose here a novel Monte Carlo algorithm, Annealed Flow Transport (AFT), that builds upon AIS and SMC and combines them with normalizing flows (NF) for improved performance. This method transports a set of particles using not only importance sampling (IS), Markov chain Monte Carlo (MCMC) and resampling steps — as in SMC, but also relies on NF which are learned sequentially to push particles towards the successive annealed targets. We provide limit theorems for the resulting Monte Carlo estimates of the normalizing constant and expectations with respect to the target distribution. Additionally, we show that a continuous-time scaling limit of the population version of AFT is given by a Feynman-Kac measure which simplifies to the law of a controlled diffusion for expressive NF. We demonstrate experimentally the benefits and limitations of our methodology on a variety of applications.

Arbel slides.pdf

## 25/11/2021, Thursday, 17:00–18:00 Europe/Lisbon — Online

Precipitation nowcasting, the high-resolution forecasting of precipitation up to two hours ahead, supports the real-world socioeconomic needs of many sectors reliant on weather-dependent decision-making. State-of-the-art operational nowcasting methods typically advect precipitation fields with radar-based wind estimates, and struggle to capture important non-linear events such as convective initiations. Recently introduced deep learning methods use radar to directly predict future rain rates, free of physical constraints. While they accurately predict low-intensity rainfall, their operational utility is limited because their lack of constraints produces blurry nowcasts at longer lead times, yielding poor performance on rarer medium-to-heavy rain events. Here we present a deep generative model for the probabilistic nowcasting of precipitation from radar that addresses these challenges. Using statistical, economic and cognitive measures, we show that our method provides improved forecast quality, forecast consistency and forecast value. Our model produces realistic and spatiotemporally consistent predictions over regions up to 1,536 km × 1,280 km and with lead times from 5–90 min ahead. Using a systematic evaluation by more than 50 expert meteorologists, we show that our generative model ranked first for its accuracy and usefulness in 89% of cases against two competitive methods. When verified quantitatively, these nowcasts are skillful without resorting to blurring. We show that generative nowcasting can provide probabilistic predictions that improve forecast value and support operational utility, and at resolutions and lead times where alternative methods struggle.

## 02/12/2021, Thursday, 17:00–18:00 Europe/Lisbon — Online

Soledad Villar, Mathematical Institute for Data Science at Johns Hopkins University
Equivariant machine learning structure like classical physics

There has been enormous progress in the last few years in designing neural networks that respect the fundamental symmetries and coordinate freedoms of physical law. Some of these frameworks make use of irreducible representations, some make use of high-order tensor objects, and some apply symmetry-enforcing constraints. Different physical laws obey different combinations of fundamental symmetries, but a large fraction (possibly all) of classical physics is equivariant to translation, rotation, reflection (parity), boost (relativity), and permutations. Here we show that it is simple to parameterize universally approximating polynomial functions that are equivariant under these symmetries, or under the Euclidean, Lorentz, and Poincaré groups, at any dimensionality $d$. The key observation is that nonlinear $O(d)$-equivariant (and related-group-equivariant) functions can be expressed in terms of a lightweight collection of scalars–scalar products and scalar contractions of the scalar, vector, and tensor inputs. These results demonstrate theoretically that gauge-invariant deep learning models for classical physics with good scaling for large problems are feasible right now.

Villar.pdf

## 09/12/2021, Thursday, 17:00–18:00 Europe/Lisbon — Online

Pier Luigi Dragotti, Department of Electrical and Electronic Engineering, Imperial College, London
Computational Imaging for Art investigation and for Neuroscience

The revolution in sensing, with the emergence of many new imaging techniques, offers the possibility of gaining unprecedented access tothe physical world, but this revolution can only bear fruit through the skilful interplay between the physical and computational worlds. This is the domain of computational imaging which advocates that, to develop effective imaging systems, it will be necessary to go beyond the traditional decoupled imaging pipeline where device physics, image processing and the end-user application are considered separately. Instead, we need to rethink imaging as an integrated sensing and inference model. In this talk we cover two research areas where computational imaging is likely to have an impact.

We first focus on the heritage sector which is experiencing a digital revolution driven in part by the increasing use of non-invasive, non-destructive imaging techniques. These new imaging methods provide a way to capture information about an entire painting and can give us information about features at or below the surface of the painting. We focus on Macro X-Ray Fluorescence (XRF) scanning which is a technique for the mapping of chemical elements in paintings. After describing in broad terms the working of this device, a method that can process XRF scanning data from paintings is introduced. The method is based on connecting the problem of extracting elemental maps in XRF data to Prony's method, a technique broadly used in engineering to estimate frequencies of a sum of sinusoids. The results presented show the ability of our method to detect and separate weak signals related to hidden chemical elements in the paintings. We then discuss results on the Leonardo’s The Virgin of the Rocks and show that our algorithm is able to reveal, more clearly than ever before, the hidden drawings of a previous composition that Leonardo then abandoned for the painting that we can now see.

In the second part of the talk, we focus on two-photon microscopy and neuroscience. To understand how networks of neurons process information, it is essential to monitor their activity in living tissue. Multi-photon microscopy is unparalleled in its ability to image cellular activity and neural circuits, deep in living tissue, at single-cell resolution. However, in order to achieve step changes in our understanding of brain function, large-scale imaging studies of neural populations are needed and this can be achieved only by developing computational tools that can enhance the quality of the data acquired and can scan 3-D volumes quickly. In this talk we introduce light-field microscopy and present a method to localize neurons in 3-D. The method is based on the use of proper sparsity priors, novel optimization strategies and machine learning.

This is joint work with A. Foust, P. Song, C. Howe, H. Verinaz, J. Huang and Y.Su from Imperial College London, and C. Higgitt and N. Daly from The National Gallery in London

### Additional file

Dragotti slides.pdf