2022–2023 seminars

Europe/Lisbon
Online

Sebastian Engelke
Sebastian Engelke, University of Geneva

Machine learning beyond the data range: extreme quantile regression

Machine learning methods perform well in prediction tasks within the range of the training data. When interest is in quantiles of the response that go beyond the observed records, these methods typically break down. Extreme value theory provides the mathematical foundation for estimation of such extreme quantiles. A common approach is to approximate the exceedances over a high threshold by the generalized Pareto distribution. For conditional extreme quantiles, one may model the parameters of this distribution as functions of the predictors. Up to now, the existing methods are either not flexible enough or do not generalize well to higher dimensions. We develop new approaches for extreme quantile regression that estimate the parameters of the generalized Pareto distribution with tree-based methods and recurrent neural networks. Our estimators outperform classical machine learning methods and methods from extreme value theory in simulation studies. We illustrate how the recurrent neural network model can be used for effective forecasting of flood risk.
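
As a rough illustration of the peaks-over-threshold idea underlying the talk, the sketch below (a synthetic example using SciPy, not the speaker's tree-based or recurrent-network estimators) fits a generalized Pareto distribution to exceedances over a high threshold and extrapolates to a quantile beyond the observed data.

    import numpy as np
    from scipy.stats import genpareto

    rng = np.random.default_rng(0)
    y = rng.standard_t(df=4, size=10_000)        # synthetic heavy-tailed response

    u = np.quantile(y, 0.95)                     # high threshold
    exceedances = y[y > u] - u
    xi, _, sigma = genpareto.fit(exceedances, floc=0)   # GPD shape and scale

    # Extrapolated quantile (level 0.999) beyond the observed range:
    p, p_u = 0.999, np.mean(y > u)               # p_u estimates P(Y > u)
    q_extreme = u + genpareto.ppf(1 - (1 - p) / p_u, xi, scale=sigma)
    print(q_extreme)

In the conditional setting of the talk, the shape and scale parameters would instead be modeled as functions of the predictors.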

Additional file


Engelke slides.pdf

Europe/Lisbon
Online

Alhussein Fawzi
Alhussein Fawzi, DeepMind

Discovering faster matrix multiplication algorithms with deep reinforcement learning

Improving the efficiency of algorithms for fundamental computational tasks such as matrix multiplication can have widespread impact, as it affects the overall speed of a large amount of computation. The automatic discovery of algorithms using machine learning offers the prospect of reaching beyond human intuition and outperforming the current best human-designed algorithms. In this talk I'll present AlphaTensor, our reinforcement learning agent based on AlphaZero for discovering efficient and provably correct algorithms for the multiplication of arbitrary matrices. AlphaTensor discovered algorithms that outperform the state-of-the-art complexity for many matrix sizes. Particularly relevant is the case of 4 × 4 matrices in a finite field, where AlphaTensor's algorithm improves on Strassen's two-level algorithm for the first time since its discovery 50 years ago. I'll present our problem formulation as a single-player game, the key ingredients that enable tackling such difficult mathematical problems using reinforcement learning, and the flexibility of the AlphaTensor framework.
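
For context, the connection between low-rank tensor decompositions and fast matrix multiplication can be seen in Strassen's classical 7-multiplication rule, sketched below on 2 × 2 blocks (a textbook illustration, not one of AlphaTensor's discovered algorithms):

    import numpy as np

    def strassen_2x2_blocks(A, B):
        """Multiply square matrices of even size via Strassen's 7 block multiplications."""
        n = A.shape[0] // 2
        A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
        B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]
        M1 = (A11 + A22) @ (B11 + B22)
        M2 = (A21 + A22) @ B11
        M3 = A11 @ (B12 - B22)
        M4 = A22 @ (B21 - B11)
        M5 = (A11 + A12) @ B22
        M6 = (A21 - A11) @ (B11 + B12)
        M7 = (A12 - A22) @ (B21 + B22)
        C11 = M1 + M4 - M5 + M7
        C12 = M3 + M5
        C21 = M2 + M4
        C22 = M1 - M2 + M3 + M6
        return np.block([[C11, C12], [C21, C22]])

    A, B = np.random.rand(4, 4), np.random.rand(4, 4)
    assert np.allclose(strassen_2x2_blocks(A, B), A @ B)

Each such rule corresponds to a rank decomposition of the matrix multiplication tensor; AlphaTensor's single-player game builds decompositions term by term, and fewer rank-one terms mean fewer multiplications.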

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Yang-Hui He
Yang-Hui He, London Institute for Mathematical Sciences & Merton College, Oxford University

A data science driven approach to physics and mathematics I

In this self-contained lecture series, we will look at a computational and data science driven approach to problems in physics and mathematics.

We will focus on explicit constructions in specific case studies which have emerged over the past decades.

Finally, we discuss some recent developments in using neural networks and machine learning, taking as a test case mathematical problems related to geometries that crop up in string theory, namely Calabi-Yau geometries.

This subject has been a fruitful cross-fertilization between mathematics, physics and computer science.

Part of the MPML 2nd Lecture Series

The mini-course is aimed at advanced Master's and beginning Ph.D. students in physics, mathematics and engineering; no prior exposure to these topics is required.

All technical details necessary for understanding any of the problems we consider will be introduced at a level accessible to a non-specialist. 

The lectures will, however, involve some live coding demonstrations; a basic familiarity with Mathematica would be helpful.

Europe/Lisbon
Abreu Faro Amphitheatre, Interdisciplinary Complex — Online

Yang-Hui He
Yang-Hui He, London Institute for Mathematical Sciences & Merton College, Oxford University

Universes as Bigdata: Physics, Geometry and Machine-Learning

The search for the Theory of Everything has led to superstring theory, which then led physics, first to algebraic/differential geometry/topology, and then to computational geometry, and now to data science. With a concrete playground of the geometric landscape, accumulated by the collaboration of physicists, mathematicians and computer scientists over the last 4 decades, we show how the latest techniques in machine-learning can help explore problems of interest to theoretical physics and to pure mathematics. At the core of our programme is the question: how can AI help us with mathematics?

Additional file


He slides.pdf

Europe/Lisbon
Online

Ben Edelman
Ben Edelman, Harvard University

Studies in feature learning through the lens of sparse boolean functions

How do deep neural networks learn to construct useful features? Why do self-attention-based networks such as transformers perform so well on combinatorial tasks such as language learning? Why do some capabilities of networks emerge "discontinuously" as the computational resources used for training are scaled up? We will present perspectives on these questions through the lens of a particular class of simple synthetic tasks: learning sparse boolean functions. In part one, we will show that the hypothesis class of one-layer transformers can learn these functions in a statistically efficient manner. This leads to a view of each layer of a transformer as creating new "variables" out of sparse combinations of the previous layer's outputs. In part two, we will focus on the classic task of learning sparse parities, which is statistically easy but computationally difficult. We will demonstrate that SGD on various neural networks (transformers, MLPs, etc.) successfully learns sparse parities, with computational efficiency that is close to known lower bounds. Moreover, the training curves display no apparent progress for a long time, and then quickly drop late in training. We show that despite this apparent delayed breakthrough in performance, hidden progress is actually being made throughout the course of training.

Based on joint work with Surbhi Goel, Sham Kakade, Cyril Zhang, Boaz Barak, and Eran Malach:
https://arxiv.org/abs/2110.10090
https://arxiv.org/abs/2207.08799
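
A minimal way to reproduce the qualitative phenomenon described above (a long plateau followed by a late jump in accuracy) is to train a small MLP with SGD on a synthetic sparse-parity task. The sketch below assumes PyTorch and hypothetical sizes (n = 30, k = 3); it is not the authors' experimental setup.

    import torch
    import torch.nn as nn

    n, k, batch = 30, 3, 256                     # input bits, parity support size, batch size

    def sample(m):
        x = torch.randint(0, 2, (m, n)).float()
        y = x[:, :k].sum(dim=1) % 2              # label: parity of the k relevant bits
        return 2 * x - 1, y                      # map inputs to +/-1

    model = nn.Sequential(nn.Linear(n, 128), nn.ReLU(), nn.Linear(128, 1))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.BCEWithLogitsLoss()

    for step in range(20_000):
        x, y = sample(batch)
        loss = loss_fn(model(x).squeeze(1), y)
        opt.zero_grad(); loss.backward(); opt.step()
        if step % 1000 == 0:
            with torch.no_grad():
                xv, yv = sample(4096)
                acc = ((model(xv).squeeze(1) > 0).float() == yv).float().mean()
            print(step, round(loss.item(), 3), round(acc.item(), 3))

The test accuracy typically hovers near chance for many steps before rising sharply, even though, as the abstract argues, hidden progress is being made throughout training.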

Additional file


Edelman slides.pdf

Europe/Lisbon
Online

Sara A. Solla
Sara A. Solla, Northwestern University, Department of Neuroscience and Department of Physics and Astronomy

Low Dimensional Manifolds for Neural Dynamics

The ability to simultaneously record the activity from tens to hundreds to thousands of neurons has allowed us to analyze the computational role of population activity as opposed to single neuron activity. Recent work on a variety of cortical areas suggests that neural function may be built on the activation of population-wide activity patterns, the neural modes, rather than on the independent modulation of individual neural activity. These neural modes, the dominant covariation patterns within the neural population, define a low dimensional neural manifold that captures most of the variance in the recorded neural activity. We refer to the time-dependent activation of the neural modes as their latent dynamics and argue that latent cortical dynamics within the manifold are the fundamental and stable building blocks of neural population activity.
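
As a toy illustration of the manifold idea (synthetic data, not the recordings discussed in the talk), the neural modes can be obtained as the leading principal components of population activity, and the latent dynamics as the projections onto them:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    T, N, d = 500, 120, 3                         # time bins, neurons, latent dimension
    latent = np.cumsum(rng.standard_normal((T, d)), axis=0)       # smooth latent trajectories
    mixing = rng.standard_normal((d, N))                          # projection onto neurons
    rates = latent @ mixing + 0.5 * rng.standard_normal((T, N))   # noisy population activity

    pca = PCA(n_components=10).fit(rates)
    neural_modes = pca.components_                # dominant covariation patterns
    latent_dynamics = pca.transform(rates)        # time-dependent activation of the modes
    print(np.cumsum(pca.explained_variance_ratio_)[:5])   # a few modes capture most variance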

Additional file


Solla slides.pdf

Europe/Lisbon
Online

Gonçalo Correia
Gonçalo Correia, IST and Priberam Labs

Learnable Sparsity and Weak Supervision for Data-Efficient, Transparent, and Compact Neural Models

Neural network models have become ubiquitous in Machine Learning literature. These models are compositions of differentiable building blocks that result in dense representations of the underlying data. To obtain good representations, conventional neural models require many training data points. Moreover, those representations, albeit capable of obtaining a high performance on many tasks, are largely uninterpretable. These models are often overparameterized and produce representations that do not compactly represent the data. To address these issues, we find solutions in sparsity and various forms of weak supervision. For data efficiency, we leverage transfer learning as a form of weak supervision. The proposed model can perform similarly to models trained on millions of data points on a sequence-to-sequence generation task, even though we only train it on a few thousand. For transparency, we propose a probability normalizing function that can learn its sparsity. The model learns the sparsity it needs differentiably and thus adapts it to the data according to the neural component's role in the overall structure. We show that the proposed model improves the interpretability of a popular neural machine translation architecture when compared to conventional probability normalizing functions. Finally, for compactness, we uncover a way to obtain exact gradients of discrete and structured latent variable models efficiently. The discrete nodes in these models can compactly represent implicit clusters and structures in the data, but training them was often complex and prone to failure, since it required approximations that rely on sampling or relaxations. We propose to train these models with exact gradients by parameterizing discrete distributions with sparse functions, both unstructured and structured. We obtain good performance on three latent variable model applications while still retaining the practicality of the approximations mentioned above. Through these novel contributions, we challenge the conventional wisdom that neural models cannot exhibit data efficiency, transparency, or compactness.
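
For readers unfamiliar with sparse probability normalizing functions, a fixed-sparsity example is sparsemax, sketched below in plain NumPy. This is an illustration of sparse probability normalization only, not code from the thesis; the learnable function described above additionally adapts its own degree of sparsity during training.

    import numpy as np

    def sparsemax(z):
        """Euclidean projection of a score vector onto the probability simplex.

        Unlike softmax, sparsemax can assign exactly zero probability to
        low-scoring entries, which makes the resulting distributions easier
        to inspect."""
        z = np.asarray(z, dtype=float)
        z_sorted = np.sort(z)[::-1]
        cumsum = np.cumsum(z_sorted)
        k = np.arange(1, z.size + 1)
        support = 1 + k * z_sorted > cumsum       # entries that stay in the support
        k_z = k[support][-1]
        tau = (cumsum[support][-1] - 1) / k_z
        return np.maximum(z - tau, 0.0)

    print(sparsemax([1.5, 1.3, -2.0]))            # [0.6, 0.4, 0.0]: exactly sparse

Used in place of softmax inside an attention layer, such a mapping can give exactly zero weight to irrelevant positions, which is what underlies the interpretability gains mentioned in the abstract.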

Europe/Lisbon
Online

Valentin De Bortoli
Valentin De Bortoli, Center for Sciences of Data, ENS Ulm, Paris

Diffusion models, theory and methodology

Generative modeling is the task of drawing new samples from an underlying distribution known only via an empirical measure. There exists a myriad of models to tackle this problem, with applications in image and speech processing, medical imaging, forecasting and protein modeling, to name a few. Among these methods, diffusion models are a new powerful class of generative models that exhibit remarkable empirical performance. They consist of a “noising” stage, whereby a diffusion is used to gradually add Gaussian noise to data, and a generative model, which entails a “denoising” process defined by approximating the time-reversal of the diffusion. In this talk we discuss three aspects of diffusion models. First, we will dive into the methodology behind diffusion models. Second, we will present some of their theoretical guarantees, with an emphasis on their behavior under the so-called manifold hypothesis. Such theoretical guarantees are non-vacuous and provide insight into the empirical behavior of these models. Finally, I will present an extension of diffusion models to the Optimal Transport setting and introduce Diffusion Schrödinger Bridges.
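
A minimal sketch of the “noising” stage, assuming a DDPM-style variance-preserving discretization (which may differ from the exact formulation used in the talk):

    import numpy as np

    T = 1000
    betas = np.linspace(1e-4, 0.02, T)            # noise schedule
    alphas_bar = np.cumprod(1.0 - betas)          # cumulative signal retention

    def noise(x0, t, rng=np.random.default_rng(0)):
        """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
        eps = rng.standard_normal(x0.shape)
        return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps, eps

    x0 = np.random.default_rng(1).standard_normal((16, 2))   # toy "data"
    x_t, eps = noise(x0, t=500)
    # A network eps_theta(x_t, t) is trained to predict eps; generation then runs an
    # approximate time-reversal of this diffusion, starting from pure Gaussian noise.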

Additional file


de Bortoli slides.pdf

Europe/Lisbon
Online

Memming Park
Memming Park, Champalimaud Foundation

On learning signals in recurrent networks

Neural dynamical systems with stable attractor structures, such as point attractors and continuous attractors, are widely hypothesized to underlie meaningful temporal behavior that requires working memory. However, perhaps counterintuitively, having good working memory is not sufficient for supporting useful learning signals that are necessary to adapt to changes in the temporal structure of the environment. We show that, in addition to the well-known continuous attractors, periodic and quasi-periodic attractors are also fundamentally capable of supporting the learning of arbitrarily long temporal relationships. Due to the fine-tuning problem of continuous attractors and the lack of temporal fluctuations, we believe the less explored quasi-periodic attractors are uniquely qualified for learning to produce temporally structured behavior. Our theory has wide implications for the design of artificial learning systems, and makes predictions about the observable signatures of biological neural dynamics that can support temporal dependence learning. Based on our theory, we developed a new initialization scheme for artificial recurrent neural networks which outperforms standard methods for tasks that require learning temporal dynamics. Finally, we speculate on their biological implementations and make predictions about neuronal dynamics.
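
As a generic illustration of why rotational (quasi-periodic) dynamics avoid both decay and explosion (not the speaker's initialization scheme), a linear recurrent map with an orthogonal weight matrix preserves norms and has a unit-modulus spectrum:

    import numpy as np

    rng = np.random.default_rng(0)
    W = np.linalg.qr(rng.standard_normal((64, 64)))[0]   # random orthogonal recurrent matrix
    eigvals = np.linalg.eigvals(W)
    print(np.allclose(np.abs(eigvals), 1.0))             # True: unit-modulus spectrum

    h = rng.standard_normal(64)
    for t in (1, 10, 100, 1000):
        print(t, np.linalg.norm(np.linalg.matrix_power(W, t) @ h))   # norm neither decays nor explodes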

Additional file


Park slides.pdf

Europe/Lisbon
Online

Rongjie Lai
Rongjie Lai, Rensselaer Polytechnic Institute

Learning Manifold-Structured Data using Deep Neural Networks: Theory and Applications

Deep artificial neural networks have achieved great success in many problems in science and engineering. In this talk, I will discuss our recent efforts to develop DNNs capable of learning non-trivial geometric information hidden in data. In the first part, I will discuss our work advocating the use of a multi-chart latent space for better data representation. Inspired by differential geometry, we propose a Chart Auto-Encoder (CAE) and prove a universal approximation theorem on its representation capability. CAE admits desirable manifold properties that conventional auto-encoders with a flat latent space fail to obey. We further establish statistical guarantees on the generalization error for trained CAE models and show their robustness to noise. Our numerical experiments also demonstrate satisfactory performance on data with complicated geometry and topology. If time permits, I will discuss our work on defining convolution on manifolds via parallel transport. This geometric way of defining parallel transport convolution (PTC) provides a natural combination of modeling and learning on manifolds. PTC allows for the construction of compactly supported filters and is also robust to manifold deformations. I will demonstrate its applications to shape analysis and point cloud processing using PTC-nets. This talk is based on a series of joint work with my students and collaborators.

Additional file


Lai slides.pdf

Europe/Lisbon
Online

Paulo Rosa
Paulo Rosa, Deimos

Deep Reinforcement Learning based Integrated Guidance and Control for a Launcher Landing Problem

Deep Reinforcement Learning (Deep-RL) has received considerable attention in recent years due to its ability to make an agent learn how to take optimal control actions, given rich observation data, via the maximization of a reward function. Future space missions will need new on-board autonomy capabilities with increasingly complex requirements at the limits of the vehicle performance. This motivates the use of machine-learning-based techniques, in particular reinforcement learning, to allow exploring the edge of the performance trade-off space. The development of guidance and control systems for Reusable Launch Vehicles (RLVs) can take advantage of reinforcement learning techniques for optimal adaptation in the face of multi-objective requirements and uncertain scenarios.

In AI4GNC, a project funded by the European Space Agency (ESA), led by DEIMOS with the participation of INESC-ID, the University of Lund, and TASC, a Deep-RL algorithm was used to train an actor-critic agent to simultaneously control the engine thrust magnitude and the two TVC gimbal angles to land an RLV in a 6-DoF simulation. The design followed an incremental approach, progressively augmenting the number of degrees of freedom and introducing additional complexity factors such as nonlinearities in the models. Ultimately, the full 6-DoF problem was addressed using a high-fidelity simulator that includes a nonlinear actuator model and a realistic vehicle aerodynamic model. Starting from an initial vehicle state along a reentry trajectory, the problem consists of precisely landing the RLV while ensuring satisfaction of system requirements, such as saturation and rate limits in the actuation, and aiming at fuel-consumption optimality. The Deep Deterministic Policy Gradient (DDPG) algorithm was adopted as the candidate strategy to allow the design of an integrated guidance and control algorithm in continuous action and observation spaces.

The results obtained are very satisfactory in terms of landing accuracy and fuel consumption. They were also compared to a more classical, industrially used solution, chosen for its capability to yield satisfactory landing accuracy and fuel consumption, composed of successive-convexification guidance and a PID controller tuned independently for the non-disturbed nominal scenario. A reachability analysis was also performed to assess the stability and robustness of the closed-loop system composed of the integrated guidance and control NN, trained for the 1-DoF scenario, and the RLV dynamics.

Taking into account the fidelity of the benchmark adopted and the results obtained, this approach is deemed to have significant potential for further developments and, ultimately, space industry applications, such as In-Orbit Servicing (IOS) and Active Debris Removal (ADR), which also require a high level of autonomy.
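
For reference, the core DDPG update used in such an integrated guidance and control design looks roughly as follows. This is a generic PyTorch sketch with placeholder sizes (a 13-dimensional state is assumed purely for illustration; the 3 actions stand for thrust magnitude and two gimbal angles), not the AI4GNC implementation; a real setup adds a replay buffer, exploration noise and action bounds.

    import torch
    import torch.nn as nn

    s_dim, a_dim, gamma, tau = 13, 3, 0.99, 0.005   # placeholder state/action sizes

    def mlp(n_in, n_out):
        return nn.Sequential(nn.Linear(n_in, 64), nn.ReLU(), nn.Linear(64, n_out))

    actor, critic = mlp(s_dim, a_dim), mlp(s_dim + a_dim, 1)
    actor_t, critic_t = mlp(s_dim, a_dim), mlp(s_dim + a_dim, 1)
    actor_t.load_state_dict(actor.state_dict())
    critic_t.load_state_dict(critic.state_dict())
    opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
    opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

    def ddpg_update(s, a, r, s2, done):
        with torch.no_grad():                                     # bootstrapped target from target nets
            q_target = r + gamma * (1 - done) * critic_t(torch.cat([s2, actor_t(s2)], 1))
        q_loss = ((critic(torch.cat([s, a], 1)) - q_target) ** 2).mean()
        opt_c.zero_grad(); q_loss.backward(); opt_c.step()

        a_loss = -critic(torch.cat([s, actor(s)], 1)).mean()      # deterministic policy gradient
        opt_a.zero_grad(); a_loss.backward(); opt_a.step()

        for net, net_t in ((actor, actor_t), (critic, critic_t)): # soft target updates
            for p, p_t in zip(net.parameters(), net_t.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)

    B = 64                                                        # one update on a random batch of transitions
    ddpg_update(torch.randn(B, s_dim), torch.randn(B, a_dim), torch.randn(B, 1),
                torch.randn(B, s_dim), torch.zeros(B, 1))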

Additional file


Rosa slides.pdf

Europe/Lisbon
Online

Diogo Gomes

Mathematics for data science and AI - curriculum design, experiences, and lessons learned

In this talk, we will explore the importance of mathematical foundations for AI and data science and the design of an academic curriculum for graduate students. While traditional mathematics for AI and data science has focused on core techniques like linear algebra, basic probability, and optimization methods (e.g., gradient and stochastic gradient descent), several advanced mathematical techniques are now essential to understanding modern data science. These include ideas from the calculus of variations in spaces of random variables, functional analytic methods, ergodic theory, control theory methods in reinforcement learning, and metrics on spaces of probability measures. We will discuss the author's experience designing an applied mathematics curriculum for data science and the lessons learned in teaching an advanced course on the mathematical foundations of data science. This talk aims to promote discussion and exchange of ideas on how mathematicians can play an important role in AI and data science and better equip our students to excel in this field.

Additional file

document preview

Gomes Diogo slides.pdf

Europe/Lisbon
Online

Harry Desmond
Harry Desmond, University of Portsmouth

Exhaustive Symbolic Regression (or how to find the best function for your data)

Symbolic regression aims to find optimal functional representations of datasets, with broad applications across science. This is traditionally done using a “genetic algorithm” which stochastically searches function space using an evolution-inspired method for generating new trial functions. Motivated by the uncertainties inherent in this approach — and its failure on seemingly simple test cases — I will describe a new method which exhaustively searches and evaluates function space. Coupled to a model selection principle based on minimum description length, Exhaustive Symbolic Regression is guaranteed to find the simple equations that optimally balance simplicity with accuracy on any dataset. I will describe how the method works and showcase it on Hubble rate measurements and dynamical galaxy data.

Based on work with Deaglan Bartlett and Pedro G. Ferreira:
https://arxiv.org/abs/2211.11461
https://arxiv.org/abs/2301.04368
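
The sketch below illustrates the minimum-description-length idea in a heavily simplified form (a Gaussian fit term plus a crude parameter-cost term). The actual Exhaustive Symbolic Regression codelength also accounts for operator choices and parameter precision, so this is not the method itself.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    x = np.linspace(0.1, 3, 60)
    y = 2.0 * x**2 + rng.normal(0, 0.5, x.size)          # synthetic data, known noise sigma = 0.5

    candidates = {                                       # candidate expression -> (function, #parameters)
        "a*x":    (lambda th, x: th[0] * x,          1),
        "a*x**2": (lambda th, x: th[0] * x**2,       1),
        "a*x+b":  (lambda th, x: th[0] * x + th[1],  2),
    }

    def description_length(f, k):
        nll = lambda th: 0.5 * np.sum((y - f(th, x)) ** 2 / 0.5**2)   # Gaussian negative log-likelihood
        fit = minimize(nll, np.ones(k))
        return fit.fun + 0.5 * k * np.log(x.size)        # fit term + parameter-cost term

    for name, (f, k) in candidates.items():
        print(name, round(description_length(f, k), 1))  # the quadratic should win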

Additional file


Desmond_slides.pdf

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Rui Castro
Rui Castro, Mathematics Department, TU Eindhoven

Anomaly detection for a large number of streams: a permutation/rank-based higher criticism approach

Anomaly detection when observing a large number of data streams is essential in a variety of applications, ranging from epidemiological studies to monitoring of complex systems. High-dimensional scenarios are usually tackled with scan-statistics and related methods, requiring stringent distributional assumptions for proper test calibration. In this talk we take a non-parametric stance, and introduce two variants of the higher criticism test that do not require knowledge of the null distribution for proper calibration. In the first variant we calibrate the test by permutation, while in the second variant we use a rank-based approach. Both methodologies result in exact tests in finite samples. Our permutation methodology is applicable when observations within null streams are independent and identically distributed, and we show this methodology is asymptotically optimal in the wide class of exponential models. Our rank-based methodology is more flexible, and only requires observations within null streams to be independent. We provide an asymptotic characterization of the power of the test in terms of the probability of mis-ranking null observations, showing that the asymptotic power loss (relative to an oracle test) is minimal for many common models. As the proposed statistics do not rely on asymptotic approximations, they typically perform better than popular variants of higher criticism relying on such approximations. Finally, we demonstrate the use of these methodologies when monitoring the content uniformity of an active ingredient for a batch-produced drug product, and monitoring the daily number of COVID-19 cases in the Netherlands.

Based on joint work with Ivo Stoepker, Ery Arias-Castro and Edwin van den Heuvel:
https://arxiv.org/abs/2009.03117
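
To fix ideas, the classic higher criticism statistic and a simple permutation calibration can be sketched as follows. This is an illustrative implementation with Gaussian streams; the exact permutation and rank-based constructions in the paper differ in detail.

    import numpy as np
    from scipy.stats import norm

    def higher_criticism(pvals, alpha0=0.5):
        """Classic higher criticism statistic computed from one p-value per stream."""
        p = np.sort(pvals)
        n = p.size
        i = np.arange(1, n + 1)
        hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p) + 1e-12)
        return hc[: int(alpha0 * n)].max()

    rng = np.random.default_rng(0)
    n_streams, m = 200, 50
    X = rng.standard_normal((n_streams, m))
    X[:5] += 0.8                                          # a handful of anomalous streams

    def stream_pvals(X):
        z = np.sqrt(X.shape[1]) * X.mean(axis=1)          # per-stream z-statistic
        return norm.sf(z)                                 # one-sided p-values

    observed = higher_criticism(stream_pvals(X))
    perm = np.array([higher_criticism(stream_pvals(rng.permutation(X.ravel()).reshape(X.shape)))
                     for _ in range(200)])
    print("permutation p-value:", np.mean(perm >= observed))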

Europe/Lisbon
Online

Andreas Döpp
Andreas Döpp, Ludwig-Maximilians-Universität München, Faculty of Physics

Machine-learning strategies in laser-plasma physics

The field of laser-plasma physics has experienced significant advancements in the past few decades, owing to the increasing power and accessibility of high-power lasers. Initially, research in this area was limited to single-shot experiments with minimal exploration of parameters. However, recent technological advancements have enabled the collection of a wealth of data through both experimental and simulation-based approaches.

In this seminar talk, I will present a range of machine learning techniques that we have developed for applications in laser-plasma physics [1]. The first part of my talk will focus on Bayesian optimization, where I will showcase our latest findings on multi-objective and multi-fidelity optimization of laser-plasma accelerators and neural networks [2-4].
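
As background for readers new to the approach, a single-objective Bayesian-optimization loop with a Gaussian-process surrogate and expected improvement looks like the sketch below. The talk concerns multi-objective and multi-fidelity extensions, and the objective here is a made-up stand-in for an accelerator figure of merit.

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def objective(x):                                     # hypothetical 1-D figure of merit
        return -np.sin(3 * x) - x**2 + 0.7 * x

    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, (4, 1))                        # initial "experiments"
    y = objective(X).ravel()
    grid = np.linspace(-2, 2, 400).reshape(-1, 1)

    for _ in range(15):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
        mu, sd = gp.predict(grid, return_std=True)
        best = y.max()
        z = (mu - best) / (sd + 1e-9)
        ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z) # expected improvement
        x_next = grid[np.argmax(ei)]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next)[0])

    print("best setting:", X[np.argmax(y)].item(), "value:", y.max())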

In the second part of the talk, I will discuss machine learning solutions for tackling complex inverse problems, such as image deblurring or extracting 3D information from 2D sensors [5-6]. Specifically, I will discuss various adaptations of established convolutional network architectures, such as the U-Net, as well as novel physics-informed retrieval methods like deep algorithm unrolling. These techniques have shown promising results in overcoming the challenges posed by these intricate inverse problems.

References

  1. Data-driven Science and Machine Learning Methods in Laser-Plasma Physics
  2. Expected hypervolume improvement for simultaneous multi-objective and multi-fidelity optimization
  3. Multi-objective and multi-fidelity Bayesian optimization of laser-plasma acceleration
  4. Pareto Optimization of a Laser Wakefield Accelerator
  5. Measuring spatio-temporal couplings using modal spatio-spectral wavefront retrieval
  6. Hyperspectral Compressive Wavefront Sensing

Additional file


Doepp slides.pdf

Europe/Lisbon
Online

Sara Magliacane
Sara Magliacane, University of Amsterdam and MIT-IBM Watson AI Lab

Causal vs causality-inspired representation learning

Causal representation learning (CRL) aims at learning causal factors and their causal relations from high-dimensional observations, e.g. images. In general, this is an ill-posed problem, but under certain assumptions, or with the help of additional information or interventions, we are able to guarantee that the learned representations correspond to the true underlying causal factors up to some equivalence class.

In this talk I will first present CITRIS, a variational autoencoder framework for causal representation learning from temporal sequences of images, in systems in which we can perform interventions. CITRIS exploits temporality and observing intervention targets to identify scalar and multidimensional causal factors, such as 3D rotation angles. In experiments on 3D rendered image sequences, CITRIS outperforms previous methods on recovering the underlying causal variables. Moreover, using pretrained autoencoders, CITRIS can even generalize to unseen instantiations of causal factors.

While CRL is an exciting and promising new field of research, the assumptions required by CITRIS and other current CRL methods can be difficult to satisfy in many settings. Moreover, in many practical cases, learning representations that are not guaranteed to be fully causal, but that exploit some ideas from causality, can still be extremely useful. As examples, I will describe some of our work on exploiting these "causality-inspired" representations for adapting policies across domains in RL and to nonstationary environments, and on how learning factored graphical representations (even if not necessarily causal) can be beneficial in these (and possibly other) settings.

Additional file


Magliacane slides.pdf

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Mário Figueiredo
Mário Figueiredo, Instituto Superior Técnico and IT

Causal Discovery from Observations: Introduction and Some Recent Advances

Causal discovery is an active research field that aims to uncover the underlying causal mechanisms that drive the relationships among a collection of variables, and which has applications in many areas, including medicine, biology, economics, and the social sciences. In principle, identifying causal relationships requires interventions. However, intervening is often impossible, impractical, or unethical, which has stimulated much research on causal discovery from purely observational data or mixed observational-interventional data. In this talk, after overviewing the causal discovery field, I will discuss some recent advances, namely on causal discovery from data with latent interventions and on what is arguably the quintessential causal discovery problem: distinguishing the cause from the effect in a pair of dependent variables.
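
As a concrete instance of the cause-effect problem, the classical additive-noise-model heuristic (a standard baseline, not necessarily one of the advances discussed in the talk) regresses each variable on the other and prefers the direction whose residuals look more independent of the putative cause:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.feature_selection import mutual_info_regression

    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, 2000)
    y = x**3 + rng.normal(0, 1, x.size)              # ground truth: X causes Y

    def residual_dependence(cause, effect):
        """Fit effect = f(cause) + residual; return estimated MI(cause, residual)."""
        f = RandomForestRegressor(n_estimators=100, random_state=0)
        f.fit(cause.reshape(-1, 1), effect)
        resid = effect - f.predict(cause.reshape(-1, 1))
        return mutual_info_regression(cause.reshape(-1, 1), resid, random_state=0)[0]

    print("X -> Y:", residual_dependence(x, y))      # smaller score: residuals closer to independent
    print("Y -> X:", residual_dependence(y, x))

A careful implementation would evaluate residuals on held-out data and use a dedicated independence test (e.g. HSIC) rather than this crude mutual-information proxy.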

Additional file


Figueiredo slides 2023.pdf

Europe/Lisbon
Online

Artemy Kolchinsky
Artemy Kolchinsky, Universal Biology Institute, University of Tokyo

Information geometry for nonequilibrium processes

Recently, there has been dramatic progress in nonequilibrium thermodynamics, with diverse applications in biological and chemical systems. The central quantity of interest in the field is “entropy production” (EP), which reflects the increase of the entropy of a system and its environment. Major questions of interest include (1) quantitative tradeoffs between EP and performance measures like speed and precision, (2) inference of EP from data, and (3) decomposition of EP into contributions from different sources of dissipation. In this work, we study the thermodynamics of nonequilibrium processes by considering the information geometry of fluxes. Our approach can be seen as a dynamical generalization of existing work on the information geometry of probability distributions considered at a given instant in time. It is applicable to a broad range of nonequilibrium processes, including nonlinear ones that exhibit oscillations and/or chaos, and it has implications for thermodynamic tradeoffs, thermodynamic inference, and decompositions of EP. As one application, we derive a universal decomposition of EP into “excess” and “housekeeping” contributions, representing contributions from nonstationarity and cyclic fluxes respectively.

Joint work with Andreas Dechant, Kohei Yoshimura, and Sosuke Ito. arXiv:2206.14599
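
For orientation, in the special case of a Markov jump process with rate $w_{ij}$ from state $j$ to state $i$ and occupation probabilities $p_i$, the entropy production rate referred to above takes the standard form (the flux-based information geometry of the talk applies in a more general setting):

    \[
      \dot{\Sigma} \;=\; \sum_{i<j} \left( w_{ij}\,p_j - w_{ji}\,p_i \right)
      \ln \frac{w_{ij}\,p_j}{w_{ji}\,p_i} \;\ge\; 0,
      \qquad
      \dot{\Sigma} \;=\; \dot{\Sigma}_{\mathrm{ex}} + \dot{\Sigma}_{\mathrm{hk}},
    \]

where the second identity indicates the excess/housekeeping decomposition discussed in the abstract, with the precise definitions of the two terms as given in the paper.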

Europe/Lisbon
Online

Olga Mula
Olga Mula, TU Eindhoven

Optimal State and Parameter Estimation Algorithms and Applications to Biomedical Problems

In this talk, I will present an overview of recent works aiming at solving inverse problems (state and parameter estimation) by optimally combining measurement observations and parametrized PDE models. After defining a notion of optimal performance in terms of the smallest possible reconstruction error that any reconstruction algorithm can achieve, I will present practical numerical algorithms based on nonlinear reduced models for which we can prove that they deliver a performance close to optimal. The proposed concepts may be viewed as exploring alternatives to Bayesian inversion in favor of more deterministic notions of accuracy quantification. I will illustrate the performance of the approach on simple benchmark examples, and we will also discuss applications of the methodology to biomedical problems, which are challenging due to shape variability.

https://arxiv.org/pdf/2203.07769.pdf
https://arxiv.org/pdf/2009.02687.pdf
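
A toy linear version of the state-estimation problem (illustrative only; the talk concerns nonlinear reduced models and provably near-optimal algorithms) reconstructs a state from a few noisy point measurements by least squares in a low-dimensional reduced basis:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 200)
    u_true = np.sin(2 * np.pi * x) + 0.3 * x             # unknown "state" on a fine grid
    obs_idx = rng.choice(x.size, size=12, replace=False)
    y = u_true[obs_idx] + 0.01 * rng.standard_normal(obs_idx.size)   # noisy point measurements

    V = np.vander(x, N=6, increasing=True)               # columns of a low-dimensional reduced basis
    coeffs, *_ = np.linalg.lstsq(V[obs_idx], y, rcond=None)
    u_hat = V @ coeffs                                    # reconstructed state
    print("relative error:", np.linalg.norm(u_hat - u_true) / np.linalg.norm(u_true))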