2020 seminars

Europe/Lisbon
Online

Dan Roberts
Dan Roberts, MIT, Center for Theoretical Physics

The Principles of Deep Learning Theory

Deep learning is an exciting approach to modern artificial intelligence based on artificial neural networks. The goal of this talk is to provide a blueprint — using tools from physics — for theoretically analyzing deep neural networks of practical relevance. This task will encompass both understanding the statistics of initialized deep networks and determining the training dynamics of such an ensemble when learning from data.

This talk is based on the book The Principles of Deep Learning Theory, co-authored with Sho Yaida and drawing on research done also in collaboration with Boris Hanin. It will be published next year by Cambridge University Press.

Additional file

Roberts slides.pdf

Europe/Lisbon
Online

Anders Hansen
Anders Hansen, Faculty of Mathematics and Department of Applied Mathematics and Theoretical Physics, University of Cambridge

Why things don’t work — On the extended Smale's 9th and 18th problems (the limits of AI) and methodological barriers

The alchemists wanted to create gold, Hilbert wanted an algorithm to solve Diophantine equations, researchers want to make deep learning robust in AI, MATLAB wants (but fails) to detect when it provides wrong solutions to linear programs, etc. Why does one not succeed in so many of these fundamental cases? The reason is typically methodological barriers. The history of science is full of methodological barriers — reasons why we never succeed in reaching certain goals. In many cases, this is due to the foundations of mathematics. We will present a new program on methodological barriers and the foundations of mathematics, focusing — in this talk — on two basic problems.

(1) The instability problem in deep learning: why do researchers fail to produce stable neural networks in basic classification and computer vision problems that can easily be handled by humans, when one can prove that stable and accurate neural networks exist? Moreover, AI algorithms typically cannot detect when they are wrong, which becomes a serious issue when striving to create trustworthy AI. The problem is more general: MATLAB's linprog routine, for example, is incapable of certifying correct solutions of basic linear programs. Thus, we will address the following question: (2) why are algorithms (in AI and in computations in general) incapable of determining when they are wrong? These questions are deeply connected to the extended Smale’s 9th and 18th problems on the list of mathematical problems for the 21st century.

Additional file

Hansen slides.pdf

Europe/Lisbon
Online

Joosep Pata
Joosep Pata, National Institute of Chemical Physics and Biophysics, Estonia

Machine learning for data reconstruction at the LHC

Physics analyses at the CERN experiments rely on detector hits being interpreted or reconstructed as particle candidates. The data reconstruction systems are built on decades of physics and detector knowledge and must operate reliably on petabytes of data in diverse computing centers spread around the world. In recent years, machine learning (ML) has been playing an increasingly important role at the LHC experiments in reconstructing and interpreting the data, from calibrating the detector readouts to the final interpretation of complex signal processes. We will discuss the various aspects of ML at the LHC experiments, focusing on data reconstruction and particle identification approaches using modern machine learning methods such as graph neural networks. We will present a concrete, detailed example: machine-learned particle flow (MLPF), an R&D effort to develop a fully optimizable particle-flow reconstruction across detector subsystems in CMS.

Additional file

Pata slides.pdf

Europe/Lisbon
Online

André F. T. Martins
André F. T. Martins, Instituto Superior Técnico

From Sparse Modeling to Sparse Communication

Neural networks and other machine learning models compute continuous representations, while humans communicate mostly through discrete symbols. Reconciling these two forms of communication is desirable for generating human-readable interpretations or learning discrete latent variable models, while maintaining end-to-end differentiability.

In the first part of the talk, I will describe how sparse modeling techniques can be extended and adapted to facilitate sparse communication in neural models. The building block is a family of sparse transformations called alpha-entmax, a drop-in replacement for softmax which contains sparsemax as a particular case. Entmax transformations are differentiable and, unlike softmax, can return sparse probability distributions, which is useful for building interpretable attention mechanisms. Variants of these sparse transformations have been applied with success to machine translation, natural language inference, visual question answering, and other tasks.
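As a rough illustration of how sparsemax differs from softmax (my own sketch, not code from the talk), the following snippet computes sparsemax as the Euclidean projection of the scores onto the probability simplex; note how it returns exact zeros where softmax does not.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())            # shift for numerical stability
    return e / e.sum()

def sparsemax(z):
    # Euclidean projection of z onto the probability simplex
    # (Martins & Astudillo, 2016); unlike softmax it can return exact zeros.
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum     # coordinates that remain active
    k_z = k[support][-1]                    # size of the support
    tau = (cumsum[support][-1] - 1) / k_z   # threshold
    return np.maximum(z - tau, 0.0)

scores = np.array([2.0, 1.0, -1.0, -2.0])
print(softmax(scores))     # every entry strictly positive
print(sparsemax(scores))   # trailing entries exactly zero, still sums to 1
```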

In the second part, I will introduce mixed random variables, which are in-between the discrete and continuous worlds. We build rigorous theoretical foundations for these hybrids, via a new “direct sum” base measure defined on the face lattice of the probability simplex. From this measure, we introduce new entropy and Kullback-Leibler divergence functions that subsume the discrete and differential cases and have interpretations in terms of code optimality. Our framework suggests two strategies for representing and sampling mixed random variables, an extrinsic (“sample-and-project”) and an intrinsic one (based on face stratification).

In the third part, I will show how sparse transformations can also be used to design new loss functions, replacing the cross-entropy loss. To this end, I will introduce the family of Fenchel-Young losses, revealing connections between generalized entropy regularizers and separation margin. I will illustrate with applications in natural language generation, morphology, and machine translation.

This work was funded by the DeepSPIN ERC project.

Additional file

Martins slides.pdf

Europe/Lisbon
Online

Jan Kieseler
Jan Kieseler, European Organization for Nuclear Research (CERN)

The MODE project

The effective design of instruments that rely on the interaction of radiation with matter for their operation is a complex task. Furthermore, the underlying physics processes are intrinsically stochastic in nature and open a vast space of possible choices for the physical characteristics of the instrument. While even large-scale detectors, such as those at the LHC, are designed using surrogates for the ultimate physics objective, the MODE Collaboration (Machine-learning Optimized Design of Experiments) aims to develop tools, based in part on deep learning techniques, to achieve end-to-end optimization of instrument design via a fully differentiable pipeline capable of exploring the Pareto-optimal frontier of the utility function for future particle collider experiments and related detectors. The construction of such a differentiable model requires the inclusion of information-extraction procedures, including data collection, detector response, pattern recognition, and other existing constraints such as cost. This talk will give an introduction to the goals of the newly founded MODE Collaboration and highlight some of the ingredients that already exist.

Additional file

Kieseler slides.pdf

Europe/Lisbon
Online

Fernando E. Rosas
Fernando E. Rosas, Faculty of Medicine, Department of Brain Sciences, Imperial College

Towards a deeper understanding of high-order interdependencies in complex systems

We live in an increasingly interconnected world and, unfortunately, our understanding of interdependency is still limited. As a matter of fact, while bivariate relationships are at the core of most of our data analysis methods, there is still no principled theory to account for the different types of interactions that can occur between three or more variables. This talk explores the vast and still largely uncharted territory of multivariate complexity, and discusses information-theoretic approaches that have been introduced to fill this important knowledge gap.

The first part of the talk is devoted to synergistic phenomena, which correspond to statistical regularities that affect the whole but not the parts. We explain how synergy can be effectively captured by information-theoretic measures inspired by the nature of higher brain functions, and how these measures allow us to map complex interdependencies into hypergraphs. The second part of the talk focuses on a new theory of what constitutes causal emergence, and how it can be measured from time series data. This theory enables a formal, quantitative account of downward causation, and introduces “causal decoupling” as a complementary modality of emergence. Importantly, this not only establishes conceptual tools to frame conjectures about emergence rigorously, but also provides practical procedures to test them on data. We illustrate these analysis tools on different case studies, including cellular automata, baroque music, flocking models, and neuroimaging datasets.

Additional file

Rosas slides.pdf

Europe/Lisbon
Online

Josef Urban
Josef Urban, Czech Institute of Informatics, Robotics and Cybernetics

Machine Learning and Theorem Proving

The talk will describe several ways in which machine learning is combined with theorem proving today over large corpora of formal proof. If time permits, I will also show some demos of the systems and mention related topics such as ML-guided conjecturing and autoformalization.

Additional file

Urban slides.pdf

Europe/Lisbon
Online

Dmitry Krotov
Dmitry Krotov, MIT-IBM Watson AI Lab and IBM Research in Cambridge

Modern Hopfield Networks in AI and Neurobiology

Modern Hopfield Networks or Dense Associative Memories are recurrent neural networks with fixed-point attractor states that are described by an energy function. In contrast to conventional Hopfield Networks, their modern versions have a very large memory storage capacity, which makes them appealing tools for many problems in machine learning, cognitive science, and neuroscience. In this talk I will introduce an intuition and a mathematical formulation of this class of models, and will give examples of problems in AI that can be tackled using these new ideas. I will also explain how different individual models of this class (e.g. hierarchical memories, the attention mechanism in transformers, etc.) arise from their general mathematical formulation with Lagrangian functions.
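As a hedged sketch of the retrieval dynamics behind this class of models (my own illustration, following the update rule popularized in reference 4 below): a state is repeatedly replaced by a softmax-weighted combination of the stored patterns, and for a large inverse temperature beta it converges towards the nearest memory.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def retrieve(X, query, beta=8.0, steps=5):
    # Modern-Hopfield / dense-associative-memory update:
    #   xi <- X @ softmax(beta * X.T @ xi)
    # X stores one pattern per column; large beta sharpens the attraction.
    xi = query.copy()
    for _ in range(steps):
        xi = X @ softmax(beta * X.T @ xi)
    return xi

rng = np.random.default_rng(0)
patterns = rng.choice([-1.0, 1.0], size=(64, 10))   # 10 random binary memories
noisy = patterns[:, 3].copy()
noisy[:16] *= -1                                    # corrupt a quarter of the bits
recovered = retrieve(patterns, noisy)
print(np.array_equal(np.sign(recovered), patterns[:, 3]))  # typically True
```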

References:

  1. D. Krotov, J. Hopfield, "Dense associative memory for pattern recognition"
  2. D. Krotov, J. Hopfield, "Large Associative Memory Problem in Neurobiology and Machine Learning"
  3. M. Demircigil et al., "On a model of associative memory with huge storage capacity"
  4. H. Ramsauer et al., "Hopfield Networks is All You Need"
  5. D. Krotov, "Hierarchical Associative Memory"

Additional file

Krotov slides.pdf

Europe/Lisbon
Online

Rianne van den Berg
Rianne van den Berg, Microsoft Research Amsterdam

Generative models for discrete random variables

In this talk I will discuss how different classes of generative models can be adapted to handle discrete random variables, and how this can be used to connect generative models to downstream tasks such as lossless compression. I will start by discussing normalizing flow models, and the challenges that arise when converting these models that are typically designed for real-valued random variables to discrete random variables. Next, I will demonstrate how denoising diffusion models with discrete state spaces have a rich design space in terms of the noising process, and how this influences the performance of the learned denoising model. Finally, I will show how denoising diffusion models can be connected to autoregressive models, and introduce an autoregressive model with a random generation order.
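To make the discrete noising process concrete, here is a small illustrative sketch (mine, assuming the simplest uniform transition kernel): at each forward step every token is kept with probability 1 - beta_t and otherwise resampled uniformly from the vocabulary.

```python
import numpy as np

def uniform_noise_step(tokens, beta, vocab_size, rng):
    # One forward step of a discrete diffusion with a uniform transition kernel:
    # with probability beta a token is replaced by a uniformly random one.
    resample = rng.random(tokens.shape) < beta
    random_tokens = rng.integers(0, vocab_size, size=tokens.shape)
    return np.where(resample, random_tokens, tokens)

rng = np.random.default_rng(0)
vocab_size = 27
x0 = rng.integers(0, vocab_size, size=20)     # a toy sequence of 20 tokens
betas = np.linspace(0.02, 0.3, 15)            # an assumed noise schedule, for illustration
xt = x0.copy()
for beta in betas:
    xt = uniform_noise_step(xt, beta, vocab_size, rng)
print((x0 == xt).mean())   # fraction of tokens that survived the noising
```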

Additional file

van den Berg slides.pdf

Europe/Lisbon
Online

Emtiyaz Khan
Emtiyaz Khan, RIKEN-AIP, Tokyo and OIST, Okinawa, Japan

The Bayesian Learning Rule for Adaptive AI

Humans and animals have a natural ability to autonomously learn and quickly adapt to their surroundings. How can we design AI systems that do the same? In this talk, I will present Bayesian principles to bridge such gaps between humans and AI. I will show that a wide variety of machine-learning algorithms are instances of a single learning rule called the Bayesian learning rule. The rule reveals a dual perspective yielding new adaptive mechanisms for machine-learning-based AI systems. My hope is to convince the audience that Bayesian principles are indispensable for an AI that learns as efficiently as we do.
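As a hedged sketch of the rule's general form (my paraphrase of Khan and Rue's formulation): choosing an exponential-family approximation $q_\lambda(\theta)$ with natural parameter $\lambda$ and expectation parameter $\mu$, the Bayesian learning rule performs natural-gradient descent on the entropy-regularized expected loss,

$$\lambda \leftarrow \lambda - \rho\, \nabla_{\mu}\Big( \mathbb{E}_{q_\lambda}\big[\bar{\ell}(\theta)\big] - \mathcal{H}(q_\lambda) \Big),$$

where $\bar{\ell}$ combines the loss with the prior and $\mathcal{H}$ is the entropy; different choices of $q_\lambda$, and of the approximations used to evaluate the expectation, are then argued to recover algorithms ranging from SGD and Adam-style optimizers to variational inference.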

Reference: M.E. Khan, H. Rue, The Bayesian Learning Rule [arXiv] [Tweet]

Additional file

Khan slides.pdf

Europe/Lisbon
Online

Andrea L. Bertozzi
Andrea L. Bertozzi, University of California Los Angeles

Graph based models in semi-supervised and unsupervised learning

Similarity graphs provide a structure for analyzing high-dimensional data. These undirected weighted graphs make it possible to identify inherent clusters in datasets, and many methods for sorting through such data build on the graph Laplacian matrix. One way to think about such problems is in terms of penalized cut problems. These can be expressed in terms of the graph total variation, which has a well-known analogue in Euclidean space. We show how to use ideas from geometric methods for PDEs to develop efficient and high-performing methods for semi-supervised and unsupervised learning. These methods also extend to active learning and to modularity optimization for community detection on networks.
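As an illustrative sketch of the graph machinery (mine, not the speaker's code): build a weighted similarity graph, form the graph Laplacian $L = D - W$, and propagate a few known labels to the unlabeled nodes by solving the harmonic (graph-Laplace) problem.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    # Dense similarity graph with Gaussian edge weights.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def harmonic_labels(W, labels, labeled_idx):
    # Harmonic solution of the graph Laplace equation with the labeled
    # nodes held fixed: f_U = -L_UU^{-1} L_UL f_L.
    n = W.shape[0]
    L = np.diag(W.sum(1)) - W
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    f = np.zeros(n)
    f[labeled_idx] = labels
    A = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    b = -L[np.ix_(unlabeled_idx, labeled_idx)] @ f[labeled_idx]
    f[unlabeled_idx] = np.linalg.solve(A, b)
    return f

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (30, 2)), rng.normal(2, 0.5, (30, 2))])
f = harmonic_labels(gaussian_similarity(X), labels=np.array([-1.0, 1.0]),
                    labeled_idx=np.array([0, 30]))
print((np.sign(f[:30]) <= 0).mean(), (np.sign(f[30:]) >= 0).mean())  # both close to 1
```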

Additional file

Bertozzi slides.pdf

Europe/Lisbon
Online

Stanley Osher
Stanley Osher, Department of Mathematics, University of California, Los Angeles

Conservation laws and generalized optimal transport

In this talk, we connect Lax’s entropy-entropy flux pairs in conservation laws with optimal-transport-type metric spaces. Following this connection, we further design variational discretizations for conservation laws and for the mean field control of conservation laws. In particular, we design unconditionally stable time discretization methods that are easy to implement.
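For orientation, a standard textbook formulation of the object mentioned above (not the talk's specific construction): for a scalar conservation law $u_t + f(u)_x = 0$, a convex entropy $\eta$ and its flux $\psi$, with $\psi' = \eta' f'$, form a Lax entropy pair, and admissible (entropy) solutions satisfy

$$\partial_t\, \eta(u) + \partial_x\, \psi(u) \le 0$$

in the sense of distributions. It is this dissipation structure that the talk relates to optimal-transport-type metric spaces.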

Based on joint work with Siting Liu (UCLA) and Wuchen Li (University of South Carolina).

Additional file

Osher slides.pdf

Europe/Lisbon
Online

Yongji Wang
Yongji Wang, Department of Geosciences, Princeton University

Physics-informed neural networks for solving 3-D Euler equation

One of the most challenging open questions in mathematical fluid dynamics is whether an inviscid incompressible fluid, described by the 3-dimensional Euler equations, with initially smooth velocity and finite energy can develop singularities in finite time. This long-standing open problem is closely related to one of the seven Millennium Prize Problems, which poses the same question for the viscous analogue of the Euler equations (the Navier-Stokes equations). In this talk, I will describe how we leverage the power of deep learning, using deep neural networks with equation constraints, namely physics-informed neural networks (PINNs), to find a smooth self-similar blow-up solution for the 3-dimensional Euler equations in the presence of a cylindrical boundary. To the best of our knowledge, the solution represents the first example of a truly 2-D or higher-dimensional backwards self-similar solution. This new numerical framework based on PINNs is shown to be robust and readily adaptable to other fluid equations, shedding new light on this century-old mystery of capital importance in mathematical fluid dynamics.
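As a minimal illustration of the PINN idea (far simpler than the 3-D Euler setup, and entirely my own toy example): a small network $u_\theta(x)$ is trained so that the PDE residual, evaluated by automatic differentiation at random collocation points, and the boundary conditions are both driven to zero.

```python
import torch

# Toy PINN: solve u''(x) = -pi^2 sin(pi x) on (0, 1) with u(0) = u(1) = 0.
# The exact solution is u(x) = sin(pi x).
torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(5000):
    x = torch.rand(128, 1, requires_grad=True)        # collocation points
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = d2u + torch.pi ** 2 * torch.sin(torch.pi * x)
    boundary = net(torch.tensor([[0.0], [1.0]]))
    loss = (residual ** 2).mean() + (boundary ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

x_test = torch.linspace(0, 1, 5).unsqueeze(1)
print(torch.cat([net(x_test), torch.sin(torch.pi * x_test)], dim=1))  # should roughly agree
```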

Based on the paper

Yongji Wang, Ching-Yao Lai, Javier Gomez-Serrano, Tristan Buckmaster, Asymptotic self-similar blow up profile for 3-D Euler via physics-informed neural networks

Additional file

Wang slides.pdf

Europe/Lisbon
Online

Anja Butter
Anja Butter, ITP, University of Heidelberg

Machine Learning and LHC Event Generation

First-principle simulations are at the heart of the high-energy physics research program. They link the vast data output of multi-purpose detectors with fundamental theory predictions and interpretation. In the coming LHC runs, these simulations will face unprecedented precision requirements to match the experimental accuracy. New ideas and tools based on neural networks have been developed at the interface of particle physics and machine learning. They can improve the speed and precision of forward simulations and handle the complexity of collision data. Such networks can be employed within established simulation tools or as part of a new framework. Since neural networks can be inverted, they open new avenues in LHC analyses.

Additional file

Butter slides.pdf

Europe/Lisbon
Online

John Baez
John Baez, U.C. Riverside

Shannon Entropy from Category Theory

Shannon entropy is a powerful concept. But what properties single out Shannon entropy as special? Instead of focusing on the entropy of a probability measure on a finite set, it can help to focus on the "information loss", or change in entropy, associated with a measure-preserving function. Shannon entropy then gives the only concept of information loss that is functorial, convex-linear and continuous.
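A small worked example of the information-loss viewpoint (my own illustration): pushing a distribution forward along a map that merges outcomes loses $H(p) - H(q)$ bits, and the loss adds up under composition, which is the functoriality property singled out in the talk.

```python
import numpy as np

def H(p):
    # Shannon entropy in bits.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def pushforward(p, f, m):
    # Push the distribution p on {0,...,n-1} forward along f into {0,...,m-1}.
    q = np.zeros(m)
    for i, pi in enumerate(p):
        q[f[i]] += pi
    return q

p = np.array([0.5, 0.25, 0.125, 0.125])   # distribution on a 4-element set
f = [0, 0, 1, 2]                          # merge the first two outcomes
g = [0, 0, 1]                             # merge again
q = pushforward(p, f, 3)
r = pushforward(q, g, 2)

loss_f, loss_g, loss_gf = H(p) - H(q), H(q) - H(r), H(p) - H(r)
print(loss_f, loss_g, loss_gf)
print(np.isclose(loss_f + loss_g, loss_gf))   # information loss is additive: True
```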

This is joint work with Tom Leinster and Tobias Fritz.

Additional file

Baez slides.pdf

Europe/Lisbon
Online

Dario Izzo
Dario Izzo, European Space Agency

Geodesy of irregular small bodies via neural density fields: geodesyNets

The problem of determining the density distribution of celestial bodies from the induced gravitational pull is of great importance in astrophysics as well as in space engineering (think of situations where spacecraft need to perform orbital and surface proximity operations). Knowledge of a body's density distribution also provides great insight into its origin and composition. In practice, the state-of-the-art approaches for modelling the gravity field of extended bodies are spherical harmonics models, mascon models and polyhedral gravity models. All of these, however, while widely studied and developed since the early works of Laplace, introduce requirements such as knowledge of a shape model, the assumption of a homogeneous internal density, or validity only outside the Brillouin sphere.

In this talk, we introduce and explain Neural Density Fields, a new approach to representing the density of extended bodies and learning its accurate form by inverting data from gravitational accelerations, orbits or the gravity potential. The resulting deep learning model, called geodesyNets, is able to compete with classical approaches while overcoming most of their limitations. We also introduce eclipseNets, a deep learning model based on related ideas that learns the eclipse shadow cones of irregular bodies, thus allowing highly precise propagation and stability studies.
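In rough terms (my formulation, not the paper's exact loss), the idea is to train a network $\rho_\theta(\mathbf{y})$ so that the acceleration it induces,

$$\mathbf{a}_\theta(\mathbf{x}) \;=\; -G \int_{B} \rho_\theta(\mathbf{y})\, \frac{\mathbf{x}-\mathbf{y}}{|\mathbf{x}-\mathbf{y}|^3}\, d\mathbf{y},$$

evaluated by numerical quadrature over the body $B$, matches the measured accelerations (or orbits, or potential values) in a least-squares sense; the learned $\rho_\theta$ then serves as a continuous, differentiable model of the body's interior.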

Additional file

Izzo slides.pdf

Europe/Lisbon
Online

Audrey Durand
Audrey Durand, IID, Université Laval, Canada

Interactive learning for Neurosciences - Between Simulation and Reality

Learning a behaviour to conduct a given task can be achieved by interacting with the environment. This is the crux of reinforcement learning (RL), where an (automated) agent learns to solve a problem through an iterative trial-and-error process. More specifically, an RL agent can interact with the environment and learn from these interactions by observing feedback on the goal task. These methods therefore typically require the ability to intervene on the environment and to make (possibly a very large number of) mistakes. Although this can be a limiting factor in some applications, simple RL settings, such as bandit settings, can still host a variety of problems for interactively learning behaviours. In other situations, simulation might be the key.
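As a toy illustration of the bandit setting mentioned above (not the imaging application itself, and entirely my own sketch), here is a minimal UCB1 agent choosing among a few options with unknown reward probabilities.

```python
import numpy as np

def ucb1(reward_probs, horizon=2000, seed=0):
    # Minimal UCB1 bandit: always pull the arm with the highest optimistic estimate.
    rng = np.random.default_rng(seed)
    k = len(reward_probs)
    counts = np.zeros(k)
    sums = np.zeros(k)
    for t in range(1, horizon + 1):
        if t <= k:                                   # pull each arm once to initialize
            arm = t - 1
        else:
            bonus = np.sqrt(2 * np.log(t) / counts)  # exploration bonus
            arm = int(np.argmax(sums / counts + bonus))
        reward = float(rng.random() < reward_probs[arm])
        counts[arm] += 1
        sums[arm] += reward
    return counts

print(ucb1([0.2, 0.5, 0.65]))   # pulls concentrate on the best arm (index 2)
```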

In this talk, we will show that RL can be used to formulate and tackle data acquisition (imaging) problems in neurosciences. We will see how bandit methods can be used to optimize super-resolution imaging by learning on real devices through an actual empirical process. We will also see how simulation can be leveraged to learn richer sequential decision-making strategies. These applications highlight the potential of RL to support expert users on difficult tasks and enable new discoveries.

Additional file

Durand slides.pdf

Europe/Lisbon
Online

Joseph Bakarji
Joseph Bakarji, University of Washington

Dimensionally Consistent Learning with Buckingham Pi

Dimensional analysis is a robust technique for extracting insights and finding symmetries in physical systems, especially when the governing equations are not known. The Buckingham Pi theorem provides a procedure for finding a set of dimensionless groups from given measurements, although this set is not unique. We propose an automated approach using the symmetric and self-similar structure of available measurement data to discover the dimensionless groups that best collapse this data to a lower dimensional space according to an optimal fit. We develop three data-driven techniques that use the Buckingham Pi theorem as a constraint: (i) a constrained optimization problem with a nonparametric function, (ii) a deep learning algorithm (BuckiNet) that projects the input parameter space to a lower dimension in the first layer, and (iii) a sparse identification of nonlinear dynamics (SINDy) to discover dimensionless equations whose coefficients parameterize the dynamics. I discuss the accuracy and robustness of these methods when applied to known nonlinear systems.
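As a small worked example of the Buckingham Pi constraint (mine, not the paper's code): dimensionless groups correspond to the null space of the dimension matrix. For a pendulum with period $T$, length $L$, gravity $g$ and mass $m$, the single group recovered below is $\Pi = gT^2/L$.

```python
import sympy as sp

# Columns: T, L, g, m. Rows: exponents of the base dimensions (mass, length, time).
# For example, g has dimensions length * time^-2, hence the column (0, 1, -2).
dim_matrix = sp.Matrix([
    [0, 0, 0, 1],    # mass
    [0, 1, 1, 0],    # length
    [1, 0, -2, 0],   # time
])

for vec in dim_matrix.nullspace():
    print(vec.T)   # -> [2, -1, 1, 0], i.e. Pi = T**2 * L**-1 * g = g*T**2/L
```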

Additional file

Bakarji slides.pdf

Europe/Lisbon
Online

Paulo Tabuada
Paulo Tabuada, University of California, Los Angeles

Deep neural networks, universal approximation, and geometric control

Deep neural networks have drastically changed the landscape of several engineering areas such as computer vision and natural language processing. Notwithstanding the widespread success of deep networks in these, and many other areas, it is still not well understood why deep neural networks work so well. In particular, the question of which functions can be learned by deep neural networks has remained unanswered.

In this talk we give an answer to this question for deep residual neural networks, a class of deep networks that can be interpreted as the time discretization of nonlinear control systems. We will show that the ability of these networks to memorize training data can be expressed through the control-theoretic notion of controllability, which can be established using geometric control techniques. We then add an additional ingredient, monotonicity, to conclude that deep residual networks can approximate, to arbitrary accuracy with respect to the uniform norm, any continuous function on a compact subset of $n$-dimensional Euclidean space by using at most $n+1$ neurons per layer. We will conclude the talk by showing how these results pave the way for the use of deep networks in the perception pipeline of autonomous systems while providing formal (and probability-free) guarantees of stability and robustness.
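To make the correspondence concrete (a standard observation, stated here in my own notation): a residual block updates its hidden state as

$$x_{k+1} \;=\; x_k + h\,\sigma\!\big(A_k x_k + b_k\big), \qquad k = 0,\dots,K-1,$$

which is precisely the forward-Euler discretization, with step $h$, of the controlled dynamical system $\dot{x}(t) = \sigma\big(A(t)\,x(t) + b(t)\big)$, where the weights $(A(t), b(t))$ play the role of the control; memorizing training data then becomes a controllability question for this system.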

Europe/Lisbon
Online

Inês Hipólito
Inês Hipólito, Humboldt-Universität

The Free Energy Principle in the Edge of Chaos

Living beings do an extraordinary thing: by being alive, they resist the second law of thermodynamics. This law stipulates that open, living systems tend towards dissipation through the increase of entropy, or chaos. From minimal cognitive organisms like plants to more complex organisms equipped with nervous systems, all living systems adjust and adapt to their environments, thereby resisting the second law. Impressively, while all animals cognitively enact and survive their local environments, more complex systems do so also by actively constructing their local environments, thereby defying not only the second law but also (evolutionary) selective properties. Because all living beings defy the second law by adjusting and engaging with the environment, a prominent question is: how do living organisms persist while engaging in adaptive exchanges with their complex environments? In this talk I will offer an overview of how the Free Energy Principle (FEP) offers a principled solution to this problem. The FEP prescribes that living systems maintain themselves in non-equilibrium steady states by restricting themselves to a limited number of states; it has been widely applied to explain neurocognitive function and embodied action, to develop artificial intelligence, and to inspire models of psychopathology.

Europe/Lisbon
Online

Petar Veličković
Petar Veličković, DeepMind and University of Cambridge

Geometric Deep Learning: Grids, Graphs, Groups, Geodesics and Gauges

The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to be beyond reach — such as computer vision, playing Go, or protein folding — are in fact feasible with appropriate computational scale. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, whereby adapted, often hierarchical, features capture the appropriate notion of regularity for each task, and second, learning by local gradient-descent type methods, typically implemented as backpropagation.

While learning generic functions in high dimensions is a cursed estimation problem, most tasks of interest are not generic, and come with essential pre-defined regularities arising from the underlying low-dimensionality and structure of the physical world. This talk is concerned with exposing these regularities through unified geometric principles that can be applied throughout a wide spectrum of applications.

Such a 'geometric unification' endeavour in the spirit of Felix Klein's Erlangen Program serves a dual purpose: on one hand, it provides a common mathematical framework to study the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers. On the other hand, it gives a constructive procedure to incorporate prior physical knowledge into neural architectures and provides a principled way to build future architectures yet to be invented.

Europe/Lisbon
Amphitheatre Fa2, IST — Online

Diogo Gomes

From Calculus of Variations to Reinforcement Learning I

This course begins with a brief introduction to classical calculus of variations and its applications to classical problems such as geodesic trajectories and the brachistochrone problem. Then, we examine Hamilton-Jacobi equations, the role of convexity and the classical verification theorem. Next, we illustrate the lack of classical solutions and motivate the definition of viscosity solutions. The course ends with a brief description of the reinforcement learning problem and its connection with Hamilton-Jacobi equations.
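As a pointer to the central object of the course (a standard formulation, not specific to these lectures): for an infinite-horizon discounted control problem with dynamics $\dot{x} = f(x, u)$, running cost $\ell(x, u)$ and discount rate $\rho$, the value function $V$ formally satisfies the Hamilton-Jacobi-Bellman equation

$$\rho\, V(x) \;=\; \min_{u}\Big[\, \ell(x, u) + \nabla V(x)\cdot f(x, u) \,\Big],$$

whose discrete-time counterpart, the Bellman equation $V(x) = \min_u \big[\ell(x,u) + \gamma V(x')\big]$, underlies reinforcement learning; viscosity solutions give meaning to $V$ when it fails to be differentiable.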

Additional file

Gomes Diogo Optimal_Control_and_ML.pdf

Europe/Lisbon
Amphitheatre Fa2, IST — Online

Diogo Gomes

From Calculus of Variations to Reinforcement Learning II

This course begins with a brief introduction to classical calculus of variations and its applications to classical problems such as geodesic trajectories and the brachistochrone problem. Then, we examine Hamilton-Jacobi equations, the role of convexity and the classical verification theorem. Next, we illustrate the lack of classical solutions and motivate the definition of viscosity solutions. The course ends with a brief description of the reinforcement learning problem and its connection with Hamilton-Jacobi equations.

Europe/Lisbon
Amphitheatre Fa2, IST — Online

José Miguel Urbano

Semi-Supervised Learning and the $\infty$-Laplacian I

Motivated by a recent application in Semi-Supervised Learning (SSL), the minicourse is a brief introduction to the analysis of infinity-harmonic functions. We will discuss the Lipschitz extension problem, its solution via McShane-Whitney extensions and its several drawbacks, leading to the notion of AMLE (Absolutely Minimising Lipschitz Extension). We then explore the equivalence between being absolutely minimising Lipschitz, enjoying comparison with cones and solving the infinity-Laplace equation in the viscosity sense.
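For reference, the standard formulas behind these notions (not the lecturer's notes): given $g$ Lipschitz on $A \subset \mathbb{R}^n$ with constant $L$, the McShane-Whitney extensions

$$u^-(x) = \sup_{y \in A}\big(g(y) - L\,|x - y|\big), \qquad u^+(x) = \inf_{y \in A}\big(g(y) + L\,|x - y|\big)$$

are, respectively, the smallest and the largest Lipschitz extensions of $g$ with the same constant, while the AMLE is characterized as the viscosity solution of the infinity-Laplace equation

$$\Delta_\infty u \;=\; \sum_{i,j=1}^{n} u_{x_i}\, u_{x_j}\, u_{x_i x_j} \;=\; 0.$$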

Additional file

Urbano JM.pdf

Europe/Lisbon
Amphitheatre Fa2, IST — Online

José Miguel Urbano

Semi-Supervised Learning and the $\infty$-Laplacian II

Motivated by a recent application in Semi-Supervised Learning (SSL), the minicourse is a brief introduction to the analysis of infinity-harmonic functions. We will discuss the Lipschitz extension problem, its solution via McShane-Whitney extensions and its several drawbacks, leading to the notion of AMLE (Absolutely Minimising Lipschitz Extension). We then explore the equivalence between being absolutely minimising Lipschitz, enjoying comparison with cones and solving the infinity-Laplace equation in the viscosity sense.

Europe/Lisbon
Online

Robert Nowak
Robert Nowak, University of Wisconsin-Madison

The Neural Balance Theorem and its Consequences

Rectified Linear Units (ReLUs) are the most common activation function in deep neural networks. Weight decay is the most prevalent form of regularization used in deep learning. Together, ReLUs and weight decay lead to an interesting effect known as “Neural Balance”: the norms of the input and output weights of each ReLU are automatically equalized at global minima of the training objective. Neural Balance has a number of important consequences, ranging from characterizations of the function spaces naturally associated to neural networks and their immunity to the curse of dimensionality, to new and more effective architectures and training strategies.
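A one-line version of the balancing argument (my paraphrase of the standard homogeneity computation): since $\mathrm{ReLU}(cz) = c\,\mathrm{ReLU}(z)$ for $c > 0$, rescaling a unit's input weights $w \mapsto c\,w$ and output weights $v \mapsto v/c$ leaves the network function unchanged, while the weight-decay penalty for that unit becomes

$$c^2\|w\|^2 + \frac{1}{c^2}\|v\|^2 \;\ge\; 2\,\|w\|\,\|v\|,$$

with equality exactly when $c^2 = \|v\| / \|w\|$. At a global minimum no such rescaling can decrease the objective, so $\|w\| = \|v\|$ for every ReLU unit.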

This talk is based on joint work with Rahul Parhi and Liu Yang.

Europe/Lisbon
Online

Frederico Fiuza

Accelerating the understanding of nonlinear dynamical systems using machine learning

The description of nonlinear, multi-scale dynamics is a common challenge in a wide range of physical systems and research fields — from weather forecasting to controlled nuclear fusion. The development of reduced models that balance accuracy and complexity is critical to advancing theoretical comprehension and enabling holistic computational descriptions of these problems. I will discuss how techniques from statistics and machine learning are offering new ways of inferring reduced physics models from the increasingly abundant data on nonlinear dynamics produced by experiments, observations, and simulations. In particular, I will focus on how sparse regression techniques can be used to infer interpretable plasma physics models (in the form of nonlinear partial differential equations) directly from the data of first-principles fully-kinetic simulations. The potential of this approach is demonstrated by recovering the fundamental hierarchy of plasma physics models based solely on particle-based simulation data of complex plasma dynamics. I will discuss how this data-driven methodology provides a promising tool to accelerate the development of reduced theoretical models of nonlinear dynamical systems and to design computationally efficient algorithms for multi-scale simulations.
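To illustrate the sparse-regression step in isolation (a generic SINDy-style sketch of my own, not the plasma-physics pipeline): build a library of candidate terms and run sequentially thresholded least squares so that only a few terms survive.

```python
import numpy as np

# Synthetic data from a known ODE, x' = 2*x - x**2, sampled on a fine grid.
t = np.linspace(0, 4, 2000)
dt = t[1] - t[0]
x = np.empty_like(t)
x[0] = 0.1
for i in range(len(t) - 1):              # simple forward-Euler integration
    x[i + 1] = x[i] + dt * (2 * x[i] - x[i] ** 2)
dxdt = np.gradient(x, dt)                # numerical time derivative

# Library of candidate terms: [1, x, x^2, x^3].
Theta = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])

# Sequentially thresholded least squares: the core SINDy regression.
xi = np.linalg.lstsq(Theta, dxdt, rcond=None)[0]
for _ in range(10):
    small = np.abs(xi) < 0.1             # prune tiny coefficients
    xi[small] = 0.0
    keep = ~small
    xi[keep] = np.linalg.lstsq(Theta[:, keep], dxdt, rcond=None)[0]
print(xi)   # approximately [0, 2, -1, 0]: the sparse model x' = 2x - x^2
```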

Additional file

fiuza slides.pdf

Europe/Lisbon
Online

João Sacramento
João Sacramento, ETH Zürich

The least-control principle for learning at equilibrium

A large number of models of interest in both neuroscience and machine learning can be expressed as dynamical systems at equilibrium. This class of systems includes deep neural networks, equilibrium recurrent neural networks, and meta-learning. In this talk I will present a new principle for learning equilibria with a temporally and spatially local rule. Our principle casts learning as a least-control problem, where we first introduce an optimal controller to lead the system towards a solution state, and then define learning as reducing the amount of control needed to reach such a state. We show that incorporating learning signals within the dynamics as an optimal control enables transmitting activity-dependent credit assignment information, avoids storing intermediate states in memory, and does not rely on infinitesimal learning signals. In practice, our principle leads to strong performance, matching that of leading gradient-based learning methods when applied to an array of benchmarking experiments. Our results shed light on how the brain might learn and offer new ways of approaching a broad class of machine learning problems.

Europe/Lisbon
Online

Tom Goldstein
Tom Goldstein, University of Maryland

Building (and breaking) neural networks that think fast and slow

Most neural networks are built to solve simple pattern matching tasks, a process that is often known as “fast” thinking. In this talk, I’ll use adversarial methods to explore the robustness of neural networks. I’ll also discuss whether vulnerabilities of AI systems that have been observed in academic labs can pose real security threats to industrial systems. Then, I’ll present methods for constructing neural networks that exhibit “slow” thinking abilities akin to human logical reasoning. Rather than learning simple pattern matching rules, these networks have the ability to synthesize algorithmic reasoning processes and solve difficult discrete search and planning problems that cannot be solved by conventional AI systems. Interestingly, these reasoning systems naturally exhibit error correction and robustness properties that make them more difficult to break than their fast thinking counterparts.

Additional file

Goldstein slides.pdf

Europe/Lisbon
Online

Markus Reichstein
Markus Reichstein, MPI for Biogeochemistry

Integrating Machine Learning with System Modelling and Observations for a better understanding of the Earth System

The Earth is a complex, dynamic, networked system. Machine learning, i.e. the derivation of computational models from data, has already made important contributions to predicting and understanding components of the Earth system, specifically in climate, remote sensing and environmental sciences. For instance, classification of land cover types, prediction of land-atmosphere and ocean-atmosphere exchange, and detection of extreme events have greatly benefited from these approaches. Such data-driven information has already changed how Earth system models are evaluated and further developed. However, many studies have not yet sufficiently addressed and exploited the dynamic aspects of systems, such as memory effects for prediction and effects of spatial context, e.g. for classification and change detection. In particular, new developments in deep learning offer great potential to overcome these limitations. Yet, a key challenge and opportunity is to integrate (physical-biological) system modelling approaches with machine learning into hybrid modelling approaches, which combine physical consistency with machine-learning versatility. A couple of examples are given, with a focus on the terrestrial biosphere, where the combination of system-based and machine-learning-based modelling helps our understanding of aspects of the Earth system.

Europe/Lisbon
Online

Bruno Loureiro
Bruno Loureiro, École Polytechnique Fédérale de Lausanne (EPFL)

Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the crossover between these two regimes in the high-dimensional setting, and in particular the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.

Based on: https://arxiv.org/abs/2202.00293

Additional file

Loureiro slides.pdf