

André F. T. Martins, Instituto Superior Técnico
From Sparse Modeling to Sparse Communication

Neural networks and other machine learning models compute continuous representations, while humans communicate mostly through discrete symbols. Reconciling these two forms of communication is desirable for generating human-readable interpretations or learning discrete latent variable models, while maintaining end-to-end differentiability.

In the first part of the talk, I will describe how sparse modeling techniques can be extended and adapted to facilitate sparse communication in neural models. The building block is a family of sparse transformations called alpha-entmax, a drop-in replacement for softmax, which includes sparsemax as a particular case. Entmax transformations are differentiable and, unlike softmax, can return sparse probability distributions, which makes them useful for building interpretable attention mechanisms. Variants of these sparse transformations have been applied successfully to machine translation, natural language inference, visual question answering, and other tasks.
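As a concrete illustration (my sketch, not part of the talk materials), below is a minimal NumPy implementation of sparsemax, the alpha = 2 member of the entmax family. Sparsemax is the Euclidean projection of a score vector onto the probability simplex, so, unlike softmax, it can return exact zeros; a full PyTorch implementation is available in the entmax package from the DeepSPIN project.

    import numpy as np

    def sparsemax(z):
        """Euclidean projection of a score vector z onto the probability simplex.

        A drop-in replacement for softmax (alpha-entmax with alpha = 2;
        softmax is the alpha -> 1 case) that can return exact zeros.
        """
        z = np.asarray(z, dtype=float)
        z_sorted = np.sort(z)[::-1]             # scores in descending order
        cumsum = np.cumsum(z_sorted)
        k = np.arange(1, z.size + 1)
        support = 1 + k * z_sorted > cumsum     # entries that stay nonzero
        k_z = k[support][-1]                    # size of the support
        tau = (cumsum[k_z - 1] - 1) / k_z       # threshold subtracted from scores
        return np.maximum(z - tau, 0.0)

    print(sparsemax([1.0, 0.8, -1.0]))  # [0.6 0.4 0. ] -- softmax would be dense

In an attention mechanism, replacing softmax with sparsemax means attended positions receive nonzero weight and all other positions receive exactly zero, which is what makes the resulting attention maps interpretable.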

In the second part, I will introduce mixed random variables, which lie between the discrete and continuous worlds. We build rigorous theoretical foundations for these hybrids via a new “direct sum” base measure defined on the face lattice of the probability simplex. From this measure, we derive new entropy and Kullback-Leibler divergence functions that subsume the discrete and differential cases and have interpretations in terms of code optimality. Our framework suggests two strategies for representing and sampling mixed random variables: an extrinsic one (“sample-and-project”) and an intrinsic one (based on face stratification).
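As a rough sketch of the extrinsic strategy, the snippet below draws Gaussian score vectors and projects them onto the simplex with the sparsemax function defined above; the Gaussian scores and the choice of projection are my illustrative assumptions, not necessarily the exact construction in the talk. The law of the projected variable is mixed: samples landing on a face of the simplex have exact zeros (point masses on sparse outcomes), while samples in the relative interior contribute a continuous density.

    import numpy as np

    def sample_and_project(mu, sigma=1.0, n=5, seed=0):
        """Hypothetical 'sample-and-project' sampler: Gaussian scores -> simplex.

        Reuses sparsemax() from the sketch above as the projection step.
        """
        rng = np.random.default_rng(seed)
        z = rng.normal(loc=mu, scale=sigma, size=(n, len(mu)))
        return np.stack([sparsemax(row) for row in z])

    print(sample_and_project([0.5, 0.0, -0.5]))  # rows with exact zeros sit on faces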

In the third part, I will show how sparse transformations can also be used to design new loss functions that replace the cross-entropy loss. To this end, I will introduce the family of Fenchel-Young losses, revealing connections between generalized entropy regularizers and separation margins. I will illustrate with applications in natural language generation, morphology, and machine translation.
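To make this concrete, here is a hedged sketch of one member of the family: the Fenchel-Young loss induced by the Gini negentropy Omega(p) = (1/2)||p||^2, often called the sparsemax loss. It is defined as L(z; y) = Omega*(z) + Omega(y) - <y, z>, where Omega* is the convex conjugate; it is nonnegative, vanishes exactly when sparsemax(z) = y, and its gradient with respect to z is sparsemax(z) - y, mirroring the familiar softmax/cross-entropy pairing.

    import numpy as np

    def sparsemax_fy_loss(z, y):
        """Fenchel-Young loss for Omega(p) = 0.5 * ||p||^2 (the sparsemax loss).

        L(z; y) = Omega*(z) + Omega(y) - <y, z>; its gradient in z is
        sparsemax(z) - y.  Reuses sparsemax() from the first sketch.
        """
        z, y = np.asarray(z, dtype=float), np.asarray(y, dtype=float)
        p = sparsemax(z)                  # maximizer defining the conjugate
        conj = p @ z - 0.5 * p @ p        # Omega*(z) = <p, z> - Omega(p)
        return conj + 0.5 * y @ y - y @ z

    y = np.array([1.0, 0.0, 0.0])         # one-hot gold label
    print(sparsemax_fy_loss([1.0, 0.8, -1.0], y))  # 0.16

Unlike cross-entropy, this loss reaches exactly zero once the correct score exceeds the others by a sufficient margin (a margin of 1 for this choice of Omega), which is the separation-margin connection mentioned above.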

This work was funded by the DeepSPIN ERC project.

Additional file


Martins slides.pdf