Europe/Lisbon
Online

Pedro Domingos, University of Washington

Deep Networks Are Kernel Machines

Deep learning's successes are often attributed to its ability to automatically discover new representations of the data, rather than relying on handcrafted features as other learning methods do. In this talk, however, I will show that deep networks learned by the standard gradient descent algorithm are, mathematically, approximately equivalent to kernel machines, a learning method that simply memorizes the data and uses it directly for prediction via a similarity function (the kernel). This greatly enhances the interpretability of deep network weights by showing that they are effectively a superposition of the training examples. The network architecture incorporates knowledge of the target function into the kernel. The talk will discuss both the main ideas behind this result and some of its more startling consequences for deep learning, kernel machines, and machine learning at large.
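As a rough illustration of the claim, and not the talk's own derivation, here is a minimal sketch. It assumes a tiny one-hidden-layer tanh network, squared loss, and plain full-batch gradient descent with a small step size. The functions `forward`, `grad`, `apply_step`, and `train_and_track` are hypothetical names chosen for this example. To first order, each gradient step changes the output at a query point by a weighted sum over the training examples. Each term is the dot product between the network's weight gradients at the query point and at that example. Accumulating these terms along the training path expresses the prediction as per-example weights times a similarity (a kernel), plus the initial model's output. That sum should closely match the trained network's actual output.

```python
import math
import random

def forward(params, x):
    """Hidden activations and scalar output of a one-hidden-layer tanh network."""
    W, v = params
    h = [math.tanh(sum(wjk * xk for wjk, xk in zip(row, x))) for row in W]
    return h, sum(vj * hj for vj, hj in zip(v, h))

def grad(params, x):
    """Flattened gradient of f(x) w.r.t. all weights: [df/dv..., df/dW (row-major)...]."""
    W, v = params
    h, _ = forward(params, x)
    g_v = h[:]  # df/dv_j = h_j
    g_W = [v[j] * (1.0 - h[j] ** 2) * xk for j in range(len(W)) for xk in x]
    return g_v + g_W

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def apply_step(params, flat_step):
    """Subtract a flattened step vector (same layout as grad()) from the weights."""
    W, v = params
    n_v, d = len(v), len(W[0])
    new_v = [vj - s for vj, s in zip(v, flat_step[:n_v])]
    new_W = [[W[j][k] - flat_step[n_v + j * d + k] for k in range(d)]
             for j in range(len(W))]
    return (new_W, new_v)

def train_and_track(xs, ys, x_query, hidden=4, lr=1e-4, steps=300, seed=0):
    """Train by gradient descent and, alongside, build the kernel-style prediction."""
    rng = random.Random(seed)
    d = len(xs[0])
    params = ([[rng.gauss(0, 1) for _ in range(d)] for _ in range(hidden)],
              [rng.gauss(0, 1) for _ in range(hidden)])
    kernel_pred = forward(params, x_query)[1]  # constant term: the initial model's output
    for _ in range(steps):
        grads = [grad(params, x) for x in xs]
        # For squared loss 0.5 * (f - y)^2, dL/df is the residual f(x_i) - y_i.
        residuals = [forward(params, x)[1] - y for x, y in zip(xs, ys)]
        g_query = grad(params, x_query)
        # This step's first-order output change: sum_i (-lr * residual_i) * K_t(x_query, x_i),
        # where K_t is the gradient dot product at the current weights.
        kernel_pred += sum(-lr * r * dot(g_query, g) for r, g in zip(residuals, grads))
        total = [sum(r * g[p] for r, g in zip(residuals, grads))
                 for p in range(len(grads[0]))]
        params = apply_step(params, [lr * t for t in total])
    return forward(params, x_query)[1], kernel_pred

xs = [[0.5, -1.0], [1.0, 0.2], [-0.3, 0.8], [-1.0, -0.5]]
ys = [1.0, -0.5, 0.3, 0.0]
net_out, kernel_out = train_and_track(xs, ys, x_query=[0.2, 0.4])
print(net_out, kernel_out)  # should agree closely for a small step size
```

The agreement is only approximate because each step drops second-order terms, so it degrades as the step size grows. The similarity function also changes with the weights at every step, which is why it is summed along the training path rather than evaluated once at the end.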

Additional file

Domingos_P.pdf