– Europe/Lisbon
Online
Learning from distributed datasets: an introduction with two examples
Data are increasingly measured, in ever tinier minutiae, by networks of spatially distributed agents. Illustrative examples include a team of robots searching a large region, a collection of sensors overseeing a critical infrastructure, or a swarm of drones policing a wide area.
How to learn from these large, spatially distributed datasets? In the centralized approach, each agent forwards its dataset to a fusion center, which then carries out the learning from the pile of amassed datasets. This approach, however, prevents the number of agents from scaling up: as more and more agents ship data to the center, not only do the communication channels near the center quickly become congested, but the computational power of the center is also rapidly outpaced.
In this seminar, I describe the alternative approach of distributed learning. Here, no fusion center exists, and the agents themselves recreate the centralized computation by exchanging short messages (not data) with their network neighbors. To illustrate, I describe two learning algorithms: one solves convex learning problems via a token that randomly roams through the network, and the other solves a classification problem via random meetings between agents (i.e., gossip), each agent measuring only its own stream of features.
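For readers who prefer code, here is a minimal sketch of the two communication patterns mentioned above. It is not the pair of algorithms presented in the seminar: the ring topology, the function names, the quadratic local costs, and the 1/k step size are all assumptions chosen to keep the example tiny. In the first function a token carries a running estimate and takes one gradient step at each agent it visits; in the second, randomly chosen neighbors meet and average their values, a basic gossip primitive rather than a classifier. In both cases only a single number travels over the network, never a dataset.

```python
import random


def ring_neighbors(i, n):
    """Neighbors of agent i on a ring of n agents (assumed topology)."""
    return [(i - 1) % n, (i + 1) % n]


def token_random_walk(local_data, steps, seed=0):
    """A token carrying the estimate x hops randomly between neighbors.

    Assumption: agent i holds the private number local_data[i] and the
    local cost f_i(x) = 0.5 * (x - local_data[i]) ** 2, so the sum of
    all local costs is minimized at the mean of local_data.
    """
    rng = random.Random(seed)
    n = len(local_data)
    agent = 0   # the token starts at agent 0
    x = 0.0     # estimate carried by the token
    for k in range(1, steps + 1):
        grad = x - local_data[agent]   # gradient of the visited agent's cost
        x -= grad / k                  # diminishing step size 1/k
        agent = rng.choice(ring_neighbors(agent, n))  # hop to a random neighbor
    return x


def pairwise_gossip(values, rounds, seed=0):
    """Random meetings: a random agent and one of its neighbors average."""
    rng = random.Random(seed)
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        i = rng.randrange(n)
        j = rng.choice(ring_neighbors(i, n))
        x[i] = x[j] = 0.5 * (x[i] + x[j])   # both agents keep the average
    return x


if __name__ == "__main__":
    data = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]  # one number per agent
    print("centralized mean:", sum(data) / len(data))
    print("token estimate:  ", token_random_walk(data, steps=50000))
    print("gossip values:   ", pairwise_gossip(data, rounds=2000))
```

Under these assumptions both routines approach the answer a fusion center would compute (the mean of the agents' numbers): the token's estimate is a running average of the values it has visited, and each gossip meeting preserves the network-wide sum while shrinking the disagreement between agents.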
This seminar is aimed at non-specialists. Rather than trying to impart the latest developments of the field, I hope to open a welcoming door to those wishing to have a peek at this bubbling field of research, where optimization, control, probability, and machine learning mingle happily.
FCT project UIDB/04459/2020.