In many emerging applications, it is of paramount interest to learn hidden parameters from data. For example, self-driving cars may use onboard cameras to identify pedestrians, highway lanes, or traffic signs in various light and weather conditions. Problems such as these can be framed as classification, regression, or risk minimization in general, at the heart of which lies stochastic optimization and machine learning. In many practical scenarios, distributed and decentralized learning methods are preferable as they benefit from a divide-and-conquer approach towards data at the expense of local (short-range) communication. In this talk, I will present our recent work that develops a novel algorithmic framework to address various aspects of decentralized stochastic first-order optimization methods for non-convex problems. A major focus will be to characterize regimes where decentralized solutions outperform their centralized counterparts and lead to optimal convergence guarantees. Moreover, I will characterize certain desirable attributes of decentralized methods in the context of linear speedup and network independent convergence rates. Throughout the talk, I will demonstrate such key aspects of the proposed methods with the help of provable theoretical results and numerical experiments on real data.