Linear (and generalized linear) regression (LR) is an old, but still essential, statistical tool: its goal is to learn to predict a (response) variable from a linear combination of other (explanatory) variables. A central problem in LR is the selection of relevant variables, both because using fewer variables tends to yield better generalization and because identifying them may be meaningful in itself (e.g., which genes are relevant for predicting a certain disease). In the past quarter-century, variable selection (VS) based on sparsity-inducing regularizers has been a central paradigm, the most famous example being the LASSO, which has been intensively studied, extended, and applied.
In many contexts, it is natural to have highly correlated variables (e.g., several genes that are strongly co-regulated), which are thus simultaneously relevant as predictors. In this case, sparsity-based VS may fail: it may select an arbitrary subset of these variables, and the selection is unstable. Moreover, it is often desirable to identify all the relevant variables, not just an arbitrary subset thereof, a goal for which several approaches have been proposed. This talk will be devoted to a recent class of such approaches, called ordered weighted ℓ1 (OWL). The key feature of OWL is that it is provably able to explicitly identify (i.e., cluster) sufficiently correlated features, without having to compute these correlations. Several theoretical results characterizing OWL will be presented, including connections to the mathematics of economic inequality. Computational and optimization aspects will also be addressed, as well as recent applications in subspace clustering, learning Gaussian graphical models, and deep neural networks.
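
For concreteness, the OWL regularizer is usually defined as the weighted sum of the sorted magnitudes of the coefficient vector: the largest magnitude is paired with the largest weight, the second largest with the second largest weight, and so on, with the weights non-negative and non-increasing. The following is a minimal Python sketch of that definition only (not of the estimators or algorithms covered in the talk); the function name `owl_norm` and the example vectors are illustrative assumptions.

```python
def owl_norm(x, w):
    """Ordered weighted l1 (OWL) norm of x with weights w.

    Computes sum_i w[i] * |x|_(i), where |x|_(1) >= |x|_(2) >= ... are the
    magnitudes of x sorted in non-increasing order. The weights are assumed
    to satisfy w[0] >= w[1] >= ... >= w[-1] >= 0.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    if any(w[i] < w[i + 1] for i in range(len(w) - 1)):
        raise ValueError("weights must be non-increasing")
    if w and w[-1] < 0:
        raise ValueError("weights must be non-negative")

    # Sort magnitudes from largest to smallest, then pair with the weights.
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    return sum(wi * mi for wi, mi in zip(w, magnitudes))


# Hypothetical example values, chosen only to illustrate the definition.
x = [1.0, -3.0, 2.0]
print(owl_norm(x, [3.0, 2.0, 1.0]))  # strictly decreasing weights
print(owl_norm(x, [1.0, 1.0, 1.0]))  # equal weights: the usual l1 norm
print(owl_norm(x, [1.0, 0.0, 0.0]))  # only the first weight: the max norm
```

With equal weights the function reduces to the ℓ1 norm used by the LASSO, and with a single non-zero leading weight it reduces to the ℓ∞ norm, so this family interpolates between the two.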