CMI Seminar
Madeleine Udell is a PhD candidate at Stanford University in Computational & Mathematical Engineering, working with Professor Stephen Boyd. Her research focuses on modeling and solving large-scale optimization problems and on finding and exploiting structure in high dimensional data, with applications in marketing, demographic modeling, and medical informatics. Her recent work on generalized low rank models (GLRMs) extends principal components analysis (PCA) to embed tabular data sets with heterogeneous (numerical, Boolean, categorical, and ordinal) types into a low dimensional space, providing a coherent framework for compressing, denoising, and imputing missing entries. She has developed a number of open source libraries for modeling and solving optimization problems, including Convex.jl, one of the top ten tools in the new Julia language for technical computing, and is a member of the JuliaOpt organization, which curates high quality optimization software. She received a B.S. degree in Mathematics and Physics, summa cum laude, with honors in mathematics and in physics, from Yale University. At Stanford, she was awarded an NSF Graduate Fellowship, a Gabilan Graduate Fellowship, and a Gerald J. Lieberman Fellowship. She has taught courses in Discrete Mathematics and Algorithms, and in Convex Optimization, and led a team of TAs for an online course with over 10,000 students. Selected as the doctoral student member of Stanford's School of Engineering Future Committee, she is currently working to develop a roadmap for the future of engineering at Stanford over the next 10--20 years.
Principal components analysis (PCA) is a well-known technique for approximating a data set by a low rank matrix. Here, we extend the idea of PCA to handle arbitrary data sets consisting of numerical, Boolean, categorical, ordinal, and other data types. This framework encompasses many well-known techniques in data analysis, such as nonnegative matrix factorization, matrix completion, sparse and robust PCA, k-means, k-SVD, and maximum margin matrix factorization. The method handles heterogeneous data sets, and leads to coherent schemes for compressing, denoising, and imputing missing entries across all data types simultaneously. It also admits a number of interesting interpretations of the low rank factors, which allow clustering of examples or of features.
We propose several parallel algorithms for fitting generalized low rank models, and describe implementations and numerical results.
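To make the idea concrete, the sketch below fits the simplest GLRM, quadratic loss with no regularization, which recovers PCA, by alternating least squares over the two factors. This is an illustrative toy in Python/NumPy, not the speaker's implementation or the parallel algorithms described in the talk; the function name and parameters are hypothetical.

```python
import numpy as np

def fit_glrm_quadratic(A, k, n_iters=50, seed=0):
    """Toy alternating least squares for min ||A - X Y||_F^2,
    with X of shape (m, k) and Y of shape (k, n).

    This is the quadratic-loss special case of a GLRM (i.e., PCA);
    richer GLRMs swap in other losses per column and add regularizers.
    Function name and arguments are illustrative, not from any library.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    X = rng.standard_normal((m, k))  # random initialization of left factor
    Y = np.zeros((k, n))
    for _ in range(n_iters):
        # With X fixed, each column of Y solves a least squares problem.
        Y, *_ = np.linalg.lstsq(X, A, rcond=None)
        # With Y fixed, each row of X solves a least squares problem.
        Xt, *_ = np.linalg.lstsq(Y.T, A.T, rcond=None)
        X = Xt.T
    return X, Y

# Usage: recover a rank-2 matrix from its factorization.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 10))
X, Y = fit_glrm_quadratic(A, k=2)
rel_err = np.linalg.norm(A - X @ Y) / np.linalg.norm(A)
```

Each subproblem is a linear least squares solve, which is why the alternating scheme parallelizes naturally across rows of X and columns of Y.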