Special CMX Seminar

How Many Labels Do You Need For Semi-Supervised Learning?

Matthew Thorpe, Postdoctoral Associate, Department of Mathematical Sciences, Carnegie Mellon University

Speaker's Bio:
Matthew is a research fellow in the Cantab Capital Institute for the Mathematics of Information. His research interests are in discrete-to-continuum limits in graphical problems, particularly variational problems that arise from applications in machine learning, and in optimal transport distances and their applications to signal and image analysis. Before joining Cambridge, Matthew was a postdoctoral associate at Carnegie Mellon University, working with Dejan Slepčev on optimal transport problems (NSF grant number 1421502). Prior to that, Matthew was a PhD student in the maths and statistics doctoral training centre (MASDOC) at Warwick, under the supervision of Florian Theil and Adam Johansen. The focus of the PhD was on applying variational methods to statistical inference problems.

Given a data set of which a small subset is labelled, the goal of semi-supervised learning is to find the unknown labels. A popular method is to minimise a discrete p-Dirichlet energy defined on a graph constructed from the data. As the size of the data set increases, one hopes that solutions of the discrete problem converge to solutions of a continuum variational problem with the continuum p-Dirichlet energy. It follows from Sobolev regularity that one cannot impose pointwise constraints if p is less than the dimension of the data; hence, in this regime, one must also increase the number of labels in order to avoid labels "disappearing" in the limit. In this talk I will address the question of the minimal number of labels needed. To compare labelling functions on different domains we use a metric based on optimal transport, which then allows for the application of methods from the calculus of variations, in particular Gamma-convergence, and methods from PDEs, such as constructing barrier functions in order to apply the maximum principle. We can further show rates of convergence. This is joint work with Jeff Calder (Minnesota) and Dejan Slepčev (CMU).
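
As a rough illustration of the discrete problem described in the abstract, the sketch below treats only the special case p = 2, where minimising the graph Dirichlet energy subject to the labels reduces to a linear system (often called Laplace learning). It is not the speaker's implementation: the function name `laplace_learning`, the epsilon-ball edge weights, and the toy data are all assumptions made for the example, and the solve only succeeds if every connected component of the graph contains at least one labelled point.

```python
import numpy as np

def laplace_learning(points, labelled_idx, labels, eps):
    """Minimise the graph 2-Dirichlet energy 0.5 * sum_ij W_ij (u_i - u_j)^2
    subject to u agreeing with the given labels (hypothetical helper)."""
    n = len(points)
    # Epsilon-ball graph: connect points at distance <= eps (an assumed kernel).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    W = (dist <= eps).astype(float)
    np.fill_diagonal(W, 0.0)
    # Unnormalised graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    unlabelled_idx = np.setdiff1d(np.arange(n), labelled_idx)
    u = np.zeros(n)
    u[labelled_idx] = labels
    # First-order condition on unlabelled nodes: L_uu u_u = -L_ul u_l.
    L_uu = L[np.ix_(unlabelled_idx, unlabelled_idx)]
    L_ul = L[np.ix_(unlabelled_idx, labelled_idx)]
    u[unlabelled_idx] = np.linalg.solve(L_uu, -L_ul @ u[labelled_idx])
    return u

# Toy data: 11 equally spaced points on [0, 1], with only the endpoints labelled.
points = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
u = laplace_learning(points, labelled_idx=[0, 10], labels=[0.0, 1.0], eps=0.15)
print(u)  # on this path graph the minimiser interpolates linearly between the labels
```

In one dimension with p = 2 the labels are respected, since p exceeds the dimension; the regime discussed in the talk, where p is less than the dimension, is the one in which a fixed number of labels can be lost as the data set grows.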

Event Sponsors:

Computing and Mathematical Sciences (CMS)

For more information, please contact Jolene Brink by phone at (626)395-2813 or by email at [email protected] or visit CMX Website.