Information, Geometry, and Physics Seminar
In light of recent advances in generative modeling, we can view learning as the process of assigning probabilities to plausible outcomes in a high-dimensional event space. The miracle of modern AI is that this task remains viable even when the data are sparse relative to the total volume of possibilities. One wonders why it works, and how far the magic can be stretched: what is the smallest number of samples needed to learn a distribution, and is it possible to continue learning after the training data has been exhausted? I tackle these questions by borrowing ideas from information theory, statistical mechanics, thermodynamics, and optimal transport. I will present experiments with diffusion models that reinforce our theoretical insights.