The Shape of Data

In this post, we’ll warm up our geometry muscles by looking at one of the most basic data analysis techniques: linear regression. You’ve probably encountered it elsewhere, but I want to think about it from the point of view of geometry and, in particular, of the distributions that I introduced in my previous post. Recall that our goal is to infer a probability distribution from a set of data points. Linear regression follows a very common pattern among modeling algorithms: we first choose a basic form that we want the distribution to have, and then, among all distributions of that form, we choose the one that best fits the given data.
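
To make that two-step pattern concrete, here is a minimal sketch in plain Python. It assumes the "basic form" is a straight line y = slope * x + intercept and that "best fits" means least squares (the smallest sum of squared vertical distances). Neither choice has been stated in the post so far, and the function name `fit_line` and the sample points are made up purely for illustration.

```python
def fit_line(points):
    """Least-squares fit of y = slope * x + intercept to (x, y) pairs.

    Assumption: "best fit" means minimizing the sum of squared
    vertical distances from the points to the line.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Spread of the x values, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
slope, intercept = fit_line(points)
print(slope, intercept)
```

Step one of the pattern is baked into the function: only lines are ever considered. Step two is the arithmetic, which picks the single line among all of them that sits closest to the data under the assumed least-squares criterion.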

