The Shape of Data

In this post, we’ll warm up our geometry muscles by looking at one of the most basic data analysis techniques: linear regression. You’ve probably encountered it elsewhere, but I want to think about it from the point of view of geometry and, in particular, of the distributions that I introduced in my previous post. Recall that our goal is to infer a probability distribution from a set of data points. Linear regression follows a pattern that is common among modeling algorithms: we first choose a basic form that we want the distribution to have, then, among all distributions of that form, we pick the one that best fits the given data.
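To make that two-step pattern concrete, here is a minimal sketch in Python. It assumes the "basic form" is a straight line y = slope * x + intercept and that "best fits" means the smallest sum of squared vertical errors (ordinary least squares). Both are assumptions made for illustration, as are the function names `fit_line` and `squared_error` and the made-up data points; none of them come from the text above.

```python
def fit_line(points):
    """Least-squares fit of y = slope * x + intercept to (x, y) pairs.

    Assumption: "best fit" means minimal sum of squared vertical errors.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Spread of x around its mean, and co-variation of x with y.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def squared_error(points, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


# Hypothetical data, invented for this example: roughly y = 2x + 1 plus noise.
points = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]

# Step 1 was choosing the form (a line); step 2 picks the best member of it.
slope, intercept = fit_line(points)
print("slope:", slope, "intercept:", intercept)
print("squared error:", squared_error(points, slope, intercept))
```

Choosing the form first is what makes the problem tractable: once we commit to lines, the search is over just two numbers, and for squared error those two numbers have a closed-form answer.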
