The Shape of Data

Now that we’ve gotten a taste of the curse of dimensionality, let’s look at another potential problem with the basic form of regression we discussed a few posts back. Notice that linear (least squares) regression always gives you an answer, whether or not that answer is any good. If you ask for the line that best fits your data, the regression algorithm will return a line regardless of the actual structure of the data. So we also need a way to measure how well the model found by regression approximates the data. There are a few different ways to do this, and in this post I’ll introduce one called Principal Component Analysis (PCA).
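Here is a minimal sketch in plain Python of the problem described above. Ordinary least squares returns a line even for points on a circle, where no line is a sensible model. As one rough measure of how line-like the data is, the sketch also computes the share of total variance along the first principal component of the 2-D data. This use of PCA is an illustrative assumption and not necessarily the post's own method. The function names and sample data are hypothetical.

```python
import math

# Hypothetical helpers for illustration; not code from the original post.

def least_squares_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx  # assumes the x values are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept


def explained_variance_ratio(points):
    """Share of total variance along the first principal component (2-D PCA)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix [[cxx, cxy], [cxy, cyy]].
    cxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    cyy = sum((y - mean_y) ** 2 for _, y in points) / n
    cxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form eigenvalues of a symmetric 2x2 matrix; the larger one is
    # the variance along the first principal component.
    half_trace = (cxx + cyy) / 2
    spread = math.sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
    largest = half_trace + spread
    return largest / (cxx + cyy)  # the eigenvalues sum to the trace


# Hypothetical sample data: 12 evenly spaced points on the unit circle,
# and points near the line y = 2x + 1 with small alternating noise.
circle = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
          for k in range(12)]
noisy_line = [(x, 2 * x + 1 + (0.1 if x % 2 else -0.1)) for x in range(10)]

for name, data in [("circle", circle), ("noisy line", noisy_line)]:
    slope, intercept = least_squares_line(data)
    ratio = explained_variance_ratio(data)
    print(f"{name}: y = {slope:.3f}x + {intercept:.3f}, "
          f"first-component share = {ratio:.3f}")
```

Least squares returns a line in both cases. For the circle, that line is essentially the horizontal line y = 0. The first-component share tells the two data sets apart. It is about 0.5 for the circle, where no direction dominates, and close to 1 for the noisy line.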
