The Shape of Data

So far on this blog, we’ve seen two very different approaches to constructing models that predict data distributions. With regression, we replaced the original data points with an equation defining a relatively simple shape that approximates the data, then used this to predict one dimension/feature of new points based on the others. With K Nearest Neighbors, we used the data points directly to define a fairly complicated distribution that divided the data space into two (or more) classes, so that we could predict the classes of new data points. Today, we’ll combine elements of both. We’re going to stick with classification, splitting the data space into two classes, but the goal will be to replace the original data with a simplified (linear) model.
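
To make the contrast concrete, here is a minimal sketch in plain Python. The toy points, the function names, and the use of perceptron updates to find the line are all assumptions made for illustration; this excerpt does not say which linear method the full post goes on to use. The K Nearest Neighbors classifier has to keep every data point around, while the linear classifier boils the data down to three numbers (two weights and a bias).

```python
from collections import Counter

# Toy 2-D data set, invented for illustration: two linearly separable classes.
points = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.5), (5.0, 5.0), (6.0, 4.5), (5.5, 6.0)]
labels = [-1, -1, -1, 1, 1, 1]


def knn_predict(query, k=3):
    """Classify by majority vote among the k stored points closest to query."""
    by_distance = sorted(
        zip(points, labels),
        key=lambda pl: (pl[0][0] - query[0]) ** 2 + (pl[0][1] - query[1]) ** 2,
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def fit_linear(max_epochs=1000):
    """Find weights w and bias b with sign(w . x + b) matching every label.

    Uses perceptron updates as a stand-in (an assumption, not necessarily
    the method from the original post).
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x, y), label in zip(points, labels):
            # A point is misclassified when it lies on the wrong side of
            # the line, or exactly on it.
            if label * (w[0] * x + w[1] * y + b) <= 0:
                w[0] += label * x
                w[1] += label * y
                b += label
                mistakes += 1
        if mistakes == 0:
            break  # every training point is on the correct side
    return w, b


def linear_predict(query, w, b):
    """Classify by which side of the line w . x + b = 0 the query falls on."""
    return 1 if w[0] * query[0] + w[1] * query[1] + b > 0 else -1


w, b = fit_linear()
```

Once `fit_linear` has run, `linear_predict` no longer needs `points` at all, whereas `knn_predict` consults the full data set on every call. That is the sense in which the linear model replaces the original data.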
