The Shape of Data

In the last post, I discussed how one can analyze a data set from the point of view of geometry, by thinking of each data point as a set of coordinates in a high-dimensional space. The thing is, when we’re analyzing these data sets, we’re usually less interested in the points that are in the data set than in the points that aren’t. For example, if we’re using past customer data to predict whether a new potential customer will buy a product, that new customer is not in our initial data set. So the goal of analyzing the data points from past customers is to understand the points that may correspond to new customers. The same goes for determining whether a tumor is benign, whether a certain blood pressure drug will work on a given patient, or any other application of data analysis.

