Our friends at Twitter have contributed to MLlib, and this post uses material from Twitter’s description of its open-source contribution, with permission. The associated pull request is slated for release in Spark 1.2.

https://databricks.com/blog/2014/10/20/efficient-similarity-algorithm-now-in-spark-twitter.html

Original paper:

Twitter-Dimension-Independent-Similarity-Computation
