Memcache at Facebook + Hadoop

Sanjay Sharma's Weblog

Memcached as I have heard and acknowledge, is the de-facto leader in web layer cache.

Here are some interesting facts from Facebook memcached usage statistics (http://www.infoq.com/presentations/Facebook-Software-Stack)

  • Over 25 TB (whooping!!!) of in-memory cache
  • Average latency <200 micro seconds (vow!!)
  • cache serialized PHP data structures
  • Lots of multi-gets

Facebook memcached customizations

  • Over UDP
    • Reduced memory overhead of TCP con buffers
    • Application-level flow control, (optimization for multi-gets)
  • On demand aggregation of per-thread stats
    • Reduces global lock contention
  • Multiple kernel changes to optimize for Memcached usage
    • Distributing network interrupt handling over multiple cores
    • opportunistic polling of network interface

My Memcached usage experience with Hadoop

  • Problem definition- using memcached for key-value lookup in Map class. Each mapper method required look up of around 7-8 different types of key-value Maps. This meant that for each  row in input data (million+ rows), lookup was required 7 times more. The entire Map could not be used as…

View original post 240 more words

Advertisements