Facebook's real-time analytics uses Puma and pTail, which "tail" logs/events into HBase. The same could be accomplished with Flume's HBase sink. Moreover, Flume's HDFS sink already has batching constructs, reliability guarantees, etc., and Facebook's talk on Puma/pTail mentions similar mechanisms [FB's talk: http://gigaom.com/cloud/how-facebook-is-powering-real-time-analytics/].
Also, Flume can emulate a downstream node for applications that log using Scribe: http://archive.cloudera.com/cdh/3/flume/Cookbook/index.html#_logging_scribe_events_to_a_flume_agent

Why did Facebook develop Puma/pTail instead of using an existing tool like Flume?
