This talk was given at Midwest.io 2014.

The MapReduce framework is a proven method for processing large volumes of data, but even simple problems require expertise. Tackling the learning curve for Big Data and efficient processing is a daunting task for developers just getting started. The Apache Crunch project helps break complex processing problems down into simple concepts that can be used on industry-standard frameworks such as Hadoop and Spark. Apache Crunch is being used as an integral part of building processing pipelines for healthcare data, allowing for quick development of new solutions and architectures. The talk will also cover how the core concepts of Apache Crunch enable first-class integration, rapid scaling of development across teams, and development of extensible processing infrastructure.

About the Speaker

Micah is a committer and PMC member on the Apache Crunch project.
