Laszlo Bock, Senior Vice President of People Operations at Google, kicked off Google’s first-ever re:Work event by talking about the potential to make work better for people everywhere. This is a huge opportunity because we spend more time at work than on anything else in our lifetimes. What if we could improve the experience of work, even just a little, for everyone?
Bock offered a glimpse into Google’s own approach. People often think working at Google is about bean bags, free food, and lava lamps, but it’s really about having a mission that matters – doing work that is meaningful and connected to something bigger. Google starts with the belief that people are good and will do the right thing. From there, it’s easy to give employees freedom, access to information, and autonomy, which allows them to go out and create amazing things.
Just as Google’s product development is driven by user feedback and data, Google’s HR policies are informed by feedback and data. Google uses science, collects data, and runs experiments internally to determine how best to make employees as happy, healthy, and productive as possible. From dissecting the attributes that make a great manager to nudging employees to take advantage of the company’s 401(k) matching, Google uses robust, academic-quality analytics to drive HR decisions.
Prasad Setty, Vice President of People Analytics & Compensation at Google, moderated a panel on using data to make better people decisions. Google uses data and analytics to inform all its people decisions, from choosing benefits options to promotions to hiring. Setty explained that Google’s methodical approach starts with looking at the existing research, developing a new hypothesis, and then testing it internally.
Google has even started its own longitudinal study on voluntary Googler participants, which will track their careers over several decades. Setty said they’ll look at a range of data points to track work performance, attitudes, beliefs, problem solving strategies, challenges, and resiliency. And while the company isn’t sure what it will find, it knows collecting this data is the first step to discovering new things about how we work.
User logs of Hadoop jobs serve multiple purposes. First and foremost, they can be used to debug issues while running a MapReduce application: correctness problems in the application itself, race conditions that only appear when running on a cluster, and task or job failures caused by hardware or platform bugs.
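As a rough illustration of the application-side breadcrumbs that end up in those logs, here is a minimal sketch of a mapper that logs suspect records and bumps a counter. The class name, counter group, and messages are hypothetical; it assumes the org.apache.hadoop.mapreduce API with log4j available on the task classpath.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.Logger;

// Hypothetical mapper: anything logged here lands in the per-task user logs,
// and the counter shows up in the job history, which helps when chasing
// correctness problems or sporadic task failures.
public class AuditMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

  private static final Logger LOG = Logger.getLogger(AuditMapper.class);
  private static final LongWritable ONE = new LongWritable(1);

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    if (line.isEmpty()) {
      // Visible later in the task logs and in the job's counter display.
      LOG.warn("Empty record at offset " + key.get());
      context.getCounter("Audit", "EMPTY_RECORDS").increment(1);
      return;
    }
    context.write(new Text(line), ONE);
  }
}
```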
To keep our code at Google in the best possible shape, we provided our software engineers with these constant reminders. Now, we are happy to share them with the world.
Many thanks to these folks for inspiration and hours of hard work getting this guide done:
Also thanks to Blaine R Southam, who has turned it into a PDF book.
```bash
#!/bin/bash
echo before comment
# The ':' no-op fed by a quoted heredoc acts as a multi-line block comment.
: <<'END'
bla bla
blurfl
END
echo after comment
```
This talk was given at Midwest.io 2014.
The MapReduce framework is a proven method for processing large volumes of data, but even simple problems require expertise. Tackling the learning curve of Big Data and efficient processing is a daunting task for developers just getting started. The Apache Crunch project helps break complex processing problems down into simple concepts that can be used on industry-standard frameworks such as Hadoop and Spark. Apache Crunch is being used as an integral part of building processing pipelines for healthcare data, allowing for quick development of new solutions and architectures. The talk will also cover how the core concepts of Apache Crunch enable first-class integration, rapid scaling of development across teams, and development of extensible processing infrastructure.
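To make those "simple concepts" concrete, here is a minimal sketch of a Crunch pipeline in the standard word-count shape, built from the project's core PCollection/DoFn primitives. The class name and the input/output paths are placeholders, and it assumes the MapReduce-based MRPipeline runner.

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.conf.Configuration;

public class WordCountPipeline {
  public static void main(String[] args) {
    // Placeholder paths; in practice these come from the job arguments.
    String inputPath = args[0];
    String outputPath = args[1];

    Pipeline pipeline = new MRPipeline(WordCountPipeline.class, new Configuration());

    // A PCollection is the core abstraction: a distributed, immutable collection.
    PCollection<String> lines = pipeline.readTextFile(inputPath);

    // parallelDo applies a DoFn to every element; Crunch plans the underlying
    // MapReduce stages needed to execute the composed operations.
    PCollection<String> words = lines.parallelDo(new DoFn<String, String>() {
      @Override
      public void process(String line, Emitter<String> emitter) {
        for (String word : line.split("\\s+")) {
          emitter.emit(word);
        }
      }
    }, Writables.strings());

    PTable<String, Long> counts = words.count();

    pipeline.writeTextFile(counts, outputPath);
    pipeline.done();
  }
}
```

The same logical pipeline can be pointed at Crunch's Spark-based runner instead of MRPipeline, which is what the abstract means by targeting either framework.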
About the Speaker
Micah is a committer and PMC member on the Apache Crunch project.
HBase can easily store terabytes of data, but how do you scale your search mechanism to sift through these mountains of bits and retrieve large result sets in a matter of milliseconds? We used a combination of Solr sharding, careful index creation, and result pruning to meet these strict requirements in our production environment. Come see how we handle millions of rapid-fire queries from dozens of parallel search clients against many terabytes of data while addressing high availability through load balancing and replication.
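The abstract does not include code, but the general shape of such a lookup is: query a sharded Solr index for matching row keys, then pull the full records from HBase. Here is a hedged sketch using SolrJ and the older HBase client API; the host names, table and collection names, and the "rowkey" field are illustrative assumptions, not the setup described in the talk.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SolrHBaseLookup {
  public static void main(String[] args) throws Exception {
    // Illustrative endpoints and field names, not a production topology.
    HttpSolrServer solr = new HttpSolrServer("http://solr-host:8983/solr/records");

    SolrQuery query = new SolrQuery("body:\"heart failure\"");
    // Fan the query out across explicit shards and return only the row keys,
    // which keeps the Solr result set small (the "result pruning" above).
    query.set("shards", "shard1-host:8983/solr/records,shard2-host:8983/solr/records");
    query.setFields("rowkey");
    query.setRows(100);

    QueryResponse response = solr.query(query);

    // Fetch the full records from HBase with a single batched multi-get.
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "records");
    List<Get> gets = new ArrayList<Get>();
    for (SolrDocument doc : response.getResults()) {
      gets.add(new Get(Bytes.toBytes((String) doc.getFieldValue("rowkey"))));
    }
    Result[] rows = table.get(gets);
    System.out.println("Fetched " + rows.length + " rows from HBase");
    table.close();
  }
}
```

Returning only row keys from Solr and batching the HBase gets is one way to keep response times low while the heavy payload stays in HBase.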
Memcache at Facebook + Hadoop
Memcached, as I have heard and readily acknowledge, is the de facto leader in web-layer caching.
Here are some interesting facts from Facebook's memcached usage statistics (http://www.infoq.com/presentations/Facebook-Software-Stack):
- Over 25 TB (whopping!) of in-memory cache
- Average latency < 200 microseconds (wow!)
- Caches serialized PHP data structures
- Lots of multi-gets
Facebook memcached customizations
- Over UDP
  - Reduced memory overhead of TCP connection buffers
  - Application-level flow control (an optimization for multi-gets)
- On-demand aggregation of per-thread stats
  - Reduces global lock contention
- Multiple kernel changes to optimize for memcached usage
  - Distributed network interrupt handling over multiple cores
  - Opportunistic polling of the network interface
My Memcached usage experience with Hadoop
- Problem definition: using memcached for key-value lookups in the Map class. Each mapper call required lookups against around 7-8 different types of key-value maps, which meant that for every row of input data (a million+ rows), seven or more lookups were issued. The entire Map could not be used as…
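For context on what such a mapper-side memcached lookup can look like, here is a minimal sketch using the spymemcached client: the connection is opened once in setup(), and the several per-record lookups are issued as a single multi-get. The class name, keys, and cache endpoint are hypothetical; this is not the original author's code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.Map;

import net.spy.memcached.MemcachedClient;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that enriches each input row from memcached.
public class LookupMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

  private MemcachedClient cache;

  @Override
  protected void setup(Context context) throws IOException {
    // One connection per task, not per record.
    cache = new MemcachedClient(new InetSocketAddress("memcached-host", 11211));
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String id = value.toString().split(",")[0];

    // Batch the per-row lookups into a single multi-get round trip.
    Map<String, Object> lookups = cache.getBulk(Arrays.asList(
        "dim1:" + id, "dim2:" + id, "dim3:" + id));

    context.write(new Text(id + "\t" + lookups.size()), NullWritable.get());
  }

  @Override
  protected void cleanup(Context context) {
    cache.shutdown();
  }
}
```

Issuing one getBulk per record instead of seven or eight individual gets keeps network round trips down, which is the same multi-get pattern called out in the Facebook statistics above.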