How to Turn Your Big Data Firehose into Insights in 3 Easy Steps

Many years ago, when I was first starting out in digital analytics, I picked up a valuable piece of advice from Avinash Kaushik. He essentially said that if you want to become an expert in anything, read three books on the subject and you will know 80% of what everyone else knows. This follows the Pareto principle (aka the 80/20 rule), which is also cited in Nate Silver’s book, The Signal and the Noise.

When faced with overwhelming information, such as a billion quickly changing data sources, I use this three-step technique. You may notice that it is very similar to testing, to Bayesian reasoning, or to cracking the Enigma code, if you saw the recent movie. Here is how it goes:

1. First, Ask a Clear Question

The geeky term for this is “hypothesize.” Yes, you need to form a hypothesis first, for the simple reason that value is contextual: you must know what you want in order to find value in the information. Your hypothesis could be general, such as “Is there a valuable use for social media data that applies to our company?”, or more specific, such as “Are we using the right metric to measure social media?” The question can be anything you want to understand. Anything at all. However, the more specific your question, the more specific your answer will be.

2. Triangulate until Three that Agree

This is directly related to Avinash’s advice above, which, you might notice, is itself triangulated. Triangulating means finding several sources that say essentially the same thing until you know it is probably true. Three seems to be the magic number. For example, if I suspect that the new programmatic ad buying is weak in certain areas (like actually knowing a person vs. an audience), I’ll keep this theory in the back of my head as I read about the topic, looking for data that supports it. In some cases, if the topic is very new and specific, I may email people I consider experts in the field. @brooksbell really knows testing. @timash wrote the book on landing page optimization. @bobpage knows more about modern high-speed data architecture than anyone. Your source does not have to be a book. Any trusted source will do. By combining three of them, you increase the odds that you have found what is considered most correct. Yes, I said “most correct,” because fast-moving fields change. What is correct today may not be correct tomorrow. In data, you might combine voice of customer with behavioral data and business experience to solidify what you believe to be true.
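For the data-minded, here is a minimal sketch of the "three that agree" idea in Python. It is only an illustration, not a formal method: the function name `triangulate`, the `required` parameter, and the sample sources and claims are all made up for this example. It simply groups claims by the distinct sources that support them and keeps the ones backed by at least three.

```python
def triangulate(findings, required=3):
    """Return the claims supported by at least `required` distinct sources.

    `findings` is a list of (source, claim) pairs. Both the structure and
    the threshold of three are assumptions made for this illustration.
    """
    supporters = {}
    for source, claim in findings:
        # A set means the same source repeating itself only counts once.
        supporters.setdefault(claim, set()).add(source)
    return {
        claim: sorted(sources)
        for claim, sources in supporters.items()
        if len(sources) >= required
    }


# Hypothetical notes gathered while reading about a topic.
notes = [
    ("book_a", "targets audiences, not people"),
    ("expert_email", "targets audiences, not people"),
    ("blog_b", "targets audiences, not people"),
    ("blog_b", "targets audiences, not people"),  # repeat, counted once
    ("blog_c", "always cheaper than direct buys"),
]

agreed = triangulate(notes)
print(agreed)
```

Only the first claim clears the bar here, because three different sources back it. The second has a single supporter and stays a hunch until more sources agree.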

3. Go Beyond the 80/20 Rule

After you have enough information to triangulate, you have established a heuristic. That is, you have a rule of thumb, something that is true about 80% of the time. And what one thing is true about rules of thumb? They can be broken. It is the last mile, the final 20%, that is the domain of experts and experience. By the way, this is also how IBM’s Deep Blue beat Garry Kasparov. As long as the computer operated on “rules,” those rules could be deduced and manipulated by Kasparov to beat the machine. Once the machine knew how to make exceptions to its own rules, it became unbeatable. To get to mastery, you have to know the exceptions. Like outliers in a scatterplot, these are the cases for innovation. This is where you find really neat “Blue Ocean” insights.
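To make the scatterplot analogy concrete, here is a rough Python sketch, under some loud assumptions: the "rule of thumb" is taken to be a straight line fitted by least squares, and an "exception" is any point whose distance from that line is more than twice the typical spread. The function name `find_exceptions`, the `threshold` parameter, and the sample data are invented for illustration.

```python
from statistics import mean, pstdev


def find_exceptions(points, threshold=2.0):
    """Fit a least-squares line to (x, y) points and return the points
    that sit far from it. The straight-line "rule" and the threshold of
    two standard deviations are assumptions, not a universal test.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)

    # The rule of thumb: y is roughly slope * x + intercept.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    intercept = y_bar - slope * x_bar

    # How far each point lands from the rule.
    residuals = [y - (slope * x + intercept) for x, y in points]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # every point follows the rule exactly
    return [p for p, r in zip(points, residuals) if abs(r) > threshold * spread]


# Made-up data: seven points follow y = 2x, one does not.
data = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10), (6, 12), (7, 14), (8, 30)]
print(find_exceptions(data))
```

The points that come back are not errors to be deleted. In the spirit of this step, they are the cases worth a closer look.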

So, to make sense of any big data firehose, first know what you want. Second, solve for the first 80% by triangulating information until you get three sources that agree. Finally, keep an open mind for the final 20%, which represents the exceptions to your rule, and watch out for a revolution of insight.