Data has been called the new oil, and for good reason. While as recently as a decade ago our ability to use data effectively was limited by technology, today our access is nearly unlimited. New platforms like Hadoop and Spark allow us to analyze hundreds, or even thousands, of databases at once. Open data initiatives have expanded our reach even further.
Still, it’s becoming clear that a data divide is opening up. James Manyika, a director at the McKinsey Global Institute, told me that he sees vast differences in effectiveness even among companies in the same industry, with similar IT budgets, competing for the same consumer.
Gautam Tambay, CEO of Springboard, founded his company to help close the data divide. Springboard offers online education in data science that pairs executives with personal mentors to help them navigate an increasingly data-intensive world. To gain some insight, I talked to him about what managers today need to know about working with data.
1. Ask The Right Questions
Many managers pride themselves on being “data driven,” but that’s a fairly meaningless term. Data, by itself, tells you nothing. That’s why the first thing that Springboard’s courses train students to do is ask the right questions, form hypotheses and use data to test them.
Imagine the fairly common situation of a sales manager reviewing her numbers. For any given time period, she’s likely to notice that sales for one of her representatives are down. Obviously, that’s a problem. So she might look into data that she thinks could explain the shortfall, such as the number of calls the salesperson made, follow-up emails, and so on.
Yet many other factors could explain the shortfall: quality of leads, industry and economic trends, product performance, or competitive activity like new products and discounts. By focusing solely on the salesperson’s performance, the sales manager could be missing a much bigger problem.
Obviously, this is a very simplistic example, but it illustrates a larger point. The world is far more complex than it may seem in an Excel spreadsheet. That’s why Tambay emphasizes the need to break data down into factors that are mutually exclusive and collectively exhaustive (MECE), so that root causes can be identified and dealt with.
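To make the MECE idea concrete, here is a minimal sketch of how an analyst might check a breakdown. The deal IDs, the cause names and the check_mece helper are all hypothetical, invented for illustration; they are not from Springboard's courses. The check simply asks two questions: is every record covered by at least one factor (collectively exhaustive), and is any record counted under more than one (not mutually exclusive)?

```python
from collections import Counter


def check_mece(records, categories):
    """Check a breakdown for gaps and overlaps.

    records: list of record ids that the breakdown should cover.
    categories: dict mapping a factor name to the set of ids assigned to it.
    Returns (unassigned, overlapping) as sorted lists of ids.
    """
    counts = Counter()
    for members in categories.values():
        counts.update(members)  # each id counted once per factor it appears in
    unassigned = sorted(r for r in records if counts[r] == 0)
    overlapping = sorted(r for r in records if counts[r] > 1)
    return unassigned, overlapping


# Hypothetical lost deals, each tagged with a presumed cause.
lost_deals = ["D1", "D2", "D3", "D4", "D5"]
causes = {
    "lead_quality": {"D1", "D2"},
    "competitor_discount": {"D2", "D3"},
    "rep_activity": {"D4"},
}

unassigned, overlapping = check_mece(lost_deals, causes)
print("Not covered by any cause:", unassigned)
print("Counted under more than one cause:", overlapping)
```

In this made-up data, one deal has no explanation at all and another is double counted, so the breakdown is neither exhaustive nor exclusive. Either flaw would distort any conclusion drawn from the totals.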
2. Prepare Your Data Carefully And Apply a “Sanity” Test
Another pitfall is the data itself. When we see numbers on the screen, we often make the mistake of taking them at face value and don’t give much thought to how they were collected. Often, however, data is manually input by bored and overworked clerks using confusing systems they may not have been properly trained on.
As MIT’s Zeynep Ton explains in The Good Jobs Strategy, these issues are especially prevalent in the retail industry, where data problems like phantom stockouts are endemic. In fact, she points to one study that found 65% of a retailer’s inventory data was inaccurate. Often, these errors lead to inaccurate assessment of product demand, which leads to poor decisions.
A more noteworthy example occurred in 2010, when two Harvard economists, Carmen Reinhart and Kenneth Rogoff, published a working paper that warned that US debt was approaching a critical level. Their work greatly influenced the political debate around the federal budget but, as it turned out, they had made a simple Excel error and their fears were found to be baseless.
These are not isolated examples. In fact, data errors have led to what scientists are calling a replication crisis, in which many scientific papers are later found to be invalid. Tambay suggests that managers apply a simple “sanity test” to see if the data make sense. For example, in the case of Reinhart and Rogoff, it was clear that many countries ran high debt levels with little or no adverse effects.
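For routine business data, a sanity test can be as simple as a few plausibility rules run before any analysis. The sketch below is only an illustration under assumed conditions: the inventory records, field names and rules are hypothetical and are not drawn from Ton's book or from Tambay's curriculum.

```python
def sanity_check(rows):
    """Return (sku, problem) pairs for rows that fail basic plausibility rules."""
    problems = []
    for row in rows:
        # Rule 1: counts can never be negative.
        for field in ("start", "received", "sold", "end"):
            if row[field] < 0:
                problems.append((row["sku"], f"negative value in '{field}'"))
        # Rule 2: you cannot sell more than you had available.
        if row["sold"] > row["start"] + row["received"]:
            problems.append((row["sku"], "sold more than was available"))
        # Rule 3: ending stock should reconcile with the period's movements.
        if row["end"] != row["start"] + row["received"] - row["sold"]:
            problems.append((row["sku"], "ending stock does not reconcile"))
    return problems


# Hypothetical inventory records for one period.
inventory = [
    {"sku": "A100", "start": 40, "received": 10, "sold": 15, "end": 35},
    {"sku": "B200", "start": 5, "received": 0, "sold": 9, "end": 0},
    {"sku": "C300", "start": 12, "received": 6, "sold": 4, "end": -2},
]

issues = sanity_check(inventory)
for sku, problem in issues:
    print(sku, "->", problem)
```

Rules like these will not catch every error, but they flag records that cannot possibly be right before those records feed into a demand forecast.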
One way would be to ask him to shoot 100 free throws. The average NBA free throw percentage is 75%, so a sample size of 100 should give us a reasonable assessment. Statistically speaking, we could be 95% sure that the error would be within roughly 10 percentage points, which seems like it should provide a good basis for judgment.
However, one out of twenty times (the 5% that the confidence interval doesn’t cover) you’d go down in history as the jackass who overlooked Michael Jordan because he missed a few free throws. That’s the problem with traditional statistical methods (sometimes called frequentist or Gaussian statistics). They can minimize error, but not eliminate it.
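The arithmetic behind the free throw example can be sketched in a few lines. This is a rough illustration that assumes each shot is independent with the same 75% chance of success, and it uses the standard normal approximation (z = 1.96 for 95% confidence). The function names and the cutoff of 66 made shots are illustrative choices, not figures from the article.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


def prob_at_most(k_max, n, p):
    """Exact binomial probability of making k_max or fewer shots out of n."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1)
    )


p, n = 0.75, 100  # assumed true shooting percentage and number of shots
moe = margin_of_error(p, n)
print(f"Margin of error: about {moe:.1%} either way")
print(f"Plausible range: {p - moe:.1%} to {p + moe:.1%}")

# Chance that a genuine 75% shooter makes 66 or fewer of 100 and looks mediocre.
unlucky = prob_at_most(66, n, p)
print(f"Chance of 66 or fewer makes: {unlucky:.1%}")
```

Under these assumptions the margin works out to about 8.5 percentage points either way, consistent with the rough 10-point figure above. The second calculation shows the flip side: a small but real chance that a good shooter lands well below his true average in a single trial of 100 shots.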
Humans don’t internalize numbers well, so how data are presented can have an enormous impact on how effective a particular analysis will be. That’s why Springboard doesn’t just teach its students to be good data scientists; it also emphasizes the need to be good storytellers.
Tambay recommends that analysts first prepare a data presentation with just headlines, so they can focus on the meaning behind the numbers. From there, they can build compelling data visualizations that support the narrative.
“We teach our students to tell stories with data because that’s what’s most likely to affect decision making,” Tambay told me. “And that gets to the core of what we want our students to achieve: enable better business decisions through telling compelling data stories.”
The truth is that it’s not enough to be “data driven.” Today’s managers need to take care to prepare their data so that it reflects reality, understand what types of analysis to apply, derive meaningful conclusions and communicate them effectively.