The Globe of Science and Innovation at CERN, Geneva, Switzerland.

On August 6th, 1991, Tim Berners-Lee summarized the origins of what he called “the WorldWideWeb project”:

“The project started with the philosophy that much academic information should be freely available to anyone. It aims to allow information sharing within internationally dispersed teams, and the dissemination of information by support groups.”

When the Web was created, Berners-Lee and his colleagues could have stopped there, satisfied with a brilliant solution to the problem of linking up CERN’s scientists and data around the globe. Would anyone have blamed them? But the project grew far beyond its relatively modest original scope, because Berners-Lee and others saw that the Web could solve much bigger problems.

We’re at a critical juncture with Big Data because right now, we’re choosing the problems we want it to solve. Call it the “CERN moment,” if you like. We’re not asking enough of Big Data, and we’re still getting in its way.

The data paradigm we’re used to (and comfortable with) relies on human analysis and application: gather the data, build a dashboard, draw a conclusion, make a decision, act. This workflow was great for answering questions like, “Which items should we feature on our home page?”

But we should be using Big Data to do what we can’t, not just to do what we’re already doing, better. One application I’d like to see is a kind of “Pandora for stuff,” built on networked data from many sources: essentially more accurate, more useful personalization. Another is a new kind of sentiment analysis that focuses less on the what of sentiment and more on the why and the what now: why do people like or dislike your product or brand, and what should you do about it?

That second idea speaks to something I think Big Data will soon excel at: finding cause. We’re a bit skittish about cause, and for good reason. Almost all the data marketers deal with tells them about correlation but hardly ever rises to the level of determining cause. It tells them about relationships without telling them why those relationships exist. It might seem like solid correlation data is the best we’ll get, but I think Big Data can take us all the way to cause if we pipe enough data into our systems and, crucially, give those systems the power to control variables programmatically.
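To make that concrete, here’s a minimal sketch, in Python, of the simplest form of programmatic variable control: a randomized experiment the system runs on its own. Because the system, not a human, assigns each visitor to a variant at random, the difference in outcomes can be read as cause rather than mere correlation. Every name here is hypothetical; in particular, record_outcome and its conversion rates are simulation stand-ins, not a real API.

```python
import random
import statistics

def record_outcome(variant: str) -> float:
    """Hypothetical stand-in for a real visitor's behavior.

    In this simulation, variant "B" converts slightly more often;
    a real system would log actual conversions instead.
    """
    rate = 0.10 if variant == "A" else 0.12
    return 1.0 if random.random() < rate else 0.0

def run_experiment(n_visitors: int = 100_000) -> None:
    outcomes = {"A": [], "B": []}
    for _ in range(n_visitors):
        # The system, not a human, controls the variable: random
        # assignment is what lets us read the result causally.
        variant = random.choice(["A", "B"])
        outcomes[variant].append(record_outcome(variant))

    mean_a = statistics.mean(outcomes["A"])
    mean_b = statistics.mean(outcomes["B"])
    # With randomized assignment, this difference estimates the
    # causal effect of showing B instead of A.
    print(f"A: {mean_a:.3f}  B: {mean_b:.3f}  lift: {mean_b - mean_a:+.3f}")

if __name__ == "__main__":
    run_experiment()
```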

Old: Define the problem, train the machine. New: Train the machine to define the problem.

To reach its true potential, Big Data should draw the conclusions, make the decisions, and act programmatically, in real time, with minimal human involvement.

Right now, humans are still better at at least one thing: defining the problem. But that’s not to say Big Data can’t eventually outdo us in this arena, too. If we can train machines to solve problems, we can train them to define problems as well. Once that happens, we close the circuit and the true power of Big Data is realized: the system starts working on its own, and better than ever before. Here are the conditions that must be met for Big Data to flourish:

  1. It must be ubiquitous, accessing data from everywhere we let it.
  2. It must be always on, constantly gathering, processing, comparing, and acting (see the sketch after this list).
  3. It must be empowered to act in the moment.
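Taken together, these three conditions describe a closed loop: gather, process, compare, act, with no human in the middle. Here’s a toy sketch of such a loop. Every function in it (gather, act) is a hypothetical placeholder for reading real event streams and calling real services, and the thresholding logic is deliberately simplistic:

```python
import random
import time

def gather() -> float:
    # Placeholder for pulling the latest signal from "everywhere we let it";
    # here it's just simulated noise around a level of 100.
    return random.gauss(100.0, 10.0)

def act(signal: float, baseline: float) -> None:
    # Placeholder for acting in the moment: repricing, re-ranking, alerting.
    print(f"signal {signal:.1f} vs baseline {baseline:.1f} -> acting")

def run_loop(threshold: float = 15.0, interval_s: float = 1.0) -> None:
    baseline = gather()
    while True:                                    # condition 2: always on
        signal = gather()                          # constantly gathering
        if abs(signal - baseline) > threshold:     # processing and comparing
            act(signal, baseline)                  # condition 3: acting in the moment
        baseline = 0.9 * baseline + 0.1 * signal   # keep the baseline current
        time.sleep(interval_s)

if __name__ == "__main__":
    run_loop()
```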

What do you think? Are there other conditions that will help us get more out of Big Data? Add them to the comments below, and let’s discuss!

  • MattLaessig

    Great article, Stephen. That would truly be a paradigm shift in how we engage “the machine.” Shifting from human definition of the problem, which is largely limited by our insight at the start of the analysis, to training the machine to identify the problem itself, during and after its analysis of staggering volumes of data, will unlock tremendous value and more powerful conclusions.