Making Data Work through Collaboration

We are living in an extraordinary time. The convergence of technology vectors in recent years, including advances in microprocessors, storage, networking, virtualization, and the cloud, has presented one disruptive shift after another, empowering remarkable people and organizations to push the limits experientially, conceptually, and socially.

When I spoke at the first O’Reilly Strata Conference in 2011, the big question was, “How does this burgeoning industry accelerate its momentum and continue to push its limits?” In the past two years, we’ve seen data-derived insight drive business value across many sectors, with McKinsey projecting that the industry will grow to $16 billion by 2015. And its impact goes far beyond the enterprise. Silicon Valley veteran Marc Andreessen recently stated that “software is eating the world,” meaning that the world is getting instrumented through the explosion of machine-generated and mobile data. Governments are also embracing Big Data solutions, and the topic is making front-page news in mass media outlets across the globe. The concept has entered the cultural conversation and is on the tip of people’s tongues around the world.

As we look back at the progress made over the last several years, one thing has stayed constant: Big Data is about people and the notion of being part of a larger community. The impact of our co-conspirators in this data community, the data scientists and developers who have been instrumental in making data work, cannot be overstated.

At Strata 2011, I observed that this space needed more data scientists, government support, industry investment, and startups. During my keynote last month at the 2013 Strata Conference, I paid tribute to ten such practitioners, change-makers, institutions, and entrepreneurs who have inspired me and the Greenplum team since then.

DataKind’s founder Jake Porway brought data science to non-profit organizations to address social and civic causes. Cornell University’s Jon Kleinberg reshaped how we think about machine learning algorithms. Andrew Ng and Daphne Koller of Coursera introduced data science and technical education to the world. Alpine Data Labs’ Steven Hillion, who founded the analytics team at Greenplum, makes the list, as do Metamarkets CEO Mike Driscoll and Big Data trailblazer Todd Papaioannou. Cowbird founder Jonathan Harris has shown us new ways to tell stories through data and collaboration, while at the government level, the Obama Administration has taken a proactive stance on Big Data, with $200 million budgeted to fund Big Data research and development across six Federal agencies.

So what do we need to make data work in the future? More of what has already taken us this far: collaboration.

When I talk about collaboration, I mean a word as powerful as democracy. This means collaboration on software, standards, and data. Open source software today is not the open source of twenty years ago, not when venture capital firms are making hundred-million-dollar investments and expecting billion-dollar returns. Within any open source project, there will be conflicts about how that project evolves, which is why developing standards to ensure compatibility and interoperability will be key.

Another critical piece of collaboration is around the data itself. My data and your data are contributing to a unified data set around the world, spanning areas such as healthcare, energy, local economies, global security, and the future of education. I believe that within this global data set are insights into some of the world’s biggest problems, and that they will reveal the significant breakthroughs our civilization is looking for. This is what keeps us inspired and what drives us to be part of this important community today.


About the Author: Scott Yara