Data: The New Science

Last month, Massachusetts Gov. Deval Patrick and leaders from top technology firms, venture capital and research universities announced joint efforts to position the state as a hub for job creation in Data Science. It is a bet worth making.

Thirty-five years ago, Computer Science didn’t exist as an academic discipline. Now, it is taught on a global basis. Within years, Data Science will become the new “hot” discipline for the next generation, driven by the hottest buzzword in technology today: Big Data.

Big Data refers to data sets that have grown so large that they defy the ability of conventional information technology to store, manage and analyze them, or to draw important insights from them.

According to the research firm IDC, all of the new information generated in the world in the year 2000 amounted to approximately two million terabytes. The digital universe, as we call it, now generates more than twice that much in a single day.

Driving the data deluge are technologies and applications that permeate our daily lives: mobile sensors, smart phones, social networking.

A utility that reads a smart meter on your home every fifteen minutes generates three thousand times more information about your electricity usage than the old way of reading meters once a month. Multiply that by millions of customers in a metro area, and a smart grid can produce a data set large enough to monitor power demand in real time and predict probable outages before they occur.
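The three-thousand figure is easy to check with back-of-the-envelope arithmetic. The short Python sketch below assumes a 30-day month and counts readings rather than bytes; the function name is ours, chosen purely for illustration.

```python
def readings_per_month(interval_minutes: int, days_in_month: int = 30) -> int:
    """Count meter readings in one month at a fixed interval.

    Assumption: a 30-day month unless told otherwise.
    """
    minutes_per_day = 24 * 60
    return (minutes_per_day // interval_minutes) * days_in_month


smart_meter = readings_per_month(15)  # one reading every fifteen minutes
manual = 1                            # the old way: one reading a month
ratio = smart_meter // manual

print(f"{smart_meter} readings a month, {ratio} times the monthly manual read")
```

Under those assumptions the smart meter produces 2,880 readings a month, which rounds to the three thousand cited above.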

Until recently, the cost to store, access and analyze Big Data was so high that only government and a few business models could justify it. Law enforcement discovered innovative ways to match fingerprints and ballistics against centralized, digitized databases. Counterterrorism and the gaming industry adopted facial recognition technology to identify the bad guys before they step on an airplane, or onto the casino floor.

Today, real-time analysis of Big Data is within the reach of almost any business model in any industry. The cost to sequence one human genome, for example, has fallen from $100 million in 2001 to less than $10,000, putting personalized medical treatments within the reach of the masses.

Online retailers can sift through billions of purchase observations to recommend additional items that a particular consumer might want to buy at the point of sale. Pretty soon, location data on your smart phone matched with shopping records gathered from your loyalty card will enable a supermarket to send you personalized, digital coupons for items as you walk down that aisle.

Crowd sourcing – think tens of thousands of drivers on your local highways – already generates enough traffic data for the GPS in your smart phone to suggest alternative routes when a stretch of road is clogged. Machine learning, the ability of computers to get smarter as they gather more information and more feedback from users, will lead to even greater precision in information exchange.

To accelerate job creation in data science, Massachusetts announced matching grants for Big Data projects to be funded by public/private partnerships, funding for technology research at the Massachusetts Institute of Technology, along with a Big Data internship program modeled after a successful program in life sciences, and support for an innovative non-profit community presence in Boston, where data scientists can share infrastructure and knowledge.

This is an exciting time for innovators in this arena. Last month, our company held our second annual summit for data scientists in government, academia, biotechnology, retailing, marketing and other fields.

There, John Brownstein from Harvard Medical School told how an online innovation he co-founded seeks to use crowd sourcing to provide a global view of infectious disease outbreaks – two weeks ahead of the Centers for Disease Control or the U.N.

He told how the SARS outbreak in China was first detected by someone examining a stock performance chart of a Chinese company that sold herbal remedies, and how the H1N1 virus outbreak was first reported by a local TV station in Veracruz, Mexico. The mobile app OutbreaksNearMe will disseminate information back to users who report in, providing an incentive for people to input their health information anonymously into a pool of data large enough to analyze. Information that once took months to flow up the chain of public health bureaucracy now reaches the public in weeks.

Entrepreneur Tarek Kamil spoke of how 6,000 sensors in a basketball can identify telemetry data that a human cannot see, feeding information for analysis that can recommend drills to improve technique. Business consultant Piyanka Jain explained how some of the quickest returns on Big Data analytics can be found in uncovering pain points in customer relationships to improve loyalty and profitability.

Making use of Big Data, however, requires new technology architectures and algorithms that go beyond conventional database management and business intelligence. Most new information created is not transaction data that resides in neat rows and columns in traditional office databases managed by an administrator. Eighty percent of new information is generated outside of enterprise data centers, and traditional databases simply cannot scale to the challenges of data sets 100 to 1,000 times larger.

Data science also requires a new breed of scientist. Imagine a PhD in life sciences who manages petabytes of information for a pharmaceutical company that conducts drug research. These people combine business acumen and subject matter expertise with an understanding of the newest technology tools, and they possess the curiosity to help an organization recognize predictive patterns, or anomalies in patterns. Their skills are valuable and rare.

As academics from Columbia and Stanford universities and UC Berkeley told our summit last month, data science is a curriculum that did not exist five years ago. It is a science that requires multi-disciplinary skills, and most universities are not organized to teach in multi-disciplinary ways. Partnerships like the ones announced in Massachusetts will change that.

Data is the new science. Big Data holds the answers. Are you asking the right questions?

This post originally appeared on

About the Author: Pat Gelsinger