How Will Twitter’s API Restrictions Affect Big Data Applications? DataSift Helps Clarify What’s Going On

Earlier this month, Twitter stepped up its enforcement of an API restriction that prohibits building ‘client apps that mimic or reproduce the mainstream Twitter consumer client experience’. Around the same time, LinkedIn announced that its users would no longer be able to display their tweets on LinkedIn. Will LinkedIn’s user experience suffer? And what might happen to other applications?

Fortunately, for every door that closes, a new one opens. DataSift is one of a select few companies that Twitter has entrusted to resyndicate and provide access to the full Twitter feed for use in internal analytics applications. Essentially, Twitter is leaving the door open, but through a process managed by third parties. There is one catch: you have to pay a fee for access. The good news is that the fee comes with value-added platform services from DataSift, targeted at your specific industry and business needs.

What I found most interesting about DataSift is its vision. DataSift believes that every entity (small, medium, or large) should have the ability to take advantage of Big Data, and especially Social Data. As DataSift Founder & CTO Nick Halstead puts it, “We are trying to help democratize the Big Data industry to enable entrepreneurs and enterprises to easily create socially-intelligent applications. No data-scientists required, no Hadoop expertise needed.”

I was so moved by DataSift’s mission that I used my social network to contact Rob Bailey, CEO of DataSift, to learn more about how the company can bring Big Data to even the little guy for use in analytics platforms.

1.  Anyone can access and manipulate Twitter data using publicly available APIs. Why would an organization pay for access to Twitter data through DataSift?

Twitter places significant restrictions on direct API access, such as the maximum number of tweets that can be consumed per day and what you can do with the data, in order to control how the data is used. This may work for small organizations doing testing, but it does not work for large organizations that need to scale and build enterprise-grade social media analytics applications, such as social media monitoring.

As a licensed Twitter resyndication partner, DataSift enables companies to access billions of public social conversations on Twitter, as well as on Facebook and other social networks. For example, our Historics service allows access to 90 billion tweets dating back to January 2010 for analytics purposes, and it goes beyond simple keyword search to filter by demographics, tweet sentiment, location, Klout score, and more. The value here is that organizations can tap into both real-time tweets and more than two years of historical tweets to discover insights and trends related to brands, businesses, financial markets, news and public opinion. We are trying to help companies better understand their customers and their marketplaces.

Building a scalable infrastructure on top of direct API access can also be costly and complex for a company. Our cloud service does all of the heavy lifting, with the proper privacy controls and intelligence around the data, so companies can quickly leverage social data to make discoveries that were never before possible.
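To make the difference between “simple keyword search” and Historics-style filtering by demographics, sentiment, location and Klout score concrete, here is a minimal sketch in plain Python. It assumes a hypothetical in-memory `Tweet` record whose field names and value ranges are invented for illustration; it is not DataSift’s filtering language or API, just the general idea of layering extra criteria on top of a keyword match.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Tweet:
    # Hypothetical record shape for illustration only; these field names
    # are assumptions, not DataSift's or Twitter's actual schema.
    text: str
    sentiment: float  # assumed scale: -1.0 (negative) to 1.0 (positive)
    country: str      # assumed country code, e.g. "US"
    klout: int        # assumed 0-100 influence score
    age_band: str     # assumed demographic bucket, e.g. "20-35"


def matches(
    tweet: Tweet,
    keyword: str,
    min_sentiment: float = -1.0,
    country: Optional[str] = None,
    min_klout: int = 0,
    age_band: Optional[str] = None,
) -> bool:
    """Return True if the tweet passes the keyword test and every other filter."""
    if keyword.lower() not in tweet.text.lower():
        return False  # plain keyword search is only the first cut
    if tweet.sentiment < min_sentiment:
        return False
    if country is not None and tweet.country != country:
        return False
    if tweet.klout < min_klout:
        return False
    if age_band is not None and tweet.age_band != age_band:
        return False
    return True


def filter_tweets(tweets: Iterable[Tweet], **criteria) -> List[Tweet]:
    """Keep only the tweets that match all of the given criteria."""
    return [t for t in tweets if matches(t, **criteria)]
```

For example, `filter_tweets(sample, keyword="taco bell", min_sentiment=0.5, country="US", min_klout=30)` would narrow a keyword match down to positive, US-based tweets from reasonably influential accounts. Each criterion is optional, so a bare keyword query behaves like ordinary keyword search.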

2.  Twitter works with a very small group of trusted partners to sell access to data on Twitter’s behalf, each partner servicing a different pain point or need in the market. Which needs does DataSift address?

As Twitter’s CEO has publicly stated, Twitter is busy growing its user base, which is expanding explosively, and building a business around advertising. It is less focused on monetizing its data in the analytics space, which is an opportunity of a different size. As a result, Twitter has formed partnerships with a few select companies to sell Twitter data while still maintaining some control over how the data can be used. DataSift is one of only a handful of licensed resyndication partners, and it focuses on providing an open data platform that enables companies of any size, from entrepreneurs to enterprises, to easily create socially-intelligent applications.

3.  There are considerably more active Facebook users than Twitter users. However, unlike Facebook, where most user data is private, Twitter data is public. As a result, do you believe that having access to the entire pool of Twitter data provides a more accurate and reliable source of information than a smaller pool of Facebook data?

Both social media platforms are equally important, for different reasons. Since Twitter is a public forum, people do more broadcasting and sharing there, especially of news. It is the most important source of social data that our customers request. Because Facebook is not as open, brands mostly use the platform to build deeper customer relationships. We support data captured in Facebook Fan Pages, for example, where brands interact with their customers.

4.  Not all high value consumers use social networks to express their likes and dislikes, product feedback, and more. Do you think the composition and distribution of users expressing their opinions on social networks such as Facebook and Twitter can actually lead brands to focus on sub-optimal strategies?

Twitter, Facebook, Pinterest, and other social networks are all awesome sources of data for companies looking to understand their customers and their markets better. However, companies should conduct their own analysis of the relevance each social network has to their specific business and of the quality of the insights that can be derived from those sources.

To be clear, we don’t think companies should disregard their old strategies and purely use social data to formulate strategies, but rather they should pay close attention to relevant social networks on an ongoing basis and regularly reevaluate what insights they can glean from those sources.

5.  I was blown away by DataSift’s ability to track, in real time, what Twitter users were saying about Facebook’s IPO throughout the day and to graph the results against Facebook’s stock price. The conclusion you can draw from the graph below is that DataSift’s social signal was a good trade indicator. And this is just a day’s worth of data. Imagine what predictions organizations could make by modeling years’ worth of Twitter data! How do you see organizations taking advantage of DataSift’s Historics to go beyond simple monitoring and trending to predictive analytics?

How useful Twitter data is for prediction really depends on the question being asked. In some cases it is more predictive, and in others less so. A good use case for prediction involves consumer products and services relevant to Twitter users. Taco Bell, for example, has a large following and can use past positive sentiment to predict which new products will be well received. On the other hand, HP would probably not use Twitter to predict the success of its scientific calculators, since not many people tweet about that product. Similarly, when predicting box office success, Twitter data is relevant for some movies and not for others. A science fiction movie targeted at ages 20-35, such as Prometheus, is better suited to prediction, since this audience is very active on Twitter and more likely to tweet about the movie. A drama targeted at ages 55 and up will probably generate a relatively lower level of social activity.
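Question 5 describes graphing a real-time Twitter signal against Facebook’s stock price. One simple way to check whether a social signal tends to lead price moves is a lagged correlation. The sketch below is a generic, hypothetical illustration in plain Python: the function names are invented, the inputs are assumed to be a per-interval sentiment score and per-interval price changes on the same time grid, and it does not describe how DataSift produced its graph.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a series with zero variance has no defined correlation")
    return cov / (sd_x * sd_y)


def lagged_correlation(
    signal: Sequence[float], price_changes: Sequence[float], lag: int
) -> float:
    """Correlate signal[t] with price_changes[t + lag].

    Both inputs are assumed to be sampled on the same time grid
    (e.g. hourly); a lag of 1 compares each interval's social signal
    with the following interval's price move.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if len(signal) != len(price_changes):
        raise ValueError("series must cover the same time steps")
    if lag == 0:
        return pearson(signal, price_changes)
    # Drop the last `lag` signal values and the first `lag` price moves
    # so that each signal value lines up with a later price change.
    return pearson(signal[:-lag], price_changes[lag:])
```

Shifting the price series forward by `lag` steps means each signal value is compared only with price moves that happened after it. A strong correlation at a positive lag is consistent with a leading signal but does not by itself establish one, which echoes the point above that predictability depends on the question being asked.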

6.  Looking at your current client base, what segments of the market (industry and/or LOB) are mostly using social data for Big Data analytics and why?

Technology companies and consumer brands are at the forefront of using social data, and even within these two segments, some are more active than others. For example, is a heavy Twitter user versus IBM, a company that seems to have a lower level of external activity. (Salesforce, a much smaller company, has three times as many Twitter followers as IBM.) Virgin America is very active compared with United Airlines. A company’s level of Twitter activity and adoption is really driven by its culture, and it starts at the top. Richard Branson, for example, is always on Twitter. This drives Virgin America employees to join Twitter, follow their CEO, and start engaging there.

The financial industry uses Twitter as a social signal for trading. In fact, we will be launching new services specifically for the financial industry in the next few weeks.

There are also less obvious industries, such as agriculture, that are becoming more active on Twitter. If you think about it, weather updates, crop prices, and new farming products are all very useful to the farming industry, and that information can often be found first on social networks. I am constantly amazed at the types of information that get communicated via social channels.

7.  What Big Data technologies are you using to deliver powerful social data tools to clients?

We do not use Amazon Web Services for our core platform and services. Instead, we have our own data center in Europe, which runs a very large Hadoop cluster. In fact, it is the largest Hadoop cluster in Europe.

8.  How are you using Big Data technologies to drive new products and services in the future for your clients?

Because we have our own data center, we have the flexibility to constantly innovate with cutting-edge technologies so we can offer better products to our customers. We were able to offer Historics, which gives clients high-speed access (<3ms) to real-time and 2+ years of historical Twitter data, because we architected an Internet-scale, real-time data processing platform. Before Historics was available, there was only one product on the market that provided access to 30 days of Twitter data, and it was delivered through Amazon Web Services.

9. Many see Big Data as an opportunity to improve the quality of life such as being able to predict the right cancer treatment, predict the next placement of a large-scale wheat farm, etc. What is exciting to you about how social data can be leveraged for Big Data to improve the quality of life?

The transformation of the news industry through Big Data is very exciting. I am a news junkie and used to read up to 30 different news magazines a month. Now I consume most of my news through Twitter, whose social nature delivers more granular, relevant news. I still get a few magazines per month, but I am gradually finding that Twitter is the first place I go for news. So from a news consumer’s standpoint, Twitter provides a better channel for fresh, high-quality, relevant news.

From a news publisher’s standpoint, Big Data is being used to analyze how news is consumed and diffused through social networks such as Twitter, so publishers can adapt news content accordingly. For example, Big Data is being used today to analyze which factors contribute to a news piece going viral. It has been found that positive, uplifting content is a key factor in viral news, so news organizations can take this data and publish more positive news pieces. We will see more news organizations go bankrupt if they don’t leverage Big Data to transform their news business.

10. What books are you currently reading on your Kindle or if you are still paper based like me, what books are stacked on your nightstand?

I haven’t bought a paper book in 2 years. On my Kindle, I am reading ‘In the Plex’, a story about Google; ‘The Handbook of News Analytics’, because we are doing a lot of news analytics; and ‘Born to Run’, which is about ultramarathon runners.

To learn more about DataSift and sign up for a free trial, visit

About the Author: Mona Patel