Big Data: what’s hot, what’s not according to the Twitter stream



Because of (or in spite of) the hype, sentiment about Big Data vendors was generally bullish in 2012, and the attention spilled over from IT into the business media. These were among the findings of a retrospective analysis of vendor mentions on Twitter during 2012 that DataSift conducted for Ovum. Some of the results were surprising: although Hadoop garners much of the spotlight as a Big Data platform, 10gen, the vendor that develops MongoDB, came second in mentions only to Apache, which hosts the Hadoop project. And although it was only peripherally a Big Data story, HP’s troubled acquisition of Autonomy was the biggest negative story of the year. The DataSift data is a good example of how social media mining offers a useful snapshot of popular thinking that can supplement, or even replace, the traditional role of marketing focus groups.

Mining Twitter for insights

Traditionally, brand recognition studies used focus groups (with data often correlated to actual sales) to qualify and quantify how a company or product is perceived, and why. The rise of social networks has provided a valuable new source of data, one selected not by scientific sampling but by the participants themselves: they vote with their keyboards on whether to say something for public consumption.

DataSift gained fame as one of a handful of companies authorized to syndicate the entire stream of public tweets, which totals more than 400 million tweets every day. To help companies mine that stream for insights, DataSift built a platform that lets them create filters to extract and categorize vast volumes of social data and deliver the results into business intelligence tools for further analysis. Today, Twitter is one of several social media streams that DataSift analyzes. Twitter is a surprisingly rich source of information: although the 140-character messages are often cryptic, they are supplemented by over 70 metadata tags that enrich the data, along with details about the URLs that are often shared in tweets.
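
The filter-and-categorize step can be pictured with a minimal Python sketch. It is not DataSift’s filtering language or API: it assumes each tweet arrives as a dictionary with hypothetical `text` and `links` fields, and it tags a tweet with every vendor whose keywords, taken from an illustrative and hypothetical `VENDOR_KEYWORDS` table, appear in the message or its shared URLs.

```python
import re
from typing import Dict, Iterable, Iterator, List

# Hypothetical keyword table: vendor label -> terms that signal a mention.
VENDOR_KEYWORDS: Dict[str, List[str]] = {
    "Apache": ["apache", "hadoop"],
    "10gen": ["10gen", "mongodb"],
    "Splunk": ["splunk"],
}

def compile_filters(keywords: Dict[str, List[str]]) -> Dict[str, re.Pattern]:
    """Build one case-insensitive, whole-word pattern per vendor."""
    return {
        vendor: re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
        for vendor, terms in keywords.items()
    }

def categorize(tweets: Iterable[dict],
               keywords: Dict[str, List[str]] = VENDOR_KEYWORDS) -> Iterator[dict]:
    """Yield only tweets that mention a vendor, each tagged with the vendors found."""
    filters = compile_filters(keywords)
    for tweet in tweets:
        # Search the message text together with any shared URLs (hypothetical fields).
        haystack = " ".join([tweet.get("text", "")] + list(tweet.get("links", [])))
        vendors = sorted(v for v, pattern in filters.items() if pattern.search(haystack))
        if vendors:
            yield {**tweet, "vendors": vendors}
```

For example, a tweet reading "Loving the new MongoDB release" would be kept and tagged `["10gen"]`, while an off-topic tweet would be dropped. Plain keyword matching is a simplification; a real filter would also have to cope with ambiguous terms and non-English text.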

DataSift conducted a retrospective analysis of Big Data vendor mentions during 2012 to quantify brand recognition. Because the search was restricted to vendors, the analysis captured perception of the Big Data market, as opposed to perception of Big Data among the general public. In all, it covered 2.2 million Twitter interactions from more than 981,000 authors.

Big Data spills over to the business world

With links present in 70% of the Big Data posts that mentioned vendors, media sites were frequent targets. By analyzing link targets, DataSift confirmed what had previously been anecdotal evidence: awareness of the Big Data technology market has crossed over from IT to the business world. The most frequently cited media source was the online site of Forbes magazine (a US business journal). While technology news portals GigaOM and TechCrunch followed Forbes, another major business media source, the Harvard Business Review blog site, edged out the popular IT news portal ZDNet.
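
Link-target analysis of this kind comes down to counting the host of each shared URL. The sketch below assumes the URLs have already been expanded from link shorteners, since tweets often carry shortened links; the function name `top_link_domains` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit

def top_link_domains(urls: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Count the host of each shared URL and return the n most frequent hosts."""
    counts: Counter = Counter()
    for url in urls:
        host = urlsplit(url).hostname or ""  # hostname is already lower-cased
        if host.startswith("www."):
            host = host[len("www."):]  # treat www.forbes.com and forbes.com alike
        if host:  # skip strings that did not parse as URLs with a host
            counts[host] += 1
    return counts.most_common(n)
```

Stripping a leading `www.` is a simplifying choice so that variants of the same site are counted together; mapping several domains to one publisher would need an extra lookup table.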

Which vendors are hot?

Given the hype around Hadoop, it shouldn’t be surprising that the Apache Foundation, which hosts the Hadoop open source project, was the most frequently cited “vendor,” accounting for 9.4% of the posts. But the most interesting finding was 10gen’s strong showing just behind Apache, at 6.2% of all posts. Although MongoDB is not known for storing high volumes of data, it is associated with variety, given its schemaless architecture. The popularity of the 10gen brand is attributable to the fact that MongoDB has become, for web developers, the document-database equivalent of MySQL: it is open source, it stores JSON-style documents and offers a JavaScript-based shell (JavaScript being highly popular among web developers), and it is relatively simple to develop with. Ovum believes that the popularity of 10gen is more indicative of the future of web development than of Big Data per se; we view 10gen as becoming the non-transactional database successor to MySQL in the world of web developers. Following Apache and 10gen were (in order) IBM, HP, Teradata, Splunk, Oracle, Cloudera, Amazon, and then DataSift; SAP and Hortonworks ranked immediately behind DataSift.
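
Figures such as Apache’s 9.4% and 10gen’s 6.2% are shares of posts. A minimal sketch of that calculation follows; it assumes each post has already been tagged with the vendors it mentions, and the function name `share_of_posts` is hypothetical. Because a single post can name several vendors, the shares need not sum to 100%.

```python
from collections import Counter
from typing import Dict, Iterable, List

def share_of_posts(post_vendors: Iterable[List[str]]) -> Dict[str, float]:
    """Percentage of posts mentioning each vendor, largest share first.

    A post that names several vendors counts once for each of them,
    so the percentages can add up to more than 100.
    """
    counts: Counter = Counter()
    total = 0
    for vendors in post_vendors:
        total += 1
        counts.update(set(vendors))  # count a vendor at most once per post
    if total == 0:
        return {}
    return {vendor: round(100.0 * n / total, 1) for vendor, n in counts.most_common()}
```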

Not all the attention was positive. Although positive mentions of Big Data vendors outnumbered negative mentions three to one, negative sentiment spiked in November amid headlines about HP’s troubled acquisition of Autonomy. Given that vendors accelerated the pace of product announcements during 2012, it is not surprising that 60% of Twitter activity occurred in the second half of the year.
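
Both headline figures here, a 3:1 positive-to-negative ratio and a 60% second-half share, are simple tallies once each post carries a date and a sentiment label. The sketch below assumes the labels come from an upstream sentiment classifier (not shown) and that all posts fall within a single calendar year; `sentiment_and_timing` is a hypothetical name.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple

def sentiment_and_timing(posts: Iterable[Tuple[date, str]]) -> Tuple[float, float]:
    """Return (positive-to-negative ratio, percentage of posts dated July to December).

    Each post is a (date, label) pair with label 'positive', 'negative' or
    'neutral'. Neutral posts count toward the timing split but not the ratio.
    """
    labels: Counter = Counter()
    total = second_half = 0
    for day, label in posts:
        labels[label] += 1
        total += 1
        if day.month >= 7:  # July through December
            second_half += 1
    negatives = labels["negative"]
    ratio = labels["positive"] / negatives if negatives else float("inf")
    pct_second_half = 100.0 * second_half / total if total else 0.0
    return ratio, pct_second_half
```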

Attention was not uniform across countries. Although conventional wisdom holds that the US is the leading market for Big Data platform installs, the Japanese, Germans, and French were often far more vocal on Twitter. By company, the trends were mixed. While SAP, DataSift, and Splunk drew the most mentions in their home countries, the opposite was the case for the Apache Foundation and Cloudera, for which Japan was the most vocal; for 10gen, for which France and Japan were the most represented; and for IBM, which drew more mentions from France. If social network chatter is any indication, investments by startups such as Cloudera and 10gen in less “sexy” (or stagnant) markets like Japan appear to be paying off.
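
Country-level comparisons like these can be sketched as finding, for each vendor, the country that contributes the most mentions and checking it against the vendor’s home country. The sketch assumes each mention has already been assigned a country (for example, from tweet metadata); the function names are hypothetical, and the `home` mapping passed in is illustrative input rather than data from the analysis.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def top_country_by_vendor(mentions: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each vendor to the country that contributed the most mentions."""
    by_vendor: Dict[str, Counter] = defaultdict(Counter)
    for vendor, country in mentions:
        by_vendor[vendor][country] += 1
    # most_common(1) returns [(country, count)] for the largest count.
    return {vendor: counts.most_common(1)[0][0] for vendor, counts in by_vendor.items()}

def led_from_abroad(mentions: Iterable[Tuple[str, str]], home: Dict[str, str]) -> Dict[str, str]:
    """Vendors with a known home country whose top country is somewhere else."""
    top = top_country_by_vendor(mentions)
    return {vendor: country for vendor, country in top.items()
            if vendor in home and country != home[vendor]}
```

Raw counts favor countries with heavy Twitter usage overall; dividing by each country’s total tweet volume would be one way to adjust for that.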

Big Data is a global phenomenon

Ovum’s Big Data survey, conducted in 2011, showed that the US was leading the way in Big Data implementation. Since then, most vendors have told us that the US is also their most mature market.

Yet the regional discrepancies in vendor mentions suggest strong latent interest in the next tier of national markets. Although the Big Data market in the rest of the world may not be as well developed as in the US, the curiosity is clearly there.



Tony Baer, Principal Analyst, Ovum IT Enterprise Solutions

Further reading

2013 Trends to Watch: Big Data, IT014002651 (October 2012)


All Rights Reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of the publisher, Ovum (an Informa business).

The facts of this report are believed to be correct at the time of publication but cannot be guaranteed. Please note that the findings, conclusions, and recommendations that Ovum delivers will be based on information gathered in good faith from both primary and secondary sources, whose accuracy we are not always in a position to guarantee. As such Ovum can accept no liability whatever for actions taken based on any information that may subsequently prove to be incorrect.