Fast Data goes mainstream

OVUM VIEW

Summary

Fast Data, the velocity attribute of Big Data, is entering the limelight. With Oracle’s recent announcement that its Exalytics data platform has entered general release, in-memory databases have become the latest strategic battleground for major data platform providers. The emergence of Fast Data, made possible by a combination of bandwidth growth, commoditization of scale-out computing architectures, and declining memory prices, is not simply a technology solution looking for a problem. The explosion of data has created a new urgency for organizations across many sectors to gain a complete picture of their environment, rapidly. Technology capabilities and cost curves have hit the inflection point where Fast Data is entering the mainstream.

Fast Data has hit the inflection point

Ovum defines Fast Data as processing that requires lower latencies than are achievable with the optimizations typically performed on disk-based storage. Fast Data is not a single technology, but a spectrum of approaches that process data that might or might not be stored. Besides in-memory databases, Fast Data encompasses high-performance, low-latency complex event processing (CEP) applications, where data streams are processed in memory to detect patterns that would otherwise be too subtle or too fleeting to catch. It also encompasses hybrid approaches employed by advanced SQL analytic data platforms, which optimize the use of cache and disk in conjunction with columnar architectures that compress readily, reducing I/O and table-scanning overhead. Increasingly, Fast Data products are delivering on the promise of velocity in Big Data by embracing many of the appliance-based in-memory or hybrid architectures of advanced SQL platforms to accelerate processing of variably structured data.
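As a purely illustrative sketch of the CEP idea described above, the short Python fragment below keeps a bounded window of recent events in memory and flags a simple condition as the stream arrives; the feed, the window size, and the "three consecutive rises" rule are hypothetical assumptions for illustration, not features of any product named in this report.

    from collections import deque
    import random

    # Illustrative sketch only: a bounded in-memory window over an event stream,
    # checked against a hypothetical "three consecutive rises" pattern.
    # Nothing touches disk; events age out of the window automatically.

    WINDOW_SIZE = 5  # number of recent events held in memory

    def rising_streak(prices, streak=3):
        """True if the last `streak` prices are strictly increasing."""
        recent = list(prices)[-streak:]
        return len(recent) == streak and all(a < b for a, b in zip(recent, recent[1:]))

    def process_stream(ticks):
        window = deque(maxlen=WINDOW_SIZE)  # old events fall off automatically
        for symbol, price in ticks:
            window.append(price)
            if rising_streak(window):
                print(f"pattern detected for {symbol}: {list(window)}")

    # Simulated feed standing in for a realtime market or sensor stream.
    feed = [("ACME", 100 + i + random.random()) for i in range(10)]
    process_stream(feed)

A real CEP engine offers a far richer pattern language and much stronger throughput guarantees, but the principle is the same: patterns are evaluated over data held in memory as it streams past, rather than after it has been written to disk.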

Fast Data is nothing new, but Moore’s Law is taking the market mainstream

Fast Data technologies and applications are actually nothing new. In-memory data stores have been around since the early 1990s, but their cost has typically restricted them to highly specialized applications such as directories of router addresses for embedded systems, or high-performance event-streaming systems developed by investment banks for conducting high-speed trading triggered by patterns in realtime capital markets feeds. In sectors such as process manufacturing, defense, and aerospace, there have long been systems capable of deterministic realtime response that, of necessity, have relied on various forms of cache or memory. Furthermore, caching is hardly unknown to the database world: frequently accessed "hot spots" of data are often placed in cache, as are snapshots of database configurations kept to support restore processes, and so on.

What has changed is a combination of several factors. Growth in bandwidth, growth in memory density (circuits are roughly 10× denser today than a decade ago) and, of course, the continuation of Moore's law with commodity multi-core processors have brought the pricing of cache- and solid state-dependent systems within reach of mainstream markets.

Fast Data is the third ‘v’ of Big Data

Of the four “v”s that define Big Data (volume, variability, velocity, and value), Fast Data addresses the third, velocity, which has been primarily associated with:

  • low-latency, high-volume CEP systems
  • advanced SQL analytic platforms, which, for the most part, were initially designed for handling structured data
  • NoSQL data stores such as the document-oriented MongoDB or the wide-column Cassandra, which were designed for relatively simple operational tasks such as file retrieval or user profile maintenance, with only rudimentary analytics.

The explosion of data has ramped up the urgency of Fast Data in many sectors. Examples include border security, algorithmic or high-frequency trading, smart city management, and social media analytics of realtime consumer sentiment. For such organizations, Fast Data is no longer a luxury, but a necessity.

Vendors are planting their stakes

While most major IT data, application, and platform players have had specialized in-memory products for around a decade, Oracle and SAP are making in-memory the next major stage of their rivalry: Oracle Exalytics versus SAP HANA. Initially, both are targeting a mix of analytic processing applications, including OLAP and the running of more complex, multi-stage problems that would traditionally have required lengthy batch runs. In the long run, SAP views HANA as a cornerstone of its stretch goal of becoming the second-largest database player by 2015. (See the Ovum report “How SAP Could Challenge Oracle” for a more detailed discussion of SAP’s database strategy.)

The need for complexity and speed is also penetrating the so-called “mainstream” database market, as platforms add features such as compression, increased caching, and the use of high-speed backplanes to improve response. As SAP eventually broadens HANA to more transactional use cases, it will inevitably find Oracle Exadata and, we expect, flavors of IBM DB2, solidDB, and Microsoft SQL Server on its competitive radar.

The floodgates are opening to Fast Data solutions

There is growing awareness that it has become feasible to derive operational intelligence from non-traditional (and in most cases, variably structured) data sources such as sensor or machine data and social networking interactions, and demand for the raw speed needed to make that intelligence real has grown accordingly. In 2012, specialized platforms and solutions will emerge to address previously unanswered needs with Fast Data.

A good example is graph analysis, which deduces many-to-many relationships from such phenomena as social network activity, resource demand on urban infrastructure (e.g., transportation, water, or power), or healthcare diagnostics. For consumer brands, a typical problem might be identifying the opinion leaders within social tribes; for healthcare diagnostics, identifying which patients of similar age or gender share similar symptoms, and so on. YarcData, a spin-off from Cray Inc, has announced a specialized platform, designed exactly for this problem, that optimizes the use of cache and disk. YarcData is only the tip of the iceberg of the specialized solutions that will start emerging in 2012 as the Big Data community turns its attention to Fast Data.
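As a minimal, purely hypothetical sketch of the kind of many-to-many analysis involved, the fragment below builds a small "who mentions whom" graph and ranks users by in-bound edges as a crude proxy for opinion leadership; the data and the degree-based ranking are assumptions for illustration, not a description of YarcData's method.

    from collections import defaultdict

    # Illustrative sketch only: a tiny "mentions" graph with a degree-based
    # ranking standing in for opinion-leader detection on a social network.

    mentions = [  # (follower, influencer) pairs from a hypothetical social feed
        ("ann", "dana"), ("bob", "dana"), ("cid", "dana"),
        ("dana", "eve"), ("bob", "eve"), ("ann", "bob"),
    ]

    in_degree = defaultdict(int)
    for follower, influencer in mentions:
        in_degree[influencer] += 1  # count in-bound mentions per user

    # Rank users by how often they are mentioned: a crude "opinion leader" score.
    leaders = sorted(in_degree.items(), key=lambda kv: kv[1], reverse=True)
    print(leaders)  # [('dana', 3), ('eve', 2), ('bob', 1)]

Real workloads of this kind involve billions of edges and traversal patterns that defeat disk-based scanning, which is precisely the pressure that cache- and memory-optimized platforms of the kind YarcData describes are intended to relieve.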

APPENDIX

Disclaimer

All Rights Reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publisher, Ovum (an Informa business).

The facts of this report are believed to be correct at the time of publication but cannot be guaranteed. Please note that the findings, conclusions and recommendations that Ovum delivers will be based on information gathered in good faith from both primary and secondary sources, whose accuracy we are not always in a position to guarantee. As such Ovum can accept no liability whatever for actions taken based on any information that may subsequently prove to be incorrect.