Aster Data helps Teradata embrace Big Data

OVUM VIEW

Summary

Despite running the same week as Oracle OpenWorld, Teradata’s annual Partners User Group Conference & Expo, held on October 2–6, still drew approximately 3,500 delegates (up on 2010’s attendance), making it the largest data warehousing gathering in the world. The headlining act was, unsurprisingly, Big Data: specifically Aster Data, the analytics software vendor that Teradata acquired earlier in 2011. That move, and the subsequent focus on Big Data, is predictable given Teradata’s heritage in high-end data warehousing. What is interesting is how Teradata is positioning Aster Data’s analytics technology: as a bridge between the Hadoop/MapReduce programming model for processing Big Data sets and traditional enterprise data warehousing (EDW). Hadoop’s strength is the simplicity of its file system, which can have any type of data thrown at it. Its weaknesses are that it is still largely a hand-coded model, that it lacks enterprise-grade management tools, and that there is not yet a deep pool of developers and users with MapReduce skills. Teradata wants its customers to have the best of both worlds: the no-data-boundaries world of NoSQL-based Big Data analytics on the one hand, and the value of their traditional EDW investments on the other.

Aster Data pushes MapReduce into Teradata’s data warehousing strategy

In March 2011 Teradata paid $263m for Aster Data, one of the more innovative analytics startups, to kick-start its own Big Data strategy. That should come as no surprise. Teradata, as a high-end data warehousing vendor, has always focused on large data sets. However, the company also carried the stigma of being a conservative, “old-school” vendor, focused on traditional SQL-based enterprise data warehouses used by the likes of Wal-Mart.

The advent of Big Data, and the surrounding technology hype around Hadoop and MapReduce, has forced Teradata to sit up and take note of this radically different processing model for analyzing huge data sets. The challenge Teradata is now addressing is bringing Big Data to traditional EDW-driven analysis: specifically, enabling new classes of analysis, driven by new types of NoSQL-based algorithms, to run alongside traditional data sets.

The opportunity that Teradata has spotted is to sit in the middle, between its own EDWs and Hadoop clusters. The company is now using Aster Data’s original nCluster appliance to provide a bridge between its own scalable, SQL-based data warehousing platform and the NoSQL Hadoop/MapReduce programming framework. Hadoop is an open-source implementation of the MapReduce paradigm. Aster Data has implemented MapReduce natively on a relational database model, which allows developers to write MapReduce functions as procedural functions that can then be invoked from SQL. The Aster Data SQL-MapReduce framework combines SQL with prepackaged MapReduce modules that run through Aster Data’s integrated MapReduce environment, and the results can be accessed through standard business intelligence (BI) and analytic tools.
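To make that model concrete, the short Python sketch below shows the kind of row-processing function such a framework wraps. It is a minimal illustration only: the function name, the table name, and the SQL invocation shown in the comment are assumptions for the example, not actual Aster SQL-MapReduce syntax, and real SQL-MapReduce functions are written and installed in the database rather than run as standalone scripts.

```python
# Hypothetical sketch of a SQL-MapReduce-style table function. The framework
# would invoke it from a query along the lines of (illustrative only):
#   SELECT domain, hits FROM count_by_domain(ON weblog PARTITION BY domain);
from collections import Counter
from typing import Iterable, Iterator, Tuple

def count_by_domain(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Consume a partition of (domain, url) input rows and emit one
    (domain, hit_count) output row per domain."""
    hits = Counter(domain for domain, _url in rows)
    for domain, count in hits.items():
        yield (domain, count)

# The framework, not the analyst, distributes partitions across nodes;
# the analyst simply sees a table-valued function callable from SQL.
if __name__ == "__main__":
    sample = [("example.com", "/a"), ("example.com", "/b"), ("other.org", "/")]
    print(list(count_by_domain(sample)))  # [('example.com', 2), ('other.org', 1)]
```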

So why the need to step outside SQL in the first place? First, SQL is not well suited to analyzing log files and other unstructured data (such as the Gmail messages that Google analyzes to push targeted ads). Second, SQL struggles to meet the expected response times for complex multidimensional queries, which typically require complex joins and multiple passes of SQL; it can be done, but it is not elegant. Third, the MapReduce framework allows an analysis task to be broken down into distributed computing steps that can be defined in far less code than the equivalent SQL.
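As a rough illustration of that third point, the toy script below decomposes a log-file analysis into map, shuffle, and reduce steps within a single Python process. The log format and field positions are invented for the example; on Hadoop the same two functions would run in parallel across a cluster.

```python
# Toy MapReduce decomposition: count failed requests per client IP from raw,
# semi-structured log lines (the log format here is an assumption).
from collections import defaultdict

LOG_LINES = [
    "10.0.0.1 GET /checkout 500",
    "10.0.0.2 GET /home 200",
    "10.0.0.1 GET /checkout 500",
]

def map_fn(line):
    """Parse one raw log line and emit (client_ip, 1) for each failed request."""
    ip, _method, _path, status = line.split()
    if status.startswith("5"):
        yield ip, 1

def reduce_fn(key, values):
    """Collapse all values seen for one key into a single aggregate."""
    return key, sum(values)

def run(lines):
    groups = defaultdict(list)              # the "shuffle": group values by key
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return [reduce_fn(k, v) for k, v in groups.items()]

print(run(LOG_LINES))                       # [('10.0.0.1', 2)]
```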

It would also be undesirable to force developers to “unlearn” SQL and relearn everything in NoSQL terms. Teradata’s Aster Data implementation eliminates most of those switching costs. Teradata customers eyeing Big Data analysis therefore benefit: they can execute MapReduce code while retaining the enterprise-grade administration of the SQL-based data warehousing environments that most companies are comfortable with.

However, Hadoop and MapReduce bring problems of their own. Hadoop’s underlying Hadoop Distributed File System (HDFS) might be flexible, but it lacks the resilience and high availability/failover that IT organizations expect, along with service-level agreements, security, and data quality controls. Cloudera and other “commercial” distributions of Hadoop have solved some of these problems by adding management capabilities such as high availability and the elimination of single points of failure.

Although MapReduce can be powerful, that power comes at a price: MapReduce functions are usually written in programming or scripting languages such as Java, Python, C, or Perl, which are more the domain of IT developers than business analysts. Several IT luminaries have already labeled MapReduce a backward step in programming evolution because it (once again) strips away the layer of abstraction that insulates analytic tools and end users from the data and database layer.

Nevertheless, the architectural fit between the two platforms is tight. Teradata and Aster Data have similar petabyte scale-out, shared-nothing, massively parallel processing architectures. Both also support elastic cloud resource provisioning for mixed workloads in federated EDW deployments. Significantly, both now integrate with external HDFS clusters for analytics against unstructured sources.

Teradata releases Aster MapReduce appliance and next-generation data warehousing appliance – but separately

Aster Data was a company primarily focused on evolving MapReduce into a development option that corporate IT could adopt, in contrast with the army of Hadoop developers currently hand-rolling their own analytics. Aster Data’s patented SQL-MapReduce framework within the Aster nCluster database (later branded Aster Database 5.0) was the first proof point that MapReduce could be reconciled with SQL and work effectively with traditional RDBMSs.

Following the Aster Data acquisition, Teradata has been busy developing a new generation of engineered Big Data offerings. At the end of September 2011 it released its first Aster Data-based database and MapReduce implementation for Big Data, the Teradata Aster MapReduce Platform. This is essentially an adaptive bridge between the MapReduce and traditional EDW worlds that allows Teradata customers to add their own custom MapReduce modules or import existing MapReduce jobs. The platform is available as software, as a cloud service, or in pre-tuned appliance form.

At Partners, Teradata also announced the fifth generation of the Teradata Data Warehouse Appliance 2690, which runs the Teradata 13.10 database and higher (meaning customers are not forced onto the latest Teradata 14 release). The appliance is still designed to be deployed separately, in the classic Teradata EDW use case. However, Ovum would like to see an integrated option for 2690 customers to run the Aster SQL-MapReduce Framework as part of their licenses; currently they need to purchase the separate Teradata Aster MapReduce Appliance.

The 2690 appliance triples the data storage capacity and doubles the performance of its predecessor, thanks to new compression algorithms that operate at the data storage block level to reduce space and, in the upcoming Teradata Database 14, Teradata Columnar. It can be configured from 2TB up to 315TB of uncompressed user data per cabinet and can process data at more than 38GB per second per cabinet. The guts of the system are SUSE Linux Enterprise Server 10 SP3 and two six-core Intel Xeon X5675 processors. The appliance is designed to meet a diverse range of analytic needs and use cases that require incremental levels of performance.

Teradata Columnar is a relatively new capability introduced in Teradata Database 14 that allows hybrid row-column table processing. To ease management of this hybrid environment, Teradata has also revamped the interface to make it more business-user friendly; you do not need to be a database administrator to navigate the data. Teradata also claims that the appliance uses 60% less energy and has a 50% smaller footprint than previous versions.
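For readers less familiar with the columnar idea, the short Python sketch below is a purely conceptual illustration, not Teradata’s implementation: an analytic query that aggregates a single column only has to touch that column’s block in a columnar layout, rather than scanning every full record as a row layout requires.

```python
# Conceptual contrast between row-oriented and column-oriented storage.
rows = [  # row store: each record kept whole (good for point lookups/updates)
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "APAC", "amount": 75.5},
    {"order_id": 3, "region": "EMEA", "amount": 42.0},
]

columns = {  # column store: each column kept (and compressed) as its own block
    "order_id": [1, 2, 3],
    "region": ["EMEA", "APAC", "EMEA"],
    "amount": [120.0, 75.5, 42.0],
}

total_from_rows = sum(r["amount"] for r in rows)   # scans every full record
total_from_columns = sum(columns["amount"])        # reads one column block only
assert total_from_rows == total_from_columns == 237.5
```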

The Teradata 2690 appliance will be available in the first quarter of 2012. Pricing has not yet been set. Admittedly, it is getting easier for customers to get lost in the maze of branding around Teradata’s ever-expanding appliance family, which now includes Data Mart, Extreme, Data Warehouse, Extreme Performance, and Active Enterprise. However, Ovum expects to see more integration with Aster Data across many of Teradata’s appliance lines during 2012.

Aster Data will reinforce Teradata’s reputation at the high end, but Teradata needs to keep an eye on the low end

The Aster Data acquisition is not about strengthening Teradata’s presence in the mid-market EDW segment; it is clearly about bumping up the company’s high-end credentials. At the same time, it will help Teradata build a reputation as an innovative, rather than conservative, force in EDW, both now and in the future.

Following conversations with Teradata executives, Ovum believes the company is ready to commit considerable R&D resources to Aster Data and is working hard to fold Aster Data’s development team into its own development labs. Hence, Teradata is likely to pull back on its cooperative agreement with IBM on BigInsights as a route for bringing MapReduce and Hadoop to customers.

Aster Data will certainly benefit from Teradata’s global marketing, professional services, and partner reach. Ovum expects Teradata to continue rationalizing the combined portfolio, with Aster Data’s SQL-MapReduce API becoming a Teradata “standard” that allows other products and appliances across the portfolio to tap MapReduce functionality where it makes sense to do so. At the same time, Teradata has learned to lower its sights. The company has addressed the competitive challenge at what Ovum refers to as “the low end of the high end” of EDW, against rivals such as IBM Netezza, with aggressive price cutting and an expanded set of appliance options, placing particular emphasis on its 2600-series EDW appliances. It should continue to do so, since this is perhaps a better growth opportunity than its traditional high end.

Ultimately, the success of Teradata’s Big Data foray will depend on its ability to align more closely with the needs of the business analyst (and the technical needs of BI and data warehousing professionals) rather than those of the developer. If that happens, Ovum believes Hadoop/MapReduce will hold great promise for large-scale BI, analytics, and EDW.

APPENDIX

Disclaimer

All Rights Reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publisher, Ovum (an Informa business).

The facts of this report are believed to be correct at the time of publication but cannot be guaranteed. Please note that the findings, conclusions and recommendations that Ovum delivers will be based on information gathered in good faith from both primary and secondary sources, whose accuracy we are not always in a position to guarantee. As such Ovum can accept no liability whatever for actions taken based on any information that may subsequently prove to be incorrect.