IBM optimizes R for Netezza TwinFin appliance

Advanced analytics was in the spotlight on March 14 as Predictive Analytics World kicked off in San Francisco. The event coincided with a notable announcement: IBM’s plans to include the R statistical programming language in its Netezza TwinFin Data Warehouse Appliance. An enterprise-ready version of the open source R statistical programming language, commercialized by Revolution Analytics, will be ported to run on TwinFin. The optimized interoperability will enable customers to perform predictive analytics managed by the high-speed data warehouse appliance. Integration work is already under way, and selected customers should be able to beta test the product in the coming months. This interoperability provides a commercially accepted and supported version of R for TwinFin. It also endorses the need for serious, advanced analysis of Big Data sets to be underpinned by a scalable data warehousing infrastructure. In addition, it demonstrates IBM’s continued leadership in predictive analytics and its efforts to make this traditionally complex (and expensive) technology mainstream.

Netezza appliance customers now have more options for predictive analytics

This integration will enable IBM Netezza customers to engage in predictive analytics within their data warehouse appliances by using the “Enterprise” version of the open source R statistics language developed by Revolution’s commercial division. R was developed in academia, but is starting to carve out a niche in the business world (finance, life sciences, retail, media, and manufacturing) as an enterprise-ready software product, thanks to the efforts of Revolution. However, R is more akin to a programming language than to a comprehensive statistical package such as SPSS or SAS. Because the open source community authors many of the R-based functions, these vary in style.

Netezza customers now in fact have three options for predictive analytics. Firstly, open source R is included in the TwinFin i-Class offering at no cost. Secondly, Revolution and Netezza are providing interoperability (rather than integration, since the core database kernel remains unchanged) between Enterprise R and TwinFin for an associated fee. Lastly, SPSS customers can tap into Netezza to push SQL processing back into the database, with further integration planned for mid-2011.

Ovum believes this is a logical partnership on two counts. Firstly, the statistical power of R needs to be married to an economically scalable, high-speed data processing platform if it is to deliver timely, practical business insights. By pushing R code directly into the Netezza data warehouse appliance, users effectively bring analytic processing closer to the data. This, in turn, reduces the likelihood of data movement bottlenecks (and additional infrastructure investment barriers). Secondly, IBM gives R a degree of validity and reach in the market that Revolution could not have hoped to achieve by itself.
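The data movement argument can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the appliance, and a hypothetical `sales` table with invented values; it does not use any Netezza or Revolution API. The same per-region mean is computed twice: once by pulling every row out to the client, and once by pushing the calculation into the database so that only the result comes back.

```python
import sqlite3


def build_demo_db(n_rows=10_000):
    """Create an in-memory database with a hypothetical 'sales' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [
        ("east" if i % 2 == 0 else "west", float(i % 100))
        for i in range(n_rows)
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def mean_client_side(conn):
    """Pull every row to the client, then aggregate there."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in rows:
        running_sum, count = totals.get(region, (0.0, 0))
        totals[region] = (running_sum + amount, count + 1)
    means = {region: s / c for region, (s, c) in totals.items()}
    return means, len(rows)  # second value: rows moved to the client


def mean_in_database(conn):
    """Push the aggregation into the database; only results move."""
    rows = conn.execute(
        "SELECT region, AVG(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)  # second value: rows moved to the client
```

Calling `mean_client_side(build_demo_db())` and `mean_in_database(build_demo_db())` gives the same means, but the first moves one row per record across the database boundary while the second moves one row per region. The gap widens as the table grows, which is the bottleneck the in-database approach is meant to avoid.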

There is certainly scope for “co-opetition” on Netezza between Enterprise R, SPSS, SAS, and other third-party in-database analytics, and IBM seems happy to continue promoting that over its own software. However, it is important to note that Enterprise R is not being offered as a preconfigured part of TwinFin. This is a co-selling model, in which IBM will sell TwinFin and Revolution will sell Enterprise R. The companies will nonetheless engage in joint sales and marketing around the interoperability, which is scheduled for release later in 2011.

Convergence between predictive analytics and Big Data has been on the cards for some time

Ovum interprets this partnership as yet another Big Data move. Companies collect data for two reasons: compliance or analysis. Predictive analytics is a process-intensive task that requires a platform able to query and analyze data efficiently. By integrating R into the TwinFin appliance, IBM is promising users much faster query results against huge data sets, without moving that data to another discrete system or cluster of commodity servers. Big Data analysis usually comes at a cost. However, one of Netezza’s biggest selling points has been a highly competitive price-performance proposition, which makes the Enterprise R interoperability more than just an attempt to offer SPSS-like predictive analytic capabilities to the masses more cheaply.

Significantly, partnerships such as these also underscore what is still a nascent movement towards NoSQL data management and analysis. It will be interesting to see how IBM and Revolution develop this relationship as a way to analyze Big Data held in frameworks such as Hadoop.

Ovum believes R and Hadoop are slowly creeping into the mainstream, with early momentum noted in financial services and life sciences. However, how companies will eventually combine these two technologies is still an open question, and it perhaps represents the next big wave of BI and analytic technologies and approaches in terms of how organizations will collect and restructure data prior to analysis. There is plenty of action on the supply side of the market to suggest this:

  • Teradata acquired Aster Data.
  • ParAccel plans to build a Hadoop connector to its analytic database.
  • Cloudera is partnering with several data warehousing vendors (including IBM Netezza) to integrate with Hadoop environments.

These all highlight the need for high-performance appliance and server infrastructures to keep up with R and Hadoop processing frameworks that work against Big Data. The flip side to this is a growing movement of support for MapReduce as a non-SQL programmatic path to querying SQL and NoSQL data. Ovum believes that in the future, most analytic database platforms will support MapReduce in some capacity.
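For readers less familiar with MapReduce, the following is a minimal single-process sketch of the programming model, not of Hadoop or of any vendor's implementation. The function names and the sample records are hypothetical, chosen only for illustration. A mapper emits key-value pairs, a shuffle step groups the values by key, and a reducer collapses each group to a result, with no SQL involved.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to each record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values to a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def map_reduce(records, mapper, reducer):
    """Run the three phases in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Hypothetical example: total transaction amount per industry sector.
def sector_mapper(record):
    yield record["sector"], record["amount"]


def sum_reducer(key, values):
    return sum(values)


SAMPLE_RECORDS = [  # invented data, for illustration only
    {"sector": "financial services", "amount": 120},
    {"sector": "life sciences", "amount": 75},
    {"sector": "financial services", "amount": 30},
    {"sector": "life sciences", "amount": 25},
]
```

Here `map_reduce(SAMPLE_RECORDS, sector_mapper, sum_reducer)` plays the role of a `GROUP BY` with `SUM`. In a distributed framework the map and reduce phases would run in parallel across many nodes, which is where the demand for high-performance infrastructure arises.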

Open water is starting to appear between IBM and analytic rivals

IBM is starting to open up a clear lead in predictive analytics over rivals SAP BusinessObjects and Oracle, although both of these vendors will probably add support for R, as well as Hadoop and MapReduce, in the near future.

IBM has an enviable portfolio of analytic resources. In addition to its support for R, the company agreed to acquire SPSS in July 2009 for $1.2bn; it is perhaps no coincidence that SPSS co-inventor Norman Nie is Revolution’s CEO. IBM has also embraced R and Python for some time. Code in either language can be wrapped into SPSS predictive analytics solutions so that it takes on the appearance of a standard procedure, which is then easily invoked through the SPSS interface.

Around the same time, IBM launched its Business Analytics and Optimization (BAO) services division, which represented its biggest foray into business consulting since its PwC acquisition. In addition, IBM has its vast Global Services division, with a wealth of technology and industry expertise.

Admittedly, rivals will be hard pressed to match the breadth and depth of IBM’s technology services arsenal. Oracle and SAP are also earmarking BI/analytics as a growth area. Both have recently unveiled impressive releases of their core BI platforms, OBIEE and SAP BusinessObjects 4.0, respectively, and both have been talking up predictive analytics. Although SAP BusinessObjects has been OEMing SPSS since 2007, neither vendor has yet made a predictive analytics acquisition of its own. Ovum believes both will acquire rather than develop their future predictive analytics capabilities. The problem is that they are hardly spoiled for choice in the market. Two options are available: invest in a fledgling startup (not unlike Revolution), or aim for the skies and look to acquire SAS Institute. SAS is by far the nearest predictive analytics rival to IBM in terms of technology depth; it has more or less been the last word in advanced statistical analysis for the past three decades. However, SAS still has its own challenges, not least of which is to put a friendlier business face on its tools and technologies.

The Enterprise R and TwinFin interoperability certainly makes advanced and Big Data analytics accessible to a broader audience, to the point that it might even force SAS (and even SPSS) to redefine its positioning. Additionally, the advent of open source software such as R, open access to Internet data stores (Google, Facebook, Twitter, et al.), and cloud computing also promises a low-cost path to Big Data analytics. That path is open even to organizations at the low end of the market that want to engage in high-end analytics but often find they are priced out. Parallels can be drawn with the evolution of Java EE application servers, in which open source and alternative frameworks provided low-cost paths that many organizations considered “good enough” to satisfy advanced needs.