SAS adds in-memory to high-performance computing



SAS is close to rolling out an expanded set of high-performance computing (HPC) solutions, which it announced in April 2011. The HPC solutions use existing SAS technologies around grid computing and in-database processing to speed up analytic processing. The company has since added a third element – in-memory computing – which gives IT organizations more architectural options for delivering rapid analytics to end users. SAS already has technologies and products in all three areas, and has now built a trio of HPC solutions that are delivered through select technology partners. In consultation with SAS, customers can decide which high-performance approach – grid, in-database, or in-memory – to employ.

The business case for a higher-performance analytic infrastructure is becoming clear

SAS has devised an HPC framework aimed at several challenges faced by its analytic customers: managing Big Data (rapidly growing data volumes and rising data complexity), meeting expectations of real- and near-realtime response, and overloaded or underused computing hardware resources.

Traditional enterprise data warehousing systems that have served companies well in the past are starting to crack under the load, particularly as new types of data (characterized by high volume, velocity, and variety) come flooding into corporate IT systems and databases.

Analytical processing places different demands on infrastructure than routine reporting, and those limitations have driven the market to supplement the enterprise data warehouse (EDW) with storage and processing architectures better suited to analytics.

SAS’s answer is HPC. However, Ovum believes it should be talking about enabling high-performance analytics (HPA) in different ways. The key aspects of HPA are speed and completeness – that is, enabling rapid ad hoc modeling against complete data sets. Analytical modeling, which is still a grossly misused and misunderstood term in IT, is the foundation of analytics. Being able to move that process along quickly and populate models with data is key. Removing guesswork – reducing the time spent investigating and modeling the data – from what is already a complex equation is also paramount. One way to achieve that is to enable greater insight by using all the data and all the variables, allowing increased variable interactions, or a combination thereof.

Hence, HPA systems need to support modeling against not only complete data sets, but also the number of variables and interactions required to deliver better predictive models for risk analysis, fraud detection, optimization, and other needs. “Complete” is the key word here; in the past, advanced analytics required sampling of data. Of course, complete data is synonymous with Big Data. However, the main implication for HPA is that it enables customers to model, and with any luck solve, problems they might not have been able to before.

Ovum envisages a number of scenarios, some of them classic SAS use cases, in which this might be applicable, particularly across sectors such as banking, insurance, and retail, which are awash in terabytes of data and require complex analytic modeling.

High on the list of scenarios that SAS is targeting are: reconciling varied analysis needs, particularly for advanced analytic routines that choke the performance of daily BI reporting jobs; reducing the need to move large amounts of data to and from data warehouses or specialized analytic appliances or databases; and specialized analytics such as predictive risk analysis, fraud detection, price optimization, and investment portfolio valuation and optimization, which require complex modeling and scoring, and rapid return of results (often in seconds). These scenarios are ideally served by an HPC infrastructure built around grid, in-database, and in-memory technologies.

There are now three main parts (or options) to HPC

To meet these demands, SAS has pulled together three very different types of processing architectures into HPC. They are based on old and new technologies:

  • Grid computing. SAS is building on an existing offering called SAS Grid Manager, which was launched approximately five years ago. This offering OEMs technology from Platform Computing, a vendor of cluster, grid, and cloud management software, to create a distributed grid environment that provides workload balancing, high availability, and parallel job execution across multiple servers, with shared physical storage to process large volumes of data and analytics programs.
  • In-database processing. Historically, SAS only partially exploited the capabilities of individual databases to bump up performance, preferring a common-denominator approach. However, that is changing. Approximately four years ago the company announced its SAS In-Database initiative, which initially targeted Teradata but has since expanded to a wider range of database providers such as IBM DB2 and Oracle, as well as more specialized analytic database providers such as IBM/Netezza, EMC/Greenplum, and Teradata/Aster Data. Not all of the Base SAS procedures have been translated into SQL for execution in the database, but SAS is working on expanding that range. SAS has in-database enabled the analytic data preparation, model development, and model deployment steps to leverage the massively parallel processing environment offered by these databases. It has also developed an extensive in-database portfolio for Scoring Accelerator, Analytics Accelerator, and Anti-Money Laundering, and has even extended the ELT capabilities in its enterprise data integration product to take advantage of in-database processing.
  • In-memory analytics. SAS High-Performance Analytics, based on its in-memory processing portfolio, was announced in 2011 and is delivered as an HPC appliance in conjunction with partners such as EMC/Greenplum and Teradata. However, SAS products such as SAS High-Performance Risk and SAS High-Performance Markdown Optimization also incorporate in-memory as software-only solutions. Regardless, in-memory is now the latest addition to SAS’s HPC framework, allowing SAS Analytics (and even entire analytic applications) to run in RAM on MPP-driven database appliances. The software distributes computational processing in memory and in parallel across a dedicated set of blade server nodes that communicate via the message passing interface (MPI). The first SAS in-memory systems are aimed at finance (for risk management) and retail (for markdown optimization).
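The core idea behind the in-memory option – partitioning a data set across nodes that each hold their slice in RAM, computing locally in parallel, then combining the partial results – can be illustrated with a small sketch. This is not SAS code: Python’s multiprocessing stands in for the MPI-based distribution across blade nodes, and the statistics computed are purely illustrative.

```python
# Illustrative sketch only: the SAS appliance distributes work via MPI across
# blade server nodes; here Python worker processes stand in for those nodes.
from multiprocessing import Pool

def partial_stats(chunk):
    # Each "node" computes local sums over its in-memory partition.
    return (sum(chunk), sum(x * x for x in chunk), len(chunk))

def distributed_mean_var(data, nodes=4):
    # Partition the data set across nodes, compute in parallel, then reduce.
    chunks = [data[i::nodes] for i in range(nodes)]
    with Pool(nodes) as pool:
        parts = pool.map(partial_stats, chunks)
    s = sum(p[0] for p in parts)
    ss = sum(p[1] for p in parts)
    n = sum(p[2] for p in parts)
    mean = s / n
    return mean, ss / n - mean * mean

if __name__ == "__main__":
    mean, var = distributed_mean_var(list(range(1000)))
    print(mean)  # 499.5
```

The pattern matters more than the arithmetic: because each node only ever touches its own partition, the computation scales out by adding nodes rather than scaling up a single machine.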

SAS is offering customers different options for HPC

To enable HPC, SAS has had to dissect and in some cases retrofit its software to fit some of these processing architectures.

Grid computing is certainly one of these. The push to grid-enable more of the SAS software portfolio gives customers a way to create and run a centrally managed analytic infrastructure that scales out and delivers high availability, responsiveness, workload management, and scheduling at the same time. Although SAS Grid Manager is the linchpin product for this, other SAS technologies such as Data Integration Studio and Enterprise Miner are also pre-calibrated for parallel processing in the SAS grid environment. A handful of other SAS products, including Enterprise Guide and Risk Dimensions, can fire off processing tasks to a grid for shared computing resources. This is done simply by appending a few lines of “grid-enabling” code, or so SAS claims. The “SAS grid” is underpinned by a framework that lets a SAS Metadata Server connect to multiple SAS servers interlinked with shared storage/SAN capabilities and relational databases. Analytic models are then targeted and run across this shared pool of resources.
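The grid pattern described above – independent analytic jobs submitted to a centrally managed pool of workers that balances the load and collects results – can be sketched as follows. Again, this is a conceptual Python stand-in, not SAS’s grid-enabling syntax; the job names and the stand-in “model” are invented for illustration.

```python
# Conceptual sketch of grid-style workload balancing: independent analytic
# jobs are farmed out to a shared pool of workers, much as SAS Grid Manager
# dispatches jobs across networked SAS servers. All names are illustrative.
from concurrent.futures import ProcessPoolExecutor, as_completed

def run_model(job_id, observations):
    # Stand-in for one analytic job (e.g., scoring one portfolio segment).
    return job_id, sum(observations) / len(observations)

def run_on_grid(jobs, workers=4):
    results = {}
    with ProcessPoolExecutor(max_workers=workers) as grid:
        futures = [grid.submit(run_model, jid, obs) for jid, obs in jobs.items()]
        for f in as_completed(futures):
            jid, score = f.result()
            results[jid] = score
    return results

if __name__ == "__main__":
    jobs = {f"segment{i}": [float(x) for x in range(i, i + 10)] for i in range(8)}
    print(run_on_grid(jobs))
```

The scheduler (here, the executor) is what delivers the workload balancing and high availability the text describes: jobs queue against whichever worker is free, and a failed job can be resubmitted without disturbing the others.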

SAS increasingly sees its in-database (or, more precisely, in-data warehouse) processing as a way to tackle the highly dynamic issues of data movement that occur in high-volume operational analytics environments. Reducing unnecessary data movement and latency is the goal, and putting analytic processing (both the data preparation and even a large chunk of the modeling) as close to the data as possible – that is, in the database or data warehouse – is seen as the answer. SAS’s in-database strategy continues to expand, with more integrated functions and databases supported. SAS has worked hard on the integrations to allow an increasing number of analytic products, such as Base SAS, SAS/ACCESS, Scoring/Analytics Accelerator, and SAS Anti-Money Laundering, to connect and tap into the native engines of the major RDBMS vendors, the leading hardware and software warehouse appliances, and several columnar databases, including Oracle, Aster Data, EMC/Greenplum, IBM/Netezza, Teradata, and others. SAS takes full responsibility for engineering the integrations, though the depth of each will depend on customer demand and the depth of the partnership. Currently the level of support can be split between simple and advanced functions, the latter adding procedures such as data preparation, data exploration, and analytic modeling that run end to end.
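The essence of the push-down approach is easy to show in miniature: instead of extracting rows to a client and scoring them there, the scoring formula is translated into SQL and executed inside the database engine, next to the data. In this sketch sqlite3 stands in for the MPP warehouses SAS actually targets, and the table and model coefficients are invented for illustration.

```python
# Sketch of in-database scoring: the model's linear formula is pushed down
# as SQL, so only the small result set (not the raw rows) leaves the engine.
# sqlite3 and the coefficients below are illustrative stand-ins.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, income REAL, debt REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, 50000.0, 12000.0), (2, 80000.0, 40000.0)])

# A simple linear risk score, expressed in SQL and evaluated in-database.
b0, b_income, b_debt = 0.5, -0.00001, 0.00004  # invented coefficients
rows = conn.execute(
    "SELECT id, ? + ? * income + ? * debt AS risk_score "
    "FROM customers ORDER BY id",
    (b0, b_income, b_debt),
).fetchall()
for cid, score in rows:
    print(cid, round(score, 3))
```

Only the per-customer scores cross the wire; in a warehouse holding billions of rows, that difference is exactly the data-movement saving the in-database strategy is after.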

The addition of in-memory comes as no surprise given the range of general-purpose analytic products on the market that now incorporate that technology – including QlikTech, IBM Cognos (TM1), Tibco Spotfire, and MicroStrategy. These are more BI reporting-focused solutions. Integrating SAS In-Memory as an HPC appliance is a more immediate response to, perhaps, SAP’s HANA (High-Performance Analytic Appliance) and Oracle’s newly unveiled Exalytics appliance, with the key differentiator being the ability to do complex (often predictive) modeling in memory as well.

Like many of its analytic rivals, SAS believes in-memory will be a game changer for high-end analytics. One differentiator for SAS could be its application-specific strategy, in which SAS brings in-memory processing to bear on a raft of horizontal and vertical analytic applications packaged up as an appliance. SAS’s move into in-memory could also be a sign that it has reached a ceiling with its in-database HPC strategy, particularly for analytic modeling, which is difficult to parallelize in some MPP database systems such as Teradata. This problem used to stem from the limitations of user-defined functions, which SAS has since replaced with an approach that involves a SAS Embedded Process in the database (basically a lightweight SAS server) running DS2 code (a new parallel, object-oriented language) directly in that database. Clearly there is still a strong need for in-database processing, and it is a complementary option to an in-memory approach. Ovum envisages that both techniques will be used through the end-to-end process of modeling.

Will customers be stumped for choice?

HPC will provide flexible options for customers that wish to pursue SAS’s high-performance analytics vision. However, the challenge will be helping them determine which option is best suited to their specific needs. Do customers process data over a grid, in-database, or in-memory?

Ovum believes the architecture can be hybrid, especially since many customers will have mixed workloads and tasks, multiple users with varying skill sets, and different business problems they are trying to resolve. This requires a flexible architecture, whereby certain types of complex analytics that require fast processing and response are farmed out to one of SAS HPC’s grid, in-database, or in-memory options across a single server-based system that is separate from routine BI analysis and reporting. However, we also believe the company (and customers) should tread carefully into in-memory analytics, particularly the breed of RAM-centric appliances that SAS and others have in mind. Pushing heavy-duty parallel processing of large data sets directly into memory is no mean feat. It also raises questions about data quality and consistency.



All Rights Reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publisher, Ovum (an Informa business).

The facts of this report are believed to be correct at the time of publication but cannot be guaranteed. Please note that the findings, conclusions and recommendations that Ovum delivers will be based on information gathered in good faith from both primary and secondary sources, whose accuracy we are not always in a position to guarantee. As such Ovum can accept no liability whatever for actions taken based on any information that may subsequently prove to be incorrect.