March 29th, 2012 11:00

What advice would you give EMC customers looking for improved performance and/or big data analytic warehouses considering SAP HANA?

SAP’s HANA combines compute, memory, compression, columnar indexing, non-disruptive updates, parallel processing, and advanced Intel chip functionality.  Combined with SAP Business Objects and Mobility, that makes a compelling business case for a one-stop provider that only gets better as memory prices plummet and clustering increases in-memory scalability.

SAP and EMC customers are asking compelling questions and looking for guidance.  What advice would you provide to answer these questions?

  1. Is the in-memory market stable enough to jump in for any of the major vendors?
  2. Besides SAP HANA, Exadata, Teradata, Netezza, and Greenplum, are there other vendors that can do high performance analytics?
  3. Should I wait for HANA to mature if I have the luxury and if so until when?
  4. If I decide I need the analytical performance now what are the best criteria for picking a vendor?
  5. What questions should I ask each vendor?           
  6. What are the strengths and weaknesses of each solution from storage only analytics to all memory and everything in between?
  7. What are the key differences between HANA, Exadata, Teradata, Netezza, and Greenplum?
  8. Does one solution handle unstructured better than another?
  9. Are any of these solutions better for a true cloud solution including mobility from cloud to cloud?
  10. Do these products address true data integration and archive retrieval?
  11. How can EMC best become an in-memory appliance player no matter what the customer chooses?

April 4th, 2012 08:00

From: Lavallee, John
Sent: Wednesday, April 04, 2012 10:59 AM
To: Stone, Allan
Subject: RE: ECN Discussion Any comments?

Hello Allan,
Did you get any answers?

I have asked the same question a few times and got some good answers.

Once during an SAP ecosystem call focused on HANA (a good while ago, organized by James Franklin)

And more recently when I attended an EBC in Hopkinton at a Greenplum meeting with George Radford, Field CTO.

I'd recommend that you contact George.

In summary:

They are not actually competing with each other currently.

Those going with the "appliance" (HANA) are focused on accelerating SAP reports in a specific SAP location.

The play here is relatively simple – plug/play from an installation and professional services offering.

HANA pulls from the SAP instances / data and accelerates reporting.

Those interested in using GP are pulling together SAP + other diverse data sources to create a different kind of pool to have a "bigger picture". 

It's a larger play (from an ETL perspective, NOT from a cost perspective) with a different kind of Professional Services engagement and a different result: larger and more powerful, yet a very different engagement.

Regards

John

John Lavallée Partner Technology Consultant - Focus Partners EMEA

EMC Deutschland GmbH

Mobile:      +49-163-6895020

April 5th, 2012 10:00

1.  Is the in-memory market stable enough to jump in for any of the major vendors?

In-memory databases are a recent but not brand-new phenomenon.  HANA is pretty new, originally announced in 2009, as I recall.  Other products, particularly popular in markets like Financial Services, have been around for years, offering in-memory OLTP.  But in-memory analytical databases are a new and distinct category.

2.  Besides SAP HANA, Exadata, Teradata, Netezza, and Greenplum, are there other vendors that can do high performance analytics?

You should add SAS High-Performance Analytics to the list.  The product targets the same advantages as HANA. 

Where SAP has the tendency to be biased towards SAP data sources and Business Objects tools, SAS HPA is more open, but is a product of the SAS product family and that implies some biases of its own.

SAS HPA, like SAP HANA, targets the very high-end analytics problem sets, where data scale is modest (1-20 TB) but computational performance needs are extreme. 

The rest of the vendors in your list, with the possible exception of Exadata, target a balanced problem set: high analytical computational capacity with very large data scale handling capabilities.  Exadata targets a combined OLTP and BI/EDW workload more than it does the Big Data analytical challenges that Greenplum and others address.

3. Should I wait for HANA to mature if I have the luxury and if so until when?

First, you should, as your question suggests, consider urgency and problem set.

A great oversimplification is that memory-based databases focus almost entirely on speed of computation but are limited in data scale by the cost per TB of the main memory on which they depend.

MPP databases such as Greenplum also achieve great improvements in computation speed, address data movement costs by doing computation in-database (i.e., without moving the data), and provide cost-effective data scaling into the petabyte range.

So, if you’re totally focused on blazing-fast computation on modest data, look at in-memory databases like HANA and SAS HPA.  If you need fast (but not fastest) computation, but also larger data scale, then you should consider MPP analytical databases like those in the Greenplum Unified Analytics Platform (UAP) and its competitors (of course, I do work for Greenplum, so I should not mention competitors).
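As a rough illustration of that rule of thumb, here is a minimal sketch. The function name, the category labels, and the 20 TB cutoff (borrowed from the 1-20 TB "modest" range mentioned in answer 2) are illustrative assumptions, not vendor guidance.

```python
from dataclasses import dataclass

# Assumed cutoff: upper end of the "modest" 1-20 TB range cited in answer 2.
IN_MEMORY_MAX_TB = 20.0


@dataclass
class Workload:
    data_tb: float               # total data to analyze, in terabytes
    needs_fastest_compute: bool  # True if computation speed is the overriding goal


def suggest_platform_class(w: Workload) -> str:
    """Return a coarse platform class for a workload (hypothetical heuristic)."""
    if w.data_tb > IN_MEMORY_MAX_TB:
        # Large data scale: main-memory cost per TB becomes the limiting factor.
        return "mpp"
    if w.needs_fastest_compute:
        # Modest data and extreme compute needs: in-memory is worth a look.
        return "in-memory"
    # Modest data, no extreme compute need: either class could fit.
    return "either"
```

A real evaluation would weigh many more factors (data sources, tooling, uptime), so treat this only as a first filter.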

4. If I decide I need the analytical performance now what are the best criteria for picking a vendor?

This is way too long a question to answer here, but EMC and Greenplum can provide you with a significant set of evaluation criteria and an explanation of how Greenplum meets them.

Among the available options, you should not stop at evaluating the database platform, because your choice will affect you down the road as you embrace new requirements. These include ingest of new data types (structured and unstructured), new analytical challenges that may require different execution platforms like Hadoop, and enterprise stability and up-time requirements as your analytics become ubiquitous and therefore mission critical.

5. What questions should I ask each vendor?          

Again, EMC and Greenplum will be happy to provide a list of questions that we feel are important. Others will provide other questions, but most vendors' evaluation criteria would be 70% the same.

6. What are the strengths and weaknesses of each solution from storage only analytics to all memory and everything in between?

Again, we write volumes on these topics, and they are not easily simplified because various user situations demand widely varying solutions. 

But the key point is that MPP solutions like Greenplum and in-memory solutions like SAS HPA offer varying degrees of performance boost over traditional architectures, depending on whether or not you also require data scalability.

7. What are the key differences between HANA, Exadata, Teradata, Netezza, and Greenplum?

HANA is in-memory, and SAP has a history of doing a good job ingesting data from SAP and a poor history of ingesting data from non-SAP platforms.  These are the two key considerations.  If you are an SAP shop through and through, then HANA should be considered; otherwise, if your data ingest and analysis needs run strongly to other platforms, you should expand your search.

8. Does one solution handle unstructured better than another?

Yes, and this is a very critical differentiator of solutions. 

Most analytical platform vendors a) provide some text manipulation in the database and b) embrace Hadoop and/or the MapReduce component of Hadoop in some fashion or other, to aid in handling unstructured data.

In the fast-changing environment of unstructured Big Data, you should consider whether or not you’ll ultimately be buying packaged software for analysis of unstructured data. Most, we believe, will have many commercial options within only a few short years.  If you agree with this, then for most a platform that embraces Apache-compatible Hadoop is a wise investment, rather than trying to achieve the benefits of Hadoop in pieces by buying a MapReduce engine buried inside a relational database, where long-term certification and compatibility may become a big issue.  The same holds true for buying a platform with a “work-alike” or a derivative of Apache Hadoop that will not remain fully compatible over time.

Greenplum Unified Analytics Platform is designed to integrate an MPP analytical database, which provides in-database analytics for speed on structured and semi-structured data, with a fully Apache-compatible Hadoop for hosting applications that ingest and analyze large amounts of unstructured and semi-structured data.  Because we expect the two environments to become rapidly intertwined in real-world use, UAP provides very high-performance data movement and data access between them.  Both are rapidly deployable and are delivered as complementary parts of a fully integrated appliance, as commercial software, and as cloud-compatible components.

9. Are any of these solutions better for a true cloud solution including mobility from cloud to cloud?

SAP HANA is not suitable for cloud deployment; rather, it is focused on appliance deployment on hardware configurations produced by a variety of vendors, including EMC in partnership with Cisco.  Among the other advanced analytical platforms, Oracle Exadata and IBM Netezza products are similarly proprietary and appliance-based, and therefore not really a fit for the cloud environment.

EMC Greenplum products are cloud-capable: they are available as virtual machines for cloud deployment, as no-cost community-supported editions, as commercial software for users to install on their own hardware, and as preconfigured appliances.

All this said, users should consider cloud deployments carefully in any big-data solution.   The reasons are simple.  “Big Data” requires vast amounts of I/O bandwidth wherever it is deployed.  In the cloud, solutions offering vast bandwidth are less mature than solutions for other problems, so strong consideration must be given to these implications before charging headlong into a cloud-based analytics deployment project.

The Cloud will certainly play an increasing role in advanced and big data analytics, but currently it is not a dominant one, with the exception of specialized vendors and problem sets where data is generated, stored, analyzed and deployed entirely within the cloud.  An example would be a web or social media analytics utility company, of which there are many.
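To make the bandwidth point concrete, here is a back-of-the-envelope sketch: the time for one full pass over a data set is roughly its size divided by the sustained aggregate I/O bandwidth. The function name and the figures in the usage note are hypothetical, chosen only for illustration.

```python
def full_scan_hours(data_tb: float, bandwidth_gb_per_s: float) -> float:
    """Hours needed to read data_tb terabytes once at a sustained bandwidth.

    Uses decimal units (1 TB = 1000 GB) and ignores caching, compression,
    and parallel-query overheads, so it is only a rough lower bound.
    """
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    data_gb = data_tb * 1000
    seconds = data_gb / bandwidth_gb_per_s
    return seconds / 3600
```

For example, under these assumptions a single pass over 100 TB takes about 27.8 hours at 1 GB/s and about 2.8 hours at 10 GB/s, which is why available bandwidth deserves a hard look before committing to any deployment model.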

10. Do these products address true data integration and archive retrieval?

The degree of integration and the archival facilities vary greatly by product.  Each of the vendors you mentioned earlier has different solutions.

EMC Greenplum offers unique advantages that come from being part of EMC, a company very experienced in mission-critical data management and archival.  Greenplum itself also offers a variety of features suited to the challenges of large data archival and analytics.  These include polymorphic storage (rows and columns mixed, with archival-level compression options), multiple architectures for assuring disaster recovery, and extension of Greenplum Unified Analytics Platform with EMC Isilon’s scale-out NAS, which raises the degree of enterprise hardening and uptime for both file-volume-based archival problems and the Hadoop file system.

Regarding data integration, Greenplum partners with leading companies in the ETL business to deliver a unique set of pre-packaged, easily installed Data Integration Acceleration modules as part of our appliance offerings.  These modules greatly simplify the installation and integration of leading vendors’ ETL solutions into overall analytical solutions.

11. How can EMC best become an in-memory appliance player no matter what the customer chooses?

EMC continues rich partnerships with companies because all analytical solutions require vast storage.   For SAP users, that is through partnership with SAP and Cisco to support HANA.  For users seeking broader and more open solutions, EMC Greenplum partners with SAS to deliver SAS High Performance Analytics solutions that achieve the performance benefits of in-memory analytics.   It would not be unreasonable to expect EMC to develop additional partnerships with other vendors of in-memory databases and analytical platforms, as we’ve done for other data management solutions in years past.
