pnkmtt

Hortonworks, Hive, Kafka, and Isilon

All,

We have a customer that is struggling with hourly Hive workloads on an Isilon cluster that is also handling constant Kafka landings via NFS. The team has done the analysis and has asked the customer to move temp/shuffle space back to the Hadoop instances. Beyond that, I am looking for any guidance on specific Hive performance recommendations (data layout, tuning, etc.). After analyzing their workloads across the cluster, we have already recommended, and are in the process of implementing, the guidance in the EMC ISILON BEST PRACTICES GUIDE FOR HADOOP DATA STORAGE document. If anyone has a use-case example or any other insight into how we could help this customer, I would greatly appreciate it.

Thanks,

Matt Panik

Cloud Specialist/Big Data/Core SE

Boni Bruno

Re: Hortonworks, Hive, Kafka, and Isilon

Keeping shuffle/temp space on local storage is definitely a must. Another recommendation is to enable Tez as the execution engine for Hive; with Tez, TPC-DS performance results are comparable to DAS (direct-attached storage), as shown here:

https://community.emc.com/people/bonibruno/blog/2018/01/22
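The two Hive-side recommendations above (shuffle/temp space on local disks, and Tez as the execution engine) come down to a few standard properties: `hive.execution.engine` and `hive.exec.local.scratchdir` in hive-site.xml, and `yarn.nodemanager.local-dirs` in yarn-site.xml, which is where YARN containers keep their working directories and where Tez and MapReduce write intermediate shuffle output. As a minimal sketch, the Python below renders them as Hadoop-style XML fragments. The disk mount points are hypothetical, and on an HDP cluster these values would normally be set through Ambari rather than by hand-editing the files.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(props: dict[str, str]) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> XML fragment."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Hypothetical mount points for disks local to each compute node (not Isilon).
LOCAL_DISKS = ["/data/1", "/data/2"]

# hive-site.xml: run queries on Tez and keep Hive's local scratch on a local disk.
HIVE_SITE = {
    "hive.execution.engine": "tez",
    "hive.exec.local.scratchdir": f"{LOCAL_DISKS[0]}/hive/scratch",
}

# yarn-site.xml: container working dirs (where intermediate shuffle output is
# written) spread across the local disks instead of the shared Isilon storage.
YARN_SITE = {
    "yarn.nodemanager.local-dirs": ",".join(f"{d}/yarn/local" for d in LOCAL_DISKS),
}

if __name__ == "__main__":
    print("<!-- hive-site.xml -->")
    print(hadoop_config_xml(HIVE_SITE))
    print("<!-- yarn-site.xml -->")
    print(hadoop_config_xml(YARN_SITE))
```

Listing one directory per physical disk in `yarn.nodemanager.local-dirs` lets YARN spread shuffle I/O across spindles, which is the point of keeping that traffic off the NFS/HDFS path that Kafka is already loading.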

Running Kafka over NFS with Isilon has also been shown to perform well; see the PDF below:

http://www.emc.com/collateral/white-papers/hp17440-running-kafka-with-isilon-onefs.pdf

Just make sure you select the right Isilon node type and have enough nodes to meet your performance requirements.
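Sizing the node count is largely arithmetic: the combined throughput of the Kafka NFS ingest and the Hive workload, divided by what one node of the chosen type can sustain after reserving some headroom. The sketch below assumes you already have a per-node throughput figure for the candidate node type (for example from a sizing exercise or proof of concept); the 30% headroom default, the minimum node count, and the example numbers are illustrative assumptions, not Isilon specifications. It also ignores capacity, data protection overhead, and metadata load, which matter as well.

```python
import math


def nodes_needed(
    required_mb_s: float,
    per_node_mb_s: float,
    headroom: float = 0.3,
    min_nodes: int = 3,
) -> int:
    """Estimate node count for a target aggregate throughput.

    required_mb_s : aggregate throughput the workloads need (MB/s)
    per_node_mb_s : measured or sized throughput of one node (MB/s)
    headroom      : fraction of each node's throughput held in reserve
    min_nodes     : smallest allowed cluster size (assumption; check node type)
    """
    if required_mb_s <= 0 or per_node_mb_s <= 0:
        raise ValueError("throughput values must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_node = per_node_mb_s * (1.0 - headroom)
    # Round up: a fractional node still means buying a whole node.
    return max(min_nodes, math.ceil(required_mb_s / usable_per_node))


if __name__ == "__main__":
    # Hypothetical figures: Kafka NFS ingest plus the hourly Hive scan peak.
    kafka_ingest = 600.0
    hive_peak = 1800.0
    print(nodes_needed(kafka_ingest + hive_peak, per_node_mb_s=500.0))  # → 7
```

Summing the Kafka and Hive demand is deliberate: since both land on the same cluster, the hourly Hive peak has to be served while ingest continues.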
