Ambari Metrics and Alerts with Dell EMC PowerScale






Note: This topic is part of the Using Hadoop with OneFS - PowerScale Info Hub.

Ambari 2.0 introduced Ambari Metrics and Alerts, which provide centralized management of health alerts and checks for all the services running in the cluster. This information is accessible from the Ambari web interface home page.


Beginning with OneFS 8.0.1.0, PowerScale includes Host Metrics and Alerts to enhance the user experience and ease the task of administering Ambari with OneFS.


Architecture

The Ambari Metrics System (AMS) consists of the following components:

  • The Metrics Monitors run on each host in the Hadoop cluster, collecting system-level metrics and monitoring host health status.

  • The Metrics Collector runs on a specific host in the Hadoop cluster. Its job is to store and aggregate metrics received from the registered monitors.


A designated OneFS node performs the Metrics Monitor function, collecting data and pushing it to the Metrics Collector at one-minute intervals. OneFS itself is a cluster, but Ambari sees it as a single host running the HDFS service. Because of this mismatch, all metrics and alert data that OneFS provides to Ambari are cluster-wide. For example, on a three-node PowerScale cluster, the HDFS network traffic reported to Ambari is the total across all three nodes.
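The following Python sketch illustrates the cluster-wide aggregation described above: per-node samples are summed and reported under one hostname, because Ambari sees the whole OneFS cluster as a single host. It assumes the JSON layout that the Ambari Metrics Collector accepts on POST /ws/v1/timeline/metrics (port 6188 by default). The metric name, appid, node names, and hostname are hypothetical, and this is not how OneFS implements its monitor internally.

```python
import json
import time


def build_cluster_metric(metric_name, app_id, cluster_host, node_values, ts_ms=None):
    """Collapse per-node samples into one AMS metric record for the whole cluster.

    node_values maps a (hypothetical) OneFS node name to its sample value.
    The record uses a single hostname because Ambari sees OneFS as one host.
    """
    if ts_ms is None:
        ts_ms = int(time.time() * 1000)  # AMS timestamps are in milliseconds
    cluster_total = sum(node_values.values())
    return {
        "metricname": metric_name,
        "appid": app_id,
        "hostname": cluster_host,
        "timestamp": ts_ms,
        "starttime": ts_ms,
        # One data point per collection interval (one minute in OneFS).
        "metrics": {str(ts_ms): cluster_total},
    }


def build_payload(records):
    """Wrap records in the {"metrics": [...]} envelope used by the collector."""
    return json.dumps({"metrics": records})


if __name__ == "__main__":
    # Hypothetical HDFS bytes-read counters from a three-node cluster.
    per_node = {"node-1": 1200, "node-2": 800, "node-3": 1000}
    record = build_cluster_metric(
        "hdfs.bytes_read",      # hypothetical metric name
        "onefs-sketch",         # hypothetical appid
        "onefs.example.com",    # hypothetical name OneFS is registered under
        per_node,
        ts_ms=1_600_000_000_000,
    )
    print(build_payload([record]))
```

Summing (rather than averaging) matches the article's traffic example; for gauges such as load, an average would be the natural choice instead.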


Prerequisites

  • OneFS 8.0.1.0 or higher.
  • Ambari 2.0 or higher.
  • Ambari Metrics deployed and running (green) on the Ambari Dashboard.


Enabling Ambari Metrics and Alerts

To start using this feature, you must specify the host where the Ambari Metrics Collector is installed. To determine the host, navigate to the Ambari UI -> Ambari Metrics service -> Metrics Collector. This takes you to the host running the service.
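If you prefer to look up the collector host programmatically, the Ambari REST API exposes the METRICS_COLLECTOR component of the AMBARI_METRICS service. The sketch below assumes the standard /api/v1/clusters/CLUSTER/services/AMBARI_METRICS/components/METRICS_COLLECTOR resource and that its response lists host_components entries with HostRoles/host_name; the server URL, cluster name, and credentials are placeholders you must replace.

```python
import base64
import json
import urllib.request


def collector_url(ambari_base, cluster):
    """Ambari REST URL for the METRICS_COLLECTOR component of a cluster."""
    return (
        f"{ambari_base.rstrip('/')}/api/v1/clusters/{cluster}"
        "/services/AMBARI_METRICS/components/METRICS_COLLECTOR"
    )


def parse_collector_hosts(body):
    """Return the host names listed under host_components in the response."""
    data = json.loads(body)
    return [hc["HostRoles"]["host_name"] for hc in data.get("host_components", [])]


def fetch_collector_hosts(ambari_base, cluster, user, password):
    """GET the component resource with HTTP basic auth and parse the hosts."""
    req = urllib.request.Request(collector_url(ambari_base, cluster))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return parse_collector_hosts(resp.read().decode())
```

For example, fetch_collector_hosts("http://ambari.example.com:8080", "mycluster", "admin", "PASSWORD") would return a list such as ["vnode2605.west.isilon.com"]; use that name as HOSTNAME in the isi command.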

Now that you know the name of the host running the Metrics Collector, log in to OneFS and run the following command:

isi hdfs settings modify --zone=ZONE --ambari-metrics-collector=HOSTNAME

ZONE is the access zone designated for this configuration, and HOSTNAME is the host running the Metrics Collector. For example, if the access zone is System, the command is:

isi hdfs settings modify --zone=System --ambari-metrics-collector=vnode2605.west.isilon.com
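If you script this step, a small helper can build the argument list for the command above and reject obviously malformed host names before anything runs. This is a hypothetical helper, not part of OneFS; the hostname check is a simplified assumption (dot-separated labels of letters, digits, and hyphens).

```python
import re
import shlex

# Simplified hostname check: labels of 1-63 letters, digits, or hyphens,
# not starting or ending with a hyphen, separated by dots.
_HOSTNAME = re.compile(
    r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)(?:\.(?!-)[A-Za-z0-9-]{1,63}(?<!-))*"
)


def isi_collector_command(zone, collector_host):
    """Build the isi argument list that points a zone at the Metrics Collector."""
    if not _HOSTNAME.fullmatch(collector_host):
        raise ValueError(f"not a valid hostname: {collector_host!r}")
    return [
        "isi", "hdfs", "settings", "modify",
        f"--zone={zone}",
        f"--ambari-metrics-collector={collector_host}",
    ]


if __name__ == "__main__":
    cmd = isi_collector_command("System", "vnode2605.west.isilon.com")
    # Prints the same command shown above; on a OneFS node you could pass
    # cmd to subprocess.run(cmd, check=True) instead of printing it.
    print(shlex.join(cmd))
```

Building a list (instead of one string) avoids shell quoting problems if a zone name ever contains unusual characters.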

Alternatively, you can make this change in the OneFS UI: navigate to the Protocols tab -> Hadoop (HDFS), select the correct access zone, update the Ambari Metrics Collector field, and save the changes.



See it in action

The Ambari web interface home page is a dashboard showing the status of the cluster. It contains a set of widgets that display cluster-wide metrics. If OneFS is correctly configured, all the widgets on the home page (and in the HDFS section) become active.


Implementation Details

Ambari expects metric data for many Java-specific concepts, such as NameNode heap usage and the number of garbage collector runs. OneFS does not run Java, so it applies special handling to send meaningful data to Ambari. The following tables summarize the metrics that do not map exactly between OneFS and Ambari. Remember that all data points sent by OneFS are cluster-wide.

Ambari Dashboard

Widget Data reported by OneFS
HDFS Disk Usage
  • Represents the usable storage capacity.
DataNodes Live
  • Always one, because OneFS represents a single DataNode.
Memory Usage *
  • Swap: Always zero. OneFS does not have swap space.
  • Share: Always zero. This value is not available in OneFS.
  • Cache: Always zero. This value is not available in OneFS.
Cluster Load *
  • Procs: The number of PowerScale nodes.
  • CPUs: The number of cores in the cluster.
  • Nodes: The number of cores in the cluster. (OneFS sends the same value for CPUs and Nodes, mimicking the Ambari implementation.)
  • 1-min: The average workload across all OneFS nodes during the last minute.
NameNode Heap
  • Used: The sum of the current memory allocated by the HDFS process (cluster-wide).
  • Total: The sum of the memory limit that the HDFS process can allocate.
NameNode RPC
  • Due to a bug in Ambari, this value is available only in the HDFS service tab.
NameNode CPU WIO
  • Always zero. This data is unavailable in OneFS.
NameNode Uptime
  • The uptime of the OneFS node that has been up the longest.


* The data shown in this widget is the average of all hosts in the Hadoop Cluster, OneFS being one of them.
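To make the table concrete, the sketch below derives the dashboard values from hypothetical per-node statistics using the rules listed above (counts and sums for Cluster Load and NameNode Heap, the maximum for NameNode Uptime, constants where OneFS has no equivalent). The NodeStats fields are assumptions for illustration, not a OneFS API. The last function shows the footnote's averaging, where OneFS counts as just one host among the Hadoop hosts.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class NodeStats:
    """Hypothetical per-node sample; field names are illustrative only."""
    cores: int
    load_1min: float
    hdfs_mem_used: int   # bytes currently allocated by the HDFS process
    hdfs_mem_limit: int  # bytes the HDFS process may allocate
    uptime_s: int


def dashboard_values(nodes):
    """Apply the widget rules from the table to a list of OneFS nodes."""
    total_cores = sum(n.cores for n in nodes)
    return {
        "cluster_load": {
            "procs": len(nodes),          # number of PowerScale nodes
            "cpus": total_cores,          # cores in the cluster
            "nodes": total_cores,         # same value again, as in Ambari
            "1-min": mean(n.load_1min for n in nodes),
        },
        "namenode_heap": {
            "used": sum(n.hdfs_mem_used for n in nodes),
            "total": sum(n.hdfs_mem_limit for n in nodes),
        },
        "namenode_uptime_s": max(n.uptime_s for n in nodes),  # longest-up node
        "datanodes_live": 1,                                   # one DataNode
        "memory_usage": {"swap": 0, "share": 0, "cache": 0},   # not available
        "namenode_cpu_wio": 0,                                 # not available
    }


def hadoop_host_average(onefs_value, other_host_values):
    """Widgets marked * average over all hosts; OneFS is one of them."""
    values = [onefs_value, *other_host_values]
    return sum(values) / len(values)
```

For example, if OneFS reports a value of 4.0 and two compute hosts report 1.0 each, a widget marked * shows 2.0, even though the OneFS figure already covers every PowerScale node.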


HDFS Home Page

All of the default widgets on the HDFS service page are now active. Here is how it looks on a semi-idle cluster. To find this page, navigate to the Ambari UI -> HDFS.

Summary section

Widget Data reported by OneFS
DataNodes
  • Always one, because OneFS represents a single DataNode.
Block, Blocks (Total), Block Errors
  • Always zero. There are no blocks in OneFS.
Total Files + Directories
  • Always zero. OneFS does not maintain a count of files.


Metrics section

Widget Data reported by OneFS
NameNode GC count *
  • GC count of type major collection, GC total count: Always zero. OneFS does not run Java.
NameNode GC time
  • GC time in major collection: Always zero. OneFS does not run Java.
NameNode Heap
  • JVM heap used: The sum of the current memory allocated by the HDFS process (cluster-wide).
  • JVM heap committed: The sum of the memory limit that the HDFS process can allocate.
Failed disk volumes
  • Always zero. Bad drives are reported in the OneFS user interface.
Corrupted Blocks
  • Always zero. OneFS maintains its own data protection mechanism.
Under Replicated Blocks
  • Always zero. There are no blocks in OneFS.
HDFS Space Utilization
  • Represents the usable storage capacity in OneFS.


* The data shown in this widget is the average of all hosts in the Hadoop Cluster, OneFS being one of them.


NameNode Page

To find this page, navigate to the Ambari UI -> HDFS -> NameNode.


Widget Data reported by OneFS
Cores (CPU), Memory
  • OneFS 8.0.1.0 does not report these values, but future releases will.


Heatmaps

OneFS 8.0.1.0 does not implement heatmaps, but future releases will.


Ambari Alerts

This release supports a subset of the HDFS alert definitions. To find them, go to the Ambari UI -> Alerts tab -> filter by HDFS.

Some important things to remember are:

  • Only alerts that display a status color (green, yellow, or red) are supported by OneFS.
  • Custom alerts created for HDFS are not recognized.
  • Supported alerts can be enabled or disabled at will.
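The Alerts tab filter and the enable/disable toggle also have REST equivalents. The sketch below assumes the Ambari /api/v1/clusters/CLUSTER/alert_definitions resource, its AlertDefinition/service_name predicate, and a PUT body of the form {"AlertDefinition": {"enabled": ...}}; verify these against the API documentation for your Ambari version. The server URL and cluster name are placeholders, and authentication is omitted (add an Authorization header before sending a request with urllib.request.urlopen).

```python
import json
import urllib.parse
import urllib.request


def hdfs_alert_definitions_url(ambari_base, cluster):
    """URL listing HDFS alert definitions with their id, name, and enabled flag."""
    query = urllib.parse.urlencode(
        {
            "AlertDefinition/service_name": "HDFS",
            "fields": "AlertDefinition/id,AlertDefinition/name,AlertDefinition/enabled",
        },
        safe="/,",  # keep Ambari's field paths readable in the query string
    )
    return f"{ambari_base.rstrip('/')}/api/v1/clusters/{cluster}/alert_definitions?{query}"


def parse_alert_definitions(body):
    """Return (id, name, enabled) tuples from an alert_definitions response."""
    items = json.loads(body).get("items", [])
    return [
        (
            d["AlertDefinition"]["id"],
            d["AlertDefinition"]["name"],
            d["AlertDefinition"]["enabled"],
        )
        for d in items
    ]


def set_alert_enabled_request(ambari_base, cluster, definition_id, enabled):
    """Build (but do not send) a PUT that enables or disables one definition."""
    body = json.dumps({"AlertDefinition": {"enabled": enabled}}).encode()
    return urllib.request.Request(
        f"{ambari_base.rstrip('/')}/api/v1/clusters/{cluster}"
        f"/alert_definitions/{definition_id}",
        data=body,
        method="PUT",
        # Ambari rejects modifying requests that lack X-Requested-By.
        headers={"X-Requested-By": "ambari", "Content-Type": "application/json"},
    )
```

Custom HDFS alerts would also appear in this listing, but OneFS does not recognize them, so toggling them has no effect on OneFS-reported data.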


Tuning alerts

You can tune the Check Interval and the Thresholds to satisfy different workloads. The following example shows how this works using the NameNode Host CPU Utilization Alert.

Due to a bug in Ambari, the default warning and critical thresholds for this alert are 200% and 250%, while the value ranges from 0% to 100%. You can fix this by setting the thresholds to more reasonable values, for example 80% and 90% respectively. If you also want to receive updates for this alert more often, set the Check Interval to 1 minute.
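The sketch below shows why the default thresholds never fire and how the same change could be expressed as an alert definition update. It assumes that the definition's source contains reporting.warning.value and reporting.critical.value, that interval is in minutes, and that a reading at or above a threshold triggers that level; the full source is copied and sent back so that other fields are preserved. Check these assumptions against your Ambari version before sending the body with a PUT to the definition's REST resource.

```python
import copy


def retune_alert(definition, warning, critical, interval_min):
    """Return a PUT body with new thresholds and check interval.

    `definition` is the "AlertDefinition" object from a GET of the definition.
    It is not modified; a deep copy of its source is edited instead.
    """
    source = copy.deepcopy(definition["source"])
    source["reporting"]["warning"]["value"] = warning
    source["reporting"]["critical"]["value"] = critical
    return {"AlertDefinition": {"interval": interval_min, "source": source}}


def alert_state(cpu_percent, warning, critical):
    """Classify a CPU reading (0-100%) against the two thresholds."""
    if cpu_percent >= critical:
        return "CRITICAL"
    if cpu_percent >= warning:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # With the buggy defaults, even 95% CPU stays OK because 200% is unreachable.
    print(alert_state(95, 200, 250))
    # With the suggested 80% / 90% thresholds, the same reading is critical.
    print(alert_state(95, 80, 90))
```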

You are finished. You now have real-time data about CPU utilization in OneFS. The figure below shows what happens when the critical level is reached.





Article ID: SLN319182

Last Date Modified: 07/08/2020 07:06 PM
