Note: This topic is part of the Using Hadoop with OneFS - Isilon Info Hub.
This article describes how a Hadoop administrator can use Grafana for monitoring the resources in a Hadoop cluster.
Ambari Metrics is already configured to work with Grafana. This article builds on that model to make Isilon Statistics available to Grafana as well.
The only prerequisite to installation is a CentOS VM with network access to the Isilon System Zone. Installation follows this high-level plan.
The Isilon admin can use the following commands to create an account for you.
isi auth roles create perfmon
isi auth roles modify perfmon --add-priv-ro ISI_PRIV_LOGIN_PAPI
isi auth roles modify perfmon --add-priv-ro ISI_PRIV_STATISTICS
isi auth users create pmon --home-directory=/ifs/home/pmon --enabled=true --set-password
The --set-password flag prompts for a password during user creation, so the password is neither visible on the command line nor stored in the shell history. You will need this password later, so make a note of it.
isi auth users modify pmon --password-expires false
isi auth roles modify perfmon --add-user pmon
isi auth roles view perfmon
    Name: perfmon
    Description: -
    Members: pmon
    Privileges
        ID: ISI_PRIV_LOGIN_PAPI
        Read Only: True
        ID: ISI_PRIV_STATISTICS
        Read Only: True
This user is the one to use in the setup of the Isilon Data Insights connector. You only need one instance of the connector, which updates the InfluxDB database. Other users only need Grafana logins, which you can set up separately.
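To confirm that the new account has the privileges it needs, you can query the OneFS Platform API directly as pmon. This is a sketch: the cluster address below is a placeholder, and the exact statistics endpoint path can vary by OneFS version.

```shell
# Hypothetical check of PAPI access for the pmon account.
# CLUSTER is a placeholder; use your SmartConnect name or a node IP.
CLUSTER=isilon.example.com
# -k skips certificate validation (common with self-signed cluster certs);
# curl prompts for the pmon password set earlier.
curl -k -u pmon "https://$CLUSTER:8080/platform/1/statistics/keys" | head
```

A JSON list of statistics keys indicates that both ISI_PRIV_LOGIN_PAPI and ISI_PRIV_STATISTICS are working.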
Grafana 2.6 is installed by default when you deploy HDP 2.5. You need to replace it with a current version of Grafana. Perform this task on the node where the Ambari Metrics Collector is installed.
The easiest way to do the install is to stop the Grafana service in Ambari and install the new Grafana RPM. We recommend first checking the local installation and backing up the current configuration files and any other customized files.
[root@nr-hdp-c3 ambari-metrics-grafana]# pwd
/var/log/ambari-metrics-grafana
[root@nr-hdp-c3 ambari-metrics-grafana]# tail grafana.log
: default.paths.logs=/var/log/ambari-metrics-grafana
Paths:
    home: /usr/lib/ambari-metrics-grafana
    data: /var/lib/ambari-metrics-grafana
    logs: /var/log/ambari-metrics-grafana
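A backup before swapping the RPM could look like the following sketch. The directory names match the default HDP layout shown in the log output above; adjust them if your deployment differs.

```shell
# Archive the Ambari Metrics Grafana config, data, and logs before upgrading.
BACKUP=/root/grafana-backup-$(date +%Y%m%d)
mkdir -p "$BACKUP"
tar czf "$BACKUP/ambari-metrics-grafana.tgz" \
  /etc/ambari-metrics-grafana \
  /var/lib/ambari-metrics-grafana \
  /var/log/ambari-metrics-grafana
```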
You might have to install a few other packages.
yum install initscripts fontconfig
Obtain the rpm.
wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-4.3.2-1.x86_64.rpm
yum localinstall grafana-4.3.2-1.x86_64.rpm
/bin/systemctl daemon-reload
/bin/systemctl enable grafana-server.service
/bin/systemctl start grafana-server.service
Because you have a new version of Grafana, you also need compatible versions of the standard dashboards. See https://grafana.com/plugins/praj-ams-datasource.
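A quick sanity check that the new Grafana server came up cleanly (assuming the default port 3000):

```shell
# Print the installed Grafana version.
grafana-server -v
# An HTTP 200 from the login page confirms the server is answering.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3000/login
```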
InfluxDB is a time-series database and quite useful in its own right. This solution uses InfluxDB to store the Isilon statistics that are extracted by the Data Insights package. This installation consists of creating the repo file, installing the package, checking the config file, and starting the service.
1. Create the repo and install.
[root@nr-hdp-c3]# cat > /etc/yum.repos.d/influxdb.repo
[influxdb]
name = InfluxDB Repository - RHEL $releasever
baseurl = https://repos.influxdata.com/rhel/$releasever/$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key
<ctrl-D>
yum update
yum install influxdb
2. Check that the default port is available. InfluxDB uses port 8086 by default. If that port is already in use, this command will show the conflicting listener.
netstat -a -n | grep 8086
If needed, choose another port and update the config file at /etc/influxdb/influxdb.conf.
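As a sketch, the HTTP listener port lives in the [http] section of the config file as bind-address = ":8086". The replacement port 8087 below is an arbitrary example; pick any free port.

```shell
# Switch the InfluxDB HTTP port from the default 8086 to 8087 (example value).
sed -i 's/:8086/:8087/' /etc/influxdb/influxdb.conf
```

If you change the port, use the same value later when you point the Data Insights connector and Grafana at InfluxDB.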
3. To start InfluxDB:
service influxdb start
service influxdb status
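You can also verify the service from its HTTP interface. InfluxDB's /ping endpoint returns HTTP 204 when the server is healthy.

```shell
# A 204 response confirms InfluxDB is up and answering on its default port.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8086/ping
```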
1. Review and follow, as closely as possible, the installation instructions for the Data Insights package on GitHub: https://github.com/Isilon/isilon_data_insights_connector
You will notice two new files for Hadoop.
Also, the example_isi_data_insights_d.cfg file is an updated version that includes statistics for the node-based network and disk statistics.
2. In the config file, in the clusters section, provide the RBAC user you created previously in the user name field. This article uses pmon, the user created in the RBAC example earlier.
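As a rough sketch, a clusters entry pairs the user with the cluster address; the address below is a placeholder, and you should check example_isi_data_insights_d.cfg in the connector repository for the exact syntax your version expects.

```
clusters: pmon@isilon.example.com
```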
3. Check the next section. If entries are missing, add them.
active_stat_groups:
    cluster_cpu_stats
    cluster_network_traffic_stats
    cluster_client_activity_stats
    cluster_health_stats
    ifs_space_stats
    ifs_rate_stats
    node_load_stats
    node_disk_stats
    node_net_stats
    cluster_disk_rate_stats
    cluster_proto_stats
    cache_stats
    heat_total_stats
4. Add the following two sections:
[node_disk_stats]
update_interval: *
stats:
    node.disk.bytes.out.rate.avg
    node.disk.bytes.in.rate.avg
    node.disk.busy.avg
    node.disk.xfers.out.rate.avg
    node.disk.xfers.in.rate.avg
    node.disk.xfer.size.out.avg
    node.disk.xfer.size.in.avg
    node.disk.access.latency.avg
    node.disk.access.slow.avg
    node.disk.iosched.queue.avg
    node.disk.iosched.latency.avg

[node_net_stats]
update_interval: *
stats:
    node.net.int.bytes.in.rate
    node.net.int.bytes.out.rate
    node.net.ext.bytes.in.rate
    node.net.ext.bytes.out.rate
    node.net.int.errors.in.rate
    node.net.int.errors.out.rate
    node.net.ext.errors.in.rate
    node.net.ext.errors.out.rate
5. Run it:
./isi_data_insights_d.py start -c example_isi_data_insights_d.cfg
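Once the connector is running, you can confirm that statistics are landing in InfluxDB by listing its databases over the HTTP query API. The database name the connector creates depends on your config; check the influxdb section of the .cfg file for the name in use.

```shell
# List InfluxDB databases; the connector's database should appear once
# the first statistics update has been written.
curl -sG http://localhost:8086/query --data-urlencode "q=SHOW DATABASES"
```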
6. After startup is complete, you can customize Grafana. Follow the documentation on GitHub referenced above and, during the import, make sure to add the two additional .json files.
After you import the dashboards, they appear along with the other dashboards. The four original dashboards provide a very broad range of capabilities and are targeted at the Isilon administrator: they report across the whole Isilon cluster and all protocols, providing true data lake telemetry.
In the previous figure, the two dashboards that have (Isilon) after their names are custom dashboards.
The top part of this dashboard shows the network throughput (read and write) and the HDFS throughput, which is a subset of the overall Isilon throughput. There are two filters: the Isilon cluster name, for customers lucky enough to have more than one, and the node number. Not all statistics change when filtering by node number (for example, capacity), but the ones that do change, such as load and cache, can be interesting. There are also some links to the Isilon documentation on the right side.
This section shows the open files and HDFS-only connections per node. Below that is the OneFS file system throughput, which is across *all* protocols. Also, the cluster capacity is across all Access Zones.
These two graphs show the HDFS Protocol broken out by class as a percentage of the whole, and by the atomic operations, as a number of Ops.
These are the L1/L2/L3 cache statistics for the cluster, which are very important to overall performance. L3 cache resides on SSD, so only nodes with SSDs can use it; virtual nodes have no SSDs, so you see a value of 0. The second graph is the metadata version, broken out by node. Use this graph to see how well balanced your workload is.
This customized dashboard represents the behavior on Isilon that you might want to review about your Datanodes, such as disks and network.
This section shows some disk specifics, such as basic read/write rates and latency. These values help you characterize disk-level activity.
Network traffic is important when evaluating performance. Isilon performance issues can often be caused by network issues.
Grafana dashboards can help with daily reviewing and monitoring of your Isilon cluster. The magic of Grafana brings it all together nicely.
Article ID: SLN319156
Last Date Modified: 03/12/2020 06:46 PM