I have moved to your plugin without issue. I finished re-configuring late in the day yesterday, came in today, and it is looking good. We have had a few core switch issues recently, and I had lost stats during those periods when my MDM failed over to another. It will be nice that the stats just keep coming in. This is a great enhancement to the monitor.
thanks a lot for your feedback. Now that you are querying the gateway you'll get the data even if the MDM role switches; that was one of the limits I identified in the SwissCom implementation.
I'm planning to release a new version of the plugin to give the possibility to monitor SDS network latencies and disk latencies. I'm thinking about how to implement this: probably I will add a configuration parameter to declare the list of SDSs and disks to be monitored. In my small ScaleIO infrastructure (3 nodes) I can monitor all of them; in bigger deployments the graph can become unreadable, so I want to give the user the possibility to choose the components to be monitored.
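Just to sketch what such a parameter could look like in the plugin: collectd hands the Python config callback a tree of key/value options, so the new setting could be a comma-separated list that gets split into component names. The option names (`Sds`, `Disks`) and the helper below are my own assumptions, not the released plugin; the config children are modeled as plain `(key, values)` tuples so the logic runs standalone:

```python
# Hypothetical sketch of a "components to monitor" option for the plugin.
# Option names ("Sds", "Disks") are assumptions, not the plugin's real config.

def parse_component_list(raw):
    """Split a comma-separated option value into a clean list of names."""
    return [name.strip() for name in raw.split(",") if name.strip()]

def read_config(children):
    """Collect the SDSs and disks selected for monitoring.

    collectd passes a Config object whose children carry .key and .values;
    here they are modeled as (key, values) tuples to keep this testable
    outside collectd.
    """
    monitored = {"sds": [], "disks": []}
    for key, values in children:
        if key.lower() == "sds":
            monitored["sds"].extend(parse_component_list(values[0]))
        elif key.lower() == "disks":
            monitored["disks"].extend(parse_component_list(values[0]))
    return monitored
```

With this, a line like `Sds "sds-1, sds-2"` in the Module block would select just those two SDSs, and an empty list could mean "monitor everything" for small clusters.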
I will update this forum as soon as I release a newer version. If you have suggestions, you're welcome!
Hi Davide, we found the Python/collectd implementation using the ScaleIO GW and we are in the process of testing it out.
We are using ScaleIO version 2.0.13000.211
We can interactively log in to the GW using the browser and credentials; that works correctly.
Here is the collectd.conf file:
Debug true # default: false
Verbose true # default: false
Gateway "##.##.##.##:443" # ScaleIO Gateway IP Address and listening port. (Mandatory)
Cluster "##" # Cluster name will be reported as the collectd hostname, default: myCluster
Pools "###" # list of pools to be reported (Mandatory)
MDMUser "#####" # ScaleIO MDM user for getting metrics (Mandatory)
MDMPassword "####" # Password of the ScaleIO MDM user (Mandatory)
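In case it helps comparison, those Module options normally sit inside the python plugin stanza of collectd.conf. A sketch of the full block, where the `ModulePath` and the `scaleio` module name are my assumptions (use whatever path and filename the plugin was installed with), and the masked values are the same placeholders as above:

```
<Plugin python>
  ModulePath "/usr/share/collectd"   # assumed path; point it at the plugin's .py file
  Import "scaleio"                   # assumed module name
  <Module scaleio>
    Debug true
    Verbose true
    Gateway "##.##.##.##:443"
    Cluster "##"
    Pools "###"
    MDMUser "#####"
    MDMPassword "####"
  </Module>
</Plugin>
```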
All looks good, but unfortunately we are getting errors in the collectd status.
# systemctl status collectd.service
● collectd.service - Collectd statistics daemon
Loaded: loaded (/usr/lib/systemd/system/collectd.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2018-03-02 16:19:10 PST; 7s ago
Main PID: 10070 (collectd)
Mar 02 16:19:10 collectd: plugin_load: plugin "network" successfully loaded.
Mar 02 16:19:10 collectd: plugin_load: plugin "python" successfully loaded.
Mar 02 16:19:10 collectd: Systemd detected, trying to signal readyness.
Mar 02 16:19:10 collectd: ScaleIO: init callback
Mar 02 16:19:10 collectd: [2018-03-02 16:19:11] Systemd detected, trying to signal readyness.
Mar 02 16:19:10 collectd: [2018-03-02 16:19:11] ScaleIO: init callback
Mar 02 16:19:10 collectd: Initialization complete, entering read-loop.
Mar 02 16:19:10 collectd: [2018-03-02 16:19:11] Initialization complete, entering read-loop.
Mar 02 16:19:10 collectd: ScaleIO: Error establishing connection to the ScaleIO Gateway. Check your collectd module configuration. Exiting.
Mar 02 16:19:10 collectd: [2018-03-02 16:19:11] ScaleIO: Error establishing connection to the ScaleIO Gateway. Check your collectd module configuration. Exiting.
Could you please help us troubleshoot the issue?
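One way to narrow down that "Error establishing connection" message is to hit the gateway's REST login endpoint directly from the collectd host, outside the plugin. The sketch below uses the `requests` library against the standard ScaleIO Gateway login path (`/api/login` with HTTP Basic auth); the gateway address and credentials are placeholders to be replaced with the real values from collectd.conf, and `verify=False` assumes the gateway uses a self-signed certificate:

```python
# Standalone check of the ScaleIO Gateway login, outside collectd.
# Replace GATEWAY / USER / PASSWORD with the values from collectd.conf.

def login_url(gateway):
    """Build the gateway REST login URL from a 'host:port' string."""
    return "https://%s/api/login" % gateway

if __name__ == "__main__":
    import requests  # imported here so the helper above has no dependencies

    GATEWAY = "##.##.##.##:443"           # placeholder, as in the config above
    USER, PASSWORD = "#####", "####"      # placeholders

    # verify=False skips TLS certificate verification
    # (self-signed gateway certificate assumed).
    resp = requests.get(login_url(GATEWAY),
                        auth=(USER, PASSWORD),
                        verify=False,
                        timeout=10)
    print(resp.status_code, resp.text[:80])
```

A 200 with a token body means the gateway is reachable and the credentials work, so the problem is on the plugin-configuration side; a 401 points at the credentials; a connection error or timeout points at the network, port, or firewall; a certificate error suggests TLS verification is the culprit.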
I think it's the Wild West right now when it comes to monitoring ScaleIO. Even with the Ready Nodes and AMS, you really don't have historical visibility.
I ended up having to modify the original collectd Python script to work with Telegraf (we use a Telegraf/InfluxDB/Grafana stack). It works great, but I always run into the question of how often to poll. So far I have left it at 30 seconds, but I can easily go down to every 5. I guess it really depends on how many SDSs you have configured.
I'll do a write up and share my script once it's cleaned up.
I haven't had a chance to upload the script. I just need to put it on GitHub and see if others have suggestions on how to improve it.
Forgot to share a link to my blog that shows the ready node.
It's missing the script like I mentioned before but I'll try to get it done soon.