carterbury1

Re: ScaleIO Historical Performance Monitoring

Davide,

I have moved to your plugin without issue. Finished re-configuring late in the day yesterday. Came in today and it is looking good. We have had a few core switch issues recently, and I had lost stats during those periods when my MDM failed over to another. It will be nice that the stats just keep coming in. This is a great enhancement to the monitor.

Chad

c0redump

Re: ScaleIO Historical Performance Monitoring

Hi Chad,

thanks a lot for your feedback. Now that you are querying the gateway, you'll keep getting the data even if the MDM role switches; that was one of the limits I identified in the SwissCom implementation.

I'm planning to release a new version of the plugin to add the possibility to monitor SDS network latencies and disk latencies. I'm thinking about how to implement this: probably I will add a configuration parameter to declare the list of SDSs and disks to be monitored. In my small ScaleIO infrastructure (3 nodes) I can monitor all of them, but in bigger deployments the graph can become unreadable, so I want to give the user the possibility to choose which components to monitor.
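A parameter like that might look something like this in the collectd module block (the option names below are hypothetical, not a released syntax):

```
<Module scaleio>
    # ...existing Gateway/Cluster/Pools/MDM options...
    # Hypothetical: restrict latency reporting to selected components
    SDSList "sds-node-01,sds-node-02"
    DiskList "sds-node-01:/dev/sdb"
</Module>
```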

I will update this forum as soon as I release a newer version. If you have suggestions, you're welcome to share them!

Thanks again,

Davide

carterbury1

Re: ScaleIO Historical Performance Monitoring

That would be a great enhancement. I look forward to seeing it and appreciate your time.

SignatureIT

Re: ScaleIO Historical Performance Monitoring

Hi Davide, we found the python / collectd implementation using the ScaleIO GW and we are in the process of testing it out.


We are using ScaleIO version 2.0.13000.211


We can log in to the GW interactively using a browser and the credentials; that works correctly.


Here is the collectd.conf file


<Plugin python>
    ModulePath "/usr/share/collectd/python"
    Import scaleio
    <Module scaleio>
        Debug true                  # default: false
        Verbose true                # default: false
        Gateway "##.##.##.##:443"   # ScaleIO Gateway IP address and listening port (mandatory)
        Cluster "##"                # Cluster name, reported as the collectd hostname (default: myCluster)
        Pools "###"                 # List of pools to be reported (mandatory)
        MDMUser "#####"             # ScaleIO MDM user for getting metrics (mandatory)
        MDMPassword "####"          # Password of the ScaleIO MDM user (mandatory)
    </Module>
</Plugin>

All looks good, but unfortunately we are getting errors in the collectd status.


# systemctl status collectd.service
● collectd.service - Collectd statistics daemon
   Loaded: loaded (/usr/lib/systemd/system/collectd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-03-02 16:19:10 PST; 7s ago
     Docs: man:collectd(1)
           man:collectd.conf(5)
 Main PID: 10070 (collectd)
   CGroup: /system.slice/collectd.service
           └─10070 /usr/sbin/collectd

Mar 02 16:19:10 collectd[10070]: plugin_load: plugin "network" successfully loaded.
Mar 02 16:19:10 collectd[10070]: plugin_load: plugin "python" successfully loaded.
Mar 02 16:19:10 collectd[10070]: Systemd detected, trying to signal readyness.
Mar 02 16:19:10 collectd[10070]: ScaleIO: init callback
Mar 02 16:19:10 collectd[10070]: [2018-03-02 16:19:11] Systemd detected, trying to signal readyness.
Mar 02 16:19:10 collectd[10070]: [2018-03-02 16:19:11] ScaleIO: init callback
Mar 02 16:19:10 collectd[10070]: Initialization complete, entering read-loop.
Mar 02 16:19:10 collectd[10070]: [2018-03-02 16:19:11] Initialization complete, entering read-loop.
Mar 02 16:19:10 collectd[10070]: ScaleIO: Error establishing connection to the ScaleIO Gateway. Check your collectd module configuration. Exiting.
Mar 02 16:19:10 collectd[10070]: [2018-03-02 16:19:11] ScaleIO: Error establishing connection to the ScaleIO Gateway. Check your collectd module configuration. Exiting.
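One way to narrow down an error like this is to replay the gateway login outside collectd. A minimal sketch, assuming the standard gateway `/api/login` endpoint with HTTP Basic auth (addresses and credentials are placeholders); a 401 points at credentials, while an SSL or connection error points at certificate/network problems rather than the plugin configuration:

```python
import base64
import ssl
import urllib.request


def basic_auth_header(user, password):
    """Build the HTTP Basic 'Authorization' header value the gateway expects."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def gateway_login(gateway, user, password, timeout=10):
    """Try to log in to the ScaleIO Gateway REST API and return the session token.

    Raises urllib.error.HTTPError on a 401 (bad credentials) and
    urllib.error.URLError on TLS/connectivity problems.
    """
    # The gateway usually ships with a self-signed certificate, so skip
    # verification here, just for this connectivity test.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    req = urllib.request.Request(
        f"https://{gateway}/api/login",
        headers={"Authorization": basic_auth_header(user, password)},
    )
    with urllib.request.urlopen(req, context=ctx, timeout=timeout) as resp:
        # On success the API returns the token as a quoted JSON string.
        return resp.read().decode().strip('"')


# Example (fill in your own gateway address and credentials):
# print(gateway_login("gateway-ip:443", "admin", "secret"))
```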

Could you please help us troubleshoot the issue?


Thank you


Saul

vdp4life

Re: ScaleIO Historical Performance Monitoring

I think it's the wild west right now when it comes to monitoring scaleio. Even with the ready nodes and AMS, you really don't have historical visibility.

I ended up having to modify the original collectd python script to work with telegraf (we use a telegraf/influx/grafana stack). It works great, but I always run into the question of how often to poll. So far I've left it at 30 seconds, but I can easily go down to every 5. I guess it really depends on how many SDSs you have configured.
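For what it's worth, with a telegraf `exec` input the poll interval can be set per input rather than globally, so the 30s-vs-5s trade-off is one config line. A sketch, with a placeholder script path:

```toml
[[inputs.exec]]
  # Script that prints ScaleIO metrics in influx line protocol (path is a placeholder)
  commands = ["/usr/local/bin/scaleio-metrics.py"]
  interval = "30s"          # per-input override of the agent-wide interval
  timeout = "15s"
  data_format = "influx"
```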

I'll do a write up and share my script once it's cleaned up.

scaleio-metrics.png

charank25

Re: ScaleIO Historical Performance Monitoring

Hey,

Did you get a chance to complete the write-up?

vdp4life

Re: ScaleIO Historical Performance Monitoring

I haven't had a chance to upload the script. I just need to put it on GitHub and see if others have suggestions on how to improve it.

vdp4life

Re: ScaleIO Historical Performance Monitoring

Forgot to share a link to my blog post that covers the ReadyNodes.

https://community.emc.com/blogs/a_c_l/2018/05/07/fun-with-scaleio-readynodes

It's missing the script, like I mentioned before, but I'll try to get it done soon.
