Unsolved

685192

July 15th, 2015 15:00

SanHQ 3.1 false alerts about volume usage

Just upgraded to SanHQ 3.1.0 and I have 2 issues:

 

1) Is it normal that it reports 3.0.9.something and says it’s a beta?

 

2) After the update, I have a bunch of active alerts about space usage on my luns which weren’t there before (arrays are a mix of 7.0.9, 7.0.10, 7.1.2 and 8.0.3).

 

My luns are NOT thin provisioned on the EQL side and there is no snapshot reserve.

 

When I connect to the group, I see the volume space is showing 100% (which is expected)

 

In VMware, I see there is free space. I expected the array to report 100% all the time after I filled up enough of the datastore and I was ok with that as I could see the real data in VMware. The problem is SANHQ now shows all my arrays with a red icon because it’s reporting false alerts about space usage.

 

Is this a known issue? (I didn’t notice anything in the release notes for SanHQ)

 

Do I have to install VSM to give the array visibility into VMware to report usage correctly and make SanHQ happy?

 

Is there any way to make SanHQ 3.1 go back to reporting the way 3.0.2 did?

Thanks

4 Operator

1.9K Posts

July 15th, 2015 15:00

Ah ok.

How do you deal with different kinds of volume types, or the need for different thresholds?

Regards,

Joerg

5 Practitioner

274.2K Posts

July 15th, 2015 15:00

Hello,

Re: Beta. It should have said "EPA", not "beta". That will be corrected in the GA release version.

Re: Alerts. It's mentioned in the Release notes. 3.1.0 adds more alerts about in-use space, latency, more granular retransmission warnings, etc. You can disable those alerts. It's under Settings->E-mail Settings->Notifications. You will need to disable them in the Caution, Warning and Critical tabs, since the same alerts have different thresholds under each category.

Regards,

Don

4 Operator

1.9K Posts

July 15th, 2015 15:00

It's an "EPA", which means Early Production Access, and the GA version will have the correct version number.

About the alarms: you don't have to disable them. You need to adjust the values to match your needs!

Regards,

Joerg

5 Practitioner

274.2K Posts

July 15th, 2015 15:00

Hi Joerg,

It does actually say "beta" when you look at the version; that was a mistake. Second, since he's using thick provisioned volumes, it's easier to just turn the alerts off. The volumes are already at 100%, so otherwise you'd have to set every space threshold to 100%.

Don

5 Practitioner

274.2K Posts

July 16th, 2015 00:00

There isn't a way to set different levels for thick vs. thin provisioned volumes. So the in-use alert threshold, for example, is the same for both.
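To illustrate why a single threshold is awkward for a mix of volume types, here is a minimal Python sketch. It is not SAN HQ code: the `Volume` class, the `volumes_over_threshold` helper, the strict `>` comparison and the 1000 GB thick volume are all assumptions made for illustration. Only the 750 GB half-used thin volume comes from this thread.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    size_gb: float       # size presented to the host
    allocated_gb: float  # space the array counts as in use for this volume

    def in_use_percent(self) -> float:
        return 100.0 * self.allocated_gb / self.size_gb


def volumes_over_threshold(volumes, threshold_percent):
    """Return the names of volumes whose in-use space exceeds one shared threshold.

    Assumption: the alert fires on a strict 'greater than' comparison.
    """
    return [v.name for v in volumes if v.in_use_percent() > threshold_percent]


volumes = [
    # Thick provisioned: the array counts the whole volume as in use.
    Volume("thick-datastore", size_gb=1000, allocated_gb=1000),
    # Thin provisioned: only about half has been written.
    Volume("thin-datastore", size_gb=750, allocated_gb=375),
]

print(volumes_over_threshold(volumes, 90))   # ['thick-datastore']
print(volumes_over_threshold(volumes, 100))  # []
```

With one shared threshold, any value below 100 flags the thick volume permanently, while raising it to 100 also silences the alert for the thin volume, which is the one that can actually run out of space.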

Don

72 Posts

July 16th, 2015 08:00

Hi Don,

As always, thanks for your help!

I've noticed a couple other things:

My pool (alert id 2.2) shows 15% free (which is expected) but I also see an alert id 3.32 saying my member has 0.2% free space remaining. What does that mean? I only have 1 array in my group so pool free space (it's using the default pool) should be member free space, no? Or is it because my pool is using 100% of my array so that error is a false alarm in my situation and should be disabled? (or is everything misconfigured on my EQL?)

I also noticed that on the member, in the summary shown when you click on it, the capacity summary shows compression as "1 incompatible" and the 3 items below it as N/A. I've never noticed that before, but I assume it's nothing to worry about?

Thanks

5 Practitioner

274.2K Posts

July 16th, 2015 10:00

Re: Compression.  That's only available on PS6210 and 6610 controllers.  It requires additional H/W not on older controllers, so it's nothing to be worried about.

Re: Free space.  There's how much has been allocated and how much has actually been written to.

So I don't think that's a false alarm. Are you using snapshots, or can you run reclaim on some volumes to get back some free space?

Or get rid of a volume you don't need anymore.

Don

72 Posts

July 16th, 2015 10:00

I'm sorry, but I'm still not understanding what's going on with the free space. There are no snapshots on the EQL side and only 1 volume is thin provisioned (750GB, of which only half is used). From the EQL side, it's incorrectly reporting the free space left on the luns, but that's ok because VMware sees the correct value and I'm not thin provisioned on the EQL side, so there's no danger there (except for that one small lun).

Below are three screen shots from group manager and one from sanhq. What am I missing or not understanding? The only thing I can think of is sanhq is complaining the default pool is using all the space on the array even though there is free space in the pool.

Thanks

5 Practitioner

274.2K Posts

July 16th, 2015 11:00

Hello,

Are you asking why the in-use space shown in the EQL GUI doesn't match the ESXi view of in-use space?

If so, that's to be expected. SANs are block storage and are not aware of what filesystems are being used. When a file is deleted, by default no notification of that is given to the storage device. There is a SCSI command to accomplish that, but it's the responsibility of the host OS to send that command with the blocks that are being removed. It's known as SCSI UNMAP, aka SCSI RECLAIM. ESXi with VMFS v5 does not support that on the fly. You have to manually run a command on the ESXi console to attempt to reclaim some of that free space. On EQL this requires firmware 6.0.7 or greater, and it only works on volumes that are NOT being replicated by any EQL capability. The datastore also has to have been formatted with VMFS v5. A datastore upgraded from v3 to v5 does not support SCSI UNMAP.
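The behaviour described above can be sketched with a toy model in Python. This is only an illustration of the idea, not how EQL firmware or VMFS is implemented: `ArrayVolume`, `ToyFilesystem`, the block counts and the file names are all hypothetical.

```python
class ArrayVolume:
    """Toy block device: it only knows which blocks have been written."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.allocated = set()

    def write(self, block):
        self.allocated.add(block)

    def unmap(self, blocks):
        # The host tells the array these blocks no longer hold data.
        self.allocated.difference_update(blocks)

    def in_use_percent(self):
        return 100.0 * len(self.allocated) / self.total_blocks


class ToyFilesystem:
    """Toy host filesystem sitting on top of an ArrayVolume."""

    def __init__(self, volume):
        self.volume = volume
        self.files = {}  # file name -> list of block numbers
        self.free = list(range(volume.total_blocks))

    def create(self, name, n_blocks):
        blocks = [self.free.pop() for _ in range(n_blocks)]
        self.files[name] = blocks
        for block in blocks:
            self.volume.write(block)

    def delete(self, name):
        # Only filesystem metadata changes; nothing is sent to the array.
        self.free.extend(self.files.pop(name))

    def reclaim(self):
        # Manual step: send UNMAP for every block the filesystem sees as free.
        self.volume.unmap(self.free)

    def used_percent(self):
        used = sum(len(blocks) for blocks in self.files.values())
        return 100.0 * used / self.volume.total_blocks


vol = ArrayVolume(total_blocks=100)
fs = ToyFilesystem(vol)

fs.create("vm1.vmdk", 60)
fs.create("vm2.vmdk", 30)
print(fs.used_percent(), vol.in_use_percent())  # 90.0 90.0

fs.delete("vm1.vmdk")
print(fs.used_percent(), vol.in_use_percent())  # 30.0 90.0

fs.reclaim()
print(fs.used_percent(), vol.in_use_percent())  # 30.0 30.0
```

After the delete, the host sees 30% used while the array still reports 90%. The two views only agree again once the host explicitly sends the unmap.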

Don

72 Posts

July 16th, 2015 11:00

Hi Don,


I understand the SCSI UNMAP and, as I said, I'm not thin provisioned on the EQL, so I don't care what the array thinks is free or not (and I adjusted the alerts per your recommendation to stop those alerts in the new SanHQ).

My problem is the following two alerts I still see. One of them makes sense (the 15% one), but I don't understand where this 0.2% free is coming from. This is a single member group with just the default pool defined.

July 16th, 2015 12:00

I'm also seeing strange reporting from SanHQ 3.1.0. For example, free space as reported by the group on one particular volume (thin provisioned, no snapshot space used or reserved) is 60%, but SanHQ is reporting 96% used and throwing an alarm.  I've got several other volumes showing discrepancies between the group and SanHQ as well, and I've had to disable email alerting because it's become essentially useless, alerting on conditions that don't even exist.

sanHQ: 

ps group:

5 Practitioner

274.2K Posts

July 16th, 2015 12:00

Hello,

I suggest opening a support case so they can review the logs.

Don

72 Posts

July 16th, 2015 13:00

I opened a case and, during testing, it looks like there may be a couple of bugs. Per my screen shots, I have 15.2% free and it triggers a caution alert. As a test, we changed the caution threshold to 16 and the alert was still there. We changed the threshold to 15 and the alert was cleared.
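To make the test concrete, here is a tiny Python sketch of a free-space rule. It assumes the caution alert fires whenever free space is below the threshold; the thread does not state which comparison SAN HQ actually uses, and `caution_active` is a hypothetical helper, not a SAN HQ function.

```python
def caution_active(free_percent, threshold_percent):
    """Assumed rule: alert when free space drops below the threshold."""
    return free_percent < threshold_percent


free = 15.2  # free space reported in the screen shots

for threshold in (16, 15):
    print(threshold, caution_active(free, threshold))
# 16 True
# 15 False
```

Under that assumption, the alert staying at 16 and clearing at 15 is what the rule itself would produce.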

We also noticed that the thresholds seem to keep reverting to defaults. I make the change, go back to the servers and groups and wait for the next polling cycle and the alert clears. If I go back to check the thresholds, they have reverted to their defaults but as long as I don't change anything, I'm ok. The minute I change one, it seems like everything goes back to defaults because that's what is showing in the interface.

There's no confirmation of any bug just yet, but the ticket has been submitted (and for good measure, what may be a vvol bug with incorrect reporting as well)

5 Practitioner

274.2K Posts

July 16th, 2015 14:00

Thank you for the update!

D

July 17th, 2015 06:00

Just adding that I've been experiencing this same issue after upgrading to the EPA release. The false alerts continued for days, even after I disabled the particular alert ID across all severity levels. Restarts of services and of the SAN HQ server itself did not help. I was prepared to call in to support the other morning when I noticed the false alerts had suddenly stopped, even though I hadn't touched anything in days. Still not sure what caused the bug to correct itself, but I'm hoping it sticks.
