
June 9th, 2017 07:00

T310 PERC S100 RAID-5 showing wrong capacity/size after HDD replacement

Hello.

Today I replaced a failed HDD, in a RAID-5 array, with a new HDD.

As far as I can tell, everything is working fine. The new HDD has been assigned to the array and it is in the process of being rebuilt while the system is online and seemingly working fine. Yet there is something troubling me: OMSA is reporting the virtual disk size as 8,187.84GB (8TB) instead of the previous and actual 930.48GB. This makes no sense, as the array is composed of 3x 500GB HDDs, and I can't understand why it is happening... could this be a corruption of OMSA itself, or is the RAID array at risk? At the moment the rebuild is about 50% done, but I doubt the outcome will change, and it is worrying me.
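
Just to sanity-check my own expectation, here is the rough arithmetic behind that ~930GB figure (assuming 500GB decimal drives and ignoring the small amount of space the controller reserves for metadata; shown as a quick Python sketch):

    # 3x 500 GB drives in RAID-5: one drive's worth of space goes to parity.
    drives = 3
    drive_bytes = 500 * 10**9                  # 500 GB per drive, decimal
    usable_bytes = (drives - 1) * drive_bytes  # 1,000,000,000,000 bytes usable
    print(usable_bytes / 2**30)                # ~931.3 when expressed in binary "GB" (GiB),
                                               # in line with the 930.48GB OMSA used to report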

Windows, on the other hand, is reporting the correct disk capacity, so I'm not sure what to make of it. I hope someone can suggest why OMSA is reporting this ludicrous disk capacity, which has no correlation whatsoever with the hardware.

Thank you!

Moderator

 • 

6.2K Posts

June 9th, 2017 13:00

Hello

The error and status reporting features of the S-series controllers are very limited. If any changes are made to the virtual disk, they may not be reported correctly in OMSA. Restarting OMSA or the server usually refreshes the information.

I would suggest restarting the OMSA services once the rebuild completes. If the issue persists then I would restart the server during your next maintenance window. If you filter your services by description there should be up to four services that start with DSM SA. Those are the OMSA services.
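
If it helps, here is a rough sketch of scripting those restarts (a Python sketch, assuming Python is available on the box; the display names below are the usual OMSA ones and may differ slightly by version, so check services.msc first - the same can be done by hand or with net stop/net start from an elevated prompt):

    import subprocess

    # Typical OMSA service display names; adjust to whatever "DSM SA" entries
    # actually appear in services.msc on your system.
    DSM_SERVICES = [
        "DSM SA Data Manager",
        "DSM SA Event Manager",
        "DSM SA Connection Service",
        "DSM SA Shared Services",
    ]

    # net stop/start accept the display name; run from an elevated prompt.
    for name in DSM_SERVICES:
        subprocess.run(["net", "stop", name], check=False)
    for name in reversed(DSM_SERVICES):
        subprocess.run(["net", "start", name], check=False)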

Thanks

9 Posts

June 9th, 2017 17:00

Thanks for the reply.

Unfortunately that didn't solve the issue. I've restarted the services, rebooted the server, uninstalled OMSA, rebooted, and then reinstalled OMSA (v8.5), but it still reports ~8TB instead of ~1TB. By the time the array finished rebuilding I had left the premises, so I can't boot into the controller config right now to check whether the capacity reported there is correct. At least in Windows it is still correct and everything seems fine, but I'm not sure I can rely on the array like this. Still, I'm trying to avoid having to back up the whole system, delete/recreate the array and restore - that would take me a whole day.

I suspect part of the problem is that I'm using a different drive model; it seems to work correctly, but it must differ from the others in some peculiar way to produce this result. I'm attaching a few screenshots with more details. At this point, though, is it fair to assume that I have no choice but to delete the whole array and recreate it from scratch?

Alerts from the log

 

List of disks in array (0:0 replaced)

 

Array state and capacity (before/after)


7 Technologist

 • 

16.3K Posts

June 10th, 2017 06:00

My guess is that this is just a communication glitch between OMSA and the AWFUL S100 controller. Restarting OMSA, restarting the server, updating the server firmware/drivers, and updating OMSA would be my recommended path to fix the displayed information. I do not think the issue is anything greater than that.

7 Technologist

 • 

16.3K Posts

June 10th, 2017 06:00

Hmm ... for some reason I didn't see your reply. 

What make/model is the disk you put in?

9 Posts

June 10th, 2017 08:00

First of all, thanks for chiming in, I appreciate the help.

I'm not sure why you didn't see my previous reply, but it may be because of the way moderation works in this forum. Any post, or update to a post, is subject to moderation before becoming visible. I'm not sure if that's related to my account being very recent (in fact, this original post of mine went up without being held for moderation).

Regarding your first reply: as I mentioned previously, I already restarted/uninstalled/reinstalled OMSA and restarted the server, with no luck. The currently installed OMSA is the latest version (v8.5) listed in the download section for this server model. As for drivers, the PERC S100 is using the latest, v2.0.0.162, but I'm not sure if something else should be updated. What may well be the problem is the replacement HDD, which might not be fully supported by the RAID controller. I've checked the Nautilus release notes and the model of the replacement HDD is mentioned nowhere, so I have little faith that there is any official support/firmware for the new disk, despite it being more than appropriate for the purpose. If you check my previous reply, I attached a few screenshots with the OMSA information tables; to see them at full resolution, right-click and open in a new tab, because I made the mistake of ticking the "lightbox" option when inserting the images, which makes them display at a barely visible resolution.

Here is the same screenshot:

Basically I replaced

0:0 WDC WD5003ABYX-18WERA0 r01.01S03

with

0:0 WDC WD5003ABYZ-011FA0  r01.01S03

I'm not very experienced with RAID in general, and I know that using a disk that is not 100% certified is a rookie mistake, but Dell also pushed me a bit too far. The server is currently out of warranty, but 2 years ago, almost to the day, Dell replaced disk 0:0 - exactly the same disk that failed this week. When I asked Dell for a quote they presented me with a $250 HDD, and when I asked if it was new or refurbished they replied: "might be new, might be refurbished, there is no way of knowing for sure". So I decided to buy a brand new WD RE4 from the same family as the existing ones and hope for the best. Let's say I'm 50% sure I did the right (or wrong) thing, depending on how you look at it.

Now, I'm not exactly sure what I'm looking at here. If the problem is just the numbers that OMSA shows me, I can live with that. But I worry that the problem may be more deeply rooted and cause me serious trouble down the line. As far as OMSA is concerned, the array is in perfect condition.

7 Technologist

 • 

16.3K Posts

June 10th, 2017 10:00

Looks like I had opened it a while ago and never got around to responding and saw that tab first :)

A couple of things:

WD5003ABYX looks like a WD RE4, which was validated on the PERCs BY Western Digital. I could be wrong, but I'm not aware of Dell ever certifying models they would have shipped for the PERC controllers specifically.

WD5003ABYZ looks like a WD RE (newer version than the 4), which was NOT validated on OEM hardware at all by WD.

Without testing by either Dell or WD, these become hit/miss on hardware of all types where the device manufacturer did not validate them on their systems (Intel, Synology, maybe some RAID controller mfg's, etc.).

I asked if it was new or refurbished they replied: "might be new, might be refurbished, there is no way of knowing for sure"

Front-line agents don't usually know where they come from, because they NEVER interact with dispatched hardware or parts in general. Because they come in bulk packaging from the various manufacturers, Dell cannot sell them as "new". In some cases, parts are pulled from overstocked systems, but no drive they sell or send as a replacement was ever in extended use in anyone else's system.

I know that using a disk that is not 100% certified is a rookie mistake, but Dell also pushed me a bit too much

Don't fall into the trap of boycotting Dell parts after they make you unhappy, either with untrained/unskilled support or too-high prices. You have a Dell server. Do yourself a favor and outfit it with the proper hardware. You don't have to buy Dell parts directly FROM Dell! Buy Dell parts from suppliers and resellers - they are MUCH cheaper and are the same validated and certified parts you would buy from Dell. Xbyte and ServerSupply are two good places to start with. Many people say "I'm not spending $350 on a 2TB drive from Dell!", then turn around and buy a desktop or NAS drive for $150 and have lots of issues with it, either immediately or down the road, when they could have bought a certified drive from a supplier for $170 and been problem-free.

The S100 is an extremely low-end RAID "solution" and is based on a modified version of Intel's chipset RAID. I would always recommend you use the H-series controllers, no matter the function of the server. The next best option would be to use Windows Disk Management to manage a mirrored setup - much better than the S-series controllers. Updating the "firmware" (reliability, performance, and function) of the controller is done through BIOS updates. I would suggest making sure the system firmware (BIOS, iDRAC/LCC, etc.) is up to date.
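
If you do later go the Disk Management route, the same mirrored setup can be scripted through diskpart. Purely an illustrative sketch (wrapped in Python here; the disk numbers are placeholders for two empty disks, and the operation is destructive to whatever is selected, so verify with "list disk" before running anything like this):

    import os
    import subprocess
    import tempfile

    # diskpart commands: convert two spare disks to dynamic and mirror them.
    # Disk numbers 1 and 2 are placeholders - check "list disk" output first.
    commands = [
        "select disk 1",
        "convert dynamic",
        "select disk 2",
        "convert dynamic",
        "create volume mirror disk=1,2",
        "assign letter=M",
        "format fs=ntfs quick",
    ]

    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("\n".join(commands) + "\n")
        path = f.name

    subprocess.run(["diskpart", "/s", path], check=False)  # must be run as administrator
    os.unlink(path)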

9 Posts

June 10th, 2017 11:00

WD5003ABYZ looks like a WD RE (newer version than the 4), which was NOT validated on OEM hardware at all by WD.

Without testing by either Dell or WD, these become hit/miss on hardware of all types where the device manufacturer did not validate them on their systems (Intel, Synology, maybe some RAID controller mfg's, etc.).

You're right, and when I bought this new disk I knew there was a risk of it either not working at all or giving me issues, but the risk isn't catastrophic... yet.

Front-line agents don't usually know where they come from, because they NEVER interact with dispatched hardware or parts in general. Because they come in bulk packaging from the various manufacturers, Dell cannot sell them as "new". In some cases, parts are pulled from overstocked systems, but no drive they sell or send as a replacement was ever in extended use in anyone else's system.

Don't fall into the trap of boycotting Dell parts after they make you unhappy, either with untrained/unskilled support or too-high prices.

Sure, the price isn't appealing, but this server, which I inherited from whoever decided to buy it, was my first experience with Dell. It wasn't really about boycotting either; the main reason I didn't buy from Dell wasn't price but perceived unreliability, even though I admit my experience with Dell is limited. From my perspective, a brand new server had 1 of its 3 disks fail after 2 years, and then Dell's replacement also failed another 2 years later. That made me reluctant to buy Dell as a first choice - conveniently forgetting the irony that the other 2 disks in the array are still alive and kicking.

The S100 is an extremely low-end RAID "solution" and is based on a modified version of Intel's chipset RAID. I would always recommend you use the H-series controllers, no matter the function of the server. The next best option would be to use Windows Disk Management to manage a mirrored setup - much better than the S-series controllers. Updating the "firmware" (reliability, performance, and function) of the controller is done through BIOS updates. I would suggest making sure the system firmware (BIOS, iDRAC/LCC, etc.) is up to date.

Oh, I've realized how low-end that controller is, mainly from other forum threads where people report a multitude of issues with it, ranging from performance to stability. Also, thanks for the tip about using Windows Disk Management; I may end up doing just that at a later time. By the way, do you know if there is any other software besides OMSA that can read the S100 array from within Windows?

Just for kicks, on Monday I'll boot into the RAID controller config and see whether it also tells me that I have an 8TB virtual disk or whether the problem is specific to OMSA, in which case I might ignore it for the time being.

9 Posts

June 10th, 2017 12:00

Curiously, I tried running the Dell System E-Support Tool (DSET) and then looked at the log it created. Surprisingly, it shows the correct virtual disk capacity. I'm not sure what to make of that, but it is a bit reassuring nonetheless.
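
Alongside DSET, what Windows itself sees for the virtual disk can be pulled with wmic (which ships with 2008 R2). A quick sketch, in Python for convenience, though the wmic line alone works from a command prompt:

    import subprocess

    # The S100 virtual disk appears to Windows as a single physical disk; Size is in bytes.
    out = subprocess.run(
        ["wmic", "diskdrive", "get", "Model,Size"],
        capture_output=True, text=True, check=False,
    )
    print(out.stdout)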

7 Technologist

 • 

16.3K Posts

June 10th, 2017 12:00

any other software beside OMSA that can read the S100 array from within Windows

"Probably" not. I would say no, but there is a small chance that Intel's software can connect and read it, but because the S100 is a rebrand of Intel's controller logic, I suspect it will not be able to talk to it natively.

I think the best options are:

  1. Replace the disk with an RE4.
  2. Update the BIOS.

Although I think it's also safe to ignore it - the options above are if you want to fix it.

(Firmware/driver updates should be kept up to date as a best practice, and given the age of the system, if updates have not been kept current, the firmware could be pretty old and there is a better than average chance that bringing it current will help.)

9 Posts

June 10th, 2017 17:00

Thanks for all the advice, I really appreciate it.

I think I already had everything on this server updated, both firmware and drivers, with the exception of the 2 original HDDs, which are at firmware rev.02 while the most current is rev.03.

Since my previous reply, I lost a few hours tinkering back and forth with OMSA and found that this disk capacity thing only becomes an issue after v8.1. In all versions following v8.1 the reported capacity is always enormously exaggerated, whereas previous versions report the correct value. I had an instance with v8.4 where the disk capacity was reported as 7,234,142,328.78GB :)

With the exception of v8.5, whenever I restarted the Data Manager service I got a different disk capacity:
56,098,816.00GB
55,574,528.00GB
39,059,456.00GB
57,671,680.00GB
57,147,392.00GB
52,166,656.00GB
18,874,368.00GB

OMSA v8.5 was the only one that consistently gave the same, yet incorrect, capacity: 8,187.55GB

I was able to track down the origin of the issue, at least partially: the DLL for OMSA's Dell Storage Management (Dell\SysMgt\sm\dellvl\dsm_sm_swrvil.dll). I tried using the DLL from OMSA v8.1 with OMSA v8.5 and the disk capacity was then correctly reported. I also noticed that the physical disk's page lost the "Sector Size" column from its info table when using the old DLL. I'm not sure what else might be different, and it's obviously not a good idea to swap the new DLL for the old one just because of a few numbers. Whatever the case may be, this gives me a little confidence that the problem may just be a programming fluke in OMSA and nothing really worrying about the array itself.
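
Purely speculation on my part, but given that the "Sector Size" column only shows up with the new DLL, I wonder if the newer code is misreading or misapplying the sector size when it computes the capacity. A toy illustration of that class of bug (not OMSA's actual code; it assumes the typical LBA count of a 500GB drive, and the result only lands in the same ballpark as my 8,187GB figure, not exactly on it):

    # Assumes the typical 976,773,168 LBAs (512-byte sectors) of a 500GB drive.
    sectors_per_drive = 976773168
    usable_sectors = 2 * sectors_per_drive    # RAID-5 over 3 drives: 2 drives' worth of data

    correct = usable_sectors * 512 / 2**30    # ~931 GB, roughly what older OMSA versions show
    inflated = usable_sectors * 4096 / 2**30  # ~7,450 GB if a 4K sector size were applied by mistake
    print(correct, inflated)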

In retrospect, and after checking a backup from last week: until just a few hours before I swapped the failed HDD I had been using OMSA v7.4.0.2, which contains Storage Management DLL v4.4.1. I had then updated OMSA to v8.5 on top of v7.4.0.2 but never rebooted until the actual drive swap. What I mean is that it's not entirely implausible that the inconsistency in the reported disk capacity is not directly linked to the new HDD and the timing is just a coincidence. Why it happens in one version but not the other is still a mystery to me. I don't think it makes sense to ask Dell about it through official channels since I'm using non-certified hardware.

That was long enough...

For now I think that I'll leave it be and hope for the best.

Thanks again for all your replies.

Moderator

 • 

6.2K Posts

June 12th, 2017 14:00

I tried to reproduce this issue and was unable to. I did not have a system in the lab configured with an S100, so I had to use an S300; the behavior may not be the same across both controllers, though. I did not encounter the issue with OMSA 8.5, an S300, and 2008 R2.

What OS are you running and what is the system BIOS version? The S100 is part of the chipset on the system board, so it is updated through system BIOS updates.

Thanks

9 Posts

June 12th, 2017 16:00

Hi.

It is possible that the issue is also partially caused by the new HDD model that I swapped in, which is not OEM approved (as pointed out by theflash1932); that, and the fact that so far I haven't found any other post resembling the issue I'm having, points in that direction. But it doesn't fully explain why earlier versions of OMSA report the correct capacity and later versions don't; if I hadn't updated OMSA I probably would never have encountered this "issue". That aside, the RAID seems to be working perfectly. I really wonder what was changed in that DLL that makes it behave this way on my system, regardless of the current configuration.

Anyway, here's the system info you asked for:

BIOS Information
Manufacturer: Dell Inc.
Version: 1.12.0
Release Date: 09/06/2013

Firmware/Driver Information for Controller PERC S100
Driver Version: 2.0.0-0162
Storport Driver Version: 6.1.7601.18386

Operating System Information
Operating System: Microsoft Windows Server 2008 R2, Standard x64 Edition
Operating System Version: Version 6.1 (Build 7601 : Service Pack 1) (x64) Server Full Installation

Thank you for your help!

Moderator

 • 

6.2K Posts

June 13th, 2017 08:00

Thank you for the information!

I'm going to get some hardware changed around in an R310 in the lab to try to reproduce this issue. Unfortunately, we do not have any tower systems in our lab at my site. If I'm unable to reproduce the issue then the non-Dell drive may be the cause.

Thanks

9 Posts

June 13th, 2017 08:00

If I'm unable to reproduce the issue then the non-Dell drive may be the cause.

Thanks for looking into it.

Yes, the hardware was always a probable factor, but as I've implied before, the cause can't rest solely on the hardware: the drive works as it should as part of the array, and the issue only appears in (later versions of) the monitoring software (OMSA). And unless there is a good reason for the changes in that dynamic link library between v8.1 and v8.2 that lie behind the disk capacity calculation discrepancy, no definitive conclusion can be drawn.

The same monitoring software knows how many sectors and how much capacity each disk of the array has, how much of that is allocated to the array and its virtual disk, plus a trove of other data that would allow it to calculate a correct and exact number. I find it puzzling that it can mathematically come up with random disk capacities when it has all the facts. Even assuming the hardware is part of the problem, the readings would be expected to be consistent (which they are in v8.5 but not in v8.2), so how can the monitoring software query the same hardware and report all of these different results across OMSA service restarts:

56,098,816.00GB
55,574,528.00GB
39,059,456.00GB
57,671,680.00GB
57,147,392.00GB
52,166,656.00GB
18,874,368.00GB
7,234,142,328.78GB

Thanks.

Moderator

 • 

6.2K Posts

June 29th, 2017 18:00

I was finally able to get a system running to try to reproduce this issue. I did not test with a non-certified drive or an unsupported OS (Server 2016). I tested an S100 on an R310 with 2008 R2 and OMSA 8.5, and everything is reported correctly.

The issue you are experiencing appears to be due to compatibility with unsupported hardware or software.

Thanks
