LUN showing plenty of free space but VMware showing low on space!
Hi,
I've got a very odd issue on a loaner SAN. I don't have the model number at hand at the moment but it's a PS6000 with NL-SAS disks.
When I look in VMware at DRLUN01 it appears that there is only 149 GB free. When I look at the same LUN on the SAN it says that it has 985 GB free!
The SAN is running the latest firmware and is connected to an ESXi 4.1.0 server via iSCSI. The LUN contains Veeam replicas of my production systems. When I mentioned this to Veeam they advised that they have seen this twice, but only know that Dell resolved the issue.
As I'm not being offered any support, even though I'm likely to spend about 20k with Dell and have Pro Support on my production SAN, I was wondering if anyone might have any ideas?
Thanks
Greg



Anonymous
03-08-2012 18:00
If you have allocated (from the ESX perspective) all your space then that's a different issue. ESX needs free space to work properly: temp files, swapfiles, logs and VMware snapshots. For example, when I create 600GB volumes I generally leave at least 100GB free. If free space gets low, I create a new datastore and move VMs there, or just create new ones on the new datastore.
Re: ESX knows. Some SCSI/disk basics. At the disk level of storage there are no "files", there are only blocks. Each block has an address, called an LBA (Logical Block Address). So every volume (disk) reports its capacity to the controller.
The volume (disk) starts at LBA 0 and runs to the last LBA. The ESX server keeps a table of the total blocks, and when you do a write, it selects an LBA to use. That LBA (and data) is written to the volume (disk).
The array also knows how big each volume is, so it knows how many blocks there are. The ESX server sends the data with the LBA ESX chose to use, and at the same time it updates the allocation table on that VMFS volume. The array can then locate LBA (X) and write the data there.
So since you handed us (X) MB, we can deduct that from the total volume free space. Works great as long as you never delete anything. A delete from ESX is a WRITE to the VMFS allocation table. We record that write like any other. There's nothing to tell the array that LBA (X) is no longer used, so it can be returned to the member free space. That's where the new UNMAP command comes in. Now the server can send us a command that says unallocate (free) LBA (X). The in-use space will go down and free space on that member will go up.
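The accounting Don describes can be sketched as a toy model (Python used purely for illustration; the class and method names are invented, not any real ESX or EqualLogic API). The key point it demonstrates: a host-side delete is just a metadata write, so only an explicit UNMAP ever reduces the array's in-use count.

```python
# Toy model of host-vs-array space accounting (illustrative only).
class Array:
    """Block storage: only counts LBAs that have ever been written."""
    def __init__(self, total_blocks):
        self.total = total_blocks
        self.written = set()          # array's "in use" view

    def write(self, lba):
        self.written.add(lba)

    def unmap(self, lba):
        # SCSI UNMAP: host tells the array this LBA is free again.
        self.written.discard(lba)

    def in_use(self):
        return len(self.written)


class Host:
    """ESX-like view: free space lives in the filesystem allocation table."""
    def __init__(self, array):
        self.array = array
        self.allocated = set()        # LBAs the filesystem considers in use

    def write(self, lba):
        self.allocated.add(lba)
        self.array.write(lba)         # the data write reaches the array

    def delete(self, lba, send_unmap=False):
        self.allocated.discard(lba)   # just a metadata write on the host
        if send_unmap:
            self.array.unmap(lba)     # only UNMAP shrinks the array count


arr = Array(total_blocks=100)
host = Host(arr)
for lba in range(10):
    host.write(lba)

host.delete(3)                        # no UNMAP: array still counts it
print(len(host.allocated), arr.in_use())   # 9 host-side, 10 array-side

host.delete(4, send_unmap=True)       # with UNMAP: both sides agree
print(len(host.allocated), arr.in_use())   # 8 host-side, 9 array-side
```

Without `send_unmap`, the two views drift apart exactly as described: the host's free space grows on delete while the array's in-use figure never comes back down.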
re: Array reuse. So since it's the ESX server that determines the LBA to be used, it will re-write LBAs as needed.
Re: Free space reporting. No. The important gauge is what the OS says, not the array. Eventually, when SCSI UNMAP is in effect, those two will be much closer together. If a VMFS volume goes to 100% you are in trouble. ESX will no longer have any LBAs to select from. Writes will fail and VMs will go down.
The array's in-use figure is an aid to make sure you are not allocating more space than you actually have.
I.e. if you had a 2TB array, you could create thin-provisioned volumes MUCH larger than that. But until you actually write more than 2TB worth of data you're fine. If you do, then the array runs out of space, volumes go offline and all the VMs would stop. This gauge lets you judge when to buy more storage.
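As a back-of-the-envelope check of that over-provisioning gauge (all numbers invented for illustration, loosely based on the 2TB array and 1.98TB thin LUN mentioned in this thread):

```python
# Hypothetical pool: 2 TB physical, with thin volumes provisioned past it.
pool_tb = 2.0
provisioned_tb = [1.98, 1.5, 1.0]   # logical (thin) volume sizes
written_tb = [1.01, 0.3, 0.2]       # space actually written so far

print(f"provisioned: {sum(provisioned_tb):.2f} TB")   # well over 2x the pool
print(f"written:     {sum(written_tb):.2f} TB")       # still under the pool

# The array gauge matters here: once total written space approaches
# pool_tb, new page allocations fail and volumes go offline.
headroom_tb = pool_tb - sum(written_tb)
print(f"headroom:    {headroom_tb:.2f} TB")
```

Provisioning 4.48 TB of thin volumes against a 2 TB pool is fine until the written total closes in on 2 TB, which is exactly what the EQL in-use gauge is there to warn about.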
So w/o UNMAP, eventually the EQL GUI will NOT reflect the correct free space for a volume.
Bottom line, follow what ESX (or any server) says about in-use and free space on a volume.
Better?
DELL-Joe S
7 Technologist
03-08-2012 12:00
For thin provisioned volumes, the volume reserve decreases to the amount of the in-use space, and the free volume reserve becomes unreserved space.
-joe
NGUK Support
03-08-2012 13:00
Thanks Don. I think I've got it now :-)
As far as VMware is concerned the LUN is running low on space because each VM disk has reserved space in the VMFS filesystem allocation table.
As far as the SAN is concerned all is well.
What's likely to happen from a VMware point of view? Will it just keep on writing data or will it stop when all of the 'available' space in the filesystem allocation table has been used?
The LUN contains my Veeam replicas so I'm keen for it to remain happy! I've had a look at a few of the thin LUNs on the production SAN and can see the same thing happening, so the answer is handy for both the loaner and production SANs!
Many thanks
Greg
NGUK Support
03-08-2012 13:00
Hi Joe,
Sorry but I don't understand. Could you explain again pls?
My very limited knowledge says that I have configured a 1.98 TB thin LUN and that it's saying only 1.01 TB is in use, i.e. the blocks have been touched. Surely it has the potential to grow to its full 1.98 TB? I find it strange that the SAN and VMware don't report at least a similar amount of free space.
Thanks
Greg
Anonymous
03-08-2012 13:00
With VMware you see this because when you create a VM, say with a 100GB disk, 100GB isn't written to the array. However, 100GB is deducted from the filesystem allocation table on the VMFS volume. The array can only report how much has actually been written to the volume. Eventually, it will go the other way. As data is written and deleted, the array's in-use space will be greater than what ESX reports. That's because when a file is deleted, that's a WRITE to the allocation table that only ESX can see. There is a new SCSI command called UNMAP that will correct this behavior. EQL firmware version 6.0 and greater supports it. ESXi v5.0 update 1 or later also supports it, but you have to run it manually; it won't automatically tell the array to release freed blocks. Other filesystems like Linux EXT4 and Windows Server 2012 will also support this. One note: EQL v6.0 firmware only supports this with ESX on NON-replicated volumes.
There is a long KB article on the equallogic website which describes this in more detail.
I've included it here.
Solution Title
ARRAY: GUI space usage differs from what the OS shows
Solution Details
Q: Why is there a difference between what my file system shows as space used and what the PS array GUI shows for in-use for the volume?
A: The PS array is block-storage, and only knows about areas of a volume that have ever been written. The PS Series GUI reports this information for each volume. Volume allocation grows automatically due to application data writes. If later the application frees up space, the space is not marked as unused in the PS Series GUI. Hence the difference in views between the OS/file system and the PS Series GUI.
With thin provisioned volumes, this perception can be more pronounced.
Thin provisioning is a storage virtualization and provisioning model that allows administrators to logically allocate large addressable storage space to a volume, yet not physically commit storage resources to this space until it is used by an application. For example, using thin provisioning you can create a volume that an application views as 3 TB, while only allocating 300 GB of physical storage to it. As the operating system writes to the volume, physical disk space is allocated to the volume by the storage array. This physical disk space is taken from the available free space in the pool automatically and transparently. As a result, less physical storage is needed over time, and the stranded storage problem is eliminated. The administrator enjoys management benefits similar to over-provisioning, yet maintains the operational efficiencies of improved physical storage utilization. This more efficient use of physical storage resources typically allows an organization to defer or reduce storage purchases.
So thin provisioning is a forward-planning tool for storage allocation in which all the storage an application will need is allocated upfront, eliminating the trauma of expanding available storage in systems that do not support online expansion. Because the administrator initially provisions the application with all the storage it will need, repeated data growth operations are avoided.
Most important, because of the difference between reality and perception, anyone involved with thin provisioned storage must be aware of the duality in play. If all players are not vigilant someone could start drawing on the un-provisioned storage – exceeding capacity, disrupting operations, or requiring additional unplanned capital investments.
A thin-provisioned volume also grows automatically due to application data writes – the space is drawn from the pool free space (rather than having been pre-allocated in a normal volume). If later the application frees up space, the space is free in the file system but is not returned to the free space in the PS Series pool. The only way to reduce the physical allocation in the SAN is to create a new volume, copy the application data from the old volume to the new, and then delete the old volume.
A similar problem is when the initiator OS reports significantly more space in use than the array does. This can be pronounced in systems like VMWare that create large, sparse files. In VMWare, if you create yourself a 10GB disk for a VM as a VMDK file, VMWare does not write 10GB of zeros to the file. It creates an empty (sparse) 10GB file, and subtracts 10GB from free space. The act of creating the empty file only touches a few MB of actual sectors on the disk. So VMWare says 10GB missing, but the array says, perhaps, only 2MB written to.
Since the minimum volume reserve for any volume is 10%, the filesystem has a long way to go before the MB-scale writes catch up with the minimum reservation of a volume. For instance, a customer with a 100GB volume might create 5 VMs with 10GB disks. That's 50GB used according to VMWare, but only perhaps 5 x 2MB (10MB) written to the array. Until the customer starts filling the VMDK files with actual data, the array won't know anything is there. It has no idea what VMFS is; it only knows what's been written to the volume.
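The KB's arithmetic works out as follows (a quick sketch; the ~2MB-per-VMDK figure is the KB's own rough estimate of the metadata actually touched when a sparse file is created):

```python
# KB example: a 100 GB volume holding 5 VMs with 10 GB sparse VMDKs.
vm_count = 5
vmdk_gb = 10
touched_mb_per_vmdk = 2   # metadata actually written per VMDK (KB's estimate)

# What VMware deducts from VMFS free space at creation time:
vmfs_used_gb = vm_count * vmdk_gb

# What the array has actually seen written:
array_used_mb = vm_count * touched_mb_per_vmdk

print(vmfs_used_gb)    # 50 (GB): VMFS says half the volume is gone
print(array_used_mb)   # 10 (MB): the array has barely been written to
```

Half the datastore "used" on the VMware side against roughly ten megabytes on the array side is exactly the kind of gap described in this thread.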
• Example: A file share is thin-provisioned with 1 TB logical size. Data is placed into the volume so that the physical allocation grows to 500 GB. Files are deleted from the file system, reducing the reported file system in use to 100 GB. The remaining 400 GB of physical storage remains allocated to this volume in the SAN.
• This issue can also occur with maintenance operations including defragmentation, database re-organization, and other application operations.
In most environments, file systems do not dramatically reduce in size, so this issue occurs infrequently. Also, some file systems will not make efficient re-use of previously allocated space, and may not reuse deleted space until they run out of unused space (this is not an issue for NTFS or VMFS).
A related question is how can this previously used, but now not-in-use space be reclaimed in the SAN?
Today this would require the creation of a new volume and copying the volume data from old to new. This would likely require the application/users of the file system to be offline (a planned maintenance window). In the future, file systems such as NTFS in Windows Server 2008 will allow online shrink in addition to the existing online grow operations. With this procedure, free space could be reclaimed online (perhaps during non-peak times): the file system is shrunk online, then re-grown online to its original size. This would reclaim the volume space; however, there may be a delay in returning the freed space to the pool if snapshots are present, as they hold the released space until the snapshots age (and are automatically deleted).
In some cases the amount of space used on the array will show LESS than what is shown by the OS. For example, VMware ESX. When ESX 3.x creates a 20GB VMDK, it doesn't actually write 20GB of data to the volume. A small file is created, then ESX tells the file allocation table that 20GB has been allocated. Over time, as data is actually written and deleted, this will swing back the other way, with ESX reporting less space allocated than what the array GUI indicates.
Anonymous
03-08-2012 14:00
It will work fine. ESX knows what blocks are used or free. The array will re-use previously accessed pages automatically. So even if the EQL GUI says 100% used, ESX could say much less than that. So the EQL GUI isn't where you want to see how much free space your OS has. It's important if you have OVER-provisioned your array group. That would be bad if you actually used all the available space: when the volumes attempt to allocate new pages, they will fail and go offline.
Again the future is much brighter as more OS's support the new SCSI command.
Does that help?
NGUK Support
03-08-2012 17:00
Hi Don,
That does help but it also creates a few more questions! Sorry
-It will work fine.
OK, so ESX will be fine, and even though Veeam is going a bit mental and saying that the LUN is low on space, things should roll on nicely. I'll check that Veeam doesn't have any settings that stop replication when free space is below x.
-ESX knows what blocks are used or free.
I think I understand this. ESX has reserved space in the filesystem allocation table but it hasn't used the blocks? Is it just a kind of bug that it can't display the free space accurately then?
-The array will re-use previously accessed pages automatically.
For some reason I thought the SAN wasn't intelligent enough to know what's in use and what isn't? My understanding was that if a block is touched it can't be released, at least not until v6 firmware?
-So even if the EQL GUI says 100% used, ESX could say much less than that. So the EQL GUI isn't where you want to see how much free space your OS has.
OK, are you saying that I should use whichever is reporting the most free space? In my situation it's ESX that's telling lies, but after some time it will be the SAN that tells me lies?
-It's important if you have OVER provisioned your array group. That would be bad if you actually used all the available space. When the volumes attempt to allocate new pages, they will fail and go offline.
OK
Sorry for the questions but for some reason I just can't quite get my head around it!
Cheers
Greg
NGUK Support
06-08-2012 16:00
Hi Don,
Thanks for taking the time to explain that to me, I really do appreciate it :-)
All the best
Greg
Anonymous
06-08-2012 17:00
You are very welcome!!
Glad to help out.
Regards,