Unsolved

September 22nd, 2015 09:00

EMC VNX alert regarding auto-tiering & relocation

Dear Friends,

I need some input on the error below.

Time Stamp: 09/22/15 09:00:01
Event Number: 712d841a
Severity: Error
Host: A-IMAGE
Storage Array: CKM00122901163
SP: N/A
Device: N/A
Description: Internal Information Only. Could not complete operation Relocate 0xB0007249C Cancelled CBFS status 0x26 because 0xe12d8426.

00000400 03002c00 d3040000 1a842de1 1a842de1 00000000 00000000 00000000 00000000 00000000 712d841a

I understand the cause and resolution (below); however, I have some further questions at the end of this message.

Cause :

  1. The messages/cancellations occur when the Auto-Tiering window ends while relocations are still left in the queue.
  2. The cancellation occurs while data is being relocated from one tier to another and relocations are still left in the queue.

This happens by design.

Resolution :

  1. Ignore the alert, as it is not critical. The data will be relocated in the next relocation window.
  2. Expand the pool so that tier consumption is reduced to 90% and relocation can proceed without failing.
  3. Manually move the data from the extreme performance tier to the capacity tier.
  4. Increase the relocation duration, or schedule another relocation window for the remaining slices that were aborted when the relocation window expired.

Questions:

  1. Ignore the alert, as it is not critical. The data will be relocated in the next relocation window. --> How can we check which data did not relocate, and whether it was relocated in the next run? Is it really wise to ignore this alert?

  2. Expand the pool so that tier consumption is reduced to 90% and relocation can proceed without failing. --> If pool subscription and allocation are under control and there is a lot of free space, does this point still hold?

  3. Manually move the data from the extreme performance tier to the capacity tier. --> How can this be done from the GUI or CLI? What if data needs to be moved up a tier?

  4. Increase the relocation duration, or schedule another relocation window for the remaining slices that were aborted when the relocation window expired. --> We have an 8-hour relocation window. How can we decide which time window to choose, whether it is appropriate, and whether to reduce or increase it?

  5. What do the codes in this alert refer to?

Could not complete operation Relocate 0xB0007249C Cancelled CBFS status 0x26 because 0xe12d8426.

  6. Can we disable only this alert in the notification template without disabling the other alerts?



Thanks,

Yuvik

4.5K Posts

September 22nd, 2015 11:00

Questions:

  1. Ignore the alert, as it is not critical. The data will be relocated in the next relocation window. --> How can we check which data did not relocate, and whether it was relocated in the next run? Is it really wise to ignore this alert?
    • Ans: No - the data is either a 235KB or 8KB slice within the pool. There are some logs that show this, but they roll over pretty quickly. You can ignore this alert. If you're seeing a lot of these, then you should look at expanding the Pool or letting Auto-tiering run longer. One note: if you're using Deduplication (on the VNX2), you can also get these messages while Deduplication is running (this is fixed in the latest Flare release - 33.119).

  2. Expand the pool so that tier consumption is reduced to 90% and relocation can proceed without failing. --> If pool subscription and allocation are under control and there is a lot of free space, does this point still hold?
    • Ans: As long as you have at least 10% (for VNX1) or 5% (for VNX2) free in the Pool, you should be OK. Auto-tiering moves a lot of slices of data around between tiers and within tiers, and each slice is owned by either SPA or SPB. At any point in time the number of free, available slices on each SP may be different - you might have fewer available on SPA, for example, and relocations of slices owned by that SP would be more apt to fail. This is also affected by the Default and Current SP owner of each LUN. Make sure the Default SP owner for a LUN matches the Allocated SP owner (right-click on the LUN and select Properties to see its ownership). The same applies to LUNs that are trespassed: all trespassed LUNs must be returned to the correct Default SP owner. The correct ownership for a LUN is: Current = Default = Allocated. If this is not maintained, you can see both performance issues and slices running out of free space.

  3. Manually move the data from the extreme performance tier to the capacity tier. --> How can this be done from the GUI or CLI? What if data needs to be moved up a tier?
    • Ans: You can change the Tiering Policy from High:Auto to Low:Same - that will move the slices down to the NL-SAS tier and keep them there. If you know for certain that specific LUNs will always have low performance requirements (latency is NOT an issue), then you can move the data for those LUNs to the NL-SAS tier, freeing up the SAS and SSD tiers.

  4. Increase the relocation duration, or schedule another relocation window for the remaining slices that were aborted when the relocation window expired. --> We have an 8-hour relocation window. How can we decide which time window to choose, whether it is appropriate, and whether to reduce or increase it?
    • Ans: Look at the Pool Properties/Tiering tab and check whether all the data scheduled for relocation each night is actually finishing each night. If it isn't, then increase the time.

  5. What do the codes in this alert refer to?

Could not complete operation Relocate 0xB0007249C Cancelled CBFS status 0x26 because 0xe12d8426.

    • Ans: These are the status codes. The last one, 0xe12d8426, tells us the cause. Searching for this code in the Knowledgebase on support.emc.com will return the KBs that explain this event - KB 91463 and KB 168806.

  6. Can we disable only this alert in the notification template without disabling the other alerts?
    • Ans: No
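
The ownership rule from answer 2 (Current = Default = Allocated) can be sketched as a small check. This is only an illustration: the LUN names and the dictionary layout below are hypothetical, and the values would have to be copied from each LUN's Properties dialog or another source you trust.

```python
def ownership_mismatches(luns):
    """Return the names of LUNs whose Current, Default and Allocated owners differ."""
    return [
        name
        for name, owners in luns.items()
        if not (owners["current"] == owners["default"] == owners["allocated"])
    ]


# Hypothetical sample data, not taken from a real array.
luns = {
    "LUN_10": {"current": "SP A", "default": "SP A", "allocated": "SP A"},
    "LUN_11": {"current": "SP B", "default": "SP A", "allocated": "SP A"},  # trespassed
}

for name in ownership_mismatches(luns):
    print(f"{name}: ownership is not Current = Default = Allocated")
```

Any LUN this reports would be a candidate for returning to its Default SP owner, as described in answer 2.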


glen

13 Posts

September 23rd, 2015 04:00

Hi Glen,

Thank you for the answers.

Regards,

Yuvik

157 Posts

January 26th, 2016 06:00

Glen - is there any way to tell from the error codes which pool is trying to relocate but can't? There doesn't seem to be anything in the SP logs that gives the pool ID.

Is it safe to assume that 5% is the bare minimum for relocation to work and if a pool is 97% full it most likely can't do it?

thanks

4.5K Posts

January 26th, 2016 08:00

When relocation has finished, there is a message in the logs, "Relocation completed", that shows what was done during the relocation window. Look at the Remain entry - a non-zero value shows that some relocations were not finished. Below is an example of the message - you can see the Pool ID in it.

A      01/06/16 06:45:01 PEService        71660400 Relocation completed for Storage Pool 1. Slices to relocate = 6160, Completed = 2002, Remain = 4158, Failed = 0.

A      01/06/16 06:45:01 MLU              712d841a Could not complete operation Relocate 0xB0002115E Cancelled CBFS status 0x26 because 0xe12d8426.

A      01/06/16 06:45:02 MLU              712d841a Could not complete operation Relocate 0xB00021161 Cancelled CBFS status 0x26 because 0xe12d8426.
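
If you have the SP log as text, the lines above can be picked apart with a short script. This is a minimal sketch that assumes the log lines look exactly like the three samples quoted here; the regular expressions and the `summarize` helper are my own illustration, not an EMC tool.

```python
import re

# Matches the PEService summary line and captures the pool ID and slice counts.
SUMMARY_RE = re.compile(
    r"Relocation completed for Storage Pool (?P<pool>\d+)\. "
    r"Slices to relocate = (?P<total>\d+), Completed = (?P<completed>\d+), "
    r"Remain = (?P<remain>\d+), Failed = (?P<failed>\d+)\."
)

# Matches the MLU cancellation line and captures its three hex codes.
CANCEL_RE = re.compile(
    r"Could not complete operation Relocate (?P<obj>0x[0-9A-Fa-f]+) "
    r"Cancelled CBFS status (?P<status>0x[0-9A-Fa-f]+) "
    r"because\s+(?P<reason>0x[0-9A-Fa-f]+)"
)


def summarize(lines):
    """Split log lines into relocation summaries and cancelled relocations."""
    summaries, cancelled = [], []
    for line in lines:
        match = SUMMARY_RE.search(line)
        if match:
            summaries.append({k: int(v) for k, v in match.groupdict().items()})
            continue
        match = CANCEL_RE.search(line)
        if match:
            cancelled.append(match.groupdict())
    return summaries, cancelled


log_lines = [
    "A      01/06/16 06:45:01 PEService        71660400 Relocation completed for Storage Pool 1. "
    "Slices to relocate = 6160, Completed = 2002, Remain = 4158, Failed = 0.",
    "A      01/06/16 06:45:01 MLU              712d841a Could not complete operation "
    "Relocate 0xB0002115E Cancelled CBFS status 0x26 because 0xe12d8426.",
    "A      01/06/16 06:45:02 MLU              712d841a Could not complete operation "
    "Relocate 0xB00021161 Cancelled CBFS status 0x26 because 0xe12d8426.",
]

summaries, cancelled = summarize(log_lines)
for s in summaries:
    if s["remain"] > 0:
        print(f"Pool {s['pool']}: {s['remain']} of {s['total']} slices not relocated")
print(f"{len(cancelled)} cancelled relocation messages")
```

A pool whose Remain value stays above zero night after night is the one to look at for more free space or a longer relocation window.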

For VNX1 you need a minimum of 10% and for VNX2 you need a minimum of 5% free in the Pool. See KB 15782 and 78223 for more information on this.
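
As a rough illustration of those minimums, the check below compares a pool's free percentage with the 10% (VNX1) and 5% (VNX2) figures. The function name and the capacity numbers are hypothetical; take the real values from the Pool Properties.

```python
# Minimum free space in the pool for relocation, per the figures above.
MIN_FREE_PERCENT = {"VNX1": 10.0, "VNX2": 5.0}


def has_relocation_headroom(model, capacity_gb, consumed_gb):
    """Return True if the pool's free percentage meets the minimum for the model."""
    free_percent = 100.0 * (capacity_gb - consumed_gb) / capacity_gb
    return free_percent >= MIN_FREE_PERCENT[model]


# Hypothetical example: a 1000 GB VNX2 pool that is 97% full has only 3% free.
if not has_relocation_headroom("VNX2", 1000, 970):
    print("Pool is below the minimum free space for relocation")
```

So a pool that is 97% full is below the minimum on either model.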

glen
