Unsolved

1 Rookie

 • 

16 Posts

1519

February 25th, 2021 20:00

High latency during Unity raid-group rebalancing?

Unity 650F LUN latency sometimes ramps up to hundreds of milliseconds starting at 5:03 PM and ending around 6PM.  Happens at random, but seems to occur more often when we're moving a lot of data around, creating and deleting large LUNs.  Drive bandwidth becomes VERY high but CPU and front-end activity remains normal.

Latency example:

richardm112_0-1614313804718.png

 

Drive bandwidth:

richardm112_1-1614313887720.png

 

LUN IOPS does not follow a similar pattern:

richardm112_2-1614313979659.png

 

LUN bandwidth actually drops because the system is performing so poorly:

richardm112_3-1614314065837.png

 

CPU remains normal:

richardm112_4-1614314128084.png

 

EMC\C4Core\log\c4_safe_ktrace.log contains lines similar to this during the period of high latency:

2021/02/25-22:03:38.244931    2     7FCAA3D3D704      std:INFO OBJ   11D  100C0 : [PSV]: func: fbe_extent_pool_balance_state_set_flag_update_and_persist: line: 406

2021/02/25-22:03:38.320994 525 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : balance task send edge up event to mrg: raid_id:0, raid_extent_index:0
2021/02/25-22:03:38.321000 7 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : fbe_extent_pool_update_heat_value_for_copy_and_shuffle entry
2021/02/25-22:03:38.321001 1 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : update heat value for balance - rg 0x0
2021/02/25-22:03:38.321013 11 7FCAA3CDC705 std:INFO OBJ 11E 100C0 : MarkCopy: P-Req:Set lfl:0x0 pfl:0x0 state:0x100400000
2021/02/25-22:03:38.321018 5 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : balance task send edge up event to mrg: raid_id:1, raid_extent_index:1
2021/02/25-22:03:38.321021 3 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : fbe_extent_pool_update_heat_value_for_copy_and_shuffle entry
2021/02/25-22:03:38.321022 1 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : update heat value for balance - rg 0x1
2021/02/25-22:03:38.321034 12 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : balance task send edge up event to mrg: raid_id:2, raid_extent_index:4
2021/02/25-22:03:38.321035 1 7FCAA3C7B706 std:INFO OBJ 120 100C0 : MarkCopy: A-Req:Set lfl:0x0 pfl:0x0 state:0x100400000
2021/02/25-22:03:38.321037 2 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : fbe_extent_pool_update_heat_value_for_copy_and_shuffle entry
2021/02/25-22:03:38.321038 1 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : update heat value for balance - rg 0x2









...

2021/02/25-22:03:38.321190    3     7FCAA3D3D704      std:INFO OBJ   11D  100C0 : EXP-NOTIFICATION: obj 0x11d dpg 0 percent 0%  DATA_BALANCE_IN_PROGRESS

...

2021/02/25-22:04:36.091645    3     7F94F405570B     cbfs:CBFSA: FSREORG: 6: Evacuate Completed CB (cont): fsid: 1073742973, relocation-rate: 0 KB/sec (0 MB/sec),

2021/02/25-22:04:36.091646    1     7F94F405570B     cbfs:CBFSA: relocBlks-vs-totalBlks: 0%, status: OK, RBTs(0, 0 usec), IBOs: 0, Subjobs(0, 0 usec)

---

What is this Unity doing at 5:03PM that makes it so slow?

1 Rookie

 • 

16 Posts

February 25th, 2021 20:00

Drive bandwidth from 6PM to 7PM.  Note the peculiar pattern.  "Packs" of disks seem to be synchronizing or rebalancing.  We do not see this access pattern under any other occasion.  It only shows up during these 5PM "storms."

richardm112_0-1614314722538.png

 

4 Operator

 • 

8.6K Posts

March 11th, 2021 07:00

I would suggest to open a case with support

1 Rookie

 • 

16 Posts

April 16th, 2021 11:00

We've opened tickets twice.  The first time, the result was "it has to be something on your side generating this activity" when it's very clear that front-end activity was business-as-usual during the 'IOPS storm.'

The second time resulted in us receiving an "everything mostly looks fine" report from support that was a mishmash of incohesive graphs and observations and it was difficult to tell exactly what we were looking at. The case was submitted with attachments from internal logs (c4_safe_ktrace and C4Core\log) showing clearly a RAID group rebalance getting underway at the exact time of the IOPS storm.  They ignored it.

I don't know how else to put the eyes on this.

Moderator

 • 

7.7K Posts

April 16th, 2021 16:00

Hello richardm112,

If you still have the service request# can, you please send me them via private message & I can try to get them escalated & reviewed.

1 Rookie

 • 

16 Posts

August 6th, 2021 09:00

Hi, Unity OS 5.1.0 seems to have fixed the problem.  Just want to give a folllow up here in case anyone else has this issue.

No Events found!

Top