Unsolved
1 Rookie
•
16 Posts
0
1519
February 25th, 2021 20:00
High latency during Unity raid-group rebalancing?
Unity 650F LUN latency sometimes ramps up to hundreds of milliseconds starting at 5:03 PM and ending around 6PM. Happens at random, but seems to occur more often when we're moving a lot of data around, creating and deleting large LUNs. Drive bandwidth becomes VERY high but CPU and front-end activity remains normal.
Latency example:
Drive bandwidth:
LUN IOPS does not follow a similar pattern:
LUN bandwidth actually drops because the system is performing so poorly:
CPU remains normal:
EMC\C4Core\log\c4_safe_ktrace.log contains lines similar to this during the period of high latency:
2021/02/25-22:03:38.244931 2 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : [PSV]: func: fbe_extent_pool_balance_state_set_flag_update_and_persist: line: 406
2021/02/25-22:03:38.320994 525 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : balance task send edge up event to mrg: raid_id:0, raid_extent_index:0
2021/02/25-22:03:38.321000 7 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : fbe_extent_pool_update_heat_value_for_copy_and_shuffle entry
2021/02/25-22:03:38.321001 1 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : update heat value for balance - rg 0x0
2021/02/25-22:03:38.321013 11 7FCAA3CDC705 std:INFO OBJ 11E 100C0 : MarkCopy: P-Req:Set lfl:0x0 pfl:0x0 state:0x100400000
2021/02/25-22:03:38.321018 5 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : balance task send edge up event to mrg: raid_id:1, raid_extent_index:1
2021/02/25-22:03:38.321021 3 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : fbe_extent_pool_update_heat_value_for_copy_and_shuffle entry
2021/02/25-22:03:38.321022 1 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : update heat value for balance - rg 0x1
2021/02/25-22:03:38.321034 12 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : balance task send edge up event to mrg: raid_id:2, raid_extent_index:4
2021/02/25-22:03:38.321035 1 7FCAA3C7B706 std:INFO OBJ 120 100C0 : MarkCopy: A-Req:Set lfl:0x0 pfl:0x0 state:0x100400000
2021/02/25-22:03:38.321037 2 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : fbe_extent_pool_update_heat_value_for_copy_and_shuffle entry
2021/02/25-22:03:38.321038 1 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : update heat value for balance - rg 0x2
...
2021/02/25-22:03:38.321190 3 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : EXP-NOTIFICATION: obj 0x11d dpg 0 percent 0% DATA_BALANCE_IN_PROGRESS
...
2021/02/25-22:04:36.091645 3 7F94F405570B cbfs:CBFSA: FSREORG: 6: Evacuate Completed CB (cont): fsid: 1073742973, relocation-rate: 0 KB/sec (0 MB/sec),
2021/02/25-22:04:36.091646 1 7F94F405570B cbfs:CBFSA: relocBlks-vs-totalBlks: 0%, status: OK, RBTs(0, 0 usec), IBOs: 0, Subjobs(0, 0 usec)
---
What is this Unity doing at 5:03PM that makes it so slow?



richardm112
1 Rookie
•
16 Posts
0
February 25th, 2021 20:00
Drive bandwidth from 6PM to 7PM. Note the peculiar pattern. "Packs" of disks seem to be synchronizing or rebalancing. We do not see this access pattern under any other occasion. It only shows up during these 5PM "storms."
Rainer_EMC
4 Operator
•
8.6K Posts
0
March 11th, 2021 07:00
I would suggest to open a case with support
richardm112
1 Rookie
•
16 Posts
0
April 16th, 2021 11:00
We've opened tickets twice. The first time, the result was "it has to be something on your side generating this activity" when it's very clear that front-end activity was business-as-usual during the 'IOPS storm.'
The second time resulted in us receiving an "everything mostly looks fine" report from support that was a mishmash of incohesive graphs and observations and it was difficult to tell exactly what we were looking at. The case was submitted with attachments from internal logs (c4_safe_ktrace and C4Core\log) showing clearly a RAID group rebalance getting underway at the exact time of the IOPS storm. They ignored it.
I don't know how else to put the eyes on this.
DELL-Sam L
Moderator
•
7.7K Posts
0
April 16th, 2021 16:00
Hello richardm112,
If you still have the service request# can, you please send me them via private message & I can try to get them escalated & reviewed.
richardm112
1 Rookie
•
16 Posts
0
August 6th, 2021 09:00
Hi, Unity OS 5.1.0 seems to have fixed the problem. Just want to give a folllow up here in case anyone else has this issue.