Unsolved
Unity 650F Performance Concern
Unity 650F LUN latency sometimes ramps up to hundreds of milliseconds, starting at 5:03 PM and ending around 6 PM. It happens at random, but seems to occur more often when we're moving a lot of data around, creating and deleting large LUNs. Drive bandwidth becomes VERY high, but CPU and front-end activity remain normal.
Latency example:
Drive bandwidth:
Note the bizarre data access pattern from 7 to 8 PM:
LUN IOPS does not follow a similar pattern:
LUN bandwidth actually drops because the system is performing so poorly:
CPU remains normal:
EMC\C4Core\log\c4_safe_ktrace.log contains lines similar to this during the period of high latency:
2021/02/25-22:03:38.244931 2 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : [PSV]: func: fbe_extent_pool_balance_state_set_flag_update_and_persist: line: 406
2021/02/25-22:03:38.320994 525 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : balance task send edge up event to mrg: raid_id:0, raid_extent_index:0
2021/02/25-22:03:38.321000 7 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : fbe_extent_pool_update_heat_value_for_copy_and_shuffle entry
2021/02/25-22:03:38.321001 1 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : update heat value for balance - rg 0x0
2021/02/25-22:03:38.321013 11 7FCAA3CDC705 std:INFO OBJ 11E 100C0 : MarkCopy: P-Req:Set lfl:0x0 pfl:0x0 state:0x100400000
2021/02/25-22:03:38.321018 5 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : balance task send edge up event to mrg: raid_id:1, raid_extent_index:1
2021/02/25-22:03:38.321021 3 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : fbe_extent_pool_update_heat_value_for_copy_and_shuffle entry
2021/02/25-22:03:38.321022 1 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : update heat value for balance - rg 0x1
2021/02/25-22:03:38.321034 12 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : balance task send edge up event to mrg: raid_id:2, raid_extent_index:4
2021/02/25-22:03:38.321035 1 7FCAA3C7B706 std:INFO OBJ 120 100C0 : MarkCopy: A-Req:Set lfl:0x0 pfl:0x0 state:0x100400000
2021/02/25-22:03:38.321037 2 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : fbe_extent_pool_update_heat_value_for_copy_and_shuffle entry
2021/02/25-22:03:38.321038 1 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : update heat value for balance - rg 0x2
...
2021/02/25-22:03:38.321190 3 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : EXP-NOTIFICATION: obj 0x11d dpg 0 percent 0% DATA_BALANCE_IN_PROGRESS
...
2021/02/25-22:04:36.091645 3 7F94F405570B cbfs:CBFSA: FSREORG: 6: Evacuate Completed CB (cont): fsid: 1073742973, relocation-rate: 0 KB/sec (0 MB/sec),
2021/02/25-22:04:36.091646 1 7F94F405570B cbfs:CBFSA: relocBlks-vs-totalBlks: 0%, status: OK, RBTs(0, 0 usec), IBOs: 0, Subjobs(0, 0 usec)
---
What is this Unity doing at 5:03PM that makes it so slow?
DELL-Josh Cr
Moderator
•
8.7K Posts
February 26th, 2021 10:00
Hi Richardm112,
What version of the OS are you running? Are you seeing any other errors? Is compression/data reduction enabled? Is replication in use? Is there anything else running during that time?
Syifer
8 Posts
March 2nd, 2021 09:00
The "LUN IOPS" and "LUN Bandwidth" charts in the Unity web interface do not include "backend" activity. For example, I have a Unity array that is solely a replication target, and replication I/O is considered backend, so those LUN charts are flatlined at zero. CloudIQ has a category of metrics, System Backend and Pool Backend. Check those out; they may help.
richardm112
1 Rookie
•
16 Posts
March 5th, 2021 15:00
We upgraded from 5.0.3 to 5.0.6 and the problem still exists, though it's now less severe.
Turning off compression/dedupe helps performance overall but has no impact on these 5:03 PM "events". Backups don't start until 6 PM. At 5 PM we're just running business-as-usual.