
Unsolved


1 Rookie


16 Posts

990

February 25th, 2021 21:00

Unity 650F Performance Concern

On some days, LUN latency on our Unity 650F ramps up to hundreds of milliseconds, starting at 5:03 PM and ending around 6 PM. Which days it happens seems random, but it occurs more often when we're moving a lot of data around (creating and deleting large LUNs). Drive bandwidth becomes VERY high, but CPU and front-end activity remain normal.

Latency example:

richardm112_0-1614313804718.png

 

Drive bandwidth:

richardm112_1-1614313887720.png

Note the bizarre data access pattern from 7 to 8 PM:

richardm112_1-1614315590927.png

 

LUN IOPS does not follow a similar pattern:

richardm112_2-1614313979659.png

 

LUN bandwidth actually drops because the system is performing so poorly:

richardm112_3-1614314065837.png

 

CPU remains normal:

richardm112_4-1614314128084.png

 

EMC\C4Core\log\c4_safe_ktrace.log contains lines similar to this during the period of high latency:

2021/02/25-22:03:38.244931    2     7FCAA3D3D704      std:INFO OBJ   11D  100C0 : [PSV]: func: fbe_extent_pool_balance_state_set_flag_update_and_persist: line: 406

2021/02/25-22:03:38.320994 525 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : balance task send edge up event to mrg: raid_id:0, raid_extent_index:0
2021/02/25-22:03:38.321000 7 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : fbe_extent_pool_update_heat_value_for_copy_and_shuffle entry
2021/02/25-22:03:38.321001 1 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : update heat value for balance - rg 0x0
2021/02/25-22:03:38.321013 11 7FCAA3CDC705 std:INFO OBJ 11E 100C0 : MarkCopy: P-Req:Set lfl:0x0 pfl:0x0 state:0x100400000
2021/02/25-22:03:38.321018 5 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : balance task send edge up event to mrg: raid_id:1, raid_extent_index:1
2021/02/25-22:03:38.321021 3 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : fbe_extent_pool_update_heat_value_for_copy_and_shuffle entry
2021/02/25-22:03:38.321022 1 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : update heat value for balance - rg 0x1
2021/02/25-22:03:38.321034 12 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : balance task send edge up event to mrg: raid_id:2, raid_extent_index:4
2021/02/25-22:03:38.321035 1 7FCAA3C7B706 std:INFO OBJ 120 100C0 : MarkCopy: A-Req:Set lfl:0x0 pfl:0x0 state:0x100400000
2021/02/25-22:03:38.321037 2 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : fbe_extent_pool_update_heat_value_for_copy_and_shuffle entry
2021/02/25-22:03:38.321038 1 7FCAA3D3D704 std:INFO OBJ 11D 100C0 : update heat value for balance - rg 0x2

...

2021/02/25-22:03:38.321190    3     7FCAA3D3D704      std:INFO OBJ   11D  100C0 : EXP-NOTIFICATION: obj 0x11d dpg 0 percent 0%  DATA_BALANCE_IN_PROGRESS

...

2021/02/25-22:04:36.091645    3     7F94F405570B     cbfs:CBFSA: FSREORG: 6: Evacuate Completed CB (cont): fsid: 1073742973, relocation-rate: 0 KB/sec (0 MB/sec),

2021/02/25-22:04:36.091646    1     7F94F405570B     cbfs:CBFSA: relocBlks-vs-totalBlks: 0%, status: OK, RBTs(0, 0 usec), IBOs: 0, Subjobs(0, 0 usec)
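
To check whether these messages line up with the latency window, one option is to count them per minute across the whole log. Below is a minimal Python sketch, assuming every line follows the layout in the excerpt above (timestamp, a numeric field, a thread ID, then the message). The substrings are copied from the excerpt; the labels and the function name `count_markers_per_minute` are made up for illustration. The log timestamps (22:03) differ from the local time of the event (5:03 PM), presumably because of a time zone offset, so the sketch reports log time only.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: "YYYY/MM/DD-HH:MM:SS.usec <num> <thread> <message>".
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2})-(\d{2}:\d{2}):\d{2}\.\d+\s+\S+\s+\S+\s+(.*)$"
)

# Substrings taken from the excerpt; the labels on the left are hypothetical.
MARKERS = {
    "extent_pool_balance": "fbe_extent_pool_balance",
    "balance_task": "balance task send edge up event",
    "balance_in_progress": "DATA_BALANCE_IN_PROGRESS",
    "fsreorg": "FSREORG",
}


def count_markers_per_minute(lines):
    """Return a Counter keyed by (log minute, label) for lines containing a marker."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything not in the assumed layout
        date, minute, message = match.groups()
        for label, needle in MARKERS.items():
            if needle in message:
                counts[(f"{date} {minute}", label)] += 1
    return counts
```

Passing an open file handle for c4_safe_ktrace.log to `count_markers_per_minute` and printing the sorted items gives a per-minute tally that can be compared against the latency chart.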

---

What is this Unity doing at 5:03PM that makes it so slow?

Moderator


8.7K Posts

February 26th, 2021 10:00

Hi Richardm112,

What version of the OS are you running? Are you seeing any other errors? Is compression/data reduction enabled? Is replication in use? Is there anything else running during that time?

8 Posts

March 2nd, 2021 09:00

The "LUN IOPS" and "LUN Bandwidth" charts in the Unity web interface do not include "backend" activity. For example, I have a Unity array that is solely a replication target, and replication IO is considered backend, so on that array those LUN charts are flatlined at zero. CloudIQ has two categories of metrics for this, System Backend and Pool Backend. Check those out; it may help.

1 Rookie


16 Posts

March 5th, 2021 15:00

We upgraded from 5.0.3 to 5.0.6 and the problem still exists, though it's now less severe.

Turning off compression/dedupe helps the performance overall but has no impact on these 5:03PM "events".  Backups don't start until 6PM.  At 5PM we're just running business-as-usual.
