ScaleIO Ready Node: Repeated rebuilds on ScaleIO Ready Node cluster

Summary: A Dell ScaleIO Ready Node cluster can trigger repeated rebuilds when DAS Cache is configured incorrectly.


Symptoms




When the DAS Cache SSD space is full, DAS Cache starts flushing data from the SSD to the hard drive. It does this by sending many I/Os to a small region of the drive, which minimizes seek time and maximizes throughput on the hard drive. If other I/Os to the same hard drive (for example, large reads that bypassed the cache) are issued to a different location on the drive, the RAID controller and the drive prioritize the I/Os with the short seek to get maximum throughput. This can sometimes leave the other I/Os with high latency.

Cause

The combined I/O load from DAS Cache and ScaleIO creates such a long queue on the disk that I/Os become slow. The slow I/Os trigger rebuilds and, if an I/O takes too long, can sometimes cause an SDS failure.

Resolution

Apply the following configuration settings step by step. They are needed to enable DAS Cache and apply to each Dell ScaleIO Ready Node:

1. Place the relevant SDS into Maintenance Mode.

2. Change DAS cache configuration:

a. Set DAS cache parameters:

fscli --set-param AggressiveCachePopulation=0
fscli --set-param BypassLengthKB=128
fscli --set-param RcMaxLengthKB=32
fscli --set-param LowSpaceBypassKb=0
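
The four parameters above can also be applied in one pass. The following is a minimal sketch, not a vendor-supplied script: it assumes fscli is on the PATH and returns a non-zero status on failure, and the DRY_RUN variable and the apply_das_params function are hypothetical names introduced here for illustration. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: apply the four DAS Cache parameters listed above in one pass.
# DRY_RUN is a hypothetical helper variable, not an fscli feature.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

apply_das_params() {
    for setting in \
        AggressiveCachePopulation=0 \
        BypassLengthKB=128 \
        RcMaxLengthKB=32 \
        LowSpaceBypassKb=0
    do
        if [ "$DRY_RUN" = "1" ]; then
            echo "fscli --set-param $setting"
        else
            # Assumption: fscli exits non-zero when a parameter is rejected.
            fscli --set-param "$setting" || return 1
        fi
    done
}

apply_das_params
```

Run it once as-is to review the printed commands, then run it with DRY_RUN=0 on the node while the SDS is in Maintenance Mode.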

b. Modify DAS cache configuration file ("/etc/fio/config"):

FlusherCmdsNormalToBeStarted =  1 
FlusherMaxCmdsToBeStarted =  2 
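
The two flusher settings above can be edited by hand, or with a small helper such as the sketch below. It assumes the file holds one "Key = value" setting per line, as shown above, and that both keys already exist in the file. The set_config_value function is a hypothetical helper written for this illustration and is not part of DAS Cache. It keeps a one-time backup copy next to the file and refuses to guess if the key is missing.

```shell
#!/bin/sh
# Sketch: rewrite the value of an existing "Key = value" line in a config file.
# set_config_value is a hypothetical helper, not a DAS Cache tool.
set_config_value() {
    file="$1"; key="$2"; value="$3"

    # Keep the first original copy as a backup; do not overwrite it later.
    [ -f "$file.bak" ] || cp "$file" "$file.bak" || return 1

    # Assumption: the key is already present. If not, stop instead of guessing.
    if ! grep -q "^[[:space:]]*$key[[:space:]]*=" "$file"; then
        echo "key $key not found in $file" >&2
        return 2
    fi

    # Rewrite through a temporary file, then copy the content back so the
    # original file keeps its ownership and permissions.
    tmp="$file.tmp"
    sed "s/^[[:space:]]*$key[[:space:]]*=.*/$key = $value/" "$file" > "$tmp" \
        && cat "$tmp" > "$file" \
        && rm -f "$tmp"
}

# Example usage (run as root on the node):
#   set_config_value /etc/fio/config FlusherCmdsNormalToBeStarted 1
#   set_config_value /etc/fio/config FlusherMaxCmdsToBeStarted 2
```

The helper only changes the file. The new values still take effect only after the DAS Cache driver is reloaded.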

c. Reset the node to reload the DAS Cache driver and apply the settings (only needed for step 'b').

3. Change the server RAID write cache setting to write-through (effective immediately):

/opt/MegaRAID/perccli/perccli64 /c0/vall set wrcache=wt

4. Modify the ScaleIO performance parameters as follows (management only, effective immediately):

scli --set_performance_parameters --sdc_max_inflight_requests 200 --all_sdc --tech
scli --set_performance_parameters --sdc_max_inflight_data 20 --all_sdc --tech

5. Exit the relevant SDS from Maintenance Mode.

We recommend applying the above settings to only one SDS at first and checking that everything works properly for a few days before proceeding to the next SDS, and so on.

Affected Products

ScaleIO Ready Node-PowerEdge 13G

Article Properties
Article Number: 000051944
Article Type: Solution
Last Modified: 28 Nov 2024
Version:  3