February 10th, 2017 20:00

IO errors during rebalance

Hello.

I have been evaluating ScaleIO for some time. It was quite stable, but I decided to test how reliable it really is by making some cluster modifications and watching how it behaved.

So I added an additional SDS (had 3, added 1). Fine.

Then I added a new SSD disk and included it in the SSD pool.

Immediately after that I started to get various IO errors on the XFS filesystem, and finally I got "xfs log i/o error detected. shutting down filesystem". The server with the SDC was nearly idle, probably doing 1-5 IOPS.

I decided to save my data by unplugging the new disk. I can't say whether that helped, because the filesystem had already shut down. I unmounted, checked, and remounted the filesystem, and everything ended up OK.

What was that? Could ScaleIO have read data from the uninitialized disk and served it as valid? Does it have any checksums to verify the data? Is data corruption possible without multiple hardware failures?

February 11th, 2017 09:00

Hi agerasimov,

The most likely reason for this issue was the setup of the new SDS, not the disk. When a new SDS is added, but no disks are added, then the blocks don't need to rebalance and the clients don't need to know about any changes. As soon as the new disk is added, as you saw, a rebalance of the data blocks begins. If the SDC cannot reach the new SDS, and the blocks that it now holds, that would explain the behavior you saw. There was most likely nothing wrong with the disk, just the connectivity between the new SDS and the SDC.

Are you using IP roles? Was MTU set properly on the new SDS (if not default)? Check the connectivity between the SDC and this new SDS. Verify that all is in order there. That would be the most likely answer.
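
If it helps, one quick way to test both reachability and the MTU path from the SDC at the same time is a non-fragmentable ping at your jumbo payload size (the target below is just a placeholder for the new SDS's data IP; use -s 1472 if you are on the default 1500 MTU):

ping -c 3 -M do -s 8972 <new-SDS-data-IP>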

The other, less likely, answer would be that the SDC could not talk to the MDMs, which hold a map of where the blocks now live. When the rebalance kicked in, the MDM would have tried to update the SDC, but if it cannot reach the SDC, it cannot tell it where the blocks now reside. Assuming nothing has changed on the MDM and SDC side, this is less likely, but still something to verify.
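
To rule that out, you can check from the SDC which MDMs it currently knows about with the drv_cfg utility (path as installed on Linux SDCs):

/opt/emc/scaleio/sdc/bin/drv_cfg --query_mdms

All of the cluster's MDM IPs should show up under the same MDM-ID and installation ID.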

Thanks,

Rick

February 12th, 2017 19:00

It definitely was some kind of network problem, but it was not caused by a network failure; rather, by the ScaleIO configuration.

Before adding the new SDS I had 2 networks: one 1Gb network for all traffic (VMs, SDSs, SDCs and so on) and one 10Gb network just for ScaleIO. The 10Gb network connected only the 3 nodes (the SDC was on the same physical server as one of the SDSs, but as a VM).

I added the new SDS with just one IP, on the first (1Gb) network. It is a simple /24 network, and every ScaleIO node was reachable on it. There were no link problems and no errors during the addition through the GUI.

I noticed that the new SDS node was trying to establish connections from the primary network to the secondary (10Gb) network, even though there is no link between them. I found out that if I add an IP on an additional network, every single SDS has to be reachable on that network. So when I removed the additional IPs from the other SDSs and left just one network, everything worked fine. After removing the IPs I tried to add the same IPs back again, but got an error.

Could you please explain how ScaleIO validates an IP when I add it in the GUI manager?

I am also curious how this behavior handles network failures. If, for example, one link on the 10Gb network goes down and it is configured to work alongside the primary 1Gb network, could that provoke IO problems like the ones I had? What are the recommendations for redundancy in this case? How should the subnets be configured to make this fault tolerant?

P.S. Also, after replacing one of the MDM servers and changing the list of MDM servers on the SDC side, I got this:

[root@dbsrv2 ~]# /opt/emc/scaleio/sdc/bin/drv_cfg --query_mdms

Retrieved 2 mdm(s)

MDM-ID 5e4a91d66cd4d9a0 SDC ID 8ddb92f300000003 INSTALLATION ID 6fe89bd5694f603f IPs [0]-10.0.1.72

MDM-ID 0000000000000000 SDC ID 0000000000000000 INSTALLATION ID 0000000000000000 IPs [0]-10.0.1.75

Why doesn't it have the correct info? The MDM is inside the cluster (status: normal, tiebreaker present) and I can ping it from the SDC.

February 13th, 2017 07:00

I'm glad you got that figured out.

That said, as long as the SDSes can talk to each other and to the MDM on one of the networks, it will work. That is also how a failure would be handled: if you have 2 networks and 1 goes down (a switch failure or something else that takes out the whole network), the SDS process handles both load balancing and fault tolerance on the network side. You will lose network capacity and throughput, of course, but the SDSes will continue communicating and serving IO. It is recommended to have at least 2 data networks of the same speed and capacity, i.e. both 10Gb, not one 10Gb and one 1Gb. Otherwise you are really only running 2x 1Gb networks, because throughput is limited by the slowest component and the 10Gb network will constantly be waiting for the 1Gb network to catch up. It will work, but it is not ideal.
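
If you want to double-check what each SDS is actually configured with, querying the SDS list from the MDM will show the IPs and their roles. A rough sketch (exact flag names can vary between ScaleIO versions, so confirm with scli --help; the admin user is just an example):

scli --login --username admin
scli --query_all_sds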

If you can show the error you got when you tried to add the IPs back, that would help explain it. It is typically just a communication error; if so, then there is something going on with the connectivity.

On your --query_mdms output, that needs to be fixed. It thinks the 10.0.1.75 MDM is from a completely different cluster. Try running this:

/opt/emc/scaleio/sdc/bin/drv_cfg --mod_mdm_ip --ip 10.0.1.72 --new_mdm_ip 10.0.1.72,10.0.1.75 --file /etc/emc/scaleio/drv_cfg.txt

Note that there is no space in between the comma and the 2nd IP address.
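
Then re-run the query on the SDC to confirm both IPs now show up under the same MDM-ID:

/opt/emc/scaleio/sdc/bin/drv_cfg --query_mdms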

Rick

March 1st, 2017 13:00

OK, I followed your recommendations. After modifying the MDM addresses it seems to have been corrected, but I noticed that the SDC thought 2 clusters existed (one with the 2 IP addresses .72 and .75, and another with only .75). After a reboot the SDC had forgotten all the MDM IPs, so I had to manually re-enter the MDM configuration.

How do I make the MDM configuration persistent on the SDC? (CentOS 7)

And about networks.

For example, I have 3 database servers and some app servers. The app servers do not need fast disk access, just redundancy. So it seems to me that the most logical setup is 2 switches with 1Gb ports plus some 10Gb ports (connecting the database servers and their respective SDSs to the 10Gb ports), with the other servers on 1Gb and 2 different ScaleIO pools. In this particular case, will everything run at 1Gb only?

Is there any way to set up 2 networks, one active (10Gb) and one standby (1Gb), just to cover 10Gb network maintenance or failure?

March 2nd, 2017 06:00

On the first question: if you leave off the last parameter (--file /etc/emc/scaleio/drv_cfg.txt), it is only a temporary change. That file must be modified to keep the changes across reboots. You can also manually edit the file and either restart the driver or reboot, which accomplishes the same thing; the driver reads drv_cfg.txt when it is reloaded.
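
As a rough sketch of what the persistent entry looks like (assuming the file lives at /etc/emc/scaleio/drv_cfg.txt and the SDC driver service is called scini on CentOS 7; verify both on your install), it is a single mdm line listing the cluster IPs, which the driver re-reads on restart:

# /etc/emc/scaleio/drv_cfg.txt
mdm 10.0.1.72,10.0.1.75

# reload the SDC driver so it picks up the file
systemctl restart scini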

If the SDS<-->SDS communication is across the 10Gb ports and the database servers are also on 10Gb ports, then they will be able to take full advantage of the speed/bandwidth offered. The app servers, if they are only connected to 1Gb ports, will only be able to operate at that 1Gb speed. This assumes that the SDCs are separate from the SDSs and you are not running a converged (SDS+SDC) setup.

Another alternative you might consider is to have all the SDCs on 10Gb switch ports and just use a QoS setting on the volume/SDC itself. The quickest way to do this is in the GUI (Frontend -> Volumes -> right-click the volume -> Set Volume Limits). That way you limit the bandwidth your app servers take, but you have more room to grow if needed and they won't slow down anyone else.
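
If you prefer the CLI, the same per-SDC volume limits can be set with scli. A rough sketch (the volume name and SDC IP below are just examples, and flag names may differ slightly by version, so check scli --help first):

scli --set_sdc_volume_limits --volume_name app_vol01 --sdc_ip 10.0.1.80 --limit_iops 500 --limit_bandwidth_in_kbps 51200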

Hope that helps.
