Because scale-out storage can mean many nodes, maintaining and upgrading them individually would be time consuming. ScaleIO is hypervisor independent, so it can't rely on a specific hypervisor to facilitate upgrades; it handles its own coordination and clustering. Furthermore, ScaleIO can deploy over 1,000 nodes in the same storage system, so making maintenance and upgrades seamless and automated is vitally important. We need a way to coordinate the upgrade and maintenance procedure across the system, and it needs to be simple, centralized, and automated. It also has to be non-disruptive.
It shouldn't interrupt I/O, and it shouldn't interrupt other storage tasks. Fortunately, the ScaleIO management server and GUI can do exactly this. That's what I'm going to demonstrate: first a quick overview, and then we'll see it in action. Here we have a ScaleIO cluster of six nodes. Some of them have 10 drive bays and some have 24. All of them have both spinning hard drives and solid-state flash drives. SSDs in every node are used as a cache for the hard drives, and the nodes with more SSDs use the rest for a separate storage pool. Data stored in the system is striped across the drives in all of these nodes and mirrored.
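To make that layout concrete, here is a minimal sketch of what striped, two-copy placement across nodes could look like. The chunk size, node names, and round-robin placement are illustrative assumptions for this demo, not ScaleIO's actual allocation logic.

```python
# Illustrative sketch of striping plus two-copy mirroring across nodes.
# Chunk size, node names, and the round-robin placement are assumptions
# for demonstration only, not ScaleIO's real allocator.

CHUNK_MB = 1
NODES = [f"node{i}" for i in range(1, 7)]  # the six-node cluster from this demo

def place_chunks(volume_size_mb: int) -> list[tuple[str, str]]:
    """Return a (primary, mirror) node pair for each chunk of a volume."""
    placements = []
    for chunk in range(volume_size_mb // CHUNK_MB):
        primary = NODES[chunk % len(NODES)]
        mirror = NODES[(chunk + 1) % len(NODES)]  # second copy on a different node
        placements.append((primary, mirror))
    return placements

# Every chunk has a copy on two different nodes, which is why a single node
# can drop out or enter maintenance mode without making data unavailable.
print(place_chunks(8))
```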
So ScaleIO could lose one of these nodes, or put a node in maintenance mode, without any disruption in service. All of these nodes are also part of a VMware ESXi cluster running vSphere, so this environment has its hypervisor and storage converged. Let's look at the components involved, and you'll see why this is a complex but automated procedure. First, there's the management server, which is hosted outside the pool of nodes running ScaleIO. Next, there's the metadata manager, or MDM. It keeps track of all the storage system components and tells them where to find each other. There are multiple copies running, but only one of them is authoritative.
Any given ScaleIO system will have either three or five metadata manager instances in total; the rest are backup copies or tiebreakers used to keep the system resilient. No user data flows through the MDM; it's only responsible for coordination. Then there's the ScaleIO Data Server, or SDS. In the case of our ESXi cluster, this is a special VM guest that has direct access to all the physical disks in the node where it resides. Next, the ScaleIO Data Client, or SDC. This is a kernel module that presents the storage on all of these nodes as a single volume. So each ESXi hypervisor is both providing storage, through the ScaleIO Data Server running in that VM guest, and consuming storage from every other ScaleIO Data Server through its client.
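As a mental model of the roles just described, here is a small sketch of the cluster's components. The class names, fields, and hostnames are assumptions made for explanation; ScaleIO does not expose its topology this way.

```python
# Illustrative model of the component roles described above.
# Names and fields are assumptions for explanation, not a ScaleIO API.

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    has_sds: bool = True         # ScaleIO Data Server VM: serves this node's disks
    has_sdc: bool = True         # ScaleIO Data Client kernel module in ESXi
    mdm_role: str | None = None  # "primary", "secondary", "tiebreaker", or None

@dataclass
class Cluster:
    management_server: str       # hosted outside the pool of storage nodes
    nodes: list[Node] = field(default_factory=list)

cluster = Cluster(
    management_server="scaleio-mgmt",  # hypothetical hostname
    nodes=[
        Node("node1", mdm_role="primary"),
        Node("node2", mdm_role="secondary"),
        Node("node3", mdm_role="secondary"),
        Node("node4", mdm_role="tiebreaker"),
        Node("node5", mdm_role="tiebreaker"),
        Node("node6"),  # no MDM role; five MDM instances total in this cluster
    ],
)
```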
We've turned all of these nodes into a cluster in vCenter so that we can migrate guests between nodes. Finally, there are the VM guests in this cluster running applications. In our case we just have a few, and they're configured to read and write from all the storage continuously while the upgrade process is underway. I want to emphasize that during normal operation every client is accessing every server: any client needs to reach all of the ScaleIO Data Servers in order to retrieve data. All of the data is laid out in a high-throughput, low-latency mesh and mirrored, so there won't be any interruption to service while these nodes are being upgraded. Here's how it's going to go.
The management server sends a signal to the ScaleIO server components running on the node. Now they're in maintenance mode, and the other nodes in the system continue operating without them. The ScaleIO Data Server gets upgraded, then the MDM gets upgraded, but we still have the ScaleIO client software to upgrade, and that means stopping the ESXi hypervisor on this node from accessing data on our ScaleIO volume. So the ScaleIO management server tells vSphere to move the VM guests to other nodes via vMotion. There's no interruption in service, and vCenter's Distributed Resource Scheduler, or DRS, makes sure the guests end up on the right nodes. With this node clear, we can upgrade the ScaleIO client without any traffic going to the data servers.
Now we can bring the ESXi node back online and let DRS migrate VM guests again. We take the ScaleIO Data Server out of maintenance mode, bring the MDM back online, and we're up and running, fully upgraded on this node. The management server running the upgrade process just moves on to the next one, repeating the same steps until all the nodes have been upgraded.
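Conceptually, the per-node loop looks something like the sketch below. Each step function here just logs the action; they are stand-ins for work the ScaleIO management server drives internally, not a real ScaleIO API.

```python
# Conceptual sketch of the rolling, non-disruptive upgrade loop described
# above. Each step is a logging stand-in for an action the management
# server performs internally; none of this is a real ScaleIO interface.

def step(action: str, node: str) -> None:
    print(f"{node}: {action}")

def upgrade_node(node: str, has_mdm_role: bool) -> None:
    step("enter maintenance mode (peers keep serving mirrored data)", node)
    step("upgrade ScaleIO Data Server (SDS)", node)
    if has_mdm_role:
        step("upgrade metadata manager (MDM) instance", node)
    step("evacuate VM guests via vMotion (placement chosen by DRS)", node)
    step("upgrade ScaleIO Data Client (SDC) kernel module", node)
    step("bring ESXi host back online; DRS rebalances guests", node)
    step("exit maintenance mode; SDS and MDM rejoin the cluster", node)

def rolling_upgrade(nodes: dict[str, bool]) -> None:
    for node, has_mdm in nodes.items():  # one node at a time, I/O never stops
        upgrade_node(node, has_mdm)

rolling_upgrade({"node1": True, "node2": True, "node3": True,
                 "node4": True, "node5": True, "node6": False})
```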
Now that it's clear what the upgrade process consists of, it's time to watch it in action. Here we've got the ScaleIO GUI. We'll take a moment to start Vdbench so that there's some traffic running in the background, both reads and writes. Now we can open the upgrade dialog. We have a ScaleIO-supplied upgrade package that we'll load in to identify the destination version. In this case, we're upgrading from 2.0.1.1 to 2.0.1.2. When it's uploaded, click OK and the procedure begins. Here on the left is the list of nodes that are awaiting an upgrade. You can see that the list is split into two parts: the nodes that have MDM components that need upgrading, and the nodes that don't. Since we have a full roster of MDM roles, we have five of these nodes; even if we had a much larger cluster, it would still be five. Returning to the dashboard is just a matter of clicking the tab at the top, and we can move back and forth between the upgrade process and anything else we might need to do while it's still running. At this point, ScaleIO has told vCenter to migrate the VMs off of this node and onto other nodes in the cluster.
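The split in that list is just a partition of nodes by MDM role. A minimal sketch, assuming the node names and five-MDM role assignments from this demo:

```python
# Minimal sketch of the two upgrade lists shown in the GUI: nodes carrying
# an MDM role and nodes that don't. Node names and role assignments are
# assumptions based on this demo's five-MDM configuration.

nodes = {
    "node1": "primary", "node2": "secondary", "node3": "secondary",
    "node4": "tiebreaker", "node5": "tiebreaker", "node6": None,
}

mdm_nodes = [n for n, role in nodes.items() if role is not None]
other_nodes = [n for n, role in nodes.items() if role is None]

# The MDM list stays at five entries no matter how large the cluster grows;
# only the second list scales with node count.
print(mdm_nodes, other_nodes)
```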
In our case, we have one VM generating a lot of load rather than many VMs each generating a little, so the move will be extremely fast; clusters with a large population of VMs might need a little longer for this step. From here on out, we're just going to be watching the tasks ticked off one by one. In the background, nodes will be going into maintenance mode, they'll have their VM guests migrated to other nodes, and they'll have their storage software upgraded. All we need to do is sit back and watch. At this point I'll speed up playback, and we'll see the nodes get upgraded one by one and the upgraded list on the right gradually fill up.
Note that in the upper right of the upgrade pane, you can keep track of how much time has elapsed for the current upgrade process. Four nodes down, just two more to go. And there we have it: all of our nodes have been upgraded. We can see on the timer that it took approximately 14 minutes, and we can see that Vdbench is still running its load generators across the nodes; traffic is still being generated. And here's where we've ended up. The thought of upgrading many nodes in a scale-out storage cluster might seem daunting or time consuming, but ScaleIO has a built-in method for performing non-disruptive upgrades, seamlessly and automatically, even in environments where storage and hypervisors are converged.