RAID 5 and Precision 3660 Workstation
I am using a Precision 3660 workstation for an on-demand/periodic data storage and retrieval application.
The machine has 3 SSDs of 1 TB each, configured as a software RAID 5 array. There are 3 partitions.
Recently, the OS crashed while the machine was running, and I am unable to boot the system again.
Under the F12 -> Diagnostics menu, one drive shows a status of FAILED and is listed in the non-RAID member section. The other two drives are healthy and are shown as RAID members.
I have faced this scenario multiple times. The usual process to bring the machine back to life is to delete the RAID 5 array, reinstall Windows (10 Pro), restore from the system image, and recreate whatever is missing (e.g., the data on the D and E partitions is not recovered in this scenario).
The downtime is long, and the business is affected tremendously as a result.
Someone suggested taking a manual backup from these drives before deleting the RAID 5 array, but I am looking for a faster, more reliable storage architecture.
Considering my scenario, I have a few questions:
1. What is the best method to bring my system back (using the system image)?
2. Please suggest an alternative system design that provides OS redundancy (if the OS crashes, a backup is available) and makes it easy to rebuild the RAID without data loss in case of a failure.
3. I want to understand whether the OS corruption is related to the software RAID 5 configuration in any way, because the machine is strictly kept away from internet and pen-drive access. Also, the operators are reluctant to force a shutdown or restart of the machine.
My application is write-intensive. Also, there was no hardware failure in this case or previously.
Details of the RAID volume after failure -