PowerFlex Filesystem not recognized after reboot
Summary: Filesystem corruption or not exist after reboot.
Symptoms
Scenario
- Create a file system on a ScaleIO device (scinia, sinib,…)
- Mount the file system on /dev/scinia
- Reboot the server
Symptoms
After reboot, the user is unable to mount the file system.
When the server had completed the reboot, I was unable to mount the file system, nor repair it with fsck.
# mount /dev/scinia /mnt
mount: you must specify the filesystem type
# mount -t ext4 /dev/scinia /mnt
mount: wrong fs type, bad option, bad superblock on /dev/scinia,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
# fsck /dev/scinia
fsck from util-linux-ng 2.17.2
e2fsck 1.41.12 (17-May-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/scinia
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193
# e2fsck -b 8193 /dev/scinia
e2fsck 1.41.12 (17-May-2010)
e2fsck: Bad magic number in super-block while trying to open /dev/scinia
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>not
From /var/log/messages:
[Reboot occurred here] Dec 18 13:15:48 e8c4-dl360g7-01 kernel: ECS R1_20:Created device scinia (16,0). Capacity 33554432 LB Dec 18 13:15:48 e8c4-dl360g7-01 kernel: scinia: unknown partition table Dec 18 13:25:21 e8c4-dl360g7-01 kernel: EXT4-fs (scinia): VFS: Can't find ext4 filesystem Dec 18 13:26:58 e8c4-dl360g7-01 kernel: EXT4-fs (scinia): VFS: Can't find ext4 filesystem Dec 18 13:27:11 e8c4-dl360g7-01 kernel: EXT4-fs (scinia): VFS: Can't find ext4 filesystem Dec 18 13:28:30 e8c4-dl360g7-01 kernel: EXT4-fs (scinia): VFS: Can't find ext4 filesystem Dec 18 13:33:26 e8c4-dl360g7-01 kernel: EXT4-fs (scinia): VFS: Can't find ext4 filesystem Dec 18 13:34:07 e8c4-dl360g7-01 kernel: EXT4-fs (scinia): VFS: Can't find ext4 filesystem Dec 18 16:11:00 e8c4-dl360g7-01 kernel: ECS R1_20:Created device scinia (16,0). Capacity 33554432 LB Dec 18 16:11:00 e8c4-dl360g7-01 kernel: scinia: unknown partition table Dec 18 16:11:00 e8c4-dl360g7-01 kernel: EXT4-fs (scinia): VFS: Can't find ext4 filesystem
Impact
The file system looks unusable and corrupted.
Cause
When Linux comes up after reboot, it starts scanning the devices and automatically assigns letters to the device by order (first he finds - get the first letter). Usually, when the user does not change anything, after reboot, the devices will come up with the same device name, however, sometimes the device name can change, and be assigned to a different device.
If the user mounted his file system to /dev/sciniX and not to the UUID of the device, he can think that the file system is corrupted, but it is assigned to a different device name. This is a Linux issue and not only ScaleIO related.
Example: The user had 10 devices, then he reduced it to 5 and did a reboot only a day later. You can see that the device that was assigned to scinia is later assigned to a different device.
17th (after reduction from 10 to 5 devices)
On server-01
# ls -l /dev/disk/by-id/scaleio* lrwxrwxrwx 1 root root 12 Dec 17 12:00 /dev/disk/by-id/scaleio-vol-376584c0169c4216-49ff9f7a0000000c -> ../../scinij lrwxrwxrwx 1 root root 12 Dec 17 12:00 /dev/disk/by-id/scaleio-vol-376584c0169c4216-49ff9f7b0000000d -> ../../scinib lrwxrwxrwx 1 root root 12 Dec 17 12:00 /dev/disk/by-id/scaleio-vol-376584c0169c4216-49ff9f7c0000000e -> ../../scinii lrwxrwxrwx 1 root root 12 Dec 17 12:00 /dev/disk/by-id/scaleio-vol-376584c0169c4216-49ff9f7d0000000f -> ../../scinia lrwxrwxrwx 1 root root 12 Dec 17 12:00 /dev/disk/by-id/scaleio-vol-376584c0169c4216-49ff9f7e00000010 -> ../../scinih
Now: i.e., after reboot on the 18th when File System (FS) "disappeared"
ls -l /dev/disk/by-id/scaleio* lrwxrwxrwx 1 root root 12 Dec 19 10:51 /dev/disk/by-id/scaleio-vol-376584c0169c4216-49ff9f7a0000000c -> ../../scinia lrwxrwxrwx 1 root root 12 Dec 19 10:24 /dev/disk/by-id/scaleio-vol-376584c0169c4216-49ff9f7b0000000d -> ../../scinid lrwxrwxrwx 1 root root 12 Dec 19 10:24 /dev/disk/by-id/scaleio-vol-376584c0169c4216-49ff9f7c0000000e -> ../../scinic lrwxrwxrwx 1 root root 12 Dec 19 10:24 /dev/disk/by-id/scaleio-vol-376584c0169c4216-49ff9f7d0000000f -> ../../scinie lrwxrwxrwx 1 root root 12 Dec 19 10:33 /dev/disk/by-id/scaleio-vol-376584c0169c4216-49ff9f7e00000010 -> ../../scinib
Resolution
Workaround
Ask the user to use UUID for mount.
Example: They should map the volume using device uuid and not the device name:
ls /dev/disk/by-uuid/ to find out the UUID
Example for fstab entry:
UUID=<UUID> <mount point> ext4 defaults,errors=remount-ro 0 1