Like raidtools, mdadm can simulate a device failure in software, using the --fail or --set-faulty option:
[root@localhost eric4ever]# mdadm --set-faulty --help
Usage: mdadm arraydevice options component devices...
This usage is for managing the component devices within an array.
The --manage option is not needed and is assumed if the first argument
is a device name or a management option.
The first device listed will be taken to be an md array device, and
subsequent devices are (potential) components of that array.
Options that are valid with management mode are:
--add -a : hotadd subsequent devices to the array
--remove -r : remove subsequent devices, which must not be active
--fail -f : mark subsequent devices a faulty
--set-faulty : same as --fail
--run -R : start a partially built array
--stop -S : deactivate array, releasing all resources
--readonly -o : mark array as readonly
--readwrite -w : mark array as readwrite
[root@localhost eric4ever]# mdadm --fail --help
Usage: mdadm arraydevice options component devices...
This usage is for managing the component devices within an array.
The --manage option is not needed and is assumed if the first argument
is a device name or a management option.
The first device listed will be taken to be an md array device, and
subsequent devices are (potential) components of that array.
Options that are valid with management mode are:
--add -a : hotadd subsequent devices to the array
--remove -r : remove subsequent devices, which must not be active
--fail -f : mark subsequent devices a faulty
--set-faulty : same as --fail
--run -R : start a partially built array
--stop -S : deactivate array, releasing all resources
--readonly -o : mark array as readonly
--readwrite -w : mark array as readwrite
Next, we simulate a failure of /dev/sdb:
[root@localhost eric4ever]# mdadm --manage --set-faulty /dev/md0 /dev/sdb
mdadm: set /dev/sdb faulty in /dev/md0
Check the system log. If you have configured a spare disk, it may show messages like the following:
kernel: raid5: Disk failure on sdb, disabling device.
kernel: md0: resyncing spare disk sde to replace failed disk
Check /proc/mdstat: if the configured spare disk is available, the array may already have started rebuilding.
First, use mdadm --detail /dev/md0 to look at the state of the RAID:
[root@localhost eric4ever]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.00
Creation Time : Thu May 24 13:45:35 2007
Raid Level : raid5
Array Size : 16777088 (16.00 GiB 17.18 GB)
Used Dev Size : 8388544 (8.00 GiB 8.59 GB)
Raid Devices : 3
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu May 24 14:07:55 2007
State : active, degraded, recovering
Active Devices : 2
Working Devices : 3
Failed Devices : 2
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 3% complete
UUID : 4b15050e:7d0c477d:98ed7d00:0f3c29e4
Events : 0.6
Number Major Minor RaidDevice State
0 8 16 0 faulty spare /dev/sdb
1 8 32 1 active sync /dev/sdc
2 8 48 2 active sync /dev/sdd
3 8 64 3 spare rebuilding /dev/sde
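The device table at the end of the --detail output is the quickest place to see which member was marked faulty. As a minimal sketch, the helper below (faulty_devices is a name invented here, not part of mdadm) filters that text for rows whose state contains "faulty" and prints the device path. It reads from standard input and is fed sample rows copied from the output above, so it needs no real array.

```shell
# Print the device path of every member that `mdadm --detail` reports as faulty.
# Reads the --detail text on stdin; faulty_devices is a hypothetical helper name.
faulty_devices() {
    # Keep rows that mention "faulty" and whose last field is a /dev path.
    awk '/faulty/ && $NF ~ /^\/dev\// { print $NF }'
}

# Sample rows copied from the device table above.
printf '%s\n' \
    '0 8 16 0 faulty spare /dev/sdb' \
    '1 8 32 1 active sync /dev/sdc' \
    '2 8 48 2 active sync /dev/sdd' \
    '3 8 64 3 spare rebuilding /dev/sde' | faulty_devices
```

On a live system the same function could be used as mdadm --detail /dev/md0 | faulty_devices, assuming the output layout matches the one shown here.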
Look at /proc/mdstat:
[root@localhost eric4ever]# cat /proc/mdstat
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sdb[4] sde[3] sdd[2] sdc[1]
16777088 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]
[==>..................] recovery = 10.2% (858824/8388544) finish=12.4min speed=10076K/sec
unused devices: <none>
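The recovery line carries the rebuild progress as a percentage. The sketch below pulls that number out of mdstat-style text so a script can poll it; recovery_percent is a hypothetical helper name, and the pattern assumes the "recovery = NN.N%" wording shown above, which may differ between kernel versions.

```shell
# Print the recovery percentage found in /proc/mdstat-style text on stdin.
# Prints nothing when no recovery line is present.
recovery_percent() {
    sed -n 's/.*recovery *= *\([0-9.]*\)%.*/\1/p'
}

# Sample lines copied from the /proc/mdstat output above.
printf '%s\n' \
    'md0 : active raid5 sdb[4] sde[3] sdd[2] sdc[1]' \
    '[==>..................] recovery = 10.2% (858824/8388544) finish=12.4min speed=10076K/sec' | recovery_percent
```

On a live system the input would come from /proc/mdstat itself, for example recovery_percent < /proc/mdstat.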
Check the RAID state again:
[root@localhost eric4ever]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.00
Creation Time : Thu May 24 13:45:35 2007
Raid Level : raid5
Array Size : 16777088 (16.00 GiB 17.18 GB)
Used Dev Size : 8388544 (8.00 GiB 8.59 GB)
Raid Devices : 3
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu May 24 14:08:27 2007
State : active, degraded, recovering
Active Devices : 2
Working Devices : 4
Failed Devices : 1
Spare Devices : 2
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 11% complete
UUID : 4b15050e:7d0c477d:98ed7d00:0f3c29e4
Events : 0.8
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 32 1 active sync /dev/sdc
2 8 48 2 active sync /dev/sdd
3 8 64 3 spare /dev/sde
4 8 16 4 spare /dev/sdb
The rebuild has reached 11%. Take a look at the log messages:
[root@localhost eric4ever]# tail /var/log/messages
May 24 14:08:27 localhost kernel: --- rd:3 wd:2 fd:1
May 24 14:08:27 localhost kernel: disk 0, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
May 24 14:08:27 localhost kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdc
May 24 14:08:27 localhost kernel: disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdd
May 24 14:08:27 localhost kernel: RAID5 conf printout:
May 24 14:08:27 localhost kernel: --- rd:3 wd:2 fd:1
May 24 14:08:27 localhost kernel: disk 0, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
May 24 14:08:27 localhost kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdc
May 24 14:08:27 localhost kernel: disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdd
May 24 14:08:27 localhost kernel: md: cannot remove active disk sde from md0 ...
Use mdadm -E to examine /dev/sdb:
[root@localhost eric4ever]# mdadm -E /dev/sdb
/dev/sdb:
Magic : a92b4efc
Version : 00.90.00
UUID : 4b15050e:7d0c477d:98ed7d00:0f3c29e4
Creation Time : Thu May 24 13:45:35 2007
Raid Level : raid5
Used Dev Size : 8388544 (8.00 GiB 8.59 GB)
Array Size : 16777088 (16.00 GiB 17.18 GB)
Raid Devices : 3
Total Devices : 5
Preferred Minor : 0
Update Time : Thu May 24 14:08:27 2007
State : active
Active Devices : 2
Working Devices : 4
Failed Devices : 1
Spare Devices : 2
Checksum : a6a19662 - correct
Events : 0.8
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 4 8 16 4 spare /dev/sdb
0 0 0 0 0 faulty removed
1 1 8 32 1 active sync /dev/sdc
2 2 8 48 2 active sync /dev/sdd
3 3 8 64 3 spare /dev/sde
4 4 8 16 4 spare /dev/sdb
After the automatic rebuild completes, check the RAID state once more:
[root@localhost eric4ever]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.00
Creation Time : Thu May 24 13:45:35 2007
Raid Level : raid5
Array Size : 16777088 (16.00 GiB 17.18 GB)
Used Dev Size : 8388544 (8.00 GiB 8.59 GB)
Raid Devices : 3
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu May 24 14:21:54 2007
State : active
Active Devices : 3
Working Devices : 4
Failed Devices : 1
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
UUID : 4b15050e:7d0c477d:98ed7d00:0f3c29e4
Events : 0.9
Number Major Minor RaidDevice State
0 8 64 0 active sync /dev/sde
1 8 32 1 active sync /dev/sdc
2 8 48 2 active sync /dev/sdd
4 8 16 4 spare /dev/sdb
[root@localhost eric4ever]# cat /proc/mdstat
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sdb[4] sde[0] sdd[2] sdc[1]
16777088 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
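The bracketed counter in /proc/mdstat went from [3/2] during the rebuild to [3/3] now: the first number is the count of configured RAID devices and the second is how many are currently in use. A rough sketch of a health check built on that counter follows. mdstat_health is a made-up helper name, and the check deliberately ignores everything else in the file.

```shell
# Report "degraded" if any [configured/in-use] counter in mdstat-style text on
# stdin shows fewer devices in use than configured, otherwise "healthy".
mdstat_health() {
    awk '
        match($0, /\[[0-9]+\/[0-9]+\]/) {
            # Strip the brackets and split "3/2" into its two numbers.
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) degraded = 1
        }
        END { print (degraded ? "degraded" : "healthy") }
    '
}

# Status lines copied from the two /proc/mdstat listings in this article.
printf '%s\n' '16777088 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]' | mdstat_health
printf '%s\n' '16777088 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]' | mdstat_health
```

Member indices such as sdb[4] contain no slash, so they do not match the pattern and are not mistaken for the counter.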
We can see that /dev/sde has replaced /dev/sdb. Look at the system log messages:
[root@localhost eric4ever]# tail /var/log/messages
May 24 14:21:54 localhost kernel: --- rd:3 wd:3 fd:0
May 24 14:21:54 localhost kernel: disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sde
May 24 14:21:54 localhost kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdc
May 24 14:21:54 localhost kernel: disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdd
May 24 14:21:54 localhost kernel: md: updating md0 RAID superblock on device
May 24 14:21:54 localhost kernel: md: sdb [events: 00000009]<6>(write) sdb's sb offset: 8388544
May 24 14:21:54 localhost kernel: md: sde [events: 00000009]<6>(write) sde's sb offset: 8388544
May 24 14:21:54 localhost kernel: md: sdd [events: 00000009]<6>(write) sdd's sb offset: 8388544
May 24 14:21:54 localhost kernel: md: sdc [events: 00000009]<6>(write) sdc's sb offset: 8388544
May 24 14:21:54 localhost kernel: md: recovery thread got woken up ...
At this point we can remove /dev/sdb from /dev/md0:
[root@localhost eric4ever]# mdadm /dev/md0 -r /dev/sdb
mdadm: hot removed /dev/sdb
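The two management steps used in this walkthrough, marking a member faulty and then hot-removing it, can be written down as a small dry-run helper. This is only a sketch: retire_member_cmds is a hypothetical name, and it prints the commands instead of running them so that nothing touches a real array. The long options --fail and --remove are the ones listed in the help output earlier.

```shell
# Print (do not run) the mdadm commands that mark a member as failed and then
# hot-remove it. Both arguments are placeholders supplied by the caller.
retire_member_cmds() {
    array=$1
    disk=$2
    printf 'mdadm %s --fail %s\n' "$array" "$disk"
    printf 'mdadm %s --remove %s\n' "$array" "$disk"
}

retire_member_cmds /dev/md0 /dev/sdb
```

As the help text notes, --remove only works on devices that are not active, so on a real array the removal should come after the member has been failed or, as here, after the spare has taken over.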
Similarly, we can use the following command to add a device to /dev/md0: