Abstract |
An apparatus and a method are disclosed for improving the fault tolerance of storage systems by replacing disk drives that are about to fail. The disk drives in a storage system are monitored to identify failing disk drives. A processing unit identifies a failing disk drive and selects a spare disk drive to replace it. The selected spare disk drive is powered on, and data from the failing disk drive is copied to it. A memory unit stores attributes and sensor data for the disk drives in the storage system, and the processing unit uses these to identify a failing disk drive. Attributes for the disk drives are obtained by using Self-Monitoring, Analysis and Reporting Technology (SMART), and sensor data is obtained from environmental sensors such as temperature and vibration sensors.
|
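The flow described in the abstract (monitor, identify a failing drive, select a spare, power it on, copy the data) can be illustrated with a minimal Python sketch. It is not the disclosed implementation. The `Drive` class, the `retired` flag, the attribute name `reallocated_sectors`, and all threshold values are hypothetical and chosen only for illustration. A real system would read attributes from the drives through SMART and read temperature and vibration from hardware sensors rather than from in-memory fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical thresholds, assumed for illustration only.
MAX_REALLOCATED_SECTORS = 50
MAX_TEMPERATURE_C = 55.0
MAX_VIBRATION_G = 0.5


@dataclass
class Drive:
    """In-memory stand-in for a disk drive and its stored monitoring data."""
    name: str
    powered_on: bool = True
    is_spare: bool = False
    retired: bool = False                                # assumption: marks a replaced drive
    smart: Dict[str, int] = field(default_factory=dict)  # SMART-style attributes
    temperature_c: float = 30.0                          # environmental sensor reading
    vibration_g: float = 0.0                             # environmental sensor reading
    blocks: List[bytes] = field(default_factory=list)    # simulated drive contents


def is_failing(drive: Drive) -> bool:
    """Flag a drive when any stored attribute or sensor value exceeds its threshold."""
    return (
        drive.smart.get("reallocated_sectors", 0) > MAX_REALLOCATED_SECTORS
        or drive.temperature_c > MAX_TEMPERATURE_C
        or drive.vibration_g > MAX_VIBRATION_G
    )


def select_spare(drives: List[Drive]) -> Optional[Drive]:
    """Return the first spare that is not itself flagged, or None if none is left."""
    for drive in drives:
        if drive.is_spare and not drive.retired and not is_failing(drive):
            return drive
    return None


def replace_failing_drives(drives: List[Drive]) -> List[Tuple[str, str]]:
    """Replace each failing active drive with a spare; return (failing, spare) name pairs."""
    replaced = []
    for drive in drives:
        if drive.is_spare or drive.retired or not is_failing(drive):
            continue
        spare = select_spare(drives)
        if spare is None:
            break  # no spare available for the remaining failing drives
        spare.powered_on = True            # power on the selected spare
        spare.blocks = list(drive.blocks)  # copy data from the failing drive
        spare.is_spare = False             # the spare now serves as an active drive
        drive.retired = True               # do not replace the same drive twice
        replaced.append((drive.name, spare.name))
    return replaced
```

As a usage example, calling `replace_failing_drives` on a list holding one active drive whose `reallocated_sectors` value exceeds the threshold and one powered-off spare powers the spare on, copies the blocks to it, and returns the pair of drive names. Keeping spares powered off until they are needed matches the abstract, which powers on the spare only after it has been selected.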