摘要 |
Techniques, systems, and devices are disclosed for remediating a failed drive in a set of drives, such as a RAID system, without having to physically replace the failed drive. After receiving a signal of an error indicating a specific physical portion on a storage drive in the set of storage drives has caused the drive to fail, the system can unmount the drive from the filesystem while other drives continue to operate. Next, the system can identify one or more files in the filesystem that have associations with the specific physical portion on the failed drive. Next, the system can remount the drive onto the filesystem and subsequently delete the identified files from the filesystem. The system can then perform a direct I/O write to the specific physical portion on the failed drive to force reallocation of the specific physical portion to a different area on the failed drive. The system can also power-cycle the drive before this remediation, e.g., to determine if this remediation can be avoided. |