[RAS] badblocks
Thomas Krichel
krichel at openlib.org
Tue Feb 26 23:02:18 EST 2008
Christian Zimmermann writes
> I am back and stand ready to run to the server farm if necessary.
With 11 hours time difference, I was in bed. I have been thinking
a bit more.
I remember, when I had a similar problem with raneb, there
were only 12 or 40 bad blocks, but they caused the disk
to crash. Now that the offending disk has been replaced, it's
all quiet on the raneb front. I would therefore suggest that
the troubles come from the bad blocks.
The way I understand disks is that decay is exponential.
Most modern disks have some spare space, reserved by the
disk itself, that is hidden from the O/S. When bad blocks
appear, the data is moved from the bad blocks to blocks that
are healthy, in a way that is transparent to the O/S. When
there are too many bad blocks, the O/S starts seeing them,
and that's when Linux gets rather merciless. It does not take
hardware issues lightly.
So even with 3 bad blocks, we need to get rid of the disk;
software updates will not help.
e2fsck has a -c option that will scan for bad blocks and
mark the bad blocks as bad, so that they are not used by the
o/s. When we run this on startup, with the root file system
mounted read-only, it should mark the bad blocks. If
we then immediately (so that there are no further bad
blocks) rsync the files from sda to sdb, make sdb bootable,
then swap disks to boot from sdb, we should be fine.
I did such an operation locally and can give further
instructions if you agree with the general course of
action.
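
To make the proposed course of action concrete, here is a rough
sketch of the first steps as a small Python helper that only builds
and prints the commands, without running anything. The partition
names /dev/sda1 and /dev/sdb1 and the mount point /mnt/sdb are my
assumptions, as is the premise that sdb1 already carries a file
system. Making sdb bootable is left out on purpose, since that
depends on the boot loader in use.

```python
import shlex


def plan_disk_migration(src_part="/dev/sda1",
                        dst_part="/dev/sdb1",
                        dst_mount="/mnt/sdb"):
    """Return the commands, in order, as argv lists. Nothing is executed.

    All device names and the mount point are hypothetical defaults.
    """
    return [
        # -c runs a read-only bad block scan and records bad blocks so
        # the file system avoids them; -f forces the check even if the
        # file system looks clean. Run with the source mounted read-only.
        ["e2fsck", "-f", "-c", src_part],
        # Mount the replacement partition (assumed to be formatted already).
        ["mount", dst_part, dst_mount],
        # -a archive mode, -H keep hard links, -x stay on one file system
        # so that /proc, /sys and the target mount are not copied.
        ["rsync", "-aHx", "/", dst_mount + "/"],
    ]


if __name__ == "__main__":
    for argv in plan_disk_migration():
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the plan first lets us agree on the exact commands before
anyone types them on the machine.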
Cheers,
Thomas Krichel http://openlib.org/home/krichel
RePEc:per:1965-06-05:thomas_krichel
phone: +7 383 330 6813 skype: thomaskrichel