[RAS] badblocks

Wed Feb 27 13:52:19 EST 2008

----- Forwarded message from Bob Parks <bparks at artsci.wustl.edu> -----

Envelope-to: krichel at localhost
Delivery-date: Thu, 28 Feb 2008 00:47:49 +0600
From: Bob Parks <bparks at artsci.wustl.edu>
To: Thomas Krichel <krichel at openlib.org>
Subject: Re: [RAS] badblocks

Thomas Krichel wrote:

>  Bob Parks writes
>
>  
>> Yes, IMHO.  As Christian wrote earlier about nebka, there are limits to 
>> directory sizes.  He seemed to indicate that a cron job
>> with du might have been the entire problem.  We have had similar problems 
>> in the past.    
>
>  my theory: du puts stress on the disk, it hits the bad block, and bang! 
>  
Possible, very possible.

>> There are bad blocks on every disk.  Bad blocks, unless there are a large
>> number of them, do not show that the 'disk' is failing.  And again, this
>> is a mirrored disk, two disks, in RAID 1, with a hardware controller.  Now
>> that I think on it, it is not clear which bad blocks on which disk are
>> being reported by the Adaptec controller -
>>    
>
>  my theory: the disk is one disk to the o/s.  
Yes it is, but a bad block is a physical-disk concept.  Who knows what
evil lurks in the depths.

>  
>> Note that nearly identical hardware exists on Bill's RFE machine and it
>> has never had an error.  You have had problems on nebka and snefru
>> (identical hardware) and raneb (very different hardware).  That alone
>> leads me to suspect software.
>>    
>
>  I don't remember a problem on snefru. The common file sets are
>  the adrepec files (common on raneb, sahure, fafner, nebka, mutabor) and
>  the citec files, common on mutabor, raneb, snefru, sahure, fafner
>  (Yes, I back up!).
>  What I think is what's written in section 27.2.4, "badblocks and e2fsck", of
>  http://eduunix.ccut.edu.cn/index/html/linux/OReilly.LPI.Linux.Certification.in.a.Nutshell.2nd.Edition.Jul.2006/0596005288/lpicertnut2-CHP-27-SECT-2.html
>
>  They say 
> When a disk is failing, it will usually get an exponential increase in
> bad blocks, and after a short while it will run out of spare blocks,
> whereupon you will get into trouble with your filesystems on that
> disk.
>
>  It has already run out of spare blocks, that's why some
>  bad blocks show up to the o/s. 
>  
Could very well be.  The eduunix.ccut.edu.cn page is very good and I will go
with your theory.  I will be interested to know just how you rsync to the
143 gig and then make it bootable.
Bob

>  Cheers,
>
>  Thomas Krichel                    http://openlib.org/home/krichel
>                                RePEc:per:1965-06-05:thomas_krichel
>  phone: +7 383 330 6813                       skype: thomaskrichel
>
> _______________________________________________
> RAS-run mailing list
> RAS-run at lists.openlib.org
> http://lists.openlib.org/cgi-bin/mailman/listinfo/ras-run
>