[RAS] nebka problems

Christian Zimmermann christian.zimmermann at uconn.edu
Sun Jan 20 11:06:56 EST 2008


On Sun, 20 Jan 2008, Thomas Krichel wrote:

>
>
>  Christian Zimmermann writes
>
>> - a reboot fixed it, the machine looked fine
>> - nebka went down again, approx 24 hours after the first crash
>
>  I just looked at this again. The crontab job I commented
>  out on mutabor, which I think is responsible, is NOT
>  an upload from mutabor to nebka but the reverse:
>
> #!/bin/sh
> rsync -t --log-format=%n aras@nebka.openlib.org:citec-export/* /home/adnetec/ras-exports/ | ~/Ivan/handle_ras_exports.pl /home/adnetec/ras-exports/
>
>
>> - Thomas was doing a complete backup of the machine with rsync at the
>> time. He did not get to the original data of the machine, the aras
>> account.

That backup, in parallel to the rsync above, must have been working on the 
directories I mention below.

>> - We had a similar set of crashes in June 2006, that were diagnosed as an
>> issue with a directory in CitEc that had too many files. At the time, I
>> wrote:
>
>  But this was an upload, and it was a number that was a lot
>  bigger than the numbers we have now.
>

But rsync is a huge resource hog, and we have less free space than 18 
months ago. Looking at some literature on rsync, it turns out it holds the 
information about the whole directory tree in memory. So slicing things up 
can give welcome relief. Swap space would be grateful.
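
As a rough illustration of slicing things up, a sketch along these lines 
would run one transfer per top-level subdirectory, so that each run only 
has to build the file list for one slice. The function name 
sync_in_slices and the SYNC variable are made up for this example, and 
files sitting directly in the source directory are not covered.

```shell
#!/bin/sh
# Hypothetical sketch: sync a tree one top-level subdirectory at a time,
# so each run of the sync command only sees one slice of the tree.
# SYNC is an assumption of this example; override it for a dry run.
SYNC=${SYNC:-rsync -a}

sync_in_slices() {
    src=$1
    dest=$2
    for dir in "$src"/*/; do
        [ -d "$dir" ] || continue      # glob matched nothing, or not a directory
        name=$(basename "$dir")
        mkdir -p "$dest/$name"
        # "$dir" ends in a slash, so the contents of the slice land in dest/name/
        $SYNC "$dir" "$dest/$name/" || return 1
    done
}
```

For a dry run, set SYNC="echo rsync -a" before calling 
sync_in_slices with a source and a destination directory.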

>
>> Does this make sense? In the immediate term, we would need to reboot the
>> machine Monday, comment out all crontab jobs, investigate the true origin
>> of the problem (we found it last year by looking at problematic inodes with
>> fsck), and then only try to back up (only the aras account, in particular
>> the userdata directory).
>
>  OK.
>
>  In addition, I suggest you open an account at the ideas
>  machine, to hold the most important data from acis and ras.
>  This backup should be conducted every hour or so, in addition
>  to backups to sahure (later to raneb) and fafner, done
>  on alternate days.
>

That is a possibility for the new machine. Not the current one, which has 
1/3 of the disk space nebka has. But I absolutely refuse to use rsync.
We have even debated moving RAS to the new machine, to economize on rack 
space. But we may want to have both for redundancy.

>> I will be in a train back to Paris again while the machine probably gets
>> back up (Monday EST 10am-3pm), but I will check in as soon as
>> possible once back in Paris.
>
>  I will be at home on Monday night. I am 5 hours ahead of
>  you, 11 hours ahead of EST.
>
>  If I can be of any help any time, please don't hesitate
>  to call me on my home number below. I can call you right
>  back.
>
>  Cheers,
>
>  Thomas Krichel                    http://openlib.org/home/krichel
>                                RePEc:per:1965-06-05:thomas_krichel
>  phone: +7 383 330 6813                       skype: thomaskrichel
>


