[RAS] nebka problems
Christian Zimmermann
christian.zimmermann at uconn.edu
Sun Jan 20 09:15:08 EST 2008
I have investigated a little what the problem with nebka may be. Here is
what we know:
- we had ext3 errors that caused some sort of panic during the night from
Thursday to Friday.
- a reboot fixed it, and the machine looked fine
- nebka went down again, approx 24 hours after the first crash
- Thomas was doing a complete backup of the machine with rsync at the
time. He did not get to the original data of the machine, the aras
account.
- We had a similar set of crashes in June 2006, which were diagnosed as an
issue with a directory in CitEc that had too many files. At the time, I
wrote:
According to http://en.wikipedia.org/wiki/Ext3, the maximum number of
files a directory can have is V*2^(-13), where V is the size of the volume
in blocks. On raneb, this would be 56335 (V=461494280). On nebka, this is
8551 (V=70057172). This would mean we are still in trouble for both (we
have 12000 NBER WPs). I hope I am misunderstanding.
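The arithmetic in that estimate can be checked with a short Python sketch.
It only reproduces the V*2^(-13) calculation with the block counts quoted
above; it does not confirm that this formula is the actual ext3 limit. The
function name is made up for illustration. Rounding down gives 56334 for
raneb, one less than the rounded figure quoted above.

```python
# Reproduces the estimate from the message: V * 2^(-13), where V is the
# volume size in blocks. Whether this is the real ext3 per-directory limit
# is an open question in the message itself.

def max_files_estimate(volume_blocks: int) -> int:
    """Return V * 2^(-13), rounded down to a whole number of files."""
    return volume_blocks // 2**13


# Volume sizes (in blocks) as quoted in the message.
volumes = {"raneb": 461494280, "nebka": 70057172}

for host, blocks in volumes.items():
    print(f"{host}: estimated limit {max_files_estimate(blocks)} files")
```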
So I investigated on raneb to see whether we have any overfull directories
that may get mirrored to nebka. I found in the adrepec account
~/ftp/CitEc/nbr/nberwo
~/ftp/opt/CitEc/nbr/nberwo
which each have 10630 files, more than the 8551 estimated above for nebka.
So if my forecast from 18 months ago is correct, we have the same problem
as before, but in a subdirectory this time.
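A search like this one can be sketched in Python. This is a minimal
example, not the method actually used on raneb: the function name and the
choice to count both files and subdirectories as entries are assumptions,
and the limit is whatever estimate one wants to test against.

```python
import os


def find_overfull_dirs(root, limit):
    """Return (path, entry_count) pairs for directories under root that
    hold more than `limit` entries, largest first."""
    overfull = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Count every entry in this directory: subdirectories and files.
        count = len(dirnames) + len(filenames)
        if count > limit:
            overfull.append((dirpath, count))
    overfull.sort(key=lambda item: item[1], reverse=True)
    return overfull


# Example call (hypothetical root and limit):
# for path, count in find_overfull_dirs(os.path.expanduser("~/ftp"), 8551):
#     print(count, path)
```

On a large mirror the walk itself can take a while and adds disk load, so
it is probably better run when no backup or cron job is active.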
If this is correct, the solution, I think, is a larger volume. It turns
out we have one for this machine; Bob sent it two months ago. We had to
divert it to the machine running IDEAS because of a more serious HD
problem. We have a new machine for IDEAS; we just need to configure it and
transfer content, then the drive could be reallocated to nebka. I would
just need Tim to get started on the new machine before I am back in
Connecticut (January 28).
Does this make sense? In the immediate term, we would need to reboot the
machine on Monday, comment out all crontab jobs, investigate the true
origin of the problem (last time we found it by looking at problematic
inodes with fsck), and only then try to back up (only the aras account, in
particular the userdata directory).
I will be on a train back to Paris while the machine probably gets back up
(Monday, 10am-3pm EST), but I will check in as soon as possible once I am
in Paris.
Christian Zimmermann FIGUGEGL!
Department of Economics
University of Connecticut
341 Mansfield Road, Unit 1063
Storrs, CT 06269-1063
http://ideas.repec.org/zimm/ christian.zimmermann at uconn.edu
http://ideas.repec.org/e/pzi1.html