[RAS] nebka up
Christian Zimmermann
christian.zimmermann at uconn.edu
Tue Jan 29 10:30:09 EST 2008
On Tue, 29 Jan 2008, Christian Zimmermann wrote:
> The machine survived overnight. It passed all tests on the ACIS side. I am now
> progressively restoring services. The RI daemon is now running, and I did a run of
> /home/aras/acis/bin/nightly >>/home/aras/nightly.log 2>&1, which is scheduled
> in crontab to run at 23:45, i.e. just after the last known instructions
> before the crashes. It worked well.
>
> I have not reestablished the following services:
That should read:
I have *now* reestablished the following services:
>
> #
> # Report, backup, rotate, archive
> #
> 54 23 * * * /home/aras/acis/bin/nightly >>/home/aras/nightly.log 2>&1
>
> #
> # Make RePEc:ras (RePEc:per) archive
> #
> */9 * * * * cd /home/aras/acis && /home/aras/acis/bin/make-repec-per.sh
>
> #
> # Clean up old ACIS user sessions
> #
> */26 * * * * /home/aras/acis/bin/clean-up >> /home/aras/acis/clean-up.log
>
> #
> # Update daemon database checkpoint
> #
> 27 * * * * cd /home/aras/lib/bdb/bin ; ./db_checkpoint -1 -h /home/aras/acis/RI/data && ./db_archive -d -h /home/aras/acis/RI/data
>
> I am holding off the rest for the moment. Should we revert the DNS record so
> that people can connect now?
>
>
> On Mon, 28 Jan 2008, Ivan Kurmanov wrote:
>
>> Sounds hopeful.
>>
>> There is also a job or two in the crontab of user adrepec.
>>
>> As root, run "crontab -lu adrepec"
>>
>> ivan
>>
>>
>> On 28 Jan 2008, at 22:27, Christian Zimmermann wrote:
>>
>>> I looked everywhere in the logs, I see nothing wrong. There are some
>>> indications of corrupt mysql tables, but when I checked those used by RAS
>>> after the first crash, they were fine. Maybe there are corrupt tables
>>> elsewhere. I have not yet run the checks, I'll try this evening.
>>>
>>> I commented out crontab in the root and aras accounts with '#CZ'. Let's
>>> see whether the machine survives the night. If so, and nobody else sees a
>>> problem, we should gradually get the service back. The first thing would
>>> be to get adrepec current. Then open the web server to users. Then get
>>> CitEc data back. Does this make sense?
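
The marker approach described above (commenting out crontab entries with '#CZ' and bringing them back gradually) could be sketched roughly as follows. This is only an illustration, not the commands actually used on nebka: the function names `disable_jobs` and `enable_jobs` are hypothetical, and the trailing space after the marker is an assumption.

```shell
#!/bin/sh
# Sketch: tag active crontab lines with a marker, then re-enable them later.
# MARK mirrors the '#CZ' tag mentioned in the thread; the trailing space is assumed.
MARK='#CZ '

# Read a crontab on stdin and prefix every active line with the marker.
# Blank lines and existing comments are left untouched.
disable_jobs() {
    sed -E "/^[[:space:]]*(#|\$)/!s/^/$MARK/"
}

# Read a crontab on stdin and strip the marker again.
# With an argument, only marked lines containing that substring are re-enabled,
# which allows services to be restored one at a time.
enable_jobs() {
    awk -v m="$MARK" -v pat="${1:-}" '
        index($0, m) == 1 && (pat == "" || index($0, pat) > 0) {
            $0 = substr($0, length(m) + 1)
        }
        { print }
    '
}
```

Under these assumptions, something like `crontab -l | disable_jobs | crontab -` would suspend every job for the current user, and `crontab -l | enable_jobs nightly | crontab -` would bring back only the entries mentioning "nightly".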
>>>
>>>
>>>
>>> On Mon, 28 Jan 2008, Christian Zimmermann wrote:
>>>
>>>> First things I see: both crashes happened exactly at the same time:
>>>>
>>>> Jan 17 23:09:01 nebka /USR/SBIN/CRON[14205]: (aras) CMD (cd
>>>> /home/aras/acis && /home/aras/acis/bin/make-repec-per.sh )
>>>> Jan 17 23:10:01 nebka /USR/SBIN/CRON[14237]: (www-data) CMD ([ -x
>>>> /usr/lib/cgi-bin/awstats.pl -a -f /etc/awstats/awstats.conf -a -r
>>>> /var/log/apache/access.log ] && /usr/lib/cgi-bin/awstats.pl
>>>> -config=awstats -update >/dev/null)
>>>> Jan 17 23:10:01 nebka /USR/SBIN/CRON[14238]: (root) CMD (test -x
>>>> /usr/lib/atsar/atsa1 && /usr/lib/atsar/atsa1)
>>>> Jan 17 23:15:01 nebka /USR/SBIN/CRON[14474]: (root) CMD ([ -x
>>>> /usr/lib/sysstat/sa1 ] && { [ -r "$DEFAULT" ] && . "$DEFAULT" ; [
>>>> "$ENABLED" = "true" ] && exec /usr/lib/sysstat/sa1 $SA1_OPTIONS 1 1 ; })
>>>> Jan 17 23:16:01 nebka /USR/SBIN/CRON[14476]: (aras) CMD
>>>> (/home/aras/acis/bin/apu 7 >>/home/aras/apu-job.log 2>&1)
>>>> Jan 17 23:17:01 nebka /USR/SBIN/CRON[14489]: (root) CMD ( cd / &&
>>>> run-parts --report /etc/cron.hourly)
>>>> Jan 17 23:18:01 nebka /USR/SBIN/CRON[14492]: (aras) CMD (cd
>>>> /home/aras/acis && /home/aras/acis/bin/make-repec-per.sh )
>>>> Jan 17 23:20:01 nebka /USR/SBIN/CRON[14547]: (www-data) CMD ([ -x
>>>> /usr/lib/cgi-bin/awstats.pl -a -f /etc/awstats/awstats.conf -a -r
>>>> /var/log/apache/access.log ] && /usr/lib/cgi-bin/awstats.pl
>>>> -config=awstats -update >/dev/null)
>>>> Jan 17 23:20:01 nebka /USR/SBIN/CRON[14548]: (root) CMD (test -x
>>>> /usr/lib/atsar/atsa1 && /usr/lib/atsar/atsa1)
>>>> Jan 17 23:22:01 nebka /USR/SBIN/CRON[14703]: (root) CMD (du -cs /* >
>>>> du_slash_`date -I`)
>>>> Jan 18 14:04:08 nebka syslogd 1.4.1#18: restart.
>>>>
>>>>
>>>> ...
>>>>
>>>>
>>>> Jan 17 23:09:01 nebka /USR/SBIN/CRON[14205]: (aras) CMD (cd
>>>> /home/aras/acis && /home/aras/acis/bin/make-repec-per.sh )
>>>> Jan 17 23:10:01 nebka /USR/SBIN/CRON[14237]: (www-data) CMD ([ -x
>>>> /usr/lib/cgi-bin/awstats.pl -a -f /etc/awstats/awstats.conf -a -r
>>>> /var/log/apache/access.log ] && /usr/lib/cgi-bin/awstats.pl
>>>> -config=awstats -update >/dev/null)
>>>> Jan 17 23:10:01 nebka /USR/SBIN/CRON[14238]: (root) CMD (test -x
>>>> /usr/lib/atsar/atsa1 && /usr/lib/atsar/atsa1)
>>>> Jan 17 23:15:01 nebka /USR/SBIN/CRON[14474]: (root) CMD ([ -x
>>>> /usr/lib/sysstat/sa1 ] && { [ -r "$DEFAULT" ] && . "$DEFAULT" ; [
>>>> "$ENABLED" = "true" ] && exec /usr/lib/sysstat/sa1 $SA1_OPTIONS 1 1 ; })
>>>> Jan 17 23:16:01 nebka /USR/SBIN/CRON[14476]: (aras) CMD
>>>> (/home/aras/acis/bin/apu 7 >>/home/aras/apu-job.log 2>&1)
>>>> Jan 17 23:17:01 nebka /USR/SBIN/CRON[14489]: (root) CMD ( cd / &&
>>>> run-parts --report /etc/cron.hourly)
>>>> Jan 17 23:18:01 nebka /USR/SBIN/CRON[14492]: (aras) CMD (cd
>>>> /home/aras/acis && /home/aras/acis/bin/make-repec-per.sh )
>>>> Jan 17 23:20:01 nebka /USR/SBIN/CRON[14547]: (www-data) CMD ([ -x
>>>> /usr/lib/cgi-bin/awstats.pl -a -f /etc/awstats/awstats.conf -a -r
>>>> /var/log/apache/access.log ] && /usr/lib/cgi-bin/awstats.pl
>>>> -config=awstats -update >/dev/null)
>>>> Jan 17 23:20:01 nebka /USR/SBIN/CRON[14548]: (root) CMD (test -x
>>>> /usr/lib/atsar/atsa1 && /usr/lib/atsar/atsa1)
>>>> Jan 17 23:22:01 nebka /USR/SBIN/CRON[14703]: (root) CMD (du -cs /* >
>>>> du_slash_`date -I`)
>>>> Jan 18 14:04:08 nebka syslogd 1.4.1#18: restart.
>>>>
>>>> du -cs /* seems to be the tripping point: it is the last cron job logged before the restart.
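
The reasoning above (look at the last cron entry logged before each syslogd restart) can be automated with a small filter. The following is a minimal sketch, assuming one log entry per line as in an unwrapped syslog file; the function name `last_cron_before_restart` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the last CRON entry logged before each "syslogd ... restart" line.
# Assumes each log entry sits on a single line (the excerpts in this thread are
# wrapped by the mail client, so they would need to be unwrapped first).
last_cron_before_restart() {
    awk '
        # Remember the most recent cron entry seen so far.
        /CRON\[[0-9]+\]:/ { last = $0 }
        # On a syslogd restart, report the remembered entry and reset it.
        /syslogd .*restart/ {
            if (last != "") print last
            last = ""
        }
    ' "$@"
}
```

It could be called as `last_cron_before_restart /var/log/syslog` (the log path is an assumption). If the same command shows up before every restart, that job is a reasonable first suspect, although a crash could also be caused by something that leaves no log entry.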
>>>>
>>>> Christian Zimmermann FIGUGEGL!
>>>> Department of Economics
>>>> University of Connecticut
>>>> 341 Mansfield Road, Unit 1063
>>>> Storrs, CT 06269-1063
>>>> http://ideas.repec.org/zimm/ christian.zimmermann at uconn.edu
>>>> http://ideas.repec.org/e/pzi1.html
>>>>
>>>> On Mon, 28 Jan 2008, Christian Zimmermann wrote:
>>>>
>>>>> Tim seems to have put nebka back online, and it seems to be spewing out
>>>>> emails. I will comment out everything in crontab and kill whatever is
>>>>> running to let us investigate the problems.
>>>>>
>>>>> _______________________________________________
>>>>> RAS-run mailing list
>>>>> RAS-run at lists.openlib.org
>>>>> http://lists.openlib.org/cgi-bin/mailman/listinfo/ras-run
>>>>>
>>>>
>>
>> -ivan
>>
>>
>>
>