[CollEc] Helos offline

Christian Zimmermann zimmermann at stlouisfed.org
Sun Jul 25 13:24:58 UTC 2021


I see there is no http://collec.repec.org/robots.txt...
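
If we decide to add one, a minimal robots.txt that asks Huawei's
crawler to stay away (assuming PetalBot honors it; the user-agent
name is taken from the log lines quoted below) could be as simple
as:

User-agent: PetalBot
Disallow: /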

Christian Zimmermann                          FIGUGEGL!
Economic Research
Federal Reserve Bank of St. Louis
P.O. Box 442
St. Louis MO 63166-0442 USA
https://ideas.repec.org/zimm/   @CZimm_economist

On Sun, 25 Jul 2021, Thomas Krichel wrote:

>  Düben, Christian writes
>
>> At the beginning of June, I installed a script that records the
>> times CollEc was accessed - no other variable, just the access
>> time. When plotting the results aggregated by day, you can see that
>> the number of daily app visits tends to fluctuate around 1,000 (see
>> Subset.pdf). However, yesterday it surged to almost 30,000 (see
>> Full_Period.pdf). Monit just notified me at 9:30 am today that the
>> app was offline. So, I do not know whether that is related to the
>> server issue. But tons of machines firing requests at port 80 on one
>> day and the server becoming inaccessible on the next appears to be
>> an odd coincidence.
>
>  Well, if you just log the times, how can you claim it's "tons
>  of machines"? I did go through the apache log, and the surge
>  does indeed appear to come from a bunch of servers belonging to
>  Huawei's petalsearch. The requests look legit, and I'm sure
>  they use reasonable defaults. It's just that the shiny app is
>  slow.
>
>  Apache kept returning 503 but kept logging, so it was still up.
>  The odd thing is that we could not get through over ssh. Since
>  that is our only route to the server, we are stuck and have to
>  ask Cezar for help.
>
>  There is a change from 502 to 503:
>
> 114.119.158.156 - - [24/Jul/2021:09:04:30 +0200] "GET /app_direct/collec_app/?_inputs_&navbars=%22tab_Coauthors%22&_values_&g_author=%22pel60%22 HTTP/1.1" 502 646 "-" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
> 114.119.136.243 - - [24/Jul/2021:09:04:39 +0200] "GET /app_direct/collec_app/?_inputs_&navbars=%22tab_Coauthors%22&_values_&g_author=%22ppa963%22 HTTP/1.1" 502 646 "-" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
> 114.119.134.212 - - [24/Jul/2021:09:04:43 +0200] "GET /app_direct/collec_app/?_inputs_&navbars=%22tab_Coauthors%22&_values_&g_author=%22pkr268%22 HTTP/1.1" 503 575 "-" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
> 114.119.146.29 - - [24/Jul/2021:09:04:43 +0200] "GET /app_direct/collec_app/?_inputs_&navbars=%22tab_Coauthors%22&_values_&g_author=%22pbe625%22 HTTP/1.1" 503 575 "-" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
>
>  at 9:04, so that's pretty consistent with what you note.
>  The inaccessibility presumably has to do with helos running
>  out of memory, but why did the oom killer not work? Well, it
>  did run, but that was not enough. We have in syslog:
>
> root at helos /var/log # grep 'R invoked oom-killer' syslog.1
> Jul 24 08:59:09 helos kernel: [14922235.506685] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
> Jul 24 09:30:07 helos kernel: [14924093.497980] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
> Jul 24 10:26:46 helos kernel: [14927492.848174] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
> Jul 24 10:58:08 helos kernel: [14929347.932058] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
> Jul 24 12:08:50 helos kernel: [14933616.461377] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
> Jul 24 12:58:00 helos kernel: [14936548.248476] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
> Jul 24 13:10:19 helos kernel: [14937294.624810] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
> Jul 24 13:23:38 helos kernel: [14938104.947025] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
> Jul 24 14:08:06 helos kernel: [14940762.579273] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
> Jul 24 14:24:13 helos kernel: [14941739.313980] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
> Jul 24 16:32:52 helos kernel: [14949437.368614] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
> Jul 24 17:50:50 helos kernel: [14954122.341626] R invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
>
>  But seemingly these oom kills were not enough to keep ssh up.
>
>  I suspect what could be done is a script that checks whether
>  the uptime is greater than a day. If it is, grep for
>  'R invoked oom-killer' in syslog and, if found, reboot. Run
>  that every hour. I've never written or run anything like that.
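>
>  Something along these lines might do it. This is only a rough,
>  untested sketch; it assumes a Debian-style /var/log/syslog and
>  that it runs as root, e.g. from /etc/cron.hourly:
>
> #!/bin/sh
> # Reboot if the machine has been up for more than a day and R
> # has triggered the oom-killer since the last syslog rotation.
> uptime_seconds=$(cut -d. -f1 /proc/uptime)   # seconds since boot
> if [ "$uptime_seconds" -gt 86400 ]; then
>     if grep -q 'R invoked oom-killer' /var/log/syslog; then
>         /sbin/reboot
>     fi
> fi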
>
>  The easier thing is to disable petal via hosts.txt.
>
>  Your thoughts?
>
> -- 
>
>  Cheers,
>
>  Thomas Krichel                  http://openlib.org/home/krichel
>                                              skype:thomaskrichel
>
> _______________________________________________
> CollEc-run mailing list
> CollEc-run at lists.openlib.org
> http://lists.openlib.org/cgi-bin/mailman/listinfo/collec-run
>

