[CollEc] Website fail

Thomas Krichel krichel at openlib.org
Sun Sep 15 14:39:26 UTC 2024


  Christian Düben writes

> For performance reasons, threads write to their own files.

  I am not sure what threads are and why we need them here.  All I
  need is to have the paths from one author to all others in a
  file. These can all be run in parallel. In your run, you seem to try
  to do all authors at the same time. This puts a great strain on the
  machine.  I suggest calculating one author at a time, using
  parallel processing driven by a database that records when an
  author's data has changed.
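  A minimal sketch of the one-author-at-a-time calculation, in Python.
  It assumes the co-authorship graph is unweighted and held in memory
  as a dict from author id to a list of co-author ids (no duplicate
  entries, each edge listed in both directions). The function name
  shortest_paths_from is hypothetical, not part of CollEc. It runs a
  breadth-first search from one author, keeps every predecessor that
  lies on a shortest path, and then enumerates all shortest paths to
  each reachable author. Each call depends only on the graph and one
  source author, so calls for different authors can run as separate
  processes.

```python
from collections import deque


def shortest_paths_from(graph, source):
    """Return {target: [path, ...]} holding every shortest path from source.

    graph: dict mapping an author id to a list of co-author ids
    (hypothetical in-memory layout; unweighted, undirected, no duplicates).
    Authors not reachable from source are absent from the result.
    """
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                # First time reached: BFS guarantees this distance is minimal.
                dist[nbr] = dist[node] + 1
                preds[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                # Another predecessor on an equally short path.
                preds[nbr].append(node)

    # Memoised enumeration so shared prefixes are built only once.
    cache = {source: [[source]]}

    def build(node):
        if node not in cache:
            cache[node] = [p + [node] for pred in preds[node] for p in build(pred)]
        return cache[node]

    return {target: build(target) for target in dist if target != source}
```

  For example, with graph = {"A": ["B", "C"], "B": ["A", "D"],
  "C": ["A", "D"], "D": ["B", "C"]}, shortest_paths_from(graph, "A")["D"]
  gives both ["A", "B", "D"] and ["A", "C", "D"]. The distance to each
  target is the length of any of its paths minus one.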
  
> This way, I can use parallelism without locks. If you prefer all
> paths, distances, and closeness centrality values to respectively be
> in single files instead of thread-specific files, I can change
> that. However, that probably slows down the program's execution.

  This massively parallel way of handling the job makes no sense to
  me.

> All shortest paths within an author pair are not necessarily stored
> consecutively. A paths file might contain the first shortest path
> from author 1 to author 2, followed by the first shortest path from
> author 1 to author 4, followed by the second shortest path from
> author 1 to author 2. I can order them, if needed - again at a
> performance penalty.
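  If the paths for one author pair need to be consecutive, one way is
  to sort the records by (source, target) and then group them. A
  minimal Python sketch, assuming each record is a (source, target,
  path) tuple; the record layout and the name group_paths_by_pair are
  hypothetical, not taken from the actual paths files. The sort is
  where the performance penalty comes from: O(n log n) in the number
  of records.

```python
from itertools import groupby
from operator import itemgetter


def group_paths_by_pair(records):
    """Yield ((source, target), [path, ...]) with all paths of a pair together.

    records: iterable of (source, target, path) tuples in arbitrary order
    (hypothetical layout, e.g. records read back from one thread's file).
    """
    # sorted() is stable, so paths of one pair keep their relative order.
    ordered = sorted(records, key=itemgetter(0, 1))
    # groupby only merges adjacent items, which is why the sort comes first.
    for pair, group in groupby(ordered, key=itemgetter(0, 1)):
        yield pair, [path for _, _, path in group]
```

  With the interleaving described above, records such as
  (1, 2, "1-3-2"), (1, 4, "1-4"), (1, 2, "1-5-2") come out as the pair
  (1, 2) with both of its paths, followed by the pair (1, 4).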

  This makes no sense to me. This is not how I built the old
  CollEc. I ran a system that took nodes and updated them. That way
  I could run updates around the clock, and I could run as many
  processes as I had machine capacity for. That is a
  completely different approach from yours, which is
  to make a complete calculation every now and then.

  Now the machine is so slow that I can hardly use it. 

  It would be better to solve the task at hand, which is
  to create a fast program to do binary paths for an
  individual author. I can then take this up and try
  to resuscitate the old site.


-- 
  Written by Thomas Krichel http://openlib.org/home/krichel on his 21653rd day.



More information about the CollEc-run mailing list