[CollEc] Website fail
Thomas Krichel
krichel at openlib.org
Sun Sep 15 14:39:26 UTC 2024
Christian Düben writes
> For performance reasons, threads write to their own files.
I am not sure what threads are and why we need them here. All I
need is to have the paths from one author to all others in a
file. These can all be run in parallel. In your run, you seem to try
to do all authors at the same time. This poses a great strain on the
machine. I suggest calculating one author at a time, using
parallel processing and a database of when author data has been
changed.
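
As a minimal sketch of what "one author at a time" could look like,
the Python below runs a breadth-first search from a single author and
writes that author's paths to one file. Everything here is an
assumption for illustration: the co-authorship graph is taken to be an
unweighted adjacency dict, only one shortest path per target is kept,
and the names paths_from_author and write_paths are hypothetical, not
part of CollEc.

```python
from collections import deque


def paths_from_author(graph, source):
    """Return {target: [source, ..., target]} for every reachable author.

    Assumes `graph` maps an author id to an iterable of co-author ids
    and that all edges have the same weight (plain breadth-first search).
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in parent:
                parent[neighbour] = current
                queue.append(neighbour)

    paths = {}
    for target in parent:
        if target == source:
            continue
        # Walk back from the target to the source, then reverse.
        path = []
        node = target
        while node is not None:
            path.append(node)
            node = parent[node]
        paths[target] = path[::-1]
    return paths


def write_paths(paths, filename):
    """Write one path per line, author ids separated by spaces."""
    with open(filename, "w", encoding="utf-8") as out:
        for target in sorted(paths):
            out.write(" ".join(paths[target]) + "\n")
```

Each call touches only one source author and one output file, so
independent calls for different authors need no locks between them.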
> This way, I can use parallelism without locks. If you prefer all
> paths, distances, and closeness centrality values to respectively be
> in single files instead of thread-specific files, I can change
> that. However, that probably slows down the program's execution.
This massive parallel way of handling the job makes no sense to
me.
> All shortest paths within an author pair are not necessarily stored
> consecutively. A paths file might contain the first shortest path
> from author 1 to author 2, followed by the first shortest path from
> author 1 to author 4, followed by the second shortest path from
> author 1 to author 2. I can order them, if needed - again at a
> performance penalty.
This makes no sense to me. This is not how I built the old
CollEc. I ran a system that took nodes and updated them. That way
I could run updates around the clock, and I could run as many
processes as I had machine capacity for. That is a
completely different approach from what you are trying, which is
to make a complete calculation every now and then.
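
A node-by-node update scheme of this kind needs some record of which
authors are out of date. The following is only a sketch under
assumptions: the two timestamp dicts stand in for whatever database
holds the change and computation times, and stale_authors is a
hypothetical helper, not code from the old CollEc.

```python
def stale_authors(changed_at, computed_at):
    """Return authors whose data changed after their last computation.

    Both arguments map an author id to a timestamp (any comparable
    number). Authors that were never computed count as stale and come
    first, followed by those whose results are oldest.
    """
    stale = [
        author
        for author, changed in changed_at.items()
        if author not in computed_at or computed_at[author] < changed
    ]
    return sorted(
        stale,
        key=lambda author: (computed_at.get(author, float("-inf")), author),
    )
```

A worker process could take authors from this list one by one, and
the number of workers can be capped at what the machine can bear, for
instance with multiprocessing.Pool(processes=N).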
Now the machine is so slow that I can hardly use it.
It would be better to solve the task at hand, which is
to create a fast program to do binary paths for an
individual author. I can then take this up and try
to resuscitate the old site.
--
Written by Thomas Krichel http://openlib.org/home/krichel on his 21653rd day.