[CollEc] Website fail

Thomas Krichel krichel at openlib.org
Sun Sep 15 17:01:56 UTC 2024


  Christian Düben writes

> Honestly, precomputing all shortest paths is a terrible idea. It is
> unnecessarily inefficient.

  This depends on what we say the aim is. I always thought the
  aim is for folks to see the path: here is your path to some
  other economist.

> Centrality measures need to be computed beforehand, but paths should
> be derived during user sessions.

  If you say so you must be right. But my design does not suit this
  thinking. It was built on the idea of path first, centrality
  second. You think the opposite. I lack the knowledge to ascertain
  whether my approach or yours is better. I suspect it is a matter
  of business case.

>  All
> paths taken together occupy hundreds of GB on disk.

  I am by no means a specialist in this, but the problem is not
  disk space. The problem is computing time. 
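  To make the on-demand idea concrete: serving one path when a user
  asks for it is a single breadth-first search, which is cheap next to
  precomputing all pairs. The sketch below assumes a plain adjacency
  dict; the graph and author names are invented for illustration and
  are not real RePEc data.

```python
from collections import deque

# Toy coauthorship graph: author -> set of coauthors.
# Names are placeholders, not real CollEc identifiers.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

def shortest_path(graph, origin, target):
    """Breadth-first search from origin; returns one shortest path
    as a list of authors, or None if target is unreachable."""
    if origin == target:
        return [origin]
    parent = {origin: None}   # discovered node -> predecessor
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in parent:
                parent[neighbour] = node
                if neighbour == target:
                    # Walk the parent chain back to the origin.
                    path = [neighbour]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(neighbour)
    return None

# One of ['a', 'b', 'd', 'e'] or ['a', 'c', 'd', 'e'],
# depending on set iteration order; both are shortest.
print(shortest_path(GRAPH, "a", "e"))
```

  Per query this touches only the nodes reachable from the origin, so
  nothing needs to be stored on disk beforehand.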

> The best way would be to store the data in Neo4j and update the
> database based on messages to an API.

  I don't know what Neo4j is but, yes, I think that is correct.
  We want to calculate new paths on demand when we think that
  something has changed. I am not a specialist in this area,
  which is why I used my admittedly primitive but robust approach.
  As I look at Neo4j I see it is a commercial offering, which is
  likely to lead to funding problems down the line.

> But there is no API. CollEc's input is an xml file, which does not
> even come with a change log, just as the full data set.

  If you say what the changelog should be we can build one.

> I can reduce the number of threads, i.e. the number of workers
> running in parallel, if the load is too heavy. RAM utilization is
> already minimal. The new code is the most performant program any
> version of CollEc has ever seen.

  Yes, it would need to run continuously and write the paths
  to files per origin.
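  A minimal sketch of what "files per origin" could mean: one
  single-source BFS per origin, with all resulting paths dumped to a
  file named after that origin. The JSON layout and file naming here
  are my guesses for illustration, not CollEc's actual format.

```python
import json
from collections import deque
from pathlib import Path

def write_paths_for_origin(graph, origin, out_dir):
    """Run a single-source BFS from `origin` and write one file,
    <origin>.json, mapping each reachable author to a shortest path.
    The format is a placeholder, not CollEc's real output."""
    parent = {origin: None}   # discovered node -> predecessor
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    # Reconstruct one shortest path per reachable target.
    paths = {}
    for target in parent:
        if target == origin:
            continue
        path = [target]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        paths[target] = path[::-1]
    out = Path(out_dir) / f"{origin}.json"
    out.write_text(json.dumps(paths, sort_keys=True))
    return out
```

  A continuously running updater could then redo only the origins it
  believes are affected by a change, instead of recomputing everything.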

> I have sacrificed multiple days to craft this piece of software
> exactly to your demands. You now have the binary paths for individual
> authors.

  In a bunch of aggregates that are 100G each (?), which I then
  have to parse, but when?

> You have distance values and you have closeness centrality
> results. Everything is stored in the requested antique output
> formats.

  I can try to write software that tries to compile my path files
  from your output.

-- 
  Written by Thomas Krichel http://openlib.org/home/krichel on his 21653rd day.



More information about the CollEc-run mailing list