[CollEc] RePEc Visual

Düben, Christian Christian.Dueben at uni-hamburg.de
Thu May 21 13:39:12 UTC 2020


I checked how fast CollEc computations run when executed in C and C++ through an R package. The underlying graph contained 47,192 authors who wrote at least one co-authored paper. I weighted the edges between co-authors by their number of joint papers.

First, I calculated the distance matrix. Distances are the lengths of the shortest (least-cost) paths found with Dijkstra's algorithm. Calculating the 47,192 × 47,192 = 2,227,084,864 cell values and writing them to disk took 4.77 minutes in a process parallelized across 8 cores. Computing each author's closeness value and writing it to disk took 4.27 minutes, also on 8 cores. Betweenness is considerably slower in comparison.
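The email does not include the package code, so what follows is only a minimal C++17 sketch of the distance-matrix and closeness steps, not the actual implementation. Assumptions, all mine: the edge cost is 1 / (number of joint papers), so frequent co-authors count as "closer"; the graph is an in-memory adjacency list; closeness is 1 / (sum of distances to all reachable authors); `add_coauthorship`, `dijkstra`, `distance_matrix` and `closeness` are hypothetical names. Writing to disk is omitted. Every matrix row depends only on its own source author, so the loop in `distance_matrix` is the part that would be split across the 8 cores.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// One weighted edge in the co-authorship graph.
struct Edge {
    int to;
    double cost;  // ASSUMPTION: cost = 1 / number of joint papers
};

using Graph = std::vector<std::vector<Edge>>;

const double kInf = std::numeric_limits<double>::infinity();

// Hypothetical helper: record an undirected co-authorship link.
void add_coauthorship(Graph& g, int a, int b, int joint_papers) {
    assert(joint_papers > 0);
    const double cost = 1.0 / joint_papers;
    g[a].push_back({b, cost});
    g[b].push_back({a, cost});
}

// Single-source Dijkstra: least-cost distance from `source` to every author.
std::vector<double> dijkstra(const Graph& g, int source) {
    assert(source >= 0 && static_cast<std::size_t>(source) < g.size());
    std::vector<double> dist(g.size(), kInf);
    using Item = std::pair<double, int>;  // (tentative distance, author)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;  // min-heap
    dist[source] = 0.0;
    pq.push({0.0, source});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;  // stale heap entry, already improved
        for (const Edge& e : g[u]) {
            const double nd = d + e.cost;
            if (nd < dist[e.to]) {
                dist[e.to] = nd;
                pq.push({nd, e.to});
            }
        }
    }
    return dist;
}

// Full n x n distance matrix: one independent Dijkstra run per author.
std::vector<std::vector<double>> distance_matrix(const Graph& g) {
    std::vector<std::vector<double>> m;
    m.reserve(g.size());
    for (std::size_t s = 0; s < g.size(); ++s)
        m.push_back(dijkstra(g, static_cast<int>(s)));
    return m;
}

// ASSUMPTION: closeness = 1 / (sum of distances to reachable authors).
double closeness(const Graph& g, int author) {
    double sum = 0.0;
    for (double d : dijkstra(g, author))
        if (d != kInf) sum += d;
    return sum > 0.0 ? 1.0 / sum : 0.0;
}
```

Note that, as in the measurements above, this sketch runs Dijkstra separately for the matrix and again for closeness.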

The code still leaves room for improvement. All three measures are derived from shortest cost paths, so it would be more efficient to compute those paths once and reuse them for all three measures rather than computing them three times. Another point is the structure of the parallel process: the chunk size of the iterations may not be optimal and could be tuned through further tests.
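One known way to derive the paths once is Brandes' algorithm for betweenness, which runs one Dijkstra search per source while also counting shortest paths. The same per-source pass already yields that source's distance-matrix row and closeness value. The sketch below is a swapped-in technique, not necessarily what CollEc does. Assumptions: cost = 1 / joint papers, closeness = 1 / (sum of finite distances), unnormalized undirected betweenness, a 1e-12 tolerance for treating two path costs as equal, and hypothetical names throughout.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge {
    int to;
    double cost;  // ASSUMPTION: cost = 1 / number of joint papers
};
using Graph = std::vector<std::vector<Edge>>;

// Hypothetical helper: undirected co-authorship link.
void add_coauthorship(Graph& g, int a, int b, int joint_papers) {
    assert(joint_papers > 0);
    g[a].push_back({b, 1.0 / joint_papers});
    g[b].push_back({a, 1.0 / joint_papers});
}

struct Measures {
    std::vector<std::vector<double>> dist;  // full distance matrix
    std::vector<double> closeness;          // ASSUMPTION: 1 / sum of finite distances
    std::vector<double> betweenness;        // undirected, unnormalized
};

// One weighted Brandes pass per source yields all three measures.
Measures all_measures(const Graph& g) {
    const std::size_t n = g.size();
    const double inf = std::numeric_limits<double>::infinity();
    const double eps = 1e-12;  // ASSUMPTION: tolerance for equal-cost paths
    Measures out{std::vector<std::vector<double>>(n),
                 std::vector<double>(n, 0.0), std::vector<double>(n, 0.0)};

    // Sources are independent: this loop is the unit to split across cores.
    for (std::size_t s = 0; s < n; ++s) {
        std::vector<double> dist(n, inf), sigma(n, 0.0), delta(n, 0.0);
        std::vector<std::vector<int>> pred(n);  // predecessors on shortest paths
        std::vector<int> order;                 // authors in settlement order
        std::vector<bool> done(n, false);
        using Item = std::pair<double, int>;
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

        dist[s] = 0.0;
        sigma[s] = 1.0;  // one (empty) shortest path from s to itself
        pq.push({0.0, static_cast<int>(s)});
        while (!pq.empty()) {  // Dijkstra that also counts shortest paths
            auto [d, u] = pq.top();
            pq.pop();
            if (done[u]) continue;
            done[u] = true;
            order.push_back(u);
            for (const Edge& e : g[u]) {
                const double nd = d + e.cost;
                if (nd < dist[e.to] - eps) {  // strictly shorter path found
                    dist[e.to] = nd;
                    sigma[e.to] = sigma[u];
                    pred[e.to].assign(1, u);
                    pq.push({nd, e.to});
                } else if (std::abs(nd - dist[e.to]) <= eps) {  // equal-cost path
                    sigma[e.to] += sigma[u];
                    pred[e.to].push_back(u);
                }
            }
        }

        // Back-propagate dependencies in reverse settlement order.
        for (auto it = order.rbegin(); it != order.rend(); ++it) {
            const int w = *it;
            for (int v : pred[w])
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w]);
            if (w != static_cast<int>(s)) out.betweenness[w] += delta[w];
        }

        double sum = 0.0;
        for (double d : dist)
            if (d != inf) sum += d;
        out.closeness[s] = sum > 0.0 ? 1.0 / sum : 0.0;
        out.dist[s] = std::move(dist);  // reuse the same search for the matrix row
    }
    // In an undirected graph every pair is counted once from each end.
    for (double& b : out.betweenness) b /= 2.0;
    return out;
}
```

When parallelizing the outer loop, each thread would need its own betweenness accumulator that is summed at the end. How many sources each thread takes at a time (the chunk size) is the tunable mentioned above.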

If users only access data for a small number of authors at a time, it is not even necessary to precompute those values and store them on disk or on an SQL database server. With the graph kept in memory, computations are practically instantaneous for small sets of authors.
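For on-demand lookups, a single Dijkstra search can stop as soon as the requested author is settled. A minimal sketch of such a point-to-point query follows, assuming the graph stays in memory as an adjacency list with cost = 1 / joint papers. `query_path` and `add_coauthorship` are hypothetical names, and this is not the existing CollEc code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge {
    int to;
    double cost;  // ASSUMPTION: cost = 1 / number of joint papers
};
using Graph = std::vector<std::vector<Edge>>;

// Hypothetical helper: undirected co-authorship link.
void add_coauthorship(Graph& g, int a, int b, int joint_papers) {
    assert(joint_papers > 0);
    g[a].push_back({b, 1.0 / joint_papers});
    g[b].push_back({a, 1.0 / joint_papers});
}

struct PathResult {
    double cost;            // infinity if `to` is unreachable
    std::vector<int> path;  // authors from `from` to `to`; empty if unreachable
};

// Dijkstra with early exit: stops once the target author is settled.
PathResult query_path(const Graph& g, int from, int to) {
    const std::size_t n = g.size();
    assert(from >= 0 && to >= 0);
    assert(static_cast<std::size_t>(from) < n && static_cast<std::size_t>(to) < n);
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> dist(n, inf);
    std::vector<int> parent(n, -1);  // previous author on the best path found
    using Item = std::pair<double, int>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

    dist[from] = 0.0;
    pq.push({0.0, from});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;  // stale heap entry
        if (u == to) break;         // target settled: its distance is final
        for (const Edge& e : g[u]) {
            const double nd = d + e.cost;
            if (nd < dist[e.to]) {
                dist[e.to] = nd;
                parent[e.to] = u;
                pq.push({nd, e.to});
            }
        }
    }
    if (dist[to] == inf) return {inf, {}};

    // Walk parents back from the target, then reverse into from -> to order.
    std::vector<int> path;
    for (int v = to; v != -1; v = parent[v]) path.push_back(v);
    std::reverse(path.begin(), path.end());
    return {dist[to], path};
}
```

Because the search halts at the target, a query typically explores only the authors closer to the source than the target, which helps keep lookups for a handful of authors fast.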

See you tomorrow.

Christian Düben
Research Associate
Chair of Macroeconomics
Hamburg University
Von-Melle-Park 5, Room 3102
20146 Hamburg
Germany
+49 40 42838 1898
christian.dueben at uni-hamburg.de
http://www.christian-dueben.com

-----Original Message-----
From: Thomas Krichel <krichel at openlib.org>
Sent: Wednesday, May 20, 2020 14:14
To: Düben, Christian <Christian.Dueben at uni-hamburg.de>
Cc: CollEc Run <collec-run at lists.openlib.org>
Subject: Re: RePEc Visual

  Düben, Christian writes

> I went through some of the files and checked what I would need for an 
> extension of CollEc. I have a few ideas in mind on what to add and how 
> to present it in an interactive application.

  It's very hard to do a worse job than I did visualizing that
  data!
  
> When I consulted our IT department here at Hamburg University, they
> suggested hosting RePEc Visual on one of their managed Linux servers.
> At this point I am still waiting for the administration to process my 
> application requesting such a server. And just like every 
> administrative procedure at our institution, this takes a while. Once 
> I have access to the respective infrastructure, I am going to test
> implementations of RePEc Visual and potential CollEc extensions on it. 
> Those applications would of course run under an external domain, not a 
> Hamburg University domain.

  We could run this on the existing CollEc server. This would
  be especially valuable if you manage to find a way to run the
  calculations faster. At this time, it's dreadfully slow. You could
  just take over the whole thing, well almost. We need to keep the
  mention of the sponsor, and I'd like to be acknowledged as the
  original creator.

> I do not have Telegram and apparently do not have the correct login 
> credentials for the Skype setup on my office laptop. Do you use Zoom?
> If you do, I can send you a meeting link. If you do not, I will try to 
> find out what login credentials our IT set for Skype.

  Zoom should be fine. I'm in UTC+7. I can do late evenings, no
  problem. My schedule is completely open. Maybe someone else would
  want to attend? I'm copying CollEc-run.

-- 

  Cheers,

  Thomas Krichel                  http://openlib.org/home/krichel
                                              skype:thomaskrichel


