[rclis] Julio's data in AMF
Thomas Krichel
krichel at openlib.org
Thu Nov 6 17:31:03 EST 2003
Hi,
this is just to say that there is a first release of
Julio's data (the one that forms the bulk of DoIS)
in AMF, within a Geneva style archive, at
http://wotan.liu.edu/rclis/jul
It took about two weeks of struggle to get this
to work. It will have to be tested in practice
if and when we build a successor to, or a new implementation of,
DoIS that is based on AMF/XML.
I am still working on converting DBLP to AMF. I have done the conversion
of the journal data (about 1/3 of the total), but the rest,
essentially data on conference papers, is still to be done. The problem
here is the representation of conferences. If each conference is its own
collection, we have a huge number of collections. This is not
problematic in itself (apart from being labor-intensive to maintain),
but it limits the usefulness of collection-level data. On the other
hand, we can try to collect conference series data, since many
conferences are held annually or so. Grouping by series would give a
much better subject classification through the work of the conferences,
but it would be more work and needs real expertise to maintain.
In the meantime, I have started working on the Konz project, see
http://rclis.org/internal/konz.html. Progress there has been quite
good. I completed a first implementation in about three weeks working
on this full-time in Novosibirsk. But when I started to run it, at the
end of my stay in Siberia, the disk I worked with, based in New
York, crashed. I suspect the problem is that it is too big at
160G, and that the kernel cannot see it. Fixing
the disk and the computer has cost me quite a bit of time, and it is
still not fully stable. But I now have it at home in my closet,
and I monitor it constantly. I predict that using konz, on the full
DBLP, we will be able to get 30,000 full texts. This is really pretty
good. I don't want to go into more details right now. Konz
is obviously a sophisticated piece of work. I expect the complete
DBLP conversion, and konz running on the whole set, to
be done by the end of the year. Of course, with me
working essentially on my own on data collection, it will take more
time until we have a really good set. But I will not give up and I am
optimistic that this work will reap great rewards.
Other good news is that Google really seems to love portals.
Just look, for example, at my boss Michael Koenig: the last time
I searched for his name on Google, DoIS came right up as
the first hit. Thus I am sure that, once we get the coverage
of DoIS extended, it will disseminate quite well if we get
a few people to open links to it.
Cheers,
Thomas Krichel mailto:krichel at openlib.org
http://openlib.org/home/krichel
RePEc:per:1965-06-05:thomas_krichel