[ArchEc] start of Lebach work
Thomas Krichel
krichel at openlib.org
Thu May 20 14:36:20 UTC 2021
I have just started writing a test WARC file. Here is the start
of it.
WARC/1.0^M
WARC-Type: warcinfo^M
WARC-Record-ID: <urn:uuid:1baaba9e-b976-11eb-aed6-901b0ef71694>^M
WARC-Date: 2021-05-20T14:17:28Z^M
Content-Type: application/warc-fields^M
Content-Length: 232^M
^M
operator: Thomas Krichel <krichel at openlib.org>
funder: Fondation Banque de France
project: Lebach, http://governance.repec.org/applications/lebach.docx
conformsTo: http://bibnum.bnf.fr/WARC/WARC_ISino_28500_version1_latestdraft.pdf
^M
^M
WARC/1.0^M
WARC-Type: resource^M
WARC-Target-URI: file:///RePEc/aah/aarhec/aarhec1988.rdf^M
WARC-Date: 2004-04-01T21:21:24Z^M
WARC-Record-ID: <urn:uuid:1baabcec-b976-11eb-aed6-901b0ef71694>^M
Content-Type: application/octet-stream^M
WARC-Block-Digest: sha1:4SQLBS5JEULWYJ7JUJEO5XXCFL5FS7XM^M
Content-Length: 4683^M
^M
Template-Type: ReDIF-Paper 1.0^M
Title: TWO PAPERS ON THE TEST OF LUCAS VARIABILITY HYPOTHESIS.^M
Author-Name: CHRISTENSEN, M.^M
Author-Name: PALDAM, M.^M
Keywords: tests ; supply ; economic theory ; demand^M
Overall this is looking good.
Files can be stored as resource records. The target URI is the file
path starting with RePEc. Sure, this is not an absolute file name,
but I don't think we need to be that pedantic. The WARC-Date of the
resource record is the file's time in the tarball that I have. I will
take care to also archive files ending in ~, as if they were versions
of the file without the tilde.
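A minimal sketch of that mapping, using only Python's standard `tarfile`, `hashlib` and `base64` modules. The helper names `block_digest`, `resource_record` and `resource_records` are hypothetical. The record fields mirror the test file: the target URI is `file:///` plus the member path, WARC-Date comes from the member's mtime in the tarball, and the block digest is a base32-encoded SHA-1. Treating `x.rdf~` as a version of `x.rdf` (same target URI) is one reading of the plan above, and the sketch uses random version-4 UUIDs for record IDs.

```python
import base64
import hashlib
import tarfile
import uuid
from datetime import datetime, timezone
from typing import Iterator


def block_digest(block: bytes) -> str:
    """Base32-encoded SHA-1 of a record block, in WARC-Block-Digest form."""
    return "sha1:" + base64.b32encode(hashlib.sha1(block).digest()).decode("ascii")


def resource_record(target_uri: str, mtime: float, block: bytes) -> bytes:
    """Serialize one WARC/1.0 resource record; header lines end in CRLF."""
    date = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {date}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",  # random (version-4) ID
        "Content-Type: application/octet-stream",
        f"WARC-Block-Digest: {block_digest(block)}",
        f"Content-Length: {len(block)}",  # size of the block in bytes
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Two CRLFs close the record; they are not counted in Content-Length.
    return head + block + b"\r\n\r\n"


def resource_records(tar_path: str) -> Iterator[bytes]:
    """Yield one resource record per regular file in a RePEc tarball."""
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Assumption: "x.rdf~" is stored as an older version of "x.rdf",
            # so both get the URI of the file without the tilde.
            path = member.name[:-1] if member.name.endswith("~") else member.name
            block = tar.extractfile(member).read()
            yield resource_record("file:///" + path.lstrip("/"), member.mtime, block)
```

Writing the records for one tarball is then a matter of concatenating `resource_records(...)` into a file opened in binary mode, after a warcinfo record.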
The two UUIDs are almost the same: only the first group differs.
I still have to find out why.
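One likely explanation: both record IDs have a `1` as the first digit of their third group (`11eb`), which marks them as version-1, time-based UUIDs. Such a UUID packs a 60-bit timestamp, a clock sequence and a node ID (normally the host's MAC address) into its fields, so two IDs made a moment apart on the same machine differ only in the low-order time bits, which sit in the first group. The sketch below decodes both IDs with Python's standard `uuid` module; the helper `uuid1_datetime` is illustrative, not a library function.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100-ns ticks between the UUID epoch (1582-10-15) and the Unix epoch (RFC 4122).
GREGORIAN_OFFSET = 0x01B21DD213814000


def uuid1_datetime(u: uuid.UUID) -> datetime:
    """Decode the creation time embedded in a version-1 UUID."""
    if u.version != 1:
        raise ValueError(f"{u} is not a version-1 UUID")
    ticks = u.time - GREGORIAN_OFFSET  # 100-ns intervals since 1970-01-01 UTC
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=ticks // 10)


# The two WARC-Record-IDs from the test file.
a = uuid.UUID("1baaba9e-b976-11eb-aed6-901b0ef71694")
b = uuid.UUID("1baabcec-b976-11eb-aed6-901b0ef71694")

for u in (a, b):
    print(u, "version", u.version, "node", hex(u.node),
          "clock_seq", u.clock_seq, uuid1_datetime(u).isoformat())

# Only the timestamp differs, and only in its lowest 32 bits (time_low).
print("time difference:", b.time - a.time, "ticks of 100 ns")
```

Both IDs carry the same node and clock sequence, and their timestamps fall in the same second as the warcinfo WARC-Date, 59 microseconds apart.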
I intend to add the tarball date to the warcinfo fields.
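One way to carry the tarball date is as one more named field in the warcinfo block. In this sketch the field name `tarball-date`, its value, and the helpers `warc_date` and `warcinfo_record` are all hypothetical. Content-Length is computed from the encoded block, so it stays right when fields are added. Each field here ends in CRLF, like the header lines; the test block above uses bare newlines inside the block, so it may be worth checking which form the spec expects.

```python
import uuid
from datetime import datetime, timezone
from typing import Mapping


def warc_date(when: datetime) -> str:
    """Format a timezone-aware datetime as a WARC-Date (UTC, second precision)."""
    return when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def warcinfo_record(fields: Mapping[str, str], created: datetime) -> bytes:
    """Serialize a warcinfo record whose block is application/warc-fields."""
    # One "name: value" line per field (assumption: CRLF, as in the header lines).
    block = "".join(f"{name}: {value}\r\n" for name, value in fields.items()).encode("utf-8")
    headers = [
        "WARC/1.0",
        "WARC-Type: warcinfo",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid1()}>",
        f"WARC-Date: {warc_date(created)}",
        "Content-Type: application/warc-fields",
        f"Content-Length: {len(block)}",  # byte length of the encoded block
    ]
    return "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n" + block + b"\r\n\r\n"


if __name__ == "__main__":
    record = warcinfo_record(
        {
            "operator": "Thomas Krichel <krichel at openlib.org>",
            "funder": "Fondation Banque de France",
            # Hypothetical field and value: the date of the source tarball.
            "tarball-date": "2021-05-19T00:00:00Z",
        },
        datetime.now(timezone.utc),
    )
    print(record.decode("utf-8"))
```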
The idea is to have one file per RePEc archive. Later, we will be
able to run this on a daily basis.
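A sketch of the one-file-per-archive layout, assuming the archive code is the path component right after `RePEc/` (as `aah` in the test file). The `<archive>-<YYYYMMDD>.warc` naming scheme and the helper names are hypothetical; putting the run date in the name is one way to keep daily runs from overwriting each other.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable


def archive_code(path: str) -> str:
    """Return the RePEc archive code, e.g. 'aah' for 'RePEc/aah/aarhec/aarhec1988.rdf'."""
    parts = path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "RePEc":
        raise ValueError(f"not a RePEc archive path: {path}")
    return parts[1]


def group_by_archive(paths: Iterable[str], day: date) -> dict[str, list[str]]:
    """Map each (hypothetical) per-archive, per-day WARC file name to its paths."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        groups[f"{archive_code(path)}-{day:%Y%m%d}.warc"].append(path)
    return dict(groups)
```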
Comments on these choices are very welcome. A bad policy now
will be hard to undo!
I have written to Olaf and Jan about my need for more disk space.
While I hope this project will eventually save disk space, what I
have now is not enough: darni is 95% full.
--
Cheers,
Thomas Krichel http://openlib.org/home/krichel
skype:thomaskrichel
More information about the ArchEc-run mailing list