[rclis] Re: geneva document

Thomas Krichel rclis@lists.openlib.org
Thu, 31 Oct 2002 16:05:42 -0600


  I asked Ivan Kurmanov for comments on the Geneva document. 
  I did well to do so. He came back with some highly perceptive
  comments. They make for rather depressing reading. 
  They show that we will have to do some more homework on
  this. 

  I am very grateful for these comments. 

  Some comments by him have been included in the latest version
  of the document. For the benefit of those who have further 
  interest, here is my response to Ivan. 

> So I read it.  Well I see some interesting developments you are doing.

  Thanks. Actually, your compliments run out quickly.

> Many comments I have, not many of them are important.  I address my
> comments to you, but in fact they should be addressed to all the
> authors of the doc.

  Done here. 
 
> The text is quite boring and not easy to follow, sorry.  At paragraph
> 25 I was already falling asleep.  I know I was able to grasp its ideas
> because I know what is RePEc, handle, AMF, AMF nouns, OAI.  But if I
> didn't know these things, I would be completely lost before reaching
> the middle point.  I don't know what SICI is, though.  (And text
> doesn't help me to find that out.)

  Yes, I added a paragraph with links to AMF and OAI, and RePEc. 
  The SICI recommendation will have to be further tested out. 

> In para 12 you say $archive_handle is something like
> /^rclis:[a-z]{3}:arch$/.  I don't like the idea of introducing data
> type information (the ":arch" part) into identifiers, but that doesn't
> worry me too much.  

  I think we can do without the :arch part there, so I removed it.

> What worries me is what you say later (para 16): series id should be
> /^$archive_handle:seri:[a-z]{6}$/.  From the above I directly conclude
> that sample series id will look like rclis:???:arch:seri:??????.  Is
> that what you mean? (that's para 16.)  In para 17 you use
> $series_handle, which you didn't define anywhere.  Why not say simply
> "series handle"?  (Same thing with $archive_handle abused to my mind
> all over the text.)

  This is a severe mistake. 
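
  To make the problem concrete, here is a minimal sketch in Python
  (not part of the Geneva document) that composes the two patterns
  exactly as the draft text reads. The variable names and the sample
  handles are made up for illustration.

```python
import re

# Archive handle pattern from para 12 of the draft, without the anchors
# so that it can be embedded in a longer pattern.
ARCHIVE_HANDLE = r"rclis:[a-z]{3}:arch"

# Series pattern from para 16, read literally:
# /^$archive_handle:seri:[a-z]{6}$/
SERIES_HANDLE = re.compile(rf"^{ARCHIVE_HANDLE}:seri:[a-z]{{6}}$")

# Hypothetical sample handles.
literal_reading = "rclis:abc:arch:seri:abcdef"
without_type_suffix = "rclis:abc:seri:abcdef"

print(bool(SERIES_HANDLE.match(literal_reading)))      # True
print(bool(SERIES_HANDLE.match(without_type_suffix)))  # False
```

  Read literally, the draft accepts only the long form with ":arch"
  in the middle, which is what Ivan concluded.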

> Also this means that each file archive of rclis must have a file named
> "rclis:???:arch.amf.xml".  This is long and colon is a forbidden
> character for filenames on some filesystems (including Windows).  See
> para 21.

  The issue of the forbidden character is a problem that I have
  no answer for. This needs further reflection. 
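
  This does not solve the problem, but a small check shows where it
  bites. The sketch below is in Python; the function name and the
  sample file name are hypothetical. The set of characters is the
  one that Windows reserves in file names.

```python
# Characters that Windows does not allow in file names.
WINDOWS_RESERVED = set('<>:"/\\|?*')

def windows_unsafe_chars(filename: str) -> set:
    """Return the characters in filename that Windows would reject."""
    return set(filename) & WINDOWS_RESERVED

# Hypothetical file name following the pattern Ivan describes.
print(windows_unsafe_chars("rclis:abc:arch.amf.xml"))  # {':'}
```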

> Reading para 23 I get an impression that persons will be identified
> with a date only.  At the second look I notice the meaningful absence
> of the dollar sign at the end of the regular expression.  So I have
> idea that something will follow it, but other readers may not.  

  I appended [a-z_]+$ to make this more explicit. 


> Also note a typo there in "is a valid data", where you mean date.
> 
> Para 25 is very difficult.  It says:  
> 
>             A special archive rclis:can will build channel data. This
>             archive will keep authoritative data for an archive and
>             link it to the series that describe it. 
> 
>  may be, data for the channel?
> 
> 	    The channel archive may link a channel to zero or more
> 	    series. 
> 
>  may be, the channel *record* may link a channel to zero or more series?
> 
>     	    The
>             keeper of the archive serves as a general authority of all
>             document data. 
> 
>  That is unclear. Do you mean the keeper of rclis:cha:chan archive or
>  what?  If yes, then what exactly does "general authority" mean?
> 
> 	    While archives are free to propose any
>             document data that they wish to, it is likely that only
>             the series that are linked to a channel will be used by
>             user services. Thus, rclis:can serves as a clearing
>             house. Here is a fictions example that shows data from two
> 
>  fictitious
> 
>             series to form the data for the channel. Deduplification
>             will have to be handled at the level of the series, with
>             rclis:can to play a crucial intermediating rule.

  I have corrected some of the language and split the paragraph
  in two.

> Para 27 says that three-letter archive code will be dropped in the
> blessed handles, together with the four-letter type code.  How do you 
> ensure blessed handles' uniqueness then?

  This will have to be done with the aid of the channel data. The
  channel data must be kept in such a way that duplication
  does not occur. 

> Not much importance:
> 
> para 3: "we have a structure of archives, ..." Actually you list types
> of data records, but you do not describe (here) the *structure* of
> rclis.

  I fixed this to make the distinction explicit. 

> 
> para 5: the table's frame should be made visible, because otherwise it
> looks confusing.  It is good that I know that collection, text and so
> on are AMF nouns, but a table header wouldn't hurt other readers.

  done 

> 
> para 13: "All file archives live on a single directory." what about
> another way to put it: "A file archive lives on a single
> web-accessible directory."?  then: "Some files within this directory
> that contain embedded AMF data." word 'that' here is redundant.  Then:
> "... regular expression /amf.xml$/."  1st, you may want to add a dot
> before amf.  2nd, a dot in perl regular expression means any char,
> unless you quote it with a backslash.  So i would write it like this:
> "/\.amf\.xml$/".

  Fixed.
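
  A short illustration of the difference, written in Python rather
  than Perl (the two agree on these patterns). The file names are
  invented for the example.

```python
import re

unescaped = re.compile(r"amf.xml$")   # "." matches any character
escaped = re.compile(r"\.amf\.xml$")  # "\." matches a literal dot only

for name in ["series.amf.xml", "seriesamfxxml", "series-amf.xml"]:
    print(name, bool(unescaped.search(name)), bool(escaped.search(name)))

# series.amf.xml True True
# seriesamfxxml True False
# series-amf.xml True False
```

  Only the escaped form restricts matches to names that really end
  in ".amf.xml".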
> 
> para 23: how are you going to ensure that a date is in the person's
> lifetime?  I would suggest to drop that out.

  I changed this to "should" 

>  Also you may want to
> specify that "the numeric expression is a valid date" must be of the
> form YYYY-MM-DD, just to be precise.

  I think this is sufficiently explicit. 
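
  For anyone who wants to check the date part mechanically, here is
  a rough sketch in Python. It assumes that the personal part of a
  handle is a YYYY-MM-DD date, a colon, and then [a-z_]+, as in the
  RePEc handle in my signature. The exact rclis pattern is in the
  document and may differ; the names below are made up.

```python
import re
from datetime import date

# Assumed shape: YYYY-MM-DD, a colon, then a lowercase name part.
PERSON_PART = re.compile(r"^([0-9]{4})-([0-9]{2})-([0-9]{2}):[a-z_]+$")

def has_valid_date(person_part: str) -> bool:
    """True if the string has the assumed shape and a real calendar date."""
    m = PERSON_PART.match(person_part)
    if m is None:
        return False
    year, month, day = (int(g) for g in m.groups())
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True

print(has_valid_date("1965-06-05:thomas_krichel"))  # True
print(has_valid_date("1965-02-30:thomas_krichel"))  # False
print(has_valid_date("1965-06-05"))                 # False
```

  Whether the date lies within the person's lifetime cannot be
  checked this way, which is why that requirement is now only a
  "should".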


> para 24: regular expression for channels says:
> "/^rclis:cha:chan:[0-9]{6}$/". So it only allows digits in the latter
> part of the handle?

  Yes, to make it distinct from the series handle and to encourage
  cryptivity. 
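
  A small sketch in Python of why the two cannot be confused. The
  channel pattern is the one quoted above; the series tail is the
  [a-z]{6} from para 16. The helper name and the sample handles are
  hypothetical.

```python
import re

# Channel pattern as quoted from para 24.
CHANNEL = re.compile(r"^rclis:cha:chan:[0-9]{6}$")
# Last component of a series handle per para 16: six lowercase letters.
SERIES_TAIL = re.compile(r"^[a-z]{6}$")

def last_component(handle: str) -> str:
    """Return the part of the handle after the final colon."""
    return handle.rsplit(":", 1)[-1]

channel = "rclis:cha:chan:000123"  # hypothetical channel handle
series = "rclis:abc:seri:abcdef"   # hypothetical series handle

print(bool(CHANNEL.match(channel)))                          # True
print(bool(SERIES_TAIL.match(last_component(channel))))      # False
print(bool(SERIES_TAIL.match(last_component(series))))       # True
print(bool(CHANNEL.match(series)))                           # False
```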


> 
> para 26: "Several archives may create and managed group descriptions."
> A typo: manage.

  fixed.

> 
> para 27: "natural component". what's the nature of the term?  Any
> reason to call it natural or just a nice word?

  just a nice word.

> 
> para 28: "blessed handle". same question, what's the nature of the
> term?

  just a nice word, I think. the only bit of artistry in the 
  document.

> 
> 
> General comments, suggestions:
> 
> I would suggest to use "rclis:pers:" instead of "rclis:per:pers:",
> "rclis:orga:" instead of "rclis:org:orga:".  This is if you plan to
> collect/maintain this data centrally.

  Yes, but we may change our minds on this. So, we have a structure
  that hides the internal handling from the external. 

> I still doubt that any AMF-based project will lift up, because of AMF
> being difficult to write manually.  I believe ReDIF is much more
> convenient for that.

  I doubt your doubts. 

> 
> Do you really expect any OAI archive to participate in rclis? 

  Yes, e-lis will be an archive using the eprints software. 


> While strict and detailed specifications are necessary in cooperative
> developments, where many people try to conform to a single standard, I
> think this is not very much our case.  I think one of the lessons of
> RePEc is that we much more need easy-to-read and easy-to-understand,
> example-based gradually-involving tutorials than strict
> formally-written specifications.

  Yes. But this is still at the pre-implementation stage.

  Thanks again!!



  Cheers,

  Thomas Krichel                                   mailto:krichel@openlib.org
                                              http://openlib.org/home/krichel
                                          RePEc:per:1965-06-05:thomas_krichel