[OAI-eprints] Interoperability - subject classification/terminology

Stevan Harnad harnad@ecs.soton.ac.uk
Fri, 22 Nov 2002 19:39:23 +0000 (GMT)


I would like to raise a query on a point of information regarding the
problem of subject classification for University Eprint Archives.

Let us first clarify a potential point of misunderstanding. There are
(at least) two ways to think of University Eprint Archives, both of
them important and valid, but most decidedly not both the same. Hence
conflating these two aspects of Institutional Archiving and assuming one
size shoe is needed for both risks creating podiatric problems for both!

(1) The University Eprint Archive as the University Digital Library --
or, more specifically, the University Digital Library for all of the
University's own scholarly, scientific and pedagogic output. (This
includes journal articles, books, teaching materials, and any other
digital content the University produces and wishes to include in its
Eprint Archive.)

There is no question whatsoever that a rigorous system of classification
and tagging -- to make such a total university digital output navigable,
and integrable and interoperable with corresponding digital output from
other universities, in similar University Eprint Archives -- is extremely
important to have, indeed a prerequisite for the usefulness and usability
of such an Archive.

(2) The University Eprint Archive as a means of providing open access
to all of the university's peer-reviewed research output (before and
after peer review). Almost without exception, this is the work that
also appears in the peer-reviewed journals sooner or later (indeed,
that is how it gets peer-reviewed).

It should be clear that (2) is a very special subset of (1). But
it should be equally clear that that special subset does not have any
particular or pressing classification problem! These are not books. They
are journal articles. Our journal articles are not indexed in our
university library card catalogues (only the journals in which they appear
are). When we want to search the journal literature, we do not look
to any university classification system; we go to indexing services
such as INSPEC, MEDLINE, ISI, etc. (These have their own classification
systems, but I am willing to bet that for this corpus not one of those
can beat Google-style boolean search on an inverted full-text index,
especially if aided by citation-frequency, hit-based, recency-based,
or relevance-based ranking of search output, as done, for example,
by http://citebase.eprints.org/help/index.php ).
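
[To make the comparison concrete, here is a minimal sketch of boolean
AND search over an inverted full-text index, with hits ranked by
citation count. The sample records, the citation counts and the function
names are all hypothetical illustrations; this is not a description of
how Citebase or any indexing service is actually implemented.]

```python
from collections import defaultdict

# Hypothetical sample records: (id, full text, citation count).
PAPERS = [
    ("p1", "open access to peer-reviewed research", 12),
    ("p2", "subject classification for digital libraries", 3),
    ("p3", "open archives and research access", 40),
]


def build_index(papers):
    """Map each lowercased word to the set of paper ids containing it."""
    index = defaultdict(set)
    for pid, text, _ in papers:
        for word in text.lower().split():
            index[word].add(pid)
    return index


def boolean_and_search(index, papers, query):
    """Return ids of papers containing every query term, most cited first."""
    terms = query.lower().split()
    if not terms:
        return []
    # Boolean AND: intersect the posting sets of all query terms.
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    citations = {pid: count for pid, _, count in papers}
    return sorted(hits, key=lambda pid: citations[pid], reverse=True)


INDEX = build_index(PAPERS)
RESULTS = boolean_and_search(INDEX, PAPERS, "open access")
```

[No subject classification scheme is consulted at any point: the only
inputs are the full text and a ranking signal, here assumed to be a
citation count.]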

I think it is extremely important to make it crystal clear that the
peer-reviewed research corpus -- and those University Eprint Archives
for which this is the main target literature at this time -- do not have
a classification problem, and need not and should not wait for any
solution to any classification problem before getting on with the
infinitely more pressing task of filling those archives with their
university's research output!

Now some specific comments and queries:

On Fri, 22 Nov 2002, Pauline Simpson wrote:

> At the OAI Geneva I undertook to do the following:
> 2. Investigate OAI and OAF email archives for prior discussion and
> synthesize
> 3. Open the discussion with the intention of constructing a model/s to
> address perceived problems. We will need a statement of the problem/s and
> suggested solutions (some already articulated on Saturday)

I was unfortunately unable to attend the Geneva OAI Meeting, so I would
like to address a question to Pauline:

Are the perceived problems in question the classification problems of
University Eprint Archives conceived in sense (1), i.e., as university
digital libraries for all university scholarly and pedagogic output,
or conceived in sense (2), i.e., as a means of providing open access to
university research output?

And if the two were not distinguished formally and explicitly in this
way, was it made clear to all concerned at least informally that the
classification problem applies only to (1) and not to (2)?

> At present we have completed item 2 and are now compiling a table of
> all e-Print archives (that we can find!) with an annotation of what subject
> classification they 'appear' to be using: LOC; DDC; In House
> Classification (possibly based on LOC or another); In House
> terminology; Faculty/Dept/Group; None.

Again, I wonder whether you could make it clear what the objective of
this exercise would be for those University Eprint Archives that have
been created exclusively, or primarily, to provide open access to
university research output (i.e., 2), hence having no need whatsoever
to adopt or use any classification system?

> I believe this evidence gathering
> exercise will be a worthwhile tool in our deliberations concerning
> harvesting and interoperability between institutional and discipline based
> e-Print archives.

Again, I think it would be immensely helpful, and would help both agenda
(1) and agenda (2) along their respective paths if the two agendas were
clearly distinguished and it were made clear that the classification
problem pertains exclusively to agenda (1).

[Let me add that agenda (1) (the university digital output library) is
very important and very worth pursuing; it is also an extremely valuable
collaborator to agenda (2) (open access to peer-reviewed research through
institutional self-archiving), but only if the two agendas facilitate
rather than restrain one another -- as any implication that agenda (2)
has classification problems to solve would most definitely do.]

Stevan Harnad

NOTE: A complete archive of the ongoing discussion of providing open
access to the peer-reviewed research literature online is available at
the American Scientist September Forum (98 & 99 & 00 & 01 & 02):

    http://amsci-forum.amsci.org/archives/september98-forum.html
                            or
    http://www.cogsci.soton.ac.uk/~harnad/Hypermail/Amsci/index.html

Discussion can be posted to: september98-forum@amsci-forum.amsci.org 

See also the Budapest Open Access Initiative:
    http://www.soros.org/openaccess

the Free Online Scholarship Movement:
    http://www.earlham.edu/~peters/fos/timeline.htm

the SPARC position paper on institutional repositories:
    http://www.unites.uqam.ca/src/sante.htm

the OAI site:
    http://www.openarchives.org

and the free OAI institutional archiving software site:
    http://www.eprints.org/