[OAI-eprints] Introducing the Subject Categorization discussion

Steve Hitchcock sh94r@ecs.soton.ac.uk
Wed, 15 Jan 2003 15:06:26 +0000


At 13:31 15/01/03 +0000, Pauline Simpson wrote:
>Following on from the OAI Geneva meeting  - to open the discussion  please see
>http://tardis.eprints.org/discussion/

Pauline, a thought-provoking page that helpfully outlines all the 
issues. A few points below, but first we need to make a distinction between 
works where the full text is not available digitally, and those where it 
is. So the question of whether there is a need for classification boils 
down to: Yes for the former, and (mostly) No for the latter.

By (mostly) I mean let's make it optional. That means, in the case of 
institutional repositories of research papers (the latter category), don't 
burden the repository with the need to maintain categorization as a core 
task. Leave that to services. If it's worth doing, then people will find 
the resources to do it, but it must not compromise the task of 
repositories, which is to make the texts available.

If full texts are available, we have the chance to automate search and 
indexing, say full-text indexing or citation indexing. This is vastly more 
powerful and cost-effective, but we have to recognise it is not the same 
thing as classification. Full text indexing can begin to tell us what a 
text is *about*, rather than simply where it is located, the classical 
purpose of classification. Through knowing what a text is about, we can 
make connections with other works in ways that are much more flexible than 
is offered by classification.
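
To make the contrast concrete, here is a minimal sketch of full-text 
indexing in Python. It assumes a toy collection of three invented texts 
(the identifiers and titles are hypothetical, not real papers) and ranks 
related works by the number of terms they share. A real service would add 
stop words, stemming and term weighting; this only illustrates the idea.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def related(index, docs, doc_id):
    """Rank the other documents by how many terms they share with doc_id."""
    scores = defaultdict(int)
    for term in set(tokenize(docs[doc_id])):
        for other in index[term]:
            if other != doc_id:
                scores[other] += 1
    # Highest overlap first; ties broken by document id for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample texts, invented for illustration only.
DOCS = {
    "paper-a": "Citation indexing of open access eprints",
    "paper-b": "Automated citation linking for eprints archives",
    "paper-c": "Ocean circulation models",
}
INDEX = build_index(DOCS)

if __name__ == "__main__":
    print(related(INDEX, DOCS, "paper-a"))
```

Nothing here depends on anyone having assigned a subject code: the 
connection between paper-a and paper-b comes from the words in the texts 
themselves.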

You ask: Can we rely on web search engines like Google to search deeply or 
accurately enough?

At the moment, simply, yes. It's not the fault of Google that it can't 
index most of the journal literature.

Where I think classification may continue to have a role is in interface 
design - you give examples. Classification can inform browsing. This brings 
us back to services. Services will produce interfaces. In principle, 
repositories do not need to produce user (as opposed to author or 
management) interfaces. In practice few institutional repositories will 
be able to resist doing so, for good reasons, but again, they don't have 
to, and it should be optional and minimal.

When you ask if the 'push' scenario should replace harvesting, that's 
interesting because it is counter to the framework OAI has put in place. 
That framework reduces the burden on data providers at the expense of 
service providers, recognising that we have to make the entry threshold 
for authors and repositories as low as possible. That can make it 
difficult for service providers, see Liu et al.
http://www.dlib.org/dlib/april01/liu/04liu.html
but overall it probably remains the best approach, especially if 
repositories concentrate on optimising the submitted metadata within the 
OAI framework.
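
As a rough illustration of that division of labour, the sketch below shows 
the service-provider side of a harvest in Python. The ListRecords verb and 
the metadataPrefix and set arguments are from OAI-PMH; the base URL, the 
sample response and the function names are assumptions made up for this 
example, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces used by OAI-PMH 2.0 responses and unqualified Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request; the set argument is optional."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, title and any subjects from a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI_NS + "record"):
        records.append({
            "identifier": rec.findtext(
                OAI_NS + "header/" + OAI_NS + "identifier"),
            "title": rec.findtext(".//" + DC_NS + "title"),
            # Subjects may be absent; the service copes with an empty list.
            "subjects": [s.text for s in rec.iter(DC_NS + "subject")],
        })
    return records


# Hypothetical, hand-written response from an imaginary repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc
            xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://repository.example.org/oai"))
    print(parse_records(SAMPLE))
```

The repository only has to expose its records; the record here carries no 
subject at all, and it is the harvesting service that decides what to do 
about that.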

Steve Hitchcock
Open Citation (OpCit) Project <http://opcit.eprints.org/>
IAM Research Group, Department of Electronics and Computer Science
University of Southampton SO17 1BJ,  UK
Email: sh94r@ecs.soton.ac.uk
Tel:  +44 (0)23 8059 3256     Fax: +44 (0)23 8059 2865