[OAI-eprints] Introducing the Subject Categorization discussion
Steve Hitchcock
sh94r@ecs.soton.ac.uk
Wed, 15 Jan 2003 15:06:26 +0000
At 13:31 15/01/03 +0000, Pauline Simpson wrote:
>Following on from the OAI Geneva meeting - to open the discussion please see
>http://tardis.eprints.org/discussion/
Pauline, A thought-provoking page that helpfully outlines all the
issues. A few points below, but first we need to make a distinction between
works where the full text is not available digitally, and those where it
is. So the question of whether classification is needed boils down
to: Yes for the former, and (mostly) No for the latter.
By (mostly) I mean let's make it optional. That means, in the case of
institutional repositories of research papers (the latter category), don't
burden the repository with the need to maintain categorization as a core
task. Leave that to services. If it's worth doing, then people will find
the resources to do it, but it must not compromise the task of
repositories, which is to make the texts available.
If full texts are available, we have the chance to automate search and
indexing, say full-text indexing or citation indexing. This is vastly more
powerful and cost-effective, but we have to recognise it is not the same
thing as classification. Full-text indexing can begin to tell us what a
text is *about*, rather than simply where it is located, the classical
purpose of classification. Through knowing what a text is about, we can
make connections with other works in ways that are much more flexible than
classification offers.
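To make the contrast with classification concrete, here is a minimal sketch
of full-text indexing as an inverted index. It is an illustration only, not
how any particular service works: the function names (tokenize, build_index,
search) and the three sample texts are invented for the example, and a real
indexer would add stemming, ranking and much else.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample texts, standing in for deposited full texts.
papers = {
    "eprint-1": "Citation indexing of open access eprints",
    "eprint-2": "Subject classification schemes for library catalogues",
    "eprint-3": "Automated full-text indexing and citation linking",
}

index = build_index(papers)
matches = search(index, "citation indexing")
```

Nobody has assigned a subject code to any of these texts; the connection
between eprint-1 and eprint-3 falls out of the words they share.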
You ask: Can we rely on web search engines like Google to search deeply or
accurately enough?
At the moment, simply, yes. It's not the fault of Google that it can't
index most of the journal literature.
Where I think classification may continue to have a role is in interface
design - you give examples. Classification can inform browsing. This brings
us back to services. Services will produce interfaces. In principle,
repositories do not need to produce user (as opposed to author or
management) interfaces. In practice few institutional repositories will be
able to resist doing so, for good reasons. But again, they don't have to,
and any such interface should be optional and minimal.
When you ask if the 'push' scenario should replace harvesting, that's
interesting because it runs counter to the framework OAI has put in place.
That framework reduces the burden on data providers at the expense of
service providers, recognising that we have to make the entry threshold for
authors and repositories as low as possible. That can make things difficult
for service providers; see Liu et al.
http://www.dlib.org/dlib/april01/liu/04liu.html
but overall it probably remains the best approach, especially if
repositories concentrate on optimising the submitted metadata within the
OAI framework.
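As a rough sketch of what harvesting asks of each side: the service provider
issues a ListRecords request and does the parsing, while the repository only
has to expose its metadata. The code below builds such a request URL and
reads Dublin Core fields from a response. The base URL, the record and the
function names are made up for illustration, and the sample response is
trimmed (a real one also carries responseDate, request and schema
attributes). Note that dc:subject is treated as optional.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces used by OAI-PMH 2.0 responses carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would request to list a repository's records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Extract identifier, titles and (optional) subjects from a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "subjects": [e.text for e in rec.iterfind(".//dc:subject", NS)],
        })
    return records


# Hypothetical, trimmed response from an imaginary repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://repository.example.org/oai")
records = parse_records(SAMPLE)
```

The record above has a title but no subject, and the harvester copes: a
service that wants categories can add them downstream.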
Steve Hitchcock
Open Citation (OpCit) Project <http://opcit.eprints.org/>
IAM Research Group, Department of Electronics and Computer Science
University of Southampton SO17 1BJ, UK
Email: sh94r@ecs.soton.ac.uk
Tel: +44 (0)23 8059 3256 Fax: +44 (0)23 8059 2865