[OAI-eprints] Introducing the Subject Categorization discussion

Another comment from Chris Gutteridge on Google and harvesting stemming
from our discussion.
There are some problems with Google searches for certain things.
One of them is that the smartest Google can get (currently) is
"it contains the word X", but many words have multiple meanings, or
have different subject-area-specific meanings. One good example is that
"wave" means something utterly different in:
 * physics - energy wave, radiation etc.
 * oceanography - waves on the ocean
 * combat tactics - attack waves
Probably more. This is, I believe, the primary argument for subject
classification, and also, possibly, the ability to browse or get
on items in your field of interest.

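To make the "wave" problem concrete, here is a minimal sketch in Python. The records, titles and subject labels are invented purely for illustration, and the function names are hypothetical. It only shows that a bare keyword match returns hits from every field, while the same match restricted by a subject label does not.

```python
# Hypothetical records: titles and subject labels are invented for illustration.
RECORDS = [
    {"title": "Energy transport in a standing wave", "subject": "physics"},
    {"title": "Wave height measurements in the North Atlantic", "subject": "oceanography"},
    {"title": "Defending against the second attack wave", "subject": "combat tactics"},
    {"title": "Tidal energy and coastal erosion", "subject": "oceanography"},
]


def keyword_search(records, word):
    """Return records whose title contains `word` as a whole word (case-insensitive)."""
    word = word.lower()
    return [r for r in records if word in r["title"].lower().split()]


def subject_search(records, word, subject):
    """Same keyword match, but keep only records carrying the given subject label."""
    return [r for r in keyword_search(records, word) if r["subject"] == subject]


if __name__ == "__main__":
    # The bare keyword match cannot tell the three senses of "wave" apart.
    for record in keyword_search(RECORDS, "wave"):
        print(record["subject"], "-", record["title"])
    # Restricting by subject leaves only the oceanography sense.
    for record in subject_search(RECORDS, "wave", "oceanography"):
        print(record["title"])
```

This assumes someone has already assigned the subject labels, which is exactly the cost under discussion.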
The interesting question is do we expect/need OAI harvesters that
harvest just history, or just art? And if so to what
> Pauline, A thought-provoking page that helpfully outlines all the
> issues. A few points below, but first we need to make a distinction
> between works where the full text is not available digitally, and those
> where it is. So the question whether there is a need for classification
> boils down to: Yes for the former, and (mostly) No for the latter.
>
> By (mostly) I mean let's make it optional. That means, in the case of
> institutional repositories of research papers (the latter category),
> don't burden the repository with the need to maintain categorization as
> a core task. Leave that to services. If it's worth doing, then people
> will find the resources to do it, but it must not compromise the task of
> repositories, which is to make the texts available.
>
> If full texts are available, we have the chance to automate search and
> indexing, say full-text indexing or citation indexing. This is vastly
> more powerful and cost-effective, but we have to recognise it is not the
> same thing as classification. Full text indexing can begin to tell us
> what a text is *about*, rather than simply where it is located, the
> classical purpose of classification. Through knowing what a text is
> about, we can make connections with other works in ways that are much
> more flexible than is offered by classification.
>
> You ask: Can we rely on web search engines like Google to search deeply
> or accurately enough?
>
> At the moment, simply, yes. It's not the fault of Google that it can't
> index most of the journal literature.
>
> Where I think classification may continue to have a role is in interface
> design - you give examples. Classification can inform browsing. This
> brings us back to services. Services will produce interfaces. In
> principle, repositories do not need to produce user (as opposed to
> author or management) interfaces, although in practice there will be few
> institutional repositories that will be able to resist doing so, for
> good reasons, but again, they don't have to, and it should be optional
> and minimal.
>
> When you ask if the 'push' scenario should replace harvesting, that's
> interesting because it is counter to the framework OAI has put in place.
> That is, to reduce the burden on data providers at the expense of
> service providers, recognising that we have to make the entry threshold
> for authors and repositories as low as possible. That can make it
> difficult for service providers, see Liu et al.
> http://www.dlib.org/dlib/april01/liu/04liu.html
> but overall it probably remains the best approach, especially if
> repositories concentrate on optimising the submitted metadata within the
> OAI framework.
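
For readers less familiar with the harvesting side, here is a rough sketch of how a service provider might build an OAI-PMH `ListRecords` request, optionally restricted to one set. The base URL and the set name `history` are made up for illustration, and `list_records_url` is a hypothetical helper. Whether a repository exposes subject-based sets at all is its own choice, so a harvester cannot assume them.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    `set_spec` is optional: it only helps if the repository has chosen
    to organise its records into sets (an assumption, not a given).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical repository address, for illustration only.
    base = "http://repository.example.org/oai"
    print(list_records_url(base))                      # harvest everything
    print(list_records_url(base, set_spec="history"))  # harvest one set only
```

The sketch only builds the request. Fetching, parsing the XML response and following resumption tokens are left out, and that remaining work falls on the service provider, in keeping with the low entry threshold for repositories described above.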
