[OAI-eprints] Introducing the Subject Categorization discussion

Jessie Hey jmnh@ecs.soton.ac.uk
Mon, 20 Jan 2003 16:46:18 +0000


<html>
Another comment from Chris Gutteridge on Google and harvesting stemming
from our discussion.<br>
Jessie<br><br>
<blockquote type=cite class=cite cite>Date: Fri, 17 Jan 2003 15:31:14
+0000<br>
From: ePrints Support &lt;eprints-support@ecs.soton.ac.uk&gt;<br>
To: EPrints Underground List
&lt;eprints-underground@ecs.soton.ac.uk&gt;<br>
Subject: Re: [EP-underground] Re: Interoperability - subject
classification/terminology (fwd)<br>
User-Agent: Mutt/1.2.5.1i<br>
X-ECS-MailScanner: Found to be clean, Found to be clean<br>
Sender: owner-eprints-underground@ecs.soton.ac.uk<br>
Reply-To: EPrints Underground List
&lt;eprints-underground@ecs.soton.ac.uk&gt;<br><br>
There are some problems with Google searches for certain things,
however.<br><br>
One of which is that the smartest Google can get (currently) is
asking<br>
whether a document contains the word &quot;X&quot;, but many words have
multiple meanings, or<br>
have different subject-area-specific meanings. One good example is that
&quot;wave&quot;<br>
means something utterly different in <br>
&nbsp;* physics - energy wave, radiation etc.<br>
&nbsp;* oceanography - waves on the ocean<br>
&nbsp;* combat tactics - attack waves<br>
Probably more. This is, I believe, the primary argument for subject<br>
classification; another, possibly, is the ability to browse or get
updates<br>
on items in your field of interest.<br><br>
The interesting question is do we expect/need OAI harvesters that
can<br>
harvest just history, or just art? And if so to what
granularity?<br><br>
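For what it's worth, OAI-PMH 2.0 already allows selective harvesting
through its optional set argument, so a harvester could in principle ask<br>
for just one subject if the repository exposes its subjects as sets. A
minimal sketch follows; the repository URL and the set names<br>
(history, history:medieval) are hypothetical, and whether a given
archive defines such sets at all is an assumption:<br>
<pre>
```python
from urllib.parse import urlencode


def selective_harvest_url(base_url, set_spec=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # setSpec values are colon-separated paths, so a longer path
        # selects a finer-grained subset of the repository.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository and set names, for illustration only.
BASE = "http://eprints.example.org/perl/oai2"
coarse = selective_harvest_url(BASE, "history")
fine = selective_harvest_url(BASE, "history:medieval")
```
</pre>
The granularity question then becomes how deep a repository's set
hierarchy goes, and whether different repositories agree on it.<br><br>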
On Wed, Jan 15, 2003 at 09:36:45 +0000, Stevan Harnad wrote:<br>
&gt; <br>
&gt; <br>
&gt; ---------- Forwarded message ----------<br>
&gt; Date: Wed, 15 Jan 2003 15:06:26 +0000<br>
&gt; From: Steve Hitchcock &lt;sh94r@ecs.soton.ac.uk&gt;<br>
&gt; To: Pauline Simpson &lt;ps@soc.soton.ac.uk&gt;,
OAI-eprints@fafner.openlib.org<br>
&gt; Subject: Re: Interoperability - subject
classification/terminology<br>
&gt; <br>
&gt; At 13:31 15/01/03 +0000, Pauline Simpson wrote:<br>
&gt; <br>
&gt; &gt;Following on from the OAI Geneva meeting&nbsp; - to open the
discussion&nbsp; please see<br>
&gt;
&gt;<a href="http://tardis.eprints.org/discussion/" eudora="autourl">http://tardis.eprints.org/discussion/</a><br>
&gt; <br>
&gt; Pauline, A thought-provoking page that helpfully outlines all the
<br>
&gt; issues. A few points below, but first we need to make a distinction
between <br>
&gt; works where the full text is not available digitally, and those
where it <br>
&gt; is. So the question whether there is a need for classification boils
down <br>
&gt; to: Yes for the former, and (mostly) No for the latter.<br>
&gt; <br>
&gt; By (mostly) I mean let's make it optional. That means, in the case
of <br>
&gt; institutional repositories of research papers (the latter category),
don't <br>
&gt; burden the repository with the need to maintain categorization as a
core <br>
&gt; task. Leave that to services. If it's worth doing, then people will
find <br>
&gt; the resources to do it, but it must not compromise the task of 
<br>
&gt; repositories, which is to make the texts available.<br>
&gt; <br>
&gt; If full texts are available, we have the chance to automate search
and <br>
&gt; indexing, say full-text indexing or citation indexing. This is
vastly more <br>
&gt; powerful and cost-effective, but we have to recognise it is not the
same <br>
&gt; thing as classification. Full text indexing can begin to tell us
what a <br>
&gt; text is *about*, rather than simply where it is located, the
classical <br>
&gt; purpose of classification. Through knowing what a text is about, we
can <br>
&gt; make connections with other works in ways that are much more
flexible than <br>
&gt; is offered by classification.<br>
&gt; <br>
&gt; You ask: Can we rely on web search engines like Google to search
deeply or <br>
&gt; accurately enough?<br>
&gt; <br>
&gt; At the moment, simply, yes. It's not the fault of Google that it
can't <br>
&gt; index most of the journal literature.<br>
&gt; <br>
&gt; Where I think classification may continue to have a role is in
interface <br>
&gt; design - you give examples. Classification can inform browsing. This
brings <br>
&gt; us back to services. Services will produce interfaces. In principle,
<br>
&gt; repositories do not need to produce user (as opposed to author or
<br>
&gt; management) interfaces, although in practice there will be few 
<br>
&gt; institutional repositories that will be able to resist doing so, for
good <br>
&gt; reasons, but again, they don't have to, and it should be optional
and minimal.<br>
&gt; <br>
&gt; When you ask if the 'push' scenario should replace harvesting,
that's <br>
&gt; interesting because it is counter to the framework OAI has put in
place. <br>
&gt; That is, to reduce the burden on data providers at the expense of
service <br>
&gt; providers, recognising that we have to make the entry threshold for
authors <br>
&gt; and repositories as low as possible. That can make it difficult for
service <br>
&gt; providers, see Liu et al.<br>
&gt;
<a href="http://www.dlib.org/dlib/april01/liu/04liu.html" eudora="autourl">http://www.dlib.org/dlib/april01/liu/04liu.html</a><br>
&gt; but overall it probably remains the best approach, especially if
<br>
&gt; repositories concentrate on optimising the submitted metadata within
the <br>
&gt; OAI framework.<br>
&gt; <br>
&gt; Steve Hitchcock<br>
&gt; Open Citation (OpCit) Project
&lt;<a href="http://opcit.eprints.org/" eudora="autourl">http://opcit.eprints.org/</a>&gt;<br>
&gt; IAM Research Group, Department of Electronics and Computer
Science<br>
&gt; University of Southampton SO17 1BJ,&nbsp; UK<br>
&gt; Email: sh94r@ecs.soton.ac.uk<br>
&gt; Tel:&nbsp; +44 (0)23 8059 3256&nbsp;&nbsp;&nbsp;&nbsp; Fax: +44
(0)23 8059 2865<br>
&gt; <br>
&gt; <br>
-- <br><br>
&nbsp;Christopher
Gutteridge&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
eprints-support@ecs.soton.ac.uk<br>
&nbsp;ePrints2 Coder, Support and
Stuff&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; +44 23 8059
4833</blockquote>
<x-sigsep><p></x-sigsep>
~~~~~~~~<br>
Jessie M.N. Hey&nbsp;&nbsp;&nbsp;&nbsp;  <br>
Research Fellow TARDIS eprints project,<br>
NOL, University of Southampton Waterfront Campus, European Way,<br>
Southampton, SO14 3ZH, England<br>
Tel: +44 (0)23 8059 6112&nbsp; Fax +44 (0)23 8059 6115<br>
&nbsp;<a href="http://tardis.eprints.org/" eudora="autourl"><font color="#0000FF"><u>http://tardis.eprints.org/</u></font></a><br><br>
</html>