[OAI-eprints] Self-Archiving and Journal Subscriptions: Flawed Method and No Data

Sun Nov 12 19:31:29 EST 2006

        ** Apologies for Cross-Posting **

    Self-Archiving and Journal Subscriptions: 
    Critique of Publishing Research Consortium Study

    by Stevan Harnad

The following is a critique of:

    Chris Beckett and Simon Inger, Self-Archiving and Journal
    Subscriptions: Co-existence or Competition? An international Survey
    of Librarians' Preferences. Commissioned by the Publishing Research
    Consortium (http://www.publishingresearch.org.uk) from Scholarly
    Information Strategies Ltd (SIS), a scholarly publishing consultancy
    October 2006 http://www.publishingresearch.org.uk/prcweb/PRCWeb.nsf/

    SUMMARY OF CRITIQUE: There is no evidence to date that Open Access (OA)
    self-archiving causes journal cancellations. The Publishing
    Research Consortium commissioned a study of acquisitions librarian
    preferences to see whether they could predict such cancellations
    in the future using a "Share of Preference model," but the study
    has a glaring methodological flaw that invalidates its conclusion
    (that self-archiving will cause cancellations). The study consisted
    of asking librarians which of three hypothetical products -- A, B
    or C -- they preferred least and most, for a variety of hypothetical
    combinations of 6 properties with 3-4 possible values each:
        1. ACCESS DELAY: 24-months, 12-months, 6-months, immediate access
        2. PERCENTAGE OF JOURNAL'S CONTENT: 100%, 80%, 60%, 40%
        3. COST: 100%, 50%, 25%, 0% 
        4. VERSION: preprint, refereed, refereed+copy-edited, published-PDF
        5. ACCESS RELIABILITY: high, medium, low
        6. JOURNAL QUALITY: high, medium, low
    No mention was made of OA self-archiving (in order to avoid "bias");
    but, as a result, the survey cannot make any prediction at all
    about the effects of self-archiving on cancellations. The questions
    were about relative preferences for *acquisition* among competing
    "products" having different combinations of properties, and the
    survey treated OA (0% cost) as if it were just one of those product
    properties. But
    self-archived articles are not products purchased by acquisitions
    librarians: they are papers given away by researchers, anarchically,
    and in parallel. Hence from the survey's "Share of Preference model"
    it is impossible to draw any conclusions about self-archiving
    causing cancellations by librarians, because the librarians were
    never asked what they would cancel, under what conditions; just what
    hypothetical products they would prefer over what. And of course they
    would prefer lower-priced, immediate products over higher-priced,
    delayed products! But if all articles in all journals were
    self-archived, the "Share of Preference model" does not give us
    the slightest clue about what journals librarians would acquire or
    cancel. Nor does it give us a clue as to what they would do between
    now (c. 15% self-archiving) and then (100% self-archiving). The
    banal fact that everyone would rather have something for free than
    pay for it certainly does not answer this question, or fill
    the gaping evidential gap about the existence, size, or timing of any
    hypothetical effect of self-archiving on cancellations. Nor does
    the study's one nontrivial finding: that librarians don't much
    care about the difference between a refereed author's draft and
    a published-PDF. (Let us hope that this study will be the last futile
    attempt to treat research as if it were done in order to generate or
    protect journal revenues. Even if valid evidence should eventually
    emerge that OA self-archiving does cause journal cancellations,
    it would be for the publishing community to adapt to that new
    reality, not for the research community to abstain from it, and
    its obvious benefits to research, researchers, their institutions,
    their funders, and the tax-paying public that funds the funders and
    for whose benefit the research is conducted.)
    [Scholarly Information Strategies Ltd: http://www.scholinfo.com]

Because there has so far been no detectable correlation between author
self-archiving and journal cancellations, the Publishing Research
Consortium commissioned a survey of acquisition librarians' preferences
and attitudes about a number of hypothetical alternatives. From the
responses a theoretical model was constructed, which predicted
cancellations as more self-archived content becomes available. How did
the study arrive at this prediction without any actual cancellation
data? 

The prediction was based on a rather simple methodological flaw:
Librarians were given a series of hypothetical choices, each a choice
among three hypothetical "products," A, B and C. The librarians were
asked to pick which of the three product options they would prefer most
and least. Each hypothetical product option was a complicated
combination of six properties, each taking one of 3-4 possible values.

Presenting this array of hypothetical product options as choices to
acquisition librarians (apart from being highly complicated and highly
hypothetical, with many hidden assumptions) is specious, for among the
potential properties of the hypothetical "product" options was the
property that some of the options were free.

But a free self-archived journal article is not a product: It is not
something that an acquisitions librarian decides whether or not to
acquire. Open Access (OA) is not a product-*acquisition* issue at all:
At best (or worst) it's a product *cancellation* issue.

Hence the only credible and direct hypothetical question one could
have asked librarians about self-archived journal articles (and even
then there would be no guarantee that librarians would actually do as
they predicted they would do under the hypothetical conditions) would
be about the circumstances under which they think they would *cancel*
existing journals:

    "Would you cancel journal X if 100% of its articles were accessible
    free online (80%? 60%? 40%?)? If they were accessible immediately
    (after 6 months? 12? 24?)?"

And even that question is laden with highly speculative and even
indeterminate assumptions: How could librarians (or anyone) *know* what
percentage of a journal was accessible for free, self-archived, for any
particular journal? 

And what about interactions between journal X and journal Y? (How to
spend a given acquisitions budget -- what to acquire and what to cancel
-- is presumably a comparative decision, and we are asking about the
keep/cancel trade-offs.)

But what if 60% of *all* journals were free online (immediately? after
12 months?)?  (Acquisition/cancellation decisions today are largely
competitive ones: X gets cancelled in favour of Y. The rules of this
trade-off game would presumably change if all journals were roughly on
a par for their percentage of freely available online content or the
length of the delay before it is freely available.)

Straightforward questions on what a librarian predicts they would cancel
(in favour of what) under what hypothetical conditions (and how those
conditions could be ascertained) might possibly have some weak
predictive value. But such straightforward questions are not what
this series of questions about preferences among hypothetical "product
options" asked.

[Even straightforward hypothetical answers to straightforward hypothetical
questions may not have any predictive value if the hypotheses are
far-fetched or unfamiliar enough, or if they have hidden or incoherent
assumptions: I frankly don't believe there is a librarian alive who has
a clue as to what they would keep or cancel if the self-archived versions
of all journal articles were suddenly available free online today -- let
alone what they would do as all journal contents gradually approached
100% availability, at various (uncertain) speeds, from a trajectory
of increasing (but uncertain) free content (40% to 60% to 80%) and/or
decreasing delay (24 months to 12 months to 6 months).]

And that's without mentioning intangibles such as any continuing demand
for the paper edition, etc., nor how librarians could know the
percentages available, how quickly the percentages would grow, and at
what relative rate they would grow among more and less important
journals, more and less expensive journals.

But it was not even these straightforward, if highly speculative,
questions that were asked of librarians in this survey. Instead, they were
asked to pick the most and least favoured option among three hypothetical
"products," A, B and C, with a variety of complicated combinations of
6 hypothetical properties, which could each take 3-4 values:

    1. ACCESS DELAY: 24-months, 12-months, 6-months, immediate access
    2. PERCENTAGE OF JOURNAL'S CONTENT: 100%, 80%, 60%, 40%
    3. COST: 100%, 50%, 25%, 0% 
    4. VERSION: preprint, refereed, refereed+copy-edited, published-PDF
    5. ACCESS RELIABILITY: high, medium, low
    6. JOURNAL QUALITY: high, medium, low

In each case, products A, B and C were given some combination of the
values on properties 1-6, and the librarian had to choose which of the 3
combinations they most and least preferred.
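
To give a rough sense of how large this hypothetical space is, here is a
minimal Python sketch. It is not taken from the study: it assumes that
every combination of the listed values is admissible and that any three
distinct combinations could appear together as products A, B and C. The
survey's actual design may well have used only a subset. The names
PROPERTIES and all_profiles are illustrative only.

```python
from itertools import product
from math import comb

# The six properties and their values, as listed in the survey.
PROPERTIES = {
    "access_delay": ["24-months", "12-months", "6-months", "immediate"],
    "content_percentage": ["100%", "80%", "60%", "40%"],
    "cost": ["100%", "50%", "25%", "0%"],
    "version": ["preprint", "refereed", "refereed+copy-edited",
                "published-PDF"],
    "access_reliability": ["high", "medium", "low"],
    "journal_quality": ["high", "medium", "low"],
}


def all_profiles(properties):
    """Enumerate every hypothetical product: one value per property.

    Assumption: no combination is excluded.
    """
    names = list(properties)
    return [dict(zip(names, values))
            for values in product(*properties.values())]


profiles = all_profiles(PROPERTIES)
n_profiles = len(profiles)        # 4 * 4 * 4 * 4 * 3 * 3
n_triples = comb(n_profiles, 3)   # unordered sets of three distinct products

print(f"distinct hypothetical products: {n_profiles}")
print(f"possible A/B/C choice sets:     {n_triples}")
```

Under these assumptions there are 4 x 4 x 4 x 4 x 3 x 3 = 2304 distinct
hypothetical products, and far more possible three-way choice sets, so
any one respondent can have seen only a tiny fraction of them.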