Title: The Emerging Framework for Scholarly Communication
1. The Emerging Framework for Scholarly Communication
- Steve Hitchcock
- The Open Citation Project (OpCit), Southampton University
- These slides were prepared for The Future of Journal Publishing, at Nottingham University, 22 March 2002
- OpCit is a joint JISC-NSF International Digital Libraries Project 1999-2002
2. Emerging framework: the hypothesis
- Scholarly electronic information will be seamless and integrated
3. Scholarly electronic information will be seamless and integrated
- The provable truth, using Google
- "seamless integration of information": 500 results, mostly companies offering network and inter-application software
- "seamless access to information": almost 1000 sites, portals and gateways to the fore
- "seamless linking": 450 sites, leading with journal publishers and databases
- Results based on Google searches, November 2001
4. What is seamless integration?
- From any given document the user might expect to be able to retrieve any related document within one mouse click.
- Typically what is related is defined, and linked, by the author or publisher or other service provider, and is constrained by the tools and information services at their disposal.
- Longer term the relation may be anything the user might consider to be related.
5. Achieving seamless integration: Web services
- Emerging Web services standards are motivated by the need to connect business processes, especially databases, across the Web. The basic platform for Web services is XML plus HTTP, maintaining the ubiquity and simplicity of the Web. Web services are based on three mechanisms:
- to describe a service (e.g. Web Services Description Language, WSDL)
- to find a service (e.g. a registry such as Universal Description, Discovery, and Integration, UDDI)
- to communicate (e.g. Simple Object Access Protocol, SOAP)
- http://www.w3.org/2002/ws/
- Digital library architectures are evolving to include Web services-like components, and may ultimately migrate to these emerging standards
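The "communicate" mechanism can be illustrated with a minimal sketch of a SOAP message built as XML, ready to be carried over HTTP. It assumes the SOAP 1.1 envelope namespace. The operation name `findRelatedDocuments`, the namespace `http://example.org/doclinks` and the DOI value are hypothetical and stand in for whatever a real service would publish in its WSDL description.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/doclinks"  # hypothetical service namespace


def build_soap_request(doi: str) -> bytes:
    """Wrap a hypothetical 'findRelatedDocuments' call in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}findRelatedDocuments")
    ET.SubElement(call, f"{{{SERVICE_NS}}}doi").text = doi
    return ET.tostring(envelope, encoding="utf-8")


request_xml = build_soap_request("10.1000/example")  # illustrative DOI only
root = ET.fromstring(request_xml)  # parse it back to confirm it is well-formed
print(root.tag)
```

In a deployed service the resulting bytes would be the body of an HTTP POST to the endpoint named in the service description.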
6. Is seamless integration possible for the refereed scholarly literature?
- For scholarly research papers (those destined for peer-reviewed journal publication, by authors who have no intention of receiving direct payment for publication of the work they produce) this prospect raises two subsidiary questions about the seamlessly integrated literature:
- Will it be complete (from the viewpoint of every user)?
- Will it be free (or appear to be free)? A work may appear to be free to the user when it is accessed via a library, for example.
- The refereed scholarly literature will need to be complete, everywhere, if seamless integration, even on a modest scale, is to be achieved.
7. Progress in libraries
- Site licences for electronic journals, and more aggregated content from database services
- Alternative journals, e.g. support for the Scholarly Publishing and Academic Resources Coalition (SPARC), to increase competition in the journal market by facilitating partnerships with publishers and other journal producers
- Open Archives Initiative, interoperability standards to facilitate the efficient dissemination of content
- Fast-track standardization of OpenURL, to link users to these subscription and document services, recognising that this vast new array of electronic content would need to be accessible and navigable by users within the library's information environment
8. Site licences
- By licensing access to bundled collections of e-journals, libraries can claim to have satisfied their objective of better value for money in terms of cost per page delivered to users.
- The site from which users access content could be an institution, a state-wide group of institutions (e.g. OhioLINK), a national collective, such as in Canada, or even all the people of a nation, as in Iceland. The UK has the National Electronic Site Licence Initiative (NESLI), which brokers deals between publishers and participating institutions.
- The OhioLINK strategy: enablers rather than gatekeepers
- OhioLINK claims to have "overcome the library-imposed, self-limiting, collection development mentality of information rationing that pervades our community." (Thomas Sanville, Executive Director, OhioLINK)
9. Making appropriate connections
- Site licences give libraries access to more journal titles. Another outcome of the serials crisis is that fewer non-core journals are subscribed to, and libraries have resorted to just-in-time document delivery and collections from licensed full-text aggregators.
- Library users may thus have authority to access a paper free of charge via one library subscription or another. This has become known as the appropriate copy problem.
- OpenURL is a generalized framework for communicating and resolving links and supports software solutions to the appropriate copy problem. OpenURL is described as an interoperability specification.
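The appropriate copy problem can be sketched as a lookup: given one cited paper held by several providers, return the copy that the user's own library has licensed. This is a toy illustration only, not how any particular link resolver works. All provider names, institutions, URLs and the DOI below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Copy:
    provider: str  # who hosts this copy (hypothetical provider names below)
    url: str       # where the full text lives


# Hypothetical data: every known copy of one paper, keyed by a citation identifier.
COPIES: Dict[str, List[Copy]] = {
    "doi:10.1000/example": [
        Copy("publisher", "http://publisher.example.org/paper"),
        Copy("aggregator", "http://aggregator.example.org/paper"),
    ]
}

# Hypothetical licences: which providers each institution has paid for.
LICENCES: Dict[str, List[str]] = {
    "library-a": ["aggregator"],
    "library-b": ["publisher", "aggregator"],
}


def appropriate_copy(citation_id: str, institution: str) -> Optional[Copy]:
    """Return the first copy the institution is licensed to read, if any."""
    licensed = LICENCES.get(institution, [])
    for copy in COPIES.get(citation_id, []):
        if copy.provider in licensed:
            return copy
    return None


print(appropriate_copy("doi:10.1000/example", "library-a"))
```

The same citation resolves to different copies for different institutions, which is why the link has to carry information about the user's context as well as the referenced object.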
10. Syntax of OpenURL
- http://(who you are, where you are, your institution)/(where you want to go), with the three parts labelled A, B and C
- A: An OpenURL is mediated by the HTTP protocol
- B: BASEURL, data about the user, typically inserted during transport between servers. One interim mechanism is to store the BASEURL as a cookie in the user's browser. The cookie identifies the resolver that provides context-sensitive services for the user.
- C: QUERY, points to the referenced object, which might be
- an identifier, e.g. Digital Object Identifier (DOI)
- metadata derived from an authored reference
- partial metadata, from which a secondary service identifies the required document
- OpenURL has been proposed as a National Information Standards Organization (NISO) standard http://library.caltech.edu/openurl/
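The BASEURL plus QUERY structure can be sketched in a few lines. This is a hedged illustration rather than the specification: the resolver address is hypothetical, the two parts are joined as an ordinary HTTP query string (an assumption), and the key names (`id`, `genre`, `issn`, `date`, `volume`, `spage`) are modelled on the draft OpenURL syntax and should be checked against the standard. The DOI and ISSN values are invented for the example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical resolver address; in practice the BASEURL is specific to the user's institution.
BASE_URL = "http://resolver.example.edu/menu"


def make_openurl(base_url: str, description: dict) -> str:
    """Join a resolver BASEURL and a QUERY describing the referenced object."""
    return f"{base_url}?{urlencode(description)}"


# QUERY expressed as an identifier (illustrative DOI).
by_identifier = make_openurl(BASE_URL, {"id": "doi:10.1000/example"})

# QUERY expressed as metadata taken from an authored reference (illustrative values).
by_metadata = make_openurl(
    BASE_URL,
    {"genre": "article", "issn": "1234-5678", "date": "2001", "volume": "7", "spage": "1"},
)

print(by_identifier)
print(by_metadata)
```

Because only the BASEURL changes from one institution to another, the same QUERY can be delivered to different resolvers, each of which decides which services to offer its own users.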
11. Example OpenURL architecture
- OpenURLs might be based on CrossRef DOI services
- (from Beit-Arie et al., 2001, D-Lib Magazine, September) http://www.dlib.org/dlib/september01/caplan/09caplan.html
12. The Open Archives Initiative (OAI)
- The OAI (http://www.openarchives.org/) defines
- A Metadata Harvesting Protocol (MHP), an application-independent interoperability framework that can be used by a variety of communities engaged in publishing content on the Web
- Two classes of participants
- Data providers expose metadata about content
- Service providers issue protocol requests to data providers
- OAI is a very simple, low-barrier-to-entry interface, shifting implementation complexity and operational processing load away from the data repositories to the developers of federated search services, repository redistribution services, etc.
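The division of labour between data providers and service providers can be sketched as follows. The service provider builds a simple HTTP request and does the work of parsing what comes back. This is a minimal sketch under stated assumptions: the archive address is hypothetical, the XML namespace is that of the later OAI-PMH 2.0 release, and the response is a hand-written, simplified stand-in rather than real harvested data.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"  # assumes the OAI-PMH 2.0 namespace
DC_NS = "http://purl.org/dc/elements/1.1/"       # Dublin Core element namespace

# Hypothetical data provider address.
ARCHIVE_BASE_URL = "http://archive.example.org/oai"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build the request a service provider would issue to a data provider."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hand-written, simplified stand-in for a data provider's response (not real data).
SAMPLE_RESPONSE = f"""
<OAI-PMH xmlns="{OAI_NS}">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.org:1</identifier></header>
      <metadata><dc:title xmlns:dc="{DC_NS}">An example eprint</dc:title></metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def harvest_titles(response_xml: str) -> dict:
    """Map each record identifier to its Dublin Core title."""
    root = ET.fromstring(response_xml)
    titles = {}
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        titles[identifier] = record.findtext(f".//{{{DC_NS}}}title")
    return titles


print(list_records_url(ARCHIVE_BASE_URL))
print(harvest_titles(SAMPLE_RESPONSE))
```

The data provider only has to answer a plain HTTP request with XML. Aggregating, indexing and searching the harvested records is left to the service provider, which is the low-barrier design the slide describes.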
13. OAI service providers: an example
- The Open Citation project: interposing an OAI service provider between document (eprints) source and user interface
14. Creating information interfaces: portals
- We have to manage the underlying complexity in the form of interfaces. Portals have become important interfaces in the scholarly environment. Portal strategies
- by publishers (e.g. Elsevier's ScienceDirect),
- by associated networked information services (e.g. Ingenta),
- by library resource discovery networks (e.g. JISC's RDN)
- have yet to establish a pre-eminent model. This is because all have concentrated on content, mostly owned content. The best next-generation portals will build services on top of content, and for researchers will become the starting point for all lines of enquiry.
15. Information interfaces: RDN example
- JISC RDN is a good example of building on content to provide new services and adaptable interfaces. The individual subject networks, in medicine, engineering, humanities and others, can be searched as though they were one unified repository, and an interface presenting users with this search facility can be embedded in any library Web page.
- Guiding the implementation of these services is the JISC Information Environment (from Powell and Lyon 2001) http://www.ukoln.ac.uk/distributed-systems/dner/arch/dner-arch.html
16. Multiple cooperating services in the communication chain
- FROM: documents (server) delivered over http to the user interface (client)
- TO: OpenURL, OAI, JISC IE
- MEDIATING CONTENT: site licences, eprint archives, etc.
17. Access and interfaces: implications for journals
- Digital information, rich in media and resources, formal and informal, mediated by multiple services, presents the user with an array of choices that might answer his or her queries most efficiently.
- Those queries might be expressed as input to a search engine, or by selecting a link. Where might these citations come from? Personal emails, discussion lists, open access services such as OAI, eprint archives, newsletters, library services, Z-gateways and academic subject portals, as well as formal research papers and commercial indexing services. There will be many more.
- The journal package has traditionally been bound in issues and volumes. With the advent of multiple networked sources mediated by services such as OpenURL, the binding has been unstitched.
18. What are digital journals for?
- Journals will be scaled back to the single essential function of quality control, in the form of managed peer review
- Access to journal contents will be mediated by multiple interfaces (open access services, portals and information interfaces) other than just the journal.
- Journals cannot remain the exclusive provider of peer-reviewed papers
19. A post-Google information environment
- Electronic journals exist in a post-Gutenberg and a post-Google information environment
- By March 2001 the Internet Archive had stored 10 billion Web pages (100 terabytes of data)
- The ability to locate a specified item of information precisely and instantly among the mass of information available on the Web has profound implications. In the electronic environment the search engine has become the de facto interface to information, rather than the fragmented packages that have migrated from the print world.
20. Building eprint archives
- EPrints.org software for building institutional eprint archives for author self-archiving
- Version 2.0, February 2002
- OAI-compliant
- Free open source software
- Developed at the Electronics and Computer Science Department, University of Southampton
- http://www.eprints.org/
21. A maximising strategy for authors
- Authors who self-archive their papers in OAI-compliant institutional or discipline-based eprint archives will
- Maximise interfaces to their work
- Maximise access to their work
- Maximise impact of their work
22. Maximising access: arXiv example
- Decreasing citation latencies: the latency of the citation peak has been reducing over the period of the archive, i.e. each year papers are cited sooner and more often
- Mining the Social Life of an Eprint Archive http://opcit.eprints.org/tdb198/opcit/
23. Maximising impact: arXiv example
- More highly cited papers show higher and more sustained download frequencies
- Mining the Social Life of an Eprint Archive http://opcit.eprints.org/tdb198/opcit/
24. Maximising interfaces
- Measuring arXiv access and impact data: the Open Citation project has mined
- Usage data from selected arXiv mirror server logs
- Reference lists from 155,000 arXiv papers to build CiteBase, an open citation database
- CiteBase, a new interface to the refereed literature http://citebase.eprints.org
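Turning reference lists into a citation database amounts to inverting them: each paper's list of references becomes, for every cited paper, a list of the papers that cite it. The sketch below illustrates that idea on hypothetical paper identifiers. It is not a description of how CiteBase itself is implemented.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical reference lists: citing paper -> papers it cites.
REFERENCE_LISTS: Dict[str, List[str]] = {
    "paper-A": ["paper-B", "paper-C"],
    "paper-B": ["paper-C"],
    "paper-D": ["paper-C", "paper-B", "paper-B"],  # duplicate reference on purpose
}


def build_citation_index(reference_lists: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert reference lists so each cited paper maps to the papers citing it."""
    cited_by: Dict[str, Set[str]] = defaultdict(set)
    for citing, references in reference_lists.items():
        for cited in references:
            cited_by[cited].add(citing)  # a set ignores repeated references
    return dict(cited_by)


index = build_citation_index(REFERENCE_LISTS)
# Rank papers by how many distinct papers cite them, breaking ties by name.
ranking = sorted(index, key=lambda paper: (-len(index[paper]), paper))
print(ranking)
```

Once the index exists, citation counts can be set alongside usage data from server logs, which is the kind of access and impact comparison the slide refers to.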
25. Initiatives promoting open access to scholarly research papers
- Budapest Open Access Initiative (BOAI), funded by George Soros' Open Society Institute. Open access "gives readers extraordinary power to find and make use of relevant literature, and gives authors and their works vast and measurable new visibility, readership, and impact." Launched February 2002, it has received almost 1800 signatories to date
- http://www.soros.org/openaccess/read.shtml
- Public Library of Science: scientists urge publishers to allow the research reports that have appeared in their journals to be distributed freely by independent, online public libraries of science. Open letter, March 2001, received almost 30 000 signatories
- http://www.publiclibraryofscience.org/
26. A dynamic digital archive
- Scientists and researchers, Nobel Laureates among them, have produced the clearest declaration of their requirement for access to published research papers: a comprehensive collection that can be efficiently indexed, searched, and linked
- Unimpeded access to these archives and open distribution of their contents will enable researchers to take on the challenge of integrating and interconnecting the fantastically rich, but extremely fragmented and chaotic, scientific literature.
- Roberts et al. (2001) Science, 23 March 2001 http://www.sciencemag.org/cgi/content/full/291/5512/2318a
27. Credits
- The Open Citation project is a collaboration between Southampton University, Cornell University and arXiv
- The project is led by Stevan Harnad and Carl Lagoze
- Technical development at Southampton is directed by Les Carr
- EPrints.org software is being developed by Chris Gutteridge
- CiteBase is produced and managed by Tim Brody
- A copy of these slides can be found on the OpCit Web site, http://opcit.eprints.org/. Look for Papers and Presentations
- Contact: Steve Hitchcock sh94r_at_ecs.soton.ac.uk