The Emerging Framework for Scholarly Communication

1
The Emerging Framework for Scholarly Communication
  • Steve Hitchcock
  • The Open Citation Project (OpCit), Southampton
    University
  • These slides were prepared for The Future of
    Journal Publishing, at Nottingham University,
    22 March 2002
  • OpCit is a joint JISC-NSF International Digital
    Libraries Project, 1999-2002

2
Emerging framework: the hypothesis. Scholarly
electronic information will be seamless and
integrated

3
Scholarly electronic information will be
seamless and integrated
  • The provable truth, using Google:
  • "seamless integration of information": 500
    results, mostly companies offering network and
    inter-application software
  • "seamless access to information": almost 1000
    sites, with portals and gateways to the fore
  • "seamless linking": 450 sites, leading with
    journal publishers and databases
  • Results based on Google searches, November 2001

4
What is seamless integration?
  • From any given document the user might expect to
    be able to retrieve any related document within
    one mouse click.
  • Typically what is related is defined, and linked,
    by the author or publisher or other service
    provider, and is constrained by the tools and
    information services at their disposal.
  • Longer term the relation may be anything the user
    might consider to be related.

5
Achieving seamless integration: Web services
  • Emerging Web services standards are motivated by
    the need to connect business processes,
    especially databases, across the Web. The basic
    platform for Web services is XML plus HTTP,
    maintaining the ubiquity and simplicity of the
    Web. Web services are based on three mechanisms:
  • to describe a service (e.g. Web Services
    Description Language, WSDL)
  • to find a service (e.g. a registry such as
    Universal Description, Discovery, and
    Integration, UDDI)
  • to communicate (e.g. Simple Object Access
    Protocol, SOAP)
  • http://www.w3.org/2002/ws/
  • Digital library architectures are evolving to
    include Web services-like components, and may
    ultimately migrate to these emerging standards

6
Is seamless integration possible for the refereed
scholarly literature?
  • For scholarly research papers - those destined
    for peer reviewed journal publication, by authors
    who have no intention of receiving direct payment
    for the work they produce - this prospect raises
    two subsidiary questions about the seamlessly
    integrated literature:
  • Will it be complete (from the viewpoint of every
    user)?
  • Will it be free (or appear to be free)? A work
    may appear to be free to the user when it is
    accessed via a library, for example.
  • The refereed scholarly literature will need to be
    complete, everywhere, if seamless integration,
    even on a modest scale, is to be achieved.

7
Progress in libraries
  • Site licenses for electronic journals, and more
    aggregated content from database services
  • Alternative journals, e.g. support for the
    Scholarly Publishing Academic Resources
    Coalition (SPARC), to increase competition in the
    journal market by facilitating partnerships with
    publishers and other journal producers
  • Open Archives Initiative, interoperability
    standards to facilitate the efficient
    dissemination of content
  • Fast-track standardization of OpenURL, to link
    users to these subscription and document
    services, recognising that this vast new array of
    electronic content needs to be accessible and
    navigable by users within the library's
    information environment

8
Site licences
  • By licensing access to bundled collections of
    e-journals, libraries can claim to have satisfied
    their objective of better value for money in
    terms of cost per page delivered to users.
  • The site from which users access content could
    be an institution, a state-wide group of
    institutions (e.g. OhioLINK), a national
    collective, such as in Canada, or even all the
    people of a nation, as in Iceland. The UK has the
    National Electronic Site Licence Initiative
    (NESLI), which brokers deals between publishers
    and participating institutions.
  • The OhioLINK strategy: enablers rather than
    gatekeepers
  • OhioLINK claims to have overcome "the
    library-imposed, self-limiting, collection
    development mentality of information rationing
    that pervades our community" (Thomas Sanville,
    Executive Director, OhioLINK)

9
Making appropriate connections
  • Site licenses give libraries access to more
    journal titles. Another outcome of the serials
    crisis is that fewer non-core journals are
    subscribed to, and libraries have resorted to
    just-in-time document delivery and collections
    from licensed full-text aggregators.
  • Library users may thus have authority to access a
    paper free of charge via one library subscription
    or another. This has become known as the
    appropriate copy problem.
  • OpenURL is a generalized framework for
    communicating and resolving links and supports
    software solutions to the appropriate copy
    problem. OpenURL is described as an
    interoperability specification.
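
The appropriate copy problem can be pictured as a lookup from "who the user is" to "which licensed copy they may use". The sketch below is a toy illustration under that assumption; the library names, journal title and service addresses are all made up, and real resolvers are considerably more involved.

```python
# Hypothetical holdings: the service each library licenses a journal from.
HOLDINGS = {
    "library-a": {"Journal of Examples": "http://publisher.example.com/joe"},
    "library-b": {"Journal of Examples": "http://aggregator.example.net/joe"},
}

# Hypothetical fallback when the library holds no licensed copy.
DEFAULT_SERVICE = "http://docdelivery.example.org/request"


def appropriate_copy(library, journal):
    """Return the copy this library's users are entitled to, else a fallback."""
    return HOLDINGS.get(library, {}).get(journal, DEFAULT_SERVICE)


# The same reference resolves differently depending on the user's library.
copy_for_a = appropriate_copy("library-a", "Journal of Examples")
copy_for_b = appropriate_copy("library-b", "Journal of Examples")
copy_for_c = appropriate_copy("library-c", "Journal of Examples")
```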

10
Syntax of OpenURL
  • http://(who you are, where you are, your
    institution)/(where you want to go)
  • The parts are labelled A (http), B (BASEURL) and
    C (QUERY)
  • An OpenURL is mediated by the HTTP protocol
  • BASEURL: data about the user, typically inserted
    during transport between servers. One interim
    mechanism is to store the BASEURL as a cookie in
    the user's browser. The cookie identifies the
    resolver that provides context-sensitive services
    for the user.
  • QUERY: points to the referenced object, and might
    be:
  • An identifier, e.g. Digital Object Identifier
    (DOI)
  • Metadata derived from an authored reference
  • Partial metadata - a secondary service identifies
    the required document
  • OpenURL has been proposed as a National
    Information Standards Organization (NISO)
    standard: http://library.caltech.edu/openurl/
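
A minimal sketch of the BASEURL plus QUERY idea, assuming the two parts are joined as an ordinary HTTP query string. The resolver address, the DOI and the ISSN are invented for illustration, and the key names (id, genre, issn, volume, spage) should be read as assumptions rather than as a definitive statement of the proposed standard.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_openurl(base_url, query):
    """Join a resolver BASEURL and a QUERY describing the referenced object."""
    return base_url + "?" + urlencode(query)


# Hypothetical resolver for the user's institution (the BASEURL).
BASE_URL = "http://resolver.example.edu/menu"

# QUERY expressed as an identifier (a made-up DOI).
by_identifier = build_openurl(BASE_URL, {"id": "doi:10.9999/example.123"})

# QUERY expressed as metadata derived from an authored reference (made up).
by_metadata = build_openurl(
    BASE_URL,
    {"genre": "article", "issn": "1234-5678", "volume": "12", "spage": "34"},
)

# A resolver would decode the QUERY again before choosing a target service.
decoded = parse_qs(urlsplit(by_metadata).query)
```

Because the BASEURL is supplied per user (for example from a cookie), the same QUERY can be sent to different resolvers for users at different institutions.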

11
Example OpenURL architecture
  • OpenURLs might be based on CrossRef/DOI services
  • (from Beit-Arie et al., 2001, D-Lib Magazine,
    September)
    http://www.dlib.org/dlib/september01/caplan/09caplan.html

12
The Open Archives Initiative (OAI)
  • The OAI (http://www.openarchives.org/) defines:
  • A Metadata Harvesting Protocol (MHP), an
    application-independent interoperability
    framework that can be used by a variety of
    communities engaged in publishing content on the
    Web
  • Two classes of participants:
  • Data providers expose metadata about content
  • Service providers issue protocol requests to data
    providers
  • OAI is a very simple, low-barrier-to-entry
    interface, shifting implementation complexity and
    operational processing load away from the data
    repositories to the developers of federated
    search services, repository redistribution
    services, etc.
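
The division of labour between data providers and service providers can be sketched as follows: the service provider builds a harvesting request and extracts metadata from the reply. The archive address is hypothetical, and the sample reply is a deliberately simplified stand-in that carries Dublin Core titles; it is not the actual response format defined by the protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Dublin Core element namespace.
DC = "{http://purl.org/dc/elements/1.1/}"


def harvest_request(base_url, verb="ListRecords", **args):
    """Build the URL a service provider would request from a data provider."""
    return base_url + "?" + urlencode({"verb": verb, **args})


def titles(response_xml):
    """Collect every Dublin Core title found in a harvested reply."""
    root = ET.fromstring(response_xml)
    return [element.text for element in root.iter(DC + "title")]


# Hypothetical data provider address.
request_url = harvest_request(
    "http://archive.example.org/oai", metadataPrefix="oai_dc"
)

# Simplified stand-in for a reply (assumption: not the real schema).
SAMPLE_REPLY = """<response xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record><dc:title>An example eprint</dc:title></record>
  <record><dc:title>Another example eprint</dc:title></record>
</response>"""

harvested_titles = titles(SAMPLE_REPLY)
```

The data provider only has to answer simple HTTP requests; indexing, searching and presentation stay with the service provider, which is the shift in complexity described above.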

13
OAI service providers: an example
  • The Open Citation project: interposing an OAI
    service provider between document (eprints)
    source and user interface

14
Creating information interfaces: portals
  • We have to manage the underlying complexity in
    the form of interfaces. Portals have become
    important interfaces in the scholarly
    environment. Portal strategies
  • by publishers (e.g. Elsevier's ScienceDirect),
  • by associated networked information services
    (e.g. Ingenta),
  • by library resource discovery networks (e.g.
    JISC's RDN)
  • have yet to establish a pre-eminent model. This
    is because all have concentrated on content,
    mostly owned content. The best next-generation
    portals will build services on top of content,
    and for researchers will become the starting
    point for all lines of enquiry.

15
Information interfaces: RDN example
  • JISC RDN is a good example of building on content
    to provide new services and adaptable interfaces.
    The individual subject networks, in medicine,
    engineering, humanities and others, can be
    searched as though they were one unified
    repository, and an interface presenting users
    with this search facility can be embedded in any
    library Web page.

  • Guiding the implementation of these services is
    the JISC Information Environment (from Powell and
    Lyon 2001)
    http://www.ukoln.ac.uk/distributed-systems/dner/arch/dner-arch.html

16
Multiple cooperating services in the
communication chain
  • FROM: Documents (Server) to User interface
    (Client), via http
  • TO: OpenURL, OAI, JISC IE
  • MEDIATING CONTENT: Site licenses, eprint
    archives, etc.

17
Access and interfaces: implications for journals
  • Digital information, rich in media and resources,
    formal and informal, mediated by multiple
    services, presents the user with an array of
    choices that might answer his or her queries most
    efficiently.
  • Those queries might be expressed as input to a
    search engine, or by selecting a link. Where
    might these citations come from? Personal emails,
    discussion lists, open access services such as
    OAI, eprint archives, newsletters, library
    services, Z-gateways and academic subject
    portals, as well as formal research papers and
    commercial indexing services. There will be many
    more.
  • The journal package has traditionally been bound
    in issues and volumes. With the advent of
    multiple networked sources mediated by services
    such as OpenURL, the binding has been unstitched.

18
What are digital journals for?
  • Journals will be scaled back to the single
    essential function of quality control, in the
    form of managed peer review
  • Access to journal contents will be mediated by
    multiple interfaces - open access services,
    portals and information interfaces, other than
    just the journal.
  • Journals cannot remain the exclusive provider of
    peer-reviewed papers

19
A post-Google information environment
  • Electronic journals exist in a post-Gutenberg and
    a post-Google information environment
  • By March 2001 the Internet Archive had stored 10
    billion Web pages (100 terabytes of data)
  • The ability to locate a specified item of
    information precisely and instantly among the
    mass of information available on the Web has
    profound implications. In the electronic
    environment the search engine has become the de
    facto interface to information, rather than the
    fragmented packages that have migrated from the
    print world.

20
Building eprint archives
  • EPrints.org software for building institutional
    eprint archives for author self-archiving
  • Version 2.0, February 2002
  • OAI-compliant
  • Free open source software
  • Developed at the Electronics and Computer Science
    Department, University of Southampton
  • http//www.eprints.org/

21
A maximising strategy for authors
  • Authors who self-archive their papers in
    OAI-compliant institutional or discipline-based
    eprint archives will:
  • Maximise interfaces to their work
  • Maximise access to their work
  • Maximise impact of their work

22
Maximising access: arXiv example
  • Decreasing citation latencies: the latency of the
    citation peak has been reducing over the period
    of the archive, i.e. each year papers are cited
    sooner and more often
  • Mining the Social Life of an Eprint Archive
    http://opcit.eprints.org/tdb198/opcit/
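
One way to read "latency of the citation peak" is as the delay, after deposit, at which a paper collects the most citations. The sketch below computes that under this assumed definition, using made-up month numbers; it is not the method or the data of the study cited above.

```python
from collections import Counter


def peak_latency(deposit_month, citing_months):
    """Return the delay (in months) at which citations are most frequent.

    Ties are resolved in favour of the shorter delay.
    """
    delays = Counter(month - deposit_month for month in citing_months)
    return max(sorted(delays), key=delays.get)


# Made-up data: a paper deposited in month 0 and the months it was cited.
example_peak = peak_latency(0, [2, 3, 3, 4, 3, 10])
```

Comparing this figure across deposit years would show whether the peak is arriving sooner over the life of the archive.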

23
Maximising impact: arXiv example
  • More highly cited papers show higher and more
    sustained download frequencies
  • Mining the Social Life of an Eprint Archive
    http://opcit.eprints.org/tdb198/opcit/

24
Maximising interfaces
  • Measuring arXiv access and impact data: the Open
    Citation project has mined
  • Usage data from selected arXiv mirror server
    logs
  • Reference lists from 155,000 arXiv papers to
    build CiteBase, an open citation database
  • CiteBase, a new interface to the refereed
    literature: http://citebase.eprints.org
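
The step from reference lists to a citation database can be sketched as inverting a mapping: each paper's list of references becomes, for every referenced paper, a set of papers that cite it. The identifiers below are invented, and this is an illustration of the general idea rather than CiteBase's actual implementation.

```python
from collections import defaultdict


def build_citation_index(reference_lists):
    """Invert paper -> references into paper -> set of citing papers."""
    cited_by = defaultdict(set)
    for citing_paper, references in reference_lists.items():
        for cited_paper in references:
            cited_by[cited_paper].add(citing_paper)
    return cited_by


# Made-up reference lists keyed by made-up paper identifiers.
REFERENCE_LISTS = {
    "paper-A": ["paper-C", "paper-D"],
    "paper-B": ["paper-C"],
}

citation_index = build_citation_index(REFERENCE_LISTS)

# Citation counts, the basis of an impact ranking, fall out directly.
citation_counts = {paper: len(citers) for paper, citers in citation_index.items()}
```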

25
Initiatives promoting open access to scholarly
research papers
  • Budapest Open Access Initiative (BOAI), funded
    by George Soros' Open Society Institute. Open
    access "gives readers extraordinary power to find
    and make use of relevant literature, and gives
    authors and their works vast and measurable new
    visibility, readership, and impact." Launched
    February 2002, it has received almost 1800
    signatories to date
  • http://www.soros.org/openaccess/read.shtml
  • Public Library of Science: scientists urge
    publishers to allow the research reports that
    have appeared in their journals to be distributed
    freely by independent, online public libraries of
    science. Open letter, March 2001; received almost
    30 000 signatories
  • http://www.publiclibraryofscience.org/

26
A dynamic digital archive
  • Scientists and researchers, Nobel Laureates among
    them, have produced the clearest declaration of
    their requirement for access to published
    research papers: a comprehensive collection that
    can be efficiently indexed, searched, and linked
  • Unimpeded access to these archives and open
    distribution of their contents will enable
    researchers to take on the challenge of
    integrating and interconnecting the fantastically
    rich, but extremely fragmented and chaotic,
    scientific literature.
  • Roberts et al. (2001) Science, 23 March 2001
    http://www.sciencemag.org/cgi/content/full/291/5512/2318a

27
Credits
  • The Open Citation project is a collaboration
    between Southampton University, Cornell
    University and arXiv
  • The project is led by Stevan Harnad and Carl
    Lagoze
  • Technical development at Southampton is directed
    by Les Carr
  • EPrints.org software is being developed by Chris
    Gutteridge
  • CiteBase is produced and managed by Tim Brody
  • A copy of these slides can be found on the OpCit
    Web site
  • http://opcit.eprints.org/. Look for Papers and
    Presentations
  • Contact: Steve Hitchcock, sh94r_at_ecs.soton.ac.uk