Pre-processing OpenURLs - PowerPoint PPT Presentation





1
Pre-processing OpenURLs
  • Case Study: University of Kansas

John Miller, jsmiller_at_ku.edu, Univ. of Kansas
EndUser 2004
2
Outline
  • the problems
  • examples
  • possible solutions / tools
  • examples
  • PubMed exception
  • Benefits

3
The problems...
  • inaccurate incoming OpenURLs
      • data in the wrong element
  • incomplete incoming OpenURLs
      • GENRE sometimes missing
      • author data missing or hidden
  • problematic journal title data
  • no logging for statistics

4
The problems...
  • lack of optional passing of OpenURL elements to
    Extended Services within LFP SysAdmin
      • only required elements are passed
      • absence of a required element eliminates the link
  • not really possible to pass an entire OpenURL to
    an External Service simply from within SysAdmin

5
Examples of problems ...
  • journal titles
      • initial articles (Voyager doesn't handle them)
      • quotation marks (Voyager chokes on them)
      • internal dashes (SilverPlatter, et al.)
      • author and other characters at end of title
        (causes some title search links to fail)
  • SilverPlatter
      • author(s) embedded in the PID rather than in
        AUTHOR or AULAST
      • book titles appear in ATITLE rather than TITLE
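A minimal Python sketch of the journal-title cleanup these problems call for, applied before a title is handed to a Voyager OPAC title search. The specific rules (which leading articles to drop, treating text after " / " as an appended author, turning dashes into spaces) are assumptions for illustration, not Voyager's documented behavior, and `clean_title_for_opac` is a hypothetical name.

```python
import re

# Assumed list of leading articles to drop (English only, for illustration).
LEADING_ARTICLES = ("the ", "a ", "an ")
# Straight and curly double quotation marks.
QUOTES = re.compile("[\"\u201c\u201d]")
# Hyphen, en dash or em dash, together with any surrounding spaces.
DASHES = re.compile(r"\s*[-\u2013\u2014]+\s*")


def clean_title_for_opac(title: str) -> str:
    """Normalize a journal title before it is used in an OPAC title search."""
    t = QUOTES.sub("", title.strip())
    # Assumption: anything after " / " is an author statement tacked onto the title.
    t = t.split(" / ", 1)[0]
    # Internal dashes become single spaces.
    t = DASHES.sub(" ", t)
    # Trailing periods, commas, colons and semicolons are dropped.
    t = t.rstrip(" .,;:")
    # Drop one leading article, case-insensitively.
    for article in LEADING_ARTICLES:
        if t.lower().startswith(article):
            t = t[len(article):]
            break
    # Collapse runs of whitespace left behind by the substitutions.
    return " ".join(t.split())
```

The dash rule also splits hyphenated words ("Self-Esteem" becomes "Self Esteem"); whether that helps or hurts depends on how the local OPAC indexes titles.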

6
Example of SilverPlatter author tagging
pid=%3CAN%3E0668115%3C/AN%3E%3CAU%3ELedesma-Liebana%2c-Patricia%3C/AU%3E
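Percent-decoded, the PID above reads `<AN>0668115</AN><AU>Ledesma-Liebana,-Patricia</AU>`, so the author is recoverable. A minimal Python sketch of pulling it out follows; the function name is hypothetical, and converting SilverPlatter's ",-" separator into ", " is an assumption based on this one example.

```python
import re
from urllib.parse import parse_qs


def authors_from_silverplatter_pid(query: str) -> list[str]:
    """Pull <AU>...</AU> values out of the pid element of an OpenURL query string."""
    # parse_qs percent-decodes the value: %3C -> "<", %3E -> ">", %2c -> ",".
    pid = parse_qs(query).get("pid", [""])[0]
    authors = re.findall(r"<AU>(.*?)</AU>", pid)
    # Assumption: names are written "Last,-First"; normalize to "Last, First".
    return [a.replace(",-", ", ").strip() for a in authors]
```

The pre-processor could then set AULAST to the part before the comma (here `Ledesma-Liebana`), so that services keyed on AUTHOR or AULAST can see it.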
7
Examples of problems ...
  • ISI (Web of Science)
      • journal titles appear only in STITLE, even though
        not abbreviated
      • mushes the volume and issue together incorrectly
        for Physical Review D, giving 4 digits where there
        should be 2 or 3
  • American Physical Society journals
      • need to move ARTNUM to SPAGE
  • GENRE element often missing
  • only PAGES is supplied, but SPAGE is needed
  • need formatted data for later use, to get around
    the "only required elements are passed" problem
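Three of these fixes can be sketched in a few lines of Python, working on the OpenURL elements as a flat dictionary. The function name and the GENRE heuristic are assumptions for illustration; the Physical Review D volume/issue problem is left out because the slide does not say how the correct split is determined.

```python
def fix_isi_aps(params: dict[str, str]) -> dict[str, str]:
    """Patch a few ISI and APS habits in a flat dict of OpenURL elements."""
    p = dict(params)  # work on a copy; the caller's dict is left untouched
    # ISI: the unabbreviated journal title arrives only in STITLE.
    if not p.get("title") and p.get("stitle"):
        p["title"] = p["stitle"]
    # APS: the article number arrives in ARTNUM, but targets expect SPAGE.
    # ARTNUM is kept as well, so later services can still see it.
    if not p.get("spage") and p.get("artnum"):
        p["spage"] = p["artnum"]
    # GENRE missing: assume "article" when there is a journal title plus a
    # volume or start page (a heuristic, not a rule taken from the sources).
    if not p.get("genre") and p.get("title") and (p.get("volume") or p.get("spage")):
        p["genre"] = "article"
    return p
```

Copying ARTNUM rather than deleting it is a design choice: the BICI string on slide 11 still carries `artnum`.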

8
Possible solution ...
  • intercept the incoming OpenURL
      • alter it, augment it
      • send the revised OpenURL to LFP
      • offers a generalized, flexible approach that can
        be improved over time
  • coordinate with what is needed by individual
    Extended Services (e.g., OPAC searches, ILL form)
      • use revised and augmented data supplied by the
        pre-processing program
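The intercept / alter / send cycle is small enough to sketch as a single Python function that takes the incoming query string and returns the revised OpenURL. `LFP_BASE` is a hypothetical placeholder address, and the GENRE default is only an example of an augmentation step; real rules would depend on the source and the other elements.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical address of the LFP resolver that receives the revised OpenURL.
LFP_BASE = "http://resolver.example.edu/lfp"


def preprocess(query_string: str, base: str = LFP_BASE) -> str:
    """Intercept an OpenURL query string, alter/augment it, return the revised OpenURL."""
    # A list of pairs (not a dict) keeps repeated elements, in their original order.
    pairs = [(k, v.strip()) for k, v in parse_qsl(query_string, keep_blank_values=True)]
    # Example augmentation: supply GENRE when it is missing or blank.
    if not any(k == "genre" and v for k, v in pairs):
        pairs = [(k, v) for k, v in pairs if k != "genre"] + [("genre", "article")]
    # Further source-specific fixes would be slotted in here.
    return base + "?" + urlencode(pairs)
```

Keeping the fixes in one place is what makes the approach "improvable over time": a new source quirk means one more rule here rather than another SysAdmin configuration.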

9
Tools and Techniques ...
  • pre-processor: a fairly simple Perl / CGI program
    (but could be something else)
      • must be able to receive data from a URL, change
        it, and send a new URL elsewhere
      • substitute the pre-processor's URL for the normal
        LFP base URL
  • a willingness to fudge some OpenURL elements
    that are infrequently used and not needed by LFP
      • e.g., a fake BICI element
  • create a log record of each click
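The slides use Perl/CGI; as a swapped-in equivalent, here is a minimal Python WSGI sketch of the "receive a URL, change it, send a new URL elsewhere" shell. The sources' base URL would point at this program instead of LFP. `LFP_BASE` is a hypothetical address, and `revise` is a placeholder that only trims values and drops empty elements.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical address of the real LFP resolver.
LFP_BASE = "http://resolver.example.edu/lfp"


def revise(query: str) -> str:
    """Placeholder revision step: trim whitespace and drop empty elements."""
    pairs = [(k, v.strip()) for k, v in parse_qsl(query, keep_blank_values=True)]
    return urlencode([(k, v) for k, v in pairs if v])


def app(environ, start_response):
    """WSGI entry point: receive the OpenURL, revise it, redirect the browser to LFP."""
    target = LFP_BASE + "?" + revise(environ.get("QUERY_STRING", ""))
    start_response("302 Found", [("Location", target),
                                 ("Content-Type", "text/plain; charset=utf-8")])
    return [b"Redirecting to the link resolver\n"]
```

For a local trial, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it; the per-click log record would be written inside `app` before redirecting.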

10
Diagram: Source (index citation, catalog record, footnote) → pre-processor (Perl, PHP, etc.) → Link Resolver (parser, knowledge base) → Standard Target
11
(fake) BICI: sid, genre, atitle, full_author, title,
date, volume, issue, spage, epage, issn, isbn, artnum
  • the above string enables LFP SysAdmin to look
    only for the presence of a BICI as a trigger for
    a particular extended service, rather than for the
    existence of a set of OpenURL elements
  • the extended service then has access to all of
    the above elements that exist
  • some elements (full_author, spage, epage)
    sometimes can be derived from others if they do
    not already exist (full_author is a
    locally-defined tag)
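Deriving the three elements named above might look like this Python sketch. The "Last, First" form of `full_author` and the hyphenated PAGES format are assumptions; `aulast`, `aufirst`, `auinit` and `pages` are OpenURL 0.1 element names.

```python
def derive_missing(p: dict[str, str]) -> dict[str, str]:
    """Fill full_author, spage and epage from other elements when they are absent."""
    out = dict(p)  # leave the caller's dict untouched
    # full_author is a local tag, not an OpenURL element: build "Last, First".
    if not out.get("full_author"):
        last = out.get("aulast", "").strip()
        first = (out.get("aufirst") or out.get("auinit") or "").strip()
        if last:
            out["full_author"] = f"{last}, {first}" if first else last
    # spage / epage from a PAGES range such as "12-19".
    pages = out.get("pages", "").strip()
    if pages:
        start, _, end = pages.partition("-")
        if not out.get("spage") and start.strip():
            out["spage"] = start.strip()
        if not out.get("epage") and end.strip():
            out["epage"] = end.strip()
    return out
```

Values that already exist are never overwritten, so a source that does send SPAGE keeps its own value.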

12
http://diglib.ku.edu/cgi-bin/illiad?bici=BICI
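A Python sketch of both ends of this link: packing the elements into a single `bici` value and unpacking it on the ILLiad side. The element order follows the BICI list on slide 11; the "|" delimiter is an assumption, since the transcript does not preserve the real one, and the function names are hypothetical.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Element order as listed for the fake BICI on slide 11.
BICI_FIELDS = ["sid", "genre", "atitle", "full_author", "title", "date", "volume",
               "issue", "spage", "epage", "issn", "isbn", "artnum"]
SEP = "|"  # assumed delimiter
ILLIAD_BASE = "http://diglib.ku.edu/cgi-bin/illiad"  # from the slide


def pack_bici(params: dict[str, str]) -> str:
    """Join the elements in a fixed order; missing elements become empty slots."""
    return SEP.join(params.get(f, "").replace(SEP, " ") for f in BICI_FIELDS)


def illiad_url(params: dict[str, str]) -> str:
    """Build the extended-service link carrying everything in one bici value."""
    return f"{ILLIAD_BASE}?bici={quote(pack_bici(params), safe='')}"


def unpack_bici(url: str) -> dict[str, str]:
    """Receiving side: recover only the elements that actually have values."""
    bici = parse_qs(urlsplit(url).query).get("bici", [""])[0]
    return {f: v for f, v in zip(BICI_FIELDS, bici.split(SEP)) if v}
```

A fixed positional order keeps the string short, at the cost that sender and receiver must agree on the field list exactly.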
13
14
Log file
  • DATETIME SID GENRE TITLE DATE VOLUME
    ISSUE SPAGE EPAGE ISSN ISBN
  • 20040323094818CASCAPLUSarticleJournal of
    Pharmaceutical Sciences200392815310022-3549
  • 20040323094920ISIWoKarticleBIODIVERSITY AND
    CONSERVATION200413110960-3115
  • 20040323095100ISIWoKarticleBIODIVERSITY AND
    CONSERVATION20041312070960-3115
  • 20040323095247ISIWoKarticleBIODIVERSITY AND
    CONSERVATION20041312750960-3115
  • 20040323095518ISIWoKarticleBASIC AND APPLIED
    ECOLOGY2003453851439-1791
  • 20040323095749ISIWoKarticleCONSERVATION
    ECOLOGY200262141195-5449
  • 20040323095948ISIWoKarticleBIOLOGICAL
    CONSERVATION20041151630006-3207
  • 20040323100026SPMLABarticleRussian Studies in
    Literature2001373891061-1975
  • 20040323100112SPMLABarticleRussian Studies in
    Literature2003394661061-1975
  • 20040323100519ISIWoKarticleAGRICULTURE
    ECOSYSTEMS amp2003981-33310167-8809
  • 20040323101518SPPYarticleAmerican
    Psychologist195496320003-066X
  • 20040323101611SPPYarticleAmerican
    Psychologist195712140003-066X
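A Python sketch of writing one such record per click, with columns in the order of the header above. Tab separation and the file path are assumptions; the timestamp uses the same YYYYMMDDhhmmss form as the sample lines.

```python
from datetime import datetime
from typing import Optional

# Columns after the timestamp, in the order of the log header.
LOG_FIELDS = ["sid", "genre", "title", "date", "volume", "issue",
              "spage", "epage", "issn", "isbn"]


def log_record(params: dict[str, str], when: Optional[datetime] = None) -> str:
    """Build one tab-separated log line: DATETIME followed by the LOG_FIELDS values."""
    when = when or datetime.now()
    # Tabs or newlines inside a value would break the one-record-per-line format.
    values = [params.get(f, "").replace("\t", " ").replace("\n", " ") for f in LOG_FIELDS]
    return "\t".join([when.strftime("%Y%m%d%H%M%S")] + values)


def append_log(path: str, params: dict[str, str]) -> None:
    """Append one record per click to a plain-text log file."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(log_record(params) + "\n")
```

Empty elements still get their own column, so every line has the same number of fields and can be split positionally later.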

15
20040323095247 ISIWoK article BIODIVERSITY AND CONSERVATION 2004 13 1 275 0960-3115

Date / Time: 20040323095247
SID: ISIWoK
GENRE: article
TITLE: BIODIVERSITY AND CONSERVATION
DATE: 2004
VOLUME: 13
ISSUE: 1
SPAGE: 275
ISBN:
ISSN: 0960-3115
ARTNUM:
16
March 2004 - clicked on titles
  • ASHP Midyear Clinical Meeting 103
  • Journal of Personality and Social Psychology 88
  • International Journal of Eating Disorders 74
  • Social Work 72
  • Psychological Reports 70
  • Child Development 67
  • Journal of the American Academy of Child and Adolescent Psychiatry 55
  • Journal of Adolescence 53
  • Child Abuse and Neglect 48
  • Addictive Behaviors 48
  • Journal of Youth and Adolescence 46
  • Journal of College Student Development 45
  • Drug Top 43
  • Child and Adolescent Social Work Journal 42
  • Journal of Applied Social Psychology 42
  • American Journal of Psychiatry 42
  • Journal of Applied Behavior Analysis 41
  • Perceptual and Motor Skills 41
  • Adolescence 41
  • Am J Health Syst Pharm 41
  • Nature 40
  • Annals of human biology 40
  • Smith College Studies in Social Work 40
  • Human Biology 40
  • (6,626 other titles with fewer than 40 clicks)
  • (18,644 clicks altogether)
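Counts like these can be compiled from the click log with a short Python sketch. It assumes a tab-separated log whose columns follow the header on the "Log file" slide; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional

# Column order of the tab-separated log (an assumed layout matching the header).
LOG_COLUMNS = ["datetime", "sid", "genre", "title", "date", "volume",
               "issue", "spage", "epage", "issn", "isbn"]


def parse_log_line(line: str) -> dict[str, str]:
    """Split one log line into named columns, padding short lines with blanks."""
    values = line.rstrip("\n").split("\t")
    values += [""] * (len(LOG_COLUMNS) - len(values))
    return dict(zip(LOG_COLUMNS, values))


def clicks_by(lines: Iterable[str], column: str = "title",
              month: Optional[str] = None) -> Counter:
    """Count clicks per value of one column, optionally for a single YYYYMM month."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse_log_line(line)
        if month and not record["datetime"].startswith(month):
            continue
        counts[record[column].strip()] += 1
    return counts
```

Counting by `"title"` gives a list like the one above, and counting by `"sid"` gives a per-database list. Titles are counted exactly as sent, so "Am J Health Syst Pharm" and its full form stay separate, as they do in the slide.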

17
March 2004 - clicked-from databases
  • PsycInfo (SilverPlatter) 6931
  • Eric (CSA) 2128
  • Social Work Abstracts (SilverPlatter) 1034
  • MLA Bibliography (SilverPlatter) 1000
  • IPA (SilverPlatter) 867
  • SciFinder Scholar CA Plus 724
  • Anthropology Plus (RLG) 552
  • ArticleFirst (OCLC FS) 435
  • Art Index (SilverPlatter) 429
  • Biological Abstracts (SilverPlatter) 344
  • Web of Science (ISI) 342
  • Sociological Abstracts (CSA) 255
  • Linguistics and Language Behavior Abstracts (CSA) 246
  • Anthropological Index, Royal Anthropological Institute (RLG) 243
  • Periodical Abstracts (OCLC FS) 235
  • GeoRef (SilverPlatter) 210
  • WorldCat (OCLC FS) 203
  • America: History and Life (ABC-Clio) 190
  • Sports Discus (SilverPlatter) 174
  • Education Abstracts (OCLC FS) 171
  • Social Service Abstracts (CSA) 154
  • Compendex (EV2) 145
  • SciFinder Scholar Medline 139
  • Zoological Abstracts 136
  • PapersFirst (OCLC FS) 131
  • EconLit (SilverPlatter) 125
  • (33 other databases with fewer than 40 clicks)
  • (18,644 clicks altogether)

18
The PubMed exception
  • All that comes from PubMed initially is a PMID
    (PubMed Identifier)
  • Can log the identifier and the time, but nothing
    else
  • Requires redundant External Services to handle
    variations
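A Python sketch of recognizing such a request and logging the little that is available. It assumes the PMID arrives as an OpenURL 0.1 `id=pmid:...` element and that `sid`, `id` and `pid` are the only non-citation elements present; both are assumptions, and the function names are hypothetical.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import parse_qsl

# Elements that identify the request or its origin, not the cited item.
NON_DESCRIPTIVE = {"sid", "id", "pid"}


def pmid_only(query: str) -> Optional[str]:
    """Return the PMID when a request carries nothing but a pmid: identifier."""
    pairs = parse_qsl(query)  # blank values are dropped by default
    pmids = [v[len("pmid:"):] for k, v in pairs
             if k == "id" and v.lower().startswith("pmid:")]
    has_citation_data = any(k not in NON_DESCRIPTIVE for k, _ in pairs)
    return pmids[0] if pmids and not has_citation_data else None


def pubmed_log_record(pmid: str, when: Optional[datetime] = None) -> str:
    """All that can be logged for such a click: the time and the identifier."""
    when = when or datetime.now()
    return f"{when.strftime('%Y%m%d%H%M%S')}\tpmid:{pmid}"
```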

19
This set-up is combined with custom XML to (1)
suppress duplicates when an incoming OpenURL
satisfies more than one condition and (2) supply
a standard phrase
20
<xsl:for-each select="link">
  <xsl:variable name="link-name" select="name"/>
  <xsl:choose>
    <xsl:when test="contains($link-name, 'ILLiad')">
      <xsl:choose>
        <xsl:when test="position() = 1">
          ...
          <ul class="list">
            <li class="list-item">
              <xsl:variable name="orig-url" select="url"/>
              <xsl:variable name="url">
                <xsl:value-of select="$orig-url"/>
              </xsl:variable>
              <a target="_blank" href="{$url}">
                Request a loan or copy of this item (if not available in the KU Libraries)
              </a>
              ...
        </xsl:when>
        <xsl:otherwise/>
      </xsl:choose>
      ...
  • from LFPDisplay.xsl
  • multiple ILLiad services
      • in priority order in SysAdmin
      • this shows only the first one, with a standard
        phrase
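The same display rule, expressed as a Python sketch over a list of (name, URL) pairs in SysAdmin priority order. It tracks the first ILLiad link explicitly, which matches the stated intent ("shows only the first one"); in the stylesheet excerpt, `position()` inside the `for-each` counts all `link` nodes, and the elided parts may handle ordering differently. Passing non-ILLiad links through unchanged is an assumption, since that branch is elided.

```python
STANDARD_ILL_PHRASE = ("Request a loan or copy of this item "
                       "(if not available in the KU Libraries)")


def display_links(links: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the first ILLiad link, relabeled with the standard phrase."""
    out: list[tuple[str, str]] = []
    seen_illiad = False
    for name, url in links:
        if "ILLiad" in name:
            if seen_illiad:
                continue  # later ILLiad services are suppressed (the xsl:otherwise branch)
            seen_illiad = True
            out.append((STANDARD_ILL_PHRASE, url))
        else:
            out.append((name, url))  # assumption: other links pass through unchanged
    return out
```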

21
Standard ILL phrase: "Request a loan or copy ..."
22
The Benefits?
  • More full text links work
  • More OPAC title searches work
  • Some previously impossible services become possible
      • more importantly, they become consistently
        possible
  • Source use statistics can be compiled

23
Contact information
  • John Miller
  • University of Kansas
  • jsmiller_at_ku.edu
