Title: Pre-processing OpenURLs
1. Pre-processing OpenURLs
- Case Study: University of Kansas
John Miller, jsmiller_at_ku.edu, Univ. of Kansas
EndUser 2004
2. Outline
- the problems
- examples
- possible solutions / tools
- examples
- PubMed exception
- Benefits
3. The problems...
- inaccurate incoming OpenURLs
- data in the wrong element
- incomplete incoming OpenURLs
- GENRE sometimes missing
- Author data missing or hidden
- problematic journal title data
- no logging for statistics
4. The problems...
- lack of optional passing of OpenURL elements to Extended Services within LFP SysAdmin
- only required elements are passed
- absence of a required element eliminates the link
- not really possible to pass an entire OpenURL to an External Service simply from within SysAdmin
5. Examples of problems ...
- Journal Titles
- initial articles (Voyager doesn't handle them)
- quotation marks (Voyager chokes on them)
- internal dashes (SilverPlatter, et al.)
- author and other characters at end of title (causes some title search links to fail)
- SilverPlatter
- author(s) embedded in the PID rather than in AUTHOR or AULAST
- Book titles appear in ATITLE rather than TITLE
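The journal-title problems above lend themselves to a small clean-up step. The following is a minimal sketch in Python rather than the Perl used at KU; the function name `clean_title` and the list of articles are assumptions for illustration. It handles only initial articles and quotation marks, and leaves internal dashes alone because the slides do not say how those were treated.

```python
import re

# Assumed list of leading articles; a real deployment would tune this per language.
INITIAL_ARTICLES = ("the", "a", "an")


def clean_title(title: str) -> str:
    """Return a journal title that is safer to hand to an OPAC title search."""
    # Drop straight and curly double quotation marks.
    cleaned = re.sub(r'["\u201c\u201d]', "", title)
    # Collapse runs of whitespace left behind by the removal.
    cleaned = " ".join(cleaned.split())
    # Strip a single leading article, but never reduce the title to nothing.
    first, _, rest = cleaned.partition(" ")
    if rest and first.lower() in INITIAL_ARTICLES:
        cleaned = rest
    return cleaned
```

A call such as `clean_title('The "Journal" of Adolescence')` would hand the OPAC a title with the article and quotation marks removed.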
6. Example of SilverPlatter author tagging
pid=%3CAN%3E0668115%3C/AN%3E%3CAU%3ELedesma-Liebana%2c-Patricia%3C/AU%3E
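One way to recover the author from such a PID is sketched below in Python. It assumes the pid value arrives percent-encoded, with the author wrapped in an AU tag and written as "Last,-First"; the function name `author_from_pid` is hypothetical and is not part of the KU program.

```python
import re
from urllib.parse import unquote


def author_from_pid(pid: str):
    """Pull the author out of a SilverPlatter-style pid value, or return None."""
    # Decode %3C, %3E, %2c and friends back into <, >, and comma.
    decoded = unquote(pid)
    match = re.search(r"<AU>(.*?)</AU>", decoded)
    if match is None:
        return None
    # Assumed convention: "Last,-First" becomes "Last, First".
    return match.group(1).replace(",-", ", ")
```

The result could then be copied into an author element before the OpenURL is passed on.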
7. Examples of problems ...
- ISI (Web of Science)
- journal titles appear only in STITLE, even though not abbreviated
- runs the volume and issue together incorrectly for Physical Review D (is 4 digits, should be 2 or 3)
- American Physical Society Journals
- need to move ARTNUM to SPAGE
- GENRE element often missing
- Only PAGES is supplied, but need SPAGE
- Need formatted data for later use, to get around the "only required elements are passed" problem
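Several of the fixes listed above amount to filling one element from another. A minimal Python sketch follows, under stated assumptions: the function name `fill_gaps` is hypothetical, ARTNUM is copied rather than removed, and defaulting GENRE to "article" when an ATITLE or ISSN is present is a guess at a reasonable rule, not the rule KU used.

```python
import re


def fill_gaps(params: dict) -> dict:
    """Return a copy of the OpenURL elements with some missing ones derived."""
    fixed = dict(params)

    # Assumed rule: an article title or ISSN suggests a journal article.
    if not fixed.get("genre") and (fixed.get("atitle") or fixed.get("issn")):
        fixed["genre"] = "article"

    if not fixed.get("spage"):
        if fixed.get("artnum"):
            # APS-style article numbers stand in for the start page.
            fixed["spage"] = fixed["artnum"]
        elif fixed.get("pages"):
            # PAGES such as "275-290" or "275" yields SPAGE and maybe EPAGE.
            m = re.match(r"\s*([A-Za-z]?\d+)(?:\s*-\s*([A-Za-z]?\d+))?",
                         fixed["pages"])
            if m:
                fixed["spage"] = m.group(1)
                if m.group(2) and not fixed.get("epage"):
                    fixed["epage"] = m.group(2)
    return fixed
```

An existing SPAGE is never overwritten, so the function is safe to run on every incoming OpenURL.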
8. Possible solution ...
- Intercept the incoming OpenURL
- alter it, augment it
- send the revised OpenURL to LFP
- offers a generalized, flexible approach that can be improved over time
- Coordinate with what is needed by individual Extended Services (e.g., OPAC searches, ILL form)
- Use revised and augmented data supplied by the pre-processing program
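The intercept, alter, and forward steps can be sketched as a single function. This is an illustration in Python, not the KU Perl program; `LFP_BASE_URL` is a placeholder address and `revise_openurl` is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode

# Placeholder only: substitute the real base URL of the link resolver.
LFP_BASE_URL = "http://lfp.example.edu/resolver"


def revise_openurl(query_string: str, fixers) -> str:
    """Parse an incoming OpenURL query, apply each fixer, and rebuild the URL."""
    # Note: dict() keeps only the last value of a repeated element.
    params = dict(parse_qsl(query_string, keep_blank_values=True))
    for fix in fixers:
        # Each fixer takes a dict of elements and returns a revised dict.
        params = fix(params)
    return LFP_BASE_URL + "?" + urlencode(params)
```

In a CGI setting the returned URL would typically be sent back to the browser as a redirect, so the user never sees the intermediate step. Keeping the fixers as a list makes it easy to add a new repair as each new source problem turns up.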
9. Tools & Techniques ...
- Pre-processor: fairly simple Perl / CGI program (but could be something else)
- must be able to receive data from a URL, change it, and send a new URL elsewhere
- substitute pre-processor's URL for the normal LFP base URL
- a willingness to fudge with some OpenURL elements that are infrequently used and not needed by LFP
- e.g., a fake BICI element
- create a log record of each click
10. [Diagram: a Source (index citation, catalog record, footnote) sends its OpenURL to the pre-processor (Perl, PHP, etc.), which passes it to the Link Resolver (parser, knowledge base), which links to a standard Target]
11. (fake) BICI: sid genre atitle full_author title date volume issue spage epage issn isbn artnum
- the above string enables LFP SysAdmin to look only for the presence of a BICI as a trigger for a particular extended service, rather than the existence of a set of OpenURL elements
- the extended service then has access to all of the above elements that exist
- some elements (full_author, spage, epage) sometimes can be derived from others if they do not already exist (full_author is a locally-defined tag)
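Packing the elements into one BICI value, and unpacking them again in the extended service, might look like the Python sketch below. The "|" delimiter is an assumption (the separator actually used is not recoverable from the slides), as are the function names and the rule for deriving full_author from AULAST and AUFIRST.

```python
# Element order as listed on the slide.
BICI_FIELDS = ["sid", "genre", "atitle", "full_author", "title", "date",
               "volume", "issue", "spage", "epage", "issn", "isbn", "artnum"]

DELIMITER = "|"  # assumption: any character unlikely to occur in the data


def build_fake_bici(params: dict) -> str:
    """Pack the listed elements, in fixed order, into a single string."""
    values = dict(params)
    if not values.get("full_author"):
        # Assumed derivation of the locally-defined full_author tag.
        names = [values.get("aulast", ""), values.get("aufirst", "")]
        values["full_author"] = ", ".join(n for n in names if n)
    # Empty slots are kept so that positions stay fixed.
    return DELIMITER.join(
        values.get(field, "").replace(DELIMITER, " ") for field in BICI_FIELDS)


def parse_fake_bici(bici: str) -> dict:
    """Unpack a string produced by build_fake_bici."""
    return dict(zip(BICI_FIELDS, bici.split(DELIMITER)))
```

Because every slot is always present, the extended service can rely on position alone and does not need to know which elements the source supplied.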
12. http://diglib.ku.edu/cgi-bin/illiad?bici=BICI
14. Log file
- DATETIME SID GENRE TITLE DATE VOLUME ISSUE SPAGE EPAGE ISSN ISBN
- 20040323094818CASCAPLUSarticleJournal of Pharmaceutical Sciences200392815310022-3549
- 20040323094920ISIWoKarticleBIODIVERSITY AND CONSERVATION200413110960-3115
- 20040323095100ISIWoKarticleBIODIVERSITY AND CONSERVATION20041312070960-3115
- 20040323095247ISIWoKarticleBIODIVERSITY AND CONSERVATION20041312750960-3115
- 20040323095518ISIWoKarticleBASIC AND APPLIED ECOLOGY2003453851439-1791
- 20040323095749ISIWoKarticleCONSERVATION ECOLOGY200262141195-5449
- 20040323095948ISIWoKarticleBIOLOGICAL CONSERVATION20041151630006-3207
- 20040323100026SPMLABarticleRussian Studies in Literature2001373891061-1975
- 20040323100112SPMLABarticleRussian Studies in Literature2003394661061-1975
- 20040323100519ISIWoKarticleAGRICULTURE ECOSYSTEMS amp2003981-33310167-8809
- 20040323101518SPPYarticleAmerican Psychologist195496320003-066X
- 20040323101611SPPYarticleAmerican Psychologist195712140003-066X
15. 20040323095247 ISIWoK article BIODIVERSITY AND CONSERVATION 2004 13 1 275 0960-3115
- Date / Time: 20040323095247
- SID: ISIWoK
- GENRE: article
- TITLE: BIODIVERSITY AND CONSERVATION
- DATE: 2004
- VOLUME: 13
- ISSUE: 1
- SPAGE: 275
- ISBN: (empty)
- ISSN: 0960-3115
- ARTNUM: (empty)
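Writing one such record per click could be sketched as follows in Python. The column order follows the log header (DATETIME, SID, GENRE, TITLE, DATE, VOLUME, ISSUE, SPAGE, EPAGE, ISSN, ISBN); the tab delimiter and the names `format_log_record` and `append_log` are assumptions, since the slides do not show the actual file layout.

```python
from datetime import datetime

# Columns after the timestamp, in the order of the log header.
LOG_FIELDS = ["sid", "genre", "title", "date", "volume",
              "issue", "spage", "epage", "issn", "isbn"]


def format_log_record(params: dict, now=None) -> str:
    """Build one tab-delimited log line for a click."""
    # Timestamp in the YYYYMMDDHHMMSS form seen in the log.
    stamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S")
    # Missing elements become empty cells so columns stay aligned.
    cells = [params.get(field, "").replace("\t", " ") for field in LOG_FIELDS]
    return "\t".join([stamp] + cells)


def append_log(path: str, record: str) -> None:
    """Append a record to the log file, one record per line."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(record + "\n")
```

Keeping empty cells in place means every line has the same number of columns, which simplifies compiling statistics later.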
16. March 2004 - clicked on titles
- ASHP Midyear Clinical Meeting 103
- Journal of Personality and Social Psychology 88
- International Journal of Eating Disorders 74
- Social Work 72
- Psychological Reports 70
- Child Development 67
- Journal of the American Academy of Child and Adolescent Psychiatry 55
- Journal of Adolescence 53
- Child Abuse and Neglect 48
- Addictive Behaviors 48
- Journal of Youth and Adolescence 46
- Journal of College Student Development 45
- Drug Top 43
- Child and Adolescent Social Work Journal 42
- Journal of Applied Social Psychology 42
- American Journal of Psychiatry 42
- Journal of Applied Behavior Analysis 41
- Perceptual and Motor Skills 41
- Adolescence 41
- Am J Health Syst Pharm 41
- Nature 40
- Annals of human biology 40
- Smith College Studies in Social Work 40
- Human Biology 40
- (6,626 other titles with fewer than 40 clicks)
- (18,644 clicks altogether)
17. March 2003 - clicked-from databases
- PsycInfo (SilverPlatter) 6931
- ERIC (CSA) 2128
- Social Work Abstracts (SilverPlatter) 1034
- MLA Bibliography (SilverPlatter) 1000
- IPA (SilverPlatter) 867
- SciFinder Scholar CA Plus 724
- Anthropology Plus (RLG) 552
- ArticleFirst (OCLC FS) 435
- Art Index (SilverPlatter) 429
- Biological Abstracts (SilverPlatter) 344
- Web of Science (ISI) 342
- Sociological Abstracts (CSA) 255
- Linguistics and Language Behavior Abstracts (CSA) 246
- Anthropological Index, Royal Anthropological Institute (RLG) 243
- Periodical Abstracts (OCLC FS) 235
- GeoRef (SilverPlatter) 210
- WorldCat (OCLC FS) 203
- America: History and Life (ABC-Clio) 190
- Sports Discus (SilverPlatter) 174
- Education Abstracts (OCLC FS) 171
- Social Service Abstracts (CSA) 154
- Compendex (EV2) 145
- SciFinder Scholar Medline 139
- Zoological Abstracts 136
- PapersFirst (OCLC FS) 131
- EconLit (SilverPlatter) 125
- (33 other databases with fewer than 40 clicks)
- (18,644 clicks altogether)
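Counts like these can be compiled from the click log with a short tally. The Python sketch below assumes a tab-delimited log with SID in column 1 and TITLE in column 3 (counting from 0); that layout and the function name `tally` are assumptions, not a description of the KU log format.

```python
from collections import Counter


def tally(log_lines, column: int):
    """Count clicks by the value found in one column of the log."""
    counts = Counter()
    for line in log_lines:
        cells = line.rstrip("\n").split("\t")
        # Skip short lines and empty cells rather than counting blanks.
        if len(cells) > column and cells[column]:
            counts[cells[column]] += 1
    # Most-clicked first, as on the slides.
    return counts.most_common()
```

Calling `tally(lines, 1)` would give clicks per source database, and `tally(lines, 3)` clicks per title.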
18. The PubMed exception
- All that comes from PubMed initially is a PMID (PubMed Identifier)
- Can log the identifier and the time, but nothing else
- Requires redundant External Services to handle variations
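Spotting the PMID-only case lets the pre-processor log what it can and skip the usual fix-ups. The Python sketch below assumes the identifier arrives in the OpenURL `id` element in the form `pmid:NNN`; the function name `pmid_only` and the set of elements treated as "descriptive" are assumptions for illustration.

```python
from urllib.parse import parse_qs

# Assumed set of elements whose presence means real citation data arrived.
DESCRIPTIVE = {"title", "stitle", "atitle", "issn", "volume", "spage", "aulast"}


def pmid_only(query_string: str):
    """Return the PMID if the OpenURL carries nothing but an identifier."""
    params = parse_qs(query_string)
    ids = [v for v in params.get("id", []) if v.lower().startswith("pmid:")]
    if not ids:
        return None
    if DESCRIPTIVE & params.keys():
        # Citation data is present, so normal pre-processing applies.
        return None
    return ids[0].split(":", 1)[1]
```

When a PMID is returned, only the identifier and the time are available for the log record.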
19. This set-up is combined with custom XML to (1) suppress duplicates when an incoming OpenURL satisfies more than one condition and (2) supply a standard phrase
20.
<xsl:for-each select="link">
  <xsl:variable name="link-name" select="name"/>
  <xsl:choose>
    <xsl:when test="contains($link-name, 'ILLiad')">
      <xsl:choose>
        <xsl:when test="position() = 1">
          ...
          <ul class="list">
            <li class="list-item">
              <xsl:variable name="orig-url" select="url"/>
              <xsl:variable name="url">
                <xsl:value-of select="$orig-url"/>
              </xsl:variable>
              <a target="_blank" href="{$url}">Request a loan or copy of this item (if not available in the KU Libraries)</a>
              ...
        </xsl:when>
        <xsl:otherwise/>
      </xsl:choose>
      ...
- from LFPDisplay.xsl
- multiple ILLiad services
- in priority order in SysAdmin
- this shows only the first one with a standard
phrase
21. Standard ILL phrase: Request a loan or copy ...
22. The Benefits?
- More full text links work
- More OPAC title searches work
- Some impossible services become possible
- more importantly, they become consistently possible
- Source use statistics are compilable
23. Contact information
- John Miller
- University of Kansas
- jsmiller_at_ku.edu