1
WebVoyáge with a Wrapper
  • Michael Doran, Systems Librarian
  • doran_at_uta.edu

Kentucky Voyager Users Group Meeting Thomas More
College - June 1, 2007
2
Once upon a time
Hi, my name is Lloyd
If you will just configure me right, I
will turn into a handsome OPAC.
3
What is a handsome OPAC?
  • Aesthetically handsome
  • Functionally handsome
  • An OPAC is handsome if it
  • is simple to use
  • is intuitive to use
  • makes it easy to find stuff
  • "Only librarians like to search, everybody else likes to find." (Roy Tennant)

4
Simple searches
5
WebVoyáge simple search
6
WebVoyáge simple search?
7
It was soooo simple
8
WebVoyáge simple search (after)
9
Code
<HTML>
<HEAD>
<TITLE>WebVoyáge</TITLE>
<META data>
</HEAD>
<BODY>
<P> Yada, yada, yada... </P>
<FORM ACTION="Pwebrecon.cgi">
<INPUT TYPE="text">
<INPUT TYPE="submit">
</FORM>
<P> More yada, yada, yada... </P>
</BODY>
</HTML>
10
WebVoyáge server-side back end
Voyager OPAC
header
httpd
Pwebrecon.cgi
opac.ini
webrecon.exe
webvoyage.cgi
opacsvr
keysvr
indexes
Oracle
11
WebVoyáge is a black box
Voyager OPAC
header
httpd
Pwebrecon.cgi
opac.ini
webrecon.exe
Black Box
webvoyage.cgi
opacsvr
keysvr
indexes
Oracle
12
They call it a wrapper
Voyager OPAC
header
wrapper script
httpd
Pwebrecon.cgi
opac.ini
webrecon.exe
Black Box
webvoyage.cgi
opacsvr
keysvr
indexes
Oracle
13
They call it a wrapper
Voyager OPAC
httpd
header
wrapper script (new Pwebrecon.cgi)
original Pwebrecon.cgi, renamed Pwebrecon-orig.cgi
opac.ini
webrecon.exe
Black Box
webvoyage.cgi
opacsvr
keysvr
indexes
Oracle
14
Basic wrapper script
httpd
#!/usr/bin/perl
$data_stream = `./Pwebrecon-orig.cgi`;
print $data_stream;
exit;
wrapper script (new Pwebrecon.cgi)
original Pwebrecon.cgi, renamed Pwebrecon-orig.cgi
<HTML>
<HEAD>
<TITLE>WebVoyáge</TITLE>
<META data>
</HEAD>
<BODY>
<P> Yada, yada, yada... </P>
<FORM ACTION="Pwebrecon.cgi">
<INPUT TYPE="text">
<INPUT TYPE="submit">
</FORM>
<P> More yada, yada, yada... </P>
</BODY>
</HTML>
webrecon.exe
Black Box
webvoyage.cgi
opacsvr
keysvr
Oracle
15
Do your thing to that datastream
  • aka screen scraping
  • A technique in which a computer program
    extracts data from the display output of another
    program. The key element that distinguishes
    screen scraping from regular parsing is that the
    output being scraped was intended for final
    display to a human user, rather than as input to
    another program, and is therefore usually neither
    documented nor structured for convenient
    parsing. (from Wikipedia)
  • text wrangling
  • add text
  • delete text
  • rearrange text

16
Example adding text
  • Voyager's header.htm file
  • is inserted after the <body> tag
  • okay for display tags, but not for others
  • Wrapper script can insert elements within the
    <head> tag
  • metadata
  • JavaScript
  • CSS

17
Example adding text
#!/usr/bin/perl
$data_stream = `./Pwebrecon-orig.cgi`;
$meta_code = qq(<link rel="stylesheet" type="text/css" href="/css/my.css">);
$data_stream =~ s#</HEAD>#$meta_code</HEAD>#i;
print $data_stream;
exit;
With the wrapper:
<HTML>
<HEAD>
<meta http-equiv="Content-Type" Content="text/html; charset=UTF-8">
<TITLE>Library Catalog - University of Texas at Arlington</TITLE>
<link rel="stylesheet" type="text/css" href="/css/my.css">
</HEAD>
<BODY onLoad="document.querybox.Search_Arg.focus()

Without the wrapper:
<HTML>
<HEAD>
<meta http-equiv="Content-Type" Content="text/html; charset=UTF-8">
<TITLE>Library Catalog - University of Texas at Arlington</TITLE>
</HEAD>
<BODY onLoad="document.querybox.Search_Arg.focus()
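
The slide's substitution inserts a single stylesheet link. As a rough sketch only, the same idea can be wrapped in a small helper that places several elements (metadata, JavaScript, CSS) before the closing head tag. The helper name add_to_head, the sample page, and the /js/my.js path are hypothetical and are not taken from the presentation's source code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Insert one or more snippets just before the closing </head> tag.
# The HTML is returned unchanged if no </head> tag is found.
sub add_to_head {
    my ($html, @snippets) = @_;
    my $addition = join("\n", @snippets) . "\n";
    $html =~ s{(</head>)}{$addition$1}i;   # first match only, case-insensitive
    return $html;
}

# Hypothetical sample page and snippets (not real OPAC output).
my $page = "<HTML><HEAD><TITLE>Library Catalog</TITLE></HEAD><BODY></BODY></HTML>";
print add_to_head(
    $page,
    '<meta name="robots" content="noindex">',
    '<script type="text/javascript" src="/js/my.js"></script>',
    '<link rel="stylesheet" type="text/css" href="/css/my.css">',
), "\n";
```

In a real wrapper the first argument would be the data stream captured from Pwebrecon-orig.cgi rather than a literal string.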
18
Example removing text
#!/usr/bin/perl
$data_stream = `./Pwebrecon-orig.cgi`;
$data_stream =~ s#<TD><STRONG>.*?</STRONG>University of Texas at Arlington Library</TD>##i;
print $data_stream;
exit;
19
Example rearranging text
#!/usr/bin/perl
$data_stream = `./Pwebrecon-orig.cgi`;
$data_stream =~ s#Search request (.*)</TD>.*Results (.*) entries.#<br />$2 entries for <b>$1</b>#s;
print $data_stream;
exit;
20
Show and go
  • keyword anywhere search
  • words within quotes are treated as a phrase
  • other words are automatically Boolean ANDed
  • relevancy ranked results

Hmmm, that's -like in functionality
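
To make the "phrases in quotes, everything else ANDed" behavior concrete, here is a minimal sketch of how a search string could be split into terms. It is an illustration only: the function names are hypothetical, the explicit AND is not claimed to be Voyager's query syntax, and nothing here is taken from the presenter's wrapper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a search string into terms: quoted phrases stay whole,
# everything else is split on whitespace.
sub tokenize_query {
    my ($query) = @_;
    my @terms;
    while ($query =~ /"([^"]+)"|(\S+)/g) {
        push @terms, defined $1 ? qq("$1") : $2;
    }
    return @terms;
}

# Join the terms with an explicit AND (illustrative syntax only).
sub and_terms {
    my ($query) = @_;
    return join(' AND ', tokenize_query($query));
}

print and_terms('twain "tom sawyer" illustrated'), "\n";
```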
21
No secret handshakes
  • last name, first name for author searches
  • no initial articles for title searches
  • Library of Congress subject headings
  • Boolean operators
  • what an index browse is

22
Wrapper script redux
httpd
#!/usr/bin/perl
$data_stream = `./Pwebrecon-orig.cgi`;
print $data_stream;
exit;
wrapper script (new Pwebrecon.cgi)
original Pwebrecon.cgi, renamed Pwebrecon-orig.cgi
webrecon.exe
Black Box
  • Read and parse form input
  • QUERY_STRING (get method)
  • STDIN (post method)

webvoyage.cgi
opacsvr
keysvr
Oracle
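
Reading and parsing form input, as listed on this slide, might look roughly like the sketch below. The function names (url_decode, parse_form_data, read_raw_input) are hypothetical stand-ins and are not the ReadParse routine used in the presenter's scripts. The sample query string is a shortened version of the one shown on the "Incoming data" slide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" means space, %XX is a hex byte.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Parse "name=value&name=value" into a hash (last value wins on repeats).
sub parse_form_data {
    my ($raw) = @_;
    my %formdata;
    for my $pair (split /[&;]/, $raw) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $formdata{ url_decode($name) } = url_decode(defined $value ? $value : '');
    }
    return %formdata;
}

# Pick the raw input by request method, as a CGI script would:
# QUERY_STRING for GET, STDIN (CONTENT_LENGTH bytes) for POST.
sub read_raw_input {
    my $method = $ENV{'REQUEST_METHOD'} || 'GET';
    if ($method eq 'POST') {
        my $length = $ENV{'CONTENT_LENGTH'} || 0;
        read(STDIN, my $body, $length);
        return defined $body ? $body : '';
    }
    return defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : '';
}

my %formdata = parse_form_data('Search_Arg=samuel+clem&Search_Code=GKEY%5E&CNT=50');
print "$_ = $formdata{$_}\n" for sort keys %formdata;
```

One caveat with the POST case: once the wrapper has read STDIN, the original CGI can no longer read it, so the wrapper would have to pass the data along itself.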
23
Truncation adaptation
samuel clem
24
Incoming data
#!/usr/bin/perl
$data_stream = `./Pwebrecon-orig.cgi`;
print $data_stream;
exit;
QUERY_STRING: Search_Arg=samuel+clem&Search_Code=GKEY%5E&PID=D3hcrVVigATy0bZXCTmXMK61orl&SEQ=20070527145935&CNT=50&HIST=1
25
Incoming data
#!/usr/bin/perl
ReadParse();
$data_stream = GetOrigDataStream();
print $data_stream;
exit;

sub GetOrigDataStream {
    $data_stream = `./Pwebrecon-orig.cgi`;
    return $data_stream;
}
26
Example truncation adaptation
#!/usr/bin/perl
ReadParse();
$data_stream = GetOrigDataStream();
print $data_stream;
exit;

sub GetOrigDataStream {
    $search_arg = $formdata{'Search_Arg'};
    # let users type "*" for truncation; Voyager expects "?"
    $search_arg =~ s/\*/?/g;
    if ($ENV{'QUERY_STRING'}) {
        $ENV{'QUERY_STRING'} =~ s/Arg=.*?&Search/Arg=$search_arg&Search/;
    }
    $data_stream = `./Pwebrecon-orig.cgi`;
    return $data_stream;
}
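
One possible variation on the truncation example is sketched below. It assumes the rewritten search argument should be URL-encoded again before it goes back into QUERY_STRING, which the slide's code does not show. The helper names (url_encode, rewrite_param) and the sample query strings are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" means space, %XX is a hex byte.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Percent-encode every byte except unreserved characters.
# Assumes a byte string, which is what url_decode returns.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Apply $transform to the decoded value of one parameter and leave
# every other name=value pair exactly as it was.
sub rewrite_param {
    my ($query_string, $name, $transform) = @_;
    my @pairs = split /&/, $query_string, -1;
    for my $pair (@pairs) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && $key eq $name && defined $value;
        $pair = $key . '=' . url_encode($transform->(url_decode($value)));
    }
    return join('&', @pairs);
}

# The truncation adaptation: "*" typed by the user becomes "?".
my $asterisk_to_question = sub { my ($arg) = @_; $arg =~ tr/*/?/; return $arg };

print rewrite_param('Search_Arg=clem*&Search_Code=GKEY%5E&CNT=50',
                    'Search_Arg', $asterisk_to_question), "\n";
```

Splitting on "&" instead of matching with a single regular expression avoids depending on which parameter happens to follow Search_Arg in the query string.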
27
Example truncation adaptation
28
Other input data munging
  • fix Voyager 6.x GKEY/TKEY/SKEY keyword "multiple
    spaces" no hits bug (Support Web incident
    131344)
  • $search_arg =~ s/ +/ /g;
  • deal with right single quotation mark vs.
    apostrophe in search input issue
  • allow for ISBNs with dashes
  • (combined output/input) data munging
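
The input fixes listed above could be combined into one normalization step, sketched here under some assumptions: the search argument has already been decoded into Perl characters, and "allowing" ISBNs with dashes means stripping the dashes when the whole argument looks like a 10- or 13-digit ISBN. The function name normalize_search_arg is hypothetical, and the presenter's actual handling may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub normalize_search_arg {
    my ($arg) = @_;
    $arg =~ s/\x{2019}/'/g;    # right single quotation mark -> apostrophe
    $arg =~ s/ {2,}/ /g;       # collapse runs of spaces into one

    # If the argument is an ISBN once dashes and spaces are removed,
    # return the compact form; otherwise keep dashes (e.g. "well-being").
    (my $compact = $arg) =~ tr/- //d;
    return $compact if $compact =~ /^(?:[0-9]{9}[0-9Xx]|[0-9]{13})$/;
    return $arg;
}

print normalize_search_arg('samuel   clem'), "\n";
print normalize_search_arg('0-306-40615-2'), "\n";
```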

29
Is a wrapper right for you?
  • requires some programming expertise
  • requires lots (and lots) of testing
  • test platform
  • ideally a Voyager test server
  • separate WebVoyáge instance (a la preview server)
  • law of unintended consequences
  • extra layer makes WebVoyáge more brittle
  • more dependencies, e.g. with opac.ini
  • upgrades more complicated

30
Getting started
  • wrappers are language-neutral, however
  • Perl is good
  • designed for text processing
  • robust regular expressions
  • is already on your system
  • example wrappers available
  • it's fine to think big
  • but start small

31
Resources
  • Michael Doran, University of Texas at Arlington
  • Presentation: WebVoyáge with a Wrapper. Source
    code: http://rocky.uta.edu/doran/wrapper/
  • Ere Maijala, National Library of Finland
  • EEndUser 2006 presentation
  • Enhancement scripts for WebVoyáge OPAC
  • password required; see European EndUser on
    SupportWeb
  • Source code: http://www.lib.helsinki.fi/english/libraries/linnea/resources/pwebrecon2.htm

32
A small start
  • copy original Pwebrecon.cgi
  • cp -p Pwebrecon.cgi Pwebrecon-orig.cgi
  • create Pwebrecon.cgi wrapper template
  • add desired feature
  • test

#!/usr/bin/perl
$data_stream = `./Pwebrecon-orig.cgi`;
print $data_stream;
exit;
33
Q & A