How many PURLs would an URL Checker check - PowerPoint PPT Presentation

1
How many PURLs would an URL Checker check
  • Millennium URL Checker in the Real World

Mary M. Strouse, Catholic University of America
ILUG 2004
2
Evolution
  • 1994: Field 856 subfield u designated for URLs
  • 1998: MARCxGen debuts
  • 2000: URLVerify (telnet version, Release 2000)
  • 2002: Subfield u added to other MARC fields
  • 2003: Millennium URL Checker (Release 2002 Phase 2)

3
URLVerify Web Report: http://catalog/screens/urlverify.html

4
Millennium URL Checker Report
Summary of error types (uncheck to hide)
Integrated with MilCat
5
Sort by column headers
Resize columns (no truncation)
6
Highlight a row and click Edit to open record
Click edit to get MARC record
Access public view
7
Clicking GO opens URL in a browser window (for
rechecking)
8
Locating Missing Links
9
Automatic Substitution of New URL
Check boxes to select, then click preview tab
10
Uncheck any errors, click process
Summary screen
11
Correcting URL Directly in Report
1) Type in New URL
2) Check Replace box
3) Preview, then Process
12
Copying Old URL to Edit Window
  • Check replace box (must do first)
  • Select Old URL - New URL
  • Edit in new URL window
  • Preview, then Process

13
Find and Replace (New URL)
14
Interactive Reports
Toggle between most recent Automatic and
Interactive reports
Create new interactive report
15
Interactive report can run against entire
database, a review file, an index range, or a
keyword search
16
Monday Morning Recheck
17
Can't minimize or work with the desktop while a report is running
18
Error Types
19
Malformed URL (-2)
htp://app.comm.uscourts.gov
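For illustration only, a minimal sketch of a local pre-check that would catch this kind of error before a record is saved. It is not how the URL Checker assigns code -2; the accepted-scheme set `ALLOWED_SCHEMES` and the function name `looks_malformed` are hypothetical. It flags both a misspelled scheme and a missing colon.

```python
from urllib.parse import urlsplit

# Schemes treated as well-formed in this sketch (an assumption, not III's list).
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def looks_malformed(url: str) -> bool:
    """Return True if the URL lacks a recognized scheme or a host name."""
    parts = urlsplit(url.strip())
    # "htp://host" has an unknown scheme; "http//host" has no scheme at all.
    return parts.scheme.lower() not in ALLOWED_SCHEMES or not parts.netloc


print(looks_malformed("htp://app.comm.uscourts.gov"))   # True
print(looks_malformed("http://app.comm.uscourts.gov"))  # False
```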
20
New error type in Phase 3 (Millennium report only)
Network is unreachable (-7)
21
http://public.afca.scott.af.mil/public.
22
PURLs and Other Redirects
Every server redirection reported as an error
23
Redirection can be a sign a resource has moved,
and maintenance is warranted.
24
Missing slash after directory name reported as
permanent redirect (301)
Edit to eliminate from future reports
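A sketch of how a script could recognize this harmless case, assuming it already has the original URL and the redirect's `Location` header. The function name and sample URLs are hypothetical. A relative `Location` is resolved against the original URL first.

```python
from urllib.parse import urljoin, urlsplit


def is_trailing_slash_redirect(original: str, location: str) -> bool:
    """True if a redirect only adds a trailing slash to the original path.

    `location` may be relative; it is resolved against `original` first.
    """
    target = urljoin(original, location)
    a, b = urlsplit(original), urlsplit(target)
    same_server = (a.scheme.lower(), a.netloc.lower()) == (
        b.scheme.lower(),
        b.netloc.lower(),
    )
    # Same server, same query, path differs only by one added "/".
    return same_server and b.path == a.path + "/" and b.query == a.query


print(is_trailing_slash_redirect("http://www.example.org/docs", "/docs/"))  # True
```

When this returns True, replacing the 856 URL with the redirect target is the edit that removes it from future reports.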
25
Server-side redirect to add timestamp
http://library.nps/navy.mil/uhtbin.cgisirsi/SunApr20222815PDT2003/0/520/nss.pdf
26
All PURLs are identified as redirects, not
checked further
True also of 3rd-party link checkers (except Xenu)
27
I-Hate-PURLs Workflow
Use automatic substitution to replace PURL with
(current) underlying URL
Replace box can't be batch-selected.
28
Beware the Leaving GPO Message
29
URL Checker reports entire frwebgate wrapper as
the new URL
http://frwebgate.access.gpo.gov/cgi-bin/leaving.cgi?from=exitpurl.html&to=http%3A//www.uscourts.gov/ttb/index.html
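If the real destination is wanted instead of the wrapper, it can be pulled out of the query string. This sketch assumes the wrapper carries the target, URL-encoded, in a `to` parameter, as the example above appears to; `unwrap_leaving_url` is a hypothetical helper, not a Millennium feature.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_leaving_url(url: str) -> str:
    """Return the target inside a GPO 'leaving' wrapper, or url unchanged."""
    parts = urlsplit(url)
    is_wrapper = (
        parts.netloc.lower() == "frwebgate.access.gpo.gov"
        and parts.path.endswith("leaving.cgi")
    )
    if is_wrapper:
        # parse_qs also decodes %3A back to ":".
        targets = parse_qs(parts.query).get("to")
        if targets:
            return targets[0]
    return url


wrapped = ("http://frwebgate.access.gpo.gov/cgi-bin/leaving.cgi"
           "?from=exitpurl.html&to=http%3A//www.uscourts.gov/ttb/index.html")
print(unwrap_leaving_url(wrapped))
```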
30
Library-editable URLBlock File
Not a substitute for honoring "no robots" conventions!
31
Block can be a full URL, domain name or text
string
III-specified blocks for major aggregators
PURL.ACCESS.GPO.GOV
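A minimal sketch of how the three kinds of block entry might be matched. The rules (exact match for full URLs, host-or-subdomain match for domain names, case-insensitive substring for anything else) and the name `is_blocked` are assumptions, not III's documented behavior.

```python
from urllib.parse import urlsplit


def is_blocked(url: str, block_entries: list[str]) -> bool:
    """Return True if url matches any entry in a URLBlock-style list.

    Assumed rules for this sketch:
      - entry starting with http:// or https://: exact, case-insensitive match
      - entry that looks like a domain (has a dot, no slash or space):
        matches that host or any subdomain of it
      - anything else: case-insensitive substring of the URL
    """
    url_lower = url.strip().lower()
    host = urlsplit(url_lower).hostname or ""
    for entry in block_entries:
        e = entry.strip().lower()
        if not e:
            continue
        if e.startswith(("http://", "https://")):
            if url_lower == e:
                return True
        elif "." in e and "/" not in e and " " not in e:
            if host == e or host.endswith("." + e):
                return True
        elif e in url_lower:
            return True
    return False


print(is_blocked("http://purl.access.gpo.gov/GPO/LPS0001", ["PURL.ACCESS.GPO.GOV"]))
```

Treating domain entries separately from text strings keeps `gpo.gov` from also blocking an unrelated host such as `notgpo.gov`.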
32
Trust-the-Government Workflow
1. Unblock GPO PURLs and run interactive report
monthly (e.g., after Marcive load)
33
2. Exclude working redirects, troubleshoot others
Must load entire report before excluding
redirects (slow)
34
WAIS Database searches reported as timeout errors
(-6)
35
WAM Proxy Rewrite URLs Not Checked
Host Unreachable (-5)
3rd-party link checkers report all proxy-rewrite
URLs OK even if nonexistent.
36
Fool-the-System Workflow
856 41 |u http://heinonline.org/HeinOnline/CollectionIndex.pl?journal=cjtl |z <A href="http://0-heinonline.org.columbo.law.cua.edu/HeinOnline/CollectionIndex.pl?journal=cjtl">View via Hein Online</A>
Underlying URL in |u; PURL or proxy-rewrite URL within the anchor tag in |z.
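A sketch of the resulting split, assuming `|` delimits subfields as in the example: the checker sees only the subfield u URL, while the PURL or proxy URL survives only inside the z anchor. The field text is reconstructed from the slide, and the helper names are hypothetical. A URL containing a literal `|` would break this simple split.

```python
import re

# Matches the href value of an HTML anchor tag (double-quoted values only).
HREF_RE = re.compile(r'href\s*=\s*"([^"]+)"', re.IGNORECASE)

FIELD_856 = (
    '856 41 |u http://heinonline.org/HeinOnline/CollectionIndex.pl?journal=cjtl '
    '|z <A href="http://0-heinonline.org.columbo.law.cua.edu'
    '/HeinOnline/CollectionIndex.pl?journal=cjtl">View via Hein Online</A>'
)


def split_subfields(field: str) -> dict[str, list[str]]:
    """Split a '|'-delimited field into {code: [values]}; text before the
    first '|' (tag and indicators) is discarded."""
    subfields: dict[str, list[str]] = {}
    for chunk in field.split("|")[1:]:
        if chunk:
            subfields.setdefault(chunk[0], []).append(chunk[1:].strip())
    return subfields


def link_urls(field: str) -> tuple[list[str], list[str]]:
    """Return (URLs in subfield u, hrefs found inside subfield z)."""
    sf = split_subfields(field)
    checked = sf.get("u", [])
    displayed = [href for z in sf.get("z", []) for href in HREF_RE.findall(z)]
    return checked, displayed


checked, displayed = link_urls(FIELD_856)
print(checked)    # the subfield u URL the checker would test
print(displayed)  # the URL inside the subfield z anchor tag
```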
37
Multi-threading Rate
  • The number of simultaneous calls sent to
    servers at a given time
  • URL Checker: more than 100
  • 3rd-party link checkers: 20-30 (often
    user-configurable)
  • At issue when many resources concentrated on a
    few servers
  • URL Checker activity may be perceived as an
    attack
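One way a checker could keep high overall throughput without hammering one server is a per-host cap on top of the global thread count. The sketch assumes a caller-supplied `check(url)` function; the defaults (20 threads in total, 2 per host) are illustrative, not values from III or any 3rd-party tool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


class PerHostLimiter:
    """Hands out one semaphore per host so that at most `per_host`
    checks hit the same server at the same time."""

    def __init__(self, per_host: int):
        self._per_host = per_host
        self._lock = threading.Lock()
        self._sems: dict[str, threading.Semaphore] = {}

    def slot(self, url: str) -> threading.Semaphore:
        host = urlsplit(url).hostname or ""
        with self._lock:  # create each host's semaphore exactly once
            if host not in self._sems:
                self._sems[host] = threading.Semaphore(self._per_host)
            return self._sems[host]


def check_all(urls, check, total_threads=20, per_host=2):
    """Run check(url) for every URL with a global thread limit and a
    per-host limit; return {url: result}."""
    limiter = PerHostLimiter(per_host)

    def guarded(url):
        with limiter.slot(url):  # waits here while the host is busy
            return url, check(url)

    with ThreadPoolExecutor(max_workers=total_threads) as pool:
        return dict(pool.map(guarded, urls))
```

A worker waiting on a busy host sits idle, which keeps the sketch simple at some cost in throughput when many records point at one server.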

38
Summary: What URL Checker Checks
  • URLs in subfield u of 856 fields in Bib. Records
    (but not URLs in other subfields)
  • URLs in 956 fields in electronic reserves
    (Millennium Media) records

39
And What It Doesn't
  • URLs or domains in the URLBlock file
    (aggregators, etc)
  • PURLs and other redirects
  • Proxy-rewrite URLs in WAM
  • Electronic journal issue URLs in checkin boxes
  • URLs in bibliographic record notes

40
Suggestions for Further Development: Reports and Editing
  • Pre-configure large interactive reports (faster
    loading)
  • Allow minimization during report prep
  • Bypass summary of attached items
  • Improve copy and paste, batch select and replace
  • Interactive checking of New URL column

41
Suggestions for Further Development: Functionality
  • Follow redirects to final destination
  • Honor page-level and server-level robot
    exclusions, and report with a unique status code
  • Customize multi-threading rate
  • Output report in CSV (comma-delimited) format
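The first suggestion could look something like this sketch, which follows a chain of redirects to its end and keeps the chain for the report. `fetch` is a hypothetical caller-supplied function that returns the HTTP status and `Location` header for one request without following redirects itself; the 10-hop limit is an arbitrary assumption.

```python
from typing import Callable, Optional
from urllib.parse import urljoin

# fetch(url) -> (HTTP status, Location header or None); supplied by the caller.
Fetch = Callable[[str], tuple[int, Optional[str]]]
REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(url: str, fetch: Fetch, max_hops: int = 10):
    """Follow redirects from url; return (final_url, final_status, chain).

    Raises RuntimeError on a redirect loop or more than max_hops redirects.
    """
    chain = [url]
    for _ in range(max_hops + 1):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url, status, chain
        url = urljoin(url, location)  # Location may be relative
        if url in chain:
            raise RuntimeError(f"redirect loop at {url}")
        chain.append(url)
    raise RuntimeError("too many redirects")


# Demo: a canned table stands in for real HTTP responses (hypothetical URLs).
RESPONSES = {
    "http://purl.example.org/abc": (302, "http://www.example.org/docs"),
    "http://www.example.org/docs": (301, "/docs/"),
    "http://www.example.org/docs/": (200, None),
}
final, status, chain = follow_redirects(
    "http://purl.example.org/abc", lambda u: RESPONSES.get(u, (404, None))
)
print(final, status, chain)
```

Keeping the whole chain lets a report show both that a PURL still resolves and where it currently lands.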

42
URL Checker Documentation
  • Millennium Manual (Release 2003)
  • Permissions (105370)
  • Reports (105371)
  • Edit/Replace capability (105372)
  • URLBlock (105373)

43
URLVerify Documentation
  • Innopac manual, pages 102151-102153
  • Maintaining Hyperlinks in the WebPac: Tools and
    Tradeoffs (IUG 8, May 2000),
    http://www.du.edu/ttyler/iug2000/ctw/index.html
  • Tom Tyler's freeware: http://www.du.edu/ttyler/freeware/

44
URL Display: WWWOptions
  • DISPLAY_856: Defines the order and placement of
    subfields that form the hypertext link in an OPAC
    display (default is subfield z, then u)
  • Multiple subfields (including access and usage
    notes) display as a single underlined link.
    Enhancement request: separate WWWOptions to
    control display of link and notes.

45
URL Display: WWWOptions
  • LINK856TEXT: Defines the phrase that appears
    above the hypertext link in a full display
    (default is "Click here to")
  • ICON_856LINK: Controls display of the 856 link in a
    brief display
  • (Manual 102168)

46
Contact
  • Mary M. Strouse
  • DuFour Law Library,
  • Catholic University of America
  • strouse_at_law.cua.edu

Thank you!