Title: Through the Bytes Darkly,
1Through the Bytes Darkly,
Management Information and the Digital Library
Information Technology Interest Group, ACRL, New England Chapter
Joe Zucca, Assessment, Planning and Publications Librarian, University of Pennsylvania Library
2Four Sections of This Presentation
1. Environmental Audit: Key Factors That Influence Our Ability to Measure Digital Information Use
2. From Low Resolution to High Resolution Data: Mining the Server Logs
3. The Data Farm Experiment: Tools That Serve Access Can Also Serve Measurement
4. Why the Data Are Important
3Measuring Electronic Use at Penn Environmental
Influences
1. Organization and Culture
Strategic Focus: Base planning, goal setting, and assessment on empirical evidence. From 1996, an element of Penn's Strategic Plan.
Operational Imperatives:
1) Make evaluation and measurement a component of each program and project.
2) Construct relays that feed data to people who need quantitative information to strategize and manage.
Experimental Attitude: Leverage the data you have; usually they're good enough to validate organizational experience and knowledge.
4Measuring Electronic Use at Penn Environmental
Influences
2. Proliferation of Electronic Resources
Article indexes, e-journals and other full-text
resources
5Measuring Electronic Use at Penn Environmental
Influences
2.1. Growth of Expenditures for Electronic
Resources
Annual Growth of Expenditures for Electronic
Information Based on 1991
E-Resources as a percent of acquisitions budget
Year       1991   1993   1996   1999   2000   2001
Percent     3.7    3.2    5.5   13.2   13.9   15.7
6Measuring Electronic Use at Penn Environmental
Influences
3. Technology's Hostility to Measurement
- Volatile metrics (The new system doesn't count that way!)
- Ever-changing data elements (sets are out, searches are in)
- No common metrics (log-ins, sessions, searches, browses, page hits)
- No measurement standards (What's a search? What's a Web session?)
- Nonexistent or inaccessible data (the vendor problem)
- Approximate, hard-to-obtain statistics (lots of data, no information)
- Fleeting benchmarks
7From Low Resolution to High Resolution Data: Mining the Server Logs for Descriptive Statistics
dial-123-130.dial.indiana.edu - - [04/Feb/2001:00:18:02 -0500] "GET /special/photos/theater/504.html HTTP/1.0" 200 3247 "http://www.library.upenn.edu/special/photos/theater/503.html" "Mozilla/4.7C-CCK-MCD C-UDP EBM-APPLE (Macintosh; I; PPC)"
dialin1085.upenn.edu - - [04/Feb/2001:00:18:04 -0500] "GET /facilities/count_use.html?resource=China%20Economic%20Review&method=ejs&url=http://www.sciencedirect.com/science/journal/1043951X HTTP/1.0" 200 2027 "http://www.library.upenn.edu/webbin5/resources/ejspublic5.cgi?homepage=http://www.library.upenn.edu/lippincott/&community=Business" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; SPIKE 5)"
203.197.226.240 - - [04/Feb/2001:00:18:07 -0500] "GET /etext/sasia/aiis/architecture/khajuraho/010a.jpg HTTP/1.0" 200 89117 "http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/010.html" "Mozilla/4.7 [en] (Win95; I)"
8Low Resolution
Inputs
Records in locally-managed databases (including the OPAC): 26,332,138
Number of journal article indexes and full-text files (e.g. Academic Index): 267
Number of e-journals (from publishers such as Elsevier and free sources): 6,608
Number of digital books (locally created, aggregated and licensed): 110,000
Number of locally digitized and accessible images (e.g. fine art slides, ms facsimiles): 82,356
Number of records in the OPAC: 2,879,696
Number of pages, forms and directories constituting the library web site: 32,000
9Low Resolution
The Load on Our Machines
Web Pages Served 1995-2001 from
www.library.upenn.edu. 3-month moving average
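A 3-month moving average smooths month-to-month swings in page counts so that the longer trend is easier to see. The sketch below is one way to compute it, assuming a trailing window (each value averages the current month and the two before it); the slide does not say whether the chart used a trailing or a centered window, and the monthly figures here are invented for illustration.

```python
def moving_average(values, window=3):
    """Trailing moving average; positions without a full window are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

# Hypothetical monthly page counts, not data from the presentation.
monthly_pages = [120, 150, 90, 180, 210]
smoothed = moving_average(monthly_pages)
print(smoothed)  # [None, None, 120.0, 140.0, 160.0]
```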
10Low Resolution
Changing Machine Demand
Pages Served by the Main Library Web Server and the OPAC Server
[Chart. Labels: OPAC, Web, BlackBoard. Vertical axis: 0 to 25,000,000 pages. Horizontal axis: 1996 to 2002 (2002 projected).]
11Low Resolution
Search Activity Over Time
Annual Searches in Licensed Databases (e.g.,
MEDLINE), FY97-01
[Chart. Vertical axis: searches]
12Correlation Matrix of Use Metrics Available for
Ovid Files
Pearson r for Sessions, Connect Time, Sets,
Documents Viewed
99 cases
                  Sessions   Time    Sets    Docs. Viewed
Sessions            1.00
Time                .980     1.00
Sets                .905     .971    1.00
Documents Viewed    .844     .932    .983    1.00
13Correlation Matrix of Use Metrics Available for
SilverPlatter Files
Pearson r for Sessions, Connect Time, Searches,
Documents Viewed
                  Sessions   Time    Searches   Abs. Viewed
Sessions            1.00
Time                .975     1.00
Searches            .899     .901    1.00
Abstracts Viewed    .840     .870    .855       1.00
94 cases
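Each cell in these matrices is a Pearson r computed across the databases (cases) for one pair of metrics. A minimal sketch of that calculation follows, using the standard formula r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2)). The function name and the sample figures are hypothetical and are not taken from the Ovid or SilverPlatter data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-database totals: sessions and searches for four files.
sessions = [10, 20, 30, 40]
searches = [25, 38, 65, 79]
print(round(pearson_r(sessions, searches), 3))
```

A value near 1.00, as in most cells above, suggests that the two metrics rise and fall together across files, so either one could stand in for the other when a vendor reports only one of them.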
14High Resolution Data User Input Good Program
Liaison and Knowledge Support Resource
Management, and Inform Basic Questions, e.g.
Are we
- choosing the right information sources for our audiences?
- optimizing the delivery of electronic information?
- making access as easy and seamless as possible?
- spending our dollars wisely?
- able to detect and respond to change in the patterns of resource use?
15Using the Architecture of the Web to Increase
Data Resolution
www.library.upenn.edu/facilities/count_use.html
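In the log samples, requests for this page carry three values in the query string: `resource`, `method`, and `url`. The sketch below shows how such a request path might be split into those values. It assumes the usual `=` and `&` separators and percent-encoded spaces, which the slide text does not preserve, and the `parse_count_use` helper is hypothetical rather than the script used at Penn.

```python
from urllib.parse import urlsplit, unquote

def parse_count_use(path):
    """Split a count_use.html request path into resource, method, and target URL."""
    parts = urlsplit(path)
    if not parts.path.endswith("/count_use.html"):
        return None
    # The target URL comes last and may itself contain "&" and "=",
    # so cut it off before splitting the remaining parameters.
    head, sep, target = parts.query.partition("&url=")
    fields = {}
    for pair in head.split("&"):
        if pair:
            key, _, value = pair.partition("=")
            fields[key] = unquote(value)
    fields["url"] = target if sep else None
    return fields

request_path = ("/facilities/count_use.html?resource=China%20Economic%20Review"
                "&method=ejs&url=http://www.sciencedirect.com/science/journal/1043951X")
hit = parse_count_use(request_path)
print(hit["resource"], "|", hit["method"], "|", hit["url"])
```

Because every click on a licensed resource passes through one local page, the library's own server log records which resource was requested and by what route, whether or not the vendor supplies statistics.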
16Beginning with a stream of unprocessed log data...
dial-123-130.dial.indiana.edu - - [04/Feb/2001:00:17:38 -0500] "GET /special/photos/theater/505.html HTTP/1.0" 200 3086 "http://www.library.upenn.edu/special/photos/theater/504.html" "Mozilla/4.7C-CCK-MCD C-UDP EBM-APPLE (Macintosh; I; PPC)"
recrawler1.bos2.fastsearch.net - - [04/Feb/2001:00:18:21 -0500] "GET /etext/sasia/skt-mss/1549/15a.html HTTP/1.0" 200 2736 "-" "FAST-WebCrawler/2.2-pre27 (crawler@fast.no; http://www.fast.no/faq/faqfastwebsearch/faqfastwebcrawler.html)"
130.91.196.245.in-addr.arpa - - [04/Feb/2001:00:17:40 -0500] "GET /facilities/count_use.html?resource=ABI/Inform%20%20Ovid&method=Ovid&url=http://www.abi-ovid.library.upenn.edu/ovidweb/ovidweb.cgi?T=JS&PAGE=main&MODE=ovid&D=infoz HTTP/1.1" 200 2039 "http://www.library.upenn.edu/webbin5/resources/databases.cgi?business" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)"
203.197.226.240 - - [04/Feb/2001:00:17:41 -0500] "GET /etext/sasia/aiis/architecture/khajuraho/010.html HTTP/1.0" 200 4427 "http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/" "Mozilla/4.7 [en] (Win95; I)"
203.197.226.240 - - [04/Feb/2001:00:17:44 -0500] "GET /images/banner.gif HTTP/1.0" 404 2814 "http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/010.html" "Mozilla/4.7 [en] (Win95; I)"
pub237.lib.upenn.edu - - [04/Feb/2001:00:17:48 -0500] "GET / HTTP/1.0" 200 8070 "-" "WebTrends Alert"
dial-123-130.dial.indiana.edu - - [04/Feb/2001:00:18:02 -0500] "GET /special/photos/theater/504.html HTTP/1.0" 200 3247 "http://www.library.upenn.edu/special/photos/theater/503.html" "Mozilla/4.7C-CCK-MCD C-UDP EBM-APPLE (Macintosh; I; PPC)"
dialin1085.upenn.edu - - [04/Feb/2001:00:18:04 -0500] "GET /facilities/count_use.html?resource=China%20Economic%20Review&method=ejs&url=http://www.sciencedirect.com/science/journal/1043951X HTTP/1.0" 200 2027 "http://www.library.upenn.edu/webbin5/resources/ejspublic5.cgi?homepage=http://www.library.upenn.edu/lippincott/&community=Business" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; SPIKE 5)"
203.197.226.240 - - [04/Feb/2001:00:18:07 -0500] "GET /etext/sasia/aiis/architecture/khajuraho/010a.jpg HTTP/1.0" 200 89117 "http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/010.html" "Mozilla/4.7 [en] (Win95; I)"
17and information culled from databases that
generate our Web pages...
Æ http://www.uqtr.uquebec.ca/AE/index.html World History of Art F-T No 07-16-1999 1111 10-25-2000 1130
ABA Bank Compliance http://proquest.umi.com/pqdlink?Ver1Exp07-01-2003REQ3PUB14954Cert0CEccdp7aMS6kuCDmdhPNL2bQ2tTOLTrDEHAz2bYmHN172RUqZPCJ2SvATX2bFGA7htIYkVlFVWSyawE0NvKlpBZ2bO2f2bLEWBnchnwLT92b2fdGGHSlx0PO3dxUQd3g2S9QP2FghKaQ2ncl5EdDKBum2vykhvxsyRQutjuMGKfxAKHOA4-Penn ABI/Inform Business,Finance F-TPI No 03-13-2001 0001 03-14-2001 1131 mw
ABA Journal http://proquest.umi.com/pqdlink?Ver1Exp07-012003REQ3PUB27585CertPfySiFXf10i6kuCDmdhPNL2bQ2tTOLTrDEHAz2bYmHN172RUqZPCJ2SvATX2bFGA7ht1pGvDP2bFxrGwE0NvKlpBZ2bO2f2bLEWBnchnwLT92b2fdGGHSlx0PO3dxUQd3g2S9QP2FghKaQ2ncl5EdDKBum2vykhvxsyRQutjuAyIsegc4Y7Y-Penn ABI/Inform Finance F-TPI No 03-13-2001 0001 mw
ABI/Inform http://www.umi.com/pqdauto Penn Biomedical Research,Management,Business,Clinical Medicine,Clinical Medicine,Nursing,Economics,Health Care Policy Management F-TSDb No 07-16-1999 1111 02-09-2001 1214
18to extracting, parsing, storing, and mining for
significant content.
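The extract, parse, store, and mine steps can be sketched end to end on a few log entries. This is an illustration under stated assumptions, not the Data Farm's implementation: it assumes the logs follow the Apache combined log format with brackets, colons, and quotes intact, it uses an in-memory SQLite table where the schematic later in this presentation shows Oracle, and the `parse_line` and `load` helpers are hypothetical names.

```python
import re
import sqlite3

# Combined format: host ident user [time] "request" status bytes "referrer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Extract the fields of one log entry; return None if the line does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec

def load(lines):
    """Parse log lines and store the matching ones in an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits (host TEXT, time TEXT, path TEXT, status INTEGER, size INTEGER)"
    )
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            conn.execute(
                "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
                (rec["host"], rec["time"], rec["path"], rec["status"], rec["size"]),
            )
    return conn

log = [
    '203.197.226.240 - - [04/Feb/2001:00:17:41 -0500] '
    '"GET /etext/sasia/aiis/architecture/khajuraho/010.html HTTP/1.0" 200 4427 '
    '"http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/" '
    '"Mozilla/4.7 [en] (Win95; I)"',
    '203.197.226.240 - - [04/Feb/2001:00:17:44 -0500] '
    '"GET /images/banner.gif HTTP/1.0" 404 2814 '
    '"http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/010.html" '
    '"Mozilla/4.7 [en] (Win95; I)"',
    'pub237.lib.upenn.edu - - [04/Feb/2001:00:17:48 -0500] '
    '"GET / HTTP/1.0" 200 8070 "-" "WebTrends Alert"',
]
conn = load(log)

# Mine: successful requests and bytes served, per requesting host.
rows = conn.execute(
    "SELECT host, COUNT(*), SUM(size) FROM hits WHERE status = 200 "
    "GROUP BY host ORDER BY host"
).fetchall()
print(rows)
```

Once the entries sit in a table, questions such as "which hosts", "which pages", and "how many failures" become short queries instead of passes over raw text.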
19Use of Licensed Resources
What Databases Do Our Clients Use at What Cost?
15 Most Frequently Used Index/Abstract/Full-text
Databases in FY 2001
[Table columns: Database; Log-ins; Pct of Total; Cost Per Login]
20Use of Licensed Resources
What Are the High Use E-Journals, Data for FY2001
[Table columns: Title; Log-ins; Pct of Total Log-ins; Log-ins On Campus; Log-ins Off Campus]
21Use of Licensed Resources
How Much Bang Do We Get on the Dollar For
E-Journals?
E-Journal Subscription Costs Per Log-In, FY2002
(July-April)
Publisher               Log-ins   Pct of Total   Cost Per Login
ScienceDirect           139,727       27.1            0.63
ECO                      70,730       13.7            0.09
JSTOR                    48,668        9.4            0.35
Wiley                    38,255        7.4            0.09
ACS                      31,865        6.2            0.12
Ideal                    30,568        5.9            5.51
Blackwell/Munksgaard     28,940        5.6            0.27
Journals@Ovid            26,982        5.2            n/a
Oxford                   14,819        2.9            0.20
SpringerLINK             13,507        2.6            n/a
ABI/Inform               12,785        2.5            3.08
Project Muse             11,438        2.2            1.22
AIP                       7,873        1.5            5.01
Cambridge                 7,835        1.5            n/a
Annual Reviews            7,215        1.4            0.08
IEEE                      7,132        1.4            6.73
RSC                       5,661        1.1            n/a
Others                   11,451        2.2
Total                   515,451      100
Others: 11 publishers
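The two derived columns are simple ratios: a publisher's share is its log-ins divided by total log-ins, and cost per login is the subscription cost divided by log-ins. A minimal sketch, using three log-in counts and the total from this slide; the subscription cost in the last example is invented, since the slide reports only the resulting ratio.

```python
def pct_of_total(logins, total):
    """Share of all log-ins, as a percentage rounded to one decimal place."""
    return round(100 * logins / total, 1)

def cost_per_login(annual_cost, logins):
    """Subscription cost divided by log-ins; None when there was no use."""
    if logins == 0:
        return None
    return round(annual_cost / logins, 2)

# Log-in counts and total as reported on the slide.
logins = {"ScienceDirect": 139727, "ECO": 70730, "JSTOR": 48668}
TOTAL_LOGINS = 515451

shares = {name: pct_of_total(n, TOTAL_LOGINS) for name, n in logins.items()}
print(shares)  # {'ScienceDirect': 27.1, 'ECO': 13.7, 'JSTOR': 9.4}

# Hypothetical cost figure, for illustration only.
print(cost_per_login(1000.0, 400))  # 2.5
```

The computed shares agree with the Pct of Total column, which is a quick check that the log-in counts and the total are consistent.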
22Use of Licensed Resources
How Does Use Scatter Across Databases
Use Measured in Log-ins for FY 2001
23Database Use by Penn's Schools and Centers
Use of Licensed Resources
How Does Database Use Distribute By Communities?
[Table columns: School; Pct of Log-ins]
Per Capita Use of Databases by Penn's Schools and Centers, FY 2001
[Chart. Vertical axis: Log-ins Per Capita, 0 to 55. Horizontal axis: School and Center Domains (LAW, VET, ASC, MED, NUR, SAS, GSE, SSW, SEAS, GSFA, WHRT, ADM, DENTAL).]
Does not include resources licensed by the Law Library for Law school affiliates.
24Use of Licensed Resources
Database and E-Journal Log-ins by Subject (based on log samples from FY2001)
Subject focus
Network Domain        Human.   Life Science   Social Science   Business   Physical Science   Total
Administration         21.1       36.5            13.9            7.0          21.6          100.0
Wharton                 2.9       74.3             3.2           19.2           0.5          100.0
Annenberg              15.2       32.1            42.3            8.9           1.5          100.0
Medical                 2.3       86.0             1.9            1.0           8.8          100.0
Dental                  1.8       87.7             8.9            0.2           1.4          100.0
Veterinary              1.7       96.0             0.6            0.4           1.3          100.0
Dialin                  8.5       63.2             9.9           15.4           2.9          100.0
Education              24.6       13.1            61.5            0.8           0.0          100.0
Fine Arts              29.0       18.5            45.7            5.6           1.2          100.0
Law                    13.0       26.6            20.9           37.0           2.4          100.0
Library                21.3       54.8             9.1            8.5           6.3          100.0
Nursing                15.9       73.1             7.8            3.2           0.0          100.0
Student Residences     18.9       57.0            12.6            9.0           2.5          100.0
Arts and Sciences       8.2       26.3             5.7            9.9          49.9          100.0
Engineering             1.5       29.5             2.3            1.2          65.6          100.0
Social Work            20.6       29.1            41.6            6.1           2.7          100.0
Unresolved             18.9       44.7            17.8           10.0           8.6          100.0
Total                  14.7       50.7            11.9            8.6          14.1          100.0
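A table like this needs two steps: assigning each requesting host to a network domain, then turning the per-domain counts into row percentages. The sketch below illustrates both. The matching rules are assumptions made up for the example (only a few hostnames appear in the log samples, and the mapping actually used at Penn is not given), and the events are invented.

```python
from collections import Counter, defaultdict

# Illustrative rules only. Each rule is (kind, pattern, label), tried in order.
DOMAIN_RULES = [
    ("prefix", "dialin", "Dialin"),
    ("suffix", ".asc.upenn.edu", "Annenberg"),
    ("suffix", ".lib.upenn.edu", "Library"),
]

def network_domain(host):
    """Map a hostname to a network domain label, or 'Unresolved'."""
    host = host.lower()
    for kind, pattern, label in DOMAIN_RULES:
        if kind == "prefix" and host.startswith(pattern):
            return label
        if kind == "suffix" and host.endswith(pattern):
            return label
    return "Unresolved"

def row_percentages(events):
    """events: iterable of (host, subject). Returns {domain: {subject: pct}}."""
    counts = defaultdict(Counter)
    for host, subject in events:
        counts[network_domain(host)][subject] += 1
    table = {}
    for domain, by_subject in counts.items():
        total = sum(by_subject.values())
        table[domain] = {s: round(100 * n / total, 1) for s, n in by_subject.items()}
    return table

# Hypothetical log-in events: (requesting host, subject of the resource).
events = [
    ("dialin1085.upenn.edu", "Business"),
    ("pc1.asc.upenn.edu", "Social Science"),
    ("pc2.asc.upenn.edu", "Social Science"),
    ("pc3.asc.upenn.edu", "Humanities"),
    ("203.197.226.240", "Humanities"),
]
table = row_percentages(events)
print(table["Annenberg"])
```

Hosts that match no rule fall into "Unresolved", which corresponds to the Unresolved row in the table.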
25Use of Licensed Resources
Where Do Our Clients Access Information?
Database Log-ins by Domain, FY2001
Campus Residences: 10%
Off-Campus: 15%
In-Library: 25%
On-Campus Depts: 50%
26Use of Licensed Resources
Where Do Communities of Clients Work?
Database Log-ins from Off Campus as a Percent of
Total Log-ins, FY2001
[Chart. Axes: Pct. of Log-ins; School or Center]
27Use of Licensed Resources
When Are They Working?
Database Use by Time of Day, FY2001
28Use of Licensed Resources
How Does Audience Composition Change Through the
Day?
Database Use by hour, FY2001
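Time-of-day profiles come from binning each log-in's timestamp by hour. A minimal sketch, assuming timestamps in the Apache log style (day/month/year:hour:minute:second and a UTC offset) and counting hours in the server's local time; the `logins_by_hour` name and the third timestamp are hypothetical.

```python
from collections import Counter
from datetime import datetime

LOG_TIME = "%d/%b/%Y:%H:%M:%S %z"

def logins_by_hour(timestamps):
    """Return a 24-item list: the number of log-ins that fell in each hour of the day."""
    counts = Counter(datetime.strptime(ts, LOG_TIME).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]

stamps = [
    "04/Feb/2001:00:17:40 -0500",
    "04/Feb/2001:00:18:04 -0500",
    "04/Feb/2001:14:05:00 -0500",  # invented afternoon log-in
]
by_hour = logins_by_hour(stamps)
print(by_hour[0], by_hour[14])  # 2 1
```

Running the same count separately for each network domain would give the audience composition for every hour.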
29The Data Farm Experiment: Tools That Serve Information Access Can Also Serve Measurement
30Schematic of the Data Farm As of May 2002
31Data Farm Processes
[Diagram labels: Scripts Server; Oracle; logs; Staff Client; Server array; Voyager; DLXS]
32Perils of the MIS Prototype: Lessons Learned
Normalize the Data
Regularize the Migration of Logs from Production Machines
Manage the Storage
Maintain the Scripts
Standardize Processes: program modules, plug-in scripts
Optimize Usability
33Why Are the Data Important?
"If you don't know where you're going, you'll probably end up somewhere else." - Casey Stengel
To Demonstrate Accountability
Is the library spending the Schools' money effectively? (Pressures of Penn's responsibility center budget environment)
To Understand and Describe the Transfer of Technology
Is the academic information universe a digital universe (as some at Penn believe)?
Is the digital universe more cost efficient than the paper one (as some at Penn believe)?
To Guide the Improvement of Existing and the Development of New Services
To Ensure the Successful Fulfillment of Our Mission
34Through the Bytes Darkly,
Management Information and the Digital Library
Joe Zucca
University of Pennsylvania Library
zucca_at_pobox.upenn.edu
35 Return-Path: <olson_at_pobox.upenn.edu>
Subject: Again, testing general databases
To: sblack_at_asc.upenn.edu
Date: Wed, 10 Apr 2002 16:54:11 -0400 (EDT)
From: olson_at_pobox.upenn.edu
Dear Sharon -- Just a second quick note begging
you, please, keep trying to look at those three
databases! Data farm usage logs indicate that
one-quarter of all database logins from Annenberg
IP addresses in 2001 were pointing to Academic
Index (followed by Lexis-Nexis and PsycInfo, both
with about 10-percent of all Annenberg database
logins). Also, 15-percent of all Academic Index
school-based logins last year came from Annenberg
IP addresses, more than from all schools except
Arts and Sciences (at 30-percent). Considering
how much Annenberg people use the general
database -- and you must know best how they can
raise Holy Ned over the least change, I hope that
you can find the time to check out the three
candidate databases. I'm happy to come over and
walk you through the log-in.
36[Chart. Log-ins and Reshelves for Journal of the American Chemical Society, Journal of Organic Chemistry, and Tetrahedron Letters]
39ScienceDirect Articles Viewed, FY 2001
40Academic Press (Ideal) Articles Viewed, FY 2001