Title: Nomadic Digital Library Research at Cornell
1. The Impact of the Internet on Research Universities: Examples from Distance Education and Digital Libraries
William Y. Arms, Department of Computer Science, Cornell University
2. Universities and Cost
In 1978, a Cornell education cost one Chevrolet
per year. In 2001, a Cornell education costs one
BMW per year. Every year, costs have gone up
faster than average income.
The costs of research universities are dominated
by personnel. Major reductions in unit costs
require different use of personnel.
3. Technology in Education and Distance Education
By creative use of technology: Can we teach more students, to a high level, with fewer faculty per student?
4. Technology in Education
Technology            Example                       Date
History
  Time sharing        Dartmouth Basic               1964
  Television          Open University               1972
  Personal computers  Apple University Consortium   1984
  Campus networks     Carnegie Mellon Andrew        1986
Current
  Internet            Digital libraries             1991
  Web                 Distance learning             2000
5. Course Web Sites
6. eCornell
For-profit, non-degree executive and professional courses
7. Technology in Education and Distance Education
Question 1: Quality
Is it good education?
8. Skepticism
In a recent survey by JSTOR of faculty in social sciences and humanities, only 17% thought that distance education was as good as conventional campus-based education. (Preliminary data; please do not quote.)
What is the evidence?
9. The British Open University
Distance education: Students at home, with limited access to tutors, summer schools.
Technology used as appropriate: Printed materials, home experimental kits, videos, computing, etc.
Academic standards: Full degree programs, external control of quality.
Longevity: First students in 1972.
10. The Open University
Currently 215,000 students.
Over 2 million students since 1972.
Ranked in the top 10 of all UK universities for teaching quality.
Ranked after Cambridge, York, Oxford, Imperial College, London School of Economics, Warwick, University College London, Durham and Sheffield.
(Higher Education Funding Council, 1997)
11. Technology in Education and Distance Education
Question 2: Capital Intensive Education
What are the organizational options?
12. Capital Intensive Education
Conventional course
-- Major cost is faculty time.
-- Costs are repeated every year.
Technology in education and distance education
-- Course materials are a major expense.
-- Marginal cost of delivering course is low.
Consequences
-- Economies of scale
-- Universities need access to capital
-- Course materials are an asset
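The economies-of-scale point can be sketched as a cost-per-student calculation. This is only an illustration: the function name and all dollar figures below are hypothetical and do not come from the talk.

```python
def cost_per_student(fixed_cost: float, marginal_cost: float, students: int) -> float:
    """Average cost per student: one-time materials cost spread over the
    enrollment, plus the per-student cost of delivery."""
    if students <= 0:
        raise ValueError("students must be positive")
    return fixed_cost / students + marginal_cost


# Hypothetical figures, chosen only to show the shape of the curve.
MATERIALS_COST = 1_000_000   # assumed up-front cost of course materials
DELIVERY_COST = 100          # assumed marginal cost per student

for n in (100, 1_000, 10_000):
    print(n, cost_per_student(MATERIALS_COST, DELIVERY_COST, n))
```

With a large fixed cost and a small marginal cost, the average falls steeply as enrollment grows, which is why such courses need capital up front and large audiences.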
13. Columbia University, Cambridge University Press, London School of Economics, New York Public Library, University of Chicago, University of Michigan, British Library, American Film Institute, RAND, Woods Hole, Victoria and Albert Museum, Science Museum, Natural History Museum
15. Technology in Education and Distance Education
Question 3: Ownership and Intellectual Property
If course materials are assets, who owns them?
16. Recommendations of a Cornell Committee
1. The university policies on intellectual property should be independent of the media in which ideas are expressed.
2. Creators of works should have control over the intellectual output resulting from their research, teaching, and writing.
3. When there are multiple creators of an individual work, the control should be shared among the creators.
4. When the university contributes substantial resources to the development of specific materials, it has a right to share in the control and returns.
17. MIT to make nearly all course materials available free on the World Wide Web
Unprecedented step challenges 'privatization of knowledge'
CAMBRIDGE, Mass. -- MIT President Charles M. Vest has announced that the Massachusetts Institute of Technology will make the materials for nearly all its courses freely available on the Internet over the next ten years. He made the announcement about the new program, known as MIT OpenCourseWare (MITOCW), at a press conference at MIT on Wednesday, April 4th.
(MIT Press Release, April 4, 2001)
18. Digital Libraries
By creative use of technology: Can we build libraries that are of high quality at much lower costs?
19. Research Libraries are Expensive
library materials
buildings and facilities
staff
20. The Open Access Web
Before the web: Few people had access to scientific, medical, legal information.
With the web: Much high quality information is available with open access. Free services organize this information and provide access to it.
"Please can I use the web? I don't do libraries." (Anonymous Cornell student, circa 1996)
21. The Potential of Digital Libraries
open access
computers and networks
materials
staff
22. Digital Libraries
Question 1: Economic Models for Open Access
Who pays for open access?
23. A False Assumption
Incorrect thinking: The only incentive for creating information is to make money -- royalties to authors and profits for publishers.
Correct thinking: Many creators do not require revenue
-- Marketing and promotion
-- Government information
-- Academic research
They want their materials to be used.
24. Examples
Old                             New
Books in Print (subscription)   Amazon.com (advertising)
Medline (pay-by-use)            Grateful Med (external)
Journal (subscription)          ePrint archives (external)
Westlaw (pay-by-use)            Legal Information Institute (external)
Inspec (subscription)           Google (advertising)
25. Before You Ask ...
The open access information is sometimes a poor substitute.
Much good information is not available with open access.
But every year the proportion of important information that is available with open access increases.
26. Open Letter
We support the establishment of an
online public library that would provide the full
contents of the published record of research and
scholarly discourse in medicine and the life
sciences in a freely accessible, fully
searchable, interlinked form. Establishment of
this public library would vastly increase the
accessibility and utility of the scientific
literature, enhance scientific productivity, and
catalyze integration of the disparate communities
of knowledge and ideas in biomedical sciences.
27. Hypotheses for Scholarly Information
The dominant force is author pressure, which
emphasizes open access rather than closed access.
28. Digital Libraries
Question 2: Quality
What are the alternatives to peer review?
30. Observations about Peer Review
At its best, it is superb. At its worst, it validates junk.
Some topics can be reviewed from a paper, e.g., mathematics. Some topics cannot be reviewed from a paper, e.g., computer systems.
"Whatever you do, write a paper. Some journal will publish it." (Advice to young faculty member, University of Sussex, 1969)
31. Quality without Peer Review
How can readers recognize good quality materials?
How can publishers maintain high standards and let readers know?
How can a scientist build a reputation outside the traditional peer-reviewed journals?
A sample of one: William Y. Arms
32. Digital Libraries
Question 3: Brute Force Computing
How far can computers be used for the skilled tasks of professional librarianship?
33. Brute Force Computing
Few people really understand Moore's Law
-- Computing power doubles every 18 months
-- Increases 100 times in 10 years
-- Increases 10,000 times in 20 years
Simple algorithms plus immense computing power may outperform human intelligence
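The Moore's Law figures above follow from compounding the 18-month doubling. A minimal sketch of the arithmetic (the function name is only illustrative):

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Factor by which computing power grows if it doubles every
    `doubling_months` months for `years` years."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


print(round(growth_factor(10)))  # about 102, i.e. roughly 100 times
print(round(growth_factor(20)))  # about 10,321, i.e. roughly 10,000 times
```

Ten years is 6.67 doublings and twenty years is 13.33 doublings, so the slide's round numbers of 100 and 10,000 are close approximations.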
34. Brute Force Computing
Example: Creators of the world champion chess program (Deep Thought, later Deep Blue)
-- moderate chess players
-- simple tree-search algorithm
-- very, very fast computer hardware
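To show what a "simple tree-search algorithm" can look like, here is a minimal sketch of exhaustive game-tree search on a toy game. It is not the chess program's actual algorithm. The game (players alternately remove 1 to 3 stones, and whoever takes the last stone wins) and the function names are assumptions chosen for brevity.

```python
def can_win(stones: int) -> bool:
    """True if the player to move can force a win.

    Searches the whole game tree: a position is winning if some legal
    move leaves the opponent in a losing position.
    """
    return any(
        not can_win(stones - take)
        for take in (1, 2, 3)
        if take <= stones
    )


def best_move(stones: int):
    """Return a winning number of stones to take, or None if every move loses."""
    for take in (1, 2, 3):
        if take <= stones and not can_win(stones - take):
            return take
    return None


print(best_move(7))  # 3, leaving the opponent with 4 stones
print(best_move(8))  # None: every move leaves a winning position
```

The search encodes no strategy beyond the rules of the game. All of its playing strength comes from enumerating positions, which is the point of the slide.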
35. Example: Catalogs and Indexes
Catalog, index and abstracting records are very expensive when created by skilled professionals
-- only available for certain categories of material (e.g., monographs, scientific journals)
-- contain limited fields of information (e.g., no contents page)
-- restricted to static information
36. Equivalent Services
Information discovery: I used to be a heavy user of Inspec. Now I use Google instead.
Why are web search services the most widely used information discovery tools in universities today?
37. Thinking out of the Box
For information discovery, particularly with untrained users, automated indexing of full text is at least as effective as manually produced indexes and catalogs.
Demonstrated repeatedly in experiments going back to the original Cranfield experiments.
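Automated full-text indexing can be sketched as an inverted index built directly from document text, with no human cataloging. This is a minimal sketch, not any particular search service's method. The tokenizer, function names and sample documents are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text: str):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict) -> dict:
    """Map each term to the set of document ids whose full text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return ids of documents containing every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents, for illustration only.
docs = {
    1: "Digital libraries on the web",
    2: "Chess and tree search",
    3: "Web search services",
}
index = build_index(docs)
print(search(index, "web search"))  # {3}
```

Every word of every document becomes an access point, whereas a manually produced catalog record carries only a limited set of fields.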
38. Digital Libraries
Question 4: Automated Digital Libraries
What is the state of the art in automated digital libraries?
39. Automated Digital Libraries: Examples
Automatic indexing: Lycos, Infoseek, Altavista, Google, ...
Query matching: Vector methods (Salton)
Ranking importance: Google (Page and Brin)
Archiving: Internet Archive (Kahle)
Collection development: ResearchIndex (Lawrence)
Metadata extraction: Informedia (Wactlar)
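As an illustration of query matching with vector methods, the sketch below represents the query and each document as term-frequency vectors and ranks documents by cosine similarity. It is a simplified version of the vector space idea (no term weighting such as tf-idf), and the function names and sample documents are assumptions.

```python
import math
import re
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Term-frequency vector of a text, keyed by lower-cased term."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, docs: dict) -> list:
    """Return ids of documents sharing a term with the query, best match first."""
    q = tf_vector(query)
    scored = [(cosine(q, tf_vector(text)), doc_id) for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]


# Hypothetical documents, for illustration only.
docs = {
    "a": "digital libraries and digital archives",
    "b": "chess programs and tree search",
    "c": "libraries",
}
print(rank("digital libraries", docs))  # ['a', 'c']
```

Unlike exact Boolean matching, this returns partial matches in order of similarity, so a document need not contain every query term to be retrieved.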
40. Digital Libraries
Question 5: A National Science Library (NSDL)
Can we build a very low cost national science library using the methods of automated digital libraries?
41. One of Six Core Integration Demonstration Projects for the NSDL
42. How Big might the NSDL be?
The NSDL aims to be comprehensive -- all branches of science, all levels of education, very broadly defined.
Five year targets
-- 1,000,000 different users
-- 10,000,000 digital objects
-- 100,000 independent sites
Requires
-- low-cost, scalable technology
-- automated collection building and maintenance
43. Levels of Interoperability: Metadata Harvesting
Agreements on simple protocol and metadata standard(s)
Example: Metadata harvesting protocol of the Open Archives Initiative (MHP)
-- Moderate-quality services
-- Low cost of entry to participating sites
-- Moderately large numbers of loosely collaborating sites
Promising but still an emerging approach
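A harvesting request under such a protocol can be sketched as follows. The sketch assumes the request and response shapes of the later OAI-PMH 2.0 specification (the `ListRecords` verb with the `oai_dc` metadata prefix), which may differ from the version current at the time of the talk. The base URL and the sample response are hypothetical, and no network access is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC_NS = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a participating site."""
    if resumption_token:
        # A follow-up request carries only the token from the previous response.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml: str) -> list:
    """Pull the Dublin Core titles out of a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [element.text for element in root.iter(DC_NS + "title")]


# Hypothetical response from a hypothetical repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```

The low cost of entry is visible here: a site only has to answer simple HTTP requests with simple metadata records, and the harvesting service does the rest.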
44. Levels of Interoperability: Gathering
Robots gather collections automatically with no participation from individual sites
Examples: Web search services (e.g., Google), CiteSeer (a.k.a. ResearchIndex)
-- Restricted but useful services
-- Zero cost of entry to gathered sites
-- Very large numbers of independent sites
Only suitable for open access collections
45. Technology Demonstrations
1. One Library, Many Portals
2. Coherent Services across Heterogeneous Collections
3. Easy Integration of Participating Collections
4. Variable Levels for Integrating Collections
5. Tools to Create New Collections
46. Some Light Reading
William Y. Arms, "Automated digital libraries." D-Lib Magazine, July/August 2000. http://www.dlib.org/dlib/july20/07contents.html
William Y. Arms, "Economic models for open-access publishing." iMP, March 2000. http://www.cisp.org/imp/march_2000/03_00arms.htm