Title: Database Applications -- The UC Berkeley Environmental Digital Library
1Database Applications -- The UC Berkeley
Environmental Digital Library
- University of California, Berkeley
- School of Information Management and Systems
- SIMS 257 Database Management
2Lecture Outline
- Review
- Database Administration
- Database Applications
- Berkeley's Environmental Digital Library
3Final Project Requirements
- See WWW site
- http://sims.berkeley.edu/courses/is257/f02/index.html
- Report on personal/group database including
- Database description and purpose
- Data Dictionary
- Relationships Diagram
- Sample queries and results (Web or Access tools)
- Sample forms (Web or Access tools)
- Sample reports (Web or Access tools)
- Application Screens (Web or Access tools)
4Final Presentations and Reports
- Specifications for final report are on the Web
site under assignments
- Presentations (1 on Nov. 28, others on Nov. 30,
Dec. 5th and 7th (Full))
5Lecture Outline
- Review
- Database Administration
- Database Applications
- Berkeley's Environmental Digital Library
6Terms and Concepts (trad)
- Data Administration
- Responsibility for the overall management of data
resources within an organization
- Database Administration
- Responsibility for physical database design and
technical issues in database management
- These roles are often combined or overlapping in
some organizations
7Database System Life Cycle
Note: this is a different version of the life
cycle than the one discussed previously
8Database Planning: DA & DBA functions
- Develop corporate database strategy (DA)
- Develop enterprise model (DA)
- Develop cost/benefit models (DA)
- Design database environment (DA)
- Develop data administration plan (DA)
9Database Analysis: DA & DBA functions
- Define and model data requirements (DA)
- Define and model business rules (DA)
- Define operational requirements (DA)
- Maintain corporate Data Dictionary (DA)
10Database Design: DA & DBA functions
- Perform logical database design (DA)
- Design external models (subschemas) (DBA)
- Design internal model (Physical design) (DBA)
- Design integrity controls (DBA)
11Database Implementation: DA & DBA functions
- Specify database access policies (DA & DBA)
- Establish Security controls (DBA)
- Supervise Database loading (DBA)
- Specify test procedures (DBA)
- Develop application programming standards (DBA)
- Establish procedures for backup and recovery
(DBA)
- Conduct User training (DA & DBA)
12Operation and Maintenance: DA & DBA functions
- Monitor database performance (DBA)
- Tune and reorganize databases (DBA)
- Enforce standards and procedures (DBA)
- Support users (DA & DBA)
13Growth & Change: DA & DBA functions
- Implement change control procedures (DA & DBA)
- Plan for growth and change (DA & DBA)
- Evaluate new technology (DA & DBA)
14Functions in Database Administration
- Planning and Design (we have already looked at
these processes in detail)
- Data Integrity
- Backup and Recovery
- Security Management
15Data Integrity
- Intrarecord integrity (enforcing constraints on
contents of fields, etc.)
- Referential Integrity (enforcing the validity of
references between records in the database)
- Concurrency control (ensuring the validity of
database updates in a shared multiuser
environment)
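The first two kinds of integrity can be sketched with Python's built-in sqlite3 module. This is only an illustration, not the course's or the project's schema: the agency and report tables, their columns, and the sample rows are all hypothetical. A CHECK constraint stands in for intrarecord integrity and a foreign key for referential integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema, invented for this sketch.
conn.executescript("""
CREATE TABLE agency (
    agency_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE report (
    report_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    pages     INTEGER CHECK (pages > 0),                      -- intrarecord integrity
    agency_id INTEGER NOT NULL REFERENCES agency(agency_id)   -- referential integrity
);
""")

with conn:  # commit the valid starting rows
    conn.execute("INSERT INTO agency VALUES (1, 'Example Agency')")
    conn.execute("INSERT INTO report VALUES (1, 'Example Report', 120, 1)")


def try_insert(sql):
    """Run one INSERT; report whether the DBMS accepted or rejected it."""
    try:
        with conn:  # rolls back automatically if the statement fails
            conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Violates the CHECK constraint on a field value.
bad_field = try_insert("INSERT INTO report VALUES (2, 'Bad pages', -5, 1)")
# Refers to an agency that does not exist.
bad_reference = try_insert("INSERT INTO report VALUES (3, 'Orphan', 10, 99)")
report_count = conn.execute("SELECT COUNT(*) FROM report").fetchone()[0]

print(bad_field)
print(bad_reference)
print(report_count)
```

Both invalid rows are refused by the database itself, so the rule holds no matter which application issues the update.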
16Database Security
- Views or restricted subschemas
- Authorization rules to identify users and the
actions they can perform
- User-defined procedures (and rule systems) to
define additional constraints or limitations in
using the database
- Encryption to encode sensitive data
- Authentication schemes to positively identify a
person attempting to gain access to the database
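A minimal sketch of the first two mechanisms, assuming a hypothetical staff table and a hand-written rule table. SQLite has no GRANT statement, so the authorization rules here are an ordinary Python dictionary rather than a DBMS feature. The view plays the part of a restricted subschema by leaving out the sensitive column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical base table; salary is the sensitive column.
CREATE TABLE staff (
    staff_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    dept     TEXT NOT NULL,
    salary   INTEGER NOT NULL
);
INSERT INTO staff VALUES (1, 'Ada', 'Library', 70000);
INSERT INTO staff VALUES (2, 'Grace', 'GIS', 82000);

-- Restricted subschema: the view omits the salary column.
CREATE VIEW staff_directory AS
    SELECT staff_id, name, dept FROM staff;
""")

# Hypothetical authorization rules: user -> allowed (object, action) pairs.
AUTHORIZATION_RULES = {
    "clerk": {("staff_directory", "SELECT")},
    "hr_manager": {("staff_directory", "SELECT"), ("staff", "SELECT")},
}


def is_authorized(user, obj, action):
    """True if the rule table lets this user perform this action on obj."""
    return (obj, action) in AUTHORIZATION_RULES.get(user, set())


def read_table(user, obj):
    """Return (column names, rows) for obj, or raise if the user lacks rights."""
    if not is_authorized(user, obj, "SELECT"):
        raise PermissionError(f"{user} may not SELECT from {obj}")
    # obj has already been matched against the fixed rule table above,
    # so it is one of the known object names and not free-form input.
    cursor = conn.execute(f"SELECT * FROM {obj}")
    columns = [d[0] for d in cursor.description]
    return columns, cursor.fetchall()


columns, rows = read_table("clerk", "staff_directory")
print(columns)
```

The clerk sees names and departments through the view, while a request for the base table raises PermissionError before any query runs.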
17Database Backup and Recovery
- Backup
- Journaling (audit trail)
- Checkpoint facility
- Recovery manager
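How the four facilities fit together can be sketched with a toy in-memory store. This is a simplified model under stated assumptions, not how any particular DBMS implements recovery: the TinyStore class and its method names are invented for illustration, and a real journal would be written to stable storage rather than kept in a Python list.

```python
import copy


class TinyStore:
    """Toy key-value store with a backup image, a journal, and recovery."""

    def __init__(self):
        self.data = {}          # the "live" database
        self.backup_image = {}  # copy taken at the last checkpoint
        self.journal = []       # audit trail of updates since that checkpoint

    def put(self, key, value):
        # Journaling: record the change before applying it.
        self.journal.append((key, value))
        self.data[key] = value

    def checkpoint(self):
        # Checkpoint: save a consistent image and start a fresh journal.
        self.backup_image = copy.deepcopy(self.data)
        self.journal.clear()

    def crash(self):
        # Simulate losing the live database.
        self.data = {}

    def recover(self):
        # Recovery manager: restore the backup, then replay the journal.
        self.data = copy.deepcopy(self.backup_image)
        for key, value in self.journal:
            self.data[key] = value


store = TinyStore()
store.put("dams", 1395)
store.checkpoint()
store.put("images", 183000)  # made after the checkpoint, so only in the journal
store.crash()
store.recover()
print(store.data)
```

After recovery the store holds both entries: the first comes from the backup image and the second from replaying the journal.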
18Disaster Recovery Planning
From Toigo, Disaster Recovery Planning
19Threats to Assets and Functions
- Water
- Fire
- Power Failure
- Mechanical breakdown or software failure
- Accidental or deliberate destruction of hardware
or software
- By hackers, disgruntled employees, industrial
saboteurs, terrorists, or others
20Threats
- Between 1967 and 1978, fire and water damage
accounted for 62% of all data processing
disasters in the U.S.
- The water damage was sometimes caused by fighting
fires
- More recently, improvements in fire suppression
(e.g., Halon) for DP centers have meant that water
is the primary danger to DP centers
21Kinds of Records
- Class I: VITAL
- Essential, irreplaceable or necessary to recovery
- Class II: IMPORTANT
- Essential or important, but reproducible with
difficulty or at extra expense
- Class III: USEFUL
- Records whose loss would be inconvenient, but
which are replaceable
- Class IV: NONESSENTIAL
- Records which upon examination are found to be no
longer necessary
22Offsite Storage of Data
- Early offsite storage facilities were often
intended to survive atomic explosions
- PRISM International directory
- Mirror sites (Hot sites)
- E.g. Cantor-Fitzgerald
23Lecture Outline
- Review
- Database Administration
- Database Applications
- Berkeley's Environmental Digital Library
24Berkeley DL Project
- Object Relational Database Applications
- The Berkeley Digital Library Project
- Slides from RRL and Robert Wilensky, EECS
- Use of DBMS in DL project
25Overview
- What is a Digital Library?
- Overview of Ongoing Research on Information
Access in Digital Libraries
26Digital Libraries Are Like Traditional
Libraries...
- Involve large repositories of information
(storage, preservation, and access)
- Provide information organization and retrieval
facilities (categorization, indexing)
- Provide access for communities of users
(communities may be as large as the general
public or as small as the employees of a particular
organization)
27Traditional Library System
28But Digital Libraries Are Different From
Libraries...
- Not a physical location with local copies;
objects held closer to originators
- Decoupling of storage, organization, access
- Enhanced Authoring (origination, annotation,
support for work groups)
- Subscription, pay-per-view supported in addition
to free browsing.
- Integration into user tasks.
29A Digital Library Infrastructure Model
30UC Berkeley Digital Library Project
- Focus: Work-centered digital information
services
- Testbed: Digital Library for the California
Environment
- Research: Technical agenda supporting
user-oriented access to large distributed
collections of diverse data types.
- Part of the NSF/NASA/DARPA Digital Library
Initiative (Phases 1 and 2)
31UCB Digital Library Project Research
Organizations
- UC Berkeley EECS, SIMS, CED, IST
- UCOP/CDL
- Xerox PARC's Document Image Decoding group and
Work Practices group
- Hewlett-Packard
- NEC
- SUN Microsystems
- IBM Almaden
- Microsoft
- Ricoh California Research
- Philips Research
32Testbed: An Environmental Digital Library
- Collection: Diverse material relevant to
California's key habitats.
- Users: A consortium of state agencies,
development corporations, private corporations,
regional government alliances, educational
institutions, and libraries.
- Potential: Impact on state-wide environmental
system (CERES)
33The Environmental Library - Users/Contributors
- California Resources Agency, California
Environment Resources Evaluation System (CERES)
- California Department of Water Resources
- The California Department of Fish & Game
- SANDAG
- UC Water Resources Center Archives
- New Partners: CDL and SDSC
34The Environmental Library - Contents
- Environmental technical reports, bulletins, etc.
- County general plans
- Aerial and ground photography
- USGS topographic maps
- Land use and other special purpose maps
- Sensor data
- Derived information
- Collection databases for the classification and
distribution of the California biota (e.g.,
SMASCH)
- Supporting 3-D, economic, traffic, etc. models
- Videos collected by the California Resources
Agency
35The Environmental Library - Contents
- As of late 2002, the collection represents over
one terabyte of data, including over 183,000
digital images, about 300,000 pages of
environmental documents, and over 2 million
records in geographical and botanical databases.
36Botanical Data
- The CalFlora Database contains taxonomical and
distribution information for more than 8000
native California plants. The Occurrence Database
includes over 600,000 records of California plant
sightings from many federal, state, and private
sources. The botanical databases are linked to
the CalPhotos collection of California plants,
and are also linked to external collections of
data, maps, and photos.
37Geographical Data
- Much of the geographical data in the collection
has been used to develop our web-based GIS
Viewer. The Street Finder uses 500,000 TIGER
records of S.F. Bay Area streets along with
70,000 records from the USGS GNIS database.
California Dams is a database of information
about the 1395 dams under state jurisdiction. An
additional 11 GB of geographical data represents
maps and imagery that have been processed for
inclusion as layers in our GIS Viewer. This
includes Digital Ortho Quads and DRG maps for the
S.F. Bay Area.
38Documents
- Most of the 300,000 pages of digital documents
are environmental reports and plans that were
provided by California state agencies. This
collection includes documents, maps, articles,
and reports on the California environment
including Environmental Impact Reports (EIRs),
educational pamphlets, water usage bulletins, and
county plans. Documents in this collection come
from the California Department of Water Resources
(DWR), California Department of Fish and Game
(DFG), San Diego Association of Governments
(SANDAG), and many other agencies. Among the most
frequently accessed documents are County General
Plans for every California county and a survey of
125 Sacramento Delta fish species.
39Testbed Success Stories
- LUPIN: CERES Land Use Planning Information
Network
- California County General Plans and other
environmental documents.
- Enter at Resources Agency Server, documents
stored at and retrieved from UCB DLIB server.
- California flood relief efforts
- High demand for some data sets only available on
our server (created by document recognition).
- CalFlora: Creation and interoperation of
repositories pertaining to plant biology.
- Cloning of services at Cal State Library, FBI
40Research Highlights
- Documents
- Multivalent Document prototype
- Page images, structured documents, GIS data,
photographs
- Intelligent Access to Content
- Document recognition
- Vision-based Image Retrieval: stuff, thing, scene
retrieval
- Natural Language Processing: categorizing the
web, Cheshire II, TileBar Interfaces
41Multivalent Documents
- MVD Model
- radically distributed, open, extensible
- behaviors and layers
- behaviors conform to a protocol suite
- inter-operation via IDEG
- Applied to enlivening legacy documents
- various nice behaviors, e.g., lenses
42Document Presentation
- Problem: Digital libraries must deliver digital
documents -- but in what form?
- Different forms have advantages for particular
purposes
- Retrieval
- Reuse
- Content Analysis
- Storage and archiving
- Combining forms (Multivalent documents)
43Spectrum of Digital Document Representations
Adapted from Fox, E.A., et al. Users, User
Interfaces and Objects: Envision, an Electronic
Library, JASIS 44(8), 1993
44Document Representation: Multivalent Documents
- Primary user interface/document model for UCB
Digital Library (Wilensky & Phelps)
- Goal: An approach to new document representations
and their authoring.
- Supports active, distributed, composable
transformations of multimedia documents.
- Enables sophisticated annotations, intelligent
result handling, user-modifiable interface,
composite documents.
45Multivalent Documents
48MVD availability
- The MVD Browser is now available as open source
on SourceForge
- http://sourceforge.net/project/showfiles.php?group_id=44509
- See also
- http://http.cs.berkeley.edu/phelps/Multivalent/
49GIS in the MVD Framework
- Layers are georeferenced data sets.
- Behaviors are
- display semi-transparently
- pan
- zoom
- issue query
- display context
- spatial hyperlinks
- annotations
- Written in Java
50GIS Viewer Features
- Annotation and saving
- points, rectangles (w. labels and links), vectors
- saving of annotations as separate layer
- Integration with address, street finding,
gazetteer services
- Application to image viewing: tilePix
- Castanet client
54GIS Viewer Example
http://elib.cs.berkeley.edu/annotations/gis/buildings.html
55Geographic Information: Plans and Ideas
- More annotations, flexible saving
- Support for large vector data sets
- Interoperability
- On-the-fly
- conversion of formats
- generation of catalogs
- Via OGDI/GLTP
- Experimenting with various CERES servers
56Documents: Information from scanned documents
- Built document recognizers for some important
documents, e.g., Bulletin 17, TR-9.
- Recognized document structure, with
order-of-magnitude better OCR.
- Automatically generated 1395-item dam relational
database.
- Enabled access via forms, map interfaces.
- Enabled interoperation with image DB.
60Document Recognition: Ongoing Work
- Document recognizers for a dozen document types
- Development and integration of mathematical OCR
and recognition.
- Eventually produce a document recognizer generator,
i.e., make it easier to write recognizers.
61Vision-Based Image Retrieval
- Stuff-based queries: blobs
- Basic blobs: colors, sizes, variable number
- demonstrated utility for interesting queries
- Blobworld: above plus texture, applied to
- retrieving similar images
- successful learning scene classifier
- Thing-finding: Successfully deployed detectors
adding body plans (adding shape, geometry and
kinematic constraints)
62Image Retrieval Research
- Finding Stuff vs Things
- BlobWorld
- Other Vision Research
63(Old stuff-based image retrieval Query)
64(Old stuff-based image retrieval Result)
65Blobworld: use regions for retrieval
- We want to find general objects, so represent
images based on coherent regions
68(Thing-based image retrieval using body
plans Result)
69Natural Language Processing
Automatic Topic Assignment
- Developed automatic categorization/disambiguation
method to point where topic assignment (but not
disambiguation) appears feasible.
- Ran controlled experiment
- Took Yahoo as ground truth.
- Chose 9 overlapping categories; took 1000 web
pages from Yahoo as input.
- Result: 84% precision, 48% recall (using top 5
of 1073 categories)
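For reference, precision and recall for a top-k topic assignment can be computed as in the sketch below. The category names and ranking are made up for illustration and are not data from the experiment. The function names are likewise hypothetical. The code assumes the classifier returns a ranked list of categories and that the ground truth is a set of correct categories.

```python
def precision_recall(assigned, correct):
    """Precision = hits / assigned; recall = hits / correct."""
    assigned, correct = set(assigned), set(correct)
    hits = len(assigned & correct)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


def top_k(ranked_categories, k=5):
    """Keep only the k highest-ranked categories."""
    return ranked_categories[:k]


# Hypothetical ranked output for one page and its ground-truth categories.
ranked = ["Science", "Biology", "Recreation", "Botany", "Health", "Arts"]
truth = {"Science", "Botany", "Ecology", "Geography"}

precision, recall = precision_recall(top_k(ranked, k=5), truth)
print(precision, recall)  # 0.4 0.5
```

Two of the five assigned categories are correct (precision 2/5), and two of the four true categories are found (recall 2/4). Keeping more of the ranked list usually trades precision for recall.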
70Further Information
- Berkeley DL web site
- http://elib.cs.berkeley.edu