Title: Distributed Databases and Applications
1Distributed Databases and Applications
- John Wieczorek
- Museum of Vertebrate Zoology, UC Berkeley
2Distributed Databases
- Multiple sources of data
- under local control,
- with concepts in common
- and a desire to deliver data as part of a
community.
3Distributed Databases
- The Species Analyst (TSA)
- The Integrated Taxonomic Information System
(ITIS)
- FishNet
- The Mammal Networked Information System (MaNIS)
- HerpNET
- The Ornithological Information System (ORNIS)
4Distributed Databases
- European Natural History Science Information
Network (ENHSIN)
- Biological Collection Access for Europe (BioCASE)
- Australia Virtual Herbarium (AVH)
- Red Mundial de Información Sobre Biodiversidad,
Comisión Nacional para el Conocimiento y Uso de
la Biodiversidad (REMIB, CONABIO)
5Distributed Databases
- Mountain and Plains Spatio-Temporal
Database-Informatics (MaPSTeDI)
- Ocean Biogeographic Information System (OBIS)
- Pacific Basin Information Node, National
Biological Information Infrastructure (PBIN,
NBII)
- Species Link, Centro de Referência em Informação
Ambiental (Species Link, CRIA)
- A Virtual Herbarium of the Chicago Region
(vPlants)
- Spatial Analysis of Local Vegetation Inventories
Across Scales (SALVIAS)
6Distributed Databases
- Berkeley Natural History Museums (BNHM)
- Association of Biological Collections, UC Davis
7Distributed Databases
- LifeMapper
- Global Biodiversity Information Facility (GBIF)
8Distributed vs. centralized
- Multiple sources of data
- under local control,
- with concepts in common
- and a desire to deliver data as part of a
community
9Distributed vs. centralized
- In other words, distribute the headache rather
than have one central migraine.
10DiGIR: Distributed Generic Information Retrieval
- John Wieczorek, Stan Blum, Dave Vieglais, P.J.
Schwartz
11Project Rationale
- To avoid multiple incongruous development efforts
- To pool resources and create a community of
experts
- To solve the problem of scalability
12Project Goals
- To define a protocol for retrieving structured
data from multiple, heterogeneous databases
across the Internet
- To build a reference implementation of both
provider and portal software using said protocol
13Design Goals
- To use open protocols and standards, such as HTTP
and XML
- To decouple the protocol, software and semantics
- To make new data provider installations as easy
as possible
- To have open source development and GNU General
Public Licensing
14DiGIR Architecture
- User Interface
- Protocol
- Portal Engine
- Provider
15DiGIR Architecture
20DiGIR Architecture
- User Interface
- Protocol
- Portal Engine
21DiGIR Architecture
- User Interface
- Protocol
- Portal Engine
- Protocol
- Provider
23DiGIR Architecture
- User Interface
- Protocol
- Portal Engine
24DiGIR Component Summary
25DiGIR Protocol
- Defines request and response message formats for
communication between provider, portal engine,
and user interfaces
- Metadata requests
- Search requests
- Inventory requests
- Remains unfettered by the structure of the data
it transfers
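The idea of a request envelope that stays independent of the data it carries can be sketched as follows. This is a minimal illustration only: the element names (`request`, `header`, `filter`, `equals`) and the helper functions are hypothetical and are not taken from the DiGIR schema.

```python
import xml.etree.ElementTree as ET

# The three request types named on the slide.
REQUEST_TYPES = {"metadata", "search", "inventory"}


def build_request(request_type, destination, filters=None):
    """Build a request document as an XML string.

    Element and attribute names are hypothetical, not the DiGIR schema.
    """
    if request_type not in REQUEST_TYPES:
        raise ValueError(f"unknown request type: {request_type}")
    root = ET.Element("request")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "type").text = request_type
    ET.SubElement(header, "destination").text = destination
    # Concept names and values travel as opaque data, so the envelope
    # does not depend on the structure of any provider's database.
    if filters:
        body = ET.SubElement(root, "filter")
        for concept, value in filters.items():
            ET.SubElement(body, "equals", concept=concept).text = str(value)
    return ET.tostring(root, encoding="unicode")


def parse_request(xml_text):
    """Return (type, destination, filters) from a request document."""
    root = ET.fromstring(xml_text)
    filters = {e.get("concept"): e.text for e in root.iter("equals")}
    return (
        root.findtext("header/type"),
        root.findtext("header/destination"),
        filters,
    )
```

Calling `build_request("search", "http://example.org/provider", {"Genus": "Peromyscus"})` yields a document that `parse_request` can read back without knowing anything about the provider's tables.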
26Portal Engine
- The entry point for a user
- Can query a registry for potential providers
- Can determine, based on provider metadata,
whether a provider should be queried
- Can send requests to multiple providers
- Communicates via protocol-compliant messaging only
27Portal Engine, continued
- Assembles responses from providers
- Returns packaged results to the user
- Logs activity
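The portal engine behavior listed on the two slides above (select providers by metadata, query several at once, assemble the responses, log activity) might look roughly like the sketch below. The provider dictionaries, the `should_query` rule, and the `send` callable are all assumptions made for illustration; `send` stands in for the real protocol exchange over HTTP.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger("portal")


def should_query(metadata, requested_concepts):
    """Decide from provider metadata whether the provider can answer."""
    return set(requested_concepts) <= set(metadata["concepts"])


def run_search(providers, requested_concepts, send):
    """Fan a request out to every suitable provider and merge the replies.

    providers: list of dicts with "name" and "concepts" keys (hypothetical).
    send: callable(provider, concepts) -> list of records; a stand-in for
          the protocol-compliant message exchange.
    """
    targets = [p for p in providers if should_query(p, requested_concepts)]
    log.info("querying %d of %d providers", len(targets), len(providers))
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {
            p["name"]: pool.submit(send, p, requested_concepts)
            for p in targets
        }
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # One failing provider should not sink the whole search.
                log.warning("provider %s failed: %s", name, exc)
                results[name] = []
    return results
```

Keeping the transport behind a single `send` callable is one way to keep the engine limited to protocol messages: the engine never touches a provider's database directly.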
28Provider
- Receives requests
- Retrieves data from database
- Sends results to requestor
- Supplies metadata to describe data classification
and availability
- Logs requests
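A provider's request cycle (receive, retrieve, respond, log) can be sketched with an in-memory SQLite database. The table layout, the `METADATA` fields, and the dictionary-shaped request are hypothetical; the actual DiGIR provider is PHP software and exchanges XML documents.

```python
import logging
import sqlite3

log = logging.getLogger("provider")

# Metadata a provider might advertise (hypothetical fields).
METADATA = {"name": "Example Collection", "concepts": ["Genus", "Species"]}


def make_demo_db():
    """Create a small in-memory database standing in for a local collection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE specimens (genus TEXT, species TEXT)")
    conn.executemany(
        "INSERT INTO specimens VALUES (?, ?)",
        [
            ("Peromyscus", "maniculatus"),
            ("Peromyscus", "truei"),
            ("Neotoma", "fuscipes"),
        ],
    )
    return conn


def handle_request(conn, request):
    """Log the request, then answer it from metadata or the database."""
    log.info("request received: %s", request)
    if request["type"] == "metadata":
        return METADATA
    if request["type"] == "search":
        rows = conn.execute(
            "SELECT genus, species FROM specimens "
            "WHERE genus = ? ORDER BY species",
            (request["genus"],),
        ).fetchall()
        # Results are returned under concept names, not column names.
        return [{"Genus": g, "Species": s} for g, s in rows]
    raise ValueError(f"unsupported request type: {request['type']}")
```

A search such as `handle_request(make_demo_db(), {"type": "search", "genus": "Peromyscus"})` returns the two matching records keyed by concept name.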
29Registry
- Supports provider advertising
- May be global and open
- May be private
- Need not be used at all
- Example: Universal Description, Discovery and
Integration (UDDI)
30User Interfaces
- Must be able to assemble and send a request
document to a portal
- Must be able to receive and interpret a response
document from the portal
- This is where the real fun is!
31Example Network Configurations
32BNHM Network Configuration
BNHM DiGIR Portal
BNHM Presentation Layer
33MaNIS Network Configuration
MaNIS DiGIR Portal
MaNIS DiGIR Portal
MaNIS DiGIR Portal
MaNIS Presentation Layer
MaNIS Presentation Layer
MaNIS Presentation Layer
34MaNIS Network Configuration
MaNIS DiGIR Portal
MaNIS DiGIR Portal
MaNIS DiGIR Portal
MVZ-MaNIS Presentation Layer
LACM-MaNIS Presentation Layer
UWBM-MaNIS Presentation Layer
39Other Network Configurations
40DiGing a little deeper
41Provider Installation
- Web server (Apache, IIS, etc.)
- PHP Hypertext Preprocessor (PHP)
- Provider software (DiGIR)
- Configuration tool
- Testing scripts
- Provider scripts
- Provider manual (DiGIR)
42Provider Configuration Tool
- Provider metadata
- Resources
- Database connection
- Establishing table relationships
- Concept to column (i.e., field, attribute) mapping
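Concept-to-column mapping is what lets a shared concept name be answered from a locally designed schema. The sketch below assumes a hypothetical two-table database (`specimens` and `taxa`) and a hand-written mapping; in practice the configuration tool would produce the equivalent information.

```python
# Hypothetical mapping from shared concept names to local table columns.
CONCEPT_MAP = {
    "Genus": "taxa.genus_name",
    "Species": "taxa.species_epithet",
    "CatalogNumber": "specimens.cat_num",
}

# Hypothetical table relationship established during configuration.
JOIN = "specimens JOIN taxa ON specimens.taxon_id = taxa.id"


def build_query(return_concepts, where):
    """Translate a concept-level query into SQL plus bound parameters.

    return_concepts: concept names to return.
    where: dict of concept name -> required value.
    """
    unknown = [
        c for c in list(return_concepts) + list(where) if c not in CONCEPT_MAP
    ]
    if unknown:
        raise KeyError(f"unmapped concepts: {unknown}")
    select = ", ".join(f'{CONCEPT_MAP[c]} AS "{c}"' for c in return_concepts)
    sql = f"SELECT {select} FROM {JOIN}"
    clauses = " AND ".join(f"{CONCEPT_MAP[c]} = ?" for c in where)
    if clauses:
        sql += f" WHERE {clauses}"
    # Values are passed separately so the database driver can bind them.
    return sql, list(where.values())
```

Because only `CONCEPT_MAP` and `JOIN` refer to local names, two providers with different schemas can answer the same concept-level request by supplying different mappings.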
43Portal Configuration
- Web server (Apache, IIS, etc.)
- Sun Java 2 (JDK 1.4)
- Tomcat (Apache)
- Portal software (DiGIR)
- Portal installation documentation (DiGIR)
44Portal Installation
- Engine configuration file (finding providers)
- Presentation configuration file (defining the
Information Domain)
- Presentation customization
- Engine start and stop scripts
- Presentation start and stop scripts
45Portal Demonstrations
46DiGIR Project Information
- The DiGIR project is a collaborative effort
- DiGIR is currently established as an open source
development project on SourceForge
(https://sourceforge.net/projects/digir).
- Further documentation is available on the DiGIR
web site (http://digir.net).