MARIAN: Searching and Querying Across Heterogeneous Federated Digital Libraries

1
MARIAN: Searching and Querying Across
Heterogeneous Federated Digital Libraries
  • Marcos André Gonçalves
  • Robert K. France
  • Edward A. Fox
  • Tamas E. Doszkocs
  • Work performed at Virginia Tech, Blacksburg, VA
    USA
  • Support provided in part by NSF and the National
    Library of Medicine.

2
JCDL 2001
  • First Joint ACM/IEEE Conference on
    Digital Libraries (plus NSF DLI-2 PI meeting)
  • http://www.jcdl.org
  • June 24-28, 2001 in Roanoke, VA
  • Conference Committee
  • General Chair: Edward A. Fox, Virginia Tech
  • Program Chair: Christine Borgman, UCLA
  • Treasurer: Neil Rowe, Naval Postgraduate School
  • Posters Chair: Craig Nevill-Manning, Rutgers U.

3
Outline
  • NDLTD
  • Harvesting Strategies and the OAI
  • MARIAN Middleware
  • Generating Digital Libraries with 5SL
  • Future Directions

4
NDLTD (1 of 3)
  • Context: Networked Digital Library of Theses and
    Dissertations, www.ndltd.org, www.theses.org
  • Please join! Submit your (students') works!
  • International federation of universities,
    libraries, supporting institutions (e.g., VTLS
    union catalog)
  • Extremely heterogeneous
  • Autonomy of management and decentralization
  • Disparate protocols, metadata, repositories
    (e.g., UMI, OCLC's WorldCat), languages,
    encodings, user characteristics and preferences

5
NDLTD (2 of 3)
  • Worldwide organization: educational/social
    context
  • National/regional projects in Australia,
    Catalunya, Germany, India, Latin America
    (UNESCO/OAS/ISTEC), South Africa (Mellon), USA
    (including OhioLINK),
  • International conference (225 in March 2000, more
    expected for next, at Caltech)
  • Steering committee representing supporting groups
    as well as the hundreds of universities

6
NDLTD (3 of 3)
  • Unique collection: discipline/document context
  • Multilingual and multimedia content
  • Large book-size documents
  • Full-content in several formats (XML, PDF, etc.)
  • Large number of bibliographic references
  • Several sets of metadata with different ranges of
    quality, that can fit with the Open Archives
    Initiative (www.openarchives.org)

7
Harvesting Strategies
  • Harvesting vs. Federated Search
  • Harvesting plus Federated Search
  • Plus local collections
  • The NDLTD Union Collection
  • Multiple Harvesting Protocols
  • Harvest System
  • Z39.50
  • Dienst
  • OAI

8
Union Collection Architecture
9
Open Archives Initiative (OAI)
  • Interoperability Standards Released - Jan/Feb
  • Data Providers and Service Providers
  • Metadata Harvesting Protocol
  • Unique identifiers (URNs) for each record
  • Date-stamp for each record when last
    modified/created/deleted
  • HTTP server with scripting capabilities
  • 6 service requests (verbs):
  • Identify, ListMetadataFormats, ListSets
  • ListIdentifiers, GetRecord, ListRecords
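
The six verbs are issued as ordinary HTTP requests, with the verb passed as a query parameter. A minimal sketch of building such request URLs follows. The base URL and class name are hypothetical, and the argument names (metadataPrefix, from) follow the protocol as later standardized in OAI-PMH, so they may differ in detail from the early-2001 release this slide refers to.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: builds OAI request URLs; it does not contact any server.
final class OaiRequestSketch {
    // Hypothetical data-provider endpoint, not taken from the slides.
    static final String BASE_URL = "http://example.org/oai";

    // Appends the verb and its arguments as URL-encoded query parameters.
    static String buildUrl(String verb, Map<String, String> args) {
        StringBuilder url = new StringBuilder(BASE_URL);
        url.append("?verb=").append(encode(verb));
        for (Map.Entry<String, String> e : args.entrySet()) {
            url.append('&').append(encode(e.getKey()));
            url.append('=').append(encode(e.getValue()));
        }
        return url.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported
        }
    }

    public static void main(String[] argv) {
        Map<String, String> none = new LinkedHashMap<String, String>();
        System.out.println(buildUrl("Identify", none));

        Map<String, String> args = new LinkedHashMap<String, String>();
        args.put("metadataPrefix", "oai_dc");
        args.put("from", "2001-01-01");
        System.out.println(buildUrl("ListRecords", args));
    }
}
```

A harvester would fetch each URL over HTTP and parse the XML response; that part is omitted here.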

10
[Figure: low-barrier interoperability umbrella; metadata. Credit: Herbert Van de Sompel]
11
OAI harvesting tools
  • Service provider: harvester
  • Data provider: repository
  • Records: datestamp, identifier, set
Credit: Herbert Van de Sompel
12
OAI harvesting tools
  • Service provider: harvester
  • Data provider: repository
  • Supporting protocol requests
  • Identify
  • ListMetadataFormats
  • ListSets
  • Harvesting protocol requests
  • ListRecords
  • ListIdentifiers
  • GetRecord

Credit: Herbert Van de Sompel
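
How a harvester might use datestamps and sets can be sketched with an in-memory stand-in for a data provider. Everything here (class names, sample identifiers, the listIdentifiers signature) is hypothetical; a real harvester would issue ListIdentifiers or ListRecords over HTTP rather than scan a local list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy stand-in for a data provider's repository (not a real OAI client).
final class HarvestSketch {
    static final class RecordHeader {
        final String identifier;
        final String datestamp; // ISO date, so string order equals date order
        final String set;

        RecordHeader(String identifier, String datestamp, String set) {
            this.identifier = identifier;
            this.datestamp = datestamp;
            this.set = set;
        }
    }

    // Returns identifiers of records stamped on or after `from`,
    // optionally restricted to one set (null means all sets).
    static List<String> listIdentifiers(List<RecordHeader> repo, String from, String set) {
        List<String> out = new ArrayList<String>();
        for (RecordHeader h : repo) {
            boolean recent = h.datestamp.compareTo(from) >= 0;
            boolean inSet = set == null || set.equals(h.set);
            if (recent && inSet) {
                out.add(h.identifier);
            }
        }
        return out;
    }

    // Invented sample data for illustration.
    static List<RecordHeader> sampleRepo() {
        return Arrays.asList(
            new RecordHeader("oai:example:etd-1", "2001-01-15", "theses"),
            new RecordHeader("oai:example:etd-2", "2001-03-02", "dissertations"),
            new RecordHeader("oai:example:etd-3", "2001-04-20", "theses"));
    }

    public static void main(String[] args) {
        // Incremental harvest: only records changed since the last run.
        System.out.println(listIdentifiers(sampleRepo(), "2001-03-01", null));
    }
}
```

Storing the date of the last successful harvest and passing it as `from` on the next run is what keeps repeated harvests incremental.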
13
Design Features
  • Combined Harvesting, Federated Search, and Local
    Collections
  • Object-Oriented Information Graph Representation
  • 5S Model and 5SL Specification Language

14
MARIAN Middleware
  • Flexible Representation Model
  • Information Graph
  • Class Hierarchies
  • Weights and Weighted Sets (with lazy evaluation)
  • Class-Based Search
  • Unified Searcher API
  • Combining Heterogeneous Information
  • Structural Matching
  • Synthetic Superclasses
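
One way to read "weighted sets with lazy evaluation" is that a result set yields its members in decreasing weight order only as the consumer asks for them. The sketch below merges two already-sorted result lists on demand. The class and method names are hypothetical, MARIAN's actual weighted-set classes are not shown in the slides, and duplicate objects across the two inputs are not merged here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class LazyMergeSketch {
    static final class Weighted {
        final String id;
        final double weight;

        Weighted(String id, double weight) {
            this.id = id;
            this.weight = weight;
        }
    }

    // Merges two lists that are each sorted by descending weight.
    // Work is done only when next() is called, one element at a time.
    static Iterator<Weighted> lazyMerge(final List<Weighted> a, final List<Weighted> b) {
        return new Iterator<Weighted>() {
            private int i = 0;
            private int j = 0;

            @Override public boolean hasNext() {
                return i < a.size() || j < b.size();
            }

            @Override public Weighted next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                boolean takeA = j >= b.size()
                    || (i < a.size() && a.get(i).weight >= b.get(j).weight);
                return takeA ? a.get(i++) : b.get(j++);
            }

            @Override public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    // Pulls at most k ids, leaving the rest of the merge unevaluated.
    static List<String> topK(Iterator<Weighted> it, int k) {
        List<String> ids = new ArrayList<String>();
        while (ids.size() < k && it.hasNext()) {
            ids.add(it.next().id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Weighted> a = Arrays.asList(new Weighted("a1", 0.9), new Weighted("a2", 0.4));
        List<Weighted> b = Arrays.asList(new Weighted("b1", 0.7), new Weighted("b2", 0.6));
        System.out.println(topK(lazyMerge(a, b), 3));
    }
}
```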

15
Information Graph Model (1/2)
  • Each Information Object is a Node.
  • Structure exposed through Links
  • Features of interest can become Nodes
  • or can remain Hidden within Node Class Search
    Methods.

16
Information Graph Model (2/2)
17
Class-Based Search
  • Common Search Methods
  • Text
  • Link / Weighted Link
  • Node in Context
  • Common Searcher Operations
  • Match Best (weighted maximum)
  • Match Most (summative union)

18
Class-Based Search
  • public interface ClassManager
  • public WtdObjSet match(InfoDesc description)
  • public boolean isInClass(FullID id)
  • public Object idToObject(FullID id)
  • public Vector idsToObjects(Vector ids)
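
Written out as Java, the interface on this slide might look like the sketch below. The four method signatures come from the slide, except that generic type parameters are added to Vector for type safety and the top-level `public` modifier is dropped so everything fits in one file. The bodies of FullID, InfoDesc, and WtdObjSet, and the toy TitleClassManager, are placeholders invented here so the example compiles; they are not MARIAN's actual classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Vector;

// Placeholder types: only their names appear on the slide.
final class FullID {
    final String value;
    FullID(String value) { this.value = value; }
    @Override public boolean equals(Object o) {
        return o instanceof FullID && ((FullID) o).value.equals(value);
    }
    @Override public int hashCode() { return value.hashCode(); }
}

final class InfoDesc {
    final String text;
    InfoDesc(String text) { this.text = text; }
}

final class WtdObjSet {
    final Map<FullID, Double> weights = new LinkedHashMap<FullID, Double>();
}

// Method names and parameters follow the slide.
interface ClassManager {
    WtdObjSet match(InfoDesc description);
    boolean isInClass(FullID id);
    Object idToObject(FullID id);
    Vector<Object> idsToObjects(Vector<FullID> ids);
}

// Hypothetical manager for a class whose objects are title strings.
final class TitleClassManager implements ClassManager {
    private final Map<FullID, String> titles = new LinkedHashMap<FullID, String>();

    void add(FullID id, String title) { titles.put(id, title); }

    public WtdObjSet match(InfoDesc description) {
        WtdObjSet result = new WtdObjSet();
        String needle = description.text.toLowerCase();
        for (Map.Entry<FullID, String> e : titles.entrySet()) {
            if (e.getValue().toLowerCase().contains(needle)) {
                result.weights.put(e.getKey(), 1.0); // crude all-or-nothing weight
            }
        }
        return result;
    }

    public boolean isInClass(FullID id) { return titles.containsKey(id); }

    public Object idToObject(FullID id) { return titles.get(id); }

    public Vector<Object> idsToObjects(Vector<FullID> ids) {
        Vector<Object> out = new Vector<Object>();
        for (FullID id : ids) {
            out.add(idToObject(id));
        }
        return out;
    }

    public static void main(String[] args) {
        TitleClassManager m = new TitleClassManager();
        m.add(new FullID("etd-1"), "Federated Digital Libraries");
        m.add(new FullID("etd-2"), "Belief Networks");
        System.out.println(m.match(new InfoDesc("digital")).weights.size());
    }
}
```

The point of the pattern is that each class of object owns its search method, so a caller can hand any ClassManager a description and get back a weighted set without knowing how matching is done.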

19
Class-Based Search
20
Combining Sources of Information
  • Structural Matching
  • Extends Weighted Retrieval to include Best Match
    to Document Structure
  • Recursive, Extensible
  • Collection Views
  • Simple Interface to Complex Collections
  • Common Interface to Diverse Collections
  • Weighted Interface to Collections of Varying
    Quality
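
A weighted collection view can be sketched as follows: each source class contributes its matches to the superclass, scaled by a weight on its subclass link, and the best scaled weight per object is kept. Multiplying by the link weight and combining by maximum are assumptions made for illustration, as are all names and sample values; the slides show link weights on the NDLTD view but do not state the combination rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CollectionViewSketch {
    // Lifts one source's hits into the superclass result: each weight is
    // scaled by the link weight, and the larger value wins on collision.
    static void lift(Map<String, Double> into, Map<String, Double> sourceHits, double linkWeight) {
        for (Map.Entry<String, Double> e : sourceHits.entrySet()) {
            double scaled = e.getValue() * linkWeight;
            Double prev = into.get(e.getKey());
            if (prev == null || scaled > prev) {
                into.put(e.getKey(), scaled);
            }
        }
    }

    public static void main(String[] args) {
        // Invented sample hits from a curated source and a crawled source.
        Map<String, Double> curated = new LinkedHashMap<String, Double>();
        curated.put("etd-1", 0.5);
        Map<String, Double> crawled = new LinkedHashMap<String, Double>();
        crawled.put("etd-1", 0.75);
        crawled.put("etd-2", 0.5);

        Map<String, Double> view = new LinkedHashMap<String, Double>();
        lift(view, curated, 1.0); // trusted metadata: full weight
        lift(view, crawled, 0.8); // lower-quality metadata: discounted
        System.out.println(view);
    }
}
```

Discounting by source lets a single query run over collections of varying quality while still ranking well-described records ahead of comparable matches from weaker metadata.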

21
NDLTD Collection View (part)
[Figure: part of the NDLTD collection view. Node and link labels in the diagram: ThesisDissertation, with title, description, HasAuthor (Individual), and HasSubject (Subject); SubClasses links carrying weights between 0.8 and 1.0; and PhysDis-ETD (SOIF), with dc.title, Dc.creator, dc.description, Dc.Subject, crawlerTitle, crawlerDescription, body, Headings, Keywords, HasDcCreator, HasCrawlerAuthor (Individual), HasDcSubject, HasHeadings, and HasKeywords.]
22
5S Model for Digital Libraries (1/2)
  • Formal Model
  • Streams
  • Structures
  • Spaces
  • Services
  • Societies

23
5S Model for Digital Libraries (2/2)
  • NDLTD / MARIAN Example
  • Document (presentable, indexable information
    object)
  • Weighted Set (e.g., of results to a match
    operation)
  • Collection Graph, Inheritance Lattice, Measure
    Space
  • Adaptive Search, Query History Maintenance
  • Library End-Users, DL Builders
  • Formal Model
  • Streams
  • Structures
  • Spaces
  • Services
  • Societies

24
5SL
  • Generates Digital Library (Components)

25
Generating Digital Libraries: XML
26
Interoperability with 5S and 5SL
  • Reductionist / Constructivist Approach
  • Compositional mappings between DLs
  • Composition of S-based constructs
  • Mapping language

27
Student Projects to Integrate
  • Schedule-driven Harvester
  • SDI / Filtering for NDLTD
  • MARIAN-Phronesis (Spanish, Monterrey) and work
    with German (Oldenburg / DFG), Portuguese,
    Chinese, Japanese, Korean
  • TREC data formatted for loading

28
Future Work
  • Fusion on hybrid architecture
  • Incorporation of belief networks
  • Using 5SL to generate wrappers
  • New services/ functionalities
  • Personalization (e.g., history, folders)
  • Visualization (e.g., Envision applet)
  • Integration with PetaPlex (100 nodes, 2.5 Tbytes
    disk capacity, > 300 Mbps to campus backbone,
    Sornil inversion)

29
Conclusions
  • NDLTD provides a real, fertile, DL testbed.
  • Harvesting strategies and the OAI
  • MARIAN middleware: graphs, classes, views
  • Generating Digital Libraries with 5SL
  • Future: high-performance services, experimental
    comparisons