1 High-Performance Digital Library Classification Systems
From Information Retrieval to Knowledge Management
PI: Hsinchun Chen, The University of Arizona
DLI-2 All-Projects Meeting
Cornell, October 18-19, 1999
2 Research Goals
- Automatic generation of large-scale
classification systems (CL)
- Integration of system and human-generated
classification systems
- High-performance simulation and visualization of
Object Oriented Hierarchical Automatic Yellowpage
(OOHAY)
3 Testbed
- Geoscience: Georef and Petroleum Abstracts
(800K) and Georef thesaurus (26K terms)
- Medicine: CancerLit (1M) and
UMLS (250K concepts)
- The Web: indexable pages (10M)
and Yahoo directory (250K nodes)
4 Partners
5 The Field
Knowledge Management/Knowledge Networking
Definition
The Knowledge Networking (KN) initiative focuses
on the integration of knowledge from different
sources and domains across space and time... KN
research aims to move beyond connectivity to
achieve new levels of interactivity, increasing
the semantic bandwidth, knowledge bandwidth,
activity bandwidth, and cultural bandwidth among
people, organizations, and communities.
6 The Field
Knowledge Management Functionality
(Source: GartnerGroup, 1998)
Concept Yellow Pages
Retrieved Knowledge
Semantic
- Clustering: categorization (table of contents)
- Semantic networks (index)
- Dictionaries
- Thesauri
- Linguistic analysis
- Data extraction
Collaboration
- Collaborative filters
- Communities
- Trusted advisor
- Expert identification
Value
- Recommendation
7-13 Automatic Generation of CL: Foundation from
NSF/DARPA/NASA Digital Library Initiative-1
14 Automatic Generation of CL
15 Automatic Generation of CL (Continued)
- Entity extraction and co-reference based on TREC
and MUC
- Text segmentation and summarization based on
TextTiling and wavelets
- Visualization techniques based on Fisheye,
Fractal, and Spotlight
16 Integration of CL
- Lexicon-enhanced indexing (e.g., UMLS Specialist
Lexicon)
- Ontology-enhanced query expansion (e.g.,
WordNet, UMLS Metathesaurus)
- Ontology-enhanced semantic tagging (e.g., UMLS
Semantic Network)
- Spreading-activation-based term suggestion
(e.g., Hopfield net)
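The last bullet can be illustrated with a minimal sketch of Hopfield-style spreading activation over a weighted term network. This is one common formulation, not necessarily the project's implementation: the function name `suggest_terms`, the toy weights, the sigmoid thresholds `theta_j` and `theta_0`, and the choice to clamp seed terms at full activation are all assumptions made for illustration.

```python
import math


def suggest_terms(weights, seeds, theta_j=0.5, theta_0=0.1,
                  eps=1e-3, max_iter=50, top_k=5):
    """Rank non-seed terms by activation spread from the seed terms.

    weights[i][j] is the (hypothetical) association weight of link i -> j.
    """
    # Collect every term that appears as a source or a target of a link.
    terms = set(weights)
    for neighbors in weights.values():
        terms.update(neighbors)
    terms.update(seeds)

    # Initialization: seed terms fully active, everything else inactive.
    mu = {t: (1.0 if t in seeds else 0.0) for t in terms}

    for _ in range(max_iter):
        new_mu = {}
        for j in terms:
            if j in seeds:
                new_mu[j] = 1.0  # assumption: seeds stay clamped
                continue
            # Weighted sum of the activations flowing into term j.
            net = sum(weights.get(i, {}).get(j, 0.0) * mu[i] for i in terms)
            # Sigmoid transfer function squashes the input into (0, 1).
            new_mu[j] = 1.0 / (1.0 + math.exp(-(net - theta_j) / theta_0))
        change = sum(abs(new_mu[t] - mu[t]) for t in terms)
        mu = new_mu
        if change < eps:  # stop once activations have stabilized
            break

    ranked = sorted((t for t in terms if t not in seeds),
                    key=lambda t: mu[t], reverse=True)
    return [(t, mu[t]) for t in ranked[:top_k]]


# Toy term network (made-up weights, for illustration only).
toy_weights = {
    "cancer": {"tumor": 0.9, "oncology": 0.8},
    "tumor": {"biopsy": 0.7},
    "weather": {"rain": 0.9},
}
for term, activation in suggest_terms(toy_weights, {"cancer"}, top_k=3):
    print(term, round(activation, 3))
```

Terms reachable from the seed through strong links end up with high activation, while unrelated terms stay near zero, so the top-ranked terms can be offered to the user as suggestions.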
17 High-Performance Simulation and Visualization
- Algorithmic optimization and parallelization on
NCSA supercomputers (time machine)
- Advanced, interactive 2D/3D visualization via
Java, VRML, and OpenGL
18 From YAHOO! to OOHAY?
Y A H O O !
Object Oriented Hierarchical Automatic Yellowpage ?
19-20 OOHAY: Visualizing the Web
21 For project information and free download:
http://ai.bpa.arizona.edu
OOHAY: CI Spider, Meta Spider, Med Spider
1. Enter Starting URLs and Key Phrases to be
searched
2. Search results from spiders are displayed
dynamically
3. Noun phrases are extracted from the web pages,
and the user can select preferred phrases for
further summarization.
4. A self-organizing map (SOM) is generated based on
the phrases selected. Steps 3 and 4 can be repeated
to refine the results.
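Step 4 can be sketched as follows, assuming each page is represented as a binary vector over the phrases the user selected in step 3. This is a minimal, generic self-organizing map in plain Python, not the spiders' actual code: the names `phrase_vectors`, `train_som`, and `best_matching_unit`, the grid size, and the linear decay schedules are all illustrative assumptions.

```python
import math
import random


def phrase_vectors(doc_phrases, selected):
    """Encode each document as a 0/1 vector over the selected phrases."""
    return [[1.0 if p in phrases else 0.0 for p in selected]
            for phrases in doc_phrases]


def best_matching_unit(grid, vector):
    """Return the grid node whose weights are closest to the vector."""
    return min(grid, key=lambda node: sum((w - x) ** 2
                                          for w, x in zip(grid[node], vector)))


def train_som(vectors, rows=3, cols=3, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass
            bmu = best_matching_unit(grid, v)
            for node, w in grid.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighborhood
                for k in range(dim):
                    # Pull the node's weights toward the input vector.
                    w[k] += lr * h * (v[k] - w[k])
    return grid


# Toy input: phrases found on four pages (made-up data).
selected = ["digital library", "classification",
            "neural network", "self-organizing map"]
pages = [
    {"digital library", "classification"},
    {"digital library", "classification"},
    {"neural network", "self-organizing map"},
    {"neural network"},
]
vectors = phrase_vectors(pages, selected)
som = train_som(vectors, rows=2, cols=2, epochs=50)
for i, v in enumerate(vectors):
    print("page", i, "-> map cell", best_matching_unit(som, v))
```

Pages with the same selected phrases land in the same map cell. Changing the selected phrases and retraining corresponds to the iteration over steps 3 and 4.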
23 Digital Library Research in The New York Times: cover
article, Sep 30, 1999
24 DL Special Issues and Activities
- Second Asia DL Workshop, November 8-9, 1999,
Taipei, Taiwan
- JASIS, 2000, forthcoming (Chen)
Berkeley (Wilensky), UCSB (Hill/Smith), Maryland
(Greene/Shneiderman), Xerox PARC (Baldonado), IBM
(Liu), Texas A&M (Shipman/Furuta), NASA (Kaplan)