Organizing the Web: Semiautomatic Construction of a Faceted Scheme


1
Organizing the Web: Semi-automatic
Construction of a Faceted Scheme
  • Kiduk Yang, Elin K. Jacob, Aaron Loehrlein,
    Seungmin Lee, Ning Yu
  • School of Library and Information Science,
    Indiana University, USA

2
Outline
  • Introduction
  • Construction of a Faceted Scheme
  • Generalized Approach to Constructing a Faceted
    Vocabulary

3
Introduction: Background
  • WIDIT (Web Information Discovery Integrated Tool)
  • IU-SLIS research group
  • Research area
  • Information Retrieval, Classification, Fusion
  • Major Projects
  • CSKD (Classification-based Search and Knowledge
    Discovery)
  • DGov (Digital Government)
  • TREC (Text REtrieval Conference)
  • VCoB (Virtual Collection Builder)
  • http://widit.slis.indiana.edu/

4
Introduction: CSKD Overview
  • Aim
  • Dynamic, flexible, and effective information
    retrieval and knowledge discovery
  • Assumptions
  • Path to knowledge is not deterministic
  • Individual weaknesses and complementary strengths
    of evidence, methods, and processes
  • Approaches
  • Leveraging of multiple sources of evidence
  • Integration of information retrieval and
    knowledge organization methods
  • Combination of automatic and manual processes

5
Introduction: CSKD Architecture
(architecture diagram; component labels follow)
  • Metadata Scheme Construction
  • Metadata Indexing
  • Content Indexing
  • Faceted Vocabulary Construction
  • Manual Classification
  • Heuristics Discovery
  • Inverted Index Creation
  • RDF Scheme Creation
  • Knowledge Base Harvesting
  • Automatic Classification
  • Hybrid Classification
  • Free Text Search
  • Database Search
  • Static Ontology Search and Browse
  • Dynamic Ontology Search and Browse
  • Dynamic Query Refinement
  • Integrated Search and Browse
  • Flexible Knowledge Organization

6
Faceted Scheme: Intro
  • Problems with unstructured Web search
  • too many or too few search results
  • failure of free-text searching to retrieve by
    concept
  • Renewed interest in traditional systems of
    representation and organization
  • classification/categorization
  • thesauri/controlled vocabularies
  • metadata (e.g. digital libraries)
  • ontologies (e.g. Semantic Web)

7
Faceted Scheme: Intro
  • Impossibility of organizing entire Web
  • quantity, diversity, and dynamic nature of
    resources
  • scale-up problem of machine learning approaches
  • text categorization based on static
    classification scheme
  • Requires an organizational approach that
  • provides for flexibility of representation
  • accommodates dynamic nature of human knowledge
  • responds to the information needs of diverse and
    interdisciplinary searchers

8
Faceted Scheme: Intro
  • Problems with traditional enumerative systems
  • top-down (data-independent) approach
  • fixed groupings (definition, membership)
  • inability to respond to dynamic nature of Web
    collections
  • Advantages of faceted systems
  • bottom-up (data-driven) approach
  • groupings created on an as-needed basis
  • dynamic and responsive to change

9
Faceted Scheme Construction
  • Overview of Faceted Scheme construction
  • Faceted Vocabulary
  • identify characteristics (color) and values (red,
    green) relevant to a domain
  • organize characteristics (facets) and associated
    values (isolates) as independent concept
    hierarchies
  • determine relationships between concept
    hierarchies
  • Faceted Classification
  • establish citation order for combining facets
  • create classes (and class structure) based on
    resource collection
  • Dynamic Faceted Classification
  • modify or adapt citation order to create
    customized classes (class structure) based on
    the user's immediate information need
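
A minimal sketch of how a citation order combines facets into classes, and how changing the order yields a different class structure. The resources, the facet names (`medium`, `process`), and the `classify` helper are hypothetical illustrations, not part of the presented system.

```python
from collections import defaultdict

# Hypothetical resources, each described by facet -> isolate (value) pairs.
RESOURCES = {
    "doc1": {"medium": "air", "process": "monitoring"},
    "doc2": {"medium": "air", "process": "control"},
    "doc3": {"medium": "water", "process": "monitoring"},
}


def classify(resources, citation_order):
    """Build classes by combining facet values in the given citation order."""
    classes = defaultdict(list)
    for name, facets in resources.items():
        # The class label lists isolates in citation order, e.g. ("air", "control").
        label = tuple(facets[facet] for facet in citation_order)
        classes[label].append(name)
    return dict(classes)


# Static classification: one fixed citation order.
by_medium = classify(RESOURCES, ["medium", "process"])
# Dynamic classification: the order is adapted to the searcher's need.
by_process = classify(RESOURCES, ["process", "medium"])
process_only = classify(RESOURCES, ["process"])
```

Classes exist only for facet combinations that actually occur in the collection, which mirrors the bottom-up, data-driven character of a faceted scheme.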

10
Faceted Scheme Construction
  • Obstacles to Faceted Scheme construction
  • time-consuming
  • intellectually demanding
  • lack of standardized procedures
  • Research objectives
  • to reduce intellectual resources required to
    construct faceted schemes
  • to identify faceted scheme construction
    procedures that can be standardized
  • Approach
  • to develop semi-automatic process that augments
    the cognitive strengths of the human with the
    automatic processing capabilities of the machine

11
Faceted Vocabulary: Methodology
  • Create lexicon base
  • Manually construct the faceted vocabulary
  • Analyze the manual construction process to
    identify cognitive strategies used
  • Construct heuristics for automating cognitive
    strategies
  • Suffix Heuristic
  • WordNet Heuristic
  • Concept Pairs Heuristic
  • Evaluate, modify, and validate heuristics
  • Devise a semi-automatic approach to constructing
    the faceted vocabulary that combines automatic
    (machine) and manual (human) processes.

12
Faceted Vocabulary: Heuristics
  • Input: lexicon base
  • Output: groupings of conceptually related terms
  • Suffix Heuristic
  • organizes terms based on common word endings
  • steps
  • identify suffixes and meanings in dictionary
  • identify domain-specific suffixes and meanings
  • organize and conflate suffixes by meaning
  • apply suffix structure to generate groupings
  • validate output and refine heuristic if needed

example: see slides 17 and 18
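
The suffix steps above can be sketched roughly as follows. The suffix table, its shortened meaning labels, and the crude plural handling are assumptions made for illustration; the actual suffix inventory used in the project is not given in the slides.

```python
from collections import defaultdict

# Hypothetical suffix table: suffix -> (specific meaning, general meaning).
# Labels loosely echo the example slides; the table itself is an assumption.
SUFFIX_MEANINGS = {
    "ide": ("binary chemical compounds", "chemicals, chemical compounds"),
    "ium": ("chemical elements", "chemicals, chemical compounds"),
    "yl": ("chemical radicals", "chemicals, chemical compounds"),
    "ene": ("unsaturated carbon compounds", "chemicals, chemical compounds"),
}


def strip_plural(term):
    """Crude plural normalization so 'cyanides' matches the suffix 'ide'."""
    return term[:-1] if term.endswith("s") and len(term) > 3 else term


def group_by_suffix(terms, level=0):
    """Group terms by suffix meaning: level 0 = specific, 1 = general (conflated)."""
    groups = defaultdict(list)
    # Try longer suffixes first so the longest matching ending wins.
    suffixes = sorted(SUFFIX_MEANINGS, key=len, reverse=True)
    for term in terms:
        stem = strip_plural(term.lower())
        for suffix in suffixes:
            if stem.endswith(suffix):
                groups[SUFFIX_MEANINGS[suffix][level]].append(term)
                break
    return dict(groups)


lexicon = ["Bromide", "Cyanides", "Cadmium", "Methyl", "Benzene", "Ride", "Scene"]
specific = group_by_suffix(lexicon, level=0)
general = group_by_suffix(lexicon, level=1)
```

Purely orthographic matching also pulls in non-chemical terms such as "Ride" and "Scene", which is why the final step calls for validating the output and refining the heuristic.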
13
Faceted Vocabulary: Heuristics
  • WordNet Heuristic
  • groups terms by their position in the WordNet
    category hierarchy
  • act → action → change → change of magnitude
  • activity → occupation → accountancy
  • steps
  • submit terms from lexicon base to WordNet
  • group terms based on common WordNet category
  • validate output and refine heuristic if needed
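
A rough sketch of grouping by position in a category hierarchy. To stay self-contained it uses a small hand-built `PARENT` map as a hypothetical stand-in for WordNet, containing only the two chains shown on this slide; a real run would look terms up in WordNet itself.

```python
from collections import defaultdict

# Hypothetical stand-in for WordNet: each term maps to its parent category.
# Only the two chains from the slide are encoded; this map is an assumption.
PARENT = {
    "change of magnitude": "change",
    "change": "action",
    "action": "act",
    "accountancy": "occupation",
    "occupation": "activity",
}


def category_path(term):
    """Return the chain of categories from the top category down to the term."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def group_by_category(terms, depth):
    """Group terms by the category found at the given depth of their path."""
    groups = defaultdict(list)
    for term in terms:
        path = category_path(term)
        # A term whose path is shorter than depth is grouped under itself.
        key = path[depth] if depth < len(path) else path[-1]
        groups[key].append(term)
    return dict(groups)


lexicon = ["change of magnitude", "change", "accountancy", "occupation"]
groups = group_by_category(lexicon, depth=1)
```

The `depth` parameter is an illustrative choice: a shallower depth yields broader groupings, a deeper one yields narrower groupings.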

14
Faceted Vocabulary: Heuristics
  • Concept Pairs Heuristic
  • groups pairs of terms from noun phrases that
    share a common term
  • air pollution, water pollution, soil pollution →
    air, water, soil
  • pollution control, pollution monitoring →
    control, monitoring
  • steps
  • identify noun phrases from the lexicon base
  • group noun phrases sharing a common term based on
    position
  • strip out the common term
  • validate output and refine heuristic if needed
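
The concept pairs steps can be sketched as below, using the slide's own pollution phrases. This is a simplified illustration that assumes the noun phrases are already extracted and handles only two-word phrases; the `concept_pairs` function is a hypothetical helper.

```python
from collections import defaultdict


def concept_pairs(noun_phrases):
    """Group two-word noun phrases by shared term and position, then strip it."""
    shared_last = defaultdict(list)   # common term in second position
    shared_first = defaultdict(list)  # common term in first position
    for phrase in noun_phrases:
        words = phrase.split()
        if len(words) != 2:
            continue  # this sketch only handles two-word phrases
        first, second = words
        shared_last[second].append(first)
        shared_first[first].append(second)
    # Keep only groups where the common term occurs in at least two phrases.
    return (
        {term: group for term, group in shared_last.items() if len(group) > 1},
        {term: group for term, group in shared_first.items() if len(group) > 1},
    )


phrases = [
    "air pollution", "water pollution", "soil pollution",
    "pollution control", "pollution monitoring",
]
by_last, by_first = concept_pairs(phrases)
```

Keeping the two positions separate matters: terms that precede "pollution" (air, water, soil) form a different candidate grouping from terms that follow it (control, monitoring).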

15
Faceted Vocabulary Construction: Generalized Model
16
Questions?
17
Organize suffixes/terms by specific meaning
  • entities
  • chemicals, chemical compounds
  • Carbon
  • Chlorofluorocarbons
  • Hydrochlorofluorocarbons
  • binary chemical compounds, compounds regarded as
    binary
  • Bromide
  • Chloride
  • Cyanide
  • Cyanides
  • Monoxide
  • Oxides
  • Radionuclides
  • Ride
  • chemical elements, chemical radicals, ions having
    a positive charge
  • Cadmium
  • Uranium
  • chemical radicals
  • Biphenyls
  • Butyl
  • Methyl
  • unsaturated carbon compounds
  • Benzene
  • Ethylbenzene
  • Scene
  • Styrene
  • Toluene
  • unsaturated hydrocarbons, bivalent radicals
  • Dichloroethylene
  • Perchloroethylene
  • Tetrachloroethylene
  • Trichloroethylene

18
Conflate suffixes/terms by general meaning
  • entities
  • chemicals, chemical compounds
  • Benzene
  • Biphenyls
  • Bromide
  • Butyl
  • Cadmium
  • Carbon
  • Chloride
  • Chlorofluorocarbons
  • Cyanide
  • Cyanides
  • Dichloroethylene
  • Ethylbenzene
  • Hydrochlorofluorocarbons
  • Methyl
  • Monoxide
  • Oxides
  • Perchloroethylene
  • Radionuclides
  • Ride
  • Scene
  • Styrene
  • Tetrachloroethylene
  • Toluene
  • Trichloroethylene
  • Uranium