Automatic Facets: Faceted Navigation and Entity Extraction - PowerPoint PPT Presentation

Description: Automatic Facets: Faceted Navigation and Entity Extraction. Tom Reamy, Chief Knowledge Architect, KAPS Group (Knowledge Architecture Professional Services).

Slides: 27
Provided by: TomR97

Transcript and Presenter's Notes


1
Automatic Facets: Faceted Navigation and Entity
Extraction
  • Tom Reamy, Chief Knowledge Architect
  • KAPS Group
  • Knowledge Architecture Professional Services
  • http://www.kapsgroup.com

2
Agenda
  • Introduction: Elements
  • Facets, Taxonomies, Software, People
  • 3 Environments
  • E-Commerce, Enterprise, Internet
  • Design Issues: Facets and Entities
  • Conclusion: Integrated Solution

3
KAPS Group: General
  • Knowledge Architecture Professional Services
  • Virtual Company: network of consultants (12-15)
  • Partners: Inxight, FAST, etc.
  • Consulting, Strategy, Knowledge architecture
    audit
  • Taxonomies: Enterprise, Marketing, Insurance,
    etc.
  • Services
  • Taxonomy development, consulting, customization
  • Technology Consulting: Search, CMS, Portals,
    etc.
  • Metadata standards and implementation
  • Knowledge Management: Collaboration, Expertise,
    e-learning
  • Applied Theory: Faceted taxonomies, complexity
    theory, natural categories

4
Elements
  • Facet: orthogonal dimension of metadata
  • Entity / Noun Phrase: metadata value of a facet
  • Entity extraction: feeds facets, signatures,
    ontologies
  • Taxonomy and categorization rules
  • Auto-categorization: aboutness, subject facets
  • People: tagging, evaluating tags, fine-tuning
    rules and taxonomy

5
Essentials of Facets
  • Facets are not categories
  • Categories are what a document is about: limited
    number
  • Entities are contained within a document: any
    number
  • Facets are orthogonal: mutually exclusive
    dimensions
  • An event is not a person is not a document is not
    a place.
  • Facets: variety of units, of structure
  • Numerical: range (price); Location: big to small
  • Alphabetical; Hierarchical: taxonomic
  • Facets are designed to be used in combination
  • Wine where color = red, price = excessive,
    location = California,
  • and sentiment = snotty
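
The wine example above combines several facet values at once. A minimal sketch of that idea follows, assuming a hypothetical `Wine` record, made-up sample data, and an arbitrary price threshold standing in for "excessive". None of these names come from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wine:
    # Each field is one facet: an independent dimension of metadata.
    name: str
    color: str
    price: float
    location: str
    sentiment: str


def facet_filter(items, **selected):
    """Keep items that satisfy every selected facet (facets combine with AND).

    A selection is either an exact value or a predicate function,
    so a numeric facet such as price can be tested as a range.
    """
    def satisfies(item, facet, test):
        value = getattr(item, facet)
        return test(value) if callable(test) else value == test

    return [
        item for item in items
        if all(satisfies(item, facet, test) for facet, test in selected.items())
    ]


# Hypothetical catalog entries, for illustration only.
wines = [
    Wine("Hypothetical Cabernet", "red", 120.0, "California", "snotty"),
    Wine("Hypothetical Merlot", "red", 15.0, "California", "friendly"),
    Wine("Hypothetical Chardonnay", "white", 130.0, "Oregon", "snotty"),
]

EXCESSIVE = 100.0  # assumed threshold for an "excessive" price

matches = facet_filter(
    wines,
    color="red",
    price=lambda p: p > EXCESSIVE,
    location="California",
    sentiment="snotty",
)
print([w.name for w in matches])  # ['Hypothetical Cabernet']
```

Because each facet is tested independently, adding or dropping a selection never requires restructuring the others.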

6
Advantages of Faceted Navigation
  • More intuitive: easy to guess what is behind
    each door
  • Simplicity of internal organization
  • 20 questions: we know and use
  • Dynamic selection of categories
  • Allow multiple perspectives
  • Ability to Handle Compound Subjects
  • Systematic Advantages: fewer elements
  • 4 facets of 10 nodes = 10,000 node taxonomy
  • Flexible: can be combined with other navigation
    elements
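
The "fewer elements" point is simple arithmetic: four facets of ten values each need 40 maintained nodes, yet they address as many combinations as a 10,000-node taxonomy. A short check of that figure, assuming the four facets are fully independent:

```python
from itertools import product
from math import prod

# Assumption: four independent facets with ten values each, as on the slide.
facet_sizes = [10, 10, 10, 10]

# Values someone has to define and maintain in the faceted design.
nodes_to_maintain = sum(facet_sizes)

# Distinct combinations a user can select, one value per facet.
combinations = prod(facet_sizes)

# Cross-check by enumerating every combination explicitly.
enumerated = sum(1 for _ in product(*(range(n) for n in facet_sizes)))

print(nodes_to_maintain, combinations, enumerated)  # 40 10000 10000
```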

7
Essentials of Taxonomies: Internal Organization
  • Formal Taxonomy: parent-child relationship
  • Is-A-Kind-Of ---- Animal - Mammal - Zebra
  • Partonomy: Is-A-Part-Of ---- US - California -
    Oakland
  • Browse Classification: cluster of related
    concepts
  • Food and Dining - Catering - Restaurants
  • Taxonomies deal with complex, not compound
  • Conceptual relationships: category membership
  • Contextual relationships: Computers - Software
  • Taxonomies deal with semantics: documents
  • Multiple meanings and purposes
  • Essential attributes of documents are not single
    value

8
Developing Facets: Tools and Techniques
Software Tools
  • Text Analytics: taxonomy management, entity
    extraction, categorization, sentiment
  • Search: integrated features, at index, Internet
    sources
  • CM: Enterprise environment, taggers and policy
  • Programmable Rules
  • Business and Subject matter expertise
  • Auto-populate variety of metadata: author,
    title, date, etc.
  • Relevance: best bets to weights and classes of
    documents
  • People: refine, monitor; it's not automatic

9
Developing Facets: Tools and Techniques
Software Tools: Auto-categorization
  • Auto-categorization
  • Training sets: Bayesian, Vector Machine
  • Terms: literal strings, stemming, dictionary of
    related terms
  • Rules: simple, position in text (Title, body,
    url)
  • Advanced: saved search queries (full search
    syntax)
  • NEAR, SENTENCE, PARAGRAPH
  • Boolean: X NEAR Y and Not-Z
  • Advanced Features
  • Facts / ontologies / Semantic Web: RDF
  • Sentiment Analysis: positive, negative, neutral
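
A rule of the form "X NEAR Y and Not-Z" can be illustrated with a small token-window check. This is only a sketch: the tokenizer, the default window of five tokens, and the function names are assumptions, and commercial categorization engines define NEAR, SENTENCE, and PARAGRAPH in their own ways.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(tokens, x, y, window=5):
    """True if some occurrence of x lies within `window` tokens of some y."""
    xs = [i for i, t in enumerate(tokens) if t == x]
    ys = [i for i, t in enumerate(tokens) if t == y]
    return any(abs(i - j) <= window for i in xs for j in ys)


def rule_matches(text, x, y, z, window=5):
    """Evaluate the rule: x NEAR y AND NOT z."""
    tokens = tokenize(text)
    return near(tokens, x, y, window) and z not in tokens


# Hypothetical documents for a hypothetical "marine toxins" category rule.
doc_a = "Marine toxins were detected in coastal water samples."
doc_b = "The novel mixes toxins, water, and fiction."
doc_c = ("Toxins are discussed at length in the first chapter, "
         "while the later chapters cover water.")

for doc in (doc_a, doc_b, doc_c):
    print(rule_matches(doc, "toxins", "water", "fiction"))
# True, then False (excluded term present), then False (terms too far apart)
```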

10
Developing Facets: Tools and Techniques
Software Tools: Entity Extraction
  • Dictionaries: variety of entities, coverage,
    specialty
  • Cost of update: service or in-house
  • Inxight: 50 predefined entity types
  • Nstein: 800,000 people, 700,000 locations,
    400,000 organizations
  • Rules
  • Capitalization, text: Mr., Inc.
  • Advanced: proximity and frequency of actions,
    associations
  • Need people to continually refine the rules
  • Entities and Categorization
  • Total number and pattern of entities = a type of
    aboutness of the document: Bar Code, Fingerprint
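
The two approaches named above, dictionaries and rules, can be combined in a few lines. The sketch below is illustrative only: the tiny gazetteer, the two regular expressions, and the sample sentence are all assumptions, and real extraction products use far larger dictionaries and more refined rules.

```python
import re
from collections import Counter

# Hypothetical dictionary (gazetteer) mapping known names to entity types.
GAZETTEER = {"Oakland": "location", "California": "location"}

# Rule: an honorific such as "Mr." followed by capitalized words is a person.
PERSON_RULE = re.compile(
    r"\b(?:Mr|Ms|Mrs|Dr)\.\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
)
# Rule: capitalized words ending in "Inc.", "Corp." or "Ltd." are an organization.
ORG_RULE = re.compile(r"\b((?:[A-Z][A-Za-z]+\s+)+(?:Inc|Corp|Ltd)\.)")


def extract_entities(text):
    """Return (type, value) pairs in the order they appear in the text."""
    hits = []
    for m in PERSON_RULE.finditer(text):
        hits.append((m.start(1), "person", m.group(1)))
    for m in ORG_RULE.finditer(text):
        hits.append((m.start(1), "organization", m.group(1)))
    for name, kind in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            hits.append((m.start(), kind, name))
    return [(kind, value) for _, kind, value in sorted(hits)]


text = "Mr. Ford met analysts from Acme Widgets Inc. in Oakland, California."
entities = extract_entities(text)
print(entities)

# The count of entities per type gives a crude "fingerprint" of the document.
signature = Counter(kind for kind, _ in entities)
print(signature)
```

Counting entities by type is one simple reading of the "bar code" idea; a fuller signature would also track which specific entities co-occur.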

11
Elements: People
  • Programmers, Librarians, Taxonomists, Metadata
    specialists
  • Integrate, design, develop rules, monitor
    activity quality
  • Authors, Subject Matter Experts
  • Input into design (important facets), rules,
    activity meaning
  • Users: Web 2.0
  • Feedback: quality and usability
  • Suggestions: missing terms, bad categorization
    entity
  • Tag Clouds: folksonomy for social networking
    features, not for information retrieval

12
Three Environments
  • E-Commerce
  • Catalogs, small uniform collections of entities
  • Uniform behavior: buy this
  • Enterprise
  • More content, more types of content
  • Enterprise Tools: Search, ECM
  • Publishing Process: tagging, metadata standards
  • Internet
  • Wildly different amount and type of content, no
    taggers
  • General Purpose: Flickr, Yahoo
  • Vertical Portal: selected content, no taggers

13
Three Environments: E-Commerce

14
Three Environments: E-Commerce

15
Enterprise Environment: When and how to add metadata
  • Enterprise Content: different world than
    eCommerce
  • More Content, more kinds, more unstructured
  • Not a catalog to start: less metadata and
    structured content
  • Complexity: not just content but variety of
    users and activities
  • Combination of human and automatic metadata: ECM
  • Software aided: suggestions, entities,
    ontologies
  • Enterprise: Question of Balance / strategy
  • More facets = more findability (up to a point)
  • Fewer facets = lower cost to tag documents
  • Issues
  • Not enough facets
  • Wrong set of facets: business not information
  • Ill-defined facets: too complex internal
    structure

16
Facets and Taxonomies: Enterprise Environment
Case One: Taxonomy, 7 facets
  • Taxonomy of Subjects / Disciplines
  • Science > Marine Science > Marine microbiology >
    Marine toxins
  • Facets
  • Organization > Division > Group
  • Clients > Federal > EPA
  • Instruments > Environmental Testing > Ocean
    Analysis > Vehicle
  • Facilities > Division > Location > Building X
  • Methods > Social > Population Study
  • Materials > Compounds > Chemicals
  • Content Type: Knowledge Asset > Proposals
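
Hierarchical facet values like the ones listed above can be stored as paths, so that selecting a broad node also matches everything beneath it. The following is a minimal sketch under that assumption; the function names and the sample document are hypothetical.

```python
def parse_path(path):
    """Turn 'A > B > C' into the tuple ('A', 'B', 'C')."""
    return tuple(part.strip() for part in path.split(">"))


def is_under(value, selected):
    """True if `value` equals `selected` or sits below it in the hierarchy."""
    return value[:len(selected)] == selected


# Hypothetical document tagged with one path per facet.
document = {
    "Subject": parse_path(
        "Science > Marine Science > Marine microbiology > Marine toxins"
    ),
    "Clients": parse_path("Clients > Federal > EPA"),
    "Content Type": parse_path("Knowledge Asset > Proposals"),
}


def matches(doc, selections):
    """All selected facets must match (AND), each by hierarchical prefix."""
    return all(
        facet in doc and is_under(doc[facet], parse_path(selected))
        for facet, selected in selections.items()
    )


print(matches(document, {"Subject": "Science > Marine Science",
                         "Clients": "Clients > Federal"}))   # True
print(matches(document, {"Clients": "Clients > State"}))      # False
```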

17
External Environment: Text Mining, Vertical
Portals
  • Internet Content
  • Scale: impacts design and technology, speed of
    indexing
  • Limited control: association of publishers to
    selection of content to none
  • Major subtypes: different rules, metadata and
    results
  • Complex queries and alerts
  • Terrorism: taxonomy, geography, people,
    organizations
  • Text Mining
  • General or specific content and facets and
    categories
  • Dedicated tools or component of Portal: internal
    or external
  • Vertical Portal
  • Relatively homogeneous content and users
  • General range of questions

18
Internet Design
  • Subject Matter taxonomy: Business Topics
  • Finance > Currency > Exchange Rates
  • Facets
  • Location > Western World > United States
  • People: Alphabetical and/or Topical -
    Organization
  • Organization > Corporation > Car Manufacturing >
    Ford
  • Date: Absolute or range (1-1-01 to 1-1-08, last
    30 days)
  • Publisher: Alphabetical and/or Topical -
    Organization
  • Content Type: list - newspapers, financial
    reports, etc.
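
The Date facet above mixes two kinds of selection: a fixed range and a rolling window such as "last 30 days". A small sketch of both follows, assuming the slide's "1-1-01 to 1-1-08" means 1 January 2001 through 1 January 2008 and using a fixed, made-up "today" so the result is reproducible.

```python
from datetime import date, timedelta


def in_absolute_range(d, start, end):
    """Fixed range facet: start and end dates are both inclusive."""
    return start <= d <= end


def in_last_days(d, days, today):
    """Rolling range facet: within the `days` days up to and including today."""
    return today - timedelta(days=days) <= d <= today


# Assumed reading of the slide's example range.
RANGE_START = date(2001, 1, 1)
RANGE_END = date(2008, 1, 1)

today = date(2008, 1, 15)          # hypothetical "current" date
published = date(2008, 1, 1)       # hypothetical document date

print(in_absolute_range(published, RANGE_START, RANGE_END))  # True
print(in_last_days(published, 30, today))                    # True
print(in_last_days(date(2007, 11, 1), 30, today))            # False
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the rolling window testable.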

19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
Integrated Facet Application: Design Issues -
General
  • What is the right combination of elements?
  • Faceted navigation, metadata, browse, search,
    categorized search results, file plan
  • What is the right balance of elements?
  • Dominant dimension or equal facets
  • Browse topics and filter by facet
  • When to combine search, topics, and facets?
  • Search first and then filter by topics / facet
  • Browse/facet front end with a search box
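
One of the combinations above, "search first and then filter by facet", can be sketched as follows. The documents, field names, and naive substring search are all hypothetical stand-ins for a real search engine.

```python
from collections import Counter

# Hypothetical indexed documents with two facets each.
docs = [
    {"text": "Currency exchange rates moved sharply",
     "location": "United States", "content_type": "newspaper"},
    {"text": "Quarterly exchange rate exposure",
     "location": "United States", "content_type": "financial report"},
    {"text": "New car models announced",
     "location": "United States", "content_type": "newspaper"},
    {"text": "Exchange rates and exports",
     "location": "Japan", "content_type": "newspaper"},
]


def search(documents, term):
    """Naive keyword search: case-insensitive substring match."""
    return [d for d in documents if term.lower() in d["text"].lower()]


def facet_counts(results, facet):
    """Count how many results carry each value of the given facet."""
    return Counter(d[facet] for d in results)


def filter_by(results, facet, value):
    """Narrow an existing result set to one facet value."""
    return [d for d in results if d[facet] == value]


results = search(docs, "exchange")
print(facet_counts(results, "location"))      # United States: 2, Japan: 1
narrowed = filter_by(results, "location", "United States")
print(facet_counts(narrowed, "content_type"))
```

Computing the counts from the current result set, not the whole collection, is what lets the interface show only facet values that still lead somewhere.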

23
Integrated Facet Application: Design Issues -
General
  • Homogeneity of Audience and Content
  • Model of the Domain: broad
  • How many facets do you need?
  • More facets and let users decide
  • Allow for customization: can't define a single
    set
  • User Analysis: tasks, labeling, communities
  • Issue: labels that people use to describe their
    business and labels that they use to find
    information
  • Match the structure to domain and task
  • Users can understand different structures

24
Automatic Facets: Special Issues
  • Scale requires more automated solutions
  • More sophisticated rules
  • Rules to find and populate existing metadata
  • Variety of types of existing metadata:
    Publisher, title, date
  • Multiple implementation standards: Last Name,
    First / First Name, Last
  • Issue of disambiguation
  • Same person, different name: Henry Ford, Mr.
    Ford, Henry X. Ford
  • Same word, different entity: Ford and Ford
  • Number of entities and thresholds per results set
    / document
  • Usability, audience needs
  • Relevance Ranking: number of entities, rank of
    facets
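
The name-format and "same person, different name" issues above can be illustrated with a small normalizer. This is a sketch under simplifying assumptions: it handles only the variants listed on the slide, the helper names are hypothetical, and it does not attempt the harder "same word, different entity" case (Ford the person versus Ford the company).

```python
import re

HONORIFICS = {"mr", "mrs", "ms", "dr"}


def normalize_name(raw):
    """Return (first, last); first is None when only a surname is given."""
    raw = raw.strip()
    # "Last, First" becomes "First Last".
    if "," in raw:
        last, first = [p.strip() for p in raw.split(",", 1)]
        raw = f"{first} {last}"
    # Drop honorifics such as "Mr.".
    parts = [p for p in raw.split() if p.rstrip(".").lower() not in HONORIFICS]
    # Drop middle initials such as "X." (never the first or last token).
    parts = [
        p for i, p in enumerate(parts)
        if not (0 < i < len(parts) - 1 and re.fullmatch(r"[A-Z]\.?", p))
    ]
    if len(parts) == 1:
        return (None, parts[0])
    return (parts[0], parts[-1])


def resolve(mentions):
    """Map each mention to a canonical name within one document.

    A surname-only mention is merged with a full name only when exactly
    one first name is seen for that surname; otherwise it is left as is.
    """
    normalized = [normalize_name(m) for m in mentions]
    firsts_by_last = {}
    for first, last in normalized:
        if first:
            firsts_by_last.setdefault(last, set()).add(first)
    resolved = []
    for first, last in normalized:
        if first is None:
            candidates = firsts_by_last.get(last, set())
            if len(candidates) == 1:
                first = next(iter(candidates))
        resolved.append(f"{first} {last}" if first else last)
    return resolved


print(resolve(["Henry Ford", "Mr. Ford", "Henry X. Ford", "Ford, Henry"]))
# ['Henry Ford', 'Henry Ford', 'Henry Ford', 'Henry Ford']
```

Leaving ambiguous surname-only mentions unresolved is a deliberately conservative choice; a wrong merge is harder to detect later than a missed one.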

25
Putting it all together: Infrastructure Solution
  • Facets, Taxonomies, Software, People
  • Combine formal power with ability to support
    multiple user perspectives
  • Facet System: interdependent, map of domain
  • Entity extraction: feeds facets, signatures,
    ontologies
  • Taxonomy and Auto-categorization: aboutness,
    subject
  • People: tagging, evaluating tags, fine-tuning
    rules and taxonomy
  • The future is the combination of simple facets
    with rich taxonomies with complex semantics /
    ontologies

26
Questions?
  • Tom Reamy, tomr_at_kapsgroup.com
  • KAPS Group
  • Knowledge Architecture Professional Services
  • http://www.kapsgroup.com