Title: Automatic Facets: Faceted Navigation and Entity Extraction
Slide 1: Automatic Facets: Faceted Navigation and Entity Extraction
- Tom Reamy, Chief Knowledge Architect
- KAPS Group
- Knowledge Architecture Professional Services
- http://www.kapsgroup.com
Slide 2: Agenda
- Introduction: Elements
  - Facets, Taxonomies, Software, People
- 3 Environments
  - E-Commerce, Enterprise, Internet
- Design Issues: Facets and Entities
- Conclusion: Integrated Solution
Slide 3: KAPS Group: General
- Knowledge Architecture Professional Services
- Virtual Company: network of consultants (12-15)
- Partners: Inxight, FAST, etc.
- Consulting: Strategy, Knowledge architecture audit
- Taxonomies: Enterprise, Marketing, Insurance, etc.
- Services
  - Taxonomy development, consulting, customization
  - Technology Consulting: Search, CMS, Portals, etc.
  - Metadata standards and implementation
  - Knowledge Management: Collaboration, Expertise, e-learning
  - Applied Theory: Faceted taxonomies, complexity theory, natural categories
Slide 4: Elements
- Facet: an orthogonal dimension of metadata
- Entity / Noun Phrase: a metadata value of a facet
- Entity extraction: feeds facets, signatures, ontologies
- Taxonomy and categorization rules
- Auto-categorization: aboutness, subject facets
- People: tagging, evaluating tags, fine-tuning rules and taxonomy
Slide 5: Essentials of Facets
- Facets are not categories
  - Categories are what a document is about: a limited number
  - Entities are contained within a document: any number
- Facets are orthogonal: mutually exclusive dimensions
  - An event is not a person is not a document is not a place.
- Facets vary in units and in structure
  - Numerical range (price), Location (big to small)
  - Alphabetical, Hierarchical (taxonomic)
- Facets are designed to be used in combination
  - Wine where color = red, price = excessive, location = California
  - And sentiment = snotty
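To make "used in combination" concrete, here is a minimal sketch of a faceted filter over a small catalog. Each attribute is treated as one independent facet, and selections are combined with AND. The catalog, the `Wine` record, and the mapping of "excessive" to a price range of 100 to 1000 are all hypothetical assumptions for illustration, not taken from any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wine:
    # Each attribute is one facet: an independent (orthogonal) dimension.
    name: str
    color: str
    price: float
    location: str
    sentiment: str


def matches(value, selection):
    # Numeric facets take a (low, high) range; other facets take an exact value.
    if isinstance(selection, tuple):
        low, high = selection
        return low <= value <= high
    return value == selection


def facet_filter(items, **selections):
    # Facets combine with AND: an item must satisfy every selected facet.
    return [
        item
        for item in items
        if all(matches(getattr(item, facet), sel) for facet, sel in selections.items())
    ]


# Hypothetical sample catalog.
CATALOG = [
    Wine("Cabernet Reserve", "red", 250.0, "California", "snotty"),
    Wine("House Red", "red", 12.0, "California", "friendly"),
    Wine("Chablis Grand Cru", "white", 90.0, "France", "snotty"),
    Wine("Bordeaux Premier", "red", 400.0, "France", "snotty"),
]

# "excessive" price is assumed here to mean 100 to 1000.
result = facet_filter(
    CATALOG, color="red", price=(100.0, 1000.0), location="California", sentiment="snotty"
)
print([w.name for w in result])  # ['Cabernet Reserve']
```

Each facet narrows the set independently, so dropping any one selection simply widens the result; no single hierarchy has to anticipate the combination.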
Slide 6: Advantages of Faceted Navigation
- More intuitive: easy to guess what is behind each door
  - Simplicity of internal organization
  - 20 questions: a game we know and use
- Dynamic selection of categories
  - Allow multiple perspectives
- Ability to handle compound subjects
- Systematic advantages: fewer elements
  - 4 facets of 10 nodes each (40 nodes) cover the 10,000 combinations of a 10,000-node taxonomy
- Flexible: can be combined with other navigation elements
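The "4 facets of 10 nodes" arithmetic can be checked directly. This sketch uses hypothetical placeholder facet and value names; it counts the nodes that have to be maintained (the sum of facet sizes) against the distinct combinations users can reach (the product of facet sizes).

```python
from itertools import product

# Four hypothetical facets, each with 10 values.
facets = {f"facet_{i}": [f"value_{j}" for j in range(10)] for i in range(4)}

# Nodes to maintain grow additively: 10 + 10 + 10 + 10.
nodes_to_maintain = sum(len(values) for values in facets.values())

# Reachable combinations grow multiplicatively: 10 * 10 * 10 * 10.
distinct_combinations = sum(1 for _ in product(*facets.values()))

print(nodes_to_maintain)      # 40
print(distinct_combinations)  # 10000
```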
Slide 7: Essentials of Taxonomies: Internal Organization
- Formal Taxonomy: parent-child relationship
  - Is-A-Kind-Of: Animal > Mammal > Zebra
  - Partonomy (Is-A-Part-Of): US > California > Oakland
- Browse Classification: cluster of related concepts
  - Food and Dining > Catering > Restaurants
- Taxonomies deal with complex, not compound, subjects
  - Conceptual relationships: category membership
  - Contextual relationships: Computers > Software
- Taxonomies deal with semantics and documents
  - Multiple meanings and purposes
  - Essential attributes of documents are not single-valued
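A formal taxonomy's parent-child structure can be sketched as a child-to-parent map. This is a minimal illustration under the assumption that each node has a single parent and there are no cycles. The same code walks both an Is-A-Kind-Of tree and a partonomy. The structure is identical, but the meaning differs: a zebra is a kind of animal, while Oakland is a part of (not a kind of) the US.

```python
# Hypothetical child -> parent maps; each is assumed to be a single-parent tree without cycles.
IS_A_KIND_OF = {"Mammal": "Animal", "Zebra": "Mammal"}
IS_A_PART_OF = {"California": "US", "Oakland": "California"}


def path_to_root(node, parent_of):
    """Return the path from the root down to `node`, e.g. Animal > Mammal > Zebra."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return list(reversed(path))


def is_under(node, ancestor, parent_of):
    """True if `ancestor` is `node` itself or appears on its path to the root."""
    return ancestor in path_to_root(node, parent_of)


print(" > ".join(path_to_root("Zebra", IS_A_KIND_OF)))    # Animal > Mammal > Zebra
print(" > ".join(path_to_root("Oakland", IS_A_PART_OF)))  # US > California > Oakland
```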
Slide 8: Developing Facets: Tools and Techniques (Software Tools)
- Text Analytics: taxonomy management, entity extraction, categorization, sentiment
- Search: integrated features, at index time, Internet sources
- CM: enterprise environment, taggers and policy
- Programmable Rules
  - Business and subject matter expertise
  - Auto-populate a variety of metadata: author, title, date, etc.
  - Relevance: from best bets to weights and classes of documents
- People: refine and monitor; it's not automatic
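Auto-populating metadata such as author, title, and date is often done with simple pattern rules over a document's header. The sketch below assumes a hypothetical header convention ("Title:", "Author:" or "By:", and "Date:" in month/day/year form); real collections vary widely, which is why the slide stresses that people must refine and monitor these rules.

```python
import re
from datetime import datetime

# Hypothetical header conventions; each rule captures one metadata field.
FIELD_RULES = {
    "title": re.compile(r"^Title:\s*(.+)$", re.MULTILINE),
    "author": re.compile(r"^(?:Author|By):\s*(.+)$", re.MULTILINE),
    "date": re.compile(r"^Date:\s*(\d{1,2}/\d{1,2}/\d{4})$", re.MULTILINE),
}


def auto_populate(text):
    """Return whichever metadata fields the rules can find, with the date normalized to ISO form."""
    metadata = {}
    for field, rule in FIELD_RULES.items():
        match = rule.search(text)
        if match:
            metadata[field] = match.group(1).strip()
    if "date" in metadata:
        # Assumes US month/day/year order.
        metadata["date"] = datetime.strptime(metadata["date"], "%m/%d/%Y").date().isoformat()
    return metadata


doc = "Title: Marine Toxin Survey\nBy: Jane Doe\nDate: 1/15/2008\n\nAlgal blooms were sampled."
print(auto_populate(doc))
```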
Slide 9: Developing Facets: Tools and Techniques (Software Tools: Auto-categorization)
- Auto-categorization
  - Training sets: Bayesian, Support Vector Machine
  - Terms: literal strings, stemming, dictionary of related terms
  - Rules: simple (position in text: title, body, URL)
  - Advanced: saved search queries (full search syntax)
    - NEAR, SENTENCE, PARAGRAPH
    - Boolean: X NEAR Y AND NOT Z
- Advanced Features
  - Facts / ontologies / Semantic Web (RDF)
  - Sentiment Analysis: positive, negative, neutral
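An "X NEAR Y AND NOT Z" categorization rule can be sketched as a word-window test. This is a minimal illustration, not the syntax or semantics of any particular search engine. The window size of 5 tokens, the literal (unstemmed) matching, and the example terms are all assumptions.

```python
import re


def tokenize(text):
    # Literal lowercase word tokens; no stemming or related-term dictionary.
    return re.findall(r"[a-z0-9]+", text.lower())


def near(tokens, x, y, window):
    # True if some occurrence of x lies within `window` tokens of some occurrence of y.
    x_positions = [i for i, t in enumerate(tokens) if t == x]
    y_positions = [i for i, t in enumerate(tokens) if t == y]
    return any(abs(i - j) <= window for i in x_positions for j in y_positions)


def x_near_y_and_not_z(text, x, y, z, window=5):
    tokens = tokenize(text)
    return near(tokens, x, y, window) and z not in tokens


# Hypothetical rule for a "Marine toxins" category: toxin NEAR algae AND NOT recipe.
print(x_near_y_and_not_z("The algae bloom released a toxin into the bay", "toxin", "algae", "recipe"))  # True
print(x_near_y_and_not_z("Toxin-free algae recipe for smoothies", "toxin", "algae", "recipe"))         # False
```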
Slide 10: Developing Facets: Tools and Techniques (Software Tools: Entity Extraction)
- Dictionaries: variety of entities, coverage, specialty
  - Cost of update: service or in-house
  - Inxight: 50 predefined entity types
  - Nstein: 800,000 people, 700,000 locations, 400,000 organizations
- Rules
  - Capitalization, text cues: Mr., Inc.
  - Advanced: proximity and frequency of actions, associations
  - Need people to continually refine the rules
- Entities and Categorization
  - Total number and pattern of entities: a type of aboutness of the document (bar code, fingerprint)
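The two extraction approaches on this slide, dictionaries and capitalization/text-cue rules, can be combined in a toy extractor. The per-type counts then serve as a crude "fingerprint" of the document. The regular expressions, the three-entry location dictionary, and the example sentence are hypothetical. The rules deliberately miss cases such as an untitled "Henry Ford", which illustrates why people must keep refining them.

```python
import re
from collections import Counter

# Hypothetical rules: a title cue (Mr., Dr., ...) followed by capitalized words,
# and capitalized words ending in "Inc." (with an optional comma).
PERSON_RULE = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")
ORG_RULE = re.compile(r"\b[A-Z][A-Za-z]+(?:\s+[A-Z][A-Za-z]+)*,?\s+Inc\.")
# A tiny hypothetical dictionary (gazetteer) of locations.
LOCATIONS = {"Detroit", "California", "Oakland"}


def extract_entities(text):
    """Return (entity_type, surface_text) pairs found by the rules and the dictionary."""
    entities = [("person", m.group()) for m in PERSON_RULE.finditer(text)]
    entities += [("organization", m.group()) for m in ORG_RULE.finditer(text)]
    entities += [("location", w) for w in re.findall(r"\b[A-Z][a-z]+\b", text) if w in LOCATIONS]
    return entities


def fingerprint(entities):
    """Counts per entity type: a crude 'bar code' of what the document contains."""
    return Counter(kind for kind, _ in entities)


text = "Mr. Henry Ford met executives of Acme Widgets, Inc. and Mr. Smith in Detroit."
entities = extract_entities(text)
print(entities)
print(fingerprint(entities))
```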
Slide 11: Elements: People
- Programmers, Librarians, Taxonomists, Metadata specialists
  - Integrate, design, develop rules, monitor activity and quality
- Authors, Subject Matter Experts
  - Input into design (important facets), rules, activity, meaning
- Users: Web 2.0
  - Feedback: quality and usability
  - Suggestions: missing terms, bad categorization or entity
  - Tags: clouds and folksonomy, for social networking features, not for information retrieval
Slide 12: Three Environments
- E-Commerce
  - Catalogs: small, uniform collections of entities
  - Uniform behavior: buy this
- Enterprise
  - More content, more types of content
  - Enterprise tools: Search, ECM
  - Publishing process: tagging, metadata standards
- Internet
  - Wildly different amounts and types of content, no taggers
  - General purpose: Flickr, Yahoo
  - Vertical portal: selected content, no taggers
Slide 13: Three Environments: E-Commerce
Slide 14: Three Environments: E-Commerce
Slide 15: Enterprise Environment: When and How to Add Metadata
- Enterprise content: a different world than e-commerce
  - More content, more kinds, more unstructured
  - Not a catalog to start: less metadata and less structured content
  - Complexity: not just content but a variety of users and activities
- Combination of human and automatic metadata: ECM
  - Software-aided: suggestions, entities, ontologies
- Enterprise: a question of balance / strategy
  - More facets: more findability (up to a point)
  - Fewer facets: lower cost to tag documents
- Issues
  - Not enough facets
  - Wrong set of facets: business, not information
  - Ill-defined facets: too complex an internal structure
Slide 16: Facets and Taxonomies: Enterprise Environment (Case: One Taxonomy, 7 Facets)
- Taxonomy of Subjects / Disciplines
  - Science > Marine Science > Marine microbiology > Marine toxins
- Facets
  - Organization > Division > Group
  - Clients > Federal > EPA
  - Instruments > Environmental Testing > Ocean Analysis > Vehicle
  - Facilities > Division > Location > Building X
  - Methods > Social > Population Study
  - Materials > Compounds > Chemicals
  - Content Type: Knowledge Asset > Proposals
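One way to combine a subject taxonomy with hierarchical facets is to tag each document with one path per facet. A selection of any node then matches all documents tagged at or beneath it. The sketch below reuses paths from this case, but the documents, the "Subject" key, and the AND-across-facets query semantics are hypothetical assumptions.

```python
def parse_path(path):
    """Turn 'Clients > Federal > EPA' into ('Clients', 'Federal', 'EPA')."""
    return tuple(part.strip() for part in path.split(">"))


def is_under(tag, selection):
    """A tag matches a selected node if that node is the tag itself or one of its ancestors."""
    return tag[: len(selection)] == selection


def doc_matches(doc, query):
    """Every selected facet must be present on the document and lie under the selection."""
    return all(
        facet in doc and is_under(doc[facet], parse_path(sel)) for facet, sel in query.items()
    )


# Hypothetical documents tagged with one path per facet.
DOCS = {
    "doc-1": {
        "Subject": parse_path("Science > Marine Science > Marine microbiology > Marine toxins"),
        "Clients": parse_path("Clients > Federal > EPA"),
        "Content Type": parse_path("Knowledge Asset > Proposals"),
    },
    "doc-2": {
        "Subject": parse_path("Science > Marine Science"),
        "Methods": parse_path("Methods > Social > Population Study"),
    },
}

query = {"Subject": "Science > Marine Science", "Clients": "Clients > Federal"}
print([name for name, doc in DOCS.items() if doc_matches(doc, query)])  # ['doc-1']
```

Selecting a broad node ("Clients > Federal") still finds a document tagged at a narrow one ("EPA"), so taggers can be specific while users browse at whatever level suits them.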
Slide 17: External Environment: Text Mining, Vertical Portals
- Internet content
  - Scale impacts design and technology: speed of indexing
  - Limited control: from an association of publishers, to selection of content, to none
  - Major subtypes: different rules, metadata, and results
- Complex queries and alerts
  - Terrorism taxonomy: geography, people, organizations
- Text Mining
  - General or specific content, facets, and categories
  - Dedicated tools or a component of a portal (internal or external)
- Vertical Portal
  - Relatively homogeneous content and users
  - General range of questions
Slide 18: Internet Design
- Subject matter taxonomy: Business Topics
  - Finance > Currency > Exchange Rates
- Facets
  - Location > Western World > United States
  - People: alphabetical and/or topical (by organization)
  - Organization > Corporation > Car Manufacturing > Ford
  - Date: absolute or range (1-1-01 to 1-1-08, last 30 days)
  - Publisher: alphabetical and/or topical (by organization)
  - Content Type: list (newspapers, financial reports, etc.)
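The Date facet's two modes, an absolute range and a relative range such as "last 30 days", reduce to the same inclusive range test. In this sketch, "1-1-01 to 1-1-08" is read as US month-day-year (January 1, 2001 to January 1, 2008). The documents and the fixed "today" are hypothetical, chosen so the example is deterministic.

```python
from datetime import date, timedelta


def last_n_days(today, n):
    """Relative range: the n calendar days ending on `today`, inclusive of both ends."""
    return today - timedelta(days=n - 1), today


def in_range(doc_date, start, end):
    return start <= doc_date <= end


# Hypothetical publication dates.
DOCS = {"a": date(2001, 6, 1), "b": date(2008, 2, 10), "c": date(2008, 3, 1)}

absolute = (date(2001, 1, 1), date(2008, 1, 1))  # "1-1-01 to 1-1-08"
relative = last_n_days(date(2008, 3, 5), 30)     # hypothetical "today"

print(sorted(k for k, d in DOCS.items() if in_range(d, *absolute)))  # ['a']
print(sorted(k for k, d in DOCS.items() if in_range(d, *relative)))  # ['b', 'c']
```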
Slide 22: Integrated Facet Application: Design Issues (General)
- What is the right combination of elements?
  - Faceted navigation, metadata, browse, search, categorized search results, file plan
- What is the right balance of elements?
  - Dominant dimension or equal facets
  - Browse topics and filter by facet
- When to combine search, topics, and facets?
  - Search first and then filter by topics / facet
  - Browse/facet front end with a search box
Slide 23: Integrated Facet Application: Design Issues (General)
- Homogeneity of audience and content
  - Model of the domain: broad
- How many facets do you need?
  - More facets, and let users decide
  - Allow for customization: you can't define a single set
- User analysis: tasks, labeling, communities
  - Issue: the labels people use to describe their business versus the labels they use to find information
  - Match the structure to the domain and task
  - Users can understand different structures
Slide 24: Automatic Facets: Special Issues
- Scale requires more automated solutions
  - More sophisticated rules
- Rules to find and populate existing metadata
  - Variety of types of existing metadata: publisher, title, date
  - Multiple implementation standards: "Last Name, First Name" / "First Name Last Name"
- Issue of disambiguation
  - Same person, different name: Henry Ford, Mr. Ford, Henry X. Ford
  - Same word, different entity: Ford (the person) and Ford (the company)
- Number of entities and thresholds per result set / document
  - Usability, audience needs
  - Relevance ranking: number of entities, rank of facets
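The "same person, different name" problem can be approached by normalizing each variant to a (first, last) pair, then resolving it against a list of known people. This is a minimal sketch under stated assumptions: only the two name orders from the slide are handled, honorifics and single-letter middle initials are discarded, and a match that fits more than one known person is reported as ambiguous (None). The known-people lists are hypothetical. Separating Ford the person from Ford the company would need extra context, such as the entity type assigned during extraction, which is not modeled here.

```python
import re

HONORIFICS = {"mr", "mrs", "ms", "dr"}


def parse_person(raw):
    """Split 'Last, First' or 'First [Middle] Last' into (first, last); first may be None."""
    if "," in raw:
        last, rest = (part.strip() for part in raw.split(",", 1))
        given = rest.split()
    else:
        tokens = raw.split()
        last, given = tokens[-1], tokens[:-1]
    # Drop honorifics (Mr.) and middle initials (X.), keeping only a first name if any.
    given = [
        g for g in given
        if g.rstrip(".").lower() not in HONORIFICS and not re.fullmatch(r"[A-Z]\.", g)
    ]
    return (given[0] if given else None), last


def resolve(raw, known_people):
    """Map a name variant to exactly one known person, or None if unmatched or ambiguous."""
    first, last = parse_person(raw)
    candidates = [p for p in known_people if p[1] == last and (first is None or p[0] == first)]
    return candidates[0] if len(candidates) == 1 else None


KNOWN = [("Henry", "Ford")]
for variant in ["Henry Ford", "Mr. Ford", "Henry X. Ford", "Ford, Henry"]:
    print(variant, "->", resolve(variant, KNOWN))

# With a second Ford on file, "Mr. Ford" alone can no longer be resolved.
print(resolve("Mr. Ford", KNOWN + [("Edsel", "Ford")]))  # None
```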
Slide 25: Putting It All Together: Infrastructure Solution
- Facets, Taxonomies, Software, People
  - Combine formal power with the ability to support multiple user perspectives
- Facet system: interdependent, a map of the domain
- Entity extraction: feeds facets, signatures, ontologies
- Taxonomy: auto-categorization, aboutness, subject
- People: tagging, evaluating tags, fine-tuning rules and taxonomy
- The future is the combination of simple facets with rich taxonomies and complex semantics / ontologies
Slide 26: Questions?
- Tom Reamy, tomr_at_kapsgroup.com
- KAPS Group
- Knowledge Architecture Professional Services
- http://www.kapsgroup.com