A Web of Concepts - PowerPoint PPT Presentation

About This Presentation
Title:

A Web of Concepts

Slides: 22
Provided by: Andr981
Learn more at: https://www.deg.byu.edu
Transcript and Presenter's Notes

1
A Web of Concepts
  • Dalvi et al.
  • Presented by Andrew Zitzelberger

2
Vision
  • Transform hyperlinked bags of words into
    semantically rich aggregate view of information
    on the web.

3
Concept
  • Things of interest
  • Searching for information
  • Accomplishing a task
  • Reservations, etc.

4
Instances
  • Record of a concept
  • Restaurant
  • Gochi (19980 Homestead Rd Cupertino CA)
  • Academia?
  • Publications, research institutions

5
Instance Representation
  • Loosely-structured record (lrec)
  • Attribute-key, value pairs
  • Unique id field
  • Entity matching problem
  • Metadata
  • Attribute list
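
A minimal sketch of how a loosely-structured record could be modeled, assuming a plain in-memory class. The class name `LRec`, its field names, and the id string are hypothetical and chosen only for illustration; the slides do not specify a concrete schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LRec:
    """Hypothetical loosely-structured record: a unique id plus free-form pairs."""
    id: str                      # unique id field (format is an assumption)
    concept: str                 # concept this record is an instance of
    attributes: Dict[str, List[str]] = field(default_factory=dict)
    metadata: Dict[str, str] = field(default_factory=dict)

    def add(self, key: str, value: str) -> None:
        # No fixed schema: any attribute key is accepted and may repeat.
        self.attributes.setdefault(key, []).append(value)

    def attribute_list(self) -> List[str]:
        # Sorted list of the attribute keys currently present on this record.
        return sorted(self.attributes)


# The restaurant instance named on the Instances slide.
gochi = LRec(id="restaurant:gochi-cupertino", concept="restaurant")
gochi.add("name", "Gochi")
gochi.add("address", "19980 Homestead Rd Cupertino CA")
```

Storing each value as a list keeps the record loose: two sources can contribute different values for the same key without either being discarded, which leaves the entity matching problem to a later step.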

6
Domain
  • Set of related concepts
  • Academic community domain: publications,
    people, conferences

7
Usage Study: Instance vs. Concept Search
  • yelp.com
  • Month of queries resulting in a click
    (restaurants)
  • 59%: specific business URL
  • 19%: search URL (either a specific business or a group)
  • 11%: specific group URL

8
Usage Study: Concept Attribute Search
  • Remove restaurant name and location information
    from query
  • Co-occurring words
  • Menu (3%), coupons (1.8%), online, weekly
    specials, locations (1.5%)
  • Nutrition, to go, delivery, careers, cod
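
The attribute analysis on this slide can be sketched roughly as follows, assuming whitespace tokenization and a known set of name and location terms. The function name `cooccurring_words` and the sample queries are invented for illustration and are not data from the study.

```python
from collections import Counter


def cooccurring_words(queries, known_terms):
    """Fraction of queries containing each word once known terms are removed."""
    counts = Counter()
    for query in queries:
        # Drop the restaurant name and location words, keep everything else.
        remaining = {w for w in query.lower().split() if w not in known_terms}
        counts.update(remaining)  # a set, so each word counts once per query
    total = len(queries)
    return {word: n / total for word, n in counts.items()}


# Toy queries (made up), with the name and location terms to strip.
queries = [
    "gochi cupertino menu",
    "gochi coupons",
    "gochi cupertino menu online",
    "gochi cupertino",
]
freqs = cooccurring_words(queries, known_terms={"gochi", "cupertino"})
```

On this toy input `menu` appears in half of the queries, which is the kind of ratio the percentages on the slide report at log scale.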

9
Usage Study: Aggregation Value
  • 59% clicked on at least one other URL
  • 35% clicked on at least two other URLs
  • Small manual evaluation indicates pages are often
    about the same business.

10
Usage Study: Concepts vs. Browsing
  • 42% of homepage visits are from a search engine
  • Immediately following URL
  • 11.5% location
  • 9% menu
  • 1% coupons
  • 10.5% of user trails contain more than one
    distinct instance of the restaurant concept

11
Extraction
  • Create new records from the web
  • Information extraction
  • Linking
  • Analysis
  • Meta-data tagging (cuisine type)

12
Domain-centric vs. Site-centric Extraction
  • Site-centric extraction
  • Wrappers for page structure
  • Probabilistic models (CRF)
  • Domain-centric extraction
  • Fields of interest
  • Statistical properties (single zip code, etc.)
  • Structure components (lists, link relationships)
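
One way to read the "single zip code" statistical property is as a cheap test for whether a page describes one business rather than a listing. The sketch below assumes US-style addresses in which a two-letter state code precedes the zip code; the regular expression, function names, and sample addresses are all hypothetical.

```python
import re

# Assumption: a zip code follows a two-letter state abbreviation, e.g. "IL 62701".
# Anchoring on the state code avoids mistaking five-digit street numbers for zips.
ZIP_AFTER_STATE = re.compile(r"\b[A-Z]{2}\s+(\d{5})(?:-\d{4})?\b")


def distinct_zip_codes(text):
    """Return the set of distinct five-digit zip codes found in the text."""
    return set(ZIP_AFTER_STATE.findall(text))


def looks_like_single_instance_page(text):
    """Heuristic: exactly one distinct zip code suggests a single-business page."""
    return len(distinct_zip_codes(text)) == 1


# Made-up page texts for illustration.
business_page = "Visit us at 12345 Main St, Springfield, IL 62701. Open daily."
listing_page = "A: 1 Main St, Springfield, IL 62701. B: 9 Oak Ave, Chicago, IL 60601."
```

A domain-centric extractor would combine several such weak signals instead of relying on any single one; this heuristic alone would misclassify, for example, a chain's homepage that lists many locations.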

13
Domain-centric Extraction
  • Aggregator mining
  • Learn from extracted knowledge (similar menus)
  • Matching
  • Text is about a record (restaurant review)

14
Application: Aggregation

15
Application: Session Optimization
  • User understanding
  • Historical modeling
  • Session modeling
  • Content understanding
  • Example: Birks
  • Birks and Mayors (luxury jewelers) vs. Birks
    Steakhouse

16
Application: Browse Optimization
  • Alternatives (Restaurants)
  • Similar type of cuisine
  • Similar location
  • Similar quality
  • Augmentations (Camera)
  • Batteries
  • Memory cards

17
Concept Search
  • Result pages: show multiple records
  • Concept pages: information about an instance
  • Article pages: a piece of authored text

18
Advertising
  • Increase in targeted advertisements
  • Target concepts rather than keywords

19
Challenges
  • Transfer learning
  • Transfer extractor knowledge
  • Tracking uncertainty
  • Accuracy issues
  • Web of concepts is not a one-time affair
  • Wrapper problems
  • Concept updates
  • Relevance Measures
  • User satisfaction

20
Related Work
  • Information Extraction/Integration Systems
  • Dataspace Systems
  • Semantic Web

21
Future Work
  • Enrich representation model
  • Path storage to data
  • Provenance, versions, uncertainty
  • Hierarchical relationships (containment or
    inheritance)
  • Ranking of disparate sources