1
One Table Stores All: Enabling Painless
Free-and-Easy Data Publishing and Sharing
  • Bei Yu (1), Guoliang Li (2), Beng Chin Ooi (1),
    Li-zhu Zhou (2)
  • (1) National University of Singapore
  • (2) Tsinghua University

2
Folksonomy (folk + taxonomy)
  • Examples
      • Delicious http://del.icio.us/
      • Flickr http://www.flickr.com/
      • Google Base http://base.google.com/
      • YouTube http://www.youtube.com/
  • Internet-based information sharing methodology
      • Users collaboratively publish information
        resources, e.g., webpages and photos, using
        self-defined metadata
      • Users' collaborative behavior determines the
        data semantics
      • The system categorizes information resources
        based on user-defined metadata to facilitate
        searching, browsing, etc.

3
Our Attempt
  • Devise a general system framework for supporting
    folksonomy-based data sharing
  • Allow a rich and flexible structure for the
    metadata (called data units) that describes
    published resources
  • Categorize data units
  • Efficiently store all data units
  • Provide browsing and querying services

4
Data Units
  • The metadata, called a data unit, consists of a
    user-created title, fields (attributes and
    values), and tags
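
A data unit as described above could be modeled roughly as follows. This is a minimal sketch, not the authors' implementation: the class name DataUnit, its member names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataUnit:
    """Metadata describing one published resource (names are illustrative)."""
    title: str                                             # user-created title
    fields: dict[str, str] = field(default_factory=dict)   # attribute -> value
    tags: set[str] = field(default_factory=set)            # free-form tags

    def attributes(self) -> set[str]:
        """Attribute names only, without their values."""
        return set(self.fields)


# Hypothetical example of a photo published by a user.
photo = DataUnit(
    title="Sunset at Marina Bay",
    fields={"camera": "Canon", "location": "Singapore"},
    tags={"sunset", "travel"},
)
```

Keeping fields as a free-form mapping rather than fixed members reflects the point that users define their own attributes.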

5
Data Model
  • A generic relational table for storing all data
    units
  • A set of virtual relations (VRs) defined as views
    over the generic table, serving as the querying
    interface

[Figure: example generic table and two example virtual relations, VR1 and VR2]
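
The idea of one generic table with virtual relations layered on top as views can be sketched with SQLite. The schema, the view names vr_photos and vr_books, and the sample rows are assumptions made for illustration; the slides do not give the actual table layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One wide, sparse table: a column per attribute (hypothetical schema).
    CREATE TABLE generic (
        id INTEGER PRIMARY KEY,
        title TEXT, tags TEXT,
        camera TEXT, location TEXT,   -- used by photo-like data units
        author TEXT, isbn TEXT        -- used by book-like data units
    );
    -- Each virtual relation is a view exposing only its own attributes.
    CREATE VIEW vr_photos AS
        SELECT id, title, camera, location FROM generic WHERE camera IS NOT NULL;
    CREATE VIEW vr_books AS
        SELECT id, title, author, isbn FROM generic WHERE author IS NOT NULL;
""")
conn.executemany(
    "INSERT INTO generic (title, tags, camera, location, author, isbn) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Sunset at Marina Bay", "sunset,travel", "Canon", "Singapore", None, None),
        ("Database Systems", "textbook", None, None, "A. Author", "000-0"),
    ],
)
# Queries go through the views, never directly against the wide table.
photos = conn.execute("SELECT title, camera FROM vr_photos").fetchall()
books = conn.execute("SELECT title, author FROM vr_books").fetchall()
```

Each row leaves most columns NULL, which is why the generic table is sparse.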
6
System Framework
[Figure: system framework diagram; surviving label: queries]
7
Data Units Categorizer
  • Constructs and maintains VRs dynamically as data
    units are continually published
  • Clustering based on attributes and tags
  • VR: a cluster of data units with similar topics
  • Needs an online, one-pass clustering model
  • Accepts a data unit u and extracts its
    attributes and tags
  • Compares u with existing VRs and assigns it to
    those that match
  • If no VR is suitable for u, creates a new VR with
    u as its only tuple
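
The one-pass assignment steps above can be sketched as follows. The slides do not say how a match is decided, so the Jaccard similarity and the 0.3 threshold here are assumptions, and the function names are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def assign(unit: set[str], vrs: list[list[set[str]]], threshold: float = 0.3) -> list[int]:
    """Place one data unit (its attributes and tags as a set) in a single pass.

    Returns the indexes of the VRs the unit joined. The similarity measure
    and threshold are assumptions made for this sketch.
    """
    matched = []
    for i, members in enumerate(vrs):
        features = set().union(*members)      # every feature seen in this VR
        if jaccard(unit, features) >= threshold:
            members.append(unit)
            matched.append(i)
    if not matched:                           # no suitable VR: start a new one
        vrs.append([unit])
        matched.append(len(vrs) - 1)
    return matched


vrs: list[list[set[str]]] = []
first = assign({"camera", "location", "sunset"}, vrs)    # creates VR 0
second = assign({"camera", "location", "beach"}, vrs)    # joins VR 0
third = assign({"author", "isbn", "textbook"}, vrs)      # creates VR 1
```

Each data unit is seen once and never reassigned, which is what makes the model suitable for a stream of newly published units.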

8
Challenges for Categorizing
  • Uncontrolled vocabulary for both attributes and
    tags
  • A large portion is noise: attributes and tags
    that occur very infrequently
  • The number of unique attributes and tags keeps
    growing
  • Problems with synonyms, polysemy, etc.

9
Our Current Approach
  • Characterize each VR with sets of popular
    attributes (PAS) and popular tags (PTS), which
    represent its dominant features
  • Compare new data units with PAS and PTS to limit
    the effect of noise
  • Maintain PAS and PTS when assigning each new data
    unit
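
One way to picture PAS and PTS is as the attributes and tags whose frequency within a VR passes a cutoff. The sketch below makes that assumption explicit: the class VirtualRelation, the min_support cutoff of 0.5, and the scoring rule are all hypothetical, since the slides do not define how popularity is measured.

```python
from collections import Counter


class VirtualRelation:
    """A VR summarized by attribute and tag counts (hypothetical structure)."""

    def __init__(self, min_support: float = 0.5):
        self.size = 0                           # number of data units assigned
        self.attr_counts: Counter = Counter()
        self.tag_counts: Counter = Counter()
        self.min_support = min_support          # assumed popularity cutoff

    def _popular(self, counts: Counter) -> set[str]:
        return {k for k, n in counts.items() if n / self.size >= self.min_support}

    @property
    def pas(self) -> set[str]:
        """Popular attribute set."""
        return self._popular(self.attr_counts)

    @property
    def pts(self) -> set[str]:
        """Popular tag set."""
        return self._popular(self.tag_counts)

    def score(self, attrs: set[str], tags: set[str]) -> float:
        """Fraction of the VR's popular features that the data unit shares."""
        popular = len(self.pas) + len(self.pts)
        shared = len(attrs & self.pas) + len(tags & self.pts)
        return shared / popular if popular else 0.0

    def add(self, attrs: set[str], tags: set[str]) -> None:
        """Assign a data unit and update the counts behind PAS and PTS."""
        self.size += 1
        self.attr_counts.update(attrs)
        self.tag_counts.update(tags)


vr = VirtualRelation()
vr.add({"camera", "location"}, {"sunset"})
vr.add({"camera", "location"}, {"beach"})
vr.add({"camera", "lens"}, {"sunset"})
vr.add({"camera", "xyzzy"}, {"sunset"})   # "xyzzy" stands in for a noisy attribute
```

Rare features such as "lens" and "xyzzy" stay in the counts but never reach the popular sets, so they do not influence how later data units are scored.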

10
Storage Manager
  • Function
      • Store and index the generic table (very sparse)
      • Maintain mappings with VRs
  • Challenge
      • Space efficiency
      • Scalable over the number of attributes and data
        volume
      • Be efficient for both retrieval and update

11
Storage with Sparse Table
  • Store only non-null values for each tuple
  • Build an inverted index over attributes for
    processing attribute-based queries
  • Build an inverted index over keywords for
    processing keyword queries
  • Other approaches? Bitmap index?
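
The sparse storage scheme above can be sketched in memory as follows. The class SparseStore and its method names are hypothetical, and whitespace tokenization for keywords is an assumption; a real system would persist these structures on disk.

```python
from collections import defaultdict
from typing import Optional


class SparseStore:
    """Stores only non-null values and indexes them (illustrative sketch)."""

    def __init__(self):
        self.rows: dict[int, dict[str, str]] = {}                 # id -> non-null fields
        self.by_attribute: dict[str, set[int]] = defaultdict(set)  # attribute -> ids
        self.by_keyword: dict[str, set[int]] = defaultdict(set)    # keyword -> ids

    def insert(self, row_id: int, fields: dict[str, Optional[str]]) -> None:
        # Drop nulls so absent attributes cost no space.
        self.rows[row_id] = {a: v for a, v in fields.items() if v is not None}
        for attr, value in self.rows[row_id].items():
            self.by_attribute[attr].add(row_id)
            for word in value.lower().split():
                self.by_keyword[word].add(row_id)

    def with_attribute(self, attr: str) -> set[int]:
        """Ids of tuples that have a non-null value for attr."""
        return set(self.by_attribute.get(attr, ()))

    def with_keyword(self, word: str) -> set[int]:
        """Ids of tuples whose values contain the keyword."""
        return set(self.by_keyword.get(word.lower(), ()))


store = SparseStore()
store.insert(1, {"title": "Sunset at Marina Bay", "camera": "Canon", "isbn": None})
store.insert(2, {"title": "Database Systems", "author": "A. Author"})
```

Because each tuple keeps only the attributes it actually uses, adding a new attribute to the system costs nothing for existing tuples.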

12
Browsing and Query Processing
  • The VRs are ordered based on popularity for
    browsing
  • May be presented in different views, e.g., based
    on attributes or based on tags
  • Support both keyword query and structured query
  • Inverted index
  • Effective ranking
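
Browsing order and a combined keyword and structured query might look like the sketch below. The slides do not define popularity or the ranking function, so treating popularity as the number of data units in a VR and ranking by the count of matched keywords are both assumptions; the function names and sample data are hypothetical.

```python
def order_for_browsing(vr_sizes: dict[str, int]) -> list[str]:
    """Order VRs by popularity, assumed here to be their number of data units."""
    return sorted(vr_sizes, key=lambda name: (-vr_sizes[name], name))


def search(units: list[dict], keywords: list[str], where: dict[str, str]) -> list[str]:
    """Filter by attribute equality, then rank by number of matched keywords."""
    scored = []
    for unit in units:
        if any(unit["fields"].get(a) != v for a, v in where.items()):
            continue                               # structured predicate failed
        text = (unit["title"] + " " + " ".join(unit["tags"])).lower().split()
        hits = sum(1 for k in keywords if k.lower() in text)
        if hits or not keywords:
            scored.append((hits, unit["title"]))
    # Most keyword hits first; ties broken alphabetically by title.
    return [title for hits, title in sorted(scored, key=lambda s: (-s[0], s[1]))]


units = [
    {"title": "Sunset at Marina Bay", "fields": {"location": "Singapore"},
     "tags": ["sunset", "travel"]},
    {"title": "Marina Bay skyline", "fields": {"location": "Singapore"},
     "tags": ["night"]},
    {"title": "Sunset over Beijing", "fields": {"location": "Beijing"},
     "tags": ["sunset"]},
]
browse_order = order_for_browsing({"photos": 120, "books": 45, "videos": 120})
structured = search(units, ["sunset", "travel"], {"location": "Singapore"})
keyword_only = search(units, ["bay", "sunset"], {})
```

A linear scan keeps the sketch short; with inverted indexes in place, the candidate set would come from index lookups instead.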

13
Conclusion
  • We have presented the design for a
    folksonomy-based data sharing system
  • We devise a generic table data model for
    representing and storing the data units
  • Future work
      • Port the system to P2P networks