Title: Autonomous Ontology Extraction Using Hierarchical UNSO Hypercube Graph
Slide 1: Autonomous Ontology Extraction Using Hierarchical UNSO Hypercube Graph
- Yosi Ben-Asher
- Shlomo Berkovsky
- Yaniv Eytani
Slide 2: Outline
- Data Integration
- UNSpecified Ontologies
- Hierarchical UNSO
- Autonomous Maintenance Operations
- Ontology Extraction
Slide 3: Data Integration - Motivation
- Consider two E-Commerce ads:
  - "Selling red BMW car, 2000, good condition, 10,000 miles"
  - "Wanna buy second-hand sports car with leather seats"
- The descriptions are different, yet they might refer to the same object
- A smart mechanism for integrating heterogeneous data is needed
Slide 4: Data Integration - Ontology
- An ontology captures the semantic relationships between objects
- Definition: a shared formalization of a conceptualization of a domain
Slide 5: Semantic Data Management - HyperCuP (Schlosser et al.)
- HyperCuP implements ontology-based data management over a hypercube topology
- Objects are mapped to the hypercube graph using a predefined ontology
- Efficient broadcast and search: achieved within a number of steps logarithmic in N (the number of users)
Slide 6: Unspecified Ontology (UNSO)
- UNSO generalizes HyperCuP by defining the notion of an UNSpecified Ontology: an ontology that is not specified in advance
  - i.e., there is no master ontology; each user specifies his own
- UNSO acts as a classification method
- Similar objects mapped to the UNSO graph are in close proximity, i.e., in nearby vertices (nodes) of the hypercube topology (see the distance sketch below)
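Proximity here is graph distance in the hypercube, which for labeled vertices is simply the Hamming distance. A minimal sketch (the function name is illustrative, not from the paper):

```python
def hamming(u, v):
    """Hypercube graph distance between two vertices:
    the number of coordinates in which their labels differ."""
    return sum(a != b for a, b in zip(u, v))

# Adjacent vertices differ in exactly one coordinate.
assert hamming((0, 1, 1), (0, 0, 1)) == 1
```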
Slide 7: UNSO - Unspecified Descriptions
- Each object is described as a vector of (prop, val) pairs
- "Selling red BMW car, 2005, good condition, 10000 miles" becomes:
  - product=car, manufacturer=BMW, color=red, condition=good, production_year=2005, mileage=10000
- Each vector is projected onto the UNSO graph (a minimal representation sketch follows)
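As an illustration only (the ad and its field names follow the slide; holding the pairs in a dict is an assumption about representation, not the paper's data structure):

```python
# Minimal sketch: an unstructured ad rendered as (prop, val) pairs.
ad = {
    "product": "car",
    "manufacturer": "BMW",
    "color": "red",
    "condition": "good",
    "production_year": 2005,
    "mileage": 10000,
}

# Order is irrelevant: each prop will later select a hypercube
# dimension, and each val a coordinate on that dimension.
description_vector = list(ad.items())
```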
Slide 8: UNSO - Mapping Description Vectors
- Descriptions are mapped using hashing (sketched below):
  - Different props are hashed to different hypercube dimensions
  - vals are hashed to numeric values within the respective dimension
- How to handle ambiguity?
  - Similar patterns in object descriptions are assumed
    - Common sense
    - Zipf's law
  - props and vals undergo simple standardization using a lexical reference system (e.g., WordNet)
    - e.g., car = auto = automobile
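A minimal sketch of the hashing step, assuming D dimensions and V values per dimension (both illustrative), with a toy synonym table standing in for a lexical reference system such as WordNet:

```python
import hashlib

D, V = 8, 4  # illustrative hypercube parameters, not from the paper

# Toy stand-in for WordNet: synonyms collapse to one canonical term.
CANONICAL = {"auto": "car", "automobile": "car", "colour": "color"}

def standardize(term: str) -> str:
    term = term.lower()
    return CANONICAL.get(term, term)

def h(term: str, modulus: int) -> int:
    """Stable hash of a standardized term into [0, modulus)."""
    digest = hashlib.sha1(standardize(term).encode()).digest()
    return int.from_bytes(digest[:4], "big") % modulus

def map_to_hypercube(description: dict) -> tuple:
    """Map a (prop, val) description to hypercube coordinates:
    each prop picks a dimension, each val a value on it."""
    coords = [0] * D
    for prop, val in description.items():
        coords[h(prop, D)] = h(str(val), V)
    return tuple(coords)

print(map_to_hypercube({"product": "automobile", "color": "red"}))
```

Note that two different props may still collide on the same dimension; the standardization and the assumed Zipf-like regularity of descriptions make such ambiguity rare rather than eliminating it.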
Slide 9: UNSO - A Mapping Example
- Consider a simple ontology for the cars domain:
  - Color: dark=0, bright=1
  - Gearbox: automatic=0, manual=1
  - Size: small=0, large=1
- A white manual Mini-Minor is mapped to the vector (1, 1, 0)
[Figure: a 3-dimensional hypercube with dimensions color, gearbox, and size; vertices range from (0,0,0) to (1,1,1)]
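The fixed toy ontology above is small enough to check directly; a sketch with dimension order (color, gearbox, size), as in the slide:

```python
# The slide's toy car ontology: each prop has two values, 0 or 1.
ONTOLOGY = {
    "color":   {"dark": 0, "bright": 1},
    "gearbox": {"automatic": 0, "manual": 1},
    "size":    {"small": 0, "large": 1},
}

def map_car(color, gearbox, size):
    return (ONTOLOGY["color"][color],
            ONTOLOGY["gearbox"][gearbox],
            ONTOLOGY["size"][size])

# A white manual Mini-Minor: white is bright, a Mini-Minor is small.
assert map_car("bright", "manual", "small") == (1, 1, 0)
```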
Slide 10: UNSO - Multi-Layered Hypercube
- MLH-UNSO is comprised of nodes that recursively contain hypercubes
- Description vectors are partitioned between the different layers
- Example for 3-dimensional hypercubes: the vector (0,1,1,0,0,1) is split into
  - Higher layer: (0,1,1)
  - Lower layer: (0,0,1)
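A minimal sketch of the partition, matching the slide's example (the split point of 3 dimensions is the slide's choice, not fixed by the scheme):

```python
def split_layers(vector, higher_dims=3):
    """Partition a description vector between two layers: the first
    coordinates address a node in the higher-layer hypercube, the
    rest address a node in the hypercube nested inside it."""
    return vector[:higher_dims], vector[higher_dims:]

assert split_layers((0, 1, 1, 0, 0, 1)) == ((0, 1, 1), (0, 0, 1))
```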
Slide 11: UNSO - Pros and Cons
- Pros:
  - No predefined ontology
  - Unlimited range of props and vals
  - Locality: similar objects are mapped to adjacent locations
- Cons:
  - All props are of the same importance (an un-weighted, flat ontology)
  - Sparse and unbalanced distribution in the graph
  - Query processing complexity is fixed
  - Inefficient processing of typical search queries
Slide 12: HUNSO - Hierarchical UNSO
- Some props are more significant than others:
  - Significant: manufacturer, color, production_year
  - Less significant: seats_type, wheels_type
- The significance of a prop is assumed to be correlated with its statistical frequency of appearance in
  - object descriptions
  - search queries
  (a scoring sketch follows)
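A sketch of one way such a significance score could be computed; the equal weighting of the two frequency sources is an assumption, not from the paper:

```python
from collections import Counter

def significance(descriptions, queries, w_desc=0.5, w_query=0.5):
    """Score each prop by its frequency in object descriptions
    and in search queries (weights are illustrative)."""
    in_desc = Counter(p for d in descriptions for p in d)
    in_query = Counter(p for q in queries for p in q)
    props = set(in_desc) | set(in_query)
    return {p: w_desc * in_desc[p] + w_query * in_query[p] for p in props}
```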
Slide 13: Real-life Data (130 E-Commerce Ads)
[Chart: property frequencies measured over 130 real-life E-Commerce ads]
Slide 14: HUNSO - Layers
- Statistical frequencies of the properties are collected over time (for example, during user interactions)
- Significant properties are located in the higher layers, insignificant ones in the lower layers (a layering sketch follows this list)
- Advantages:
  - Variable-length search operations
  - Fast processing of popular queries
  - Allows for self-management and load-balancing
  - Autonomous ontology extraction
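A minimal layering sketch of the stated idea: rank props by collected frequency and fill layers top-down (the layer width is an illustrative parameter):

```python
from collections import Counter

def layer_properties(descriptions, dims_per_layer=3):
    """Assign the most frequent props to the highest layer,
    the next most frequent to the layer below, and so on."""
    freq = Counter(p for d in descriptions for p in d)
    ranked = [p for p, _ in freq.most_common()]
    return [ranked[i:i + dims_per_layer]
            for i in range(0, len(ranked), dims_per_layer)]
```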
Slide 15: HUNSO - Autonomous Maintenance
- The ontology evolves:
  - New types of objects, props, and vals appear
  - Old types become obsolete
- To maintain a dense, ordered structure in the hypercube, three autonomous operations are used:
  - EXPAND: convert a dense node into a lower-layer hypercube
  - SHRINK: convert a sparse hypercube into a higher-layer node
  - SWAP: exchange properties between two adjacent layers, needed when the significance order of properties becomes inconsistent
Slide 16: HUNSO - EXPAND a Node
- Performed when:
  - A node gets overloaded with object descriptions
  - No single property dominates, i.e., all props appear with roughly the same frequency
- Basic stages (sketched below):
  - Choose the K most frequent properties to form the dimensions of the lower-layer hypercube
  - Remap the descriptions from the single node in the higher-layer hypercube to the nodes of the new lower-layer hypercube
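A sketch of the two EXPAND stages under simplifying assumptions (descriptions are dicts, a node is a list of them, and the function name is illustrative):

```python
from collections import Counter

def expand(node_descriptions, k=3):
    """EXPAND: choose the K most frequent props of an overloaded
    node as the new lower-layer dimensions, then remap each
    description to a lower-layer node keyed by its values."""
    freq = Counter(p for d in node_descriptions for p in d)
    dims = [p for p, _ in freq.most_common(k)]
    cube = {}
    for d in node_descriptions:
        # Coordinate = the description's value on each chosen prop
        # (None when the description lacks that prop).
        coord = tuple(d.get(p) for p in dims)
        cube.setdefault(coord, []).append(d)
    return dims, cube
```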
Slide 17: HUNSO - EXPAND (for K=3)
[Figure: a node before and after EXPAND]
Slide 18: HUNSO - SHRINK a Node
- Performed when a lower-layer hypercube is sparse
- Basic stages (sketched below):
  - Remap the descriptions from the nodes of the lower-layer hypercube to a single node of the higher-layer hypercube
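SHRINK is the inverse remapping; a sketch using the same toy representation as the EXPAND sketch above:

```python
def shrink(cube):
    """SHRINK: collapse a sparse lower-layer hypercube back into
    the contents of a single higher-layer node."""
    merged = []
    for descriptions in cube.values():
        merged.extend(descriptions)
    return merged
```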
Slide 19: HUNSO - SHRINK (example)
[Figure: a lower-layer hypercube before and after SHRINK]
Slide 20: HUNSO - SWAP
- Performed when:
  - Lower hypercubes get overloaded with object descriptions
  - The significance order of props is inconsistent between layers, i.e., a prop in the lower layer appears more frequently than a prop in the higher layer
- Basic stages (sketched below):
  - Find the dominating property across all lower-level hypercubes
  - Find the least frequent property in the higher-level hypercube
  - Swap the respective dimensions between the layers
  - Remap the descriptions from the nodes of the lower-layer hypercubes to the higher-layer hypercube, and vice versa
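A sketch of the dimension exchange only (the remapping step would reuse the EXPAND/SHRINK machinery above; the list-of-props layer representation and the frequency source are assumptions):

```python
from collections import Counter

def swap(higher_dims, lower_dims, descriptions):
    """SWAP: exchange the lower layer's dominating prop with the
    higher layer's least frequent prop, restoring a consistent
    significance order between adjacent layers."""
    freq = Counter(p for d in descriptions for p in d)
    dominant_lower = max(lower_dims, key=lambda p: freq[p])
    weakest_higher = min(higher_dims, key=lambda p: freq[p])
    i = higher_dims.index(weakest_higher)
    j = lower_dims.index(dominant_lower)
    higher_dims[i], lower_dims[j] = dominant_lower, weakest_higher
    return higher_dims, lower_dims
```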
Slide 21: HUNSO - SWAP (example)
Slide 22: Ontology Extraction
- Object properties in each domain are ordered in a hierarchical structure
- Lower layers represent a specification of the higher layers
  - Consider "red 2000 BMW" vs. "red 2000 BMW, leather seats, manual, good condition"
- The domain ontology constantly evolves
- No data engineering by human experts is required
- Flexible ontologies, sensitive to dynamic changes in the objects
HUNSO automatically captures the commonalities in the underlying object descriptions
Slide 23: Future Work
- Use NLP tools to recognize term affinity
- Calculate weights and distances between properties and particular values
- Rank results: on a failed match, return the k nearest neighbors
- Use world facts from external knowledge repositories
Slide 24: Q&A
Thank You!