ARO Workshop on Abductive Systems

1
Statistical Relational Learning for Abductive
Reasoning in Heterogeneous Environments

Lise Getoor
University of Maryland, College Park

2
What is SRL?

Traditional statistical machine learning
approaches assume
A random sample of homogeneous objects from a
single relation
Traditional relational learning approaches
assume
No noise or uncertainty in data
Real world data sets
Multi-relational and heterogeneous
Noisy and uncertain
Statistical Relational Learning (SRL): a newly
emerging research area at the intersection of
statistical models and relational
learning/inductive logic programming
Sample Domains
web data, social networks, biological data,
communication data, customer networks, sensor
networks, natural language, vision,

3
SRL Theory

Methods that combine expressive knowledge
representation formalisms such as relational and
first-order logic with principled probabilistic
and statistical approaches to inference and
learning
Directed Approaches
Semantics based on Bayesian Networks
Frame-based Directed Models
Rule-based Directed Models
Undirected Approaches
Semantics based on Markov Networks
Frame-based Undirected Models
Rule-based Undirected Models

4
SRL Tasks

Entity Resolution
Link Prediction
Collective Classification
Information Diffusion
Community Discovery/Group Detection
Ontology Alignment

5
Entity Resolution
James Smith
John Smith
John Smith
Jim Smith
J Smith
James Smith
Jon Smith
Jonathan Smith
J Smith
Jonthan Smith

Issues
Identification
Disambiguation

6
Collective Entity Resolution

Relational Resolution: references are not observed
independently; use relations to improve
identification and disambiguation
Links between references indicate relations
between the entities
Co-author relations for bibliographic data
To, cc lists for email
Collective Resolution: jointly determining the
entities and mappings

Pasula et al. 03, Ananthakrishna et al. 02,
Bhattacharya & Getoor 04,06,07, McCallum &
Wellner 04, Li, Morie & Roth 05, Culotta &
McCallum 05, Kalashnikov et al. 05, Chen, Li &
Doan 05, Singla & Domingos 05, Dong et al. 05
7
Link Prediction
        Node 1            Node 2
Email   chris@enron.com   liz@enron.com
IM      chris37           lizs22
TXT     555-450-0981      555-901-8812
8
→ Links in Information Graph
          Node 1   Node 2
Manager   Chris    Elizabeth
Father    Tim      Steve
9
Collective Classification

Relational Classification: predicting the
category of an object based on its attributes,
its links, and the attributes of linked objects
Collective Classification: jointly predicting the
categories for a collection of connected,
unlabelled objects
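
As a rough illustration of joint prediction over connected objects, the sketch below uses a simple iterative neighbor majority vote. This is a stand-in chosen for brevity, not the method of any work cited on this slide; the function name, the toy graph, and the "ML"/"DB" labels are all assumptions made up for the example.

```python
from collections import Counter

def collective_classify(edges, known_labels, max_iters=10):
    """Label unlabelled nodes by repeated majority vote over neighbors' current labels."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    labels = dict(known_labels)  # observed labels are never overwritten
    unlabelled = sorted(n for n in neighbors if n not in known_labels)

    for _ in range(max_iters):
        changed = False
        for node in unlabelled:
            votes = Counter(labels[n] for n in neighbors[node] if n in labels)
            if not votes:
                continue  # no labelled neighbor yet; retry in a later pass
            # Ties are broken alphabetically so the result is deterministic.
            best = max(sorted(votes), key=lambda lab: votes[lab])
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels

# Hypothetical toy graph: a, b, x, y are labelled; u, v, w are not.
edges = [("u", "a"), ("u", "b"), ("u", "w"), ("v", "x"), ("v", "y")]
known = {"a": "ML", "b": "ML", "x": "DB", "y": "DB"}
result = collective_classify(edges, known)
```

Node w has no labelled neighbor of its own; it receives a label only through the prediction made for u, which is the sense in which the decisions are made jointly rather than independently.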

Neville & Jensen 00, Taskar, Abbeel & Koller 02,
Lu & Getoor 03, Neville, Jensen & Gallagher 04,
Sen & Getoor TR07, Macskassy & Provost 07, Gupta,
Diwan & Sarawagi 07, Macskassy AAAI07, McDowell,
Gupta & Aha AAAI07
10
Graph Identification
Data Graph → Information Graph

Entity Resolution: mapping email addresses to
people
Link Prediction: predicting social relationships
based on communication
Collective Classification: labeling nodes in the
constructed social network

HP Labs, Huberman & Adamic
11
Putting it all together

Requires collective inference
Data is not IID
Entity resolution, link prediction and
classification decisions cannot be made
independently!
Much interesting current research in the machine
learning community addresses how to put these
together effectively

12
Abductive SRL

Need to be able to use query and observations to
guide the construction of the SRL model
Need to reason about relevance, ambiguity and
costs in order to decide what information to
acquire
Using both relational background knowledge
and statistical/probabilistic models
Need computational mechanisms that make the value
of information computation in these rich domains
tractable

13
Some first steps.

Query-time Entity Resolution
Bhattacharya & Getoor, KDD06, AAAI06, JAIR to
appear
Cost-sensitive Markov Networks
Sen & Getoor, ICML06, DMKD to appear
VOILA: Efficient Feature-value Acquisition for
Classification
Bilgic & Getoor, AAAI07

14
Query-time ER

Simple approach for resolving queries
Use attributes
Quick but not accurate
Use best techniques available
Collective resolution using relationships
How can we localize collective resolution?
Two-phase collective resolution for query
Extract minimal set of relevant records
Collective resolution on extracted records

15
Extracting Relevant Records
[Figure: relevant records extracted for the query
"S. Johnson"]
Query: S. Johnson
Level 0 (name expansion): P1 S. Johnson, P2 S.
Johnson, P4 Stephen C. Johnson, P5 S. Johnson
Level 1 (hyper-edge expansion): P1 C. Walshaw,
P2 K. McManus, P2 C. Walshaw, P4 Alfred V. Aho,
P4 Jeffrey D. Ullman, P5 A. Aho, P5 J. Ullman
Level 2 (name expansion): further records named
A. Aho, Alfred V. Aho, J. Ullman, Jeffrey D.
Ullman, K. McManus, C. Walshaw
Start with query name or record

Alternate between
Name expansion: for any relevant record, include
other records with that name
Hyper-edge expansion: for any relevant record,
include other related records
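
The alternation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: records are hypothetical (id, name, paper) triples, names match only when identical (a real system would use name similarity), and the function name and `depth` parameter are made up for the example.

```python
def extract_relevant(records, query_name, depth=2):
    """Return ids of records reached by alternating hyper-edge and name expansion."""
    by_name, by_paper, info = {}, {}, {}
    for rid, name, paper in records:
        by_name.setdefault(name, set()).add(rid)
        by_paper.setdefault(paper, set()).add(rid)
        info[rid] = (name, paper)

    # Level 0: name expansion of the query itself.
    relevant = set(by_name.get(query_name, set()))
    frontier = set(relevant)
    for level in range(1, depth + 1):
        new = set()
        for rid in frontier:
            name, paper = info[rid]
            # Odd levels follow hyper-edges (shared paper);
            # even levels follow shared names.
            group = by_paper[paper] if level % 2 == 1 else by_name[name]
            new |= group - relevant
        relevant |= new
        frontier = new
    return relevant

# Toy data, loosely modelled on the slide's example; P3 and "X. Other" are invented.
records = [
    (1, "S. Johnson", "P1"), (2, "C. Walshaw", "P1"),
    (3, "S. Johnson", "P2"), (4, "K. McManus", "P2"), (5, "C. Walshaw", "P2"),
    (6, "K. McManus", "P3"), (7, "X. Other", "P3"),
]
level0 = extract_relevant(records, "S. Johnson", depth=0)
level2 = extract_relevant(records, "S. Johnson", depth=2)
```

With depth 2, record 6 is pulled in because it shares a name with record 4, while record 7 would only be reached by a further hyper-edge expansion at depth 3.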

16
Adaptive Expansion for a Query

Too many records with unconstrained expansion
Adaptively select records based on ambiguity
Smith is more ambiguous than McManus
Adaptive Name Expansion
Expand the more ambiguous records
They need extra evidence
Adaptive Hyper-edge expansion
Add fewer ambiguous records
They lead to imprecision
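
One way to make "Smith is more ambiguous than McManus" concrete is to score each last name by how often it occurs among the references. The sketch below assumes that frequency-based estimate purely for illustration; the slides do not specify the ambiguity measure, and the function names and threshold are hypothetical.

```python
from collections import Counter

def name_ambiguity(names):
    """Score each last name by its share of all references (an assumed proxy for ambiguity)."""
    last_names = [n.split()[-1] for n in names]
    counts = Counter(last_names)
    return {ln: c / len(last_names) for ln, c in counts.items()}

def select_for_name_expansion(names, threshold):
    """Keep only references whose last name is at least `threshold` ambiguous."""
    amb = name_ambiguity(names)
    return sorted({n for n in names if amb[n.split()[-1]] >= threshold})

names = ["J. Smith", "John Smith", "Jim Smith", "K. McManus"]
ambiguity = name_ambiguity(names)
to_expand = select_for_name_expansion(names, threshold=0.5)
```

Here the Smith references are selected for name expansion because they need extra evidence, while the McManus reference is not expanded.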

17
Query-time ER Results

Unconstrained expansion
Collective resolution more accurate
Accuracy improves beyond depth 1

A: pair-wise attribute similarity; AN: also
neighbors' attributes; transitive closure

Adaptive expansion
Minimal loss in accuracy
Dramatic reduction in query processing time

AX-2: adaptive expansion at depths 2 and beyond
AX-1: adaptive expansion even at depth 1
18
Cost-sensitive Markov Networks

Need for cost-sensitive classification for
structured domains
Developed a framework for a cost-sensitive
maximum entropy classifier
Evaluated on synthetic and real sensor network
data

19
Sensor net data

Used Intel Lab Dataset
2M records describing temperature, humidity,
light and sensor voltage
Task: predict light values
Misclassification costs based on
if light is insufficient but predicted to be
sufficient, incur occupant discomfort
if light is sufficient but predicted to be
insufficient, incur excess electricity costs
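
To show how asymmetric costs like these change a prediction, the sketch below applies the standard minimum-expected-cost decision rule to a single variable. It is not the cost-sensitive Markov network framework itself, and the probabilities and cost values are invented for illustration.

```python
def min_expected_cost_label(probs, cost):
    """Pick the prediction with the lowest expected misclassification cost.

    probs: dict mapping true label -> probability.
    cost:  dict mapping (true_label, predicted_label) -> cost; missing pairs cost 0.
    """
    labels = sorted(probs)

    def expected_cost(pred):
        return sum(probs[t] * cost.get((t, pred), 0.0) for t in labels)

    return min(labels, key=expected_cost)

# Hypothetical numbers: occupant discomfort is taken to be five times
# as costly as wasted electricity.
probs = {"insufficient": 0.3, "sufficient": 0.7}
cost = {
    ("insufficient", "sufficient"): 5.0,  # occupant discomfort
    ("sufficient", "insufficient"): 1.0,  # excess electricity
}
decision = min_expected_cost_label(probs, cost)
```

Predicting "sufficient" has expected cost 0.3 × 5.0 = 1.5, while predicting "insufficient" has expected cost 0.7 × 1.0 = 0.7, so the less probable label is chosen.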

20
CSMN Results
21
Efficient Feature Acquisition

Problem: selecting the best attributes to
acquire, given rich cost and probabilistic
dependence structure
Requires many expected value of information (EVI)
calculations
Value Of Information LAttice (VOILA) is a
directed graph whose
nodes correspond to only the relevant subsets,
exploiting constraints on the feature sets
edges represent subset relationships between its
nodes, exploiting subset relationships for EVI
computation sharing
Different acquisition strategies: FF, SS, SF
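
For reference, a single expected value of information calculation can be sketched as below: the expected cost of deciding without the feature, minus the expected cost of deciding after observing it, minus the acquisition cost. This covers one feature only and does not implement the lattice or its computation sharing; the joint distribution, costs, and function names are assumptions for illustration.

```python
def expected_cost(joint, cost, observe):
    """Expected misclassification cost of the best decision rule.

    joint: dict mapping (feature_value, true_label) -> probability.
    cost:  dict mapping (true_label, predicted_label) -> cost.
    If observe is True, the prediction may depend on the feature value.
    """
    labels = sorted({y for _, y in joint})
    values = sorted({v for v, _ in joint})
    groups = [[v] for v in values] if observe else [values]
    total = 0.0
    for group in groups:
        # One prediction per group: take the cheapest in expectation.
        total += min(
            sum(joint.get((v, y), 0.0) * cost[(y, pred)]
                for v in group for y in labels)
            for pred in labels
        )
    return total

def evi(joint, cost, acquisition_cost):
    """Expected value of information for acquiring the feature."""
    return (expected_cost(joint, cost, False)
            - expected_cost(joint, cost, True)
            - acquisition_cost)

# Hypothetical test result ("pos"/"neg") and diagnosis ("sick"/"healthy").
joint = {("pos", "sick"): 0.3, ("pos", "healthy"): 0.1,
         ("neg", "sick"): 0.1, ("neg", "healthy"): 0.5}
cost = {("sick", "sick"): 0.0, ("healthy", "healthy"): 0.0,
        ("sick", "healthy"): 2.0, ("healthy", "sick"): 1.0}
value = evi(joint, cost, acquisition_cost=0.1)
```

In this toy case the expected cost drops from 0.6 to 0.3 once the feature is observed, so acquiring it is worthwhile whenever it costs less than 0.3.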

22
Datasets
On average, 1/3 of VOILA nodes shared the same
EVI.
23
Results - Heart
24
Next Steps

Cost-sensitive query-time adaptive information
gathering
Complexity of the integrated SRL tasks requires
flexible, adaptive algorithms which retrieve
relevant information in real time
Inference and learning need to be scalable and
real time
Methods need to take complex cost models into
account
Some related areas to keep in mind
Visual Analytics: complexity of the integrated
SRL tasks requires sophisticated user interfaces
which allow user feedback and support explanation
Probabilistic Databases: currently a resurgence
of work in this area in the DB community

25
Thanks
http://www.cs.umd.edu/getoor
Work sponsored by the National Science
Foundation, Google, Microsoft, KDD program and
National Geospatial Agency
26
ILIADS

Goal
Produce high-quality integration via a flexible
method able to adapt to a wide variety of
ontology sizes and structures
Method
Combining statistical and logical inference
Use schema (structure) and data (instances)
effectively
Solution
Integrated Learning In Alignment of Data and
Schema (ILIADS)
Datasets and code available at
http://www.cs.umd.edu/linqs/projects/iliads