ARO Workshop on Abductive Systems

1
Statistical Relational Learning for Abductive
Reasoning in Heterogeneous Environments
  • Lise Getoor
  • University of Maryland, College Park

2
What is SRL?
  • Traditional statistical machine learning
    approaches assume a random sample of
    homogeneous objects from a single relation
  • Traditional relational learning approaches
    assume no noise or uncertainty in the data
  • Real-world data sets are multi-relational and
    heterogeneous, noisy and uncertain
  • Statistical Relational Learning (SRL): a newly
    emerging research area at the intersection of
    statistical models and relational
    learning/inductive logic programming
  • Sample domains: web data, social networks,
    biological data, communication data, customer
    networks, sensor networks, natural language,
    vision, ...

3
SRL Theory
  • Methods that combine expressive knowledge
    representation formalisms such as relational and
    first-order logic with principled probabilistic
    and statistical approaches to inference and
    learning
  • Directed Approaches
  • Semantics based on Bayesian Networks
  • Frame-based Directed Models
  • Rule-based Directed Models
  • Undirected Approaches
  • Semantics based on Markov Networks
  • Frame-based Undirected Models
  • Rule-based Undirected Models

4
SRL Tasks
  • Entity Resolution
  • Link Prediction
  • Collective Classification
  • Information Diffusion
  • Community Discovery/Group Detection
  • Ontology Alignment

5
Entity Resolution
James Smith
John Smith
John Smith
Jim Smith
J Smith
James Smith
Jon Smith
Jonathan Smith
J Smith
Jonthan Smith
  • Issues
  • Identification
  • Disambiguation

6
Collective Entity Resolution
  • Relational Resolution: references are not
    observed independently; use relations to improve
    identification and disambiguation
  • Links between references indicate relations
    between the entities
  • Co-author relations for bibliographic data
  • To, cc lists for email
  • Collective Resolution: jointly determining the
    entities and mappings

Pasula et al. 03, Ananthakrishna et al. 02,
Bhattacharya & Getoor 04, 06, 07, McCallum &
Wellner 04, Li, Morie & Roth 05, Culotta &
McCallum 05, Kalashnikov et al. 05, Chen, Li &
Doan 05, Singla & Domingos 05, Dong et al. 05
7
Link Prediction
         Node 1             Node 2
Email    chris@enron.com    liz@enron.com
IM       chris37            lizs22
TXT      555-450-0981       555-901-8812
8
→ Links in Information Graph
           Node 1    Node 2
Manager    Chris     Elizabeth
Father     Tim       Steve
9
Collective Classification
  • Relational Classification: predicting the
    category of an object based on its attributes,
    its links, and the attributes of linked objects
  • Collective Classification: jointly predicting
    the categories for a collection of connected,
    unlabelled objects

Neville & Jensen 00, Taskar, Abbeel & Koller 02,
Lu & Getoor 03, Neville, Jensen & Gallagher 04,
Sen & Getoor TR07, Macskassy & Provost 07, Gupta,
Diwan & Sarawagi 07, Macskassy AAAI07, McDowell,
Gupta & Aha AAAI07
10
Graph Identification
Data Graph → Information Graph
  • Entity Resolution: mapping email addresses to
    people
  • Link Prediction: predicting social relationships
    based on communication
  • Collective Classification: labeling nodes in the
    constructed social network

HP Labs, Huberman & Adamic
11
Putting it all together
  • Requires collective inference
  • Data is not IID
  • Entity resolution, link prediction and
    classification decisions cannot be made
    independently!
  • Much current research in the machine learning
    community addresses how to put these together
    effectively

12
Abductive SRL
  • Need to be able to use the query and
    observations to guide the construction of the
    SRL model
  • Need to reason about relevance, ambiguity and
    costs in order to decide what information to
    acquire, using both relational background
    knowledge and statistical/probabilistic models
  • Need computational mechanisms that make the
    value of information computation in these rich
    domains tractable

13
Some first steps.
  • Query-time Entity Resolution
  • Bhattacharya & Getoor, KDD06, AAAI06, JAIR to
    appear
  • Cost-sensitive Markov Networks
  • Sen & Getoor, ICML06, DMKD to appear
  • VOILA: Efficient Feature-value Acquisition for
    Classification
  • Bilgic & Getoor, AAAI07

14
Query-time ER
  • Simple approach for resolving queries
  • Use attributes
  • Quick but not accurate
  • Use best techniques available
  • Collective resolution using relationships
  • How can we localize collective resolution?
  • Two-phase collective resolution for query
  • Extract minimal set of relevant records
  • Collective resolution on extracted records

15
Extracting Relevant Records
[Figure: expansion levels for the query "S. Johnson"]
Query: S. Johnson
Level 0 (name expansion): P4 Stephen C. Johnson;
P5 S. Johnson; P2 S. Johnson; P1 S. Johnson
Level 1 (hyper-edge expansion): P4 Alfred V. Aho;
P5 A. Aho; P4 Jeffrey D. Ullman; P5 J. Ullman;
P2 K. McManus; P2 C. Walshaw; P1 C. Walshaw
Level 2 (name expansion): P A. Aho; P Alfred V. Aho;
P J. Ullman; P Jeffrey D. Ullman; P K. McManus;
P K. McManus; P C. Walshaw; P C. Walshaw
Start with the query name or record
  • Alternate between
  • Name expansion: for any relevant record, include
    other records with that name
  • Hyper-edge expansion: for any relevant record,
    include other related records
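
The alternation between the two expansion steps can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the toy records, the function name `expand`, and the use of exact string matching for "records with that name" are all assumptions made for the example (a real system would presumably use name similarity).

```python
from collections import defaultdict

# Hypothetical toy references: (record id, author name, paper id).
# A paper plays the role of a hyper-edge linking its author references.
RECORDS = [
    ("r1", "S. Johnson", "P1"),
    ("r2", "C. Walshaw", "P1"),
    ("r3", "S. Johnson", "P2"),
    ("r4", "K. McManus", "P2"),
    ("r5", "C. Walshaw", "P2"),
    ("r6", "K. McManus", "P3"),
    ("r7", "A. Aho", "P3"),
    ("r8", "J. Ullman", "P9"),
]


def expand(query_name, records, depth):
    """Return the ids of records reached from query_name within `depth` levels."""
    by_name, by_paper, info = defaultdict(set), defaultdict(set), {}
    for rid, name, paper in records:
        by_name[name].add(rid)
        by_paper[paper].add(rid)
        info[rid] = (name, paper)

    # Level 0: name expansion of the query itself (exact match, an assumption).
    relevant = set(by_name.get(query_name, set()))
    frontier = set(relevant)
    for level in range(1, depth + 1):
        reached = set()
        for rid in frontier:
            name, paper = info[rid]
            if level % 2 == 1:
                # Odd levels: hyper-edge expansion (records on the same paper).
                reached |= by_paper[paper]
            else:
                # Even levels: name expansion (records with the same name).
                reached |= by_name[name]
        frontier = reached - relevant
        relevant |= frontier
    return relevant


level0 = expand("S. Johnson", RECORDS, 0)
level1 = expand("S. Johnson", RECORDS, 1)
level2 = expand("S. Johnson", RECORDS, 2)
```

On this toy data, level 0 holds the two "S. Johnson" records, level 1 adds their co-author records, and level 2 adds the second "K. McManus" record found by name. The record on the unrelated paper P9 is never reached.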

16
Adaptive Expansion for a Query
  • Too many records with unconstrained expansion
  • Adaptively select records based on ambiguity
  • Smith is more ambiguous than McManus
  • Adaptive Name Expansion
  • Expand the more ambiguous records
  • They need extra evidence
  • Adaptive Hyper-edge expansion
  • Add fewer ambiguous records
  • They lead to imprecision
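
One way to picture the adaptive selection is with a simple ambiguity score and a threshold. The sketch below is an assumption-laden illustration, not the method from the talk: the ambiguity proxy (share of names with the same last name), the threshold value, and all function names are hypothetical.

```python
from collections import Counter


def ambiguity(names):
    """Hypothetical proxy: fraction of all names sharing the same last name."""
    counts = Counter(n.split()[-1] for n in names)
    total = len(names)
    return {n: counts[n.split()[-1]] / total for n in set(names)}


def select_for_name_expansion(candidates, amb, threshold):
    """Expand the more ambiguous names: they need extra evidence."""
    return [n for n in candidates if amb[n] >= threshold]


def select_for_hyperedge_expansion(neighbors, amb, threshold):
    """Add the less ambiguous neighbors: ambiguous ones lead to imprecision."""
    return [n for n in neighbors if amb[n] < threshold]


# Hypothetical name list; "Smith" is common, "McManus" is rare.
NAMES = ["J. Smith", "John Smith", "James Smith", "Jim Smith",
         "K. McManus", "C. Walshaw", "A. Aho", "J. Ullman"]
amb = ambiguity(NAMES)
to_expand = select_for_name_expansion(["J. Smith", "K. McManus"], amb, 0.25)
to_add = select_for_hyperedge_expansion(["J. Smith", "K. McManus"], amb, 0.25)
```

Here "J. Smith" scores 0.5 and "K. McManus" scores 0.125, so only the former is selected for name expansion and only the latter is added during hyper-edge expansion.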

17
Query-time ER Results
  • Unconstrained expansion
  • Collective resolution more accurate
  • Accuracy improves beyond depth 1

A: pair-wise attribute similarity; AN: also
neighbors' attributes; transitive closure
  • Adaptive expansion
  • Minimal loss in accuracy
  • Dramatic reduction in query processing time

AX-2: adaptive expansion at depths 2 and beyond;
AX-1: adaptive expansion even at depth 1
18
Cost-sensitive Markov Networks
  • Need for cost-sensitive classification for
    structured domains
  • Developed a framework for cost-sensitive maximum
    entropy classifiers
  • Evaluated on synthetic and real sensor network
    data

19
Sensor net data
  • Used Intel Lab Dataset
  • 2M records describing temperature, humidity,
    light and sensor voltage
  • Task: predict light values
  • Misclassification costs are based on
  • if light is insufficient but predicted to be
    sufficient: incur occupant discomfort
  • if light is sufficient but predicted to be
    insufficient: incur excess electricity costs
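
The effect of asymmetric misclassification costs can be illustrated with the standard minimum-expected-cost decision rule. This is a generic sketch, not the cost-sensitive Markov network framework itself; the cost values and the function name are hypothetical and chosen only for illustration.

```python
# Hypothetical costs, indexed as COST[true_label][predicted_label].
# Missing insufficient light (occupant discomfort) is assumed to cost more
# than a false alarm (excess electricity).
COST = {
    "insufficient": {"insufficient": 0.0, "sufficient": 5.0},
    "sufficient": {"insufficient": 1.0, "sufficient": 0.0},
}


def min_expected_cost_label(posterior, cost):
    """Pick the prediction with the lowest expected misclassification cost."""
    def expected_cost(pred):
        return sum(p * cost[true][pred] for true, p in posterior.items())
    return min(cost, key=expected_cost)


# "sufficient" is more probable, yet predicting it risks the expensive error.
cautious = min_expected_cost_label({"insufficient": 0.3, "sufficient": 0.7}, COST)
confident = min_expected_cost_label({"insufficient": 0.1, "sufficient": 0.9}, COST)
```

With these assumed costs, a 0.3 probability of insufficient light is already enough to predict "insufficient" (expected cost 0.7 versus 1.5), while at 0.1 the prediction switches to "sufficient" (0.5 versus 0.9).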

20
CSMN Results
21
Efficient Feature Acquisition
  • Problem: selecting the best attributes to
    acquire, given rich cost and probabilistic
    dependence structure
  • Requires many expected value of information
    (EVI) calculations
  • Value Of Information LAttice (VOILA) is a
    directed graph whose
  • nodes correspond to only the relevant feature
    subsets, exploiting constraints on the feature
    sets
  • edges represent subset relationships between its
    nodes, exploiting those relationships to share
    EVI computations
  • Different acquisition strategies: FF, SS, SF
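
To make the expected value of information concrete, here is a small sketch for a single discrete feature. It uses the textbook definition (expected cost of the best decision now, minus the expected cost of the best decision after observing the feature) and does not model the VOILA lattice or its sharing of computations. The joint distribution, labels, and 0/1 costs are hypothetical.

```python
def best_expected_cost(posterior, cost):
    """Expected cost of the cheapest prediction under a label distribution."""
    return min(
        sum(p * cost[true][pred] for true, p in posterior.items())
        for pred in cost
    )


def evi(joint, cost):
    """EVI of one feature, given joint[(label, value)] = P(Y=label, F=value)."""
    labels = {y for y, _ in joint}
    values = {f for _, f in joint}
    prior = {y: sum(joint.get((y, f), 0.0) for f in values) for y in labels}
    before = best_expected_cost(prior, cost)
    after = 0.0
    for f in values:
        p_f = sum(joint.get((y, f), 0.0) for y in labels)
        if p_f == 0.0:
            continue  # value never occurs, contributes nothing
        posterior = {y: joint.get((y, f), 0.0) / p_f for y in labels}
        after += p_f * best_expected_cost(posterior, cost)
    return before - after


# Hypothetical 0/1 misclassification costs and joint distribution.
COST01 = {
    "sick": {"sick": 0.0, "healthy": 1.0},
    "healthy": {"sick": 1.0, "healthy": 0.0},
}
JOINT = {
    ("sick", "pos"): 0.3, ("sick", "neg"): 0.1,
    ("healthy", "pos"): 0.1, ("healthy", "neg"): 0.5,
}
value_of_test = evi(JOINT, COST01)
```

Under these assumptions, acquiring the feature would be worthwhile only if its acquisition cost is below `value_of_test`. Repeating this calculation for many feature subsets is the expense that sharing EVI computations is meant to reduce.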

22
Datasets
On average, 1/3 of VOILA nodes shared the same
EVI.
23
Results - Heart
24
Next Steps
  • Cost-sensitive query-time adaptive information
    gathering
  • The complexity of the integrated SRL tasks
    requires flexible, adaptive algorithms that
    retrieve relevant information in real time
  • Inference and learning need to be scalable and
    real time
  • Methods need to take complex cost models into
    account
  • Some related areas to keep in mind
  • Visual Analytics: the complexity of the
    integrated SRL tasks requires sophisticated user
    interfaces that allow user feedback and support
    explanation
  • Probabilistic Databases: there is currently a
    resurgence of work in this area in the DB
    community

25
Thanks
http://www.cs.umd.edu/getoor
Work sponsored by the National Science
Foundation, Google, Microsoft, KDD program and
National Geospatial Agency
26
ILIADS
  • Goal
  • Produce high-quality integration via a flexible
    method able to adapt to a wide variety of
    ontology sizes and structures
  • Method
  • Combining statistical and logical inference
  • Use schema (structure) and data (instances)
    effectively
  • Solution
  • Integrated Learning In Alignment of Data and
    Schema (ILIADS)
  • Datasets and code available at
    http://www.cs.umd.edu/linqs/projects/iliads