Relational Learning - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Relational Learning

1
Relational Learning and Link Analysis
David Jensen
Knowledge Discovery Laboratory
Computer Science Department
University of Massachusetts
Amherst, Massachusetts, USA
2
Example Findings: Predicting Cell Phone Fraud

Data: Phone calls from a single US city over three months
     1.5M objects (phones)
     7M links (call volume)
Task: Predict high-probability instances of identity-theft fraud for a future month

[Figure: data schema. Phone objects (Number, Type, Fraud?) connected by Called links (Weight, Distance, Source).]
Result: Fraud rings exist where multiple fraudulent numbers call the same number
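
The fraud-ring pattern can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used in the study: the call records, the phone numbers, the `find_shared_callees` helper, and the threshold of two callers are all hypothetical.

```python
from collections import defaultdict


def find_shared_callees(calls, fraudulent, min_callers=2):
    """Return {callee: sorted fraudulent callers} for every callee reached
    by at least `min_callers` distinct known-fraudulent numbers."""
    callers_by_callee = defaultdict(set)
    for caller, callee in calls:
        if caller in fraudulent:
            callers_by_callee[callee].add(caller)
    return {
        callee: sorted(callers)
        for callee, callers in callers_by_callee.items()
        if len(callers) >= min_callers
    }


# Hypothetical toy data: (caller, callee) pairs.
calls = [
    ("555-0101", "555-0199"),
    ("555-0102", "555-0199"),
    ("555-0103", "555-0150"),
    ("555-0101", "555-0150"),
    ("555-0104", "555-0199"),
]
fraudulent = {"555-0101", "555-0102"}

rings = find_shared_callees(calls, fraudulent)
print(rings)
```

In the toy data only 555-0199 is called by two known-fraudulent numbers, so it is the only candidate ring hub; 555-0150 has a single fraudulent caller and is filtered out.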

3
Relational knowledge discovery

Knowledge discovery in large sets of interrelated
entities with variables on both entities and
relations.

Statistics
Relational databases
Graph drawing
Artificial intelligence
Draws on social network analysis, graph theory, inductive logic programming, citation analysis, web analysis, link analysis, sequence analysis, spatial and temporal statistics, and others.
Knowledge discovery for the New Science of Networks
4
What's unique about relational data?

Traditional work in statistics and knowledge discovery assumes that data instances form a single table.

Traditional statistical models assume
independence among instances (rows) and find
associations among the values of multiple
variables within a single instance.

Relational models assume dependence among instances in different rows and tables and find associations among the values of variables across those instances.
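
To make the contrast concrete, the sketch below computes one feature that a single-table model cannot see without preprocessing: the fraction of an object's linked neighbors that carry a given label. The table contents, the node names, and the `neighbor_fraud_rate` helper are hypothetical and serve only as illustration.

```python
def neighbor_fraud_rate(phone, links, is_fraud):
    """Fraction of `phone`'s linked neighbors that are labeled fraudulent.

    `links` is a list of undirected (a, b) pairs; `is_fraud` maps each
    node to True/False. Returns 0.0 for a node with no neighbors.
    """
    neighbors = {b for a, b in links if a == phone}
    neighbors |= {a for a, b in links if b == phone}
    if not neighbors:
        return 0.0
    return sum(is_fraud[n] for n in neighbors) / len(neighbors)


# Hypothetical "entity table" (one row per phone) and "link table".
is_fraud = {"A": True, "B": False, "C": True, "D": False}
links = [("A", "D"), ("C", "D"), ("B", "D"), ("A", "B")]

rate_d = neighbor_fraud_rate("D", links, is_fraud)
rate_b = neighbor_fraud_rate("B", links, is_fraud)
print(rate_d, rate_b)
```

A row-by-row model would treat D and B as unrelated records with identical labels; the link table shows that D sits next to more fraudulent numbers than B does.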

5
Example tasks

Identifying fraudulent securities brokers
     Partner: National Association of Securities Dealers
     Data: 650,000 brokers, 5,000 firms, 90,000 offices
Predicting peer-to-peer downloads
     Partners: UMass Office of Information Technologies; UMass CS Secure Internet and Group-Networking Lab
     Data: 2,000 students, 1 million files
Catching identity theft in cellphone networks
     Partner: Large wireless service provider
     Data: 2 million numbers, 5 million call aggregates

6
Why is relational learning useful?

Integrate learning from multiple information sources
     Sources generate many interrelated records with heterogeneous structure
Use context to understand information
     Sources may generate many interrelated
Integrate time, space, and other relations
     Potential to produce integrated view of many types of relations
Learning methods to match complexity of current methods for knowledge representation and reasoning

7
Post-hoc analysis of 9/11 hijackers
(Krebs 2001)
8
What's new?

Direct analysis
     No need to preprocess the data to form propositional instances
Relational inference
     Inferences for one object can inform inferences about other objects (Neville & Jensen 2000; Taskar et al. 2002)
Data instances are dependent
     The assumptions of many standard statistical approaches are violated (Jensen & Neville 2002; Perlich and Provost 2003)
Structure and attribute values of data can be correlated
     Algorithms need to separate the effects of structure from attribute values (Jensen, Neville, & Hay 2003)
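
The idea that inferences for one object can inform inferences about others can be illustrated with a very simple iterative neighbor-averaging scheme. This is a swapped-in toy technique chosen for brevity, not the method of any of the papers cited on this slide; the `propagate_labels` helper, the graph, and the starting score of 0.5 for unlabeled nodes are assumptions.

```python
def propagate_labels(links, known, rounds=10):
    """Score each unlabeled node as the mean score of its neighbors,
    repeating for `rounds` passes. Nodes in `known` keep their score."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Unlabeled nodes start at an uninformative 0.5 (an assumption).
    scores = {node: known.get(node, 0.5) for node in neighbors}
    for _ in range(rounds):
        updated = dict(scores)
        for node, nbrs in neighbors.items():
            if node in known:
                continue  # observed labels stay fixed
            updated[node] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = updated
    return scores


# Hypothetical chain a - b - c - d with labels known only at the ends.
links = [("a", "b"), ("b", "c"), ("c", "d")]
known = {"a": 1.0, "d": 0.0}

scores = propagate_labels(links, known, rounds=50)
print(scores)
```

Node b, which is adjacent to the positive example, ends up with a higher score than node c, even though neither node has any attributes of its own. The scores come entirely from the link structure.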

9
Example: Relational Probability Tree
(Neville, Jensen, Friedland, & Hay 2003)
CV accuracy: 91; AUC: 85
10
Autocorrelation and effective sample size

The reliability of a statistical association
varies with sample size (N)
When evaluating the association between
characteristics of groups and their members, what
is the effective sample size?
N members
N groups
members N groups


(Jensen & Neville 2002)
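
One way to build intuition for the question on this slide is the design-effect approximation from survey sampling, n_eff = n / (1 + (m - 1) * rho), where m is the group size and rho is the within-group correlation. This formula is borrowed from cluster-sampling statistics as an illustration; it is not taken from the slides or from the cited paper, and the `effective_sample_size` helper and the numbers below are hypothetical.

```python
def effective_sample_size(n_groups, group_size, rho):
    """Design-effect approximation for equal-sized groups.

    rho is the within-group correlation of the variable of interest:
    0.0 means members are independent, 1.0 means members of a group
    are identical.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    n_members = n_groups * group_size
    return n_members / (1.0 + (group_size - 1) * rho)


# Hypothetical data set: 100 groups of 10 members each.
independent = effective_sample_size(100, 10, 0.0)
identical = effective_sample_size(100, 10, 1.0)
partial = effective_sample_size(100, 10, 0.5)
print(independent, identical, partial)
```

With no within-group correlation the effective sample size equals the number of members; with perfect correlation it collapses to the number of groups. Intermediate correlation lands in between, which is why a test that counts every member as an independent observation can overstate its evidence.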
11
What's difficult?

Relational learning and inference
     Accurate models must consider at least the relational neighborhood of a record, rather than the record alone
Non-independence
     Data instances are non-independent, greatly complicating the statistics of both learning and inference
Semi-structured data
     Good analysis requires frequent restructuring and reinterpretation of the underlying structure of data
Preserving privacy in relational and distributed data mining
     Record linkage vs. aggregation

12
Technical approaches

New learning algorithms
     Representations and learning techniques that consider relational structure and attributes when constructing models
New inference algorithms
     Methods for applying learned models that leverage relational structure
Relational statistics
     Statistical tests that correctly adjust for characteristics of relational data such as linkage and autocorrelation
Semi-structured databases and transformation techniques
     Databases and techniques that allow rapid restructuring of large databases by end users

13
Where to go for more information

1998 AAAI Fall Symposium on AI and Link Analysis
     Web-accessible papers
AAAI 2000 and IJCAI 2003 Workshops on Learning Statistical Models from Relational Data
     Web-accessible papers
     Consider attending IJCAI 2003 workshop (send email)
KDD 2002 and KDD 2003 Workshops on Multi-Relational Data Mining
     Special issue of SIGKDD Explorations (forthcoming)
     Consider attending KDD 2003 workshop
DARPA's Evidence Extraction and Link Discovery Program
     Pattern learning area
     Work published at ICML and SIGKDD in past two years