Provided by: davi413
Title: Relational Learning


1
Relational Learning and Link Analysis
David Jensen
Knowledge Discovery Laboratory
Computer Science Department
University of Massachusetts
Amherst, Massachusetts, USA
2
Example Findings: Predicting Cell Phone Fraud
  • Data: Phone calls from a single US city over
    three months
  • 1.5M objects (phones)
  • 7M links (call volume)
  • Task: Predict high-probability instances of
    identity-theft fraud for a future month

[Figure: Phone objects connected by Called links; attribute labels: Number, Type, Fraud?, Weight, Distance, Source]
  • Result: Fraud rings exist in which multiple
    fraudulent numbers call the same number
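The fraud-ring pattern can be illustrated with a minimal sketch. It assumes a hypothetical edge list of (caller, callee) pairs and a set of numbers already labeled fraudulent; the function names and toy phone numbers are invented for illustration, and this is not the model used in the study. It finds numbers called by at least two known-fraudulent numbers, then flags other callers of those numbers as candidates for review.

```python
from collections import defaultdict

# Hypothetical input format: calls is a list of (caller, callee) pairs,
# fraud is a set of numbers already labeled as fraudulent.

def find_ring_targets(calls, fraud, min_callers=2):
    """Map each number called by >= min_callers distinct fraudulent numbers
    to the sorted list of those fraudulent callers."""
    callers_by_target = defaultdict(set)
    for caller, callee in calls:
        if caller in fraud:
            callers_by_target[callee].add(caller)
    return {target: sorted(callers)
            for target, callers in callers_by_target.items()
            if len(callers) >= min_callers}


def flag_suspects(calls, fraud, min_callers=2):
    """Return unlabeled numbers that also call a ring target (hypothetical heuristic)."""
    targets = find_ring_targets(calls, fraud, min_callers)
    return sorted({caller for caller, callee in calls
                   if callee in targets and caller not in fraud})


# Toy data (invented for illustration only).
calls = [("555-0101", "555-0900"), ("555-0102", "555-0900"),
         ("555-0103", "555-0777"), ("555-0101", "555-0777"),
         ("555-0104", "555-0900")]
fraud = {"555-0101", "555-0102"}

print(find_ring_targets(calls, fraud))  # {'555-0900': ['555-0101', '555-0102']}
print(flag_suspects(calls, fraud))      # ['555-0104']
```

Here 555-0104 is flagged only because of the number it calls, not because of any attribute of its own; that dependence on linked records is what a single-table model would miss.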

3
Relational knowledge discovery
  • Knowledge discovery in large sets of interrelated
    entities with variables on both entities and
    relations.

[Figure: contributing fields: Statistics, Relational databases, Graph drawing, Artificial intelligence]
  • Draws on social network analysis, graph theory,
    inductive logic programming, citation analysis,
    web analysis, link analysis, sequence analysis,
    spatial and temporal statistics, and others
  • Knowledge discovery for the New Science of
    Networks
4
What's unique about relational data?
  • Traditional work in statistics and knowledge
    discovery assumes that data instances form a single
    table.
  • Traditional statistical models assume
    independence among instances (rows) and find
    associations among the values of multiple
    variables within a single instance.
  • Relational models assume dependence among
    instances in different rows and tables and find
    associations among the values of variables on
    related instances.
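One way to quantify dependence among instances in different rows is relational autocorrelation: the correlation between values of the same variable on pairs of linked objects. Below is a minimal Python sketch under stated assumptions: a hypothetical list of undirected links and a dict of numeric (here 0/1) attribute values, with each link counted in both directions so the measure is symmetric. A value near 0 suggests little dependence between linked instances on that variable; a clearly positive value signals the kind of dependence that a single-table independence assumption ignores.

```python
import math

def relational_autocorrelation(links, value):
    """Pearson correlation of `value` across linked pairs.

    links: list of (a, b) undirected links (hypothetical format)
    value: dict mapping each object to a numeric attribute (e.g. 0/1 fraud label)
    Each link contributes both (a, b) and (b, a), so the result is symmetric.
    """
    xs, ys = [], []
    for a, b in links:
        xs += [value[a], value[b]]
        ys += [value[b], value[a]]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("attribute is constant across linked objects")
    return cov / (sd_x * sd_y)


# Toy graph: a-b-c are labeled 1, d-e-f are labeled 0, one link crosses groups.
links = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "f"), ("c", "d")]
fraud = {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0, "f": 0}
print(round(relational_autocorrelation(links, fraud), 3))  # 0.6
```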

5
Example tasks
  • Identifying fraudulent securities brokers
      • Partner: National Association of Securities
        Dealers
      • Data: 650,000 brokers, 5,000 firms, 90,000
        offices
  • Predicting peer-to-peer downloads
      • Partners: UMass Office of Information
        Technologies; UMass CS Secure Internet and
        Group-Networking Lab
      • Data: 2,000 students, 1 million files
  • Catching identity theft in cellphone networks
      • Partner: Large wireless service provider
      • Data: 2 million numbers, 5 million call
        aggregates

6
Why is relational learning useful?
  • Integrate learning from multiple information
    sources: sources generate many interrelated
    records with heterogeneous structure
  • Use context to understand information: sources may
    generate many interrelated
  • Integrate time, space, and other relations:
    potential to produce an integrated view of
    many types of relations
  • Learning methods to match the complexity of current
    methods for knowledge representation and reasoning

7
Post-hoc analysis of 9/11 hijackers
(Krebs 2001)
8
What's new?
  • Direct analysis: no need to preprocess the data to
    form propositional instances
  • Relational inference: inferences about one object
    can inform inferences about other
    objects (Neville & Jensen 2000; Taskar et al.
    2002)
  • Data instances are dependent: the assumptions of
    many standard statistical approaches are
    violated (Jensen & Neville 2002; Perlich &
    Provost 2003)
  • Structure and attribute values of data can be
    correlated: algorithms need to separate the
    effects of structure from those of attribute values
    (Jensen, Neville & Hay 2003)
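Relational inference can be illustrated with a simplified collective-inference loop. This is a sketch under stated assumptions, not the procedure of Neville & Jensen (2000) or Taskar et al. (2002): it uses plain neighbor averaging (closer to relaxation labeling than to iterative classification with a learned model), a hypothetical adjacency dict, and a hypothetical dict of known 0/1 labels that stay fixed. Each pass re-estimates every unlabeled object from its neighbors' current estimates, so the estimate for one object feeds into the estimates for others.

```python
def collective_inference(neighbors, known, prior=0.5, iterations=10):
    """Estimate P(label = 1) for unlabeled nodes from neighbors' current estimates.

    neighbors: dict node -> list of adjacent nodes (hypothetical adjacency format)
    known:     dict node -> 0/1 label for labeled nodes; these are held fixed
    prior:     starting estimate for unlabeled nodes
    """
    est = {n: float(known.get(n, prior)) for n in neighbors}
    for _ in range(iterations):
        new = dict(est)  # synchronous update: every node reads last pass's estimates
        for n, nbrs in neighbors.items():
            if n in known or not nbrs:
                continue
            new[n] = sum(est[m] for m in nbrs) / len(nbrs)
        est = new
    return est


# Toy chain a - b - c - d with only the endpoints labeled.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
known = {"a": 1, "d": 0}
est = collective_inference(neighbors, known)
print({n: round(p, 2) for n, p in est.items()})  # {'a': 1.0, 'b': 0.67, 'c': 0.33, 'd': 0.0}
```

Neither b nor c is labeled, yet each estimate moves toward its labeled end of the chain only through repeated exchange with the other unlabeled node.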

9
Example Relational Probability Tree
(Neville, Jensen, Friedland & Hay 2003)
CV accuracy: 91, AUC: 85
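Relational probability trees split on aggregated features of related objects rather than on columns of a single table. A minimal sketch of two ingredients, under stated assumptions: aggregating a binary attribute of related objects into degree, count, and proportion features, and searching for the best threshold on one such feature. The function names, toy data, and Gini impurity criterion are hypothetical stand-ins; the published learner uses its own aggregation functions and split-selection procedure.

```python
def aggregate_features(node, neighbors, attr):
    """Summarize a 0/1 attribute over a node's related objects as fixed-length features."""
    vals = [attr[m] for m in neighbors.get(node, [])]
    degree = len(vals)
    count = sum(vals)
    return {"degree": degree,
            "count": count,
            "proportion": count / degree if degree else 0.0}


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, feature):
    """Best threshold t for the binary split rows[feature] <= t.

    Returns (weighted_gini, t, P(label=1 | left), P(label=1 | right)),
    or None if every row has the same feature value.
    """
    best = None
    for t in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, t, sum(left) / len(left), sum(right) / len(right))
    return best


# Toy data: phones p1..p4 call numbers x1..x4; x1 and x2 are known fraudulent.
neighbors = {"p1": ["x1", "x2"], "p2": ["x1", "x3"], "p3": ["x3", "x4"], "p4": ["x4"]}
called_fraud = {"x1": 1, "x2": 1, "x3": 0, "x4": 0}
phones = ["p1", "p2", "p3", "p4"]
labels = [1, 1, 0, 0]  # hypothetical fraud labels for p1..p4

rows = [aggregate_features(p, neighbors, called_fraud) for p in phones]
print(best_split(rows, labels, "proportion"))  # (0.0, 0.0, 0.0, 1.0)
```

Each leaf stores a probability rather than a hard class, which is what makes the tree a probability estimator.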
10
Autocorrelation and effective sample size
  • The reliability of a statistical association
    varies with sample size (N)
  • When evaluating the association between
    characteristics of groups and their members, what
    is the effective sample size?
      • N = number of members?
      • N = number of groups?
      • number of members > N > number of groups?
(Jensen & Neville 2002)
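The effective-sample-size question can be made concrete with a standard formula from survey sampling (not from the slides): Kish's design effect. Assuming equal-sized groups of m members and an intraclass correlation ρ (the correlation among outcomes of members of the same group), the effective sample size is N_eff = N_members / (1 + (m - 1)ρ). With ρ = 0 it equals the number of members, with ρ = 1 it falls to the number of groups, and for values in between it lies between the two. A minimal sketch; the function name is hypothetical:

```python
def effective_sample_size(n_groups, group_size, icc):
    """Kish design-effect approximation for equal-sized groups.

    n_groups:   number of groups
    group_size: members per group (m), assumed equal for every group
    icc:        intraclass correlation rho, assumed to lie in [0, 1]
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be in [0, 1]")
    n_members = n_groups * group_size
    design_effect = 1 + (group_size - 1) * icc
    return n_members / design_effect


for rho in (0.0, 0.2, 1.0):
    print(rho, round(effective_sample_size(100, 10, rho), 1))
# 0.0 1000.0
# 0.2 357.1
# 1.0 100.0
```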
11
What's difficult?
  • Relational learning and inference: accurate models
    must consider at least the relational
    neighborhood of a record, rather than only the
    record alone
  • Non-independence: data instances are
    non-independent, greatly complicating the
    statistics of both learning and inference
  • Semi-structured data: good analysis requires
    frequent restructuring and reinterpretation of
    the underlying structure of data
  • Preserving privacy in relational and distributed
    data mining: record linkage vs. aggregation

12
Technical approaches
  • New learning algorithms: representations and
    learning techniques that consider relational
    structure and attributes when constructing models
  • New inference algorithms: methods for applying
    learned models that leverage relational
    structure
  • Relational statistics: statistical tests that
    correctly adjust for characteristics of
    relational data such as linkage and
    autocorrelation
  • Semi-structured databases and transformation
    techniques: databases and techniques that allow
    end users to rapidly restructure large
    databases
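As an illustration of a test that adjusts for linkage, here is a sketch of a group-level randomization test: to test whether a group attribute is associated with its members' outcomes, shuffle the attribute across groups (keeping each group's members together) rather than across individual members. This is a generic technique swapped in for illustration, not necessarily the test the authors use, and it assumes hypothetical dict formats, 0/1 outcomes, non-empty groups, and both attribute values present.

```python
import random

def group_permutation_test(groups, group_attr, n_perm=2000, seed=0):
    """One-sided randomization test that permutes a 0/1 group attribute across groups.

    groups:     dict group_id -> non-empty list of member outcomes (hypothetical format)
    group_attr: dict group_id -> 0/1 group-level attribute (both values must occur)
    Returns (observed difference in mean member outcome, estimated p-value).
    """
    ids = list(groups)

    def mean_difference(attr):
        # Mean outcome of members in attr=1 groups minus mean in attr=0 groups.
        ones = [y for g in ids if attr[g] == 1 for y in groups[g]]
        zeros = [y for g in ids if attr[g] == 0 for y in groups[g]]
        return sum(ones) / len(ones) - sum(zeros) / len(zeros)

    observed = mean_difference(group_attr)
    rng = random.Random(seed)
    values = [group_attr[g] for g in ids]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # members stay with their group; only group labels move
        if mean_difference(dict(zip(ids, values))) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: 4 groups of 25 members; the two attr=1 groups have all-1 outcomes.
groups = {f"g{i}": [1] * 25 if i < 2 else [0] * 25 for i in range(4)}
group_attr = {"g0": 1, "g1": 1, "g2": 0, "g3": 0}
observed, p = group_permutation_test(groups, group_attr)
print(observed, round(p, 2))
```

With only four groups there are just six distinct ways to assign the attribute, so the p-value stays near 1/6 however many members each group has. Shuffling across the 100 individual members instead would treat them as independent and make the same data look far more significant.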

13
Where to go for more information
  • 1998 AAAI Fall Symposium on AI and Link
    Analysis: web-accessible papers
  • AAAI 2000 and IJCAI 2003 Workshops on Learning
    Statistical Models from Relational
    Data: web-accessible papers; consider attending
    the IJCAI 2003 workshop (send email)
  • KDD 2002 and KDD 2003 Workshops on Multi-Relational
    Data Mining: special issue of SIGKDD Explorations
    (forthcoming); consider attending the KDD 2003
    workshop
  • DARPA's Evidence Extraction and Link Discovery
    Program: pattern learning area; work published at
    ICML and SIGKDD in the past two years