Multimedia and Graph mining - PowerPoint PPT Presentation

Slides: 68
Provided by: christosf
Learn more at: https://cs.login.cmu.edu
Transcript and Presenter's Notes

1
Multimedia and Graph mining
  • Christos Faloutsos
  • CMU

2
CONGRATULATIONS!
Welcome to CMU!
3
Outline
  • Problem definition / Motivation
  • Biological image mining
  • Graphs and power laws
  • Streams and forecasting
  • Conclusions

4
Motivation
  • Data mining: find patterns (rules, outliers)
  • How do detached cat retinas evolve?
  • What do real graphs look like?
  • What do (numerical) streams look like?

5
ViVo cat retina mining
  • with Ambuj Singh, Mark Verardo, Vebjorn Ljosa,
    Arnab Bhattacharya (UCSB)
  • Jia-Yu Tim Pan, HJ Yang (CMU)

6
Detachment Development
1 day after detachment
3 days after detachment
Normal
3 months after detachment
7 days after detachment
28 days after detachment
7
Data and Problem
  • (Problem) What happens in retina after
    detachment?
  • What tissues (regions) are involved?
  • How do they change over time?
  • How will a program convey this info?
  • More than classification: we want to learn what
    the classifier learned

8
Main idea
  • extract characteristic visual words
  • Equivalent to characteristic keywords, in a
    collection of text documents

9
Visual vocabulary?
10
Visual vocabulary?
news: president, minister, economic
sports: baseball, score, penalty
11
Visual Vocabulary (ViVo) generation
Step 1: Tile image (8x12 tiles)
Step 2: Extract tile features (Feature 1, Feature 2)
Step 3: ViVo generation (visual vocabulary)
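The tiling and feature-extraction steps on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: "8x12 tiles" is read here as a grid of 8 rows by 12 columns, and the two features (mean and standard deviation of intensity) are hypothetical stand-ins, since the slide does not say which features are used. The vocabulary-generation step is not shown.

```python
from statistics import mean, pstdev

def tile_image(image, grid_rows, grid_cols):
    """Split a 2-D list of pixel intensities into grid_rows x grid_cols tiles."""
    height, width = len(image), len(image[0])
    tile_h, tile_w = height // grid_rows, width // grid_cols
    tiles = []
    for r in range(grid_rows):
        for c in range(grid_cols):
            tile = [row[c * tile_w:(c + 1) * tile_w]
                    for row in image[r * tile_h:(r + 1) * tile_h]]
            tiles.append(tile)
    return tiles

def tile_features(tile):
    """Hypothetical 2-feature descriptor: mean and std. dev. of intensity."""
    pixels = [p for row in tile for p in row]
    return (mean(pixels), pstdev(pixels))

# Toy 16 x 24 "image"; an 8 x 12 grid yields 96 tiles of 2 x 2 pixels each.
image = [[(x + y) % 7 for x in range(24)] for y in range(16)]
tiles = tile_image(image, 8, 12)
features = [tile_features(t) for t in tiles]
```

Each tile ends up as one point in feature space, which is the input a vocabulary-generation step would work on.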
12
Biological interpretation
ID  | ViVo description | Condition
V1  | GFAP in inner retina (Müller cells) | Healthy
V10 | Healthy outer segments of rod photoreceptors | Healthy
V8  | Redistribution of rod opsin into cell bodies of rod photoreceptors | Detached
V11 | Co-occurring processes: Müller cell hypertrophy and rod opsin redistribution | Detached
13
Which tissue is significant at 7 days after detachment?
14
FEMine: Mining Fly Embryos
15
With
  • Eric Xing (CMU CS)
  • Bob Murphy (CMU Bio)
  • Tim Pan (CMU -> Google)
  • Andre Balan (U. Sao Paulo)

16
Outline
  • Problem definition / Motivation
  • Biological image mining
  • Graphs and power laws
  • Streams and forecasting
  • Conclusions

17
Graphs - why should we care?
18
Graphs - why should we care?
Internet Map [lumeta.com]
Food Web [Martinez '91]
Protein Interactions [genomebiology.com]
Friendship Network [Moody '01]
19
Joint work with
  • Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

20
Problem: network and graph mining
  • What does the Internet look like?
  • What does the web look like?
  • What constitutes a normal social network?
  • What is normal/abnormal?
  • Which patterns/laws hold?

21
Graph mining
  • Are real graphs random?

22
Laws and patterns
  • NO!!
  • Diameter
  • in- and out- degree distributions
  • other (surprising) patterns

23
Laws: degree distributions
  • Q: avg degree is 3 - what is the most probable
    degree?

[Plot: count vs. degree; the average degree, 3, is marked and the shape of the curve is left as "??"]
24
Laws: degree distributions
  • Q: avg degree is 3 - what is the most probable
    degree?

[Plot: count vs. degree]
25
Solution
[Plot: frequency vs. outdegree, log-log axes, Nov '97; exponent = slope, O = -2.15]
  • The plot is linear in log-log scale [FFF'99]
  • freq = degree^(-2.15)
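A power law between frequency and degree shows up as a straight line on log-log axes, so the exponent can be estimated as the slope of a least-squares line through the log-transformed points. The sketch below assumes a synthetic degree list built to follow an exponent of -2.15; the function names and the data are illustrative, not taken from the talk.

```python
import math
from collections import Counter

def fit_loglog_slope(xs, ys):
    """Least-squares slope and intercept of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx

def degree_histogram(degrees):
    """Return sorted distinct degrees and how many nodes have each one."""
    counts = Counter(d for d in degrees if d > 0)
    ds = sorted(counts)
    return ds, [counts[d] for d in ds]

# Synthetic data (an assumption): about 10000 * d^(-2.15) nodes of degree d.
degrees = []
for d in range(1, 21):
    degrees.extend([d] * round(10000 * d ** -2.15))

ds, freqs = degree_histogram(degrees)
slope, intercept = fit_loglog_slope(ds, freqs)
most_probable_degree = ds[freqs.index(max(freqs))]
```

On data like this the most probable degree is 1, far below the average, which is the point of the question on the previous slides.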

26
But
  • Q1: How about graphs from other domains?
  • Q2: How about temporal evolution?

27
The Peer-to-Peer Topology
[Jovanovic]
  • Frequency versus degree
  • Number of adjacent peers follows a power-law

28
More power laws
  • citation counts (citeseer.nj.nec.com 6/2001)

[Plot: log(count) vs. log(citations); "Ullman" marked]
29
Swedish sex-web
Nodes: people (females, males). Links: sexual
relationships
Albert Laszlo Barabasi, http://www.nd.edu/~networks/Publication%20Categories/04%20Talks/2005-norway-3hours.ppt
Liljeros et al., Nature 2001
4781 Swedes, ages 18-74; 59% response rate.
30
More power laws
  • web hit counts [w/ A. Montgomery]

[Plot: Web Site Traffic; log(count) vs. log(in-degree); Zipf; "ebay" marked]
31
epinions.com
  • who-trusts-whom [Richardson & Domingos, KDD 2001]

[Plot: count vs. (out) degree; one labeled point: a user who trusts 2000 people]
33
A famous power law: Zipf's law
  • Bible - rank vs frequency (log-log)
  • similarly, in many other languages; for
    customers and sales volume; city populations etc
    etc

[Plot: log(freq) vs. log(rank)]
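A rank-frequency table of the kind plotted for Zipf's law can be built by counting words, sorting the counts in decreasing order, and numbering them from 1. The sketch below is a minimal illustration; the tokenizer (lowercase letters and apostrophes) and the sample sentence are assumptions, not the Bible data from the slide.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent word."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    ordered = sorted(counts.values(), reverse=True)
    return list(enumerate(ordered, start=1))

sample = "the cat and the dog and the bird"
table = rank_frequency(sample)
```

Plotting the logarithm of each frequency against the logarithm of its rank gives the log-log view described on the slide.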
34
Olympic medals (Sydney '00, Athens '04)
[Plot: log(medals) vs. log(rank)]
35
More power laws: areas - Korcak's law
Scandinavian lakes: area vs complementary
cumulative count (log-log axes)
[Plot: log(count(> area)) vs. log(area)]
37
But
  • Q1: How about graphs from other domains?
  • Q2: How about temporal evolution?

38
Time evolution
  • with Jure Leskovec (CMU)
  • and Jon Kleinberg (Cornell)
  • (best paper, KDD '05)

39
Evolution of the Diameter
  • Prior work on Power Law graphs hints at slowly
    growing diameter
  • diameter ~ O(log N)
  • diameter ~ O(log log N)
  • What is happening in real data?

40
Evolution of the Diameter
  • Prior work on Power Law graphs hints at slowly
    growing diameter
  • diameter ~ O(log N)
  • diameter ~ O(log log N)
  • What is happening in real data?
  • Diameter shrinks over time
  • As the network grows the distances between nodes
    slowly decrease
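To check on a small example whether a diameter grows or shrinks as a graph gains nodes and edges, one can compute the longest shortest path with breadth-first search from every node. The sketch below assumes an undirected, connected graph and the exact diameter; the study behind these slides may use a different definition, and the toy edge lists are invented for illustration.

```python
from collections import deque

def build_adj(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def eccentricity(adj, source):
    """Largest BFS distance from source to any reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Exact diameter: the maximum eccentricity over all nodes."""
    return max(eccentricity(adj, s) for s in adj)

# Toy snapshots: a 5-node path, then one new node with three new edges.
early = [(0, 1), (1, 2), (2, 3), (3, 4)]
later = early + [(5, 0), (5, 4), (5, 2)]
d_early = diameter(build_adj(early))
d_later = diameter(build_adj(later))
```

In this toy case the later snapshot has more nodes but a smaller diameter, because the new node adds shortcuts.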

41
Diameter: ArXiv citation graph
  • Citations among physics papers
  • 1992-2003
  • One graph per year

[Plot: diameter vs. time (years)]
42
Diameter: Autonomous Systems
  • Graph of Internet
  • One graph per day
  • 1997-2000

[Plot: diameter vs. number of nodes]
43
Diameter: Affiliation Network
  • Graph of collaborations in physics: authors
    linked to papers
  • 10 years of data

[Plot: diameter vs. time (years)]
44
Diameter: Patents
  • Patent citation network
  • 25 years of data

[Plot: diameter vs. time (years)]
45
Temporal Evolution of the Graphs
  • N(t): nodes at time t
  • E(t): edges at time t
  • Suppose that
  • N(t+1) = 2 * N(t)
  • Q: what is your guess for
  • E(t+1) =? 2 * E(t)

46
Temporal Evolution of the Graphs
  • N(t): nodes at time t
  • E(t): edges at time t
  • Suppose that
  • N(t+1) = 2 * N(t)
  • Q: what is your guess for
  • E(t+1) =? 2 * E(t)
  • A: over-doubled!
  • But obeying the Densification Power Law
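Assuming the Densification Power Law has the form E(t) proportional to N(t)^a (the symbol a is introduced here for the exponent), doubling the nodes multiplies the edges by 2^a, which exceeds 2 whenever a > 1. The sketch below computes that factor and estimates a from (nodes, edges) snapshots by a log-log least-squares fit; the snapshot numbers are synthetic.

```python
import math

def edge_growth_factor(exponent, node_factor=2.0):
    """Factor by which edges grow when nodes grow by node_factor."""
    return node_factor ** exponent

def fit_densification_exponent(nodes, edges):
    """Slope of log(E) against log(N) by ordinary least squares."""
    lx = [math.log(n) for n in nodes]
    ly = [math.log(e) for e in edges]
    k = len(lx)
    mx, my = sum(lx) / k, sum(ly) / k
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx

# Exponent 1.69 is the slope reported for the physics citation graph.
factor_citations = edge_growth_factor(1.69)   # roughly 3.2, so well over 2

# Synthetic snapshots (an assumption) that follow E = N^1.5 exactly.
nodes = [100, 200, 400, 800]
edges = [n ** 1.5 for n in nodes]
a_hat = fit_densification_exponent(nodes, edges)
```

An exponent of 1 corresponds to a tree-like graph, where edges grow linearly with nodes, and an exponent of 2 to a clique, where edges grow quadratically; the slopes on the following slides fall in between.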

47
Densification: Physics Citations
  • Citations among physics papers
  • 2003: 29,555 papers, 352,807 citations

[Plot: E(t) vs. N(t), log-log; slope 1.69]
48
Densification: Physics Citations
  • Citations among physics papers
  • 2003: 29,555 papers, 352,807 citations

[Plot: E(t) vs. N(t), log-log; slope 1.69; reference line with slope 1: tree]
49
Densification: Physics Citations
  • Citations among physics papers
  • 2003: 29,555 papers, 352,807 citations

[Plot: E(t) vs. N(t), log-log; slope 1.69; reference line with slope 2: clique]
50
Densification: Patent Citations
  • Citations among patents granted
  • 1999: 2.9 million nodes, 16.5 million edges
  • Each year is a datapoint

[Plot: E(t) vs. N(t), log-log; slope 1.66]
51
Densification: Autonomous Systems
  • Graph of Internet
  • 2000: 6,000 nodes, 26,000 edges
  • One graph per day

[Plot: E(t) vs. N(t), log-log; slope 1.18]
52
Densification: Affiliation Network
  • Authors linked to their publications
  • 2002: 60,000 nodes (20,000 authors, 38,000 papers),
    133,000 edges

[Plot: E(t) vs. N(t), log-log; slope 1.15]
53
Graphs - Conclusions
  • Real graphs obey some surprising patterns
  • which can help us spot anomalies / outliers
  • A lot of interest from web searching companies
  • recommendation systems
  • link spamming
  • trust propagation
  • HUGE graphs (Millions and Billions of nodes)

54
Outline
  • Problem definition / Motivation
  • Biological image mining
  • Graphs and power laws
  • Streams and forecasting
  • Conclusions

55
Why care about streams?
56
Why care about streams?
  • Sensor devices
  • Temperature, weather measurements
  • Road traffic data
  • Geological observations
  • Patient physiological data
  • sensor-Andrew project
  • Embedded devices
  • Network routers

57
Co-evolving time sequences
  • Joint work with
  • Jimeng Sun (CMU)
  • Dr. Spiros Papadimitriou (CMU/IBM)
  • Dr. Yasushi Sakurai (NTT)
  • Prof. Jeanne VanBriesen (CMU/CEE)

58
Motivation
sensors near leak
sensors away from leak
water distribution network
normal operation
59
Motivation
sensors near leak
chlorine concentrations
sensors away from leak
water distribution network
normal operation
major leak
60
Motivation
actual measurements (n streams)
k hidden variable(s)
spot hidden (latent) variables
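One way to picture "spotting" a hidden variable in n co-evolving streams is to track, one time tick at a time, the direction along which the measurements vary together and to read the hidden variable off as the projection onto that direction. The sketch below is an assumption-laden illustration for k = 1: the update rule is a generic recursive subspace-tracking step with a forgetting factor, not a method confirmed by these slides, and the three synthetic streams are all scaled copies of one invented signal.

```python
import math

def track_hidden_variable(stream, n, forgetting=0.96):
    """Track one hidden variable (k = 1) over an iterable of n-value ticks."""
    w = [1.0] + [0.0] * (n - 1)   # initial guess for the direction
    energy = 1e-3                 # small positive start, avoids division by zero
    hidden = []
    for x in stream:
        y = sum(wi * xi for wi, xi in zip(w, x))        # current hidden value
        energy = forgetting * energy + y * y            # decayed energy of y
        error = [xi - y * wi for xi, wi in zip(x, w)]   # reconstruction error
        w = [wi + (y / energy) * ei for wi, ei in zip(w, error)]
        hidden.append(y)
    return w, hidden

# Synthetic data (an assumption): 3 streams driven by a single signal.
mixing = [2 / 3, 1 / 3, 2 / 3]    # unit-length mixing weights
signal = [1.0 + 0.5 * math.sin(0.3 * t) for t in range(400)]
stream = [[m * h for m in mixing] for h in signal]

w, hidden = track_hidden_variable(stream, n=3)
```

After enough ticks the tracked direction lines up with the mixing weights, and the recovered hidden values follow the driving signal, so three measured streams are summarized by one number per tick.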
61
Motivation
[Figure: chlorine concentrations; Phase 1 and Phase 2 marked; k = 2]
actual measurements (n streams)
k hidden variable(s)
spot hidden (latent) variables
62
Motivation
[Figure: chlorine concentrations; Phase 1, Phase 2 and Phase 3 marked; k = 1]
actual measurements (n streams)
k hidden variable(s)
spot hidden (latent) variables
63
SPIRIT / InteMon
  • http://warsteiner.db.cs.cmu.edu/demo/intemon.jsp
  • http://localhost:8080/demo/graphs.jsp
  • self-* storage system (PDL/CMU)
  • 1 PetaByte storage
  • self-monitoring, self-healing: self-*
  • with Jimeng Sun (CMU/CS)
  • Evan Hoke (CMU/CS-ug)
  • Prof. Greg Ganger (CMU/CS/ECE)
  • John Strunk (CMU/ECE)

64
Related project
  • Anomaly detection in network traffic (Zhang, Xie)

65
Conclusions
  • Biological images, graphs, and streams pose
    fascinating problems
  • self-similarity, fractals and power laws work,
    when other methods fail!

66
Books
  • Manfred Schroeder: Fractals, Chaos, Power Laws:
    Minutes from an Infinite Paradise, W.H. Freeman
    and Company, 1991 (Probably the BEST book on
    fractals!)

67
Contact info
  • christos_at_cs.cmu.edu
  • www.cs.cmu.edu/~christos
  • Wean Hall 7107
  • Ph x8.1457
  • and, again WELCOME!