Title: Multimedia and Graph mining
1Multimedia and Graph mining
2CONGRATULATIONS!
Welcome to CMU!
3Outline
- Problem definition / Motivation
- Biological image mining
- Graphs and power laws
- Streams and forecasting
- Conclusions
4Motivation
- Data mining: find patterns (rules, outliers)
- How do detached cat retinas evolve?
- What do real graphs look like?
- What do (numerical) streams look like?
5ViVo cat retina mining
- with Ambuj Singh, Mark Verardo, Vebjorn Ljosa, Arnab Bhattacharya (UCSB)
- Jia-Yu Tim Pan, HJ Yang (CMU)
6Detachment Development
[Figure: retina images. Panels: Normal; 1 day, 3 days, 7 days, 28 days, and 3 months after detachment]
7Data and Problem
- (Problem) What happens in the retina after detachment?
  - What tissues (regions) are involved?
  - How do they change over time?
- How will a program convey this info?
- More than classification: we want to learn what the classifier learned
8Main idea
- extract characteristic visual words
- Equivalent to characteristic keywords, in a
collection of text documents
10Visual vocabulary?
news: president, minister, economic
sports: baseball, score, penalty
11Visual Vocabulary (ViVo) generation
- Step 1: Tile image (8x12 tiles)
- Step 2: Extract tile features (Feature 1, Feature 2, ...)
- Step 3: ViVo generation (visual vocabulary)
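The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the slides do not say which tile features or which vocabulary-generation method were used, so the per-tile mean and standard deviation and the plain k-means step are stand-ins, and the names `tile_features` and `build_vocabulary` are hypothetical. Reading "8x12" as 8 rows by 12 columns is also an assumption.

```python
import numpy as np

def tile_features(image, rows=8, cols=12):
    """Steps 1-2: split a 2-D grayscale image into rows x cols tiles and
    return one feature vector per tile (toy features: mean and std)."""
    h, w = image.shape
    th, tw = h // rows, w // cols
    feats = []
    for r in range(rows):
        for c in range(cols):
            tile = image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            feats.append([tile.mean(), tile.std()])
    return np.array(feats, dtype=float)

def build_vocabulary(features, k, iters=20, seed=0):
    """Step 3 (stand-in): plain k-means; the k centroids play the role
    of visual words. Returns (centroids, label of each tile)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].copy()

    def assign(cents):
        # Distance from every tile to every centroid, then nearest centroid.
        d = np.linalg.norm(features[:, None, :] - cents[None, :, :], axis=2)
        return d.argmin(axis=1)

    for _ in range(iters):
        labels = assign(centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids, assign(centroids)

# Toy image: dark left half, bright right half.
image = np.zeros((80, 120))
image[:, 60:] = 1.0
feats = tile_features(image)
words, labels = build_vocabulary(feats, k=2)
```

On this toy image the two "visual words" end up describing dark tiles and bright tiles; on real data the vocabulary would be built from tiles pooled over the whole image collection.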
12Biological interpretation
ID | Description | Condition
V1 | GFAP in inner retina (Müller cells) | Healthy
V10 | Healthy outer segments of rod photoreceptors | Healthy
V8 | Redistribution of rod opsin into cell bodies of rod photoreceptors | Detached
V11 | Co-occurring processes: Müller cell hypertrophy and rod opsin redistribution | Detached
13Which tissue is significant at 7 days?
14FEMine: Mining Fly Embryos
15With
- Eric Xing (CMU CS)
- Bob Murphy (CMU Bio)
- Tim Pan (CMU -> Google)
- Andre Balan (U. Sao Paulo)
16Outline
- Problem definition / Motivation
- Biological image mining
- Graphs and power laws
- Streams and forecasting
- Conclusions
18Graphs - why should we care?
- Internet Map [lumeta.com]
- Food Web [Martinez '91]
- Protein Interactions [genomebiology.com]
- Friendship Network [Moody '01]
19Joint work with
- Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)
20Problem: network and graph mining
- What does the Internet look like?
- What does the web look like?
- What constitutes a normal social network?
- What is normal/abnormal?
- Which patterns/laws hold?
21Graph mining
22Laws and patterns
- NO!!
- Diameter
- in- and out- degree distributions
- other (surprising) patterns
23Laws: degree distributions
- Q: avg degree is 3; what is the most probable degree?
[Plot: count vs. degree, shape unknown (??); degree 3 marked]
25Solution
[Plot: frequency vs. outdegree, log-log axes, Nov '97; exponent = slope, O = -2.15]
- The plot is linear in log-log scale [FFF'99]
- freq ∝ degree^(-2.15)
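A straight line in log-log scale means the exponent can be read off as the slope. The sketch below fits that slope with an ordinary least-squares line on synthetic data; the data, the constant 1e4, and the function name `fit_power_law_exponent` are assumptions made for illustration, and least squares on log-log points is only the simplest possible estimator.

```python
import numpy as np

def fit_power_law_exponent(degrees, freqs):
    """Fit log10(freq) = slope * log10(degree) + intercept by least squares
    and return (slope, intercept). The slope estimates the exponent."""
    slope, intercept = np.polyfit(np.log10(degrees), np.log10(freqs), 1)
    return slope, intercept

# Synthetic, noise-free data that follows freq = C * degree^(-2.15).
degrees = np.arange(1, 101).astype(float)
freqs = 1e4 * degrees ** -2.15

slope, intercept = fit_power_law_exponent(degrees, freqs)
print(f"estimated exponent: {slope:.2f}")
```

With such a skewed distribution the most frequent degree in this synthetic data is the smallest one, even though the average is larger.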
26But
- Q1: How about graphs from other domains?
- Q2: How about temporal evolution?
27The Peer-to-Peer Topology
Jovanovic
- Frequency versus degree
- Number of adjacent peers follows a power-law
28More power laws
- citation counts (citeseer.nj.nec.com 6/2001)
[Plot: log(count) vs. log(citations); 'Ullman' marked]
29Swedish sex-web
Nodes: people (females, males). Links: sexual relationships
[Albert Laszlo Barabasi, http://www.nd.edu/~networks/Publication%20Categories/04%20Talks/2005-norway-3hours.ppt]
[Liljeros et al., Nature 2001]
4781 Swedes, ages 18-74; 59% response rate.
30More power laws
- web hit counts [w/ A. Montgomery]
[Plot: Web Site Traffic, log(count) vs. log(in-degree); Zipf; 'ebay' marked]
31epinions.com
- who-trusts-whom [Richardson + Domingos, KDD 2001]
[Plot: count vs. (out) degree; a user who trusts 2000 people is marked]
33A famous power law: Zipf's law
- Bible: rank vs. frequency (log-log)
- similarly, in many other languages; for customers and sales volume; city populations, etc.
[Plot: log(freq) vs. log(rank)]
34Olympic medals (Sydney '00, Athens '04)
[Plot: log(medals) vs. log(rank)]
35More power laws: areas (Korcak's law)
- Scandinavian lakes: area vs. complementary cumulative count (log-log axes)
[Plot: log(count(> area)) vs. log(area)]
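The complementary cumulative count used on the plot above, count(> area), can be computed as in this small sketch. The function name and the toy areas are made up for illustration; no real lake data is used.

```python
import numpy as np

def complementary_cumulative_count(values):
    """For each distinct value v (ascending), count the items strictly
    greater than v. Returns (distinct_values, counts)."""
    ordered = np.sort(np.asarray(values, dtype=float))
    distinct = np.unique(ordered)
    # searchsorted(..., side="right") gives the number of items <= v.
    counts = len(ordered) - np.searchsorted(ordered, distinct, side="right")
    return distinct, counts

areas = [1.0, 2.0, 2.0, 5.0]  # toy areas, arbitrary units
distinct, counts = complementary_cumulative_count(areas)
for a, c in zip(distinct, counts):
    print(f"area {a}: {c} items are larger")
```

The largest value always has a count of 0, so it has to be dropped before taking logarithms for a log-log plot.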
37But
- Q1: How about graphs from other domains?
- Q2: How about temporal evolution?
38Time evolution
- with Jure Leskovec (CMU)
- and Jon Kleinberg (Cornell)
- (best paper KDD05)
40Evolution of the Diameter
- Prior work on power-law graphs hints at slowly growing diameter:
  - diameter ~ O(log N)
  - diameter ~ O(log log N)
- What is happening in real data?
- Diameter shrinks over time
  - As the network grows, the distances between nodes slowly decrease
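To make "diameter" concrete, here is a minimal sketch that computes the longest shortest-path distance of a small undirected graph by breadth-first search and shows it dropping when an edge is added. It assumes a connected graph stored as a dict of neighbor sets, and the helper names are hypothetical. The slides do not say which diameter definition was measured on the real datasets, so this exact definition is an assumption.

```python
from collections import deque

def add_edge(adj, u, v):
    """Add an undirected edge u-v to the adjacency dict."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def eccentricity(adj, source):
    """Largest BFS distance from source to any reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Longest shortest path over all node pairs (connected graph assumed)."""
    return max(eccentricity(adj, s) for s in adj)

# A path 0-1-2-3-4, then one extra edge that closes it into a cycle.
graph = {}
for u in range(4):
    add_edge(graph, u, u + 1)
before = diameter(graph)
add_edge(graph, 0, 4)
after = diameter(graph)
print(before, after)
```

Running one BFS per node costs O(N * (N + E)), which is fine for a toy graph but not for graphs with millions of nodes, where sampling or approximation would be needed.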
41Diameter: ArXiv citation graph
- Citations among physics papers
- 1992-2003
- One graph per year
[Plot: diameter vs. time (years)]
42Diameter: Autonomous Systems
- Graph of Internet
- One graph per day
- 1997-2000
[Plot: diameter vs. number of nodes]
43Diameter: Affiliation Network
- Graph of collaborations in physics: authors linked to papers
- 10 years of data
[Plot: diameter vs. time (years)]
44Diameter: Patents
- Patent citation network
- 25 years of data
[Plot: diameter vs. time (years)]
46Temporal Evolution of the Graphs
- N(t): nodes at time t
- E(t): edges at time t
- Suppose that
  - N(t+1) = 2 * N(t)
- Q: what is your guess for
  - E(t+1) =? 2 * E(t)
- A: over-doubled!
  - But obeying the Densification Power Law
47Densification: Physics Citations
- Citations among physics papers
- 2003
- 29,555 papers, 352,807 citations
[Plot: E(t) vs. N(t), log-log axes; fitted slope 1.69. Reference slopes: 1 (tree), 2 (clique)]
50Densification: Patent Citations
- Citations among patents granted
- 1999
- 2.9 million nodes
- 16.5 million edges
- Each year is a datapoint
[Plot: E(t) vs. N(t), log-log axes; fitted slope 1.66]
51Densification: Autonomous Systems
- Graph of Internet
- 2000
- 6,000 nodes
- 26,000 edges
- One graph per day
[Plot: E(t) vs. N(t), log-log axes; fitted slope 1.18]
52Densification: Affiliation Network
- Authors linked to their publications
- 2002
- 60,000 nodes
  - 20,000 authors
  - 38,000 papers
- 133,000 edges
[Plot: E(t) vs. N(t), log-log axes; fitted slope 1.15]
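The densification slides report slopes of 1.69, 1.66, 1.18 and 1.15. Assuming the Densification Power Law has the form E(t) ∝ N(t)^a, with a being that slope, doubling the nodes multiplies the edges by 2^a. The short sketch below works out that factor for each dataset; the function name `edge_growth_factor` is hypothetical.

```python
def edge_growth_factor(exponent, node_growth=2.0):
    """If E is proportional to N**exponent, return the factor by which E
    grows when N grows by node_growth."""
    return node_growth ** exponent

# Slopes as reported on the densification slides.
exponents = {
    "physics citations": 1.69,
    "patent citations": 1.66,
    "autonomous systems": 1.18,
    "affiliation network": 1.15,
}

for name, a in exponents.items():
    factor = edge_growth_factor(a)
    print(f"{name}: nodes x2 -> edges x{factor:.2f}")
```

Every factor is above 2, which is the "over-doubled" answer: a slope of 1 would mean exactly doubling (constant average degree), and a slope of 2 would mean quadrupling.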
53Graphs - Conclusions
- Real graphs obey some surprising patterns
- which can help us spot anomalies / outliers
- A lot of interest from web searching companies
- recommendation systems
- link spamming
- trust propagation
- HUGE graphs (Millions and Billions of nodes)
54Outline
- Problem definition / Motivation
- Biological image mining
- Graphs and power laws
- Streams and forecasting
- Conclusions
56Why care about streams?
- Sensor devices
- Temperature, weather measurements
- Road traffic data
- Geological observations
- Patient physiological data
- sensor-Andrew project
- Embedded devices
- Network routers
57Co-evolving time sequences
- Joint work with
- Jimeng Sun (CMU)
- Dr. Spiros Papadimitriou (CMU/IBM)
- Dr. Yasushi Sakurai (NTT)
- Prof. Jeanne VanBriesen (CMU/CEE)
59Motivation
[Figure: water distribution network; chlorine concentrations at sensors near the leak and at sensors away from the leak, during normal operation and during a major leak]
60Motivation
- actual measurements (n streams)
- k hidden variable(s)
- spot hidden (latent) variables
61Motivation
[Figure: chlorine concentrations, Phase 1 and Phase 2; actual measurements (n streams) and hidden variables, k = 2]
62Motivation
[Figure: chlorine concentrations, Phases 1-3; actual measurements (n streams) and hidden variables, k = 1]
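One way to picture "k hidden variables behind n streams" is a batch singular value decomposition: count how many directions are needed to keep most of the signal energy. This is a simplified stand-in, not the streaming method used in the project; the 95% energy threshold, the synthetic streams, and the name `count_hidden_variables` are all assumptions made for illustration.

```python
import numpy as np

def count_hidden_variables(streams, energy=0.95):
    """Return the smallest k such that the top-k singular directions keep
    at least `energy` of the total energy. streams has shape (T, n):
    T time ticks, one column per stream."""
    X = np.asarray(streams, dtype=float)
    s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cumulative, energy) + 1)

# Synthetic data: 5 streams driven by one or two hidden sine waves.
t = np.arange(1000)
h1 = np.sin(2 * np.pi * t / 50)
h2 = np.sin(2 * np.pi * t / 20)
w1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w2 = np.array([5.0, -4.0, 3.0, -2.0, 1.0])
noise = 0.01 * np.random.default_rng(0).standard_normal((1000, 5))

one_driver = np.outer(h1, w1) + noise
two_drivers = np.outer(h1, w1) + np.outer(h2, w2) + noise

print(count_hidden_variables(one_driver), count_hidden_variables(two_drivers))
```

A change in k over time, such as a second hidden variable appearing and later disappearing, is the kind of signal the phase figures above illustrate.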
63SPIRIT / InteMon
- http://warsteiner.db.cs.cmu.edu/demo/intemon.jsp
- http://localhost:8080/demo/graphs.jsp
- self-* storage system (PDL/CMU)
- 1 PetaByte storage
- self-monitoring, self-healing, self-*
- with Jimeng Sun (CMU/CS)
- Evan Hoke (CMU/CS-ug)
- Prof. Greg Ganger (CMU/CS/ECE)
- John Strunk (CMU/ECE)
64Related project
- Anomaly detection in network traffic (Zhang, Xie)
65Conclusions
- Biological images, graphs, and streams pose fascinating problems
- Self-similarity, fractals, and power laws work when other methods fail!
66Books
- Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. W.H. Freeman and Company, 1991. (Probably the BEST book on fractals!)
67Contact info
- christos_at_cs.cmu.edu
- www.cs.cmu.edu/~christos
- Wean Hall 7107
- Ph x8.1457
- and, again WELCOME!