Latent Variable Models of Social Networks and Text - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Latent Variable Models of Social Networks and Text


1
Latent Variable Models of Social Networks and
Text
  • Andrew McCallum
  • Computer Science Department
  • University of Massachusetts Amherst

Joint work with Xuerui Wang, Natasha Mohanty, Andres Corrada, Chris Pal, Wei Li, David Mimno and Gideon Mann.
2
Social Network in an Email Dataset
3
Outline
Social Network Analysis with Topic Models
  • Role Discovery (Author-Recipient-Topic Model,
    ART)
  • Group Discovery (Group-Topic Model, GT)
  • Enhanced Topic Models
  • Time Localized Topics (Topics-over-Time Model,
    TOT)
  • Time Localized Groups (Groups-over-Time Model,
    GOT)
  • Markov Dependencies in Topics (Topical N-Grams
    Model, TNG)
  • Bibliometric Impact Transfer Measures using
    Topics

Multi-Conditional Mixtures AAAI 2006
4
Clustering words into topics with Latent Dirichlet Allocation
Blei, Ng, Jordan 2003
Generative process (a mixed-membership model), with a running example:
  • For each document, sample a distribution over topics, θ (a multinomial over topics). Example: 70% Iraq war, 30% US election
  • For each word in the document:
  • Sample a topic, z. Example: Iraq war
  • Sample a word, w, from the topic (a per-topic multinomial over words). Example: bombing
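The generative process on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the two toy topics, the four-word vocabulary, the Dirichlet parameter `alpha`, and the function name `generate_document` are all assumptions made for the example.

```python
import numpy as np

def generate_document(rng, alpha, topic_word, n_words):
    """Generate one document following the LDA generative process."""
    # Sample the document's distribution over topics, theta ~ Dirichlet(alpha).
    theta = rng.dirichlet(alpha)
    topics, words = [], []
    for _ in range(n_words):
        # Sample a topic index z from the document's multinomial over topics.
        z = int(rng.choice(len(alpha), p=theta))
        # Sample a word w from that topic's multinomial over the vocabulary.
        w = int(rng.choice(topic_word.shape[1], p=topic_word[z]))
        topics.append(z)
        words.append(w)
    return theta, topics, words

# Hypothetical toy setup: 2 topics over a 4-word vocabulary.
vocab = ["bombing", "troops", "ballot", "campaign"]
topic_word = np.array([
    [0.5, 0.5, 0.0, 0.0],  # topic 0, loosely "Iraq war"
    [0.0, 0.0, 0.5, 0.5],  # topic 1, loosely "US election"
])
rng = np.random.default_rng(0)
theta, topics, words = generate_document(
    rng, alpha=np.array([1.0, 1.0]), topic_word=topic_word, n_words=10)
print(theta)
print([vocab[w] for w in words])
```

Each word carries its own topic assignment, which is what makes the model mixed-membership: one document can draw words from several topics.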
5
Example topics induced from a large collection of text
JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE
SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES
BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY
FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED
STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL
MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE
DISEASE BACTERIA DISEASES GERMS FEVER CAUSE CAUSED SPREAD VIRUSES INFECTION VIRUS MICROORGANISMS PERSON INFECTIOUS COMMON CAUSING SMALLPOX BODY INFECTIONS CERTAIN
WATER FISH SEA SWIM SWIMMING POOL LIKE SHELL SHARK TANK SHELLS SHARKS DIVING DOLPHINS SWAM LONG SEAL DIVE DOLPHIN UNDERWATER
Tenenbaum et al
7
From LDA to Author-Recipient-Topic (ART)
McCallum et al 2005
8
Inference and Estimation
  • Gibbs Sampling
  • Easy to implement
  • Reasonably fast
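
To give a sense of why Gibbs sampling is "easy to implement", here is a minimal collapsed Gibbs sampler. Note the simplification: it is written for plain LDA, not for ART, whose sampler would also handle authors and recipients. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus are assumptions for illustration.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA. docs: lists of word ids."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # document-topic counts
    n_kw = np.zeros((n_topics, vocab_size))  # topic-word counts
    n_k = np.zeros(n_topics)                 # tokens assigned to each topic
    z = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Conditional over topics given all other assignments.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Record the new assignment.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw

# Hypothetical toy corpus: word ids 0-1 co-occur, as do word ids 2-3.
docs = [[0, 1, 0, 1], [2, 3, 2, 3, 3], [0, 0, 1], [3, 2, 2]]
z, n_dk, n_kw = lda_gibbs(docs, n_topics=2, vocab_size=4, n_iters=50)
print(n_kw)
```

The multinomial parameters are integrated out, so the sampler only maintains count tables and resamples one topic assignment at a time.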

9
Enron Email Corpus
  • 250k email messages
  • 23k people

Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
From: debra.perlingiere_at_enron.com
To: steve.hooser_at_enron.com
Subject: Enron/TransAlta Contract dated Jan 1, 2001

Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions.

DP

Debra Perlingiere
Enron North America Corp.
Legal Department
1400 Smith Street, EB 3885
Houston, Texas 77002
dperlin_at_enron.com
10
Topics, and prominent senders / receivers discovered by ART
Topic names, by hand
11
Topics, and prominent senders / receivers discovered by ART
Beck: Chief Operations Officer
Dasovich: Government Relations Executive
Shapiro: Vice President of Regulatory Affairs
Steffes: Vice President of Government Affairs
12
Comparing Role Discovery
Traditional SNA
Author-Topic
ART
connection strength (A,B)
distribution over recipients
distribution over authored topics
distribution over authored topics
13
Comparing Role Discovery: Tracy Geaconne vs. Dan McCarty
Traditional SNA
Author-Topic
ART
Different roles
Different roles
Similar roles
Geaconne: Secretary
McCarty: Vice President
14
Comparing Role Discovery: Lynn Blair vs. Kimberly Watson
Traditional SNA
Author-Topic
ART
Very different
Very similar
Different roles
Blair: Gas pipeline logistics
Watson: Pipeline facilities planning
15
McCallum Email Corpus 2004
  • January - October 2004
  • 23k email messages
  • 825 people

From: kate_at_cs.umass.edu
Subject: NIPS and ....
Date: June 14, 2004 2:27:41 PM EDT
To: mccallum_at_cs.umass.edu

There is pertinent stuff on the first yellow folder that is completed either travel or other things, so please sign that first folder anyway. Then, here is the reminder of the things I'm still waiting for:
NIPS registration receipt.
CALO registration receipt.
Thanks, Kate
16
Four most prominent topics in discussions with ____?
18
Two most prominent topics in discussions with ____?
20
Role-Author-Recipient-Topic Models
21
Results with RART: People in Role 3 in Academic Email
  • olc: lead Linux sysadmin
  • gauthier: sysadmin for CIIR group
  • irsystem: mailing list for CIIR sysadmins
  • system: mailing list for dept. sysadmins
  • allan: Prof., chair of computing committee
  • valerie: second Linux sysadmin
  • tech: mailing list for dept. hardware
  • steve: head of dept. I.T. support

22
Roles for allan (James Allan)
  • Role 3: I.T. support
  • Role 2: Natural Language researcher

Roles for pereira (Fernando Pereira)
  • Role 2: Natural Language researcher
  • Role 4: SRI CALO project participant
  • Role 6: Grant proposal writer
  • Role 10: Grant proposal coordinator
  • Role 8: Guests at McCallum's house

23
ART: Roles but not Groups
Traditional SNA
Author-Topic
ART
Not
Not
Block structured
Enron TransWestern Division
24
Outline
Social Network Analysis with Topic Models
  • Role Discovery (Author-Recipient-Topic Model,
    ART)
  • Group Discovery (Group-Topic Model, GT)
  • Enhanced Topic Models
  • Time Localized Topics (Topics-over-Time Model,
    TOT)
  • Time Localized Groups (Groups-over-Time Model,
    GOT)
  • Markov Dependencies in Topics (Topical N-Grams
    Model, TNG)
  • Bibliometric Impact Transfer Measures using
    Topics

Multi-Conditional Mixtures AAAI 2006
25
Groups and Topics
  • Input
  • Observed relations between people
  • Attributes on those relations (text, or
    categorical)
  • Output
  • Attributes clustered into topics
  • Groups of people---varying depending on topic

26
Discovering Groups from Observed Set of Relations
Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration: Acad(A, B), Acad(C, B), Acad(A, D), Acad(C, D), Acad(B, E), Acad(D, E), Acad(B, F), Acad(D, F), Acad(E, A), Acad(F, A), Acad(E, C), Acad(F, C)
Admiration relations among six high school
students.
27
Adjacency Matrix Representing Relations
Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration: Acad(A, B), Acad(C, B), Acad(A, D), Acad(C, D), Acad(B, E), Acad(D, E), Acad(B, F), Acad(D, F), Acad(E, A), Acad(F, A), Acad(E, C), Acad(F, C)
[Figure: adjacency matrices of the relations. In roster order (A B C D E F) the group labels are G1 G2 G1 G2 G3 G3; with rows and columns reordered as A C B D E F they become G1 G1 G2 G2 G3 G3.]
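The adjacency-matrix view on this slide is easy to reproduce. The sketch below builds the matrix from the Acad relations listed on the slide and then reorders rows and columns by group, which makes the block structure visible. The helper name `adjacency` is an assumption for illustration.

```python
import numpy as np

# Acad(X, Y) pairs copied from the slide: X admires Y academically.
acad = [("A", "B"), ("C", "B"), ("A", "D"), ("C", "D"),
        ("B", "E"), ("D", "E"), ("B", "F"), ("D", "F"),
        ("E", "A"), ("F", "A"), ("E", "C"), ("F", "C")]

def adjacency(relations, order):
    """Build a 0/1 adjacency matrix with rows and columns in the given order."""
    index = {name: i for i, name in enumerate(order)}
    m = np.zeros((len(order), len(order)), dtype=int)
    for src, dst in relations:
        m[index[src], index[dst]] = 1
    return m

roster_order = ["A", "B", "C", "D", "E", "F"]
group_order = ["A", "C", "B", "D", "E", "F"]  # G1 G1 G2 G2 G3 G3

plain = adjacency(acad, roster_order)
blocked = adjacency(acad, group_order)
print(plain)
print(blocked)
```

In `blocked`, every nonzero entry falls into one of three 2x2 blocks: G1 admires G2, G2 admires G3, and G3 admires G1.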
28
Group Model: Partitioning Entities into Groups
Stochastic Blockstructures for Relations, Nowicki, Snijders 2001
[Graphical model: a Dirichlet prior on a Multinomial over group assignments, and a Beta prior on a Binomial over relations. S = number of entities, G = number of groups.]
Enhanced with arbitrary number of groups in Kemp, Griffiths, Tenenbaum 2004
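A rough sketch of the generative story behind this kind of group model, using the distributions named on the slide (Dirichlet, Multinomial, Beta, Binomial). The hyperparameter values, the function name `sample_blockmodel`, and the choice to sample self-relations on the diagonal are simplifying assumptions, not details from the talk.

```python
import numpy as np

def sample_blockmodel(n_entities, n_groups, rng, alpha=1.0, a=0.5, b=0.5):
    """Sample group assignments and a binary relation matrix."""
    # Distribution over groups, drawn from a Dirichlet prior.
    theta = rng.dirichlet(np.full(n_groups, alpha))
    # Each of the S entities gets a group from the multinomial theta.
    groups = rng.choice(n_groups, size=n_entities, p=theta)
    # One relation probability per ordered pair of groups, from a Beta prior.
    phi = rng.beta(a, b, size=(n_groups, n_groups))
    # The relation between entities i and j depends only on their groups.
    probs = phi[groups[:, None], groups[None, :]]
    relations = (rng.random((n_entities, n_entities)) < probs).astype(int)
    return groups, phi, relations

rng = np.random.default_rng(0)
groups, phi, relations = sample_blockmodel(n_entities=6, n_groups=3, rng=rng)
print(groups)
print(relations)
```

Inference runs this story in reverse: given only `relations`, recover a plausible `groups` assignment.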
29
Two Relations with Different Attributes
Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration: Acad(A, B), Acad(C, B), Acad(A, D), Acad(C, D), Acad(B, E), Acad(D, E), Acad(B, F), Acad(D, F), Acad(E, A), Acad(F, A), Acad(E, C), Acad(F, C)
Social Admiration: Soci(A, B), Soci(A, D), Soci(A, F), Soci(B, A), Soci(B, C), Soci(B, E), Soci(C, B), Soci(C, D), Soci(C, F), Soci(D, A), Soci(D, C), Soci(D, E), Soci(E, B), Soci(E, D), Soci(E, F), Soci(F, A), Soci(F, C), Soci(F, E)
[Figure: adjacency matrices for the two relations. Academic admiration, ordered A C B D E F, has groups G1 G1 G2 G2 G3 G3; social admiration, ordered A C E B D F, has groups G1 G1 G1 G2 G2 G2.]
30
The Group-Topic Model: Discovering Groups and Topics Simultaneously
Wang, Mohanty, McCallum 2006
[Graphical model with Beta, Uniform, Dirichlet, Multinomial and Binomial components]
31
Inference and Estimation
  • Gibbs Sampling
  • Many r.v.s can be integrated out
  • Easy to implement
  • Reasonably fast

We assume the relationship is symmetric.
32
Dataset 1: U.S. Senate
  • 16 years of voting records in the US Senate (1989-2005)
  • a Senator may respond Yea or Nay to a resolution
  • 3423 resolutions with text attributes (index
    terms)
  • 191 Senators in total across 16 years

S.543
Title: An Act to reform Federal deposit insurance, protect the deposit insurance funds, recapitalize the Bank Insurance Fund, improve supervision and regulation of insured depository institutions, and for other purposes.
Sponsor: Sen Riegle, Donald W., Jr. MI (introduced 3/5/1991) Cosponsors (2)
Latest Major Action: 12/19/1991 Became Public Law No 102-242.
Index terms: Banks and banking, Accounting, Administrative fees, Cost control, Credit, Deposit insurance, Depressed areas and other 110 terms
Adams (D-WA), Nay; Akaka (D-HI), Yea; Bentsen (D-TX), Yea; Biden (D-DE), Yea; Bond (R-MO), Yea; Bradley (D-NJ), Nay; Conrad (D-ND), Nay
33
Topics Discovered (U.S. Senate)

Mixture of Unigrams
Education    Energy      Military + Misc.  Economic
education    energy      government        federal
school       power       military          labor
aid          water       foreign           insurance
children     nuclear     tax               aid
drug         gas         congress          tax
students     petrol      aid               business
elementary   research    law               employee
prevention   pollution   policy            care

Group-Topic Model
Education + Domestic  Foreign       Economic   Social Security + Medicare
education             foreign       labor      social
school                trade         insurance  security
federal               chemicals     tax        insurance
aid                   tariff        congress   medical
government            congress      income     care
tax                   drugs         minimum    medicare
energy                communicable  wage       disability
research              diseases      business   assistance
34
Groups Discovered (US Senate)
Groups from topic Education + Domestic
35
Senators Who Change Coalition the Most, Dependent on Topic
e.g. Senator Shelby (D-AL) votes
  • with the Republicans on Economic
  • with the Democrats on Education + Domestic
  • with a small group of maverick Republicans on Social Security + Medicaid
36
Dataset 2: The UN General Assembly
  • Voting records of the UN General Assembly (1990 -
    2003)
  • A country may choose to vote Yes, No or Abstain
  • 931 resolutions with text attributes (titles)
  • 192 countries in total
  • Also experiments later with resolutions from
    1960-2003

Vote on Permanent Sovereignty of Palestinian People, 87th plenary meeting
The draft resolution on permanent sovereignty of the Palestinian people in the occupied Palestinian territory, including Jerusalem, and of the Arab population in the occupied Syrian Golan over their natural resources (document A/54/591) was adopted by a recorded vote of 145 in favour to 3 against with 6 abstentions.
In favour: Afghanistan, Argentina, Belgium, Brazil, Canada, China, France, Germany, India, Japan, Mexico, Netherlands, New Zealand, Pakistan, Panama, Russian Federation, South Africa, Spain, Turkey, and other 126 countries.
Against: Israel, Marshall Islands, United States.
Abstain: Australia, Cameroon, Georgia, Kazakhstan, Uzbekistan, Zambia.
37
Topics Discovered (UN)

Mixture of Unigrams
Everything Nuclear  Human Rights  Security in Middle East
nuclear             rights        occupied
weapons             human         israel
use                 palestine     syria
implementation      situation     security
countries           israel        calls

Group-Topic Model
Nuclear Non-proliferation  Nuclear Arms Race  Human Rights
nuclear                    nuclear            rights
states                     arms               human
united                     prevention         palestine
weapons                    race               occupied
nations                    space              israel
38
Groups Discovered (UN)
The country list for each group is ordered by 2005 GDP (PPP), and only 5 countries are shown for groups that have more than 5 members.
39
Do We Get Better Groups with the GT Model?
Baseline Model:
  1. Cluster bills into topics using mixture of unigrams.
  2. Apply group model on topic-specific subsets of bills.
GT Model:
  1. Jointly cluster topics and groups at the same time using the GT model.

Datasets  Avg. AI for Baseline  Avg. AI for GT  p-value
Senate    0.8198                0.8294          < .01
UN        0.8548                0.8664          < .01
Agreement Index (AI) measures group cohesion. Higher is better.
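The slide does not define the Agreement Index. One definition used in roll-call voting analysis (attributed to Hix, Noury and Roland) is sketched below; that the talk uses exactly this formula is an assumption, and the function name and vote counts are illustrative only.

```python
def agreement_index(yes, no, abstain=0):
    """Agreement Index of one group on one vote (assumed definition).

    AI = (max(y, n, a) - 0.5 * ((y + n + a) - max(y, n, a))) / (y + n + a)
    It is 1.0 when the whole group votes the same way.
    """
    total = yes + no + abstain
    if total == 0:
        raise ValueError("no votes cast")
    top = max(yes, no, abstain)
    return (top - 0.5 * (total - top)) / total

def average_agreement(votes):
    """Average AI over a list of (yes, no, abstain) tallies for one group."""
    return sum(agreement_index(*v) for v in votes) / len(votes)

# Hypothetical tallies for one group on three resolutions.
tallies = [(10, 0, 0), (8, 2, 0), (5, 5, 5)]
print(average_agreement(tallies))
```

Under this definition a unanimous group scores 1.0 and a group split evenly three ways scores 0.0, so a higher average means more cohesive groups.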
40
Groups and Topics, Trends over Time (UN)
41
Outline
Social Network Analysis with Topic Models
  • Role Discovery (Author-Recipient-Topic Model,
    ART)
  • Group Discovery (Group-Topic Model, GT)
  • Enhanced Topic Models
  • Time Localized Topics (Topics-over-Time Model,
    TOT)
  • Time Localized Groups (Groups-over-Time Model,
    GOT)
  • Markov Dependencies in Topics (Topical N-Grams
    Model, TNG)
  • Bibliometric Impact Transfer Measures using
    Topics

Multi-Conditional Mixtures AAAI 2006
42
Want to Model Trends over Time
  • Is prevalence of topic growing or waning?
  • Pattern appears only briefly
  • Capture its statistics in focused way
  • Don't confuse it with patterns elsewhere in time
  • How do roles, groups, influence shift over time?

43
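Topics over Time (TOT)
Wang, McCallum, KDD 2006
[Graphical model: a Dirichlet prior on each document's multinomial over topics; for each of the Nd tokens in each of the D documents, a topic index z generates both a word w and a timestamp t; each of the T topics has a multinomial over words (Dirichlet prior) and a Beta over time (uniform prior).]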
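Read as a generative process, the TOT components named on this slide can be sketched as follows: each token's topic generates both a word (from a per-topic multinomial) and a timestamp (from a per-topic Beta over normalized time). The toy parameters, the function name `generate_tot_document`, and the scaling of time to the interval [0, 1] are assumptions for illustration.

```python
import numpy as np

def generate_tot_document(rng, alpha, topic_word, topic_time, n_words):
    """Generate (topic, word, timestamp) triples for one document."""
    # Document-level multinomial over topics, drawn from a Dirichlet prior.
    theta = rng.dirichlet(alpha)
    tokens = []
    for _ in range(n_words):
        # Topic index z for this token.
        z = int(rng.choice(len(alpha), p=theta))
        # Word w from the topic's multinomial over words.
        w = int(rng.choice(topic_word.shape[1], p=topic_word[z]))
        # Timestamp t in [0, 1] from the topic's Beta over time.
        a, b = topic_time[z]
        t = float(rng.beta(a, b))
        tokens.append((z, w, t))
    return tokens

# Hypothetical parameters: topic 0 is concentrated early, topic 1 late.
topic_word = np.array([[0.7, 0.3, 0.0], [0.0, 0.2, 0.8]])
topic_time = [(2.0, 8.0), (8.0, 2.0)]
rng = np.random.default_rng(0)
tokens = generate_tot_document(rng, np.array([1.0, 1.0]), topic_word, topic_time, 8)
print(tokens)
```

Because the Beta can be sharply peaked, a topic can be localized in time instead of being spread across the whole collection.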
44
State of the Union Address
208 Addresses delivered between January 8, 1790
and January 29, 2002.
  • To increase the number of documents, we split the addresses into paragraphs and treated them as documents. One-line paragraphs were excluded. Stop words were removed.
  • 17156 documents
  • 21534 words
  • 669,425 tokens

Our scheme of taxation, by means of which this
needless surplus is taken from the people and put
into the public Treasury, consists of a tariff
or duty levied upon importations from abroad and
internal-revenue taxes levied upon the
consumption of tobacco and spirituous and malt
liquors. It must be conceded that none of the
things subjected to internal-revenue
taxation are, strictly speaking, necessaries.
There appears to be no just complaint of this
taxation by the consumers of these articles, and
there seems to be nothing so well able to bear
the burden without hardship to any portion of the
people.
1910
45
Comparing TOT with LDA
46
Sample Topic Cold War
world nations united states peace free economic military soviet international security strength defense freedom europe force peoples efforts aggression today
47
Comparing TOT against LDA
48
TOT on 17 years of NIPS proceedings
49
Topic Distributions Conditioned on Time
topic mass (in vertical height)
time
50
TOT on 17 years of NIPS proceedings
TOT
LDA
51
TOT versus LDA on my email
52
TOT improves ability to Predict Time
Predicting the year of a State-of-the-Union
address.
L1 distance between predicted year and actual
year.
53
Discovering Group Structure: Trends over Time
[Figure: the Group Model without Time next to the Group Model with Time. Labeled components: multinomial distribution over groups, group id, observed relation, per group-pair binomial over relation absent / present, and (with time) a timestamp and a per-group Beta over time. G = number of groups.]
54
Outline
Social Network Analysis with Topic Models
a
  • Role Discovery (Author-Recipient-Topic Model,
    ART)
  • Group Discovery (Group-Topic Model, GT)
  • Enhanced Topic Models
  • Time Localized Topics (Topics-over-Time Model,
    TOT)
  • Time Localized Groups (Groups-over-Time Model,
    GOT)
  • Markov Dependencies in Topics (Topical N-Grams
    Model, TNG)
  • Bibliometric Impact Transfer Measures using
    Topics

Multi-Conditional Mixtures AAAI 2006
55
Topics Modeling Phrases
  • Topics based only on unigrams often difficult to
    interpret
  • Topic discovery itself is confused because
    important meaning / distinctions carried by
    phrases.

56
Topic Interpretability
LDA: algorithms, algorithm, genetic, problems, efficient
Topical N-grams: genetic algorithms, genetic algorithm, evolutionary computation, evolutionary algorithms, fitness function
57
Topical N-gram Model
Wang, McCallum 2005
[Graphical model: for each position i in a document, a topic z_i, a uni- / bi-gram status y_i, and a word w_i, chained along the word sequence; plates over D documents, T topics, and W words.]
58
Features of Topical N-Grams model
  • Easily trained by Gibbs sampling
  • Can run efficiently on millions of words
  • Topic-specific phrase discovery
  • "white house" has special meaning as a phrase in the politics topic,
  • ... but not in the real estate topic.

59
Topic Comparison
LDA
Topical N-grams (2)
Topical N-grams (1)
policy action states actions function reward control agent q-learning optimal goal learning space step environment system problem steps sutton policies
learning optimal reinforcement state problems policy dynamic action programming actions function markov methods decision rl continuous spaces step policies planning
reinforcement learning, optimal policy, dynamic programming, optimal control, function approximator, prioritized sweeping, finite-state controller, learning system, reinforcement learning rl, function approximators, markov decision problems, markov decision processes, local search, state-action pair, markov decision process, belief states, stochastic policy, action selection, upright position, reinforcement learning methods
60
Topic Comparison
LDA
Topical N-grams (2)
Topical N-grams (1)
motion response direction cells stimulus figure contrast velocity model responses stimuli moving cell intensity population image center tuning complex directions
motion visual field position figure direction fields eye location retina receptive velocity vision moving system flow edge center light local
receptive field, spatial frequency, temporal frequency, visual motion, motion energy, tuning curves, horizontal cells, motion detection, preferred direction, visual processing, area mt, visual cortex, light intensity, directional selectivity, high contrast, motion detectors, spatial phase, moving stimuli, decision strategy, visual stimuli
61
Topic Comparison
LDA
Topical N-grams (2)
Topical N-grams (1)
speech word training system recognition hmm speaker performance phoneme acoustic words context systems frame trained sequence phonetic speakers mlp hybrid
word system recognition hmm speech training performance phoneme words context systems frame trained speaker sequence speakers mlp frames segmentation models
speech recognition, training data, neural network, error rates, neural net, hidden markov model, feature vectors, continuous speech, training procedure, continuous speech recognition, gamma filter, hidden control, speech production, neural nets, input representation, output layers, training algorithm, test set, speech frames, speaker dependent
62
Outline
Social Network Analysis with Topic Models
  • Role Discovery (Author-Recipient-Topic Model,
    ART)
  • Group Discovery (Group-Topic Model, GT)
  • Enhanced Topic Models
  • Time Localized Topics (Topics-over-Time Model,
    TOT)
  • Time Localized Groups (Groups-over-Time Model,
    GOT)
  • Markov Dependencies in Topics (Topical N-Grams
    Model, TNG)
  • Bibliometric Impact Transfer Measures using
    Topics

Multi-Conditional Mixtures AAAI 2006
63
Social Networks in Research Literature
  • Better understand structure of our own research
    area.
  • Structure helps us learn a new field.
  • Aid collaboration
  • Map how ideas travel through social networks of
    researchers.
  • Aids for hiring and finding reviewers!

64
Traditional Bibliometrics
  • Analyses a small amount of data (e.g. 19 articles from a single issue of a journal)
  • Uses journal as a proxy for research topic (but there is no journal for information extraction)
  • Uses impact measures almost exclusively based on
    simple citation counts.

How can we use topic models to create new, interesting impact measures? Can we create a social network of scientific sub-fields?
65
Our Data
  • Over 1.6 million research papers, gathered as
    part of Rexa.info portal.
  • Cross linked references / citations.

66
Previous Systems
68
Previous Systems
Cites
Research Paper
69
More Entities and Relations
Expertise
Cites
Research Paper
Person
Grant
University
Venue
Groups
81
Finding Topics with TNG
Traditional unigram LDA run on 1.6 million titles / abstracts (200 topics) ... select 300k papers on ML, NLP, robotics, vision ... Find 200 TNG topics among those papers.
82
Topical Bibliometric Impact Measures
Mann, Mimno, McCallum, 2006
  • Topical Citation Counts
  • Topical Impact Factors
  • Topical Longevity
  • Topical Precedence
  • Topical Diversity
  • Topical Transfer

83
Topical Diversity
Can also be measured on particular papers...
84
Topical Diversity
Entropy of the topic distribution among papers
that cite this paper (this topic).
Low Diversity
High Diversity
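The entropy-based definition above can be computed directly. This sketch assumes each citing paper has been given a single topic label, which is a simplification (a paper's topic mixture could be used instead); the function name and the example labels are hypothetical.

```python
import math
from collections import Counter

def topical_diversity(citing_topics):
    """Entropy (in bits) of the topic distribution among citing papers."""
    if not citing_topics:
        return 0.0
    counts = Counter(citing_topics)
    total = sum(counts.values())
    # Entropy written as sum of p * log2(1 / p) over observed topics.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Hypothetical topic labels of papers citing two different papers.
narrow = ["Speech"] * 4
broad = ["Speech", "Vision", "NLP", "IR"]
print(topical_diversity(narrow))
print(topical_diversity(broad))
```

A paper cited only from within one topic has zero diversity; a paper cited evenly from many topics has high diversity.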
85
Topical Transfer
Transfer from Digital Libraries to other topics
Other topic      Citations  Paper Title
Web Pages        31         Trawling the Web for Emerging Cyber-Communities, Kumar, Raghavan,... 1999.
Computer Vision  14         On being Undigital with digital cameras extending the dynamic...
Video            12         Lessons learned from the creation and deployment of a terabyte digital video
Graphs           12         Trawling the Web for Emerging Cyber-Communities
Web Pages        11         WebBase a repository of Web pages
86
Topical Transfer
Citation counts from one topic to another.
Map producers and consumers
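Counting citations from one topic to another amounts to filling a topic-by-topic table. A minimal sketch, assuming every paper has one topic label and citations are given as (citing paper, cited paper) pairs; the paper ids, labels, and function name are hypothetical.

```python
from collections import Counter

def topical_transfer(citations, paper_topic):
    """Count citations keyed by (citing topic, cited topic)."""
    counts = Counter()
    for citing, cited in citations:
        counts[(paper_topic[citing], paper_topic[cited])] += 1
    return counts

# Hypothetical toy data.
paper_topic = {"p1": "Digital Libraries", "p2": "Web Pages",
               "p3": "Web Pages", "p4": "Video"}
citations = [("p2", "p1"), ("p3", "p1"), ("p4", "p1"), ("p3", "p2")]

transfer = topical_transfer(citations, paper_topic)
for (src, dst), n in sorted(transfer.items()):
    print(f"{src} cites {dst}: {n}")
```

Topics that are cited heavily by other topics act as producers of ideas, and topics that mostly cite outward act as consumers.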
94
Topical Transfer Through Time
  • Can we predict which research topics will be hot at ICML next year?
  • ...based on
  • the hot topics in neighboring venues last year
  • learned neighborhood distances for venue pairs

95
How do Ideas Progress Through Social Networks?
Hypothetical Example
AdaBoost
SIGIR (Info. Retrieval)
COLT
ICML
ICCV (Vision)
ACL (NLP)
98
How do Conferences Influence Each Other?
  • Run an LDA on research papers.
  • For each year, create an agglomerated topic
    distribution for a particular conference
  • Model the topic distribution of a conference by
    the topic distributions of related conferences

99
Topic Prediction Models
Static Model
Transfer Model
Linear Regression and Ridge Regression Used for
Coefficient Training.
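The slide names ridge regression for coefficient training but gives no details. The sketch below shows one plausible setup: predict a venue's topic proportion this year from the proportions of that topic in neighboring venues last year, with coefficients fitted in closed form. The feature layout, the synthetic data, and the function name `fit_ridge` are assumptions, not the talk's actual experiment.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic example: rows are (topic, year) cases, columns are 3 neighbor
# venues' topic proportions in the previous year.
rng = np.random.default_rng(0)
X = rng.random((20, 3))
w_true = np.array([0.5, 0.3, 0.2])   # hypothetical venue-to-venue weights
y = X @ w_true                        # target venue's proportion this year

w_ols = fit_ridge(X, y, lam=0.0)      # plain linear regression
w_ridge = fit_ridge(X, y, lam=10.0)   # shrunk coefficients
print(w_ols)
print(w_ridge)
```

The fitted coefficients play the role of learned neighborhood distances between venues: a larger weight means the neighbor's past topics are more predictive of the target venue.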
100
Preliminary Results
Mean Squared Prediction Error (smaller is better)
Transfer Model
Venues used for prediction
Transfer Model with Ridge Regression is a good predictor
101
Estimated Neighborhood Distances
Transfer into NIPS, 1988-1989
ML: .079
Neural Computation: .023
UAI: -0.0035
PAMI: .0998
Theoretical CS: .0955
AI: .032
AAAI: .082
102
Outline
Social Network Analysis with Topic Models
  • Role Discovery (Author-Recipient-Topic Model,
    ART)
  • Group Discovery (Group-Topic Model, GT)
  • Enhanced Topic Models
  • Time Localized Topics (Topics-over-Time Model,
    TOT)
  • Time Localized Groups (Groups-over-Time Model,
    GOT)
  • Markov Dependencies in Topics (Topical N-Grams
    Model, TNG)
  • Bibliometric Impact Transfer Measures using
    Topics

Multi-Conditional Mixtures AAAI 2006
104
Want a topic model with the advantages of CRFs
  • Use arbitrary, overlapping features of the input.
  • Undirected graphical model, so we don't have to think about avoiding cycles.
  • Integrate naturally with our other CRF
    components.
  • Train discriminatively
  • Natural semi-supervised training

What does this mean? Topic models are
unsupervised!
105
Multi-Conditional Mixtures: Latent Variable Models fit by Multi-way Conditional Probability
McCallum, Wang, Pal, 2005; McCallum, Pal, Druck, Wang, 2006
  • For clustering structured data, à la Latent Dirichlet Allocation and its successors
  • But an undirected model, like the Harmonium (Welling, Rosen-Zvi, Hinton, 2005)
  • But trained by a multi-conditional objective: O = P(A|B,C) P(B|A,C) P(C|A,B), e.g. A, B, C are different modalities

106
Objective Functions for Parameter Estimation
Traditional
New, multi-conditional
107
Multi-Conditional Learning (Regularization)
McCallum, Pal, Wang, 2006
108
Multi-Conditional Mixtures
109
Predictive Random Fields: mixture of Gaussians on synthetic data
McCallum, Wang, Pal, 2005
Data, classify by color
Generatively trained
Multi-Conditional
Conditionally-trained Jebara 1998
110
Multi-Conditional Mixtures vs. Harmonium on document retrieval task
McCallum, Wang, Pal, 2005
Multi-Conditional, multi-way conditionally trained
Conditionally-trained, to predict class labels
Harmonium, joint, with class labels and words
Harmonium, joint with words, no labels
111
Multi-Conditional Topics
Strong positive and negative indicators
112
Outline
Social Network Analysis with Topic Models
  • Role Discovery (Author-Recipient-Topic Model,
    ART)
  • Group Discovery (Group-Topic Model, GT)
  • Enhanced Topic Models
  • Correlations among Topics (Pachinko Allocation,
    PAM)
  • Time Localized Topics (Topics-over-Time Model,
    TOT)
  • Markov Dependencies in Topics (Topical N-Grams
    Model, TNG)
  • Bibliometric Impact Measures enabled by Topics

Multi-Conditional Mixtures
113
Summary
114
Topical Precedence
Early-ness
Within a topic, what are the earliest papers
that received more than n citations?
  • Information Retrieval
  • On Relevance, Probabilistic Indexing and
    Information Retrieval, Kuhns and Maron (1960)
  • Expected Search Length A Single Measure of
    Retrieval Effectiveness Based on the Weak
    Ordering Action of Retrieval Systems, Cooper
    (1968)
  • Relevance feedback in information retrieval,
    Rocchio (1971)
  • Relevance feedback and the optimization of
    retrieval effectiveness, Salton (1971)
  • New experiments in relevance feedback, Ide
    (1971)
  • Automatic Indexing of a Sound Database Using
    Self-organizing Neural Nets, Feiten and Gunzel
    (1982)
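
The precedence question on this slide is a filter followed by a sort. A minimal sketch, assuming each paper record carries a topic label, a year, and a citation count; the records below are placeholders and the function name is hypothetical.

```python
def earliest_cited_papers(papers, topic, n, k=5):
    """Earliest k papers in a topic that received more than n citations."""
    hits = [p for p in papers if p["topic"] == topic and p["citations"] > n]
    return sorted(hits, key=lambda p: p["year"])[:k]

# Placeholder records, not real bibliographic data.
papers = [
    {"title": "paper A", "topic": "IR", "year": 1971, "citations": 40},
    {"title": "paper B", "topic": "IR", "year": 1960, "citations": 25},
    {"title": "paper C", "topic": "IR", "year": 1955, "citations": 3},
    {"title": "paper D", "topic": "Speech", "year": 1953, "citations": 60},
]
for p in earliest_cited_papers(papers, topic="IR", n=10):
    print(p["year"], p["title"])
```

The citation threshold n keeps early but obscure papers (like the placeholder "paper C") out of the list.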

115
Topical Precedence
Early-ness
Within a topic, what are the earliest papers
that received more than n citations?
  • Speech Recognition
  • Some experiments on the recognition of speech,
    with one and two ears, E. Colin Cherry (1953)
  • Spectrographic study of vowel reduction, B.
    Lindblom (1963)
  • Automatic Lipreading to enhance speech
    recognition, Eric D. Petajan (1965)
  • Effectiveness of linear prediction
    characteristics of the speech wave for..., B.
    Atal (1974)
  • Automatic Recognition of Speakers from Their
    Voices, B. Atal (1976)

116
Summary