Semantic Representations with Probabilistic Topic Models

1
Semantic Representations with Probabilistic Topic Models
Mark Steyvers, Department of Cognitive Sciences, University of California, Irvine
Joint work with Tom Griffiths (UC Berkeley) and Padhraic Smyth (UC Irvine)
2
Topic Models in Machine Learning
  • Unsupervised extraction of content from large
    text collections
  • Topics provide quick summary of content / gist
  • What is in this corpus?
  • What is in this document, paragraph, or sentence?
  • What are similar documents to a query?
  • What are the topical trends over time?

3
Topic Models in Psychology
  • Topic models address three computational problems
    for the semantic memory system
  • Gist extraction: what is this set of words about?
  • Disambiguation: what is the sense of this word?
  • E.g. football field vs. magnetic field
  • Prediction: what fact, concept, or word is next?

4
Two approaches to semantic representation
Semantic networks: how are these learned?
Semantic spaces: can be learned (e.g. Latent Semantic Analysis), but is this representation flexible enough?
[Figure: example words BAT, BALL, LOAN, CASH, GAME, FUN, MONEY, PLAY, THEATER, BANK, STAGE, RIVER, STREAM]
5
Overview
  • I. Probabilistic Topic Models
  • generative model
  • statistical inference: Gibbs sampling
  • II. Explaining human memory
  • word association
  • semantic isolation
  • false memory
  • III. Information retrieval

6
Probabilistic Topic Models
  • Extract topics from large text collections
  • unsupervised
  • generative
  • Bayesian statistical inference
  • Our modeling work is based on
  • pLSI model: Hofmann (1999)
  • LDA model: Blei, Ng, and Jordan (2001, 2003)
  • Topics model: Griffiths and Steyvers (2003, 2004)

7
Model input: bag of words
  • Matrix of number of times words occur in
    documents
  • Note: some function words are deleted (the,
    a, and, etc.)

[Figure: count matrix with words as rows and documents as columns]
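The input described on this slide can be sketched as follows. This is a minimal illustration, not the toolbox's own preprocessing: the stop-word list, the whitespace tokenizer, and the two toy documents are assumptions made for the example.

```python
from collections import Counter

# Hypothetical stop-word list; the slide only names "the, a, and, etc."
STOP_WORDS = {"the", "a", "and", "of", "in", "to", "is"}

def bag_of_words(docs):
    """Return (vocab, counts), where counts[w][d] is how often word w occurs in document d."""
    tokenized = [[t for t in doc.lower().split() if t not in STOP_WORDS] for doc in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {word: i for i, word in enumerate(vocab)}
    counts = [[0] * len(docs) for _ in vocab]
    for d, doc in enumerate(tokenized):
        for word, n in Counter(doc).items():
            counts[index[word]][d] = n
    return vocab, counts

# Toy documents (illustrative only)
docs = ["the bank of the river", "money in the bank and the loan"]
vocab, counts = bag_of_words(docs)
for word, row in zip(vocab, counts):
    print(word, row)
```

Word order inside each document is discarded; only the counts reach the model.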
8
Probabilistic Topic Models
  • A topic represents a probability distribution
    over words
  • Related words get high probability in same topic
  • Example topics extracted from NIH/NSF grants

Probability distribution over words. Most likely
words listed at the top
9
Document mixture of topics
[Figure: two example documents; topic proportions shown as 20% and 80% for one document and 100% for the other]
10
Generative Process
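  • For each document, choose a mixture of topics
  • $\theta \sim \mathrm{Dirichlet}(\alpha)$
  • Sample a topic [1..T] from the mixture: $z \sim \mathrm{Multinomial}(\theta)$
  • Sample a word from the topic: $w \sim \mathrm{Multinomial}(\phi^{(z)})$, with $\phi \sim \mathrm{Dirichlet}(\beta)$

[Figure: graphical model with plates labeled Nd, D, and T]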
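The generative process above can be sketched in a few lines. This is a minimal sketch using only the Python standard library; the helper names (`sample_dirichlet`, `generate_document`), the sizes, and the hyperparameter values are assumptions chosen for illustration.

```python
import random

def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]

def generate_document(doc_len, phi, alpha, rng):
    """Generate one document; returns (theta, topic assignments z, word ids w)."""
    n_topics = len(phi)
    theta = sample_dirichlet(alpha, n_topics, rng)               # theta ~ Dirichlet(alpha)
    z = rng.choices(range(n_topics), weights=theta, k=doc_len)   # z ~ Multinomial(theta)
    w = [rng.choices(range(len(phi[t])), weights=phi[t])[0]      # w ~ Multinomial(phi(z))
         for t in z]
    return theta, z, w

rng = random.Random(0)
T, W = 2, 5                                                # hypothetical sizes
phi = [sample_dirichlet(0.5, W, rng) for _ in range(T)]    # phi ~ Dirichlet(beta), beta = 0.5
corpus = [generate_document(20, phi, alpha=1.0, rng=rng) for _ in range(16)]
```

Statistical inference runs this process in reverse: only the word ids are observed, and the topic assignments, mixtures, and topics are inferred.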
11
Prior Distributions
  • Dirichlet priors encourage sparsity on topic
    mixtures and topics

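[Figure: two probability simplexes, one over Topic 1, Topic 2, Topic 3 for $\theta \sim \mathrm{Dirichlet}(\alpha)$ and one over Word 1, Word 2, Word 3 for $\phi \sim \mathrm{Dirichlet}(\beta)$; darker colors indicate lower probability]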
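A quick way to see the sparsity claim is to compare samples drawn with a small and a large concentration parameter. The sketch below is illustrative only; the function names and the concentration values 0.1 and 10 are assumptions, not values from the talk.

```python
import random

def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]

def mean_max_weight(concentration, size=3, n_samples=2000, seed=0):
    """Average largest component; values near 1 mean most mass sits on one outcome."""
    rng = random.Random(seed)
    total = sum(max(sample_dirichlet(concentration, size, rng)) for _ in range(n_samples))
    return total / n_samples

sparse = mean_max_weight(0.1)    # small concentration: samples sit near the simplex corners
smooth = mean_max_weight(10.0)   # large concentration: samples sit near the center
print(sparse, smooth)
```

With a small concentration, a typical document is dominated by one topic and a typical topic by few words, which is the sparsity the slide refers to.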
12
Creating Artificial Dataset
Two topics, 16 documents
[Figure: the generated documents; axis label: Docs]
Can we recover the original topics and topic
mixtures from this data?
13
Statistical Inference
  • Three sets of latent variables
  • topic mixtures $\theta$
  • word mixtures $\phi$
  • topic assignments $z$
  • Estimate posterior distribution over topic
    assignments
  • $P(z \mid w)$
  • (we can later infer $\theta$ and $\phi$)

14
Statistical Inference
  • Exact inference is impossible
  • Use approximate methods
  • Markov chain Monte Carlo (MCMC) with Gibbs
    sampling

Sum over $T^n$ terms
15
Gibbs Sampling
count of topic t assigned to doc d
count of word w assigned to topic t
probability that word i is assigned to topic t
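The sampling equation itself did not survive in this transcript. The sketch below assumes the standard collapsed Gibbs update for this model, in which the probability of assigning token i to topic t is proportional to (count of word w in topic t + beta) / (total tokens in topic t + W * beta) times (count of topic t in doc d + alpha). The function name, the hyperparameter defaults, and the toy corpus are assumptions for illustration.

```python
import random

def gibbs_lda(docs, n_topics, n_words, alpha=0.5, beta=0.1, n_iter=50, seed=0):
    """Collapsed Gibbs sampling for a topic model; docs is a list of lists of word ids."""
    rng = random.Random(seed)
    n_wt = [[0] * n_topics for _ in range(n_words)]   # count of word w assigned to topic t
    n_dt = [[0] * n_topics for _ in docs]             # count of topic t assigned to doc d
    n_t = [0] * n_topics                              # total tokens assigned to topic t
    z = []
    for d, doc in enumerate(docs):                    # assign word tokens randomly to topics
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, t in zip(doc, z[d]):
            n_wt[w][t] += 1
            n_dt[d][t] += 1
            n_t[t] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                           # remove the token's current assignment
                n_wt[w][t] -= 1
                n_dt[d][t] -= 1
                n_t[t] -= 1
                weights = [
                    (n_wt[w][k] + beta) / (n_t[k] + n_words * beta) * (n_dt[d][k] + alpha)
                    for k in range(n_topics)
                ]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = t                           # record the newly sampled assignment
                n_wt[w][t] += 1
                n_dt[d][t] += 1
                n_t[t] += 1
    return z, n_wt, n_dt

# Toy corpus: words 0-2 and words 3-5 never co-occur in a document
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 1, 0], [5, 4, 3, 3, 5]]
z, n_wt, n_dt = gibbs_lda(docs, n_topics=2, n_words=6, n_iter=20)
```

The two count tables are all the sampler needs to keep; the topic mixtures and word distributions can be estimated from them afterwards.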
16
Example of Gibbs Sampling
  • Assign word tokens randomly to topics

[Figure legend: topic 1, topic 2]
17
After 1 iteration
  • Apply sampling equation to each word token

[Figure legend: topic 1, topic 2]
18
After 4 iterations
[Figure legend: topic 1, topic 2]
19
After 8 iterations
[Figure legend: topic 1, topic 2]
20
After 32 iterations
[Figure legend: topic 1, topic 2]
21
Algorithm input/output
INPUT: word-document counts (word order is irrelevant)
OUTPUT:
  • topic assignments to each word: $P(z_i)$
  • likely words in each topic: $P(w \mid z)$
  • likely topics in each document (gist): $P(\theta \mid d)$
22
Software
  • Public-domain MATLAB toolbox for topic modeling
    on the Web
  • http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm

23
Examples Topics from New York Times
Stock Market: WEEK DOW_JONES POINTS 10_YR_TREASURY_YIELD PERCENT CLOSE NASDAQ_COMPOSITE STANDARD_POOR CHANGE FRIDAY DOW_INDUSTRIALS GRAPH_TRACKS EXPECTED BILLION NASDAQ_COMPOSITE_INDEX EST_02 PHOTO_YESTERDAY YEN 10 500_STOCK_INDEX
Wall Street Firms: WALL_STREET ANALYSTS INVESTORS FIRM GOLDMAN_SACHS FIRMS INVESTMENT MERRILL_LYNCH COMPANIES SECURITIES RESEARCH STOCK BUSINESS ANALYST WALL_STREET_FIRMS SALOMON_SMITH_BARNEY CLIENTS INVESTMENT_BANKING INVESTMENT_BANKERS INVESTMENT_BANKS
Terrorism: SEPT_11 WAR SECURITY IRAQ TERRORISM NATION KILLED AFGHANISTAN ATTACKS OSAMA_BIN_LADEN AMERICAN ATTACK NEW_YORK_REGION NEW MILITARY NEW_YORK WORLD NATIONAL QAEDA TERRORIST_ATTACKS
Bankruptcy: BANKRUPTCY CREDITORS BANKRUPTCY_PROTECTION ASSETS COMPANY FILED BANKRUPTCY_FILING ENRON BANKRUPTCY_COURT KMART CHAPTER_11 FILING COOPER BILLIONS COMPANIES BANKRUPTCY_PROCEEDINGS DEBTS RESTRUCTURING CASE GROUP
24
Example topics from an educational corpus
PRINTING PAPER PRINT PRINTED TYPE PROCESS INK PRESS IMAGE
PLAY PLAYS STAGE AUDIENCE THEATER ACTORS DRAMA SHAKESPEARE ACTOR
TEAM GAME BASKETBALL PLAYERS PLAYER PLAY PLAYING SOCCER PLAYED
JUDGE TRIAL COURT CASE JURY ACCUSED GUILTY DEFENDANT JUSTICE
HYPOTHESIS EXPERIMENT SCIENTIFIC OBSERVATIONS SCIENTISTS EXPERIMENTS SCIENTIST EXPERIMENTAL TEST
STUDY TEST STUDYING HOMEWORK NEED CLASS MATH TRY TEACHER
Example topics from psych review abstracts
SIMILARITY CATEGORY CATEGORIES RELATIONS DIMENSIONS FEATURES STRUCTURE SIMILAR REPRESENTATION ONJECTS
STIMULUS CONDITIONING LEARNING RESPONSE STIMULI RESPONSES AVOIDANCE REINFORCEMENT CLASSICAL DISCRIMINATION
MEMORY RETRIEVAL RECALL ITEMS INFORMATION TERM RECOGNITION ITEMS LIST ASSOCIATIVE
GROUP INDIVIDUAL GROUPS OUTCOMES INDIVIDUALS GROUPS OUTCOMES INDIVIDUALS DIFFERENCES INTERACTION
EMOTIONAL EMOTION BASIC EMOTIONS AFFECT STATES EXPERIENCES AFFECTIVE AFFECTS RESEARCH
25
Choosing number of topics
  • Bayesian model selection
  • Generalization test
  • e.g., perplexity on out-of-sample data
  • Non-parametric Bayesian approach
  • Number of topics grows with size of data
  • E.g. Hierarchical Dirichlet Processes (HDP)
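
The generalization test mentioned above can be made concrete with a small perplexity function. This is a simplified sketch: it assumes the topic mixtures `theta` for the held-out documents are already available, whereas in practice they have to be estimated from part of each test document. All names and the toy values are illustrative.

```python
import math

def perplexity(test_docs, theta, phi):
    """exp(-average log-likelihood per token), with P(w | d) = sum_t theta[d][t] * phi[t][w]."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(test_docs):
        for w in doc:
            p = sum(theta[d][t] * phi[t][w] for t in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)

# Toy check: with uniform topics over 4 words, every token has probability 1/4
phi = [[0.25] * 4, [0.25] * 4]
theta = [[0.5, 0.5]]
value = perplexity([[0, 1, 2, 3]], theta, phi)
print(value)
```

Lower perplexity on out-of-sample data means better generalization, so one can fit models with different numbers of topics and compare the values.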

26
Applications to Human Memory
27
Computational Problems for Semantic Memory System
  • Gist extraction
  • What is this set of words about?
  • Disambiguation
  • What is the sense of this word?
  • Prediction
  • What fact, concept, or word is next?

28
Disambiguation
  • FIELD
  • FOOTBALL FIELD

[Figure: $P(z_{\mathrm{FIELD}} \mid w)$ over the two topics below, shown once for FIELD alone and once for FOOTBALL FIELD]
Topic: FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES
Topic: BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD
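The disambiguation idea can be illustrated with Bayes' rule over two toy topics. This is a simplified sketch that assumes all words in the context share one topic; the topic names, the word probabilities, and the floor value are made up for the example and are not estimates from any corpus.

```python
def topic_posterior(words, phi, prior, floor=1e-6):
    """P(z | words) proportional to P(z) * prod_w P(w | z), assuming one shared topic."""
    scores = {}
    for topic, word_probs in phi.items():
        score = prior[topic]
        for w in words:
            score *= word_probs.get(w, floor)   # small floor for words the topic never emits
        scores[topic] = score
    total = sum(scores.values())
    return {topic: s / total for topic, s in scores.items()}

# Hypothetical word probabilities for two topics
phi = {
    "physics": {"FIELD": 0.30, "MAGNETIC": 0.30, "MAGNET": 0.20, "WIRE": 0.20},
    "sports": {"BALL": 0.30, "GAME": 0.30, "FOOTBALL": 0.20, "FIELD": 0.20},
}
prior = {"physics": 0.5, "sports": 0.5}

alone = topic_posterior(["FIELD"], phi, prior)
in_context = topic_posterior(["FOOTBALL", "FIELD"], phi, prior)
print(alone)
print(in_context)
```

FIELD on its own leaves both senses plausible, while adding FOOTBALL moves almost all of the posterior mass to the sports topic.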
29
Modeling Word Association
30
Word Association (norms from Nelson et al. 1998)
CUE: PLANET
31
Word Association (norms from Nelson et al. 1998)
CUE: PLANET
associate number: 1 2 3 4 5 6 7 8
people: EARTH STARS SPACE SUN MARS UNIVERSE SATURN GALAXY
(vocabulary: 5000 words)
32
Word Association as a Prediction Problem
  • Given that a single word is observed, predict
    what other words might occur in that context
  • Under a single topic assumption

Cue
Response
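The prediction formula on this slide is not in the transcript. A common way to write word association under a single-topic assumption is $P(w_2 \mid w_1) = \sum_z P(w_2 \mid z)\,P(z \mid w_1)$ with $P(z \mid w_1) \propto P(w_1 \mid z)\,P(z)$; the sketch below assumes that form. The two toy topics and their probabilities are invented for illustration.

```python
def association(cue, phi, prior):
    """Return P(response | cue) for every word, under a single-topic assumption."""
    joint = [prior[t] * phi[t].get(cue, 0.0) for t in range(len(phi))]
    total = sum(joint)
    posterior = [j / total for j in joint]                  # P(z | cue)
    vocab = {w for topic in phi for w in topic}
    return {
        w: sum(posterior[t] * phi[t].get(w, 0.0) for t in range(len(phi)))
        for w in vocab
    }

# Hypothetical topics: an astronomy topic and a gardening topic
phi = [
    {"PLANET": 0.25, "STARS": 0.25, "EARTH": 0.25, "SPACE": 0.25},
    {"EARTH": 0.5, "SOIL": 0.5},
]
probs = association("PLANET", phi, prior=[0.5, 0.5])
ranked = sorted((w for w in probs if w != "PLANET"), key=lambda w: -probs[w])
print(ranked)
```

Ranking the vocabulary by this probability gives the model's predicted associates, which can then be compared with the human norms.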
33
Word Association (norms from Nelson et al. 1998)
CUE: PLANET
associate number: 1 2 3 4 5 6 7 8
people: EARTH STARS SPACE SUN MARS UNIVERSE SATURN GALAXY
model: STARS STAR SUN EARTH SPACE SKY PLANET UNIVERSE
First associate EARTH has rank 4 in model
34
Median rank of first associate
[Figure: median rank of first associate for the TOPICS model]
35
Median rank of first associate
[Figure: median rank of first associate, LSA vs. TOPICS]
36
Episodic Memory: Semantic Isolation Effects, False Memory
37
Semantic Isolation Effect
Study this list: PEAS, CARROTS, BEANS, SPINACH,
LETTUCE, HAMMER, TOMATOES, CORN, CABBAGE, SQUASH
HAMMER, PEAS, CARROTS, ...
38
Semantic isolation effect / Von Restorff effect
  • Finding: contextually unique words are better
    remembered
  • Verbal explanations
  • Attention, surprise, distinctiveness
  • Our approach
  • assume memories can be accessed and encoded at
    multiple levels of description
  • Semantic/gist aspects: generic information
  • Verbatim: specific information

39
Computational Problem
  • How to trade off specificity and generality?
  • Remembering detail and gist
  • Dual route topic model = topic model + encoding
    of specific words

40
Dual route topic model
  • Two ways to generate words
  • Topic Model
  • Verbatim word distribution (unique to document)
  • Each word comes from a single route
  • Switch variable $x_i$ for every word $i$
  • $x_i = 0$ → topics
  • $x_i = 1$ → verbatim
  • Conditional prob. of a word under a document
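
The conditional probability itself is missing from the transcript. A natural reading of the two routes is a mixture, $P(w \mid d) = P(x{=}0 \mid d)\sum_z P(w \mid z)P(z \mid d) + P(x{=}1 \mid d)\,P_{\text{verbatim}}(w \mid d)$; the sketch below assumes that form. The function name, the toy topics, and all numbers are hypothetical.

```python
def dual_route_word_prob(word, theta_d, phi, verbatim_d, p_verbatim):
    """Mix the topic route and the document-specific verbatim route for one word."""
    topic_route = sum(theta_d[t] * phi[t].get(word, 0.0) for t in range(len(phi)))
    verbatim_route = verbatim_d.get(word, 0.0)
    return (1.0 - p_verbatim) * topic_route + p_verbatim * verbatim_route

# Hypothetical encoding of a vegetable list containing the isolate HAMMER
phi = [
    {"PEAS": 0.5, "CARROTS": 0.5},      # vegetables topic
    {"HAMMER": 0.5, "SAW": 0.5},        # tools topic
]
theta_d = [0.9, 0.1]                    # the list is mostly about vegetables
verbatim_d = {"HAMMER": 1.0}            # the isolate is stored verbatim
p_hammer = dual_route_word_prob("HAMMER", theta_d, phi, verbatim_d, p_verbatim=0.2)
p_peas = dual_route_word_prob("PEAS", theta_d, phi, verbatim_d, p_verbatim=0.2)
print(p_hammer, p_peas)
```

In this toy setting the isolate gets most of its probability from the verbatim route, while list-consistent words are carried by the topic route.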

41
Graphical Model
Variable $x$ is a switch:
$x = 0$ → sample from topic
$x = 1$ → sample from verbatim word distribution
42
Applying Dual Route Topic Model to Human Memory
  • Train model on educational corpus (TASA)
  • 37K documents, 1700 topics
  • Apply model to list memory experiments
  • Study list is a document
  • Recall probability based on model

43
[Figure: ENCODING and RETRIEVAL stages; route labels: Special, verbatim]
Study words: PEAS CARROTS BEANS SPINACH LETTUCE
HAMMER TOMATOES CORN CABBAGE SQUASH
44
Hunt & Lamb (2001, exp. 1)
OUTLIER LIST: PEAS CARROTS BEANS SPINACH LETTUCE HAMMER TOMATOES CORN CABBAGE SQUASH
CONTROL LIST: SAW SCREW CHISEL DRILL SANDPAPER HAMMER NAILS BENCH RULER ANVIL
45
False Memory (e.g. Deese, 1959; Roediger &
McDermott)
Study this list: Bed, Rest, Awake, Tired, Dream,
Wake, Snooze, Blanket, Doze, Slumber, Snore, Nap,
Peace, Yawn, Drowsy
SLEEP, BED, REST, ...
46
False memory effects
Number of associates: 3
MAD FEAR HATE SMOOTH NAVY HEAT SALAD TUNE COURTS CANDY PALACE PLUSH TOOTH BLIND WINTER
Number of associates: 6
MAD FEAR HATE RAGE TEMPER FURY SALAD TUNE COURTS CANDY PALACE PLUSH TOOTH BLIND WINTER
Number of associates: 9
MAD FEAR HATE RAGE TEMPER FURY WRATH HAPPY FIGHT CANDY PALACE PLUSH TOOTH BLIND WINTER
(lure: ANGER)
Robinson & Roediger (1997)
47
Modeling Serial Order Effects in Free Recall
48
Problem
  • Dual route model predicts no sequential effects
  • Order of words is important in human memory
    experiments
  • Standard Gibbs sampler is psychologically
    implausible
  • Assumes list is processed in parallel
  • Each item can influence encoding of each other
    item

49
Semantic isolation experiment to study order
effects
  • Study lists were 14 words long
  • 14 isolate lists (e.g. A A A B A A ... A A )
  • 14 control lists (e.g. A A A A A A ... A A )
  • Varied serial position of isolate (any of 14
    positions)

50
Immediate Recall Results
Control list: A A A A A ... A
Isolate list: B A A A A ... A
51
Immediate Recall Results
Control list: A A A A A ... A
Isolate list: B A A A A ... A
52
Immediate Recall Results
Control list: A A A A A ... A
Isolate list: A B A A A ... A
53
Immediate Recall Results
Control list: A A A A A ... A
Isolate list: A A B A A ... A
54
Immediate Recall Results
55
Modified Gibbs Sampling Scheme
  • Update items non-uniformly in Gibbs sampler
  • Probability of updating item i after observing
    words 1..t
  • → Words further back in time are less likely to
    be re-assigned

[Equation annotations: item to update, current time, parameter]
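The update rule on this slide is not recoverable from the transcript. One form that matches the description (older items are less likely to be re-assigned, controlled by a single parameter) is a geometric decay, where the probability of updating item i at time t is proportional to lam ** (t - i). This geometric form and the name `lam` are assumptions made for the sketch below.

```python
def update_probabilities(t, lam):
    """P(update item i | words 1..t observed), assumed proportional to lam ** (t - i)."""
    weights = [lam ** (t - i) for i in range(1, t + 1)]
    total = sum(weights)
    return [w / total for w in weights]

uniform = update_probabilities(4, 1.0)      # every studied item equally likely to be updated
recency = update_probabilities(4, 0.3)      # recent items favored
current_only = update_probabilities(4, 0.0) # only the current item is ever updated
print(uniform, recency, current_only)
```

Under this assumption a parameter of 1 recovers the standard sampler that treats the list in parallel, and a parameter of 0 gives strictly sequential encoding.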
56
Effect of Sampling Scheme
[Figure: panels for $\lambda = 1$, $\lambda = 0.3$, and $\lambda = 0$; x-axis: study order]
57
Normalized Serial Position Effects
DATA
MODEL
58
Information Retrieval & Human Memory
59
Example
  • Searching for information on Padhraic Smyth

60
Query: Smyth
61
Query: Smyth irish computer science department
62
Query: Smyth irish computer science department
weather prediction seasonal climate fluctuations
hmm models nips conference consultant yahoo
netflix prize dave newman steyvers
63
Problem
  • More information in a query can lead to worse
    search results
  • Human memory typically works better with more
    cues
  • Problem: how can we better match queries to
    documents to allow for partial matches, and
    matches across documents?

64
Dual route model for information retrieval
  • Encode documents with two routes
  • Contextually unique words → verbatim route
  • Thematic words → topics route

65
Example encoding of a psych review abstract
Kruschke, J. K. ALCOVE: An exemplar-based
connectionist model of category learning.
Psychological Review, 99, 22-44.
Contextually unique words: ALCOVE, SCHAFFER, MEDIN, NOSOFSKY
Topic 1 (p = 0.21): learning phenomena acquisition learn acquired ...
Topic 22 (p = 0.17): similarity objects object space category dimensional categories spatial
Topic 61 (p = 0.08): representations representation order alternative 1st higher 2nd descriptions problem form
66
Retrieval Experiments
  • For each candidate document, calculate how likely
    it is that the query was generated from the model's
    encoding
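
The scoring step above can be sketched as a query-likelihood ranking. This is a minimal sketch, assuming each document is encoded by a topic mixture plus a verbatim word distribution and that query words are scored independently; the function names, the mixing weight, the probability floor, and the toy documents are all hypothetical.

```python
import math

def query_log_likelihood(query, theta_d, phi, verbatim_d, p_verbatim, floor=1e-9):
    """Log-probability that the query words were generated from one document's encoding."""
    log_p = 0.0
    for w in query:
        topic_route = sum(theta_d[t] * phi[t].get(w, 0.0) for t in range(len(phi)))
        p = (1.0 - p_verbatim) * topic_route + p_verbatim * verbatim_d.get(w, 0.0)
        log_p += math.log(max(p, floor))       # floor avoids log(0) for unseen words
    return log_p

def rank(query, docs, phi, p_verbatim=0.3):
    """Return document names ordered from most to least likely to have generated the query."""
    scored = [
        (query_log_likelihood(query, d["theta"], phi, d["verbatim"], p_verbatim), name)
        for name, d in docs.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical topics and documents
phi = [{"learning": 0.5, "category": 0.5}, {"memory": 0.5, "recall": 0.5}]
docs = {
    "A": {"theta": [0.9, 0.1], "verbatim": {"alcove": 1.0}},
    "B": {"theta": [0.1, 0.9], "verbatim": {"sam": 1.0}},
}
print(rank(["alcove", "category"], docs, phi))
print(rank(["recall"], docs, phi))
```

A query can match a document through its rare verbatim words, through its topics, or through both, which is what allows partial matches.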

67
Information Retrieval Results
Evaluation metric: precision for 10 highest-ranked docs
FRs
APs

Method  Title  Desc  Concepts
TFIDF   .406   .434  .549
LSI     .455   .469  .523
LDA     .478   .463  .556
SW      .488   .468  .561
SWB     .495   .473  .558

Method  Title  Desc  Concepts
TFIDF   .300   .287  .483
LSI     .366   .327  .487
LDA     .428   .340  .487
SW      .448   .407  .560
SWB     .459   .400  .560
68
Information retrieval systems in the mind & web
  • Similar computational demands
  • Both retrieve the most relevant items from a
    large information repository in response to
    external cues or queries.
  • Useful analogies / interdisciplinary approaches
  • Many cognitive aspects in information retrieval
  • Internet content is produced by humans
  • Queries are formulated by humans

69
Recent Papers
  • Steyvers, M., Griffiths, T.L., & Dennis, S.
    (2006). Probabilistic inference in human semantic
    memory. Trends in Cognitive Sciences, 10(7),
    327-334.
  • Griffiths, T.L., Steyvers, M., & Tenenbaum,
    J.B.T. (2007). Topics in Semantic Representation.
    Psychological Review, 114(2), 211-244.
  • Griffiths, T.L., Steyvers, M., & Firl, A. (in
    press). Google and the mind: Predicting fluency
    with PageRank. Psychological Science.
  • Steyvers, M. & Griffiths, T.L. (in press).
    Rational Analysis as a Link between Human Memory
    and Information Retrieval. In N. Chater and M.
    Oaksford (Eds.), The Probabilistic Mind: Prospects
    from Rational Models of Cognition. Oxford
    University Press.
  • Chemudugunta, C., Smyth, P., & Steyvers, M.
    (2007, in press). Modeling General and Specific
    Aspects of Documents with a Probabilistic Topic
    Model. In Advances in Neural Information
    Processing Systems, 19.

70
Text Mining Applications
71
Topics provide quick summary of content
  • Who writes on what topics?
  • What is in this corpus? What is in this document?
  • What are the topical trends over time?
  • Who is mentioned in what context?

72
Faculty Browser
  • System spiders UCI/UCSD faculty websites related
    to CalIT2 (California Institute for
    Telecommunications and Information Technology)
  • Applies topic model on text extracted from pdf
    files
  • Browser demo: http://yarra.calit2.uci.edu/calit2/

73
one topic
most prolific researchers for this topic
74
one researcher
topics this researcher works on
other researchers with similar topical interests
75
Inferred network of researchers connected through
topics
76
Analyzing the New York Times
330,000 articles, 2000-2002
77
Extracted Named Entities
Three investigations began Thursday into the
securities and exchange_commission's choice of
william_webster to head a new board overseeing
the accounting profession. house and
senate_democrats called for the resignations of
both judge_webster and harvey_pitt, the
commission's chairman. The white_house expressed
support for judge_webster as well as for
harvey_pitt, who was harshly criticized Thursday
for failing to inform other commissioners before
they approved the choice of judge_webster that he
had led the audit committee of a company facing
fraud accusations. The president still has
confidence in harvey_pitt, said dan_bartlett,
bush's communications director
  • Used standard algorithms to extract named
    entities
  • People
  • Places
  • Organizations

78
Standard Topic Model with Entities
79
Topic Trends
[Figure: trends over time for the topics Tour-de-France, Quarterly Earnings, and Anthrax; y-axis: proportion of words assigned to topic for that time slice]
80
Example of Extracted Entity-Topic Network
81
Prediction of Missing Entities in Text
Shares of XXXX slid 8 percent, or 1.10, to 12.65 Tuesday, as major credit agencies said the conglomerate would still be challenged in repaying its debts, despite raising 4.6 billion Monday in taking its finance group public. Analysts at XXXX Investors service in XXXX said they were keeping XXXX and its subsidiaries under review for a possible debt downgrade, saying the company will continue to face a significant debt burden,'' with large slices of debt coming due, over the next 18 months. XXXX said
Test article with entities removed
83
Prediction of Missing Entities in Text
Shares of XXXX slid 8 percent, or 1.10, to 12.65 Tuesday, as major credit agencies said the conglomerate would still be challenged in repaying its debts, despite raising 4.6 billion Monday in taking its finance group public. Analysts at XXXX Investors service in XXXX said they were keeping XXXX and its subsidiaries under review for a possible debt downgrade, saying the company will continue to face a significant debt burden,'' with large slices of debt coming due, over the next 18 months. XXXX said
Test article with entities removed (above)
Actual missing entities: fitch goldman-sachs lehman-brother moody morgan-stanley new-york-stock-exchange standard-and-poor tyco tyco-international wall-street worldco
Predicted entities given observed words (matches in blue): wall-street new-york nasdaq securities-exchange-commission sec merrill-lynch new-york-stock-exchange goldman-sachs standard-and-poor
84
Model Extensions
85
Model Extensions
  • HMM-topics model
  • Modeling aspects of syntax
  • Hierarchical topic model
  • Modeling relations between topics
  • Collocation topic models
  • Learning collocations of words within topics

86
Hidden Markov Topic Model
87
Hidden Markov Topics Model
  • Syntactic dependencies → short-range dependencies
  • Semantic dependencies → long-range dependencies

[Figure: graphical model with document topic mixture $\theta$, topic assignments z1..z4, words w1..w4, and syntactic states s1..s4]
Semantic state: generate words from topic model
Syntactic states: generate words from HMM
(Griffiths, Steyvers, Blei, & Tenenbaum, 2004)
88
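Transition between semantic state and syntactic
states
[Figure: state diagram with one semantic state and two syntactic states; arrows carry the transition probabilities 0.8, 0.7, 0.1, 0.3, 0.2, 0.9]
Semantic state, topic mixture: z = 1: 0.4, z = 2: 0.6
  z = 1: HEART 0.2, LOVE 0.2, SOUL 0.2, TEARS 0.2, JOY 0.2
  z = 2: SCIENTIFIC 0.2, KNOWLEDGE 0.2, WORK 0.2, RESEARCH 0.2, MATHEMATICS 0.2
Syntactic state: OF 0.6, FOR 0.3, BETWEEN 0.1
Syntactic state: THE 0.6, A 0.3, MANY 0.1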
89
Combining topics and syntax
[Figure: the state diagram of the previous slide, with states labeled x = 1 (semantic state, topics z = 1 and z = 2), x = 2 (OF, FOR, BETWEEN), and x = 3 (THE, A, MANY)]
Generated so far: THE
90
Combining topics and syntax
[Figure: the same state diagram, with states x = 1 (semantic state), x = 2 (OF, FOR, BETWEEN), and x = 3 (THE, A, MANY)]
Generated so far: THE LOVE
91
Combining topics and syntax
[Figure: the same state diagram, with states x = 1 (semantic state), x = 2 (OF, FOR, BETWEEN), and x = 3 (THE, A, MANY)]
Generated so far: THE LOVE OF
92
Combining topics and syntax
[Figure: the same state diagram, with states x = 1 (semantic state), x = 2 (OF, FOR, BETWEEN), and x = 3 (THE, A, MANY)]
Generated so far: THE LOVE OF RESEARCH
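The walk-through on these slides can be sketched as a small sampler. The word distributions and the topic mixture below are the ones shown on the slides; the transition table is hypothetical, because the transcript does not say which probability belongs to which arrow. The function names are also assumptions.

```python
import random

CLASS_WORDS = {                       # syntactic states, as on the slides
    2: {"OF": 0.6, "FOR": 0.3, "BETWEEN": 0.1},
    3: {"THE": 0.6, "A": 0.3, "MANY": 0.1},
}
TOPICS = {                            # topics used by the semantic state x = 1
    1: {"HEART": 0.2, "LOVE": 0.2, "SOUL": 0.2, "TEARS": 0.2, "JOY": 0.2},
    2: {"SCIENTIFIC": 0.2, "KNOWLEDGE": 0.2, "WORK": 0.2, "RESEARCH": 0.2, "MATHEMATICS": 0.2},
}
THETA = {1: 0.4, 2: 0.6}              # document's topic mixture, as on the slides
TRANSITIONS = {                       # HYPOTHETICAL arrow assignment, for illustration only
    1: {1: 0.1, 2: 0.7, 3: 0.2},
    2: {1: 0.3, 3: 0.7},
    3: {1: 0.9, 3: 0.1},
}

def draw(dist, rng):
    """Sample one key from a {key: probability} dictionary."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]

def generate(n_words, rng, start_state=3):
    """Generate words by walking the HMM; the semantic state defers to the topic model."""
    state, words = start_state, []
    for _ in range(n_words):
        if state == 1:                               # semantic state: pick a topic, then a word
            words.append(draw(TOPICS[draw(THETA, rng)], rng))
        else:                                        # syntactic state: emit from its own class
            words.append(draw(CLASS_WORDS[state], rng))
        state = draw(TRANSITIONS[state], rng)
    return words

sentence = generate(4, random.Random(0))
print(" ".join(sentence))
```

The HMM decides where content words go, and the topic model decides which content words they are, so sequences such as THE LOVE OF RESEARCH can arise.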
93
Semantic topics
PLANTS PLANT LEAVES SEEDS SOIL ROOTS FLOWERS WATER FOOD GREEN SEED STEMS FLOWER STEM LEAF ANIMALS ROOT POLLEN GROWING GROW
GOLD IRON SILVER COPPER METAL METALS STEEL CLAY LEAD ADAM ORE ALUMINUM MINERAL MINE STONE MINERALS POT MINING MINERS TIN
BEHAVIOR SELF INDIVIDUAL PERSONALITY RESPONSE SOCIAL EMOTIONAL LEARNING FEELINGS PSYCHOLOGISTS INDIVIDUALS PSYCHOLOGICAL EXPERIENCES ENVIRONMENT HUMAN RESPONSES BEHAVIORS ATTITUDES PSYCHOLOGY PERSON
CELLS CELL ORGANISMS ALGAE BACTERIA MICROSCOPE MEMBRANE ORGANISM FOOD LIVING FUNGI MOLD MATERIALS NUCLEUS CELLED STRUCTURES MATERIAL STRUCTURE GREEN MOLDS
DOCTOR PATIENT HEALTH HOSPITAL MEDICAL CARE PATIENTS NURSE DOCTORS MEDICINE NURSING TREATMENT NURSES PHYSICIAN HOSPITALS DR SICK ASSISTANT EMERGENCY PRACTICE
BOOK BOOKS READING INFORMATION LIBRARY REPORT PAGE TITLE SUBJECT PAGES GUIDE WORDS MATERIAL ARTICLE ARTICLES WORD FACTS AUTHOR REFERENCE NOTE
MAP NORTH EARTH SOUTH POLE MAPS EQUATOR WEST LINES EAST AUSTRALIA GLOBE POLES HEMISPHERE LATITUDE PLACES LAND WORLD COMPASS CONTINENTS
FOOD FOODS BODY NUTRIENTS DIET FAT SUGAR ENERGY MILK EATING FRUITS VEGETABLES WEIGHT FATS NEEDS CARBOHYDRATES VITAMINS CALORIES PROTEIN MINERALS
94
Syntactic classes
BE MAKE GET HAVE GO TAKE DO FIND USE SEE HELP KEEP GIVE LOOK COME WORK MOVE LIVE EAT BECOME
ONE SOME MANY TWO EACH ALL MOST ANY THREE THIS EVERY SEVERAL FOUR FIVE BOTH TEN SIX MUCH TWENTY EIGHT
HE YOU THEY I SHE WE IT PEOPLE EVERYONE OTHERS SCIENTISTS SOMEONE WHO NOBODY ONE SOMETHING ANYONE EVERYBODY SOME THEN
MORE SUCH LESS MUCH KNOWN JUST BETTER RATHER GREATER HIGHER LARGER LONGER FASTER EXACTLY SMALLER SOMETHING BIGGER FEWER LOWER ALMOST
ON AT INTO FROM WITH THROUGH OVER AROUND AGAINST ACROSS UPON TOWARD UNDER ALONG NEAR BEHIND OFF ABOVE DOWN BEFORE
THE HIS THEIR YOUR HER ITS MY OUR THIS THESE A AN THAT NEW THOSE EACH MR ANY MRS ALL
GOOD SMALL NEW IMPORTANT GREAT LITTLE LARGE BIG LONG HIGH DIFFERENT SPECIAL OLD STRONG YOUNG COMMON WHITE SINGLE CERTAIN
SAID ASKED THOUGHT TOLD SAYS MEANS CALLED CRIED SHOWS ANSWERED TELLS REPLIED SHOUTED EXPLAINED LAUGHED MEANT WROTE SHOWED BELIEVED WHISPERED
95
NIPS Semantics
IMAGE IMAGES OBJECT OBJECTS FEATURE RECOGNITION VIEWS PIXEL VISUAL
KERNEL SUPPORT VECTOR SVM KERNELS SPACE FUNCTION MACHINES SET
NETWORK NEURAL NETWORKS OUPUT INPUT TRAINING INPUTS WEIGHTS OUTPUTS
EXPERTS EXPERT GATING HME ARCHITECTURE MIXTURE LEARNING MIXTURES FUNCTION GATE
MEMBRANE SYNAPTIC CELL CURRENT DENDRITIC POTENTIAL NEURON CONDUCTANCE CHANNELS
DATA GAUSSIAN MIXTURE LIKELIHOOD POSTERIOR PRIOR DISTRIBUTION EM BAYESIAN PARAMETERS
STATE POLICY VALUE FUNCTION ACTION REINFORCEMENT LEARNING CLASSES OPTIMAL
NIPS Syntax
IN WITH FOR ON FROM AT USING INTO OVER WITHIN
I X T N - C F P
IS WAS HAS BECOMES DENOTES BEING REMAINS REPRESENTS EXISTS SEEMS
SEE SHOW NOTE CONSIDER ASSUME PRESENT NEED PROPOSE DESCRIBE SUGGEST
MODEL ALGORITHM SYSTEM CASE PROBLEM NETWORK METHOD APPROACH PAPER PROCESS
HOWEVER ALSO THEN THUS THEREFORE FIRST HERE NOW HENCE FINALLY
USED TRAINED OBTAINED DESCRIBED GIVEN FOUND PRESENTED DEFINED GENERATED SHOWN
96
Random sentence generation
LANGUAGE
[S] RESEARCHERS GIVE THE SPEECH
[S] THE SOUND FEEL NO LISTENERS
[S] WHICH WAS TO BE MEANING
[S] HER VOCABULARIES STOPPED WORDS
[S] HE EXPRESSLY WANTED THAT BETTER VOWEL
97
Nested Chinese Restaurant Process
98
Topic Hierarchies
  • In regular topic model, no relations between
    topics

[Figure: flat set of topics: topic 1, topic 2, topic 3]
  • Nested Chinese Restaurant Process
  • Blei, Griffiths, Jordan, & Tenenbaum (2004)
  • Learn hierarchical structure, as well as topics
    within structure

[Figure: tree of topics: topic 4, topic 5, topic 6, topic 7]
99
Example Psych Review Abstracts
THE OF AND TO IN A IS
A MODEL MEMORY FOR MODELS TASK INFORMATION RESULTS ACCOUNT
SELF SOCIAL PSYCHOLOGY RESEARCH RISK STRATEGIES INTERPERSONAL PERSONALITY SAMPLING
MOTION VISUAL SURFACE BINOCULAR RIVALRY CONTOUR DIRECTION CONTOURS SURFACES
DRUG FOOD BRAIN AROUSAL ACTIVATION AFFECTIVE HUNGER EXTINCTION PAIN
RESPONSE STIMULUS REINFORCEMENT RECOGNITION STIMULI RECALL CHOICE CONDITIONING
SPEECH READING WORDS MOVEMENT MOTOR VISUAL WORD SEMANTIC
ACTION SOCIAL SELF EXPERIENCE EMOTION GOALS EMOTIONAL THINKING
GROUP IQ INTELLIGENCE SOCIAL RATIONAL INDIVIDUAL GROUPS MEMBERS
SEX EMOTIONS GENDER EMOTION STRESS WOMEN HEALTH HANDEDNESS
REASONING ATTITUDE CONSISTENCY SITUATIONAL INFERENCE JUDGMENT PROBABILITIES STATISTICAL
IMAGE COLOR MONOCULAR LIGHTNESS GIBSON SUBMOVEMENT ORIENTATION HOLOGRAPHIC
CONDITIONIN STRESS EMOTIONAL BEHAVIORAL FEAR STIMULATION TOLERANCE RESPONSES
100
Generative Process
[Figure: the topic hierarchy from the previous slide (Psych Review abstracts), used to illustrate the generative process]
101
Collocation Topic Model
102
What about collocations?
  • Why are these words related?
  • PLAY - GROUND
  • DOW - JONES
  • BUMBLE - BEE
  • Suggests at least two routes for association
  • Semantic
  • Collocation
  • → Integrate collocations into topic model

103
Collocation Topic Model
If $x = 0$, sample a word from the topic.
If $x = 1$, sample a word from the distribution based on the
previous word.
[Figure: graphical model in which a TOPIC MIXTURE generates a TOPIC for each WORD, and a switch X between consecutive words selects the route]
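The two routes described on this slide can be sketched as follows. This is an illustrative sketch, assuming the switch x depends only on the previous word; the function names, the single toy topic, and all probabilities are hypothetical.

```python
import random

def draw(dist, rng):
    """Sample one key from a {key: probability} dictionary."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]

def generate(n_words, theta, topics, p_colloc, bigram, rng):
    """p_colloc[prev] = P(x = 1 | previous word); bigram[prev] = P(word | previous word)."""
    words, switches, prev = [], [], None
    for _ in range(n_words):
        x = 1 if prev in bigram and rng.random() < p_colloc.get(prev, 0.0) else 0
        if x == 1:
            word = draw(bigram[prev], rng)            # collocation route: depends on previous word
        else:
            word = draw(topics[draw(theta, rng)], rng)  # topic route
        words.append(word)
        switches.append(x)
        prev = word
    return words, switches

# Hypothetical parameters in which DOW is always followed by JONES
theta = {"finance": 1.0}
topics = {"finance": {"DOW": 0.4, "RISES": 0.3, "STOCK": 0.3}}
p_colloc = {"DOW": 1.0}
bigram = {"DOW": {"JONES": 1.0}}
words, switches = generate(12, theta, topics, p_colloc, bigram, random.Random(0))
print(list(zip(words, switches)))
```

Words generated through the collocation route are marked with x = 1, which is how pairs such as DOW JONES can be picked out as collocations.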
104
Collocation Topic Model
Example: DOW JONES RISES
JONES is more likely explained as a word following DOW than as a word
sampled from a topic.
Result: DOW_JONES recognized as collocation
[Figure: graphical model for DOW JONES RISES; a TOPIC MIXTURE generates the topics, with switch x = 1 for JONES and x = 0 for RISES]
105
Examples Topics from New York Times
Stock Market: WEEK DOW_JONES POINTS 10_YR_TREASURY_YIELD PERCENT CLOSE NASDAQ_COMPOSITE STANDARD_POOR CHANGE FRIDAY DOW_INDUSTRIALS GRAPH_TRACKS EXPECTED BILLION NASDAQ_COMPOSITE_INDEX EST_02 PHOTO_YESTERDAY YEN 10 500_STOCK_INDEX
Wall Street Firms: WALL_STREET ANALYSTS INVESTORS FIRM GOLDMAN_SACHS FIRMS INVESTMENT MERRILL_LYNCH COMPANIES SECURITIES RESEARCH STOCK BUSINESS ANALYST WALL_STREET_FIRMS SALOMON_SMITH_BARNEY CLIENTS INVESTMENT_BANKING INVESTMENT_BANKERS INVESTMENT_BANKS
Terrorism: SEPT_11 WAR SECURITY IRAQ TERRORISM NATION KILLED AFGHANISTAN ATTACKS OSAMA_BIN_LADEN AMERICAN ATTACK NEW_YORK_REGION NEW MILITARY NEW_YORK WORLD NATIONAL QAEDA TERRORIST_ATTACKS
Bankruptcy: BANKRUPTCY CREDITORS BANKRUPTCY_PROTECTION ASSETS COMPANY FILED BANKRUPTCY_FILING ENRON BANKRUPTCY_COURT KMART CHAPTER_11 FILING COOPER BILLIONS COMPANIES BANKRUPTCY_PROCEEDINGS DEBTS RESTRUCTURING CASE GROUP