Title: Topics in statistical language modeling
1. Topics in statistical language modeling
2.
- Mark Steyvers, UC Irvine
- Josh Tenenbaum, MIT
- Dave Blei, CMU
- Mike Jordan, UC Berkeley
3. Latent Dirichlet Allocation (LDA)
- Each document is a mixture of topics
- Each word is chosen from a single topic
- Introduced by Blei, Ng, and Jordan (2001), a reinterpretation of PLSI (Hofmann, 1999)
- The idea of probabilistic topics is widely used (e.g., Bigi et al., 1997; Iyer & Ostendorf, 1996; Ueda & Saito, 2003)
4. Latent Dirichlet Allocation (LDA)
- Each document is a mixture of topics, drawn from parameters θ(d)
- Each word is chosen from a single topic, drawn from parameters φ(z)
5. Latent Dirichlet Allocation (LDA)
- topic 1, P(w | z = 1) = φ(1): HEART 0.2, LOVE 0.2, SOUL 0.2, TEARS 0.2, JOY 0.2; SCIENTIFIC 0.0, KNOWLEDGE 0.0, WORK 0.0, RESEARCH 0.0, MATHEMATICS 0.0
- topic 2, P(w | z = 2) = φ(2): HEART 0.0, LOVE 0.0, SOUL 0.0, TEARS 0.0, JOY 0.0; SCIENTIFIC 0.2, KNOWLEDGE 0.2, WORK 0.2, RESEARCH 0.2, MATHEMATICS 0.2
6. Choose mixture weights for each document, generate bag of words
- θ = (P(z = 1), P(z = 2)) ∈ {(0, 1), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), (1, 0)}
- Example bags of words generated under these mixtures:
MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS
RESEARCH WORK SCIENTIFIC MATHEMATICS WORK
SCIENTIFIC KNOWLEDGE MATHEMATICS SCIENTIFIC
HEART LOVE TEARS KNOWLEDGE HEART
MATHEMATICS HEART RESEARCH LOVE MATHEMATICS WORK
TEARS SOUL KNOWLEDGE HEART
WORK JOY SOUL TEARS MATHEMATICS TEARS LOVE LOVE
LOVE SOUL
TEARS LOVE JOY SOUL LOVE TEARS SOUL SOUL TEARS JOY
7. Generating a document
- 1. Choose θ(d) ~ Dirichlet(α)
- 2. For each word in the document:
  - choose z ~ Multinomial(θ(d))
  - choose w ~ Multinomial(φ(z))
(graphical model: θ(d) at the top, with an arrow to each topic assignment z, and each z pointing to its word w; a code sketch follows)
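A minimal sketch of this generative process in Python/NumPy, using the two hand-built topics from the earlier example (the vocabulary, hyperparameter value, and document length are illustrative, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["HEART", "LOVE", "SOUL", "TEARS", "JOY",
         "SCIENTIFIC", "KNOWLEDGE", "WORK", "RESEARCH", "MATHEMATICS"]
T, W = 2, len(vocab)           # number of topics, vocabulary size
alpha = 1.0                    # symmetric Dirichlet hyperparameter (illustrative)

# phi(z): topic-word distributions, here the two hand-built example topics
phi = np.array([[0.2] * 5 + [0.0] * 5,    # topic 1: "love" words
                [0.0] * 5 + [0.2] * 5])   # topic 2: "science" words

def generate_document(n_words):
    theta = rng.dirichlet(alpha * np.ones(T))   # 1. theta_d ~ Dirichlet(alpha)
    words = []
    for _ in range(n_words):
        z = rng.choice(T, p=theta)              # 2a. z ~ Multinomial(theta_d)
        w = rng.choice(W, p=phi[z])             # 2b. w ~ Multinomial(phi_z)
        words.append(vocab[w])
    return theta, words

theta, doc = generate_document(10)
print(np.round(theta, 2), doc)
```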
8. Inverting the generative model
- The generative model gives a procedure for obtaining a corpus from topics and mixing proportions
- Inverting the model extracts the topics φ and mixing proportions θ from a corpus
- Goal: describe the content of documents, and be able to identify the content of new documents
- All inference is completely unsupervised, with a fixed number of topics T, words W, documents D
9. Inverting the generative model
- Maximum likelihood estimation (EM)
  - e.g., Hofmann (1999)
  - slow, local maxima
- Approximate E-steps
  - VB: Blei, Ng & Jordan (2001)
  - EP: Minka & Lafferty (2002)
- Bayesian inference (via Gibbs sampling)
10. Gibbs sampling in LDA
- The numerator rewards sparsity in the assignment of words to topics and of topics to documents
- The sum in the denominator is over T^n terms
- The full posterior is tractable only up to a constant (see the expression below), so use Markov chain Monte Carlo (MCMC)
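The posterior this slide refers to appears only as an image in the transcript; it is the usual Bayes-rule expression, whose normalizing sum is what makes exact inference intractable:

$$P(\mathbf{z} \mid \mathbf{w}) \;=\; \frac{P(\mathbf{w} \mid \mathbf{z})\,P(\mathbf{z})}{\sum_{\mathbf{z}'} P(\mathbf{w} \mid \mathbf{z}')\,P(\mathbf{z}')}$$

where the sum in the denominator ranges over all T^n possible assignments of the n word tokens to topics.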
11. Markov chain Monte Carlo
- Sample from a Markov chain constructed to converge to the target distribution
- Allows sampling from an unnormalized posterior, and from other complex distributions
- Can compute approximate statistics from intractable distributions
- Gibbs sampling is one such method: construct the Markov chain from conditional distributions
12. Gibbs sampling in LDA
- Need the full conditional distributions for the variables
- Since we only sample z, we need two counts (used in the update below):
  - the number of times word w is assigned to topic j
  - the number of times topic j is used in document d
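The update rule itself is an image in the original slides; the standard collapsed Gibbs update for LDA (in the form given by Griffiths & Steyvers) is

$$P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\; \frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + W\beta} \cdot \frac{n^{(d_i)}_{-i,j} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha}$$

where $n^{(w_i)}_{-i,j}$ is the number of times word $w_i$ is assigned to topic $j$ and $n^{(d_i)}_{-i,j}$ is the number of times topic $j$ is used in document $d_i$, both counted with the current token removed.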
13-21. Gibbs sampling in LDA
(figures: the state of the topic assignments z over successive iterations 1, 2, ..., 1000; a code sketch of one sweep follows)
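A compact sketch of one sweep of this sampler, assuming hypothetical data structures (docs as lists of word ids, z as the current assignments, and the three count arrays from the update above):

```python
import numpy as np

def gibbs_sweep(docs, z, n_wt, n_dt, n_t, alpha, beta, rng):
    """One full sweep of collapsed Gibbs sampling for LDA.

    docs : list of lists of word ids
    z    : list of lists of current topic assignments (same shape as docs)
    n_wt : W x T counts of word w assigned to topic t
    n_dt : D x T counts of topic t used in document d
    n_t  : length-T totals of words assigned to each topic
    """
    W, T = n_wt.shape
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t_old = z[d][i]
            # remove the current token from all counts
            n_wt[w, t_old] -= 1; n_dt[d, t_old] -= 1; n_t[t_old] -= 1
            # full conditional over topics (unnormalized); the document-length
            # denominator is constant across topics and can be dropped
            p = (n_wt[w] + beta) / (n_t + W * beta) * (n_dt[d] + alpha)
            t_new = rng.choice(T, p=p / p.sum())
            # add the token back under its new assignment
            n_wt[w, t_new] += 1; n_dt[d, t_new] += 1; n_t[t_new] += 1
            z[d][i] = t_new
    return z

# usage sketch (toy data): 2 documents over a 3-word vocabulary, T = 2 topics
rng = np.random.default_rng(0)
docs = [[0, 1, 1, 2], [2, 2, 0]]
T, W, D = 2, 3, len(docs)
z = [[int(rng.integers(T)) for _ in doc] for doc in docs]
n_wt = np.zeros((W, T)); n_dt = np.zeros((D, T)); n_t = np.zeros(T)
for d, doc in enumerate(docs):
    for w, t in zip(doc, z[d]):
        n_wt[w, t] += 1; n_dt[d, t] += 1; n_t[t] += 1
for _ in range(1000):
    gibbs_sweep(docs, z, n_wt, n_dt, n_t, alpha=1.0, beta=0.1, rng=rng)
```

Running many such sweeps (the slides step through iterations 1, 2, ..., 1000) yields samples of z from the posterior.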
22. Estimating topic distributions
- Parameter estimates from the posterior predictive distributions (given below)
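The estimates themselves are shown only as images in the transcript; in the form used by Griffiths & Steyvers they are the posterior predictive means

$$\hat{\phi}^{(w)}_j = \frac{n^{(w)}_j + \beta}{n^{(\cdot)}_j + W\beta}, \qquad \hat{\theta}^{(d)}_j = \frac{n^{(d)}_j + \alpha}{n^{(d)}_{\cdot} + T\alpha}$$

computed from the counts in a single sample of z.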
23. A visual example: bars
- Sample each pixel from a mixture of topics
- pixel = word, image = document
26. Strategy
- Markov chain Monte Carlo (MCMC) is normally slow, so why consider using it?
- In discrete models, use conjugate priors to reduce inference to discrete variables
- Several benefits:
  - save memory: need only track sparse counts
  - save time: cheap updates, even with complex dependencies between variables
27. Perplexity vs. time
(not estimating the Dirichlet hyperparameters α, β)
28. Strategy
- Markov chain Monte Carlo (MCMC) is normally slow, so why consider using it?
- In discrete models, use conjugate priors to reduce inference to discrete variables
- Several benefits:
  - save memory: need only track sparse counts
  - save time: cheap updates, even with complex dependencies between variables
- These properties let us explore larger, more complex models
29. Application to corpus data
- TASA corpus: text from first grade to college
- 26,414 word types, over 37,000 documents, approximately 6 million word tokens
- Run Gibbs sampling for models with T = 300, 500, ..., 1700 topics
30. A selection from 500 topics, P(w | z = j)
- BRAIN NERVE SENSE SENSES ARE NERVOUS NERVES BODY SMELL TASTE TOUCH MESSAGES IMPULSES CORD ORGANS SPINAL FIBERS SENSORY PAIN IS
- CURRENT ELECTRICITY ELECTRIC CIRCUIT IS ELECTRICAL VOLTAGE FLOW BATTERY WIRE WIRES SWITCH CONNECTED ELECTRONS RESISTANCE POWER CONDUCTORS CIRCUITS TUBE NEGATIVE
- ART PAINT ARTIST PAINTING PAINTED ARTISTS MUSEUM WORK PAINTINGS STYLE PICTURES WORKS OWN SCULPTURE PAINTER ARTS BEAUTIFUL DESIGNS PORTRAIT PAINTERS
- STUDENTS TEACHER STUDENT TEACHERS TEACHING CLASS CLASSROOM SCHOOL LEARNING PUPILS CONTENT INSTRUCTION TAUGHT GROUP GRADE SHOULD GRADES CLASSES PUPIL GIVEN
- SPACE EARTH MOON PLANET ROCKET MARS ORBIT ASTRONAUTS FIRST SPACECRAFT JUPITER SATELLITE SATELLITES ATMOSPHERE SPACESHIP SURFACE SCIENTISTS ASTRONAUT SATURN MILES
- THEORY SCIENTISTS EXPERIMENT OBSERVATIONS SCIENTIFIC EXPERIMENTS HYPOTHESIS EXPLAIN SCIENTIST OBSERVED EXPLANATION BASED OBSERVATION IDEA EVIDENCE THEORIES BELIEVED DISCOVERED OBSERVE FACTS
31. A selection from 500 topics, P(w | z = j)
- FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED
- STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL
- JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE
- MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE
- SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES
- BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY
33. Evaluation: word association
- Cue: PLANET
(Nelson, McEvoy & Schreiber, 1998)
34. Evaluation: word association
- Cue: PLANET
- Associates: EARTH PLUTO JUPITER NEPTUNE VENUS URANUS SATURN COMET MARS ASTEROID
(Nelson, McEvoy & Schreiber, 1998)
35. Evaluation: word association
(figure: the cues-by-associates association matrix)
36. Evaluation: word association
- Comparison with Latent Semantic Analysis (LSA; Landauer & Dumais, 1997)
- Both algorithms applied to the TASA corpus (D > 30,000, W > 20,000, n > 6,000,000)
- Compare the LSA cosine and inner product with the conditional probability under the topic model (written out below)
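The topic-based association measure is not written out in the transcript; a standard way to express the conditional probability of an associate w2 given a cue w1 under the topic model is to sum over topics:

$$P(w_2 \mid w_1) \;=\; \sum_{j=1}^{T} P(w_2 \mid z = j)\, P(z = j \mid w_1)$$

Unlike the LSA cosine, this measure is naturally asymmetric in w1 and w2.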
37. Latent Semantic Analysis (Landauer & Dumais, 1997)
- A word-document co-occurrence matrix is mapped into a high-dimensional space via the SVD: X = U D V^T
38. Latent Semantic Analysis (Landauer & Dumais, 1997)
- (diagram: the words-by-documents matrix C factored as C = U D V^T, giving each word a vector in the reduced number of dimensions)
- Dimensionality reduction makes storage efficient and extracts correlation
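A minimal sketch of the LSA pipeline in Python (toy word-by-document count matrix; Landauer & Dumais additionally apply a log-entropy weighting, omitted here):

```python
import numpy as np

def lsa(counts, k):
    """counts: W x D word-by-document matrix; k: number of retained dimensions."""
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    word_vectors = U[:, :k] * s[:k]        # rows: k-dimensional word vectors
    return word_vectors

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# toy example: 5 words x 4 documents
X = np.array([[2, 0, 1, 0],
              [1, 0, 2, 0],
              [0, 3, 0, 1],
              [0, 1, 0, 2],
              [1, 1, 1, 1]], dtype=float)
vecs = lsa(X, k=2)
print(cosine(vecs[0], vecs[1]))    # similarity of word 0 and word 1
```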
39. Properties of word association
- Asymmetry
- Violation of the triangle inequality
- Small world graph
40. Small world graph
- Treat the association matrix (cues by associates) as an adjacency matrix (edges indicate positive association)
41. Small world graph
- Properties:
  - short path lengths
  - clustering
  - power law degree distribution
- Small world graphs arise elsewhere:
  - social relations, biology, the internet
43-44. What is a power law distribution?
- Exponential: height
- Power law: wealth
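For reference, the standard forms being contrasted (these definitions are not written out on the slide):

$$P(k) \propto e^{-k/\kappa} \;\; \text{(exponential, e.g. heights)}, \qquad P(k) \propto k^{-\gamma} \;\; \text{(power law, e.g. wealth)}$$

A power law appears as a straight line with slope $-\gamma$ on log-log axes, which is how it is identified in the plots that follow.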
45. A power law in word association
- (plot of the word association data: k = number of cues)
- Cue: PLANET
- Associates: EARTH, PLUTO, JUPITER, NEPTUNE
(Steyvers & Tenenbaum)
46. The statistics of meaning
- Zipf's law of meaning: number of senses
- Roget's Thesaurus: number of classes
- (plot: k = number of classes)
(Steyvers & Tenenbaum)
47-49. Meanings and associations
- Word association involves words: a unipartite graph
- Meaning involves words and contexts: a bipartite graph
50. Meanings and associations
- (diagram: a bipartite graph linking the words MATHEMATICS, RESEARCH, MYSTERY, JOY, and LOVE to CONTEXT 1 and CONTEXT 2, alongside the corresponding unipartite word graph)
- A power law in the bipartite graph implies the same in the unipartite graph
- Can get the word association power law from meanings
51. Power law in word association
- (plot of the word association data for words in semantic spaces: k = number of associations)
(Steyvers & Tenenbaum)
53. Power law in word association
- (plots: the word association data compared with Latent Semantic Analysis; k = number of associations)
(Steyvers & Tenenbaum)
55. Probability of containing the first associate
- (plot against rank)
56. Meanings and associations
- (plots: the topic model's P(w2 | w1) against k = number of cues, and P(w | z = j) against k = number of topics)
57. Problems
- Finding the right number of topics
- No dependencies between topics
- The bag of words assumption
- Need for a stop list
58. Problems
- Finding the right number of topics
- No dependencies between topics
- The bag of words assumption
- Need for a stop list
CRP models (Blei, Jordan, Tenenbaum)
HMM syntax (Steyvers, Blei & Tenenbaum)
60. Standard LDA
(diagram: corpus topics 1 ... T; doc1, doc2, doc3; all T topics are in each document)
61.
(diagram: corpus topics 1 ... T; doc1, doc2, doc3; only L topics are in each document)
62.
(diagram: corpus topics 1 ... T, with topic identities indexed by c; doc1, doc2, doc3; only L topics are in each document)
63. Richer dependencies
- The nature of the topic dependencies comes from the prior on assignments to documents, p(c)
- Inference with Gibbs sampling is straightforward
- Boring prior: pick L of the T topics uniformly
- Some interesting priors on assignments:
  - Chinese restaurant process (CRP)
  - nested CRP (for hierarchies)
64. Chinese restaurant process
- The mth customer at an infinitely large Chinese restaurant chooses a table with the probabilities given below
- Also: Dirichlet process, infinite models (Beal, Ghahramani, Neal, Rasmussen)
- Prior on assignments: one topic on each table, L visits per document, T is unbounded
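The choice probabilities appear only as an image in the transcript; the standard CRP form is

$$P(\text{occupied table } k) = \frac{m_k}{\gamma + m - 1}, \qquad P(\text{new table}) = \frac{\gamma}{\gamma + m - 1}$$

where $m_k$ is the number of previous customers at table $k$ and $\gamma$ is the concentration parameter.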
65. Generating a document
- 1. Choose c by sampling L tables from the Chinese restaurant, without replacement
- 2. Choose θ(d) ~ Dirichlet(α) (over L slots)
- 3. For each word in the document:
  - choose z ~ Multinomial(θ(d))
  - choose w ~ Multinomial(φ(c(z)))
66. Inverting the generative model
- Draw z as before, but conditioned on c
- Draw c one table at a time from its conditional distribution (see below)
- Need only track occupied tables
- Recover topics and the number of occupied tables
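The conditional distribution for c is also an image in the transcript; its general shape is a prior term (the CRP) times a likelihood term, roughly

$$P(c \mid \mathbf{c}_{-}, \mathbf{w}, \mathbf{z}) \;\propto\; P(c \mid \mathbf{c}_{-})\; P(\mathbf{w} \mid c, \mathbf{c}_{-}, \mathbf{w}_{-}, \mathbf{z})$$

where the second factor is the probability of the words currently assigned (via z) to that slot under the topic sitting at the candidate table.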
67. Model selection with the CRP
- (plot: Bayes factor under the Chinese restaurant process prior)
68. Nested CRP
- Infinitely many infinite-table restaurants
- Every table has a card for another restaurant, forming an infinitely-branching tree
- An L-day vacation: visit the root restaurant the first night, go to the restaurant on the card the next night, and so on
- Once inside a restaurant, choose the table (and hence the next restaurant) via the standard CRP
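A sketch of drawing one L-level path through the nested CRP; the Node structure and the concentration parameter gamma are assumptions for illustration:

```python
import numpy as np

class Node:
    def __init__(self):
        self.count = 0          # customers that have passed through this node
        self.children = {}      # child index -> Node

def sample_path(root, L, gamma, rng):
    """Draw an L-level path through the nested CRP rooted at `root`."""
    path, node = [root], root
    node.count += 1
    for _ in range(L - 1):
        children = list(node.children.items())
        # CRP over this node's children: existing children in proportion to
        # their counts, a brand-new child with weight gamma
        counts = np.array([c.count for _, c in children] + [gamma], dtype=float)
        choice = rng.choice(len(counts), p=counts / counts.sum())
        if choice == len(children):              # sit at a new restaurant
            new_key = max(node.children, default=-1) + 1
            child = node.children[new_key] = Node()
        else:
            child = children[choice][1]
        child.count += 1
        path.append(child)
        node = child
    return path

rng = np.random.default_rng(0)
root = Node()
paths = [sample_path(root, L=3, gamma=1.0, rng=rng) for _ in range(5)]
```

Collecting such paths across documents traces out the finite subtree of used topics described on the next slide.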
69. The nested CRP as a prior
- One topic per restaurant; each document has one topic at each of the L levels of a tree
- Each c is a path through the tree
- Collecting these paths from all documents gives a finite subtree of used topics
- Allows unsupervised learning of hierarchies
- Extends Hofmann's (1999) topic hierarchies
70. Generating a document
- 1. Choose c by sampling a path from the nested Chinese restaurant process
- 2. Choose θ(d) ~ Dirichlet(α) (over L slots)
- 3. For each word in the document:
  - choose z ~ Multinomial(θ(d))
  - choose w ~ Multinomial(φ(c(z)))
71. Inverting the generative model
- Draw z as before, but conditioned on c
- Draw c as a block from its conditional distribution
- Need only track previously taken paths
- Recover topics and the set of paths (a finite subtree)
72. Twelve years of NIPS
73. Summary
- Letting document topics be a subset of corpus topics allows richer dependencies
- Using Gibbs sampling makes it possible to have an unbounded number of corpus topics
- The flat model and hierarchies are only two options of many: factorial models, arbitrary graphs, etc.
74. Problems
- Finding the right number of topics
- No dependencies between topics
- The bag of words assumption
- Need for a stop list
CRP models (Blei, Jordan, Tenenbaum)
HMM syntax (Steyvers, Tenenbaum)
75. Syntax and semantics from statistics
- A factorization of language based on statistical dependency patterns:
  - long-range, document-specific dependencies → semantics (probabilistic topics)
  - short-range dependencies constant across all documents → syntax (probabilistic regular grammar)
(graphical model: θ(d) → z → w for the topic component, with a chain of syntactic classes x determining which component emits each word w)
76.
(diagram: a probabilistic regular grammar over syntactic classes)
- class x = 1 (semantics) emits from the topic mixture z = 1 (0.4), z = 2 (0.6):
  - z = 1: HEART 0.2, LOVE 0.2, SOUL 0.2, TEARS 0.2, JOY 0.2
  - z = 2: SCIENTIFIC 0.2, KNOWLEDGE 0.2, WORK 0.2, RESEARCH 0.2, MATHEMATICS 0.2
- class x = 2: OF 0.6, FOR 0.3, BETWEEN 0.1
- class x = 3: THE 0.6, A 0.3, MANY 0.1
- transition weights shown on the arrows of the diagram: 0.8, 0.7, 0.1, 0.3, 0.2, 0.9
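A sketch of generating from this composite model using the emission distributions on the slide; the class-transition matrix below is hypothetical, since the slide gives the transition weights only as unlabeled arrows in the diagram:

```python
import numpy as np

rng = np.random.default_rng(0)

# Emission distributions for the three classes on the slide
topics = {1: {"HEART": 0.2, "LOVE": 0.2, "SOUL": 0.2, "TEARS": 0.2, "JOY": 0.2},
          2: {"SCIENTIFIC": 0.2, "KNOWLEDGE": 0.2, "WORK": 0.2,
              "RESEARCH": 0.2, "MATHEMATICS": 0.2}}
theta = {1: 0.4, 2: 0.6}                          # topic weights as read from the slide
class2 = {"OF": 0.6, "FOR": 0.3, "BETWEEN": 0.1}  # x = 2: prepositions
class3 = {"THE": 0.6, "A": 0.3, "MANY": 0.1}      # x = 3: determiners

# Hypothetical class-transition matrix (rows: current x, cols: next x)
trans = np.array([[0.1, 0.3, 0.6],     # from x = 1 (semantics)
                  [0.8, 0.1, 0.1],     # from x = 2
                  [0.9, 0.05, 0.05]])  # from x = 3

def draw(dist):
    keys = list(dist)
    return keys[rng.choice(len(keys), p=list(dist.values()))]

def generate(n_words, x=3):
    words = []
    for _ in range(n_words):
        if x == 1:                                  # semantic class: emit via a topic
            z = 1 if rng.random() < theta[1] else 2
            words.append(draw(topics[z]))
        elif x == 2:
            words.append(draw(class2))
        else:
            words.append(draw(class3))
        x = rng.choice(3, p=trans[x - 1]) + 1       # move to the next syntactic class
    return words

print(generate(4))   # e.g. something like ['THE', 'LOVE', 'OF', 'RESEARCH']
```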
77-80.
(the same diagram, stepping through the generation of a sentence one word at a time: THE, THE LOVE, THE LOVE OF, THE LOVE OF RESEARCH)
81. Inverting the generative model
- Sample z conditioned on x and the other z
  - draw from the prior if x > 1
- Sample x conditioned on z and the other x
- Inference allows estimation of:
  - semantic topics
  - syntactic classes
82. Semantic topics
- PLANTS PLANT LEAVES SEEDS SOIL ROOTS FLOWERS WATER FOOD GREEN SEED STEMS FLOWER STEM LEAF ANIMALS ROOT POLLEN GROWING GROW
- GOLD IRON SILVER COPPER METAL METALS STEEL CLAY LEAD ADAM ORE ALUMINUM MINERAL MINE STONE MINERALS POT MINING MINERS TIN
- BEHAVIOR SELF INDIVIDUAL PERSONALITY RESPONSE SOCIAL EMOTIONAL LEARNING FEELINGS PSYCHOLOGISTS INDIVIDUALS PSYCHOLOGICAL EXPERIENCES ENVIRONMENT HUMAN RESPONSES BEHAVIORS ATTITUDES PSYCHOLOGY PERSON
- CELLS CELL ORGANISMS ALGAE BACTERIA MICROSCOPE MEMBRANE ORGANISM FOOD LIVING FUNGI MOLD MATERIALS NUCLEUS CELLED STRUCTURES MATERIAL STRUCTURE GREEN MOLDS
- DOCTOR PATIENT HEALTH HOSPITAL MEDICAL CARE PATIENTS NURSE DOCTORS MEDICINE NURSING TREATMENT NURSES PHYSICIAN HOSPITALS DR SICK ASSISTANT EMERGENCY PRACTICE
- BOOK BOOKS READING INFORMATION LIBRARY REPORT PAGE TITLE SUBJECT PAGES GUIDE WORDS MATERIAL ARTICLE ARTICLES WORD FACTS AUTHOR REFERENCE NOTE
- MAP NORTH EARTH SOUTH POLE MAPS EQUATOR WEST LINES EAST AUSTRALIA GLOBE POLES HEMISPHERE LATITUDE PLACES LAND WORLD COMPASS CONTINENTS
- FOOD FOODS BODY NUTRIENTS DIET FAT SUGAR ENERGY MILK EATING FRUITS VEGETABLES WEIGHT FATS NEEDS CARBOHYDRATES VITAMINS CALORIES PROTEIN MINERALS
83. Syntactic classes
- BE MAKE GET HAVE GO TAKE DO FIND USE SEE HELP KEEP GIVE LOOK COME WORK MOVE LIVE EAT BECOME
- ONE SOME MANY TWO EACH ALL MOST ANY THREE THIS EVERY SEVERAL FOUR FIVE BOTH TEN SIX MUCH TWENTY EIGHT
- HE YOU THEY I SHE WE IT PEOPLE EVERYONE OTHERS SCIENTISTS SOMEONE WHO NOBODY ONE SOMETHING ANYONE EVERYBODY SOME THEN
- MORE SUCH LESS MUCH KNOWN JUST BETTER RATHER GREATER HIGHER LARGER LONGER FASTER EXACTLY SMALLER SOMETHING BIGGER FEWER LOWER ALMOST
- ON AT INTO FROM WITH THROUGH OVER AROUND AGAINST ACROSS UPON TOWARD UNDER ALONG NEAR BEHIND OFF ABOVE DOWN BEFORE
- THE HIS THEIR YOUR HER ITS MY OUR THIS THESE A AN THAT NEW THOSE EACH MR ANY MRS ALL
- GOOD SMALL NEW IMPORTANT GREAT LITTLE LARGE BIG LONG HIGH DIFFERENT SPECIAL OLD STRONG YOUNG COMMON WHITE SINGLE CERTAIN
- SAID ASKED THOUGHT TOLD SAYS MEANS CALLED CRIED SHOWS ANSWERED TELLS REPLIED SHOUTED EXPLAINED LAUGHED MEANT WROTE SHOWED BELIEVED WHISPERED
84. Bayes factors for different models
- Part-of-speech tagging
85. NIPS Semantics
- IMAGE IMAGES OBJECT OBJECTS FEATURE RECOGNITION VIEWS PIXEL VISUAL
- KERNEL SUPPORT VECTOR SVM KERNELS SPACE FUNCTION MACHINES SET
- NETWORK NEURAL NETWORKS OUTPUT INPUT TRAINING INPUTS WEIGHTS OUTPUTS
- EXPERTS EXPERT GATING HME ARCHITECTURE MIXTURE LEARNING MIXTURES FUNCTION GATE
- MEMBRANE SYNAPTIC CELL CURRENT DENDRITIC POTENTIAL NEURON CONDUCTANCE CHANNELS
- DATA GAUSSIAN MIXTURE LIKELIHOOD POSTERIOR PRIOR DISTRIBUTION EM BAYESIAN PARAMETERS
- STATE POLICY VALUE FUNCTION ACTION REINFORCEMENT LEARNING CLASSES OPTIMAL
NIPS Syntax
- IN WITH FOR ON FROM AT USING INTO OVER WITHIN
- I X T N - C F P
- IS WAS HAS BECOMES DENOTES BEING REMAINS REPRESENTS EXISTS SEEMS
- SEE SHOW NOTE CONSIDER ASSUME PRESENT NEED PROPOSE DESCRIBE SUGGEST
- MODEL ALGORITHM SYSTEM CASE PROBLEM NETWORK METHOD APPROACH PAPER PROCESS
- HOWEVER ALSO THEN THUS THEREFORE FIRST HERE NOW HENCE FINALLY
- USED TRAINED OBTAINED DESCRIBED GIVEN FOUND PRESENTED DEFINED GENERATED SHOWN
86. Function and content words
87. Highlighting and templating
88. Open questions
- Are MCMC methods useful elsewhere?
  - smoothing with negative weights
  - Markov chains on grammars
- Other nonparametric language models?
  - infinite HMM, infinite PCFG, clustering
- Better ways of combining topics and syntax?
  - richer syntactic models
  - better combination schemes