Title: Language Independent Methods of Clustering Similar Contexts (with applications)
1. Language Independent Methods of Clustering Similar Contexts (with applications)
- Ted Pedersen
- University of Minnesota, Duluth
- tpederse_at_d.umn.edu
- http://www.d.umn.edu/~tpederse/SCTutorial.html
2. Language Independent Methods
- Do not utilize syntactic information
- No parsers, part of speech taggers, etc. required
- Do not utilize dictionaries or other manually created lexical resources
- Based on lexical features selected from corpora
- Assumption: word segmentation can be done by looking for white spaces between strings
- No manually annotated data of any kind; methods are completely unsupervised in the strictest sense
3. Clustering Similar Contexts
- A context is a short unit of text
- Often a phrase to a paragraph in length, although it can be longer
- Input: N contexts
- Output: K clusters
- Each member of a cluster is a context that is more similar to the other members of its cluster than to the contexts found in other clusters
4. Applications
- Headed contexts (contain target word)
- Name Discrimination
- Word Sense Discrimination
- Headless contexts
- Email Organization
- Document Clustering
- Paraphrase identification
- Clustering Sets of Related Words
5. Tutorial Outline
- Identifying lexical features
- Measures of association and tests of significance
- Context representations
- First and second order
- Dimensionality reduction
- Singular Value Decomposition
- Clustering
- Partitional techniques
- Cluster stopping
- Cluster labeling
- Evaluation
6. SenseClusters
- A package for clustering contexts
- http://senseclusters.sourceforge.net
- SenseClusters Live! (Knoppix CD)
- Integrates with various other tools
- Ngram Statistics Package
- CLUTO
- SVDPACKC
7. Many thanks
- Amruta Purandare (M.S., 2004)
- Founding developer of SenseClusters (2002-2004)
- Now a PhD student in Intelligent Systems at the University of Pittsburgh: http://www.cs.pitt.edu/~amruta/
- Anagha Kulkarni (M.S., 2006, expected)
- Enhancing SenseClusters since Fall 2004!
- Will start as a PhD student at CMU/LTI in Fall 2006: http://www.d.umn.edu/~kulka020/
- NSF, for supporting Amruta, Anagha, and Ted via CAREER award 0092784
8. Background and Motivations
9. Headed and Headless Contexts
- A headed context includes a target word
- Our goal is to cluster the target words based on their surrounding contexts
- The target word is the center of the context and of our attention
- A headless context has no target word
- Our goal is to cluster the contexts based on their similarity to each other
- The focus is on the context as a whole
10. Headed Contexts (input)
- I can hear the ocean in that shell.
- My operating system shell is bash.
- The shells on the shore are lovely.
- The shell command line is flexible.
- The oyster shell is very hard and black.
11. Headed Contexts (output)
- Cluster 1
- My operating system shell is bash.
- The shell command line is flexible.
- Cluster 2
- The shells on the shore are lovely.
- The oyster shell is very hard and black.
- I can hear the ocean in that shell.
12. Headless Contexts (input)
- The new version of Linux is more stable and has better support for cameras.
- My Chevy Malibu has had some front end troubles.
- Osborne made one of the first personal computers.
- The brakes went out, and the car flew into the house.
- With the price of gasoline, I think I'll be taking the bus more often!
13. Headless Contexts (output)
- Cluster 1
- The new version of Linux is more stable and has better support for cameras.
- Osborne made one of the first personal computers.
- Cluster 2
- My Chevy Malibu has had some front end troubles.
- The brakes went out, and the car flew into the house.
- With the price of gasoline, I think I'll be taking the bus more often!
14. Web Search as Application
- Web search results are headed contexts
- The search term is the target word (found in snippets)
- Web search results are often disorganized: two people sharing the same name, two organizations sharing the same abbreviation, etc., often have their pages mixed up
- If you click on search results or follow links in the pages found, you will encounter headless contexts too
15. Email Foldering as Application
- Email (public or private) is made up of headless contexts
- Short, usually focused
- Cluster similar email messages together
- Automatic email foldering
- Take all messages from sent-mail file or inbox
and organize into categories
16. Clustering News as Application
- News articles are headless contexts
- Entire article or first paragraph
- Short, usually focused
- Cluster similar articles together
17. What is it to be similar?
- "You shall know a word by the company it keeps"
- Firth, 1957 (Studies in Linguistic Analysis)
- Meanings of words are (largely) determined by their distributional patterns (Distributional Hypothesis)
- Harris, 1968 (Mathematical Structures of Language)
- Words that occur in similar contexts will have similar meanings (Strong Contextual Hypothesis)
- Miller and Charles, 1991 (Language and Cognitive Processes)
- Various extensions
- Similar contexts will have similar meanings, etc.
- Names that occur in similar contexts will refer to the same underlying person, etc.
18. General Methodology
- Represent contexts to be clustered using first or second order feature vectors
- Lexical features
- Reduce dimensionality to make vectors more tractable and/or understandable
- Singular value decomposition
- Cluster the context vectors
- Find the number of clusters
- Label the clusters
- Evaluate and/or use the contexts!
19. Identifying Lexical Features
- Measures of association and tests of significance
20. What are features?
- Features represent the (hopefully) salient characteristics of the contexts to be clustered
- Eventually we will represent each context as a vector, where the dimensions of the vector are associated with features
- Vectors/contexts that include many of the same features will be similar to each other
21. Where do features come from?
- In unsupervised clustering, it is common for the feature selection data to be the same data that is to be clustered
- This is not cheating, since the data to be clustered does not have any labeled classes that can be used to assist feature selection
- It may also be necessary, since we may need to cluster all available data, and not hold out some for a separate feature identification step
- Email or news articles
22. Feature Selection
- Test data: the contexts to be clustered
- Assume that the feature selection data is the same as the test data, unless otherwise indicated
- Training data: a separate corpus of held out feature selection data (that will not be clustered)
- May be needed if you have a small number of contexts to cluster (e.g., web search results)
- This sense of "training" is due to Schütze (1998)
23. Lexical Features
- Unigram: a single word that occurs more than a given number of times
- Bigram: an ordered pair of words that occur together more often than expected by chance
- Consecutive, or may have intervening words
- Co-occurrence: an unordered bigram
- Target Co-occurrence: a co-occurrence where one of the words is the target word (a small extraction sketch follows below)
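To make these feature types concrete, here is a minimal sketch in Python. The helper names are hypothetical (this is not code from NSP or SenseClusters), and it assumes whitespace tokenization and a simple frequency cutoff:

    from collections import Counter

    def extract_features(contexts, window=2, min_freq=2):
        """Count unigrams, ordered bigrams (within a window), and unordered
        co-occurrences over whitespace-segmented contexts."""
        unigrams, bigrams, cooc = Counter(), Counter(), Counter()
        for context in contexts:
            tokens = context.lower().split()        # whitespace segmentation
            unigrams.update(tokens)
            for i, w1 in enumerate(tokens):
                # pairs within the window: consecutive words when window=2,
                # up to window-1 intervening positions otherwise
                for w2 in tokens[i + 1:i + window]:
                    bigrams[(w1, w2)] += 1          # ordered pair
                    cooc[frozenset((w1, w2))] += 1  # unordered pair
        keep = lambda c: {f: n for f, n in c.items() if n >= min_freq}
        return keep(unigrams), keep(bigrams), keep(cooc)

A larger window (e.g., 7-10) with the same loop yields the looser "co-occurrence" features described on the next slides.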
24. Bigrams
- fine wine (window size of 2)
- baseball bat
- house of representatives (window size of 3)
- president of the republic (window size of 4)
- apple orchard
- Selected using a small window size (2-4 words),
trying to capture a regular (localized) pattern
between two words (collocation?)
25. Co-occurrences
- tropics water
- boat fish
- law president
- train travel
- Usually selected using a larger window (7-10
words) of context, hoping to capture pairs of
related words rather than collocations
26. Bigrams and Co-occurrences
- Pairs of words tend to be much less ambiguous than unigrams
- "bank" versus "river bank" and "bank card"
- "dot" versus "dot com" and "dot product"
- Trigrams and beyond occur much less frequently (Ngrams are very Zipfian)
- Unigrams are noisy, but bountiful
27. "Occur together more often than expected by chance"
- Observed frequencies for two words occurring together and alone are stored in a 2x2 matrix
- Throw out bigrams that include one or two stop words
- Expected values are calculated, based on the model of independence and the observed values
- How often would you expect these words to occur together, if they only occurred together by chance?
- If two words occur "significantly" more often than the expected value, then the words do not occur together by chance.
28. 2x2 Contingency Table

                Intelligence     !Intelligence         Totals
  Artificial    100.0 (1.2)      300.0 (398.8)            400
  !Artificial   200.0 (298.8)    99,400.0 (99,301.2)   99,600
  Totals        300              99,700               100,000

Observed counts, with expected counts under independence in parentheses (the computation is sketched below).
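A minimal sketch of how the expected values above follow from the marginal totals, and how the log-likelihood score G² is then computed. This is illustrative only; NSP implements this and many other measures:

    import math

    # Observed bigram counts from the table above
    # (rows: Artificial / !Artificial, columns: Intelligence / !Intelligence)
    observed = [[100.0, 300.0],
                [200.0, 99400.0]]

    n = sum(map(sum, observed))                        # 100,000 bigrams total
    row_totals = [sum(row) for row in observed]        # 400 and 99,600
    col_totals = [sum(col) for col in zip(*observed)]  # 300 and 99,700

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n   # independence model
            if observed[i][j] > 0:
                g2 += 2 * observed[i][j] * math.log(observed[i][j] / expected)

    print(round(g2, 1))  # well above 3.841, the chi-squared cutoff discussed below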
29. Measures of Association
30. Interpreting the Scores
- G² and X² are asymptotically approximated by the chi-squared distribution
- This means: if you fix the marginal totals of a table, randomly generate internal cell values in the table, calculate the G² or X² scores for each resulting table, and plot the distribution of the scores, you should get the chi-squared distribution
31. Interpreting the Scores
- Values above a certain level of significance can be considered grounds for rejecting the null hypothesis
- H0: the words in the bigram are independent
- 3.841 is associated with 95% confidence that the null hypothesis should be rejected
32. Measures of Association
- There are numerous measures of association that can be used to identify bigram and co-occurrence features
- Many of these are supported in the Ngram Statistics Package (NSP)
- http://www.d.umn.edu/~tpederse/nsp.html
33. Summary
- Identify lexical features based on frequency counts or measures of association, either in the data to be clustered or in a separate set of feature selection data
- Language independent
- Unigrams are usually selected only by frequency
- Remember, there is no labeled data from which to learn, so features are somewhat less effective than in the supervised case
- Bigrams and co-occurrences can also be selected by frequency, or better yet by measures of association
- Bigrams and co-occurrences need not be consecutive
- Stop words should be eliminated
- Frequency thresholds are helpful (e.g., a unigram/bigram that occurs once may be too rare to be useful)
34. Context Representations
- First and Second Order Methods
35. Once features are selected
- We have a set of unigrams, bigrams, co-occurrences, or target co-occurrences
- We believe/hope that these are descriptive of the contexts
- We also have the frequency and measure of association scores that were used in their selection
- Convert the contexts to be clustered into a vector representation based on these features
36. First Order Representation
- Each context is represented by a vector with M dimensions, each of which indicates whether or not a particular feature occurred in that context
- Values may be binary, a frequency count, or an association score
- Context by Feature representation
37. Contexts
- Cxt1: There was an island curse of black magic cast by that voodoo child.
- Cxt2: Harold, a known voodoo child, was gifted in the arts of black magic.
- Cxt3: Despite their military might, it was a serious error to attack.
- Cxt4: Military might is no defense against a voodoo child or an island curse.
38. Unigram Feature Set
- island 1000
- black 700
- curse 500
- magic 400
- child 200
- (assume these are frequency counts obtained from
some corpus)
39. First Order Vectors of Unigrams
island black curse magic child
Cxt1 1 1 1 1 1
Cxt2 0 1 0 1 1
Cxt3 0 0 0 0 0
Cxt4 1 0 1 0 1
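The binary matrix above can be reproduced in a few lines; a minimal sketch (not SenseClusters code), assuming simple punctuation stripping:

    features = ["island", "black", "curse", "magic", "child"]

    contexts = [
        "There was an island curse of black magic cast by that voodoo child.",
        "Harold, a known voodoo child, was gifted in the arts of black magic.",
        "Despite their military might, it was a serious error to attack.",
        "Military might is no defense against a voodoo child or an island curse.",
    ]

    def first_order_vector(context, features):
        # binary vector: 1 if the feature occurs in the context, else 0
        tokens = {w.strip(".,") for w in context.lower().split()}
        return [1 if f in tokens else 0 for f in features]

    for i, cxt in enumerate(contexts, 1):
        print(f"Cxt{i}", first_order_vector(cxt, features))
    # prints the same context by feature matrix shown above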
40. Bigram Feature Set
- island curse 189.2
- black magic 123.5
- voodoo child 120.0
- military might 100.3
- serious error 89.2
- island child 73.2
- voodoo might 69.4
- military error 54.9
- black child 43.2
- serious curse 21.2
- (assume these are log-likelihood scores based on
frequency counts from some corpus)
41. First Order Vectors of Bigrams
black magic island curse military might serious error voodoo child
Cxt1 1 1 0 0 1
Cxt2 1 0 0 0 1
Cxt3 0 0 1 1 0
Cxt4 0 1 1 0 1
42. First Order Vectors
- Can have binary values or weights associated with frequency, etc.
- Forms a context by feature matrix
- May optionally be smoothed/reduced with Singular Value Decomposition
- More on that later
- The contexts are ready for clustering
- More on that later
43. Second Order Features
- First order features encode the occurrence of a feature in a context
- Feature occurrence represented by a binary value
- Second order features encode something "extra" about a feature that occurs in a context
- Feature occurrence represented by word co-occurrences
- Feature occurrence represented by context occurrences
44. Second Order Representation
- First, build a word by word matrix from the features
- Based on bigrams or co-occurrences
- First word is the row, second word is the column, cell is the score
- (optionally) reduce dimensionality w/SVD
- Each row forms a vector of first order co-occurrences
- Second, replace each word in a context with its row/vector as found in the word by word matrix
- Average all the word vectors in the context to create the second order representation
- Due to Schütze (1998), related to LSI/LSA
45. Word by Word Matrix
magic curse might error child
black 123.5 0 0 0 43.2
island 0 189.2 0 0 73.2
military 0 0 100.3 54.9 0
serious 0 21.2 0 89.2 0
voodoo 0 0 69.4 0 120.0
46. Word by Word Matrix
- Can also be used to identify sets of related words
- In the case of bigrams, rows represent the first word in a bigram and columns represent the second word
- Matrix is asymmetric
- In the case of co-occurrences, rows and columns are equivalent
- Matrix is symmetric
- The vector (row) for each word represents a set of first order features for that word
- Each word in a context to be clustered for which a vector exists (in the word by word matrix) is replaced by that vector in that context
47. "There was an island curse of black magic cast by that voodoo child."
magic curse might error child
black 123.5 0 0 0 43.2
island 0 189.2 0 0 73.2
voodoo 0 0 69.4 0 120.0
48. Second Order Co-Occurrences
- The word vectors for "black" and "island" show similarity, as both occur with "child"
- "black" and "island" are second order co-occurrences of each other, since both occur with "child" but not with each other (i.e., "black island" is not observed)
49. Second Order Representation
- There was an [curse, child] curse of [magic, child] magic cast by that [might, child] child
- "island" is replaced by [curse, child], "black" by [magic, child], and "voodoo" by [might, child]: the words each co-occurs with in the word by word matrix
50. "There was an island curse of black magic cast by that voodoo child."
magic curse might error child
Cxt1 41.2 63.1 24.4 0 78.8
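A minimal sketch of the averaging step that produces the Cxt1 vector above, using the three rows of the word by word matrix (slide 45) that match words in the context. Values here are rounded column averages; the "might" value computed this way is 23.1, close to the 24.4 shown above:

    word_vectors = {  # rows from the word by word matrix (slide 45)
        "black":  [123.5, 0.0,   0.0,  0.0, 43.2],
        "island": [0.0,   189.2, 0.0,  0.0, 73.2],
        "voodoo": [0.0,   0.0,   69.4, 0.0, 120.0],
    }   # columns: magic, curse, might, error, child

    context = "There was an island curse of black magic cast by that voodoo child."
    rows = [word_vectors[w] for w in context.lower().rstrip(".").split()
            if w in word_vectors]

    # the second order representation is the average of the word vectors
    cxt1 = [round(sum(col) / len(rows), 1) for col in zip(*rows)]
    print(cxt1)   # [41.2, 63.1, 23.1, 0.0, 78.8]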
51. Second Order Representation
- Results in a Context by Feature (Word) representation
- Cell values do not indicate if the feature occurred in the context. Rather, they show the strength of association of that feature with other words that occur with a word in the context.
52. Summary
- First order representations are intuitive, but...
- Can suffer from sparsity
- Contexts are represented based on the features that occur in those contexts
- Second order representations are harder to visualize, but...
- Allow a word to be represented by the words it co-occurs with (i.e., the company it keeps)
- Allow a context to be represented by the words that occur with the words in the context
- Help combat sparsity
53. Related Work
- Pedersen and Bruce 1997 (EMNLP) presented a first order method of discrimination
- http://acl.ldc.upenn.edu/W/W97/W97-0322.pdf
- Schütze 1998 (Computational Linguistics) introduced a second order method
- http://acl.ldc.upenn.edu/J/J98/J98-1004.pdf
- Purandare and Pedersen 2004 (CoNLL) compared first and second order methods
- http://acl.ldc.upenn.edu/hlt-naacl2004/conll04/pdf/purandare.pdf
- First order better if you have lots of data
- Second order better with smaller amounts of data
54. Dimensionality Reduction
- Singular Value Decomposition
55. Effect of SVD
- SVD reduces a matrix to a given number of dimensions. This may convert a word level space into a semantic or conceptual space
- If "dog", "collie", and "wolf" are dimensions/columns in a word co-occurrence matrix, after SVD they may be merged into a single dimension that represents "canines"
56. Effect of SVD
- The dimensions of the matrix after SVD are
principal components that represent the meaning
of concepts - Similar columns are grouped together
- SVD is a way of smoothing a very sparse matrix,
so that there are very few zero valued cells
after SVD
57. How can SVD be used?
- SVD on first order contexts will reduce a context by feature representation down to a smaller number of features
- Latent Semantic Analysis typically performs SVD on a feature by context representation, where the contexts are reduced
- SVD is used in creating second order context representations
- Reduce the word by word matrix (a small sketch follows below)
58. Word by Word Matrix
apple blood cells ibm data box tissue graphics memory organ plasma
pc 2 0 0 1 3 1 0 0 0 0 0
body 0 3 0 0 0 0 2 0 0 2 1
disk 1 0 0 2 0 3 0 1 2 0 0
petri 0 2 1 0 0 0 2 0 1 0 1
lab 0 0 3 0 2 0 2 0 2 1 3
sales 0 0 0 2 3 0 0 1 2 0 0
linux 2 0 0 1 3 2 0 1 1 0 0
debt 0 0 0 2 3 4 0 2 0 0 0
59. Singular Value Decomposition: A = UDV'
60. Word by Word Matrix After SVD
apple blood cells ibm data tissue graphics memory organ plasma
pc .73 .00 .11 1.3 2.0 .01 .86 .77 .00 .09
body .00 1.2 1.3 .00 .33 1.6 .00 .85 .84 1.5
disk .76 .00 .01 1.3 2.1 .00 .91 .72 .00 .00
germ .00 1.1 1.2 .00 .49 1.5 .00 .86 .77 1.4
lab .21 1.7 2.0 .35 1.7 2.5 .18 1.7 1.2 2.3
sales .73 .15 .39 1.3 2.2 .35 .85 .98 .17 .41
linux .96 .00 .16 1.7 2.7 .03 1.1 1.0 .00 .13
debt 1.2 .00 .00 2.1 3.2 .00 1.5 1.1 .00 .00
61. Second Order Representation
- I got a new disk today!
- What do you think of linux?
apple blood cells ibm data tissue graphics memory organ plasma
disk .76 .00 .01 1.3 2.1 .00 .91 .72 .00 .00
linux .96 .00 .16 1.7 2.7 .03 1.1 1.0 .00 .13
- These two contexts share no words in common, yet they are similar! "disk" and "linux" both occur with apple, ibm, data, graphics, and memory
- The two contexts are similar because they share many second order co-occurrences (see the check below)
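The similarity claim can be checked directly; a minimal sketch computing the cosine between the two second order vectors above:

    import math

    disk  = [.76, .00, .01, 1.3, 2.1, .00, .91, .72, .00, .00]
    linux = [.96, .00, .16, 1.7, 2.7, .03, 1.1, 1.0, .00, .13]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))

    print(round(cosine(disk, linux), 3))   # about 0.998: nearly identical directions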
62. Relationship to LSA
- Latent Semantic Analysis uses a feature by context first order representation
- Indicates all the contexts in which a feature occurs
- Use SVD to reduce dimensions (contexts)
- Cluster features based on the similarity of the contexts in which they occur
- Represent sentences using an average of feature vectors
63. Feature by Context Representation
Cxt1 Cxt2 Cxt3 Cxt4
black magic 1 1 0 1
island curse 1 0 0 1
military might 0 0 1 0
serious error 0 0 1 0
voodoo child 1 1 0 1
64. References
- Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., and Harshman, R., Indexing by Latent Semantic Analysis, Journal of the American Society for Information Science, vol. 41, 1990
- Landauer, T. and Dumais, S., A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction and Representation of Knowledge, Psychological Review, vol. 104, 1997
- Schütze, H., Automatic Word Sense Discrimination, Computational Linguistics, vol. 24, 1998
- Berry, M.W., Drmac, Z., and Jessup, E.R., Matrices, Vector Spaces, and Information Retrieval, SIAM Review, vol. 41, 1999
65. Clustering
- Partitional Methods
- Cluster Stopping
- Cluster Labeling
66. Many many methods
- Cluto supports a wide range of different clustering methods
- Agglomerative
- Average, single, complete link
- Partitional
- K-means (Direct)
- Hybrid
- Repeated bisections
- SenseClusters integrates with Cluto
- http://www-users.cs.umn.edu/~karypis/cluto/
67. General Methodology
- Represent contexts to be clustered in first or second order vectors
- Cluster the context vectors directly
- vcluster
- Or convert to a similarity matrix and then cluster
- scluster
68. Partitional Methods
- Randomly create centroids equal to the number of clusters you wish to find
- Assign each context to the nearest centroid
- After all contexts are assigned, re-compute the centroids
- Best location decided by a criterion function
- Repeat until stable clusters are found
- Centroids don't shift from iteration to iteration (see the sketch below)
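A minimal sketch of this procedure: plain k-means with squared Euclidean distance. This is a generic illustration, not CLUTO's implementation, whose partitional methods optimize explicit criterion functions:

    import random

    def kmeans(vectors, k, iterations=100):
        """Plain k-means over lists of floats (illustrative only)."""
        centroids = random.sample(vectors, k)         # random initial centroids
        for _ in range(iterations):
            clusters = [[] for _ in range(k)]
            for v in vectors:                         # assign to nearest centroid
                nearest = min(range(k), key=lambda i: sqdist(v, centroids[i]))
                clusters[nearest].append(v)
            # re-compute centroids; stop once they no longer shift
            new = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
            if new == centroids:
                break
            centroids = new
        return clusters

    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    def mean(cluster):
        return [sum(col) / len(cluster) for col in zip(*cluster)]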
69. Partitional Methods
- Advantages: fast
- Disadvantages
- Results can be dependent on the initial placement of centroids
- Must specify the number of clusters ahead of time
- Maybe not! (see cluster stopping, below)
70. Partitional Criterion Functions
- Intra-Cluster (Internal) similarity/distance
- How close together are members of a cluster?
- Closer together is better
- Inter-Cluster (External) similarity/distance
- How far apart are the different clusters?
- Further apart is better
71. Intra Cluster Similarity
- Ball of String (I1)
- How far is each member from each other member?
- Flower (I2)
- How far is each member of the cluster from the centroid?
72. Contexts to be Clustered
73. Ball of String (I1 Internal Criterion Function)
74. Flower (I2 Internal Criterion Function)
75. Inter Cluster Similarity
- The Fan (E1)
- How far is each centroid from the centroid of the entire collection of contexts?
- Maximize that distance
76. The Fan (E1 External Criterion Function)
77. Hybrid Criterion Functions
- Balance internal and external similarity
- H1 = I1/E1
- H2 = I2/E1
- Want internal similarity to increase, while external similarity decreases
- Want internal distances to decrease, while external distances increase (see the sketch below)
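A minimal, distance-based sketch of these criteria. CLUTO's I1, I2, E1, H1, and H2 are defined over similarities with particular weightings; this is only meant to convey the shape of the computation:

    def centroid(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    def internal_I2(clusters):
        """Flower: distance of each member to its own centroid (smaller = tighter)."""
        return sum(sqdist(v, centroid(c)) for c in clusters for v in c)

    def external_E1(clusters):
        """Fan: distance of each cluster centroid to the collection centroid."""
        g = centroid([v for c in clusters for v in c])
        return sum(len(c) * sqdist(centroid(c), g) for c in clusters)

    def hybrid(clusters):
        # with distances, a good solution has a small internal value and a
        # large external value, so smaller ratios are better
        return internal_I2(clusters) / external_E1(clusters)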
78. Cluster Stopping
79. Cluster Stopping
- Many clustering algorithms require that the user specify the number of clusters prior to clustering
- But the user often doesn't know the number of clusters, and in fact finding that out might be the goal of clustering
80. Criterion Functions Can Help
- Run the partitional algorithm for k = 1 to deltaK
- DeltaK is a user estimated or automatically determined upper bound for the number of clusters
- Find the value of k at which the criterion function does not significantly increase at k+1
- Clustering can stop at this value, since no further improvement in the solution is apparent with additional clusters (increases in k)
81. H2 versus k (T. Blair / V. Putin / S. Hussein)
82. PK2
- Based on Hartigan, 1975
- PK2(k) is the ratio of the criterion function at k to its value at k-1; when the ratio approaches 1, clustering has reached a plateau
- Select the value of k which is closest to, but outside of, the standard deviation interval (one reading of this rule is sketched below)
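A minimal sketch of a PK2-style stopping rule under one plausible reading of the rule above. The helper is hypothetical; see Pedersen & Kulkarni (2006) for the exact SenseClusters formulation:

    import statistics

    def pk2_predict(crfun):
        """crfun: dict mapping k -> criterion function value, k = 1..deltaK
        (contiguous, deltaK >= 3)."""
        ks = sorted(crfun)[1:]                             # k-1 must exist
        ratios = {k: crfun[k] / crfun[k - 1] for k in ks}
        sd = statistics.stdev(ratios.values())
        # keep the k values whose ratios fall outside 1 +/- one std deviation
        outside = [k for k in ks if abs(ratios[k] - 1) > sd]
        # choose the one closest to the boundary of that interval
        return min(outside, key=lambda k: abs(ratios[k] - 1)) if outside else max(ks)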
83. PK2 predicts 3 senses (T. Blair / V. Putin / S. Hussein)
84. PK3
- Related to Salvador and Chan, 2004
- Inspired by the Dice Coefficient
- Values close to 1 mean clustering is improving
- Select the value of k which is closest to, but outside of, the standard deviation interval
85. PK3 predicts 3 senses (T. Blair / V. Putin / S. Hussein)
86. References
- Hartigan, J., Clustering Algorithms, Wiley, 1975
- Basis for SenseClusters stopping method PK2
- Mojena, R., Hierarchical Grouping Methods and Stopping Rules: An Evaluation, The Computer Journal, vol. 20, 1977
- Basis for SenseClusters stopping method PK1
- Milligan, G. and Cooper, M., An Examination of Procedures for Determining the Number of Clusters in a Data Set, Psychometrika, vol. 50, 1985
- Very extensive comparison of cluster stopping methods
- Tibshirani, R., Walther, G., and Hastie, T., Estimating the Number of Clusters in a Dataset via the Gap Statistic, Journal of the Royal Statistics Society (Series B), 2001
- Pedersen, T. and Kulkarni, A., Selecting the "Right" Number of Senses Based on Clustering Criterion Functions, Proceedings of the Posters and Demo Program of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics, 2006
- Describes SenseClusters stopping methods
87. Cluster Labeling
88. Cluster Labeling
- Once a cluster is discovered, how can you generate a description of the contexts of that cluster automatically?
- In the case of contexts, you might be able to identify significant lexical features from the contents of the clusters, and use those as a preliminary label
89. Results of Clustering
- Each cluster consists of some number of contexts
- Each context is a short unit of text
- Apply measures of association to the contents of each cluster to determine the N most significant bigrams
- Use those bigrams as a label for the cluster
90. Label Types
- The N most significant bigrams for each cluster will act as a descriptive label
- The M most significant bigrams that are unique to each cluster will act as a discriminating label (a small sketch of both follows)
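A minimal labeling sketch. Frequency stands in here for a real measure of association such as log-likelihood, and the helper names are hypothetical, not SenseClusters code:

    from collections import Counter

    def bigrams(text):
        tokens = text.lower().split()
        return zip(tokens, tokens[1:])

    def label_clusters(clusters, n=5):
        """clusters: lists of context strings. Returns descriptive labels
        (top-n bigrams per cluster) and discriminating labels (those top
        bigrams not shared with any other cluster's label)."""
        counts = [Counter(b for cxt in c for b in bigrams(cxt)) for c in clusters]
        descriptive = [[b for b, _ in c.most_common(n)] for c in counts]
        discriminating = []
        for i, label in enumerate(descriptive):
            others = {b for j, l in enumerate(descriptive) if j != i for b in l}
            discriminating.append([b for b in label if b not in others])
        return descriptive, discriminating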
91. Evaluation Techniques
- Comparison to gold standard data
92. Evaluation
- If sense tagged text is available, it can be used for evaluation
- But don't use the sense tags for clustering or feature selection!
- Assume that the sense tags represent the "true" clusters, and compare these to the discovered clusters
- Find the mapping of clusters to senses that attains maximum accuracy
93. Evaluation
- Pseudo words are especially useful, since it is hard to find data that is discriminated
- Pick two words or names from a corpus, and conflate them into one name. Then see how well you can discriminate.
- http://www.d.umn.edu/~tpederse/tools.html
- Baseline Algorithm: group all instances into one cluster; this will reach accuracy equal to the majority classifier
94. Evaluation
- Pseudo words are especially useful, since it is hard to find data that is discriminated
- Pick two or more words or names from a corpus, and conflate them into one name. Then see how well you can discriminate.
- http://www.d.umn.edu/~kulka020/kanaghaName.html
95. Baseline Algorithm
- Baseline Algorithm: group all instances into one cluster; this will reach accuracy equal to the majority classifier
- What if the clustering said everything should be in the same cluster?
96. Baseline Performance

           S1   S2   S3   Totals
  C1        0    0    0        0
  C2        0    0    0        0
  C3       80   35   55      170
  Totals   80   35   55      170

           S3   S2   S1   Totals
  C1        0    0    0        0
  C2        0    0    0        0
  C3       55   35   80      170
  Totals   55   35   80      170

- (0+0+55)/170 = .32 if C3 is labeled S3
- (0+0+80)/170 = .47 if C3 is labeled S1 (the majority sense)
97. Evaluation
- Suppose that C1 is labeled S1, C2 as S2, and C3 as S3
- Accuracy = (10 + 0 + 10) / 170 = 12%
- The diagonal shows how many members of the cluster actually belong to the sense given on the column
- Can the columns be rearranged to improve the overall accuracy?
- Optimally assign clusters to senses

           S1   S2   S3   Totals
  C1       10   30    5       45
  C2       20    0   40       60
  C3       50    5   10       65
  Totals   80   35   55      170
98. Evaluation
- The assignment of C1 to S2, C2 to S3, and C3 to S1 results in 120/170 = 71%
- Find the ordering of the columns in the matrix that maximizes the sum of the diagonal
- This is an instance of the Assignment Problem from Operations Research, or finding the Maximal Matching of a Bipartite Graph from Graph Theory (see the sketch below)

           S2   S3   S1   Totals
  C1       30    5   10       45
  C2        0   40   20       60
  C3        5   10   50       65
  Totals   35   55   80      170
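A minimal sketch of solving that assignment problem with SciPy's Hungarian-method implementation, applied to the confusion matrix above:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    confusion = np.array([[10, 30,  5],    # C1 against S1, S2, S3
                          [20,  0, 40],    # C2
                          [50,  5, 10]])   # C3

    rows, cols = linear_sum_assignment(confusion, maximize=True)
    for c, s in zip(rows, cols):
        print(f"C{c + 1} -> S{s + 1}")                      # C1->S2, C2->S3, C3->S1
    print(confusion[rows, cols].sum() / confusion.sum())    # 120/170, about 0.71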
99. Alternatives?
- Unsupervised methods may not discover clusters equivalent to the classes learned in supervised learning
- Evaluation based on assuming that sense tags represent the true clusters is likely a bit harsh. Alternatives?
- Humans could look at the members of each cluster and determine the nature of the relationship or meaning that they all share
- Use the contents of the cluster to generate a descriptive label that could be inspected by a human
100. Thank you!
- Questions or comments on the tutorial or SenseClusters are welcome at any time: tpederse_at_d.umn.edu
- SenseClusters is freely available via the LIVE CD, the Web, and in source code form
- http://senseclusters.sourceforge.net
- SenseClusters papers available at:
- http://www.d.umn.edu/~tpederse/senseclusters-pubs.html