Title: The Small World of Human Language
1 The Small World of Human Language
- Ramon Ferrer i Cancho
- Ricard V. Solé
- presented by Emre Erdem
2 Introduction: Zipf's Law (Zipf 1972)
- A complete theory of language requires a
theoretical understanding of its implicit
statistical regularities. Zipf's law is the best
known of them.
- Zipf's law: the frequency of a word decays as a
power function of its rank.
- In spite of its relevance and universality, such
a law can be obtained by various mechanisms and
does not provide deep insight into the
organization of the language.
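As an illustration of the rank-frequency statement above, the following minimal sketch ranks words by frequency and estimates the power-law exponent with a least-squares fit in log-log space. The function names `rank_frequencies` and `zipf_exponent` are hypothetical, and the fitting method is an assumption made for this example, not the procedure used by the authors.

```python
import math
from collections import Counter


def rank_frequencies(text):
    """Return word frequencies sorted from most to least frequent."""
    counts = Counter(text.lower().split())
    return sorted(counts.values(), reverse=True)


def zipf_exponent(frequencies):
    """Estimate alpha in f(r) ~ r**(-alpha) by a log-log least-squares fit.

    `frequencies` must be sorted in decreasing order, so that the item at
    index 0 has rank 1. This fitting method is an assumption of the sketch.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope /= sum((x - mean_x) ** 2 for x in xs)
    return -slope


# Synthetic frequencies that follow f(r) = 1000 / r, so alpha should be close to 1.
synthetic = [1000 / rank for rank in range(1, 51)]
print(zipf_exponent(synthetic))
```

On real text the fit is only a rough estimate, since the low-frequency tail is noisy.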
3 Introduction: Lexicons
- Lexicon: 1. a dictionary
- 2. the vocabulary belonging to a specific
field
Human brains store lexicons that are usually
formed by thousands of words. (in the range of
words)
Kernel lexicon: a common lexicon for successful
basic communication
4 Introduction
- Co-occurrence of words in sentences relies on the
network structure of the lexicon.
- Human language can be described in terms of a
graph of word interactions. This graph has some
unexpected properties that might underlie its
diversity and flexibility, and that raise new
questions about its origins and organization.
5 Graph Properties of Human Language
- Words co-occur in sentences
- Syntactical relationships
- Stereotyped expressions or collocations
(e.g. New York, take it easy)
6 Graph Properties of Human Language: Links
- Links: significant co-occurrences between words
in the same sentence.
- The most correlated words in a sentence are the
closest.
- A decision must be taken about the maximum
distance considered for forming links.
- If the distance is too long, the risk of capturing
spurious co-occurrences increases.
- If the distance is too short, certain strong
co-occurrences can be systematically missed.
7 Graph Properties of Human Language: Links
- A toy network constructed with four sentences
- John is tall
- John drinks water
- Mary is blonde
- Mary drinks wine
The graph is constructed by linking words at a
distance of one or two in the same sentence.
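The toy construction above can be sketched in a few lines. This is a minimal illustration: the function name `build_word_graph`, the lowercasing, and the whitespace tokenization are assumptions made for the example.

```python
from collections import defaultdict


def build_word_graph(sentences, max_distance=2):
    """Link every pair of words at most `max_distance` positions apart.

    Returns an undirected graph as a dict mapping each word to the set of
    its neighbors. Tokenization is a plain whitespace split (an assumption).
    """
    adjacency = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            # Words that follow `word` within the allowed distance.
            for other in words[i + 1 : i + 1 + max_distance]:
                if other != word:
                    adjacency[word].add(other)
                    adjacency[other].add(word)
    return dict(adjacency)


toy_sentences = [
    "John is tall",
    "John drinks water",
    "Mary is blonde",
    "Mary drinks wine",
]
graph = build_word_graph(toy_sentences)
for word in sorted(graph):
    print(word, sorted(graph[word]))
```

With a maximum distance of two, every three-word sentence contributes a triangle, so "john" and "mary" are only connected through the shared words "is" and "drinks".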
8 Graph Properties of Human Language: Links
- The maximum distance is set to the minimum
distance at which most of the co-occurrences are
likely to happen.
- Many co-occurrences take place at a distance of
one: red flowers (adjective-noun), stay here
(verb-adverb), can see (modal-verb), getting dark
(verb-adjective), the/this house
(article/determiner-noun).
- Many co-occurrences take place at a distance of
two: hit the ball (verb-object), Mary usually cries
(subject-verb), table of wood (noun-noun through
a prepositional phrase), live in Boston
(verb-noun).
9 Graph Properties of Human Language: Links
- The search stops at a distance of two.
- Lack of an automatic capturing technique.
- The method fails to capture the exact relationships
but does capture almost every possible type of
link.
- We are not interested in all the relationships.
Our goal is to capture as many links as possible
through an automatic procedure.
- A long-distance syntactic link implies the
existence of lower-distance syntactic links. By
contrast, a short-distance link does not imply a
long-distance link.
10 Graph Properties of Human Language: Improving
the technique
- Choose only pairs of consecutive words whose
mutual co-occurrence is larger than expected by
chance.
- The condition is the presence of correlations,
p_ij > p_i p_j, where p_ij is the probability of
co-occurrence observed in the real case and
p_i p_j is the theoretical probability of
co-occurrence expected from random ordering.
- This condition decides which links enter the
graph.
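A minimal sketch of this filter follows. It keeps an ordered pair of consecutive words only when its observed relative frequency exceeds the product of the two word frequencies. The estimators used here (pair counts divided by the number of consecutive pairs, word counts divided by the number of tokens) and the name `significant_pairs` are assumptions of the example; the slides do not say how the probabilities were estimated.

```python
from collections import Counter


def significant_pairs(sentences):
    """Return consecutive word pairs that co-occur more than chance predicts."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        word_counts.update(words)
        pair_counts.update(zip(words, words[1:]))  # distance-one pairs only

    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())

    kept = set()
    for (first, second), count in pair_counts.items():
        p_pair = count / n_pairs                   # observed co-occurrence
        p_first = word_counts[first] / n_words     # frequency of each word
        p_second = word_counts[second] / n_words
        if p_pair > p_first * p_second:            # larger than chance
            kept.add((first, second))
    return kept


corpus = ["new york"] * 9 + ["york new"]
print(significant_pairs(corpus))
```

In this corpus the pair ("new", "york") has an observed probability of 0.9 against an expected 0.25 and is kept, while ("york", "new") has 0.1 against 0.25 and is dropped.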
11 Graph Properties of Human Language: The Graph
12 Graph Properties of Human Language: The Graph
- Possible pattern of wiring in . Black nodes
are common words and white nodes are rare words.
Two words are linked if they co-occur significantly.
13 Graph Properties of Human Language: The Small
World Properties
The small-world pattern can be detected from the
analysis of two basic statistical properties: the
clustering coefficient and the average path
length.
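A common way to read these two properties is the Watts and Strogatz criterion, which is assumed here and is not stated on the slide. Under it, a graph is a small world when its average path length is close to that of a random graph with the same number of vertices and average connectivity, while its clustering coefficient is much larger. The sketch below computes the usual random-graph baselines. The function name `random_graph_baselines` is hypothetical, and the approximations hold for sparse random graphs.

```python
import math


def random_graph_baselines(n_vertices, mean_degree):
    """Approximate clustering and path length of a comparable random graph.

    Uses the standard approximations C_random ~ <k> / N and
    d_random ~ ln(N) / ln(<k>), valid for sparse random graphs with <k> > 1.
    """
    c_random = mean_degree / n_vertices
    d_random = math.log(n_vertices) / math.log(mean_degree)
    return c_random, d_random


# Hypothetical sizes chosen for illustration only.
c_random, d_random = random_graph_baselines(n_vertices=1000, mean_degree=10)
print(c_random, d_random)
```

A measured clustering coefficient far above `c_random`, together with a measured path length near `d_random`, would indicate a small-world pattern under this criterion.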
14 Graph Properties of Human Language: The Small
World Properties
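15 Graph Properties of Human Language: Clustering
coefficient
For each word i, define Γ_i, the set of its nearest
neighbors. The clustering coefficient of i is
C_i = 2 E_i / (|Γ_i| (|Γ_i| - 1)),
where E_i is the total number of edges that exist
among the words in Γ_i, and |Γ_i| (|Γ_i| - 1) is the
possible number of such edges times 2.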
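The definition can be checked on a small graph. The sketch below uses the standard Watts and Strogatz form of the clustering coefficient, averaged over all vertices. Treating vertices with fewer than two neighbors as contributing zero is a convention assumed for this example, and the name `clustering_coefficient` is hypothetical.

```python
def clustering_coefficient(adjacency):
    """Average local clustering coefficient of an undirected graph.

    `adjacency` maps each vertex to the set of its neighbors. Vertices with
    fewer than two neighbors contribute 0 (an assumed convention).
    """
    total = 0.0
    for vertex, neighbors in adjacency.items():
        k = len(neighbors)
        if k < 2:
            continue
        # Count edges that actually exist between pairs of neighbors.
        existing = sum(
            1
            for a in neighbors
            for b in neighbors
            if a < b and b in adjacency[a]
        )
        total += 2 * existing / (k * (k - 1))
    return total / len(adjacency)


# A triangle (a, b, c) with one extra vertex d attached to a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(clustering_coefficient(toy))
```

Here b and c each have coefficient 1, a has 1/3 (one existing edge out of three possible), and d has 0, giving an average of 7/12.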
16 Graph Properties of Human Language: Average path
length
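The average path length is usually defined as the mean, over all pairs of connected vertices, of the minimum number of links separating them. That usual definition is assumed in the sketch below, which computes it with a breadth-first search from every vertex. The function name `average_path_length` is hypothetical.

```python
from collections import deque


def average_path_length(adjacency):
    """Mean shortest-path length over all connected pairs of vertices.

    `adjacency` maps each vertex to the set of its neighbors (undirected).
    Pairs with no connecting path are ignored (an assumed convention).
    """
    total = 0
    pairs = 0
    for source in adjacency:
        # Breadth-first search gives shortest distances in unweighted graphs.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in distance:
                    distance[neighbor] = distance[current] + 1
                    queue.append(neighbor)
        for target, steps in distance.items():
            if target != source:
                total += steps
                pairs += 1
    return total / pairs if pairs else 0.0


# A chain a - b - c - d.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(average_path_length(chain))
```

For the chain, the six pairs have distances 1, 2, 3, 1, 2 and 1, so the average is 10/6.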
17 Scaling and Small-World Patterns
- UWN (Unrestricted Word Network): the network
that results from the basic method
- RWN (Restricted Word Network): the network that
results from the improved method
average connectivity
18 Scaling and Small-World Patterns
Degree distributions of both the UWN and the RWN,
obtained after processing three-quarters of the
words
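For reference, a degree distribution and the corresponding average connectivity can be computed from an edge list as sketched below. The function name `degree_distribution` and the toy edges are assumptions made for illustration, not the data of the study.

```python
from collections import Counter


def degree_distribution(edges):
    """Return (P(k), average connectivity) for an undirected edge list.

    `edges` is an iterable of distinct (word, word) pairs. Only vertices that
    appear in at least one edge are counted.
    """
    degree = Counter()
    for first, second in edges:
        degree[first] += 1
        degree[second] += 1

    n_vertices = len(degree)
    histogram = Counter(degree.values())
    # Fraction of vertices having each degree k.
    p_k = {k: count / n_vertices for k, count in sorted(histogram.items())}
    mean_degree = sum(degree.values()) / n_vertices
    return p_k, mean_degree


toy_edges = [("the", "cat"), ("the", "dog"), ("the", "house"), ("cat", "dog")]
p_k, mean_degree = degree_distribution(toy_edges)
print(p_k, mean_degree)
```

On a large network, plotting `p_k` on log-log axes shows whether the distribution decays as a power law.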
19 Scaling and Small-World Patterns
The more frequent a word, the more available it is
for production and comprehension. This phenomenon
is known as the frequency or recency effect. It
explains why preferential attachment shapes the
scale-free degree distribution in our case.
For the most frequent words,
where k is the degree and f is the frequency
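Preferential attachment can be illustrated with a small growth simulation in which each new vertex links to existing vertices with probability proportional to their degree. The sketch below follows a Barabási-Albert-style rule, which is an assumption made for this example since the slides do not spell out a growth model. The function name and parameters are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment_edges(n_vertices, m=2, seed=0):
    """Grow a graph where new vertices prefer well-connected targets.

    Starts from a fully connected core of m + 1 vertices, then adds one
    vertex at a time, linking it to m distinct existing vertices chosen with
    probability proportional to their current degree. Requires n_vertices > m.
    """
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each vertex appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` selects a vertex proportionally to its degree.
    stubs = [vertex for edge in edges for vertex in edge]
    for new_vertex in range(m + 1, n_vertices):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.append((new_vertex, target))
            stubs.extend((new_vertex, target))
    return edges


edges = preferential_attachment_edges(n_vertices=2000, m=2, seed=1)
degree = Counter(vertex for edge in edges for vertex in edge)
print(max(degree.values()), min(degree.values()))
```

Early vertices accumulate many links while most vertices keep the minimum degree, which is the qualitative signature of a scale-free distribution.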
20 Scaling and Small-World Patterns: Kernel Words
The network formed exclusively by the interaction
of kernel words, hereafter called the Kernel Word
Network (KWN), agrees better with the predictions
that can be made when preferential attachment is
at play.
21 Scaling and Small-World Patterns: Kernel Words
The connectivity distribution for the kernel word
network formed by the 5000 most connected vertices
in the RWN. The average connectivity in the kernel is
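Extracting a kernel of the most connected vertices can be sketched as follows. The function name `kernel_network`, the alphabetical tie-breaking, and the toy graph are assumptions of this example. The slides only state that the kernel is formed by the most connected vertices of the RWN.

```python
def kernel_network(adjacency, size):
    """Keep the `size` most connected vertices and the links among them.

    `adjacency` maps each vertex to the set of its neighbors. Ties in degree
    are broken alphabetically (an assumption). `size` must be at least 1.
    Returns the kernel subgraph and its average connectivity.
    """
    ranked = sorted(adjacency, key=lambda v: (-len(adjacency[v]), v))
    kernel = set(ranked[:size])
    # Restrict every neighbor set to vertices that are also in the kernel.
    subgraph = {v: adjacency[v] & kernel for v in kernel}
    mean_degree = sum(len(n) for n in subgraph.values()) / len(subgraph)
    return subgraph, mean_degree


toy = {
    "the": {"cat", "dog", "house", "wine"},
    "cat": {"the", "dog"},
    "dog": {"the", "cat"},
    "house": {"the"},
    "wine": {"the"},
}
subgraph, mean_degree = kernel_network(toy, size=3)
print(sorted(subgraph), mean_degree)
```

In the toy graph the kernel of size three is the triangle formed by "the", "cat" and "dog", whose average connectivity is 2.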
22 Scaling and Small-World Patterns: Kernel Words
23 Discussion
- If the SW features derive from optimal navigation
needs:
- Words whose main purpose is to speed up
navigation must exist.
- Brain disorders characterized by navigation
deficits in which such words are involved must
exist.
24 Discussion: First Prediction
- 10 most connected words
- and
- the
- of
- in
- a
- to
- 's
- with
- by
- is
25 Discussion: Second Prediction
- Agrammatism: a kind of aphasia in which speech is
non-fluent, laboured, halting and lacking in
function words.
- Aphasia: total or partial loss of
the ability to use or understand spoken or
written language. It is a symptom of brain
disease or injury.
26 Thank you