Analysing Word Concreteness and Abstractness in
Dictionary Definitions
Graham Clark, Stevan Harnad, Les Carr
Intelligence, Agents, Multimedia Group
Department of Electronics and Computer Science
University of Southampton
1. Introduction
The Symbol Grounding Problem
(Harnad 1990, Harnad 2002) indicates that
vocabulary must be grounded in the real, physical
world in order for the words to have meaning in
one's mind. But when words have been grounded in
this way, how can they develop into a full
vocabulary? Looking at dictionaries which use
controlled vocabularies to define all the words
within them (all words used in the definitions
are from a specified subset of the dictionary)
could give some idea as to how new words can
effectively be grounded by using a small set of
pre-grounded terms. In this investigation, two
corpora have been used, the Longman Dictionary of
Contemporary English (Longman 1997) and the
Cambridge International Dictionary of English
(Cambridge 1995). A Web-based survey was
conducted in order to categorise the words in the
two controlled vocabularies as concrete or
abstract. Concrete words are those which refer
to things that can be seen, felt or touched, for
example, "tree", "bird" or "flower". Abstract
words are those which refer to things and
properties of things that are more general or
conceptual, such as "goodness", "truth" or
"abstractness".
3. Parts of Speech
Figures 3-6 below show the
part-of-speech make-up of the concrete and
abstract words from the controlled vocabularies
of both corpora. The majority of concrete words
are nouns; these can easily be physically
pointed out to someone, and hence grounded in the
real world. Abstract words cover a much wider
range of parts-of-speech, so more would have to
be effectively grounded through internal
processes, perhaps similar to the definition
recursion described previously.
4. Concreteness and Abstractness in Recursion
Five concrete and five abstract words were taken
from each dictionary, and recursive definition
trees were built. Figures 7-10 show that many
more abstract words than concrete words are used
in definitions. Each point on the graphs represents the
mean number of abstract, concrete or unknown
words at each level of the tree. Unknown words
account for those which are not present in the
controlled vocabulary, or those which do not
exactly match a headword. All words in the
corpora were stemmed, which greatly reduced the
count of unknown words. The mean number of words
at each tree level has been scaled to take into
account the smaller proportion of concrete words
to abstract.
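
The per-level counting described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used in the study: the toy dictionary, the labels, and the names `DEFINITIONS`, `LABELS`, `level_counts` and `mean_level_counts` are all hypothetical, the words are assumed to be stemmed already, and the scaling step is left out because the text does not give its exact form. A word counts as unknown when it has no label in the controlled vocabulary or does not match a headword.

```python
from collections import Counter

CATEGORIES = ("abstract", "concrete", "unknown")

# Hypothetical toy data (not taken from the LDOCE or the CIDE):
# headword -> stemmed words of its definition.
DEFINITIONS = {
    "bird": ["animal", "with", "wing"],
    "animal": ["living", "thing"],
    "wing": ["part", "of", "bird"],
    "thing": ["idea", "or", "object"],
}

# Hypothetical survey labels for controlled-vocabulary words.
LABELS = {
    "bird": "concrete", "animal": "concrete", "wing": "concrete",
    "thing": "abstract", "living": "abstract", "part": "abstract",
}


def level_counts(root, definitions, labels, max_depth):
    """Count abstract, concrete and unknown words at each tree level."""
    levels = []
    frontier = [root]
    for _ in range(max_depth):  # the depth limit also stops circular definitions
        counts = Counter()
        next_frontier = []
        for headword in frontier:
            for word in definitions.get(headword, []):
                if word in labels and word in definitions:
                    counts[labels[word]] += 1
                    next_frontier.append(word)  # expanded at the next level
                else:
                    # Not in the controlled vocabulary, or no matching headword.
                    counts["unknown"] += 1
        levels.append(counts)
        frontier = next_frontier
    return levels


def mean_level_counts(roots, definitions, labels, max_depth):
    """Average the per-level counts over several root words."""
    totals = [Counter() for _ in range(max_depth)]
    for root in roots:
        per_level = level_counts(root, definitions, labels, max_depth)
        for depth, counts in enumerate(per_level):
            totals[depth].update(counts)
    return [
        {cat: totals[depth][cat] / len(roots) for cat in CATEGORIES}
        for depth in range(max_depth)
    ]


if __name__ == "__main__":
    for depth, means in enumerate(mean_level_counts(["bird", "thing"], DEFINITIONS, LABELS, 2), start=1):
        print(depth, means)
```

Each dictionary returned by `mean_level_counts` corresponds to one point per category at a given tree level. Expanding level by level with a fixed depth keeps the tree finite even when definitions refer back to each other, as "wing" and "bird" do in the toy data.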
5. Definition Length
The number of words in a
definition (the definition length) is an
indication of how many terms must be pre-grounded
in order for it to be understood. Figures 11 and
12 show frequency distribution graphs of the
definition length for the LDOCE and the CIDE. The
frequencies have been scaled to take into account
the smaller proportion of concrete words to
abstract.
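
A scaled frequency distribution of definition lengths could be computed along these lines. This is only a sketch under stated assumptions: the text does not specify how the scaling was done, so the example assumes that each frequency is divided by the number of headwords in its category, which makes the concrete and abstract curves comparable despite the difference in class size. The toy entries and the names `TOY_DEFINITIONS`, `TOY_LABELS` and `length_distribution` are hypothetical.

```python
from collections import Counter

# Hypothetical toy entries: headword -> words of its definition.
TOY_DEFINITIONS = {
    "tree": ["tall", "plant", "with", "branch"],
    "bird": ["animal", "with", "wing"],
    "truth": ["what", "is", "true"],
    "goodness": ["quality", "of", "being", "good"],
    "idea": ["thought", "in", "the", "mind"],
    "hope": ["wish", "for", "good", "thing"],
}

# Hypothetical concrete/abstract labels for the headwords.
TOY_LABELS = {
    "tree": "concrete", "bird": "concrete",
    "truth": "abstract", "goodness": "abstract",
    "idea": "abstract", "hope": "abstract",
}


def length_distribution(definitions, labels, category):
    """Relative frequency of each definition length within one category.

    Assumption: scaling means dividing by the category size.
    """
    lengths = [
        len(words)
        for headword, words in definitions.items()
        if labels.get(headword) == category
    ]
    if not lengths:
        return {}
    freq = Counter(lengths)
    return {length: count / len(lengths) for length, count in sorted(freq.items())}


if __name__ == "__main__":
    for category in ("concrete", "abstract"):
        print(category, length_distribution(TOY_DEFINITIONS, TOY_LABELS, category))
```

With raw counts the abstract class would dominate every bin simply because it has more members; dividing by the class size removes that effect, whatever the exact scaling used for Figures 11 and 12 was.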
6. References
Cambridge (1995). Cambridge International Dictionary of English, CIDE edition (electronic version), Cambridge University Press.
Harnad (1990). The symbol grounding problem. Physica D, 42, 335-346.
Harnad (2002). Symbol grounding and the origin of language. In Scheutz, M. (Ed.) Computationalism: New Directions. MIT Press, 143-158.
Longman (1997). Longman Dictionary of Contemporary English (LDOCE), 3rd edition (electronic version), Addison Wesley Longman.