Title: Natural Language Processing
1 Natural Language Processing
2
- According to research at an Elingsh
uinervtisy, it deosnt mttaer in what oredr the
ltteers in a wrod are, the olny iprmoetnt tihng
is that the frist and lsat ltteer are in the
rghit pclae. The rset can be a toatl mses and you
can sitll raed it wouthit a porbelm. Tihs is
bcuseae we do not raed ervey lteter by islelf but
the wrod as a wlohe. Ceehiro. Thatll sutff the
splelchekcer.
(Excerpt from Sideswipe, NZ Herald, 22/09/03)
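The scrambling described in the excerpt can be reproduced with a few lines of code. The following is a minimal sketch, assuming whitespace-separated words with no attached punctuation; the function names `scramble_word` and `scramble_text` and the fixed seed are illustrative choices, not part of the original slides.

```python
import random

def scramble_word(word, rng):
    """Shuffle the interior letters of a word, keeping first and last in place."""
    if len(word) <= 3:
        return word  # at most one interior letter, so nothing to shuffle
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

def scramble_text(text, seed=0):
    """Scramble every whitespace-separated word of text (seeded for repeatability)."""
    rng = random.Random(seed)
    return " ".join(scramble_word(w, rng) for w in text.split())

print(scramble_text("reading scrambled words is surprisingly easy"))
```

Each output word keeps its first and last letter and contains exactly the same letters as the input word, which is the property the excerpt relies on.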
3 1970s Example: SHRDLU
Person: PICK UP A BIG RED BLOCK.
Computer: OK. (does it)
Person: GRASP THE PYRAMID.
Computer: I DON'T UNDERSTAND WHICH PYRAMID YOU MEAN.
Person: FIND A BLOCK WHICH IS TALLER THAN THE ONE YOU ARE HOLDING AND PUT IT INTO THE BOX.
Computer: BY "IT", I ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE I AM HOLDING.
Computer: OK. (does it)
Person: WHAT DOES THE BOX CONTAIN?
Computer: THE BLUE PYRAMID AND THE BLUE BLOCK.
Person: WHAT IS THE PYRAMID SUPPORTED BY?
Computer: THE BOX.
Person: HOW MANY BLOCKS ARE NOT IN THE BOX?
Computer: FOUR OF THEM.
Person: IS AT LEAST ONE OF THEM NARROWER THAN THE ONE WHICH I TOLD YOU TO PICK UP?
Computer: YES, THE RED CUBE.
Terry Winograd. 1971. MIT Ph.D. Thesis.
4 Pomegranade
5 Natural language processing (NLP), Human Language Technology (HLT), Natural Language Engineering (NLE)
- NLP is considered a sub-field of artificial intelligence and has significant overlap with the field of computational linguistics. It is concerned with the interactions between computers and human (natural) languages.
- Natural language generation systems convert information from computer databases into readable human language.
- Natural language understanding systems convert human language into representations that are easier for computer programs to manipulate.
- The term natural language is used to distinguish human languages (e.g. English, Persian, Swedish) from formal or computer languages (e.g. C, Prolog).
- NLP encompasses both text and speech, but work on speech processing has evolved into a separate field.
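The two directions, generation and understanding, can be illustrated with a toy example. This is only a sketch under strong assumptions: the sentence template, the command pattern, and the field names (`shape`, `color`, `size`, `action`) are invented for illustration, and real systems are far more general than a single template or regular expression.

```python
import re

def generate(record):
    """Generation: turn a database-style record into an English sentence."""
    return "The {shape} in the box is {color}.".format(**record)

# Understanding: one hypothetical command pattern, in the spirit of a blocks world.
COMMAND = re.compile(
    r"^pick up a (?P<size>big|small) (?P<color>\w+) (?P<shape>\w+)\.?$",
    re.IGNORECASE,
)

def understand(utterance):
    """Turn an English command into a dictionary a program can act on."""
    match = COMMAND.match(utterance.strip())
    if match is None:
        return None  # the toy pattern does not cover this utterance
    frame = {key: value.lower() for key, value in match.groupdict().items()}
    frame["action"] = "pick_up"
    return frame

print(generate({"shape": "pyramid", "color": "blue"}))
print(understand("PICK UP A BIG RED BLOCK."))
```

Anything outside the single pattern returns `None`, which hints at why understanding is the harder direction.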
6 Where does it fit in the CS taxonomy?
Computers
  Artificial Intelligence
    Search
    Robotics
    Natural Language Processing
      Information Retrieval
      Machine Translation
      Language Analysis
        Semantics
        Parsing
  Algorithms
  Databases
  Networking
7 Applications
- Information Retrieval: Yahoo, Google, Microsoft
- Information Extraction: Monster.com, HotJobs.com (job finders)
- Machine Translation: Systran powers Babelfish, Google
- Question Answering: Ask Jeeves
- Processing of User-Generated Content: Myspace, Facebook, Blogspot
- Tools for business intelligence
- All big companies have (several) strong NLP research labs: IBM, Microsoft, AT&T, Xerox, Sun, etc.
- Academia: research in a university environment
8 What is NLP?
- Combination of computational linguistics, artificial intelligence and cognitive science.
- Concentrates on interpreting text using a combination of lexical, syntactic, semantic and real-world knowledge.
- Applications include intelligent translators, speech recognition software, information management tools and other types of communication software.
9 Grammar
- The grammar of a language is a description of the structure of that language.
- Grammars provide a scheme for specifying the structure of sentences and rules for combining words into correct phrases and clauses.
10 English Grammar
- English word order follows a Subject-Verb-Object (SVO) linguistic typology.
- The subject of a verb is the doer of the action, and the object is the receiver of the action.
The cat     is drinking     the milk.
Subject     Verb            Object
11 Syntax
- Syntax is the study of the rules, or patterns, that govern the way the words in a sentence come together.
- Syntax deals with how words are categorised into parts of speech (nouns, adjectives, verbs etc.) and how they are combined into phrases or clauses, which in turn combine into sentences.
12 Syntactic Analysis
- Syntactic analysis breaks a sentence into a hierarchical structure of phrases, allowing the study of its constituents.
- For example, the sentence "the big cat is drinking milk" can be broken up into the following constituents:
13 Syntactic Analysis
The big cat is drinking milk
  Noun Phrase: The big cat
    Determiner: The
    Adjective Phrase: big
    Noun: cat
  Verb Phrase: is drinking milk
    Auxiliary: is
    Verb: drinking
    Noun Phrase: milk
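One way to hold such a constituent structure in a program is as a nested tuple. The sketch below is an illustration only: the tuple layout `(label, child, ...)` and the helper names `words` and `constituents` are assumptions made for this example, with the labels taken from the breakdown of "The big cat is drinking milk".

```python
# A node is (label, child, child, ...); a child is either a word or another node.
TREE = (
    "Sentence",
    ("Noun Phrase", ("Determiner", "The"), ("Adjective Phrase", "big"), ("Noun", "cat")),
    ("Verb Phrase", ("Auxiliary", "is"), ("Verb", "drinking"), ("Noun Phrase", "milk")),
)

def words(node):
    """Return the words covered by a node, left to right."""
    _label, *children = node
    result = []
    for child in children:
        if isinstance(child, str):
            result.append(child)
        else:
            result.extend(words(child))
    return result

def constituents(node):
    """Yield (label, phrase) for every node, top-down and left to right."""
    label, *children = node
    yield label, " ".join(words(node))
    for child in children:
        if not isinstance(child, str):
            yield from constituents(child)

for label, phrase in constituents(TREE):
    print(f"{label:18}{phrase}")
```

Walking the tree top-down lists the whole sentence first, then each phrase, then each word with its category.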
14 Implementation - Prolog
A grammar for a very small fragment of English
% Phrase-level rules
sentence --> noun_phrase, verb_phrase.
noun_phrase --> determiner, noun.
noun_phrase --> proper_noun.
verb_phrase --> verb, noun_phrase.
verb_phrase --> verb.
% Word-level rules (terminals are written as lists)
determiner --> [the].
determiner --> [a].
proper_noun --> [pedro].
noun --> [man].
noun --> [apple].
verb --> [eats].
verb --> [sings].
15
?- phrase(sentence, [the, man, eats]).
yes
?- phrase(sentence, [the, man, eats, the, apple]).
yes
?- phrase(sentence, [the, apple, eats, a, man]).
yes
?- phrase(sentence, [pedro, sings, the, pedro]).
no
?- phrase(sentence, [eats, apple, man]).
no
?- phrase(sentence, L).
16
L = [the, man, eats, the, man]
L = [the, man, eats, the, apple]
L = [the, man, eats, a, man]
L = [the, man, eats, a, apple]
L = [the, man, eats, pedro]
L = [the, man, sings, the, man]
L = [the, man, sings, the, apple]
L = [the, man, sings, a, man]
L = [the, man, sings, a, apple]
L = [the, man, sings, pedro]
L = [the, man, eats]
L = [the, man, sings]
L = [the, apple, eats, the, man]
L = [the, apple, eats, the, apple]
L = [the, apple, eats, a, man]
L = [the, apple, eats, a, apple]
L = [the, apple, eats, pedro]
L = [the, apple, sings, the, man]
L = [the, apple, sings, the, apple]
L = [the, apple, sings, a, man]
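For readers without a Prolog system, the same small grammar can be approximated in Python. This is a rough sketch, not a translation of how Prolog executes the rules: the dictionary `GRAMMAR` and the functions `generate`, `expand` and `accepts` are names chosen for this example. It enumerates sentences in rule order, which for this grammar matches the order of the answers to the open query.

```python
# Each nonterminal maps to its alternatives; any symbol not listed is a word.
GRAMMAR = {
    "sentence":    [["noun_phrase", "verb_phrase"]],
    "noun_phrase": [["determiner", "noun"], ["proper_noun"]],
    "verb_phrase": [["verb", "noun_phrase"], ["verb"]],
    "determiner":  [["the"], ["a"]],
    "proper_noun": [["pedro"]],
    "noun":        [["man"], ["apple"]],
    "verb":        [["eats"], ["sings"]],
}

def generate(symbol):
    """Yield every word list derivable from symbol, trying rules in order."""
    if symbol not in GRAMMAR:
        yield [symbol]  # a terminal stands for itself
        return
    for alternative in GRAMMAR[symbol]:
        yield from expand(alternative)

def expand(symbols):
    """Yield every word list derivable from a sequence of symbols."""
    if not symbols:
        yield []
        return
    for head in generate(symbols[0]):
        for tail in expand(symbols[1:]):
            yield head + tail

def accepts(words):
    """True if the word list is a sentence of the grammar (brute-force check)."""
    return list(words) in generate("sentence")

sentences = list(generate("sentence"))
print(len(sentences), "sentences; the first is:", " ".join(sentences[0]))
print(accepts("pedro sings the pedro".split()))
```

The brute-force membership test only works because this grammar is finite; a grammar with a recursive rule would need a proper parser.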
17 Issues in Syntax
- "the dog ate my homework" - Who did what?
- Identify the part of speech (POS)
  - dog = noun, ate = verb, homework = noun
  - English POS tagging
- Identify collocations
  - mother in law, hot dog
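Both issues can be sketched with simple table lookups. The example below is a toy under stated assumptions: the `LEXICON` and `COLLOCATIONS` tables are hand-made for these sentences, and looking words up in a fixed table ignores the ambiguity that makes real POS tagging hard.

```python
# Hypothetical hand-made tables covering only the example sentences.
COLLOCATIONS = [("mother", "in", "law"), ("hot", "dog")]
LEXICON = {
    "the": "determiner", "my": "determiner",
    "dog": "noun", "homework": "noun",
    "mother in law": "noun", "hot dog": "noun",
    "ate": "verb",
}

def chunk(tokens):
    """Merge known collocations into single tokens, longest match first."""
    out, i = [], 0
    while i < len(tokens):
        for colloc in sorted(COLLOCATIONS, key=len, reverse=True):
            if tuple(tokens[i:i + len(colloc)]) == colloc:
                out.append(" ".join(colloc))
                i += len(colloc)
                break
        else:  # no collocation starts here, keep the single word
            out.append(tokens[i])
            i += 1
    return out

def tag(sentence):
    """Return (token, part of speech) pairs using a plain lexicon lookup."""
    tokens = chunk(sentence.lower().split())
    return [(token, LEXICON.get(token, "unknown")) for token in tokens]

print(tag("The dog ate my homework"))
print(tag("My mother in law ate the hot dog"))
```

Merging collocations before tagging keeps "hot dog" from being read as an adjective followed by the animal.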
18 Chomsky's Grammars
- Chomsky introduced transformational grammars (also called transformational generative grammars or generative grammars).
- He introduced the idea of deep structures, which provide a syntactic base of language and consist of:
19 Chomsky's Grammars
- a series of phrase-structure (rewrite) rules
- a series of (possibly universal) rules that generate the underlying phrase-structure of a sentence
- a series of transformations that act upon the phrase-structure, producing more complex sentences
- a series of morphophonemic rules controlling pronunciation.
20 Chomsky's Lexicon
- The lexicon, which can be thought of as a dictionary of the language in a particular form, lists all of the vocabulary words in the language and associates them with their syntactic, semantic and phonological information.
- This information is represented in terms of features.
21 Chomsky's Feature Terms
- For example, the entry for "cat" might have the following syntactic features:
  cat: Noun, Count, Common, Animate
- These features are used to fill slots in a set of phrase markers. For example, a phrase marker requiring an animate noun (Animate) would find "cat" eligible for lexical substitution into that slot, as it fulfils the requirements of being an animate noun.
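The slot-filling idea can be sketched as a subset test on feature sets. In the example below only the entry for "cat" comes from the slide; the entries for "rock" and "milk" and the function names `fills_slot` and `candidates` are assumptions added for illustration.

```python
# Lexicon entries as feature sets; "rock" and "milk" are hypothetical additions.
LEXICON = {
    "cat":  {"Noun", "Count", "Common", "Animate"},
    "rock": {"Noun", "Count", "Common"},
    "milk": {"Noun", "Common"},
}

def fills_slot(word, required):
    """A word is eligible when it carries every feature the slot requires."""
    return required <= LEXICON.get(word, set())

def candidates(required):
    """All lexicon words eligible for a slot, in alphabetical order."""
    return sorted(word for word in LEXICON if fills_slot(word, required))

print(candidates({"Noun", "Animate"}))
print(candidates({"Noun", "Count"}))
```

A slot that demands an animate noun admits "cat" and rejects the other two entries, which lack the Animate feature.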
22 Syntactics vs Semantics
- One of the most controversial topics in the development of transformational grammar is the relationship between syntax and semantics.
- There is a considerable degree of interdependence between the two, and the problem is how to formalise this relationship.
23 Phrase Structure Grammars
- Phrase-structure rules are used to describe a given language's syntax by attempting to break language down into its constituent parts (also known as syntactic categories), namely phrasal categories and lexical categories (parts of speech).
- There are many kinds of phrase-structure rules, which themselves can be combined to generate additional phrase-structure rules.
24 Phrase Structure Grammars
- In particular, phrase-structure rules must account for the following characteristics:
- All languages combine nouns (N) and verbs (V) to express ideas about the universe.
- All languages have rules determining how these are combined into meaningful units.
25 Phrase Structure Grammars
- All languages have recursion, i.e. at least one rule that can be repeated ad infinitum.
- An example of this is the English use of "and", which can link any series of two or more nouns or two or more verbs:
  - "His and hers and theirs and Mary's and John's... etc."
  - "He ran and jumped and played and skipped and danced and... etc."
26 Phrase Structure Grammar
- This would be described in Transformational Grammar as:
- A noun phrase (NP) consists of an N or NP, the word "and", and another N or NP.
- A verb phrase (VP) consists of a V or VP, the word "and", and another V or VP.
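A recursive rule of this shape translates directly into a recursive function. The sketch below assumes a rule of the form "X is a single word, or an X, the word and, and another X"; the function name `is_coordination` and the small `VERBS` set are invented for this example.

```python
# Hypothetical vocabulary taken from the verb example.
VERBS = {"ran", "jumped", "played", "skipped", "danced"}

def is_coordination(tokens, vocabulary):
    """Check the recursive rule X -> word | X 'and' X against a token list."""
    if len(tokens) == 1:
        return tokens[0] in vocabulary
    # Try every "and" as the point where one X ends and the next begins.
    return any(
        token == "and"
        and is_coordination(tokens[:i], vocabulary)
        and is_coordination(tokens[i + 1:], vocabulary)
        for i, token in enumerate(tokens)
    )

print(is_coordination("ran and jumped and played".split(), VERBS))
print(is_coordination("ran and".split(), VERBS))
```

Because the function calls itself on both sides of each "and", it accepts chains of any length, mirroring a rule that can be repeated indefinitely.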
27 Phrase Structure Tree
Sentence
  Noun Phrase
    Determiner: A
    Noun: monkey
  Verb Phrase
    Verb: climbs
    Noun Phrase
      Determiner: the
      Noun: trees
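A phrase structure tree and a set of rewrite rules are two views of the same analysis: every node of the tree records one rule application. The sketch below reads the rules back off the tree for "A monkey climbs the trees"; the tuple encoding, the abbreviated labels (S, NP, VP, Det, N, V) and the function name `rules` are assumptions made for this illustration.

```python
# A node is (label, child, ...); a child is either a word or another node.
TREE = (
    "S",
    ("NP", ("Det", "A"), ("N", "monkey")),
    ("VP",
     ("V", "climbs"),
     ("NP", ("Det", "the"), ("N", "trees"))),
)

def rules(node):
    """Yield the rewrite rule applied at each node, top-down, left to right."""
    label, *children = node
    right_side = [c if isinstance(c, str) else c[0] for c in children]
    yield f"{label} -> {' '.join(right_side)}"
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)

for rule in rules(TREE):
    print(rule)
```

The rule "NP -> Det N" shows up twice, once for the subject and once for the object, which illustrates how a small rule set covers several parts of a sentence.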
28 Problems with Traditional Grammars
- They are grammar-based, whereas natural language isn't strictly grammar-based.
- Most don't take into account language variations and dialects.
- Humans have a built-in natural language processor that can handle things machine natural language processors cannot.
29 Yoda
- When 900 years old you reach, look as good you will not.
- With you the force is.
- A brave man your Father was.
- Yoda (typically) uses the Object-Subject-Verb (OSV) linguistic typology, which is characteristic of some languages spoken in Brazil.
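The difference between word-order typologies can be shown by rendering the same three parts in different orders. This is a deliberately simplified sketch: the function name `render` is an invention for this example, and the complement "a brave man" is loosely treated as the object.

```python
def render(subject, verb, obj, order="SVO"):
    """Arrange subject, verb and object according to a three-letter order code."""
    if sorted(order) != ["O", "S", "V"]:
        raise ValueError("order must be a permutation of the letters S, V and O")
    parts = {"S": subject, "V": verb, "O": obj}
    sentence = " ".join(parts[role] for role in order)
    return sentence[0].upper() + sentence[1:] + "."

print(render("your father", "was", "a brave man", "SVO"))
print(render("your father", "was", "a brave man", "OSV"))
```

The SVO call gives the usual English order, while the OSV call reproduces the shape of the third Yoda line.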
30 Inherent Complexity
- To understand a sentence you must do more than combine the dictionary meanings of its constituents.
- A large amount of human knowledge is assumed and communication takes place between complex agents in complex environments.
31 Statistical approach
- Statistical Machine Translation