Natural Language Processing - PowerPoint PPT Presentation

Provided by: samaran1



1
Natural Language Processing
2
  • According to research at an Elingsh
    uinervtisy, it deosn't mttaer in what oredr the
    ltteers in a wrod are, the olny iprmoetnt tihng
    is that the frist and lsat ltteer are in the
    rghit pclae. The rset can be a toatl mses and you
    can sitll raed it wouthit a porbelm. Tihs is
    bcuseae we do not raed ervey lteter by islelf but
    the wrod as a wlohe. Ceehiro. That'll sutff the
    splelchekcer.
  • (Excerpt from Sideswipe, NZ Herald, 22/09/03)
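Text like the passage above is easy to generate. The following is a minimal Python sketch (`scramble` and `scramble_word` are hypothetical helper names, not from any library) that shuffles the interior letters of every word while leaving the first and last letters in place. Only runs of ASCII letters count as words, so punctuation and spacing are left untouched; note that this also splits "doesn't" into "doesn" and "t".

```python
import random
import re

def scramble_word(word, rng):
    """Shuffle a word's interior letters, keeping the first and last in place."""
    if len(word) <= 3:          # nothing (or only one letter) to move
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

def scramble(text, seed=None):
    """Scramble every run of letters in `text`; other characters are kept as-is."""
    rng = random.Random(seed)   # a fixed seed makes the output repeatable
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(), rng), text)

print(scramble("According to research at an English university", seed=42))
```

The exact output depends on the seed, but every word keeps its length, its letters, and its first and last characters, which is the property the excerpt claims readers rely on.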

3
1970s Example: SHRDLU
Person: PICK UP A BIG RED BLOCK.
Computer: OK. (does it)
Person: GRASP THE PYRAMID.
Computer: I DON'T UNDERSTAND WHICH PYRAMID YOU MEAN.
Person: FIND A BLOCK WHICH IS TALLER THAN THE ONE YOU ARE HOLDING AND PUT IT INTO THE BOX.
Computer: BY "IT", I ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE I AM HOLDING.
Computer: OK. (does it)
Person: WHAT DOES THE BOX CONTAIN?
Computer: THE BLUE PYRAMID AND THE BLUE BLOCK.
Person: WHAT IS THE PYRAMID SUPPORTED BY?
Computer: THE BOX.
Person: HOW MANY BLOCKS ARE NOT IN THE BOX?
Computer: FOUR OF THEM.
Person: IS AT LEAST ONE OF THEM NARROWER THAN THE ONE WHICH I TOLD YOU TO PICK UP?
Computer: YES, THE RED CUBE.
Terry Winograd. 1971. MIT Ph.D. Thesis.
4
Pomegranade
5
Natural Language Processing (NLP), Human Language
Technology (HLT), Natural Language Engineering
(NLE)
  • is considered a sub-field of artificial
    intelligence and has significant overlap with the
    field of computational linguistics. It is
    concerned with the interactions between computers
    and human (natural) languages.
  • Natural language generation systems convert
    information from computer databases into readable
    human language.
  • Natural language understanding systems convert
    human language into representations that are
    easier for computer programs to manipulate.
  • The term natural language is used to distinguish
    human languages (e.g. English, Persian, Swedish)
    from formal or computer languages (e.g. C,
    Prolog).
  • NLP encompasses both text and speech, but work on
    speech processing has evolved into a separate
    field.

6
Where does it fit in the CS taxonomy?
Computers
  • Artificial Intelligence
      • Search
      • Robotics
      • Natural Language Processing
          • Information Retrieval
          • Machine Translation
          • Language Analysis
              • Semantics
              • Parsing
  • Algorithms
  • Databases
  • Networking
7
Applications
  • Yahoo, Google, Microsoft: Information Retrieval
  • Monster.com, HotJobs.com (job finders):
    Information Extraction and Information Retrieval
  • Systran powers Babelfish, Google: Machine
    Translation
  • Ask Jeeves: Question Answering
  • Myspace, Facebook, Blogspot: Processing of
    User-Generated Content
  • Tools for business intelligence
  • All big companies have (several) strong NLP
    research labs: IBM, Microsoft, AT&T, Xerox,
    Sun, etc.
  • Academic research in a university environment

8
What is NLP?
  • Combination of computational linguistics,
    artificial intelligence and cognitive science.
  • Concentrates on interpreting text using a
    combination of lexical, syntactic, semantic and
    real-world knowledge.
  • Applications include intelligent translators,
    speech recognition software, information
    management tools and other types of communication
    software.

9
Grammar
  • The grammar of a language is a description of the
    structure of that language.
  • Grammars provide a scheme for specifying the
    structure of sentences and rules for combining
    words into correct phrases and clauses.

10
English Grammar
  • English word order follows a Subject-Verb-Object
    (SVO) linguistic typology.
  • The subject of a verb is the doer of the action,
    and the object is the entity that receives it.

The cat (Subject) is drinking (Verb) the milk (Object).
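To make the subject/verb/object split concrete, here is a toy Python sketch. It assumes a tiny, hand-written list of verb groups (`VERB_GROUPS` and `split_svo` are hypothetical); a real system would locate the verb with a part-of-speech tagger and a parser. Everything before the verb group is taken as the subject and everything after it as the object, which only works for simple SVO sentences like the one above.

```python
# Hypothetical hand-written verb groups; a real system would use a POS tagger.
VERB_GROUPS = [("is", "drinking"), ("eats",), ("sings",)]

def split_svo(sentence):
    """Split a simple SVO sentence into (subject, verb, object) around a known verb group."""
    words = sentence.rstrip(".").split()
    lowered = [w.lower() for w in words]
    for group in VERB_GROUPS:
        n = len(group)
        for i in range(len(lowered) - n + 1):
            if tuple(lowered[i:i + n]) == group:
                # Words before the verb group form the subject, words after it the object.
                return (" ".join(words[:i]),
                        " ".join(words[i:i + n]),
                        " ".join(words[i + n:]))
    raise ValueError("no known verb group in: " + sentence)

print(split_svo("The cat is drinking the milk."))
# ('The cat', 'is drinking', 'the milk')
```

An intransitive sentence such as "The man sings" comes back with an empty object string, since nothing follows the verb.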
11
Syntax
  • Syntax is the study of the rules, or patterns,
    that govern the way the words in a sentence come
    together.
  • Syntax deals with how words, which are
    categorised into parts of speech (nouns,
    adjectives, verbs, etc.), are combined into
    phrases or clauses, which in turn combine into
    sentences.

12
Syntactic Analysis
  • Syntactic analysis involves breaking sentences
    and phrases down into a hierarchical structure,
    allowing the study of their constituents.
  • For example, the sentence "The big cat is
    drinking milk" can be broken up into the
    following constituents:

13
Syntactic Analysis
Sentence: The big cat is drinking milk
  • Noun Phrase: The big cat
      • Determiner: The
      • Adjective Phrase: big
      • Noun: cat
  • Verb Phrase: is drinking milk
      • Auxiliary: is
      • Verb: drinking
      • Noun Phrase: milk
14
Implementation- Prolog
A grammar for a very small fragment of English:
    sentence --> noun_phrase, verb_phrase.
    noun_phrase --> determiner, noun.
    noun_phrase --> proper_noun.
    determiner --> [the].
    determiner --> [a].
    proper_noun --> [pedro].
    noun --> [man].
    noun --> [apple].
    verb_phrase --> verb, noun_phrase.
    verb_phrase --> verb.
    verb --> [eats].
    verb --> [sings].
15
    ?- phrase(sentence, [the, man, eats]).
    yes
    ?- phrase(sentence, [the, man, eats, the, apple]).
    yes
    ?- phrase(sentence, [the, apple, eats, a, man]).
    yes
    ?- phrase(sentence, [pedro, sings, the, pedro]).
    no
    ?- phrase(sentence, [eats, apple, man]).
    no
    ?- phrase(sentence, L).
16
    L = [the, man, eats, the, man]
    L = [the, man, eats, the, apple]
    L = [the, man, eats, a, man]
    L = [the, man, eats, a, apple]
    L = [the, man, eats, pedro]
    L = [the, man, sings, the, man]
    L = [the, man, sings, the, apple]
    L = [the, man, sings, a, man]
    L = [the, man, sings, a, apple]
    L = [the, man, sings, pedro]
    L = [the, man, eats]
    L = [the, man, sings]
    L = [the, apple, eats, the, man]
    L = [the, apple, eats, the, apple]
    L = [the, apple, eats, a, man]
    L = [the, apple, eats, a, apple]
    L = [the, apple, eats, pedro]
    L = [the, apple, sings, the, man]
    L = [the, apple, sings, the, apple]
    L = [the, apple, sings, a, man]
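For readers without a Prolog system, the same grammar can be mirrored in Python. This is a sketch, not how Prolog itself runs DCGs (Prolog translates `-->` rules into ordinary clauses over difference lists). The `GRAMMAR` dictionary and the function names are assumptions made for this example: symbols that are keys of `GRAMMAR` are non-terminals, and anything else is a word. `accepts` plays the role of `phrase(sentence, Words)`, and `expand` enumerates sentences depth-first, which reproduces the order of the answers listed above.

```python
# The slide's grammar: each non-terminal maps to its alternative right-hand sides.
GRAMMAR = {
    "sentence":    [["noun_phrase", "verb_phrase"]],
    "noun_phrase": [["determiner", "noun"], ["proper_noun"]],
    "verb_phrase": [["verb", "noun_phrase"], ["verb"]],
    "determiner":  [["the"], ["a"]],
    "proper_noun": [["pedro"]],
    "noun":        [["man"], ["apple"]],
    "verb":        [["eats"], ["sings"]],
}

def parse(symbol, words, pos):
    """Yield every position where `symbol` can end if it starts at `pos`
    (top-down, left to right, backtracking through the alternatives)."""
    if symbol not in GRAMMAR:                 # a terminal word
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for body in GRAMMAR[symbol]:
        yield from parse_seq(body, words, pos)

def parse_seq(symbols, words, pos):
    """Yield end positions for a sequence of symbols parsed one after another."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], words, pos):
        yield from parse_seq(symbols[1:], words, mid)

def accepts(words):
    """Counterpart of ?- phrase(sentence, Words)."""
    return any(end == len(words) for end in parse("sentence", words, 0))

def expand(symbol):
    """Yield every word list `symbol` derives, depth-first like Prolog's
    backtracking. Terminates only because this grammar is not recursive."""
    if symbol not in GRAMMAR:
        yield [symbol]
        return
    for body in GRAMMAR[symbol]:
        yield from expand_seq(body)

def expand_seq(symbols):
    if not symbols:
        yield []
        return
    for head in expand(symbols[0]):
        for tail in expand_seq(symbols[1:]):
            yield head + tail

print(accepts(["the", "man", "eats", "the", "apple"]))  # True
print(accepts(["pedro", "sings", "the", "pedro"]))      # False
print(list(expand("sentence"))[:3])
# [['the', 'man', 'eats', 'the', 'man'], ['the', 'man', 'eats', 'the', 'apple'], ['the', 'man', 'eats', 'a', 'man']]
```

With five possible noun phrases and twelve possible verb phrases, the grammar generates exactly 60 sentences, so the full enumeration is finite.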
17
Issues in Syntax
  • "The dog ate my homework": who did what?
  • Identify the part of speech (POS): dog = noun,
    ate = verb, homework = noun
  • English POS tagging
  • Identify collocations, e.g. mother-in-law,
    hot dog
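Both issues can be illustrated with a toy lookup-table tagger that first groups known collocations into single units. The `LEXICON` and `COLLOCATIONS` tables and the `tag` function are hypothetical and deliberately tiny. Real taggers are trained on annotated corpora and use context to resolve ambiguity (for instance, "dog" can also be a verb), which this sketch ignores by giving each word exactly one tag.

```python
# Hypothetical toy tables; each word gets exactly one tag, ignoring ambiguity.
LEXICON = {
    "the": "determiner", "a": "determiner", "my": "determiner",
    "dog": "noun", "homework": "noun", "mother": "noun", "law": "noun",
    "ate": "verb", "in": "preposition", "hot": "adjective",
}
# Multi-word units that should be tagged as a single noun.
COLLOCATIONS = {("mother", "in", "law"): "noun", ("hot", "dog"): "noun"}
LONGEST = max(len(c) for c in COLLOCATIONS)

def tag(sentence):
    """Tag each word (or collocation) with a part of speech, longest collocation first."""
    words = sentence.lower().split()
    tagged, i = [], 0
    while i < len(words):
        for n in range(LONGEST, 1, -1):       # try the longest units first
            chunk = tuple(words[i:i + n])
            if len(chunk) == n and chunk in COLLOCATIONS:
                tagged.append((" ".join(chunk), COLLOCATIONS[chunk]))
                i += n
                break
        else:                                  # no collocation starts at word i
            tagged.append((words[i], LEXICON.get(words[i], "unknown")))
            i += 1
    return tagged

print(tag("the dog ate my homework"))
# [('the', 'determiner'), ('dog', 'noun'), ('ate', 'verb'), ('my', 'determiner'), ('homework', 'noun')]
print(tag("my mother in law ate a hot dog"))
# [('my', 'determiner'), ('mother in law', 'noun'), ('ate', 'verb'), ('a', 'determiner'), ('hot dog', 'noun')]
```

Without the collocation step, "hot dog" would be tagged adjective + noun, which is grammatical but misses that the pair names a single thing.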

18
Chomsky's Grammars
  • Chomsky introduced transformational grammars
    (also called transformational generative grammars
    or generative grammars).
  • He introduced the idea of deep structures, which
    provide the syntactic base of a language and
    consist of:

19
Chomsky's Grammars
  • a series of phrase-structure (rewrite) rules
  • a series of (possibly universal) rules that
    generates the underlying phrase-structure of a
    sentence
  • a series of transformations that act upon the
    phrase-structure, producing more complex
    sentences
  • a series of morphophonemic rules controlling
    pronunciation.

20
Chomsky's Lexicon
  • The lexicon, which can be thought of as a
    dictionary of the language in a particular form,
    lists all of the vocabulary words in the language
    and associates them with their syntactic,
    semantic and phonological information.
  • This information is represented in terms of
    features.

21
Chomsky's Feature Terms
  • For example, the entry for cat might have the
    following syntactic features:
  • cat: [+Noun, +Count, +Common, +Animate]
  • These features are used to fill slots in a set
    of phrase markers. For example, a phrase marker
    requiring an animate noun (+Animate) would
    find cat eligible for lexical substitution into
    that slot, as it fulfils the requirement of
    being an animate noun.
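Slot filling by features reduces to a subset test: a word is eligible if its entry contains every feature the phrase marker demands. The entries below, including the proper noun `pedro` borrowed from the Prolog grammar, are hypothetical. They use Python sets of "+Feature"/"-Feature" strings as a simplification of a real feature system, and `eligible` is an illustrative helper name.

```python
# Hypothetical entries in +/- feature notation: "+F" means the word has
# feature F, "-F" that it explicitly lacks it.
LEXICON = {
    "cat":   {"+Noun", "+Count", "+Common", "+Animate"},
    "milk":  {"+Noun", "-Count", "+Common", "-Animate"},
    "pedro": {"+Noun", "-Common", "+Animate"},
}

def eligible(slot, lexicon=LEXICON):
    """Words whose feature set contains every feature the slot requires."""
    return sorted(word for word, features in lexicon.items() if slot <= features)

print(eligible({"+Animate"}))                     # ['cat', 'pedro']
print(eligible({"+Noun", "+Count", "+Animate"}))  # ['cat']
print(eligible({"-Animate"}))                     # ['milk']
```

Because `pedro` does not specify a Count feature at all, it fails any slot that requires +Count, which mirrors how unspecified features block substitution.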

22
Syntax vs Semantics
  • One of the most controversial topics in the
    development of transformational grammar is the
    relationship between syntax and semantics.
  • There is a considerable degree of interdependence
    between the two, and the problem is how to
    formalise this relationship.

23
Phrase Structure Grammars
  • Phrase-structure rules are used to describe a
    given language's syntax by breaking the language
    down into its constituent parts (also known as
    syntactic categories), namely phrasal categories
    and lexical categories (parts of speech).
  • There are many kinds of phrase-structure rules,
    which themselves can be combined to generate
    additional phrase-structure rules.
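Phrase-structure rules work as rewrite rules: starting from the sentence symbol, each step replaces one category with its constituents until only words remain. Below is a minimal Python sketch of a leftmost derivation. The rule sequence and the category abbreviations (S, NP, VP, Det, N, V) are assumptions chosen for the example sentence "a monkey climbs the trees", and `derive` is a hypothetical helper.

```python
# Category symbols that can still be rewritten; everything else is a word.
NONTERMINALS = {"S", "NP", "VP", "Det", "N", "V"}

def derive(steps, start="S"):
    """Apply each (lhs, rhs) rewrite rule to the leftmost non-terminal in turn
    and return every intermediate sentential form as a string."""
    form = [start]
    forms = [" ".join(form)]
    for lhs, rhs in steps:
        i = next((k for k, sym in enumerate(form) if sym in NONTERMINALS), None)
        if i is None or form[i] != lhs:
            raise ValueError(f"rule {lhs} -> {' '.join(rhs)} does not apply to: {' '.join(form)}")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(" ".join(form))
    return forms

STEPS = [
    ("S", ["NP", "VP"]), ("NP", ["Det", "N"]), ("Det", ["a"]), ("N", ["monkey"]),
    ("VP", ["V", "NP"]), ("V", ["climbs"]), ("NP", ["Det", "N"]), ("Det", ["the"]),
    ("N", ["trees"]),
]
for line in derive(STEPS):
    print(line)
```

This prints ten sentential forms, from `S`, `NP VP` and `Det N VP` down to `a monkey climbs the trees`. Applying a rule to anything other than the leftmost category raises an error, which keeps the derivation in a single canonical order.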

24
Phrase Structure Grammars
  • In particular, phrase-structure rules must account
    for the following characteristics:
  • All languages combine nouns (N) and verbs (V) to
    express ideas about the universe.
  • All languages have rules determining how these
    are combined into meaningful units.

25
Phrase Structure Grammars
  • All languages have recursion, i.e. at least one
    rule that can be repeated ad infinitum
  • An example of this is the English use of "and",
    which can link any series of two or more nouns or
    two or more verbs
  • "His and hers and theirs and Mary's and John's
    and ..."
  • "He ran and jumped and played and skipped and
    danced and ..."

26
Phrase Structure Grammar
  • This would be described in Transformational
    Grammar as:
  • A noun phrase (NP) consists of an N or NP, the
    word "and", and another N or NP.
  • A verb phrase (VP) consists of a V or VP, the
    word "and", and another V or VP.
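The recursive coordination rule can be checked directly in code, because the rule mentions its own category on the right-hand side. In this Python sketch the word lists come from the slide's examples and stand in for the base categories N and V; `is_coordinated` is a hypothetical helper. It accepts a single base word, or any split at an "and" whose two sides are themselves coordinated phrases.

```python
# Word lists taken from the slide's examples, used as the base categories N and V.
NOUNS = {"his", "hers", "theirs", "mary's", "john's"}
VERBS = {"ran", "jumped", "played", "skipped", "danced"}

def is_coordinated(words, base):
    """X -> base word | X 'and' X. Because the rule refers to itself,
    it can be applied any number of times."""
    if len(words) == 1:
        return words[0].lower() in base
    return any(
        words[i] == "and"
        and is_coordinated(words[:i], base)
        and is_coordinated(words[i + 1:], base)
        for i in range(1, len(words) - 1)
    )

print(is_coordinated("ran and jumped and played".split(), VERBS))  # True
print(is_coordinated("His and hers and Mary's".split(), NOUNS))    # True
print(is_coordinated("ran jumped".split(), VERBS))                 # False
```

This naive search may try many split points on long inputs that fail; chart parsers avoid repeating that work by remembering which spans have already been analysed.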

27
Phrase Structure Tree
Sentence: A monkey climbs the trees
  • Noun Phrase: A monkey
      • Determiner: A
      • Noun: monkey
  • Verb Phrase: climbs the trees
      • Verb: climbs
      • Noun Phrase: the trees
          • Determiner: the
          • Noun: trees
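A tree like this can be produced automatically by a small top-down parser. This Python sketch assumes the rules Sentence → Noun Phrase Verb Phrase, Noun Phrase → Determiner Noun, and Verb Phrase → Verb Noun Phrase (or a bare Verb), plus a hypothetical five-word lexicon. `parse_sentence` returns the first complete parse as nested tuples (or `None`), and `show` prints it as an indented outline; all names are illustrative.

```python
# Assumed phrase-structure rules; labels not listed here are lexical categories.
RULES = {
    "Sentence":    [["Noun Phrase", "Verb Phrase"]],
    "Noun Phrase": [["Determiner", "Noun"]],
    "Verb Phrase": [["Verb", "Noun Phrase"], ["Verb"]],
}
# Hypothetical lexicon covering only the example sentence.
LEXICON = {"a": "Determiner", "the": "Determiner", "monkey": "Noun",
           "trees": "Noun", "climbs": "Verb"}

def parse(label, words, pos):
    """Yield (tree, next_pos) for each way `label` can cover words from `pos`."""
    if label not in RULES:                    # lexical category: match one word
        if pos < len(words) and LEXICON.get(words[pos].lower()) == label:
            yield (label, words[pos]), pos + 1
        return
    for body in RULES[label]:
        for children, end in parse_seq(body, words, pos):
            yield (label, *children), end

def parse_seq(labels, words, pos):
    """Yield (list_of_subtrees, end) for a sequence of labels parsed in order."""
    if not labels:
        yield [], pos
        return
    for tree, mid in parse(labels[0], words, pos):
        for rest, end in parse_seq(labels[1:], words, mid):
            yield [tree] + rest, end

def parse_sentence(text):
    """Return the first parse covering every word, or None."""
    words = text.split()
    for tree, end in parse("Sentence", words, 0):
        if end == len(words):
            return tree
    return None

def show(tree, depth=0):
    """Render a tree as indented lines, putting each word next to its category."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return ["  " * depth + f"{label}: {children[0]}"]
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(show(child, depth + 1))
    return lines

print("\n".join(show(parse_sentence("A monkey climbs the trees"))))
```

The printed outline has the same shape as the tree on this slide: a Noun Phrase over "A monkey" and a Verb Phrase whose Noun Phrase covers "the trees".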
28
Problems with Traditional Grammars
  • They are grammar-based, whereas natural language
    isn't strictly grammar-based.
  • Most don't take into account language variations
    and dialects.
  • Humans have a built-in natural language processor
    that can handle things that machine natural
    language processors cannot.

29
Yoda
  • When 900 years old you reach, look as good you
    will not.
  • With you the force is.
  • A brave man your Father was.
  • Yoda (typically) uses the Object-Subject-Verb
    (OSV) linguistic typology, which is characteristic
    of some indigenous Brazilian languages.
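Word-order typologies differ only in how the same three constituents are arranged, which a few lines of Python can show. This sketch assumes the sentence has already been split into subject, verb and object; for Yoda's "A brave man your Father was", the complement "a brave man" is treated as the O slot for simplicity. `reorder` is a hypothetical helper, not a library function.

```python
def reorder(subject, verb, obj, order):
    """Arrange subject (S), verb (V) and object (O) in the given word order."""
    if sorted(order) != ["O", "S", "V"]:
        raise ValueError("order must be a permutation of 'SVO', e.g. 'OSV'")
    parts = {"S": subject, "V": verb, "O": obj}
    return " ".join(parts[role] for role in order)

print(reorder("your Father", "was", "a brave man", "SVO"))  # your Father was a brave man
print(reorder("your Father", "was", "a brave man", "OSV"))  # a brave man your Father was
```

Real word-order differences also affect agreement, case marking and clause structure, so this only captures the surface ordering.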

30
Inherent Complexity
  • To understand a sentence you must do more than
    combine the dictionary meanings of its
    constituents.
  • A large amount of human knowledge is assumed and
    communication takes place between complex agents
    in complex environments.

31
Statistical approach
  • Statistical Machine Translation