Natural Language Processing (NLP)

Title: LING 180 Intro to Computer Speech and Language, Lecture 1
Author: Dan Jurafsky
Last modified by: Engineering Science
Created Date: 1/18/2003 3:56:53 AM



1
Natural Language Processing (NLP)
  • Overview and history of the field
  • Knowledge of language
  • The role of ambiguity
  • Models and Algorithms
  • Eliza, Turing, and conversational agents
  • History of speech and language processing
  • Regular expressions

2
Computer Speech and Language Processing
  • What is it?
  • Getting computers to perform useful tasks
    involving human languages, whether for:
  • Enabling human-machine communication
  • Improving human-human communication
  • Doing stuff with language objects
  • Examples:
  • Question Answering
  • Machine Translation
  • Spoken Conversational Agents

3
Knowledge needed to build speaking computer
  • Speech recognition and synthesis
  • Dictionaries (how words are pronounced)
  • Phonetics (how to recognize/produce each sound of
    English)
  • Natural language understanding
  • Knowledge of the English words involved
  • What they mean
  • How they combine (what is a "pod bay door"?)
  • Knowledge of syntactic structure
  • "I'm I do, sorry that afraid Dave I'm can't" (scrambled word order)

4
ELIZA: Weizenbaum (1966)
  • User: You are like my father in some ways.
  • ELIZA: What resemblance do you see?
  • User: You are not very aggressive but I think
    you don't want me to notice that.
  • ELIZA: What makes you think I am not aggressive?
  • User: You don't argue with me.
  • ELIZA: Why do you think I don't argue with you?
  • User: You are afraid of me.
  • ELIZA: Does it please you to believe I am afraid
    of you?

5
Ambiguity
  • Computational linguists are obsessed with
    ambiguity
  • Ambiguity is a fundamental problem of
    computational linguistics
  • Resolving ambiguity is a crucial goal

6
Ambiguity
  • Find at least 5 meanings of this sentence:
  • I made her duck

7
Ambiguity
  • Find at least 5 meanings of this sentence:
  • I made her duck
  • I cooked waterfowl for her benefit (to eat)
  • I cooked waterfowl belonging to her
  • I created the (plaster?) duck she owns
  • I caused her to quickly lower her head or body
  • I waved my magic wand and turned her into
    undifferentiated waterfowl

8
Ambiguity is Pervasive
  • I caused her to quickly lower her head or body
  • Lexical category: "duck" can be a N or V
  • I cooked waterfowl belonging to her.
  • Lexical category: "her" can be a possessive ("of
    her") or dative ("for her") pronoun
  • I made the (plaster) duck statue she owns
  • Lexical semantics: "make" can mean "create" or
    "cook"

9
Ambiguity is Pervasive
  • Grammar: "make" can be:
  • Transitive (verb has a noun direct object)
  • I cooked waterfowl belonging to her
  • Ditransitive (verb has 2 noun objects)
  • I made her (into) undifferentiated waterfowl
  • Action-transitive (verb has a direct object and
    another verb)
  • I caused her to move her body

10
Ambiguity is Pervasive
  • Phonetics!
  • I mate or duck
  • I'm eight or duck
  • Eye maid her duck
  • Aye mate, her duck
  • I maid her duck
  • I'm aid her duck
  • I mate her duck
  • I'm ate her duck
  • I'm ate or duck
  • I mate or duck

11
Models and Algorithms
  • Models: formalisms used to capture the various
    kinds of linguistic structure.
  • State machines (FSAs, transducers, Markov models)
  • Formal rule systems (context-free grammars,
    feature systems)
  • Logic (predicate calculus, inference)
  • Probabilistic versions of all of these and others
    (Gaussian mixture models, probabilistic
    relational models, etc.)
  • Algorithms: used to manipulate representations to
    create structure.
  • Search (A*, dynamic programming)
  • Supervised learning, etc.

12
Language, Thought, Understanding
  • A Gedanken experiment: the Turing Test
  • The question "can a machine think?" is not
    operational.
  • Operational version:
  • 2 people and a computer
  • Interrogator talks to contestant and computer via
    teletype
  • Task of machine is to convince interrogator it is
    human
  • Task of contestant is to convince interrogator
    she and not machine is human.

13
History: foundational insights, 1940s-1950s
  • Automaton
  • Turing (1936)
  • McCulloch-Pitts neuron (1943)
  • http://diwww.epfl.ch/mantra/tutorial/english/mcpits/html/
  • Kleene (1951/1956)
  • Shannon (1948): link between automata and Markov
    models
  • Chomsky (1956) / Backus (1959) / Naur (1960): CFG
  • Probabilistic/information-theoretic models
  • Shannon (1948)
  • Bell Labs speech recognition (1952)

14
History: the two camps, 1957-1970
  • Symbolic
  • Zellig Harris (1958): TDAP, first parser
  • Cascade of finite-state transducers
  • Chomsky
  • AI workshop at Dartmouth (McCarthy, Minsky,
    Shannon, Rochester)
  • Newell and Simon: Logic Theorist, General Problem
    Solver
  • Statistical
  • Bledsoe and Browning (1959): Bayesian OCR
  • Mosteller and Wallace (1964): Bayesian authorship
    attribution
  • Denes (1959): ASR combining grammar and acoustic
    probability

15
Four paradigms: 1970-1983
  • Stochastic
  • Hidden Markov Model (1972)
  • Independent application by Baker (CMU) and the
    Jelinek/Bahl/Mercer lab (IBM), following work of
    Baum and colleagues at IDA
  • Logic-based
  • Colmerauer (1970, 1975): Q-systems
  • Definite Clause Grammars (Pereira and Warren
    1980)
  • Kay (1979): functional grammar; Bresnan and Kaplan
    (1982): unification
  • Natural language understanding
  • Winograd (1972): SHRDLU
  • Schank and Abelson (1977): scripts, story
    understanding
  • Influence of case-role work of Fillmore (1968)
    via Simmons (1973), Schank
  • Discourse modeling
  • Grosz and colleagues: discourse structure and
    focus
  • Perrault and Allen (1980): BDI model

16
Finite State Approach: 1983-1993
  • Finite State Models
  • Kaplan and Kay (1981): Phonology/Morphology
  • Church (1980): Syntax
  • Return of Probabilistic Models
  • Corpora created for language tasks
  • Early statistical versions of NLP applications
    (parsing, tagging, machine translation)
  • Increased focus on methodological rigor
  • Can't test your hypothesis on the data you used
    to build it!
  • Training sets and test sets

17
The field comes together: 1994-2007
  • NLP has borrowed statistical modeling from speech
    recognition; it is now standard
  • ACL conference
  • 1990: 39 articles, 1 statistical
  • 2003: 62 articles, 48 statistical
  • Machine learning techniques are key
  • NLP has borrowed the focus on web and search, and
    bag-of-words models, from information retrieval
  • Unified field
  • NLP, MT, ASR, TTS, Dialog, IR

18
Regular expressions
  • A formal language for specifying text strings
  • How can we search for any of these?
  • woodchuck
  • woodchucks
  • Woodchuck
  • Woodchucks


19
Regular Expressions
  • Basic regular expression patterns
  • Perl-based syntax (slightly different from other
    notations for regular expressions)
  • Disjunctions: /[wW]oodchuck/
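
A minimal sketch of bracket disjunction, using Python's re module as a stand-in for the Perl-style syntax on the slides. The optional plural s? and the word list are additions for illustration, not part of the slide.

```python
import re

# Character-class disjunction: [wW] matches either "w" or "W".
# The trailing s? (optional "s") is added here so that the plural
# forms are covered too.
pattern = re.compile(r"[wW]oodchucks?")

words = ["woodchuck", "woodchucks", "Woodchuck", "Woodchucks", "groundhog"]
matched = [w for w in words if pattern.fullmatch(w)]
print(matched)
```

Only the four woodchuck variants should survive the filter; "groundhog" is there as a non-matching control.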


20
Regular Expressions
  • Ranges: [A-Z]
  • Negations: [^Ss]
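
Ranges and negated classes can be tried out as follows. This is a small sketch in Python's re module; the sample strings are made up for the example.

```python
import re

text = "Sam saw 3 Snakes"

# Range: [A-Z] matches any single uppercase ASCII letter.
uppercase = re.findall(r"[A-Z]", text)

# Negation: a caret right after "[" inverts the class, so [^Ss]
# matches any single character that is neither "S" nor "s".
not_s = re.findall(r"[^Ss]", "Sass")

print(uppercase, not_s)
```

The caret only negates when it is the first character inside the brackets.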

21
Regular Expressions
  • Optional characters: ?, *, and +
  • ? (0 or 1)
  • /colou?r/ → color or colour
  • * (0 or more)
  • /oo*h!/ → oh! or ooh! or ooooh!
  • + (1 or more)
  • /o+h!/ → oh! or ooh! or ooooh!
  • Wild card: .
  • /beg.n/ → begin or began or begun
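
The counters and the wildcard from this slide, sketched with Python's re module. The candidate word lists (including the deliberate non-matches) are assumptions added for the example.

```python
import re

# ? makes the preceding character optional (0 or 1 occurrences).
colour = [w for w in ["color", "colour", "colouur"] if re.fullmatch(r"colou?r", w)]

# * allows 0 or more of the preceding character; + requires at least 1.
candidates = ["oh!", "ooh!", "ooooh!", "h!"]
star = [w for w in candidates if re.fullmatch(r"oo*h!", w)]
plus = [w for w in candidates if re.fullmatch(r"o+h!", w)]

# . matches any single character (except a newline by default).
wild = [w for w in ["begin", "began", "begun", "begn"] if re.fullmatch(r"beg.n", w)]

print(colour, star, plus, wild)
```

Note that /oo*h!/ and /o+h!/ accept the same strings: one literal "o" followed by zero or more is the same as one or more.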


22
Regular Expressions
  • Anchors: ^ and $
  • /^[A-Z]/ → "Ramallah, Palestine"
  • /^[^A-Z]/ → "verdad?" "really?"
  • /\.$/ → "It is over."
  • /.$/ → ?
  • Boundaries: \b and \B
  • /\bon\b/ → matches "on" in "on my way", but not in "Monday"
  • /\Bon\b/ → "automaton"
  • Disjunction: |
  • /yours|mine/ → "it is either yours or mine"
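
A short sketch of anchors, boundaries, and the pipe operator in Python's re module. The string "automaton on" is an assumption added to contrast \B with \b.

```python
import re

# ^ anchors at the start of the string, $ at the end.
starts_upper = bool(re.search(r"^[A-Z]", "Ramallah, Palestine"))
starts_non_upper = bool(re.search(r"^[^A-Z]", "really?"))
ends_with_period = bool(re.search(r"\.$", "It is over."))

# \b is a word boundary: only the standalone "on" matches,
# not the "on" inside "Monday".
on_word = [m.start() for m in re.finditer(r"\bon\b", "on my way Monday")]

# \B is a non-boundary: the "on" must continue a word on its left,
# so only the ending of "automaton" matches, not the standalone "on".
on_suffix = [m.start() for m in re.finditer(r"\Bon\b", "automaton on")]

# | is disjunction between whole alternatives.
either = re.findall(r"yours|mine", "it is either yours or mine")

print(starts_upper, starts_non_upper, ends_with_period, on_word, on_suffix, either)
```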


23
Disjunction, Grouping, Precedence
  • Column 1 Column 2 Column 3: how do we
    express this?
  • /Column [0-9]+ */
  • /(Column [0-9]+ *)*/
  • Precedence (highest to lowest)
  • Parenthesis: ()
  • Counters: * + ?
  • Sequences and anchors: the ^my end$
  • Disjunction: |
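
The effect of grouping and precedence can be checked directly. A minimal sketch in Python's re module; the the/any pair is an illustrative choice, not taken from the slide.

```python
import re

line = "Column 1 Column 2 Column 3"

# Without parentheses, * applies only to the space before it,
# so the pattern covers a single "Column N" unit.
single = re.match(r"Column [0-9]+ *", line).group()

# With parentheses, * applies to the whole group, so the unit can repeat.
repeated = re.match(r"(Column [0-9]+ *)*", line).group()

# Sequence binds tighter than disjunction:
# the|any means "the" or "any", not "th" + ("e" or "a") + "ny".
words = ["the", "any", "thany"]
the_or_any = [w for w in words if re.fullmatch(r"the|any", w)]
grouped = [w for w in words if re.fullmatch(r"th(e|a)ny", w)]

print(repr(single), repr(repeated), the_or_any, grouped)
```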


24
Example
  • Find me all instances of the word "the" in a
    text.
  • /the/
  • Misses capitalized examples
  • /[tT]he/
  • Returns "other" or "theology"
  • /\b[tT]he\b/
  • /[^a-zA-Z][tT]he[^a-zA-Z]/
  • /(^|[^a-zA-Z])[tT]he[^a-zA-Z]/
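
The last two patterns differ only in how they treat the start of the text. A small sketch in Python's re module, with a made-up sample sentence:

```python
import re

text = "The cat and the dog"

# Requires a non-letter on both sides, so a match at the very start
# of the text is impossible: the initial "The" is missed.
both_sides = [m.group() for m in re.finditer(r"[^a-zA-Z][tT]he[^a-zA-Z]", text)]

# (^|[^a-zA-Z]) accepts either the start of the text or a non-letter.
with_start = [m.group() for m in re.finditer(r"(^|[^a-zA-Z])[tT]he[^a-zA-Z]", text)]

print(both_sides, with_start)
```

Both patterns consume the neighbouring non-letter characters, which is why the matches include the surrounding spaces.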


25
Errors
  • The process we just went through was based on
    fixing two kinds of errors
  • Matching strings that we should not have matched
    (there, then, other)
  • False positives
  • Not matching things that we should have matched
    (The)
  • False negatives
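
The two error types can be counted for each candidate pattern. This is a sketch under assumptions: the sample text, the whitespace tokenization used to build the gold standard, and the helper name errors are all hypothetical choices for the example.

```python
import re

text = "The other day the theology class read there"

# Gold standard: start offsets of whitespace-separated tokens
# that are exactly the word "the" (ignoring case).
gold = {m.start() for m in re.finditer(r"\S+", text) if m.group().lower() == "the"}

def errors(pattern):
    """Return (false positives, false negatives) for a pattern."""
    found = {m.start() for m in re.finditer(pattern, text)}
    false_positives = found - gold   # matched, but should not have been
    false_negatives = gold - found   # should have matched, but was missed
    return len(false_positives), len(false_negatives)

for pattern in [r"the", r"[tT]he", r"\b[tT]he\b"]:
    print(pattern, errors(pattern))
```

Each refinement removes one kind of error: [tT] fixes the false negative on "The", and the \b boundaries remove the false positives inside "other", "theology", and "there".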

26
More complex RE example
  • Regular expressions for prices
  • /\$[0-9]+/
  • Doesn't deal with fractions of dollars
  • /\$[0-9]+\.[0-9][0-9]/
  • Doesn't allow $199, not word-aligned
  • /\b\$[0-9]+(\.[0-9][0-9])?\b/
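
A hedged sketch of the price pattern in Python's re module. Two choices here are assumptions rather than the slide's own: the dollar sign is escaped so it is read as a literal character, and the leading \b from the final pattern is left out, since "$" is not a word character and a boundary directly before it would only match when a letter, digit, or underscore precedes the dollar sign. The sample sentence is made up.

```python
import re

text = "It costs $199 or $24.99 per unit."

# Escaped dollar sign, one or more digits, an optional two-digit
# cents part, then a word boundary after the last digit.
price = re.compile(r"\$[0-9]+(\.[0-9][0-9])?\b")
prices = [m.group() for m in price.finditer(text)]

# With a leading \b, a price preceded by a space is not found,
# because there is no word boundary between " " and "$".
with_leading_b = re.findall(r"\b\$[0-9]+", text)

print(prices, with_leading_b)
```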

27
Advanced operators