Introduction to Computational Linguistics - PowerPoint PPT Presentation

1
Introduction to Computational Linguistics
  • Lecture 1: Intro to the Field, History, Quick Review
    of Regular Expressions, Start of Finite Automata
  • Based on Dan Jurafsky's lecture notes for the
    textbook Speech and Language Processing

2
  • Overview and history of the field
  • Knowledge of language
  • The role of ambiguity
  • Models and Algorithms
  • Eliza, Turing, and conversational agents
  • History of speech and language processing
  • Regular expressions

3
Computer Speech and Language Processing
  • What is Natural Language Processing?
  • Getting computers to perform useful tasks
    involving human languages, whether for:
  • Enabling human-machine communication
  • information retrieval, question answering, chatbots
  • Improving human-human communication
  • writing, translation
  • Manipulating language objects
  • words, phrases, sentences

4
Kinds of knowledge needed?
  • Consider the following interaction with HAL, the
    computer from 2001: A Space Odyssey
  • Dave: Open the pod bay doors, HAL.
  • HAL: I'm sorry Dave, I'm afraid I can't do that.

5
Knowledge needed to build HAL?
  • Speech recognition and synthesis
  • Dictionaries (how words are pronounced)
  • Phonetics (how to recognize/produce each sound of
    English)
  • Natural language understanding
  • Knowledge of the English words involved
  • What they mean
  • How they combine (what is a pod bay door?)
  • Knowledge of syntactic structure
  • Compare the scrambled "I'm I do, sorry that afraid Dave I'm can't"

6
What's needed?
  • Dialog and pragmatic knowledge
  • "Open the door" is a REQUEST (as opposed to a
    STATEMENT or information-question)
  • Politeness considerations
  • It is polite to pretend to want to be cooperative
  • "I'm afraid, I can't"
  • Anaphoric reference
  • What is "that" in "I can't do that"?
  • More practical and realistic situation
  • A system to book airline flights
  • Also needs much of this kind of knowledge

7
Modern conversational agents
  • Switchboard Receptionists
  • Implications: need to know how a native or
    non-native speaker of English is likely to
    pronounce a random American name. Eek.
  • Airline and other simple dialogue systems
  • Lots of fabulously fun pragmatic issues
  • Grounding
  • Error handling
  • In general
  • Now that there are commercial installations
  • Big incentive to use linguistic or any other kind
    of knowledge if it actually improves performance

8
Eliza: Weizenbaum (1966)
  • User: You are like my father in some ways.
  • ELIZA: What resemblance do you see?
  • User: You are not very aggressive but I think
    you don't want me to notice that.
  • ELIZA: What makes you think I am not aggressive?
  • User: You don't argue with me.
  • ELIZA: Why do you think I don't argue with you?
  • User: You are afraid of me.
  • ELIZA: Does it please you to believe I am afraid
    of you?
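
Replies like these can be produced by pattern substitution alone, with no understanding of the input. The following is a minimal sketch in Python, not Weizenbaum's original script: the rule list `RULES`, the pronoun table `SWAPS`, the fallback reply, and the function names are all illustrative assumptions.

```python
import re

# Hypothetical pronoun swaps applied to the text captured from the user.
SWAPS = {"i": "you", "me": "you", "my": "your", "you": "I", "your": "my"}

# Hypothetical (pattern, reply template) rules, tried in order.
RULES = [
    (re.compile(r"you are (.*?)[.!?]*$", re.IGNORECASE),
     "What makes you think I am {0}?"),
    (re.compile(r"you (?:don't|do not) (.*?)[.!?]*$", re.IGNORECASE),
     "Why do you think I don't {0}?"),
]

def reflect(text):
    """Swap first- and second-person words so the reply reads naturally."""
    return " ".join(SWAPS.get(word.lower(), word) for word in text.split())

def respond(utterance):
    """Return the reply of the first rule whose pattern matches."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."  # hypothetical fallback when no rule matches

print(respond("You don't argue with me"))
# Why do you think I don't argue with you?
```

A real script would need many more rules and some ranking among them; the point here is only that a regular expression plus a template already yields a plausible turn.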

9
Question Answering
  • What does "door" mean?
  • What year was Abraham Lincoln born?
  • How many states were in the United States when
    Lincoln was born?
  • Was there a military draft during the Hoover
    administration?
  • What do US scientists think about whether human
    cloning should be legal?

10
Modern QA systems
  • Still in infancy
  • Simple factoid questions beginning to work OK
  • Annual government-sponsored bakeoff called TREC

11
Machine Translation
  • [Chinese source text: characters lost in transcript extraction]
  • Chinese gloss: Dai-yu alone on bed top
    think-of-with-gratitude Bao-chai again listen to
    window outside bamboo tip plantain leaf of on-top
    rain sound sigh drop clear cold penetrate curtain
    not feeling again fall down tears come
  • Hawkes translation: As she lay there alone,
    Dai-yu's thoughts turned to Bao-chai... Then she
    listened to the insistent rustle of the rain on
    the bamboos and plantains outside her window.
    The coldness penetrated the curtains of her bed.
    Almost without noticing it she had begun to cry.

12
Machine Translation
  • The Story of the Stone
  • The Dream of the Red Chamber (Cao Xueqin 1792)
  • Issues (Language Divergences)
  • Sentence segmentation
  • Zero-anaphora
  • Coding of tense/aspect
  • Penetrate -> penetrated
  • Stylistic differences across languages
  • Bamboo tip plantain leaf -> bamboos and
    plantains
  • Cultural knowledge
  • Curtain -> curtains of her bed

13
Open MT Evaluation 2008
  • Input
  • <doc docid="AFP_CMN_20070702.0022" genre="text"
    sysid="source">
  • <hl>
  • <seg id="1">[Chinese source text]</seg>
  • </hl>
  • <p>
  • <seg id="2">[Chinese source text]</seg>
  • </p>
  • <seg id="6">[Chinese source text]</seg>
  • </p>
  • </doc>

14
Open MT Evaluation 2008
  • Output
  • <doc docid="AFP_CMN_20070702.0022">
  • <hl><seg id="1">White House Pushes for Nuclear
    Inspectors to Be Sent as Soon as Possible to
    Monitor North Korea's Closure of Its Nuclear
    Reactors</seg></hl><p>
  • <seg id="2">The White House today called for
    nuclear inspectors to be sent as soon as possible
    to monitor North Korea's closure of its nuclear
    reactors. The White House made this call after US
    President Bush had telephone conversations with
    South Korean President Roh Moo-hyun.</seg> ...
  • <seg id="6">Hill, the US envoy to the six-party
    talks, said after a visit to Pyongyang last week
    that he expected the Yongbyon nuclear reactors
    would be shut down in the middle of July.</seg>
  • </p></doc>

15
Ambiguity
  • Language is full of ambiguity at all levels
  • Token boundary: "ice cream" vs. "I scream"
  • Part of speech: "walk" as verb vs. noun
  • Word sense ambiguity: money "bank" vs. river "bank"
  • Fundamental problem of computational linguistics
  • Resolving ambiguity is a crucial goal
  • Example: find at least 5 meanings of this
    sentence:
  • I made her duck

16
Ambiguity
  • Find at least 5 meanings of this sentence
  • I made her duck
  • I cooked waterfowl for her benefit (to eat)
  • I cooked waterfowl belonging to her
  • I created the (plaster?) duck she owns
  • I caused her to quickly lower her head or body
  • I waved my magic wand and turned her into
    undifferentiated waterfowl
  • At least one other meaning that's inappropriate
    for gentle company.

17
Ambiguity is Pervasive
  • I caused her to quickly lower her head or body
  • Lexical category: "duck" can be a N or V
  • I cooked waterfowl belonging to her.
  • Lexical category: "her" can be a possessive ("of
    her") or dative ("for her") pronoun
  • I made the (plaster) duck statue she owns
  • Lexical semantics: "make" can mean "create" or
    "cook"

18
Ambiguity is Pervasive
  • Grammar: "make" can be
  • Transitive (verb has a noun direct object)
  • I cooked waterfowl belonging to her
  • Ditransitive (verb has 2 noun objects)
  • I made her (into) undifferentiated waterfowl
  • Action-transitive (verb has a direct object and
    another verb)
  • I caused her to move her body

19
Ambiguity is Pervasive
  • Phonetics!
  • I mate or duck
  • I'm eight or duck
  • Eye maid her duck
  • Aye mate, her duck
  • I maid her duck
  • I'm aid her duck
  • I mate her duck
  • I'm ate her duck
  • I'm ate or duck
  • I mate or duck

20
Models and Algorithms
  • Models: formalisms used to capture the various
    kinds of linguistic structure.
  • State machines (FSA, FS Transducers, Markov
    models)
  • Formal rule systems
  • (Context-Free Grammars, Feature-based Grammars)
  • Logic (predicate calculus, inference)
  • Probabilistic versions of all of these, plus others
  • (Gaussian Mixture Models, probabilistic relational
    models, etc.)
  • Algorithms: used to manipulate representations to
    create structure.
  • Search (A*, dynamic programming)
  • Supervised learning, etc.

21
Language, thought, and Machine Understanding
  • A Gedanken experiment: the Turing Test
  • The question "can a machine think?" is not
    operational.
  • Operational version:
  • 2 people and a computer
  • Interrogator talks to contestant and computer via
    teletype
  • Task of machine is to convince interrogator it is
    human
  • Task of contestant is to convince interrogator
    she and not machine is human.

Wikipedia: A thought experiment (from the German
term Gedankenexperiment, coined by Hans Christian
Ørsted) in the broadest sense is the use of a
hypothetical scenario to help us understand the
way things actually are.
22
Eliza: Weizenbaum (1966)
  • User: You are like my father in some ways.
  • ELIZA: What resemblance do you see?
  • User: You are not very aggressive but I think
    you don't want me to notice that.
  • ELIZA: What makes you think I am not aggressive?
  • User: You don't argue with me.
  • ELIZA: Why do you think I don't argue with you?
  • User: You are afraid of me.
  • ELIZA: Does it please you to believe I am afraid
    of you?

23
History: foundational insights, 1940s-1950s
  • Automaton
  • Turing (1936): Turing Machine (the most powerful
    machine in abstract terms)
  • McCulloch-Pitts neuron (1943)
  • http://diwww.epfl.ch/mantra/tutorial/english/mcpits/html/
  • Kleene (1951/1956): FSA recognizes a regular
    language.
  • Shannon (1948): link between automata and Markov
    models
  • Chomsky (1956)/Backus (1959)/Naur (1960): CFG
  • Probabilistic/Information-theoretic models
  • Shannon (1948)
  • Bell Labs speech recognition (1952)

24
History: the two camps, 1957-1970
  • Symbolic
  • Zellig Harris (1958), TDAP: first parser?
  • Cascade of finite-state transducers
  • Chomsky: Generative Grammar
  • AI workshop at Dartmouth (McCarthy, Minsky,
    Shannon, Rochester)
  • Newell and Simon: Logic Theorist, General Problem
    Solver
  • Statistical
  • Bledsoe and Browning (1959): Bayesian OCR
  • Mosteller and Wallace (1964): Bayesian authorship
    attribution
  • Denes (1959): ASR combining grammar and acoustic
    probability

25
Four paradigms: 1970-1983
  • Stochastic
  • Hidden Markov Model (1972)
  • Independent application by Baker (CMU) and the
    Jelinek/Bahl/Mercer lab (IBM), following work of
    Baum and colleagues at IDA
  • Logic-based
  • Colmerauer (1970, 1975): Q-systems
  • Definite Clause Grammars (Pereira and Warren
    1980)
  • Kay (1979): functional grammar; Bresnan and Kaplan
    (1982): unification
  • Natural language understanding
  • Winograd (1972): SHRDLU
  • Schank and Abelson (1977): scripts, story
    understanding
  • Influence of case-role work of Fillmore (1968)
    via Simmons (1973), Schank.
  • Discourse modeling
  • Grosz and colleagues: discourse structure and
    focus
  • Perrault and Allen (1980): BDI model

26
Empiricism and Finite State Machines return:
1983-1993
  • Finite State Models
  • Kaplan and Kay (1981): Phonology/Morphology
  • Church (1980): Syntax
  • Return of Probabilistic Models
  • Corpora created for language tasks
  • Early statistical versions of NLP applications
    (parsing, tagging, machine translation)
  • Increased focus on methodological rigor
  • Can't test your hypothesis on the data you used
    to build it!
  • Training sets and test sets

27
The field comes together: 1994-2007
  • NLP has borrowed statistical modeling from speech
    recognition; it is now standard
  • ACL conference
  • 1990: 39 articles, 1 statistical
  • 2003: 62 articles, 48 statistical
  • Machine learning techniques key
  • NLP has borrowed focus on web and search and
    bag-of-words models from information retrieval
  • Unified field
  • NLP, MT, ASR, TTS, Dialog, IR

28
How this course fits in
  • This is our new introductory course in natural
    language processing
  • Covers applications
  • information retrieval
  • machine translation
  • educational applications
  • For speech and dialog processing, check other
    courses by ???

29
Some brief demos
  • Machine Translation
  • http://translate.google.com/translate_t

30
Regular expressions
  • A formal language for specifying text strings
  • How can we search for any of these?
  • woodchuck
  • woodchucks
  • Woodchuck
  • Woodchucks

Figure from Dorr/Monz slides
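
One way to cover all four spellings with a single pattern is sketched below. It assumes Python's `re` module in place of the Perl syntax used in the lecture, and the sample sentence and the name `WOODCHUCK` are made up for illustration.

```python
import re

# [wW] accepts either case of the first letter; s? makes the plural optional.
WOODCHUCK = re.compile(r"[wW]oodchucks?")

text = "How much wood would a Woodchuck chuck if woodchucks could chuck wood?"
found = WOODCHUCK.findall(text)
print(found)  # ['Woodchuck', 'woodchucks']
```

The bare word "wood" is not returned, because the pattern requires the full "oodchuck" sequence after the first letter.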
31
Regular Expressions
  • Basic regular expression patterns
  • Perl-based syntax (slightly different from other
    notations for regular expressions)
  • Disjunctions: /[wW]oodchuck/

Slide from Dorr/Monz
32
Regular Expressions
  • Ranges: [A-Z]
  • Negations: [^Ss]

Slide from Dorr/Monz
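
Bracketed disjunctions, ranges, and negations can be tried out directly. This is a small sketch using Python's `re` module (an assumption; the lecture uses Perl-style notation), with an invented sample string.

```python
import re

sample = "Sam saw 3 woodchucks; Woodchuck 7 slept."

either_case = re.findall(r"[wW]oodchuck", sample)  # w or W, then "oodchuck"
capitals = re.findall(r"[A-Z]", sample)            # any single capital letter
digits = re.findall(r"[0-9]", sample)              # any single digit
not_s = re.findall(r"[^Ss]", "Sass")               # any character except S or s

print(either_case)  # ['woodchuck', 'Woodchuck']
print(capitals)     # ['S', 'W']
print(digits)       # ['3', '7']
print(not_s)        # ['a']
```

Note that the caret means negation only when it is the first character inside the brackets.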
33
Regular Expressions
  • Optional characters: ?, * and +
  • ? (0 or 1)
  • /colou?r/ -> color or colour
  • * (0 or more)
  • /oo*h!/ -> oh! or ooh! or ooooh!
  • + (1 or more)
  • /o+h!/ -> oh! or ooh! or ooooh!
  • Wild cards: . /beg.n/ -> begin or began or begun

Slide from Dorr/Monz
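
The counters and the wildcard can be checked against small word lists. The sketch below assumes Python's `re` module; the helper `accepted` and the candidate words are illustrative, not part of the lecture.

```python
import re

def accepted(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

optional = accepted(r"colou?r", ["color", "colour", "colouur"])
zero_or_more = accepted(r"oo*h!", ["h!", "oh!", "ooh!", "ooooh!"])
one_or_more = accepted(r"o+h!", ["h!", "oh!", "ooh!", "ooooh!"])
wildcard = accepted(r"beg.n", ["begin", "began", "begun", "begn"])

print(optional)      # ['color', 'colour']
print(zero_or_more)  # ['oh!', 'ooh!', 'ooooh!']
print(one_or_more)   # ['oh!', 'ooh!', 'ooooh!']
print(wildcard)      # ['begin', 'began', 'begun']
```

Here `oo*h!` and `o+h!` accept the same strings: one required "o" followed by zero or more is the same as one or more.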
34
Regular Expressions
  • Anchors: ^ and $
  • /^[A-Z]/ -> "Ramallah, Palestine"
  • /^[^A-Z]/ -> "verdad?" "really?"
  • /\.$/ -> "It is over."
  • /.$/ -> ?
  • Boundaries: \b and \B
  • /\bon\b/ -> "on my way" (but not the "on" in "Monday")
  • /\Bon\b/ -> "automaton"
  • Disjunction: |
  • /yours|mine/ -> "it is either yours or mine"

Slide from Dorr/Monz
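
Anchors, word boundaries, and the disjunction operator behave as follows in Python's `re` module (an assumption; Perl behaves the same way for these operators). The variable names are illustrative.

```python
import re

# ^ and $ tie a pattern to the start or end of the string.
starts_with_capital = re.search(r"^[A-Z]", "Ramallah, Palestine") is not None
starts_without_capital = re.search(r"^[^A-Z]", "verdad?") is not None
ends_with_period = re.search(r"\.$", "It is over.") is not None

# \b is a word boundary; \B is a position that is not a word boundary.
whole_word_on = re.findall(r"\bon\b", "on my way Monday")
suffix_on = re.search(r"\Bon\b", "automaton").span()

# | matches either alternative.
either = re.findall(r"yours|mine", "it is either yours or mine")

print(starts_with_capital, starts_without_capital, ends_with_period)  # True True True
print(whole_word_on)  # ['on']
print(suffix_on)      # (7, 9)
print(either)         # ['yours', 'mine']
```

The "on" inside "Monday" is skipped by `\bon\b` because it has letters on both sides, while `\Bon\b` finds only an "on" that ends a longer word.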
35
Disjunction, Grouping, Precedence
  • Column 1 Column 2 Column 3 How do we
    express this?
  • /Column [0-9]+ */
  • /(Column [0-9]+ *)*/
  • Precedence
  • Parenthesis: ()
  • Counters: * + ? {}
  • Sequences and anchors: the ^my end$
  • Disjunction: |
  • REs are greedy!

Slide from Dorr/Monz
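
The effect of grouping, operator precedence, and greedy matching can be seen in a few lines. This is a sketch with Python's `re` module (an assumption), and the variable names are illustrative.

```python
import re

row = "Column 1 Column 2 Column 3"

# Without parentheses the star applies only to the space before it.
one_column = re.match(r"Column [0-9]+ *", row).group()
# Parentheses make the star repeat the whole "Column N " unit.
all_columns = re.match(r"(Column [0-9]+ *)*", row).group()

# Counters bind tighter than sequences: the star applies to "e" alone.
star_on_e = re.fullmatch(r"the*", "theee") is not None
star_on_the = re.fullmatch(r"the*", "thethe") is not None

# Greedy matching: the pattern takes the longest string it can.
greedy = re.search(r"[a-z]*", "once upon a time").group()

print(repr(one_column))        # 'Column 1 '
print(repr(all_columns))       # 'Column 1 Column 2 Column 3'
print(star_on_e, star_on_the)  # True False
print(repr(greedy))            # 'once'
```

`[a-z]*` could legally match the empty string or just "o", but a greedy counter extends as far as possible, so it returns "once".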
36
Example
  • Find me all instances of the word "the" in a
    text.
  • /the/
  • Misses capitalized examples
  • /[tT]he/
  • Returns "other" or "theology"
  • /\b[tT]he\b/
  • /[^a-zA-Z][tT]he[^a-zA-Z]/
  • /(^|[^a-zA-Z])[tT]he[^a-zA-Z]/

Slide from Dorr/Monz
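
The successive refinements can be compared by counting matches on one sentence. This sketch assumes Python's `re` module; the sentence and the dictionary keys are invented for illustration.

```python
import re

text = "The other day the theology student read the book."

patterns = {
    "plain": r"the",
    "either_case": r"[tT]he",
    "word_boundary": r"\b[tT]he\b",
    "explicit_context": r"(^|[^a-zA-Z])[tT]he[^a-zA-Z]",
}

# finditer is used so that capture groups do not change what is counted.
counts = {name: sum(1 for _ in re.finditer(p, text)) for name, p in patterns.items()}
print(counts)
# {'plain': 4, 'either_case': 5, 'word_boundary': 3, 'explicit_context': 3}
```

The sentence contains three real instances of the word. The first two patterns also count the "the" inside "other" and "theology", and the plain one misses "The". One caveat about the last pattern: it demands a non-letter after the word, so it would miss an instance at the very end of a string.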
37
Errors
  • The process we just went through was based on
    fixing two kinds of errors
  • Matching strings that we should not have matched
    (there, then, other)
  • False positives
  • Not matching things that we should have matched
    (The)
  • False negatives

38
Errors cont.
  • We'll be telling the same story for many tasks,
    all quarter. Reducing the error rate for an
    application often involves two antagonistic
    efforts:
  • Increasing accuracy (minimizing false positives)
  • Increasing coverage (minimizing false negatives).
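
The two kinds of error can be counted explicitly for each version of a pattern. The sketch below assumes Python's `re` module and a tiny hand-labelled token list; the list `labelled` and the function `errors` are hypothetical, made up for this illustration.

```python
import re

# Hypothetical hand-labelled tokens: True means the token really is the word "the".
labelled = [
    ("the", True), ("The", True),
    ("other", False), ("theology", False), ("then", False), ("there", False),
]

def errors(pattern):
    """Return (false positives, false negatives) for a pattern on the labelled tokens."""
    false_pos = sum(1 for tok, gold in labelled if re.search(pattern, tok) and not gold)
    false_neg = sum(1 for tok, gold in labelled if not re.search(pattern, tok) and gold)
    return false_pos, false_neg

print(errors(r"the"))          # (4, 1)
print(errors(r"[tT]he"))       # (4, 0)
print(errors(r"\b[tT]he\b"))   # (0, 0)
```

Allowing the capital letter removes the false negative, and adding word boundaries removes the false positives. On real data the two efforts usually trade off rather than both reaching zero.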

39
More complex RE example
  • Regular expressions for prices
  • /\$[0-9]+/
  • Doesn't deal with fractions of dollars
  • /\$[0-9]+\.[0-9][0-9]/
  • Doesn't allow $199, not word-aligned
  • /\b\$[0-9]+(\.[0-9][0-9])?\b/
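
The price patterns can be tested step by step. This sketch assumes Python's `re` module, where a literal dollar sign must be escaped as `\$`; the sample sentence is invented. It also drops the leading `\b` in its final pattern, because a boundary before `$` only exists when a word character comes immediately before it, which is an observation about Python's behavior rather than something stated in the lecture.

```python
import re

text = "Tickets cost $199, snacks $4.50, and a $1000 deposit."

dollars_only = re.findall(r"\$[0-9]+", text)
with_cents = re.findall(r"\$[0-9]+\.[0-9][0-9]", text)
optional_cents = re.findall(r"\$[0-9]+(?:\.[0-9][0-9])?\b", text)
leading_boundary = re.findall(r"\b\$[0-9]+(?:\.[0-9][0-9])?\b", text)

print(dollars_only)      # ['$199', '$4', '$1000']
print(with_cents)        # ['$4.50']
print(optional_cents)    # ['$199', '$4.50', '$1000']
print(leading_boundary)  # []
```

The group is written `(?:...)` so that `findall` returns whole matches instead of only the captured cents.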

40
Advanced operators
Slide from Dorr/Monz
41
Conclusion
  • Overview and history of the field
  • Knowledge of language
  • The role of ambiguity
  • Models and Algorithms
  • Eliza, Turing, and conversational agents
  • History of computational linguistics
  • The merger of 4 fields: NLP, Speech Recognition,
    Dialog, Information Retrieval
  • Regular expressions
  • Finite State Automata