Title: LING/C SC/PSYC 438/538 Computational Linguistics
1LING/C SC/PSYC 438/538 Computational Linguistics
- Sandiway Fong
- Lecture 1 8/21
2Part 1
3Administrivia
- Where
- S SCI 224
- When
- TR 12:30-1:45PM (Computer Lab)
- No Class Scheduled For
- Thursday October 18th
- Thursday November 22nd (Thanksgiving)
- Office Hours
- catch me after class, or
- by appointment
- Location: Douglass 311
4Administrivia
- Map
- Office (Douglass)
- Classroom (S SCI)
5Administrivia
- Email
- sandiway_at_email.arizona.edu
- Homepage
- http://dingo.sbs.arizona.edu/~sandiway
- Lecture slides
- available on homepage after each class
- in both PowerPoint (.ppt) and Adobe PDF formats
- animation in powerpoint
6Administrivia
- Course Objectives
- Theoretical
- Introduction to a broad selection of natural language processing techniques
- Survey course
- Practical
- Acquire some expertise
- Use of tools
- Parsing algorithms
- Write grammars and machines
7Administrivia
- Reference Textbook
- Speech and Language Processing, Jurafsky & Martin, Prentice-Hall 2000
- 21 chapters (900 pages)
- Concepts, algorithms, heuristics
- This course concentrates on the sentence-level stuff
- Sound/speech side
- Prof. Y. Lin: Speech Tech LING 578 (this semester)
- Prof. Y. Lin: Statistical NLP LING 539 (Spring 2008)
- More advanced course
- LING 581 Advanced Computational Linguistics
- required for HLT Masters Program students
8Administrivia
- Laboratory Exercises
- To run tools and write grammars
- you need access to computational facilities
- use your PC or Mac
- run Windows, Linux or MacOSX
- Homework exercises
9Administrivia
- Grading
- 3 homeworks
- Exams
- a mid-term
- a final
- mix of theoretical and practical exercises
10Administrivia
- Homeworks
- Homeworks will be presented/explained in class
- (good chance to ask questions)
- Please attempt homeworks early
- (then you can ask questions before the deadline)
- you have one week to do the homework
- (midnight deadline)
- (email submission to me)
- e.g. homework comes out on Thursday,
- it is due in my mailbox by next Thursday midnight
11Administrivia
- Homework Policy
- You may discuss your homework with others
- You must write up your homework by yourself
- You must cite sources and references
- Code of Academic Integrity
- http://dos.web.arizona.edu/uapolicies/cai1.html
- Late homeworks are subject to points deduction
- Really late homeworks, e.g. a week late, will not be accepted
- Emergencies and scheduled absences: inform instructor to make alternative arrangements
12Administrivia
- Requirements 438 vs. 538
- 538
- 438
- 1 classroom presentation of a selected chapter from the textbook
- 438 extra credit homework and exam questions are obligatory
13Administrivia
14Class Questionnaire
- I'll pass my laptop around ...
- Use PhotoBooth
- Fill in Excel spreadsheet
- Name
- PhotoBooth
- Email
- Major
- Any programming expertise?
- Have a laptop?
- Knowledge of Linguistics?
click on red button to take a picture of yourself
15Part 2
16Human Language Technology (HLT)
- ... is everywhere
- information is organized and accessed using
language
17Human Language Technology (HLT)
- Beginnings
- c. 1950 (just after WWII)
- Electronic computers invented for
- numerical analysis
- code breaking
- Grand Challenges for Computers...
- Killer Apps
- Language comprehension tasks and Machine Translation (MT)
- References
- Readings in Machine Translation
- Eds. Nirenburg, S. et al. MIT Press 2003.
- (Part 1 Historical Perspective)
- Read Chapter 1 of the textbook
- www.cs.colorado.edu/~martin/SLP/slp-ch1.pdf
18Human Language Technology (HLT)
- Cryptoanalysis Basis
- early optimism
- Translation. Weaver, W.
- Citing Shannon's work, he asks:
- If we have useful methods for solving almost any
cryptographic problem, may it not be that with
proper interpretation we already have useful
methods for translation?
19Human Language Technology (HLT)
- Popular in the early days and has undergone a modern revival
- The Present Status of Automatic Translation of Languages (Bar-Hillel, 1951)
- I believe this overestimation is a remnant of the time, seven or eight years ago, when many people thought that the statistical theory of communication would solve many, if not all, of the problems of communication
- Much valuable time spent on gathering statistics
20Human Language Technology (HLT)
- uneasy relationship between linguistics and statistical analysis
- Statistical Methods and Linguistics (Abney, 1996)
- Chomsky vs. Shannon
- Statistics and low (zero) frequency items
- Smoothing
- No relation between order of approximation and grammaticality
- Parameter estimation problem is intractable (for humans)
- IBM (17 million parameters)
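The "low (zero) frequency items" and "Smoothing" bullets can be illustrated with a minimal sketch of add-one (Laplace) smoothing for bigram probabilities. This is only one simple smoothing method, chosen here for brevity; the toy corpus and the function name `add_one_bigram_prob` are assumptions made for illustration, not material from the lecture.

```python
from collections import Counter


def add_one_bigram_prob(tokens, w1, w2):
    """Estimate P(w2 | w1) from a token list using add-one smoothing."""
    vocab = set(tokens)
    # How often each word occurs as the first word of a bigram.
    history_counts = Counter(tokens[:-1])
    # How often each adjacent word pair occurs.
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    # Adding 1 to every bigram count gives unseen pairs a nonzero probability.
    return (bigram_counts[(w1, w2)] + 1) / (history_counts[w1] + len(vocab))


# Toy corpus (assumed for illustration only).
tokens = "the girls will read the paper".split()

seen = add_one_bigram_prob(tokens, "the", "girls")   # bigram occurs once
unseen = add_one_bigram_prob(tokens, "the", "will")  # bigram never occurs
print(seen, unseen)
```

Without smoothing, the unseen bigram would receive probability zero; with it, every possible continuation of "the" keeps some probability mass and the estimates still sum to one over the vocabulary.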
21Human Language Technology (HLT)
- recent exciting developments in HLT
- precipitated by progress in
- computers: stochastic machine learning methods
- storage: large amounts of training data
- general availability of corpora (Linguistic Data Consortium)
- University of Arizona Library System is a subscriber
- you can borrow many CDROMs of data
22Human Language Technology (HLT)
23Natural Language Processing (NLP) / Computational Linguistics
- Question
- How to process natural languages on a computer
- Intersects with
- Computer science (CS)
- Mathematics/Statistics
- Artificial intelligence (AI)
- Linguistic Theory
- Psychology: Psycholinguistics
- e.g. the human sentence processor
24Natural Language Properties
- which properties are going to be difficult for computers to deal with?
- Grammar (Rules for putting words together into sentences)
- How many rules are there?
- 100, 1000, 10000, more
- Portions learnt or innate
- Do we have all the rules written down somewhere?
- Lexicon (Dictionary)
- How many words do we need to know?
- 1000, 10000, 100000
25Computers vs. Humans
- Knowledge of language
- Computers are way faster than humans
- They kill us at arithmetic and chess
- But human beings are so good at language, we often take our ability for granted
- Processed without conscious thought
- Exhibit complex behavior
IBM's Deep Blue
26Examples
- Innate Knowledge?
- Which report did you file without reading?
- (Parasitic gap sentence)
- file(x,y)
- read(u,v)
the report was filed without reading
x = you, y = report, u = x = you, v = y = report
and there are no other possible interpretations
27Examples
- Changes in interpretation
- John is too stubborn to talk to
- John is too stubborn to talk to Bill
talk_to(x,y)
(1) x = arbitrary person, y = John
(2) x = John, y = Bill
28Examples
- Ambiguity
- Where can I see the bus stop?
- stop: verb or part of the noun-noun compound bus stop
- Context (Discourse or situation)
- Where can I see the [NN bus stop]?
- Where can I see the bus [V stop]?
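The ambiguity above can be made concrete with a minimal sketch that enumerates every tag sequence a lexicon allows for the sentence. The toy lexicon `LEXICON` and the helper `tag_sequences` are hypothetical and hand-built for this one example; a real lexicon would list many more tags per word, and choosing between the readings still requires syntax and context.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to the tags it may carry.
# Only "stop" is listed as ambiguous (noun vs. verb).
LEXICON = {
    "where": ["WRB"],
    "can": ["MD"],
    "i": ["PRP"],
    "see": ["VB"],
    "the": ["DT"],
    "bus": ["NN"],
    "stop": ["NN", "VB"],
}


def tag_sequences(sentence):
    """Return every (word, tag) sequence the lexicon permits."""
    words = sentence.lower().rstrip("?").split()
    choices = [LEXICON[w] for w in words]
    return [list(zip(words, tags)) for tags in product(*choices)]


readings = tag_sequences("Where can I see the bus stop?")
for reading in readings:
    print(reading)
```

One ambiguous word yields two candidate readings here; with several ambiguous words the number of candidates multiplies, which is part of why ambiguity is hard for machines.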
29Examples
- Ungrammaticality
- Which book did you file the report without reading?
- ?Which book did you file it without reading?
- ungrammatical
- ungrammatical vs. incomprehensible
30Example
- The human parser has quirks
- Ian told the man that he hired a secretary
- Ian told the man that he hired a story
- Garden-pathing: a temporary ambiguity
- tell: multiple syntactic frames for the verb
Ian told the agent that he unmasked a secret
- Ian told the man that he hired a story
- Ian told the man that he hired a secretary
31Frequently Asked Questions from the Linguistic
Society of America (LSA)
- http://www.lsadc.org/info/ling-faqs.cfm
32- LSA (Linguistic Society of America) pamphlet
- by Ray Jackendoff
- A Linguist's Perspective on What's Hard for Computers to Do
- is he right?
33If computers are so smart, why can't they use
simple English?
- Consider, for instance, the four letters read; they can be pronounced as either reed or red. How does the machine know in each case which is the correct pronunciation? Suppose it comes across the following sentences:
- (1) The girls will read the paper. (reed)
- (2) The girls have read the paper. (red)
- We might program the machine to pronounce read as reed if it comes right after will, and red if it comes right after have. But then sentences (3) through (5) would cause trouble.
- (3) Will the girls read the paper? (reed)
- (4) Have any men of good will read the paper? (red)
- (5) Have the executors of the will read the paper? (red)
- How can we program the machine to make this come out right?
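The "right after will / right after have" rule from the passage above can be sketched in a few lines to see where it breaks. This is a minimal illustration under stated assumptions: the function name `naive_pronounce_read` and the regular-expression tokenization are choices made here, not part of the pamphlet.

```python
import re


def naive_pronounce_read(sentence):
    """Guess 'reed' or 'red' from the single word just before 'read'."""
    words = re.findall(r"[a-z]+", sentence.lower())
    previous = words[words.index("read") - 1]
    if previous == "will":
        return "reed"
    if previous == "have":
        return "red"
    return None  # the rule makes no prediction


# Sentences (1)-(5) with the pronunciations given in the text.
examples = [
    ("The girls will read the paper.", "reed"),
    ("The girls have read the paper.", "red"),
    ("Will the girls read the paper?", "reed"),
    ("Have any men of good will read the paper?", "red"),
    ("Have the executors of the will read the paper?", "red"),
]

results = [(s, expected, naive_pronounce_read(s)) for s, expected in examples]
for sentence, expected, guess in results:
    status = "ok" if guess == expected else "WRONG"
    print(f"{status:5} expected={expected} guess={guess}  {sentence}")
```

The rule handles (1) and (2), makes no prediction for (3), and gives the wrong answer for (4) and (5), where will is a noun and not an auxiliary.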
34If computers are so smart, why can't they use
simple English?
- (6) Have the girls who will be on vacation next week read the paper yet? (red)
- (7) Please have the girls read the paper. (reed)
- (8) Have the girls read the paper? (red)
- Sentence (6) contains both have and will before read, and both of them are auxiliary verbs. But will modifies be, and have modifies read. In order to match up the verbs with their auxiliaries, the machine needs to know that the girls who will be on vacation next week is a separate phrase inside the sentence.
- In sentence (7), have is not an auxiliary verb at all, but a main verb that means something like 'cause' or 'bring about'. To get the pronunciation right, the machine would have to be able to recognize the difference between a command like (7) and the very similar question in (8), which requires the pronunciation red.
35Berkeley Parser
- http://nlp.cs.berkeley.edu/Main.html#Parsing
The Berkeley Parser is the most accurate and one
of the fastest parsers for a variety of languages.
36Berkeley Parser
- (1) The girls will read the paper. (reed)
Verb Tags (Part of Speech Labels)
- VB - Verb, base form
- VBD - Verb, past tense
- VBG - Verb, gerund or present participle
- VBN - Verb, past participle
- VBP - Verb, non-3rd person singular present
- VBZ - Verb, 3rd person singular present
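Once a parser has assigned a verb tag to read, picking the pronunciation reduces to a table lookup. The sketch below assumes that base and present forms sound like reed and past forms like red; the tag sequences are hand-assigned for illustration and are not actual Berkeley Parser output, and the names `TAG_TO_SOUND` and `pronounce_read` are hypothetical.

```python
# Assumed mapping from the verb tag on "read" to its pronunciation.
TAG_TO_SOUND = {
    "VB": "reed",   # base form
    "VBP": "reed",  # non-3rd person singular present
    "VBD": "red",   # past tense
    "VBN": "red",   # past participle
}


def pronounce_read(tagged_sentence):
    """Return the pronunciation of the first 'read' in a (word, tag) list."""
    for word, tag in tagged_sentence:
        if word.lower() == "read":
            return TAG_TO_SOUND.get(tag)
    return None


# Hand-assigned tags (an assumption, not parser output).
sentence_1 = [("The", "DT"), ("girls", "NNS"), ("will", "MD"),
              ("read", "VB"), ("the", "DT"), ("paper", "NN")]
sentence_2 = [("The", "DT"), ("girls", "NNS"), ("have", "VBP"),
              ("read", "VBN"), ("the", "DT"), ("paper", "NN")]

print(pronounce_read(sentence_1))
print(pronounce_read(sentence_2))
```

The hard part stays with the parser: the lookup is only as good as the tag it is given, so sentences like (4) and (5) test whether will is tagged as a noun and read as VBN.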
37Berkeley Parser
- (2) The girls have read the paper. (red)
38Berkeley Parser
- (3) Will the girls read the paper? (reed)
39Berkeley Parser
- (4) Have any men of good will read the paper?
(red)
40Berkeley Parser
- (5) Have the executors of the will read the
paper? (red)
41Part 3
- software already installed here
42Your Homework for Today
- Download and Install Perl
- Active State Perl
- Install SWI-Prolog
- http://www.SWI-Prolog.org/
43Perl Resources
- http://www.perl.com/
- tutorials etc.
- http://perldoc.perl.org/perlintro.html
44Perl Resources
Google is your friend: many resources out there!
45Prolog Resources
- Useful Online Tutorials
- An introduction to Prolog
- (Michel Loiseleur & Nicolas Vigier)
- http://invaders.mars-attacks.org/boklm/prolog/
- Learn Prolog Now!
- (Patrick Blackburn, Johan Bos & Kristina Striegnitz)
- http://www.coli.uni-saarland.de/~kris/learn-prolog-now/lpnpage.php?pageid=online