Title: CSA2050 Introduction to Computational Linguistics
Lecture 1
- Course Information
- What is CL?
- What is L?
- Course Contents
Course Information
- Web: http://www.cs.um.edu.mt/mros/csa2050
- Lecturers:
  - mike.rosner_at_um.edu.mt
  - ray.fabri_at_um.edu.mt
  - angelo.dalli_at_um.edu.mt
- Book (nominally): Jurafsky & Martin, Speech and Language Processing, Prentice Hall, 2000, ISBN 0-13-095069-6
- NLTK (Natural Language Toolkit)
Human Language Technologies
- Natural Language Processing (NLP)
  - Computational models of language analysis, interpretation, and generation
  - The syntax/semantics interface
- Natural Language Engineering
  - Emphasis on large-scale performance
  - Example: Google
- Speech Technology
- Computational Linguistics
  - Emphasis on mechanised linguistic theories
  - Grew out of early Machine Translation efforts
CL: Two Main Disciplines
- Computer Science
- Linguistics
Linguistics
- Phonetics: the study of speech sounds
- Phonology: the study of sound systems
- Morphology: the study of word structure
- Syntax: the study of sentence structure
- Semantics: the study of meaning
- Pragmatics: the study of language use
Noam Chomsky
- Noam Chomsky's work in the 1950s radically changed linguistics, making syntax central.
- Chomsky has been the dominant figure in linguistics ever since.
- Chomsky invented the generative approach to grammar.
Generative Grammar: Key Points
- A language is a (possibly infinite) set of strings.
- A grammar is a finite description of that set.
- The grammar is precisely defined.
- A theory of grammar is a theory of human linguistic abilities.
- A grammar should generate all and only the strings of the language.
- Source: Sag and Wasow
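The second and third points can be made concrete with a small sketch. The two-rule grammar below is a hypothetical example, not one from the course: because one rule is recursive, a finite description yields an unbounded set of strings, and the recognizer `is_np` (an illustrative name) decides membership precisely.

```python
from itertools import islice

# Hypothetical two-rule grammar (illustrative, not from the slides):
#   NP -> "John"
#   NP -> "the friend of" NP
PREFIX = "the friend of "

def noun_phrases():
    """Yield the strings of the NP language, shortest first; the stream never ends."""
    np = "John"
    while True:
        yield np
        np = PREFIX + np  # apply the recursive rule once more

def is_np(s):
    """Recognizer: strip recursive-rule prefixes, then require the base case."""
    while s.startswith(PREFIX):
        s = s[len(PREFIX):]
    return s == "John"

print(list(islice(noun_phrases(), 3)))
# ['John', 'the friend of John', 'the friend of the friend of John']
print(is_np("the friend of the friend of John"), is_np("the friend of"))
# True False
```

Two rules suffice to describe infinitely many strings; a finite list of sentences never could.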
A Simple Grammar and Lexicon
- grammar
  - S → NP VP
  - NP → N
  - VP → V NP
- lexicon
  - V → kicks
  - N → John
  - N → Bill
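One way to see what "generate" means for this grammar is to enumerate its whole language. The sketch below assumes a plain Python dictionary encoding of the rules above (the names `RULES` and `generate` are illustrative, not NLTK's API).

```python
from itertools import product

# The grammar and lexicon above, encoded as: category -> list of right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
    "V": [["kicks"]],
    "N": [["John"], ["Bill"]],
}

def generate(symbol):
    """Return every word sequence `symbol` derives (terminates only for non-recursive grammars)."""
    if symbol not in RULES:  # not a category, so it is a word
        return [[symbol]]
    results = []
    for rhs in RULES[symbol]:
        # expand each right-hand-side symbol, then take every combination of alternatives
        for parts in product(*(generate(sym) for sym in rhs)):
            results.append([word for part in parts for word in part])
    return results

for words in generate("S"):
    print(" ".join(words))
# John kicks John
# John kicks Bill
# Bill kicks John
# Bill kicks Bill
```

With a recursive rule this exhaustive expansion would never finish, so a depth bound would be needed.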
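Generative Power of a Grammar
[Figure: the set of strings generated by grammar G compared with the language L, in three cases]
- overgeneration: all but not only (G generates every string of L, plus strings outside L)
- undergeneration: only but not all (G generates only strings of L, but misses some)
- all and only: G generates exactly L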
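The three cases can be phrased as set comparisons between what a grammar generates and the target language. A minimal sketch, assuming both sets are finite and listed explicitly. The target language `L` is hypothetical: it treats "John kicks John" as unacceptable (on the assumption that English would require "John kicks himself"), so the simple grammar above overgenerates.

```python
def generative_power(generated, language):
    """Classify a grammar by comparing its generated strings with the target language."""
    over = generated - language   # generated, but not in the language
    under = language - generated  # in the language, but never generated
    if over and under:
        verdict = "both overgenerates and undergenerates"
    elif over:
        verdict = "overgenerates (all but not only)"
    elif under:
        verdict = "undergenerates (only but not all)"
    else:
        verdict = "all and only"
    return verdict, over, under

# Strings generated by the simple grammar and lexicon above.
G = {"John kicks John", "John kicks Bill", "Bill kicks John", "Bill kicks Bill"}
# Hypothetical target language (an assumption for illustration only).
L = {"John kicks Bill", "Bill kicks John"}

verdict, over, under = generative_power(G, L)
print(verdict)       # overgenerates (all but not only)
print(sorted(over))  # ['Bill kicks Bill', 'John kicks John']
```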
Formal v. Natural Languages
- Formal Languages
  - Numbers: 3290, 1, 1010101
  - Logic: ∀x (man(x) → mortal(x))
  - C: if (i > 10) exit(0);
- Natural Languages
  - English: John saw the dog
  - German: Johann hat den Hund gesehen
  - Maltese: Gianni ra kelb
Points of Similarity
- A language is considered to be a (possibly infinite) set of sentences.
- Sentences are sequences of tokens.
- Formation rules determine which sequences are valid sentences.
- Sentences have a definite structure.
- Sentence structure is related to meaning.
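As a sketch of tokens plus formation rules, here is a recursive-descent recognizer for a small hypothetical formal language of propositional formulas. The rules and function names are illustrative assumptions, not course material. Because every binary connective must be bracketed, each valid sentence has exactly one structure.

```python
KEYWORDS = {"not", "and", "or"}

def parse_formula(tokens, i=0):
    """Return the index just past a well-formed formula starting at tokens[i], or None.

    Hypothetical formation rules:
      F -> atom                       (an alphabetic token that is not a keyword)
      F -> "not" F
      F -> "(" F ("and" | "or") F ")"
    """
    if i >= len(tokens):
        return None
    tok = tokens[i]
    if tok.isalpha() and tok not in KEYWORDS:
        return i + 1
    if tok == "not":
        return parse_formula(tokens, i + 1)
    if tok == "(":
        j = parse_formula(tokens, i + 1)
        if j is None or j >= len(tokens) or tokens[j] not in ("and", "or"):
            return None
        k = parse_formula(tokens, j + 1)
        if k is None or k >= len(tokens) or tokens[k] != ")":
            return None
        return k + 1
    return None

def is_sentence(text):
    """Tokenize, then accept only if a single formula spans every token."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    return parse_formula(tokens) == len(tokens)

for s in ["(p and not q)", "p and q", "( p and )"]:
    print(s, is_sentence(s))
# (p and not q) True
# p and q False
# ( p and ) False
```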
Structure Affects Meaning
I shot an elephant in my trousers
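The sentence has two structures: "in my trousers" can attach to the verb phrase (the shooting happened in my trousers) or to the noun phrase (the elephant was in my trousers). A minimal sketch, assuming a hypothetical toy grammar in Chomsky normal form, counts the parses with the CKY algorithm; each chart cell records how many distinct subtrees of each category span that stretch of words.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # PP modifies the noun phrase: "an elephant in my trousers"
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP modifies the verb phrase: "shot ... in my trousers"
    ("PP", "P", "NP"),
]
LEXICAL = {
    "I": {"NP"}, "shot": {"V"}, "an": {"Det"}, "my": {"Det"},
    "elephant": {"N"}, "trousers": {"N"}, "in": {"P"},
}

def count_parses(words, start="S"):
    """CKY chart: chart[i][j][cat] = number of parses of words[i:j] as cat."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat in LEXICAL.get(w, ()):
            chart[i][i + 1][cat] += 1
    for span in range(2, n + 1):              # build longer spans from shorter ones
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):         # every split point
                for parent, left, right in BINARY:
                    # each left subtree can combine with each right subtree
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]

print(count_parses("I shot an elephant in my trousers".split()))  # 2
```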
Points of Difference
- Formal Languages
  - The grammar defines the language
  - Restricted application
  - Non-ambiguous
- Natural Languages
  - The language defines the grammar
  - Universal application
  - Highly ambiguous
Ambiguity
- Lexical ambiguity: the sheep is in the pen
- Syntactic ambiguity: small animals and children laugh
- Semantic ambiguity: every girl loves a sailor
- Pragmatic ambiguity: can you pass the salt?
- The management of ambiguity is central to the success of CL.
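The semantic example has two quantifier-scope readings, which first-order logic keeps apart:
- every > a: ∀x (girl(x) → ∃y (sailor(y) ∧ loves(x, y)))
- a > every: ∃y (sailor(y) ∧ ∀x (girl(x) → loves(x, y)))

A sketch that evaluates both readings in a small hypothetical model (the people and the `loves` relation are invented for illustration) shows that the readings can differ in truth value:

```python
def surface_scope(girls, sailors, loves):
    """every > a: each girl loves some sailor, possibly a different one each."""
    return all(any((g, s) in loves for s in sailors) for g in girls)

def inverse_scope(girls, sailors, loves):
    """a > every: there is one particular sailor whom every girl loves."""
    return any(all((g, s) in loves for g in girls) for s in sailors)

# Hypothetical model: each girl loves a different sailor.
girls = {"Ann", "Beth"}
sailors = {"Carl", "Dan"}
loves = {("Ann", "Carl"), ("Beth", "Dan")}

print(surface_scope(girls, sailors, loves))  # True
print(inverse_scope(girls, sailors, loves))  # False
```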
Algorithms and Linguistics
- Pure linguistics deals with
  - data
  - grammar rules
  - theories about grammar rules
- Putting knowledge to some use involves processing.
- Linguistic theory is silent about implementation issues.
- Implementation is central to Computational Linguistics.
Computational Linguistics Issues
- How should a grammar and a lexicon be represented?
- How is the structure of a given sentence actually discovered?
- How is a sentence generated to express a particular meaning?
- How can a language be learned from limited exposure to grammatical sentences?
Unimplemented Theories Can Be Dangerous
- Representational details omitted.
- Computer memory/complexity issues omitted.
- Nature of individual steps may be unclear.
- Difficult to test.
- Potentially unimplementable
Computational Linguistics: Twin Goals
- Scientific goal: contribute to linguistics by adding a computational dimension.
- Technological goal: develop the basis for machinery that can handle human language and support language engineering.
Applications of Computational Linguistics
- Machine Translation
- Information Retrieval/Extraction
- Document Classification
- Question Answering
- Style and Spell Checking
- Dialogue Systems
- Speech
Lectures
1 Overview
2 IE
3 POS (RF)
4 Tagging
5 Chunking
6 Syntax (RF)
7 NL Parsing
8 NL Generation
9 Morphology (RF)
10 Lexicon
11 Spell Checking
12 Dialogue
13 Speech
14 Revision