Title: Introduction to Computational Linguistics
Agenda
- Introduction to Computational Linguistics (CL)
- Common CL applications
- Using CL in theoretical linguistics (computational modeling)
What is Computational Linguistics?
- CL is interdisciplinary
  - Linguistics
  - Computer Science
  - Mathematics
  - Electrical Engineering
  - Psychology
  - Speech and Hearing Science
What is Computational Linguistics? (cont.)
- Computational linguistics covers many areas
- In essence, CL is any task, model, or algorithm that places some type of language processing (syntax, phonology, morphology, etc.) in a computational setting
Core Areas of CL
- Machine Translation
- Speech Recognition
- Text-to-Speech
- Natural Language Generation
- Human-Computer Dialogs
- Information Retrieval
- Computational Modeling
Machine Translation
- Using computers to automate some or all of the process of translating from one language to another
- Three general kinds of MT task:
  - Tasks for which a rough translation is adequate
  - Tasks where a human post-editor can be used to improve the output
  - Tasks limited to a small sublanguage
Machine Translation (cont.)
- Linguistic knowledge is extremely useful in this area of CL
- MT benefits from knowledge of language typology and language-specific linguistic information
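To make the role of typology concrete, here is a minimal sketch of a "rough translation" system: word-for-word dictionary lookup from English to Spanish, plus one reordering rule reflecting the fact that Spanish adjectives usually follow the noun. The tiny lexicon, its part-of-speech tags, and the function names are hypothetical, chosen only for illustration; real MT systems are far more elaborate.

```python
# Toy English-to-Spanish translator: word-for-word lookup plus one
# typology-driven reordering rule (ADJ NOUN -> NOUN ADJ).
# The lexicon and its POS tags are hypothetical and deliberately tiny.

LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
    "is": ("es", "VERB"),
    "fast": ("rápido", "ADJ"),
}


def word_for_word(sentence: str) -> list[tuple[str, str]]:
    """Look up each English word; unknown words pass through untranslated."""
    return [LEXICON.get(word, (word, "UNK")) for word in sentence.lower().split()]


def reorder_adj_noun(tagged: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Swap each ADJ NOUN pair into NOUN ADJ order (Spanish's usual pattern)."""
    result = list(tagged)
    i = 0
    while i < len(result) - 1:
        if result[i][1] == "ADJ" and result[i + 1][1] == "NOUN":
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2  # skip past the pair we just swapped
        else:
            i += 1
    return result


def translate(sentence: str, use_typology: bool = True) -> str:
    tagged = word_for_word(sentence)
    if use_typology:
        tagged = reorder_adj_noun(tagged)
    return " ".join(word for word, _ in tagged)


print(translate("the red car is fast", use_typology=False))  # el rojo coche es rápido
print(translate("the red car is fast"))                      # el coche rojo es rápido
```

Without the reordering rule the output keeps English word order; one piece of typological knowledge is enough to fix this phrase, though real language pairs need many such rules (or learned models).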
Speech Recognition
- Taking spoken language as input and outputting the corresponding text
Speech Recognition Architecture
- Using some type of acoustic model, SR takes the source speech and produces guesses as to which words could correspond to it
- The word with the highest probability is selected as the optimal candidate
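The selection step above is an argmax over candidate words. The sketch below picks the candidate with the highest acoustic probability; many recognizers also multiply in a prior probability for each word (for example from a language model), which can change the winner. All candidate words and probabilities here are invented for illustration.

```python
import math
from typing import Optional

# Hypothetical acoustic-model scores P(audio | word) for the candidates
# proposed for one stretch of audio. The numbers are invented.
ACOUSTIC = {"bat": 0.45, "pat": 0.35, "bad": 0.20}

# Hypothetical prior P(word), e.g. from a language model given the context.
PRIOR = {"bat": 0.10, "pat": 0.20, "bad": 0.70}


def best_candidate(acoustic: dict, prior: Optional[dict] = None) -> str:
    """Return the candidate with the highest score, summing log probabilities
    (equivalent to multiplying probabilities, but numerically safer)."""
    def score(word: str) -> float:
        s = math.log(acoustic[word])
        if prior is not None:
            s += math.log(prior[word])
        return s

    return max(acoustic, key=score)


print(best_candidate(ACOUSTIC))         # bat
print(best_candidate(ACOUSTIC, PRIOR))  # bad
```

With acoustic scores alone "bat" wins (0.45); with the prior, "bad" wins because 0.20 × 0.70 = 0.14 beats 0.45 × 0.10 = 0.045.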
Why use SR?
- Allows hands-free human-computer interaction
Text-to-Speech
- Taking text as input and outputting the
corresponding spoken language
Three types of TTS
- Articulatory: models the physiological characteristics of the vocal tract
- Concatenative: uses pre-recorded segments to construct the utterance(s)
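Concatenative synthesis can be sketched as "look up recorded units, then splice them". In the minimal sketch below each unit is a short list of audio samples (hypothetical values, named like diphones for "hello"), and adjacent units are joined with a short linear crossfade so the splice point is smoothed rather than abrupt. Unit names, sample values, and the crossfade length are assumptions for illustration.

```python
# Hypothetical unit inventory: each "recording" is a short list of samples.
# Real systems store diphones or larger units cut from recorded speech.
UNITS = {
    "h-e": [0.0, 0.2, 0.4, 0.3],
    "e-l": [0.3, 0.1, -0.1, -0.2],
    "l-o": [-0.2, -0.1, 0.0, 0.1],
}


def crossfade_concat(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two sample sequences, fading a out and b in over `overlap` samples."""
    if overlap == 0:
        return a + b
    head, tail_a = a[:-overlap], a[-overlap:]
    head_b, rest_b = b[:overlap], b[overlap:]
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight for b rises toward 1 across the overlap
        mixed.append((1 - w) * tail_a[i] + w * head_b[i])
    return head + mixed + rest_b


def synthesize(unit_names: list[str], overlap: int = 1) -> list[float]:
    """Look up each pre-recorded unit and splice the units together in order."""
    out = list(UNITS[unit_names[0]])
    for name in unit_names[1:]:
        out = crossfade_concat(out, UNITS[name], overlap)
    return out


print(len(synthesize(["h-e", "e-l", "l-o"])))  # 10
```

Each join overlaps one sample, so three 4-sample units yield 4 + 3 + 3 = 10 samples.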
Three types of TTS (cont.)
- Parametric/formant: models the formant transitions of speech
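A formant synthesizer can be sketched as a source-filter model: a periodic impulse train (standing in for the glottal source) is passed through a cascade of second-order digital resonators, one per formant. The resonator coefficients below follow the widely used Klatt-style formulation; the sample rate, pitch, and the formant values for an [ɑ]-like vowel are illustrative assumptions. For brevity the formants here are static; modelling formant transitions would mean varying the frequencies over time.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed)


def resonator(signal: list[float], freq: float, bandwidth: float,
              fs: int = SAMPLE_RATE) -> list[float]:
    """Second-order digital resonator centred on `freq` Hz with the given bandwidth."""
    t = 1.0 / fs
    c = -math.exp(-2 * math.pi * bandwidth * t)
    b = 2 * math.exp(-math.pi * bandwidth * t) * math.cos(2 * math.pi * freq * t)
    a = 1 - b - c  # normalizes the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0: float, duration: float, fs: int = SAMPLE_RATE) -> list[float]:
    """Crude glottal source: one unit impulse per pitch period."""
    n = int(duration * fs)
    period = int(fs / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def formant_vowel(formants: list[tuple[float, float]], f0: float = 120,
                  duration: float = 0.1, fs: int = SAMPLE_RATE) -> list[float]:
    """Filter the source through one resonator per (frequency, bandwidth) pair."""
    signal = impulse_train(f0, duration, fs)
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    return signal


# Rough (frequency, bandwidth) pairs in Hz for an [ɑ]-like vowel; illustrative only.
AH = [(730, 90), (1090, 110), (2440, 170)]
samples = formant_vowel(AH)
print(len(samples))  # 1600
```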
Why is TTS so difficult?
- Spelling
  - through, rough (the same letters "ough" are pronounced differently)
- Homographs
  - PERmit (n) vs. perMIT (v)
- Prosody
  - Pitch, duration of segments, phrasing of segments, intonational tune, emotion
  - "I am so angry at you. I have never been more enraged in my life!" should not be read in a flat, neutral voice
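Homographs such as "permit" show why TTS needs syntactic information: the system must decide the part of speech before it can choose the stress pattern. The minimal sketch below uses a crude heuristic (look only at the previous word) as a stand-in for a real part-of-speech tagger. The ARPAbet-style transcriptions (1 = primary stress, 0 = unstressed) and the cue-word lists are written for this example and are assumptions, not taken from any particular dictionary.

```python
# Hypothetical pronunciation entries in ARPAbet-style notation.
HOMOGRAPHS = {
    "permit": {"NOUN": "P ER1 M IH0 T", "VERB": "P ER0 M IH1 T"},
}

# Crude context cues standing in for a real part-of-speech tagger.
NOUN_CUES = {"a", "an", "the", "my", "your", "this"}
VERB_CUES = {"to", "i", "we", "they", "you", "will", "not"}


def guess_pos(prev_word: str) -> str:
    """Guess noun vs. verb from the preceding word only (a simplifying assumption)."""
    if prev_word in NOUN_CUES:
        return "NOUN"
    if prev_word in VERB_CUES:
        return "VERB"
    return "NOUN"  # fallback when the context offers no cue


def pronounce(sentence: str) -> list[tuple[str, str]]:
    """Return (word, pronunciation) pairs for any homographs in the sentence."""
    words = sentence.lower().replace(".", "").split()
    result = []
    for i, word in enumerate(words):
        if word in HOMOGRAPHS:
            prev = words[i - 1] if i > 0 else ""
            result.append((word, HOMOGRAPHS[word][guess_pos(prev)]))
    return result


print(pronounce("I need a permit."))               # [('permit', 'P ER1 M IH0 T')]
print(pronounce("They will not permit parking."))  # [('permit', 'P ER0 M IH1 T')]
```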
Why use TTS?
- Allows text to be read aloud automatically
- Extremely useful for the visually impaired
Natural Language Generation
- Constructing linguistic outputs from
non-linguistic inputs
Natural Language Generation (cont.)
- Maps meaning to text
- The nature of the input varies greatly from one application to another (e.g., documenting the structure of a computer program)
- The job of the NLG system is to extract the information needed to drive the generation process
NLG systems have to make choices
- Content selection: the system must choose the appropriate content to express, basing its decision on a pre-specified communicative goal
- Lexical selection: the system must choose the lexical item most appropriate for expressing a concept
- Sentence structure
  - Aggregation: the system must apportion the content into phrase-, clause-, and sentence-sized chunks
  - Referring expressions: the system must determine how to refer to the objects under discussion (not a trivial task)
- Discourse structure: many NLG systems have to deal with multi-sentence discourses, which must have a coherent structure
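Two of these choices, aggregation and referring expressions, can be illustrated with a small sketch. The input is a list of (entity, predicate) facts assumed to come out of content selection; aggregation merges consecutive facts about the same entity into one sentence, and a simple referring-expression rule uses "it" only when the entity was the subject of the immediately preceding sentence. The facts and the pronoun rule are hypothetical simplifications; real referring-expression generation has to handle ambiguity, gender, number, and much more.

```python
# Hypothetical input facts, already chosen by content selection.
FACTS = [
    ("the printer", "is switched on"),
    ("the printer", "is out of paper"),
    ("the tray", "is empty"),
]


def aggregate(facts: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    """Merge consecutive facts about the same entity into one sentence-sized chunk."""
    chunks: list[tuple[str, list[str]]] = []
    for entity, predicate in facts:
        if chunks and chunks[-1][0] == entity:
            chunks[-1][1].append(predicate)
        else:
            chunks.append((entity, [predicate]))
    return chunks


def realize(facts: list[tuple[str, str]], use_aggregation: bool = True) -> str:
    chunks = aggregate(facts) if use_aggregation else [(e, [p]) for e, p in facts]
    sentences = []
    previous = None
    for entity, predicates in chunks:
        # Referring expression: pronoun only if this entity was the last subject.
        subject = "it" if entity == previous else entity
        previous = entity
        text = f"{subject} {' and '.join(predicates)}."
        sentences.append(text[0].upper() + text[1:])
    return " ".join(sentences)


print(realize(FACTS, use_aggregation=False))
# The printer is switched on. It is out of paper. The tray is empty.
print(realize(FACTS))
# The printer is switched on and is out of paper. The tray is empty.
```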
Sample NLG output
- To save a file:
  1. Choose Save from the File menu
  2. Choose the appropriate folder
  3. Type the file name
  4. Click the Save button
- The system will save the document.
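Output like this sample is often produced by template-based realization: a non-linguistic input (here, a structured task plan) is mapped to text by filling one template per action type. The plan format, action names, and templates below are hypothetical, written to reproduce the sample; they are not the design of any specific NLG system.

```python
# Hypothetical non-linguistic input: a task plan as (action, arguments) pairs.
PLAN = {
    "goal": "save a file",
    "steps": [
        ("choose_menu_item", ("Save", "File")),
        ("choose_folder", ()),
        ("type_name", ()),
        ("click_button", ("Save",)),
    ],
    "result": "save the document",
}

# One hand-written template per action type (an assumption for this sketch).
TEMPLATES = {
    "choose_menu_item": "Choose {0} from the {1} menu",
    "choose_folder": "Choose the appropriate folder",
    "type_name": "Type the file name",
    "click_button": "Click the {0} button",
}


def generate(plan: dict) -> str:
    """Realize the plan as a numbered procedure, one template per step."""
    lines = [f"To {plan['goal']}:"]
    for number, (action, args) in enumerate(plan["steps"], start=1):
        lines.append(f"  {number}. {TEMPLATES[action].format(*args)}")
    lines.append(f"The system will {plan['result']}.")
    return "\n".join(lines)


print(generate(PLAN))
```

Templates are easy to control but rigid; choices such as aggregation or varying referring expressions have to be added on top.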
Human-Computer Dialogs
- Uses a mix of SR, TTS, and pre-recorded prompts
to achieve some goal
Human-Computer Dialogs (cont.)
- Input comes from speech recognition, or from a combination of SR and touch-tone keypresses
- The system processes the spoken information and outputs appropriate TTS or pre-recorded prompts
- Dialog systems have specific tasks, which limit the domain of conversation
- This makes the SR problem much easier, as the potential responses become very constrained
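One way to see why a constrained domain helps: when only a handful of responses are valid, even an imperfect recognition hypothesis can be snapped to the closest valid option. The sketch below uses Python's standard `difflib.get_close_matches` as a simple string-similarity stand-in; real dialog systems usually constrain the recognizer's search directly with a grammar rather than matching text afterwards. The response list and cutoff value are assumptions.

```python
import difflib

# Hypothetical grammar for one dialog state: the only responses it accepts.
VALID_RESPONSES = ("checking", "savings", "operator")


def interpret(recognized: str, valid=VALID_RESPONSES, cutoff: float = 0.6):
    """Map a (possibly misrecognized) word to the closest valid response, or None."""
    matches = difflib.get_close_matches(recognized.lower(), valid, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(interpret("checkin"))  # checking
print(interpret("savins"))   # savings
print(interpret("pizza"))    # None
```

With an open vocabulary, "checkin" could correspond to thousands of words; with three valid responses, the choice is easy.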
Sample dialog system for banking
- Sys: Would you like information for checking or savings?
- User: Checking, please.
- Sys: Your current balance is 2,568.92. Would you like another transaction?
- User: Yes, has check 2431 cleared?
Linguistic knowledge in dialog systems
- Discourse structure: ensuring natural-flowing discourse interaction
- Building appropriate vocabularies/lexicons for the tasks
- Ensuring prosodic consistency (e.g., questions sound like questions, and spliced prompts sound continuous)
Why use human-computer dialog systems?
- Automate simple tasks: no need for a teller on the other end of the line!
- Allow access to system information from anywhere, via the telephone
Information Retrieval
- Storage, analysis, and retrieval of text documents
Information Retrieval (cont.)
- Most current IR systems are based on some approximation of semantics, usually at the level of individual words
- IR is the core of web-based searching, e.g., Google, AltaVista, etc.
Information Retrieval Architecture
- User inputs a word or string of words
- System processes the words and retrieves
documents corresponding to the request
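A common way to implement this lookup is an inverted index, which maps each word to the documents that contain it, so a query only touches documents sharing at least one query word. The sketch below ranks documents by how many distinct query words they contain; production search engines use weighted scores such as tf-idf instead. The three-document collection is invented for illustration.

```python
from collections import defaultdict

# Hypothetical three-document collection.
DOCS = {
    1: "computational linguistics and speech recognition",
    2: "speech synthesis converts text to speech",
    3: "information retrieval finds relevant documents",
}


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each word to the set of ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(query: str, index: dict[str, set[int]]) -> list[int]:
    """Rank documents by the number of distinct query words they contain."""
    scores: dict[int, int] = defaultdict(int)
    for word in set(query.lower().split()):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


index = build_index(DOCS)
print(search("speech recognition", index))  # [1, 2]
```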
Bag of Words
- The dominant approach to IR systems is to ignore syntactic information and process the meaning of individual words only
- Thus, "I see what I eat" and "I eat what I see" would mean exactly the same thing to the system!
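The point about word order is easy to verify: reduce each sentence to word counts and compare. The sketch below assumes the simplest possible tokenization (lowercase, split on whitespace); real systems also handle punctuation, stemming, and stop words.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase, split on whitespace, and count words; word order is discarded."""
    return Counter(text.lower().split())


a = bag_of_words("I see what I eat")
b = bag_of_words("I eat what I see")
print(a == b)  # True
print(a["i"])  # 2
```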
Linguistic Knowledge in IR
- Semantics
  - Compositional
  - Lexical
- Syntax (depending on the model used)
Computational Modeling
- Computational approaches to problem solving,
modeling, and development of theories
How can we use computational modeling?
- Test our theories of language, both synchronic and diachronic (language change)
- Develop working models of language evolution
- Model speech perception, production, and processing
- Almost any theoretical model can have a computational counterpart
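As one illustration of a working model of language change, the sketch below simulates two competing variants of a linguistic form across generations of speakers. Each learner adopts variant A with a probability based on A's frequency in the previous generation, optionally nudged by a bias toward A. This is a Wright-Fisher-style drift model borrowed from population genetics, chosen here only as one simple option; the population size, bias, and generation count are arbitrary assumptions.

```python
import random


def simulate_change(pop_size: int = 100, start_freq: float = 0.5, bias: float = 0.0,
                    generations: int = 50, seed: int = 1) -> list[float]:
    """Track the frequency of variant A over generations of learners.

    Each learner adopts A with probability p = f + bias * f * (1 - f), where f
    is A's frequency in the previous generation. bias = 0 gives pure neutral
    drift; bias > 0 favours A and bias < 0 disfavours it.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        p = freq + bias * freq * (1 - freq)
        p = min(1.0, max(0.0, p))  # keep p a valid probability
        adopters = sum(1 for _ in range(pop_size) if rng.random() < p)
        freq = adopters / pop_size
        history.append(freq)
    return history


neutral = simulate_change()
favoured = simulate_change(bias=0.5)
print(neutral[-1], favoured[-1])
```

Because the model is explicit, every assumption (who learns from whom, how bias works) is visible and can be changed and retested.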
Why Use Computational Modeling?
- Forces explicitness: no black boxes or behind-the-scenes magic
- Allows for modeling that would otherwise be impossible
- Allows for modeling that would otherwise be unethical
Conclusions
- CL applications use linguistic knowledge from all of the major subfields of theoretical linguistics
- Computational modeling can inform linguists' theories of language processing and structure