Computational Lexicography - PowerPoint PPT Presentation

Slides: 15
Provided by: cclKul

Transcript and Presenter's Notes



1
Computational Lexicography
  • Frank Van Eynde
  • Centre for Computational Linguistics
  • K.U.Leuven

2
OUTLINE
  • 1. The token/type distinction
  • 2. Lexicographic practice
  • 3. Computational lexica
  • 4. Lexical knowledge bases
  • 5. Lexical knowledge acquisition
  • 6. The use of lexica in text-to-speech

3
1. Tokens vs. types
  • (1) The girl gave the flowers to the athlete.
    - 3 tokens (the): their properties are context-specific
    - 1 type, <THE>: its properties are generalizations
      over the various uses
  • Heracleitos vs. Plato
  • (2) The sooner they come, the better it is.
  • <THE, article> vs. <A, article>; NL de, het
  • <THE, adverb> vs. <FAR, adverb>; NL hoe
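
The token/type counts for sentence (1) can be checked mechanically. Below is a minimal sketch. It assumes a naive tokenizer that lower-cases the text and strips the final full stop, so "The" and "the" count as one form. Note that it counts string forms, whereas the types on these slides are <W, POS> pairs, as sentence (2) shows.

```python
from collections import Counter

def tokenize(sentence: str) -> list[str]:
    """Naive tokenizer: lower-case, drop final punctuation, split on whitespace."""
    return sentence.lower().rstrip(".!?").split()

sentence = "The girl gave the flowers to the athlete."
tokens = tokenize(sentence)   # every occurrence counts: these are the tokens
types = Counter(tokens)       # distinct forms, each with its token frequency

print(len(tokens))    # 8
print(len(types))     # 6
print(types["the"])   # 3
```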

4
1. Tokens vs. types
  • (3) I do not think that the dog of that man is
    really that dangerous.
  • <THAT, compl> vs. <IF, compl>; FR que
  • <THAT, det> vs. <THIS, det>; FR
    ce/cet(te)
  • <THAT, adverb> vs. <SO, adverb>; FR si
  • (4) Je ne pense pas que le chien de cet homme soit
    vraiment si dangereux.

5
1. Tokens vs. types
  • The abstraction problem: given a word W, how many
    types <W, POS> do we have to distinguish?
  • (5) It is not far from here.
  • (6) We didn't go far.
  • (7) He's living in the Far West.
  • (8) Paris is far more expensive than Dublin.
  • <FAR, adj> vs. <NEAR, adj>; NL ver
  • <FAR, adv> vs. <LITTLE, adv>; NL veel
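
The abstraction problem amounts to choosing the keys of the lexicon. Here is a minimal sketch using a toy dictionary keyed by <W, POS> tuples. The two FAR entries and their Dutch equivalents come from the slide; the function names types_of and translate are hypothetical.

```python
# Toy bilingual lexicon: each key is one type <W, POS>.
LEXICON: dict[tuple[str, str], str] = {
    ("FAR", "adj"): "ver",
    ("FAR", "adv"): "veel",
}

def types_of(word: str) -> list[tuple[str, str]]:
    """Every <W, POS> type recorded for a word form."""
    return [key for key in LEXICON if key[0] == word.upper()]

def translate(word: str, pos: str) -> str:
    """Dutch equivalent of one <W, POS> type."""
    return LEXICON[(word.upper(), pos)]

print(types_of("far"))           # [('FAR', 'adj'), ('FAR', 'adv')]
print(translate("far", "adv"))   # veel, as in (8) "far more expensive"
```

Deciding which key applies to a given token is the job of a POS tagger. The lexicon itself only fixes how many keys there are.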

6
1. Tokens vs. types
  • (9) The ball of the finals will be sold at the
    ball of the FIFA.
  • (10) De bal van de finale wordt verkocht op het
    bal van de FIFA.
  • <BAL, noun, non-neuter>; IT palla
  • <BAL, noun, neuter>; IT ballo
  • (11) That girl has been very lucky.
  • (12) That girl has a lot of luck.
  • <W, POS, VAL>
  • <HAVE, verb aux, _VPPSP>; IT avere/essere
  • <HAVE, verb main, _NP>; IT avere

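Gender and valence features can be what selects the translation. This is a sketch under simplifying assumptions:
- The Dutch singular article (de vs. het) is taken to reveal the noun's gender.
- The slide's _VPPSP is read as "takes a VP headed by a past participle".

The entries come from the slide; the helper names are hypothetical.

```python
# Entries from the slide, keyed by <W, POS, feature>.
NOUN_ENTRIES = {
    ("BAL", "noun", "non-neuter"): "palla",  # de bal: the ball one plays with
    ("BAL", "noun", "neuter"): "ballo",      # het bal: the dance
}
VERB_ENTRIES = {
    ("HAVE", "verb aux", "_VPPSP"): "avere/essere",  # (11) has been very lucky
    ("HAVE", "verb main", "_NP"): "avere",           # (12) has a lot of luck
}

def gender_from_article(article: str) -> str:
    """Assumption: in the singular, 'het' marks neuter nouns and 'de' non-neuter ones."""
    return "neuter" if article.lower() == "het" else "non-neuter"

def translate_bal(article: str) -> str:
    """Italian equivalent of Dutch 'bal', chosen by the article in front of it."""
    return NOUN_ENTRIES[("BAL", "noun", gender_from_article(article))]

def translate_have(val: str) -> str:
    """Italian equivalent of 'have', chosen by its complement (VAL)."""
    pos = "verb aux" if val == "_VPPSP" else "verb main"
    return VERB_ENTRIES[("HAVE", pos, val)]

print(translate_bal("de"), translate_bal("het"))  # palla ballo
print(translate_have("_NP"))                      # avere
```

In (10), "de bal" thus selects palla and "het bal" selects ballo.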
7
1. Tokens vs. types
  • (13) The pen is in my pocket.
  • (14) The pig is in the pen.
  • <W, POS, VAL, SENSE>
  • <PEN, noun, writing implement>; NL pen
  • <PEN, noun, fenced enclosure>; NL hok
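
Adding SENSE to the tuple is what separates (13) from (14). This sketch assumes one Python dataclass per lexical type. The PEN entries and Dutch equivalents are from the slide. VAL is left empty because the slide gives none, and the candidates helper is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LexType:
    """One lexical type <W, POS, VAL, SENSE>, plus its Dutch equivalent."""
    word: str
    pos: str
    val: Optional[str]  # None: the slide gives no valence for PEN
    sense: str
    nl: str

LEXICON = [
    LexType("PEN", "noun", None, "writing implement", "pen"),
    LexType("PEN", "noun", None, "fenced enclosure", "hok"),
]

def candidates(word: str, pos: str) -> list[LexType]:
    """All types sharing word and POS; more than one means SENSE is still open."""
    return [t for t in LEXICON if t.word == word.upper() and t.pos == pos]

matches = candidates("pen", "noun")
print(len(matches))              # 2
print([t.nl for t in matches])   # ['pen', 'hok']
```

Both readings share word and POS, so only context can choose between "pen" and "hok". The lexicon can merely list the candidates.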

8
2. Lexicographic practice
  • The entries of pen and peg in the Oxford Advanced
    Learner's Dictionary of Current English:
    <ORTH_n, PHON, POS, m, (VAL,) SENSE>
  • Homonymy vs. polysemy: n numbers homonyms (separate
    entries with the same spelling), m numbers the
    senses within one entry
  • Problem: for any given ORTH, how many n and how
    many m does one have to distinguish?
  • The entries of pen and peg in the Collins Cobuild
    Dictionary of the English Language:
    <ORTH, PHON, m, SENSE>

9
3. Computational Lexica
  • Dictionaries are made for people who already
    understand (much of) the language.
  • Computational lexica are made for machines that do
    not understand (anything of) the language.
  • Consequence: an NLP system can only make sense of
    information that is presented in the notation (or
    format) it employs for processing the language.

10
3. Computational Lexica
  • <two hundred fifty-six, 256>
  • <two hundred fifty-six, CCLVI>
  • POS tagger
  • The entry for ik in Van Dale
  • The entry for ik in the lexicon of the Spoken
    Dutch Corpus

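The two entries on this slide pair one spelled-out number with two different notations. A system can only use the notation its processing expects. This minimal sketch generates both notations for 256 from a single integer. It assumes two hypothetical helpers: to_roman, and to_words, which covers only 0 to 999 and, like the slide, omits "and".

```python
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
         (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

def to_roman(n: int) -> str:
    """Greedy conversion to Roman numerals (valid for 1..3999)."""
    if not 1 <= n <= 3999:
        raise ValueError("Roman numerals cover 1..3999")
    out = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def to_words(n: int) -> str:
    """English words for 0..999, hyphenating tens and units."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    head = ONES[hundreds] + " hundred"
    return head + (" " + to_words(rest) if rest else "")

pairs = [(to_words(256), "256"), (to_words(256), to_roman(256))]
print(pairs)  # [('two hundred fifty-six', '256'), ('two hundred fifty-six', 'CCLVI')]
```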
11
4. Lexical Knowledge Base
  • Computational lexica are often task-specific and
    application-dependent.
  • The need for reusability, maintainability and
    extensibility
  • Creation of a lexical knowledge base which is
    sufficiently general and abstract to be reusable,
    maintainable and easily extensible
  • Two aspects of abstractness: theory-neutral and
    level-independent

12
4. Lexical Knowledge Base
  • Lexical knowledge representation languages
  • DATR (Gazdar and Evans)
  • Typed feature structures (HPSG)
  • The number of lexical entries for any given
    natural language is enormous.
  • The information to be captured in each lexical
    entry is detailed and complex.
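
One reason for representation languages such as DATR is that most of the detail in a lexical entry can be inherited rather than repeated. The following is a minimal sketch of default inheritance in Python, not DATR syntax. The node names, the feature names past and psp, and the "{stem}ed" templates are illustrative assumptions.

```python
# Each node names its parent and states only what differs from it.
HIERARCHY = {
    "VERB": {"parent": None, "cat": "verb", "past": "{stem}ed", "psp": "{stem}ed"},
    "WALK": {"parent": "VERB", "stem": "walk"},
    "TALK": {"parent": "VERB", "stem": "talk"},
    "GIVE": {"parent": "VERB", "stem": "give", "past": "gave", "psp": "given"},
}

def lookup(node: str, feature: str) -> str:
    """Most specific value of feature for node, inheriting by default."""
    current = node
    while current is not None:
        entry = HIERARCHY[current]
        if feature in entry:
            value = entry[feature]
            if "{stem}" in value:
                # Inherited defaults are templates: fill in the queried word's own stem.
                value = value.format(stem=lookup(node, "stem"))
            return value
        current = entry["parent"]
    raise KeyError(f"{node} has no value for {feature}")

print(lookup("WALK", "past"))  # walked
print(lookup("GIVE", "past"))  # gave
print(lookup("GIVE", "cat"))   # verb
```

Each regular verb costs one line. An irregular verb such as GIVE states only the values that override the default.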

13
5. Lexical knowledge acquisition
  • from scratch
  • from a machine-readable dictionary
  • from an agency for the distribution of resources
    (ELRA and LDC)
  • inductively, from a partial lexicon and a corpus
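
The inductive route starts from what the partial lexicon does not yet cover. This minimal sketch assumes that candidate new entries are simply corpus word forms missing from the lexicon, ranked by frequency. Assigning them POS, valence and sense would be the harder follow-up step. The function name and the min_freq threshold are hypothetical.

```python
import re
from collections import Counter

def unknown_word_candidates(corpus: str, lexicon: set[str],
                            min_freq: int = 2) -> list[tuple[str, int]]:
    """Word forms in the corpus that the partial lexicon lacks, most frequent first."""
    tokens = re.findall(r"[a-z]+", corpus.lower())
    counts = Counter(t for t in tokens if t not in lexicon)
    return [(w, c) for w, c in counts.most_common() if c >= min_freq]

partial_lexicon = {"the", "is", "in", "pen", "girl", "gave", "flowers", "to", "athlete"}
corpus = ("The pig is in the pen. The pig likes the pen. "
          "The girl gave the flowers to the athlete.")

print(unknown_word_candidates(corpus, partial_lexicon))              # [('pig', 2)]
print(unknown_word_candidates(corpus, partial_lexicon, min_freq=1))  # [('pig', 2), ('likes', 1)]
```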

14
6. Lexica in text-to-speech
  • written text
  • → text normalisation
  • expanded graphemic representation
  • → tagging, syntactic analysis
  • graphemic representation with prosody
  • → grapheme-to-phoneme
  • sequence of phonemes, incl. lexical stress
  • → speech synthesis
  • fluent speech
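
The pipeline can be sketched as a chain of functions in which the lexicon is consulted for pronunciation. This minimal sketch rests on several loud assumptions:
- The pronunciation lexicon is a hypothetical toy with ARPAbet-style transcriptions, stress marked by digits as in CMUdict.
- Normalisation only spells out single digits.
- Tagging and prosody are left out.

All function names are hypothetical.

```python
import re

# Hypothetical toy pronunciation lexicon (ARPAbet-style, stress as digits).
PRON_LEXICON = {
    "the": "DH AH0",
    "pig": "P IH1 G",
    "is": "IH1 Z",
    "in": "IH0 N",
    "pen": "P EH1 N",
    "two": "T UW1",
}
DIGITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]

def normalise(text: str) -> list[str]:
    """Text normalisation: lower-case, drop punctuation, spell out single digits."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    # Longer numbers are left as they are; a real normaliser would expand them too.
    return [DIGITS[int(t)] if t.isdigit() and len(t) == 1 else t for t in tokens]

def grapheme_to_phoneme(words: list[str]) -> list[str]:
    """Lexicon lookup; words without an entry are flagged for a rule-based fallback."""
    return [PRON_LEXICON.get(w, f"<no entry: {w}>") for w in words]

def text_to_phonemes(text: str) -> list[str]:
    """Written text to phonemes; tagging, syntactic analysis and prosody are omitted."""
    return grapheme_to_phoneme(normalise(text))

print(text_to_phonemes("The pig is in the pen."))
# ['DH AH0', 'P IH1 G', 'IH1 Z', 'IH0 N', 'DH AH0', 'P EH1 N']
print(text_to_phonemes("2 pigs"))
# ['T UW1', '<no entry: pigs>']
```

In a fuller system the tagging stage feeds grapheme-to-phoneme conversion. "Record", for instance, is stressed on the first syllable as a noun and on the second as a verb, so the pronunciation lexicon needs <W, POS> keys here too.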