Kirrkirr: a Bidirectional Warlpiri-English Dictionary

Slides: 11
Provided by: acade119
Learn more at: https://nlp.stanford.edu

1
Kirrkirr: a Bidirectional Warlpiri-English Dictionary

Kristen Parton
2
Kirrkirr Objectives
  • Kirrkirr aims to present the contents of a
    dictionary in a way which is flexible,
    interactive, customizable, and (especially) fun
  • Kirrkirr has diverse target users, with varying
    levels of literacy, for example professional
    linguists, elementary school children, teachers,
    and native speakers
  • Currently, Kirrkirr is used with the Australian
    Aboriginal language Warlpiri, spoken by about
    3,000 people in northern Australia
  • Kirrkirr uses a Warlpiri-English dictionary
    developed by linguists in Australia, with
    detailed information about each word, including
    glosses, definitions, dialects, grammatical
    comments, and cross-references between words for
    synonyms, antonyms, "see also" links, and other
    relationships
  • Unlike paper dictionaries, electronic
    dictionaries can provide an interactive
    educational tool customizable to various
    audiences

3
Dictionary Usability
  • The interface has a colorful, clickable panel
    which links words related in different ways,
    rather than just relying on the alphabetical list
    of words; this also makes the dictionary more
    interactive
  • Many words are linked to pictures and sounds,
    which reinforce the meaning of the words through
    non-textual means
  • The dictionary uses fuzzy spelling to catch
    spelling errors made by the user when searching
    for a word
  • User modes tailor the appearance of the formatted
    entries to each target audience:
      • English meaning only, for novice users with
        English backgrounds
      • In Warlpiri, for native speakers of Warlpiri
      • Basic details, for intermediate users such as
        students
      • Full details, for advanced users such as teachers
        or linguists
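
The slides do not say how the fuzzy spelling matcher works. As a minimal sketch, assuming a plain string-similarity comparison against the headword list, the idea can be illustrated with Python's standard difflib module. The headword list, the `fuzzy_lookup` name, and the cutoff value are illustrative assumptions, not Kirrkirr's actual implementation.

```python
from difflib import get_close_matches

# Small illustrative headword list (words taken from the slides).
HEADWORDS = ["nya-nyi", "yawarrangi", "jawirdiki", "kirany-kiranypa"]

def fuzzy_lookup(query, headwords=HEADWORDS, limit=3, cutoff=0.7):
    """Return up to `limit` headwords that closely resemble `query`."""
    return get_close_matches(query.lower(), headwords, n=limit, cutoff=cutoff)

# A misspelling with a missing "r" still finds the intended headword.
print(fuzzy_lookup("yawarangi"))
```

A real matcher for Warlpiri would probably also need language-specific rules for commonly confused letters, which a generic similarity ratio does not capture.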

4
Lexicon Structure
  • The dictionary is maintained by linguists in
    Australia in an ad-hoc text format, which is
    converted to a structured XML dictionary by a
    Perl script
  • Rather than load the large (10 MB) XML file into
    memory, each headword's XML entry is loaded
    individually as needed
  • The rich structure of the XML allows XSLT
    stylesheet manipulation of the dictionary entries
    to produce output formatted differently for
    different users
  • The XSLT stylesheet outputs HTML pages, which
    make use of the cross-references in the
    dictionary by creating hyperlinks between
    different words
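
One way to load entries individually is to scan the file once, record where each entry starts, and later seek straight to the entry that is needed. The sketch below assumes each entry is an `<ENTRY>` element with its headword in `<HW>`; these tag names, the toy data, and the function names are hypothetical, and the slides do not describe the indexing scheme Kirrkirr actually uses.

```python
import io
import re
import xml.etree.ElementTree as ET

# Toy stand-in for the large XML file; <ENTRY>, <HW>, <GL> are assumed names.
XML_BYTES = (
    b"<DICTIONARY>\n"
    b"<ENTRY><HW>nya-nyi</HW><GL>see</GL></ENTRY>\n"
    b"<ENTRY><HW>yawarrangi</HW><GL>kangaroo</GL></ENTRY>\n"
    b"</DICTIONARY>\n"
)

ENTRY_RE = re.compile(rb"<ENTRY>.*?</ENTRY>", re.DOTALL)
HW_RE = re.compile(rb"<HW>(.*?)</HW>", re.DOTALL)

def build_offset_index(data):
    """Scan once and record (start, length) of each entry, keyed by headword."""
    index = {}
    for match in ENTRY_RE.finditer(data):
        headword = HW_RE.search(match.group(0)).group(1).decode("utf-8")
        index[headword] = (match.start(), match.end() - match.start())
    return index

def load_entry(stream, index, headword):
    """Seek to one entry and parse only that fragment."""
    start, length = index[headword]
    stream.seek(start)
    return ET.fromstring(stream.read(length))

index = build_offset_index(XML_BYTES)
entry = load_entry(io.BytesIO(XML_BYTES), index, "yawarrangi")
print(entry.findtext("GL"))
```

The index build here reads the whole file, so in practice it would be done once ahead of time and saved; only `load_entry` needs to run at lookup time.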

5
Customizing Format with XSLT
  • At run-time, the XML entries are processed by an
    XSLT stylesheet, which selects which elements of
    the entry to show, determines the order to show
    them in, and formats each field differently
    depending on the user mode
  • For example, "Meaning only" outputs the English
    glosses of a word in a large font, whereas "Full
    details" outputs all of the information in the
    dictionary in a normal-sized font in a specific
    order
  • Since the XML is parsed at run-time, more
    information can be added to the XML to allow
    parameter passing from the program to the XSLT
  • For example, the location of the images folder
    can only be determined at run-time, but by adding
    an <IMAGE-DIR> field to the XML at run-time, the
    XSLT can create an <IMG SRC> tag to display an
    image in the HTML output
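
The run-time parameter passing described above can be sketched as adding one extra child element to an entry before it is handed to the stylesheet. Only the `IMAGE-DIR` field name comes from the slides; the `<ENTRY>`, `<HW>`, and `<IMAGE>` tags, the folder path, and both function names are assumptions made for illustration. The second function only mimics, in Python, the path a stylesheet might compute for the image source.

```python
import xml.etree.ElementTree as ET

def add_runtime_field(entry_xml, tag, value):
    """Return the entry XML with one extra child element appended."""
    entry = ET.fromstring(entry_xml)
    field = ET.SubElement(entry, tag)
    field.text = value
    return ET.tostring(entry, encoding="unicode")

def image_src(augmented_xml):
    """Mimic the path a stylesheet could build for the IMG SRC attribute."""
    entry = ET.fromstring(augmented_xml)
    return entry.findtext("IMAGE-DIR") + "/" + entry.findtext("IMAGE")

# Hypothetical entry and images folder, known only at run-time.
entry_xml = "<ENTRY><HW>yawarrangi</HW><IMAGE>kangaroo.jpg</IMAGE></ENTRY>"
augmented = add_runtime_field(entry_xml, "IMAGE-DIR", "/opt/kirrkirr/images")
print(augmented)
print(image_src(augmented))
```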

6
English-Warlpiri Dictionary
  • The original dictionary is one-way Warlpiri to
    English, but a bidirectional bilingual dictionary
    is more useful for most users
  • An English index was built from glosses in the
    dictionary such that each gloss links to the
    equivalent Warlpiri entries.
  • Rather than being two separate monolingual
    dictionaries, these dictionaries share the same
    data, thus eliminating conflicting entries and
    maintaining consistency
  • The XML entries of all the Warlpiri equivalents
    to an English word are merged and passed to an
    XSLT stylesheet, which creates an HTML page for
    the English word
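
Building the English index from glosses amounts to inverting the headword-to-gloss mapping. A minimal sketch, assuming the glosses have already been extracted from the XML into (headword, glosses) pairs; the data layout and the `build_english_index` name are illustrative, and the pairs below are simplified from examples given elsewhere in the slides.

```python
from collections import defaultdict

# (Warlpiri headword, English index terms) pairs, simplified for illustration.
ENTRIES = [
    ("yawarrangi", ["kangaroo"]),
    ("kirany-kiranypa", ["spinifex", "lizard"]),
    ("nya-nyi", ["see"]),
]

def build_english_index(entries):
    """Map each English gloss to the Warlpiri headwords that carry it."""
    index = defaultdict(list)
    for headword, glosses in entries:
        for gloss in glosses:
            key = gloss.lower()
            if headword not in index[key]:
                index[key].append(headword)
    return dict(index)

english_index = build_english_index(ENTRIES)
print(english_index["lizard"])
```

Because the index stores only headwords, the English side keeps pointing at the same Warlpiri entries rather than duplicating their content.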

7
English-Warlpiri Dictionary
  • To make the English dictionary symmetric to the
    Warlpiri, Kirrkirr now has an English word list,
    English formatted entries, a much faster English
    search, and the capability to do fuzzy spelling
    in English
  • Problems arise because most Warlpiri words have
    several English equivalents, and also because
    phrases in English might be indexed under several
    different terms
  • For example, "yawarrangi", meaning "large male
    kangaroo", should be indexed under "kangaroo"
    rather than "large" or "male"
  • However, "jawirdiki" and other words that
    mean "stay put" should be indexed under "stay"
    and not "put"
  • Words like "kirany-kiranypa", meaning "spinifex
    lizard", should be indexed under "spinifex" (the
    type) and "lizard"

8
Warlpiri Morphology
  • Warlpiri is an agglutinating language, meaning
    that grammatical suffixes get added on to words
  • Example: nyangulparnangku
      nya-ngu-lpa-rna-ngku
      see-PAST-IPFV-1SG.SUBJ-2SG.OBJ
      "I was looking at you."
  • Root word: nya-nyi, meaning "to see"
  • For lookup in the dictionary, users have to know
    the root word
  • This is difficult for learners of Warlpiri, given
    that morphemes are not always separated by
    hyphens and verbs are indexed with non-past tense
    inflections
  • To make Kirrkirr more usable, a morphological
    analyzer was implemented to accept well-formed
    Warlpiri words and find the possible root words
    to look up

9
Morphological Analysis
  • Suffixes from the dictionary are stored in a trie
    for quick lookup
  • Each time an affix is stripped, the remaining
    string is checked to see whether it is in the
    dictionary
  • Each possible morpheme is added to a lattice
    structure which holds all possible morphological
    decompositions of the word
  • Grammar rules are applied to eliminate many
    impossible parses
  • Some properties of Warlpiri make parsing more
    difficult, and show the need for a different
    indexing system
  • Verbs are stored with non-past inflections but
    are seen with different inflections. For example,
    "nya-nyi" may show up as "nya-ngu". But indexing
    "nya-nyi" under "nya" creates more ambiguity,
    since "nya" is another word.
  • Some words have optional suffixes, such as
    "l(pa)", which may be seen as "l" or "lpa". These
    words must be indexed under both entries.
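
The suffix-stripping steps above can be sketched as follows. This is a simplified illustration, not Kirrkirr's analyzer: it stores the suffixes reversed in a nested-dict trie, strips them from the end of the word, and returns a flat list of parses instead of building an explicit lattice, and it applies no grammar rules. The suffix list is taken from the example word in the slides; the `ROOTS` mapping from bare stem to dictionary headword is an assumption.

```python
SUFFIXES = ["ngu", "lpa", "rna", "ngku", "l"]
ROOTS = {"nya": "nya-nyi"}  # bare stem -> dictionary headword (assumed mapping)

def build_suffix_trie(suffixes):
    """Store each suffix reversed so lookup can walk back from the word end."""
    root = {}
    for suffix in suffixes:
        node = root
        for ch in reversed(suffix):
            node = node.setdefault(ch, {})
        node["$"] = suffix  # end-of-suffix marker
    return root

def suffixes_at_end(word, trie):
    """Yield every stored suffix that `word` ends with."""
    node = trie
    for ch in reversed(word):
        node = node.get(ch)
        if node is None:
            return
        if "$" in node:
            yield node["$"]

def analyze(word, trie, roots):
    """Return all (headword, [suffixes]) decompositions of `word`."""
    parses = []
    if word in roots:
        parses.append((roots[word], []))
    for suffix in suffixes_at_end(word, trie):
        stem = word[: -len(suffix)]
        for headword, inner in analyze(stem, trie, roots):
            parses.append((headword, inner + [suffix]))
    return parses

trie = build_suffix_trie(SUFFIXES)
for headword, suffixes in analyze("nyangulparnangku", trie, ROOTS):
    print(headword, "+", "-".join(suffixes))
```

With a realistic suffix inventory, many words would yield several candidate parses, which is where the grammar rules mentioned above would be needed to discard impossible ones.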

10
Conclusions
  • Making Kirrkirr a bidirectional English-Warlpiri
    and Warlpiri-English dictionary increases its
    usability and practicality, by making it easier
    for users who are more comfortable in English to
    browse and search in English.
  • Allowing lookup of Warlpiri words from actual
    speech using the morphological analysis also
    increases usability, especially for users who are
    learning Warlpiri, since they do not have to
    figure out the root word.
  • Future work:
      • Improving the morphological analysis to provide
        roughly ranked possible parses of all morphemes
        of an entire word, using more grammatical
        information and frequency information
      • Extending Kirrkirr to other languages