Metadata generation and glossary creation in eLearning

1
Metadata generation and glossary creation in
eLearning
  • Lothar Lemnitzer
  • Review meeting, Zürich, 25 January 2008

2
Outline
  • Demonstration of the functionalities
  • Where we stand
  • Evaluation of tools
  • Consequences for the development of the tools in
    the final phase

3
Demo
  • We simulate a tutor who adds a learning object
    and then generates and edits additional data

4
Where we stand (1)
  • Achievements reached in the first year of the
    project:
      • Annotated corpora of learning objects
      • Stand-alone prototype of keyword extractor (KWE)
      • Stand-alone prototype of glossary candidate
        detector (GCD)

5
Where we stand (2)
  • Achievements reached in the second year of the
    project:
      • Quantitative evaluation of the corpora and tools
      • Validation of the tools in user-centered usage
        scenarios for all languages
      • Further development of tools in response to the
        results of the evaluation

6
Evaluation - rationale
  • Quantitative evaluation is needed to:
      • Inform the further development of the tools
        (formative)
      • Find the optimal setting / parameters for each
        language (summative)

7
Evaluation (1)
  • Evaluation is applied to:
      • the corpora of learning objects
      • the keyword extractor
      • the glossary candidate detector
  • In the following, I will focus on the tool
    evaluation

8
Evaluation (2)
  • Evaluation of the tools comprises:
      • measuring recall and precision compared to the
        manual annotation
      • measuring agreement on each task between
        different annotators
      • measuring acceptance of keywords / definitions
        (rated on a scale)

9
KWE Evaluation step 1
  • One human annotator marked n keywords in document
    d
  • First n choices of KWE for document d extracted
  • Measure overlap between both sets
  • Also measure partial matches
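The overlap measurement described above can be sketched as follows. This is a minimal illustration, not the project's evaluation script: the function name `overlap_scores`, the half-credit weight for partial matches, and the rule that a partial match is any shared word are all assumptions made here.

```python
def overlap_scores(human, system, partial_credit=0.5):
    """Compare the human keywords with the first n system keywords.

    n is the number of (distinct) human keywords. A system keyword that
    shares at least one word with a human keyword, without matching it
    exactly, earns `partial_credit` (an assumed weight).
    """
    human_set = {k.lower() for k in human}
    system_top = [k.lower() for k in system[:len(human_set)]]

    score = 0.0
    for cand in system_top:
        if cand in human_set:
            score += 1.0                      # exact match
        elif any(set(cand.split()) & set(h.split()) for h in human_set):
            score += partial_credit           # partial match

    precision = score / len(system_top) if system_top else 0.0
    recall = score / len(human_set) if human_set else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


human = ["learning object", "metadata", "glossary"]
system = ["metadata", "learning", "keyword", "glossary"]
print(overlap_scores(human, system))
```

When the extractor returns at least n candidates, both sets have size n, so precision, recall and F-measure coincide in this sketch.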

10
Language     Best method     F-measure
Bulgarian    TFIDF/ADRIDF    0.25
Czech        TFIDF/ADRIDF    0.18
Dutch        TFIDF           0.29
English      ADRIDF          0.33
German       TFIDF           0.16
Polish       ADRIDF          0.26
Portuguese   TFIDF           0.22
Romanian     TFIDF/ADRIDF    0.15

11
KWE Evaluation step 2
  • Measure Inter-Annotator Agreement (IAA)
  • Participants read text (Calimera Multimedia)
  • Participants assign keywords to that text
    (ideally not more than 15)
  • KWE produces keywords for text

12
KWE Evaluation step 2
  • Agreement is measured between human annotators
  • Agreement is measured between KWE and human
    annotators
  • We have tested two measures / approaches:
      • kappa according to Bruce / Wiebe
      • AC1, an alternative agreement weighting suggested
        by Debra Haley at OU, based on Gwet
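For orientation, the two agreement measures can be compared on a toy example. The sketch below assumes two annotators and a binary decision per candidate term (1 = marked as keyword, 0 = not marked); it uses the standard two-rater kappa and Gwet's AC1 formulas. How the project set up the candidate list and the exact kappa variant are not stated in the slides, so the function name `agreement` and the data layout are illustrative only.

```python
from collections import Counter


def agreement(labels_a, labels_b, categories=(0, 1)):
    """Return (kappa, AC1) for two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    freq_a, freq_b = Counter(labels_a), Counter(labels_b)

    # Kappa: chance agreement from each annotator's own label distribution.
    p_e_kappa = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    if p_e_kappa == 1.0:
        raise ValueError("kappa is undefined when only one label is used")

    # AC1: chance agreement from the averaged label distribution.
    pis = [(freq_a[c] + freq_b[c]) / (2 * n) for c in categories]
    p_e_ac1 = sum(p * (1 - p) for p in pis) / (len(categories) - 1)

    kappa = (p_o - p_e_kappa) / (1 - p_e_kappa)
    ac1 = (p_o - p_e_ac1) / (1 - p_e_ac1)
    return kappa, ac1


annotator_1 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
annotator_2 = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(agreement(annotator_1, annotator_2))
```

With skewed label distributions, as in keyword assignment where most candidate terms are not keywords, AC1 tends to come out higher than kappa for the same observed agreement.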

13
Language     IAA human annotators   IAA of KWE with best settings
Bulgarian    0.63                   0.99
Czech        0.71                   0.78
Dutch        0.67                   0.72
English      0.62                   0.82
German       0.64                   0.63
Polish       0.63                   0.67
Portuguese   0.58                   0.67
Romanian     0.59                   0.61

14
KWE Evaluation step 3
  • Humans judge the adequacy of keywords
  • Participants read text (Calimera Multimedia)
  • Participants see 20 KW generated by the KWE and
    rate them
  • Scale: 1 to 4 (excellent to not acceptable)
  • 5 = not sure
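A small sketch of how average ratings for the first 5, first 10 and all 20 keywords could be computed from such judgements. It assumes that "not sure" (5) answers are left out of the average, which the slides do not state; `mean_rating` and the input layout (one list of participant scores per keyword, in KWE rank order) are hypothetical.

```python
NOT_SURE = 5  # rating code for "not sure"; assumed to be excluded from means


def mean_rating(ratings_by_keyword, top_k=None):
    """Average rating over the first `top_k` keywords (all if None).

    ratings_by_keyword: list in KWE rank order; each element is the list
    of scores (1 = excellent ... 4 = not acceptable, 5 = not sure) that
    the participants gave to that keyword.
    """
    selected = ratings_by_keyword[:top_k]
    scores = [r for kw in selected for r in kw if r != NOT_SURE]
    return sum(scores) / len(scores) if scores else None


ratings = [[1, 2, 5], [2, 2, 3], [4, 5, 4]]
print(mean_rating(ratings, top_k=1))  # first keyword only
print(mean_rating(ratings, top_k=2))  # first two keywords
print(mean_rating(ratings))           # all keywords
```

Lower values are better on this scale, so a lower average for the first 5 keywords than for all 20 would indicate that the ranking puts the better keywords first.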

15
Language     20 kw   First 5 kw   First 10 kw
Bulgarian    2.21    2.54         2.12
Czech        2.22    1.96         1.96
Dutch        1.93    1.68         1.64
English      2.15    2.52         2.22
German       2.06    1.96         1.96
Polish       1.95    2.06         2.1
Portuguese   2.34    2.08         1.94
Romanian     2.14    1.8          2.06

16
GCD Evaluation - step 1
  • A human annotator marked definitions in document
    d
  • GCD extracts defining contexts from same document
    d
  • Measure overlap between both sets
  • Overlap is measured on the sentence level,
    partial overlap counts
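Sentence-level scoring with partial overlap can be sketched like this. The representation is an assumption: each definition, whether marked by the annotator or extracted by the GCD, is given as the set of sentence indices it covers, and any shared sentence counts as a hit. The name `gcd_scores` is illustrative.

```python
def gcd_scores(gold, extracted):
    """Return (recall, precision) for definitions given as sentence-index sets.

    A gold definition counts as found if any extracted context shares a
    sentence with it; an extracted context counts as correct if it shares
    a sentence with any gold definition.
    """
    gold = [set(g) for g in gold]
    extracted = [set(e) for e in extracted]

    gold_found = sum(any(g & e for e in extracted) for g in gold)
    extracted_correct = sum(any(e & g for g in gold) for e in extracted)

    recall = gold_found / len(gold) if gold else 0.0
    precision = extracted_correct / len(extracted) if extracted else 0.0
    return recall, precision


gold = [{3}, {7, 8}, {12}]          # human-marked definitions
extracted = [{3}, {8}, {10}, {15}]  # GCD output
print(gcd_scores(gold, extracted))
```

Counting partial overlap as a full hit is lenient: an extracted context that covers only one sentence of a two-sentence definition is still scored as correct.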

17
Is-definitions   Recall   Precision
Bulgarian        0.64     0.18
Czech            0.48     0.29
Dutch            0.92     0.21
English          0.58     0.17
German           0.55     0.37
Polish           0.74     0.22
Portuguese       0.69     0.30
Romanian         1.0      0.53

18
GCD Evaluation step 2
  • Measure Inter-Annotator Agreement
  • Experiments run for Polish and Dutch
  • Prevalence-adjusted version of kappa used as a
    measure
  • Polish: 0.42, Dutch: 0.44
  • IAA rather low for this task
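The slides do not say which prevalence-adjusted formula was used. One common choice for two annotators and a binary decision is PABAK, which depends only on the observed agreement p_o: PABAK = 2 * p_o - 1. The sketch below assumes that formula; `pabak` is an illustrative name.

```python
def pabak(labels_a, labels_b):
    """Prevalence-adjusted, bias-adjusted kappa for two binary annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)
    return 2 * p_o - 1


# 1 = sentence marked as definition, 0 = not marked (toy data)
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
print(pabak(annotator_1, annotator_2))
```

Under this assumption, the reported values of 0.42 and 0.44 would correspond to observed agreement of 0.71 and 0.72 respectively.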

19
GCD Evaluation step 3
  • Judging quality of extracted definitions
  • Participants read text
  • Participants get definitions extracted by GCD for
    that text and rate quality
  • Scale: 1 to 4 (excellent to not acceptable)
  • 5 = not sure

20
Language     Definitions   Testers   Av. value
Bulgarian    25            7         2.7
Czech        24            6         3.1
Dutch        14            6         2.8
English      10            4         3.3
German       5             5         2.1
Polish       11            5         2.7
Portuguese   36            6         2.2
Romanian     9             7         3.0

21
GCD Evaluation step 3
  • Further findings:
      • Relatively high variance (many 1 and 4)
      • Disagreement between users about the quality of
        individual definitions

22
Individual user feedback - KWE
  • The quality of the generated keywords remains an
    issue
  • Variance in the responses from different language
    groups
  • We suspect a correlation between language of the
    users and their satisfaction
  • Performance of KWE relies on language settings,
    we have to investigate them further

23
Individual user feedback - GCD
  • Not all the suggested definitions are real
    definitions.
  • Terms are ok, but definitions cited are often not
    what would be expected.
  • Some terms proposed in the glossary did not make
    any sense.
  • The ability to see the context where a definition
    has been found is useful.

24
Consequences - KWE
  • Use non-distributional information to rank
    keywords (layout, chains)
  • Present first 10 keywords to user, more keywords
    on demand
  • For keyphrases, present most frequent attested
    form
  • Users can add their own keywords
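Two of these points, showing the most frequent attested form of a keyphrase and presenting the first 10 keywords with more on demand, can be sketched as below. The input layout (pairs of a normalised key and the surface form found in the text) and the names `display_forms` and `first_page` are assumptions for illustration, not the project's implementation.

```python
from collections import Counter, defaultdict


def display_forms(occurrences):
    """Map each normalised keyphrase to its most frequent surface form.

    occurrences: iterable of (normalised_key, surface_form) pairs, one per
    occurrence in the learning object.
    """
    forms = defaultdict(Counter)
    for key, surface in occurrences:
        forms[key][surface] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in forms.items()}


def first_page(ranked_keywords, page_size=10):
    """Split a ranked list into the part shown first and the rest on demand."""
    return ranked_keywords[:page_size], ranked_keywords[page_size:]


occurrences = [
    ("learning object", "learning objects"),
    ("learning object", "learning object"),
    ("learning object", "learning objects"),
    ("glossary", "glossary"),
]
print(display_forms(occurrences))
shown, on_demand = first_page([f"kw{i}" for i in range(1, 21)])
print(len(shown), len(on_demand))
```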

25
Consequences - GCD
  • Split definitions into types and tackle the most
    important types
  • Use machine learning alongside local grammars
  • Look into the part of the grammars which extract
    the defined term
  • Users can add their own definitions

26
Plans for final phase
  • KWE: work with lexical chains
  • GCD: extend ML experiments
  • Finalize documentation of the tools

27
Validation
  • User scenarios with NLP tools embedded
  • Content provider adds keywords and a glossary for
    a new learning object
  • Student uses keywords and definitions extracted
    from a learning object to prepare a presentation
    of the content of that learning object

28
Validation
  • Students use keywords and definitions extracted
    from a learning object to prepare a quiz / exam
    about the content of that learning object

29
Validation
  • We want to get feedback about:
      • The users' general attitude towards the tools
      • The users' satisfaction with the results obtained
        by the tools in the particular situation of use
        (scenario)

30
User feedback
  • Participants appreciate the option to add their
    own data
  • Participants found it easy to use the functions

31
Plans for the next phase
  • Improve precision of extraction results
  • KWE: implement lexical chainer
  • GCD: use machine learning in combination with
    local grammars, or as a replacement for them
  • Finalize documentation of the tools

32
Corpus statistics - full corpus
  • Measuring lengths of corpora (# of documents,
    tokens)
  • Measuring token / type ratio
  • Measuring type / lemma ratio
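The two ratios can be computed from a lemmatised corpus as in the following sketch. It assumes the corpus is available as (token, lemma) pairs and that tokens are case-folded before counting types; neither detail is given in the slides, and `corpus_ratios` is an illustrative name.

```python
def corpus_ratios(tagged_tokens):
    """Return (token/type ratio, type/lemma ratio) for (token, lemma) pairs."""
    tokens = [tok.lower() for tok, _ in tagged_tokens]
    types = set(tokens)                                   # distinct word forms
    lemmas = {lemma.lower() for _, lemma in tagged_tokens}  # distinct lemmas
    return len(tokens) / len(types), len(types) / len(lemmas)


sample = [
    ("objects", "object"), ("object", "object"), ("Objects", "object"),
    ("learns", "learn"), ("learning", "learn"), ("learned", "learn"),
]
print(corpus_ratios(sample))
```

A low token / type ratio means each word form is seen only a few times, which is the sparseness concern; a high type / lemma ratio means many distinct forms per lemma.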

33
Language     # of documents   # of tokens
Bulgarian    55               218900
Czech        1343             962103
Dutch        77               505779
English      125              1449658
German       36               265837
Polish       35               299071
Portuguese   29               244702
Romanian     69               484689

34
Language     Token / type   Type / lemma
Bulgarian    9.65           2.78
Czech        18.37          1.86
Dutch        14.18          1.15
English      34.93          2.8 (tbc)
German       8.76           1.38
Polish       7.46           1.78
Portuguese   12.27          1.42
Romanian     12.43          1.54

35
Corpus statistics - full corpus
  • Bulgarian, German and Polish corpora have a very
    low number of tokens per type (probably problems
    with sparseness)
  • English has by far the highest ratio
  • Czech, Dutch, Portuguese and Romanian are in
    between
  • Type / lemma ratio reflects richness of
    inflectional paradigms

36
To do
  • Please check / verify these numbers
  • Report, for the M24 deliverable, about
    improvements / reanalysis of the corpora (I am
    aware of such activities for Bulgarian, German,
    and English)

37
Corpus statistics - annotated subcorpus
  • Measuring lengths of annotated documents
  • Measuring distribution of manually marked
    keywords over documents
  • Measuring the share of keyphrases
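A sketch of how the keyword distribution and the keyphrase share could be derived from the manual annotation. It assumes a keyphrase is any keyword consisting of more than one whitespace-separated word, and that annotations are available as a mapping from document id to keyword list; `keyword_stats` is a hypothetical helper.

```python
def keyword_stats(keywords_by_doc):
    """Summarise manually marked keywords over a set of documents."""
    all_keywords = [kw for kws in keywords_by_doc.values() for kw in kws]
    keyphrases = [kw for kw in all_keywords if len(kw.split()) > 1]
    n_docs = len(keywords_by_doc)
    return {
        "total": len(all_keywords),
        "avg_per_doc": len(all_keywords) / n_docs if n_docs else 0.0,
        "keyphrase_share": len(keyphrases) / len(all_keywords) if all_keywords else 0.0,
    }


annotated = {
    "doc1": ["metadata", "learning object", "glossary"],
    "doc2": ["keyword extraction"],
}
print(keyword_stats(annotated))
```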

38
Language     # of annotated documents   Average length (# of tokens)
Bulgarian    55                         3980
Czech        465                        672
Dutch        72                         6912
English      36                         9707
German       34                         8201
Polish       25                         4432
Portuguese   29                         8438
Romanian     41                         3375

39
Language     # of keywords   Average # of keywords per doc.
Bulgarian    3236            77
Czech        1640            3.5
Dutch        1706            24
English      1174            26
German       1344            39.5
Polish       1033            41
Portuguese   997             34
Romanian     2555            62

40
Language     Keyphrases
Bulgarian    43
Czech        27
Dutch        25
English      62
German       10
Polish       67
Portuguese   14
Romanian     30