A. Colombo, L. Sbattella, and R. Tedesco

1
Department of Electronics and Information
Adaptable, Relational, and Cognitive Software LABoratory
(ARCSLAB)
The authoring of highly accessible texts
  • A. Colombo, L. Sbattella, and R. Tedesco

Adaptive Content Processing Conference (ACP 2008), November 7th, Amsterdam
2
Motivation
  • The effort needed to understand a text deeply
    influences its accessibility
  • In particular, for persons with congenital or
    acquired cognitive or learning syndromes
  • Long sentences, unusual words, complex linguistic
    structures can be very hard to understand
  • Measuring the readability and understandability
    of documents is useful
  • SPARTA2 supports the authoring phase
  • Integrated with commonly used writing tools
  • Easy to use

3
Readability formulas
  • The problem of readability has been under
    investigation since the 1920s
  • Several formulas have since been proposed
  • Two key aspects in measuring readability:
    structure and semantics
  • The main goal of such formulas was simplicity
  • Today, NLP tools make it possible to include
    complex syntactic structures as a parameter of
    new readability formulas

4
SPARTA2
  • Provides two kinds of information:
  • Numeric estimations of the complexity
  • Warnings and advice about phrase structures
    that are hard to understand
  • SPARTA2 makes a distinction between readability
    and understandability
  • Readability evaluates the structure of
    sentences
  • Understandability captures the lexical aspects

5
The Readability Index: Gulpease Index
  • The Readability Index is composed of three
    sub-indexes:
  • Gulpease Index
  • Chunk Index
  • Chunk Type Index
  • Gulpease Index: a widely used readability formula
    for the Italian language (Lucisano and
    Piemontese, 1988)

NC = number of characters in the text
NW = number of words in the text
NS = number of sentences in the text
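The formula image from this slide is not part of the transcript. As a minimal sketch, the Gulpease index as it is commonly stated in the readability literature, 89 + (300 × NS − 10 × NC) / NW, can be computed from the three counts above. The tokenisation below (letters only for NC, sentences split on ., ! and ?) is an assumption and not necessarily what SPARTA2 does.

```python
import re

def gulpease(text: str) -> float:
    """Gulpease index in its commonly stated form:
    89 + (300 * NS - 10 * NC) / NW.
    The tokenisation rules here are assumptions of this sketch."""
    # Words: runs of Unicode letters (digits and underscores excluded).
    words = re.findall(r"[^\W\d_]+", text)
    # Sentences: non-empty spans separated by '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_c = sum(len(w) for w in words)  # NC: letters only
    n_w = len(words)                  # NW
    n_s = len(sentences)              # NS
    if n_w == 0:
        raise ValueError("text contains no words")
    return 89 + (300 * n_s - 10 * n_c) / n_w
```

For the one-sentence text "Il libro è sul tavolo." (17 letters, 5 words) this gives 115; very short texts can score above 100.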
6
The Readability Index: Gulpease Index
  • Similar to the classic Flesch formula for the
    English language
  • Widely adopted in the literature on readability
  • Simple to implement
  • But the deep structure of the sentences is not
    taken into account

7
The Readability Index: Chunks
  • Analysing the structure of a sentence:
  • Full parsing
  • Shallow parsing (chunking)
  • E.g. "the book is on the table"
  • We choose the shallow parsing approach

[NP The book] [VP is] [PP on] [NP the table]
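To make the chunked example concrete, here is a small sketch of how a shallow parse can be held in memory as a flat list of (chunk type, tokens) pairs. The `Chunk` alias and the `bracketed` helper are illustrative names, not part of CHAOS or SPARTA2.

```python
from typing import List, Tuple

# A chunk is a (type, tokens) pair; a sentence is a flat list of chunks.
Chunk = Tuple[str, List[str]]

sentence: List[Chunk] = [
    ("NP", ["The", "book"]),
    ("VP", ["is"]),
    ("PP", ["on"]),
    ("NP", ["the", "table"]),
]

def bracketed(chunks: List[Chunk]) -> str:
    """Render chunks in bracket notation, one bracket per chunk."""
    return " ".join(f"[{ctype} {' '.join(tokens)}]" for ctype, tokens in chunks)

print(bracketed(sentence))
```

Unlike a full parse tree, the list has no nesting, which is what keeps shallow parsing cheap.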
8
The Readability Index: Chunk Indexes
  • The Chunk Index and the Chunk Type Index take
    into account the structure of the sentences in
    terms of chunks
  • The CHAOS Italian language shallow parser (Tor
    Vergata University, Rome)
  • The Chunk Index:

NK = number of chunks in the text
NS = number of sentences in the text
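The Chunk Index formula itself is not part of the transcript. The sketch below only computes the raw quantity that the two counts suggest, the average number of chunks per sentence (NK / NS). Whether the authors rescale or invert this ratio is unknown, so treat it as an assumption.

```python
from typing import List, Sequence

def chunks_per_sentence(chunked_sentences: Sequence[List[str]]) -> float:
    """Average number of chunks per sentence (NK / NS).
    Each sentence is given as the list of its chunk types.
    Assumption: the real Chunk Index may scale this value differently."""
    n_s = len(chunked_sentences)                 # NS
    n_k = sum(len(s) for s in chunked_sentences) # NK
    if n_s == 0:
        raise ValueError("no sentences given")
    return n_k / n_s
```

For example, a text with one 4-chunk sentence and one 2-chunk sentence averages 3 chunks per sentence.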
9
The Readability Index: Chunk Indexes
  • Different chunk types can differ in
    readability
  • The Chunk Type Index is based on the
    distribution of chunk types in the text

NK = number of chunks in the text
Ni = number of chunks of type i in the text
wi = the weight assigned to chunks of type i
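The Chunk Type Index formula is also missing from the transcript. One natural reading of the three symbols is a weighted average of chunk-type counts, the sum over i of wi × Ni, divided by NK. The sketch below implements that reading as an assumption. The weight values in the usage line are made up for illustration.

```python
from collections import Counter
from typing import Mapping, Sequence

def weighted_chunk_type_mean(chunk_types: Sequence[str],
                             weights: Mapping[str, float]) -> float:
    """sum_i(w_i * N_i) / N_K over the chunk types found in a text.
    Assumption: this is one plausible form, not the confirmed formula."""
    counts = Counter(chunk_types)  # N_i for each chunk type i
    n_k = len(chunk_types)         # N_K
    if n_k == 0:
        raise ValueError("no chunks given")
    return sum(weights[ctype] * n for ctype, n in counts.items()) / n_k

# Hypothetical weights, for illustration only.
score = weighted_chunk_type_mean(["NP", "VP", "PP", "NP"],
                                 {"NP": 0.5, "VP": 0.25, "PP": 0.25})
```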
10
The Readability Index: Chunk Indexes
  • The weights wi were calculated by analysing the
    statistical distribution of the chunk types
  • We analysed 3686 documents from AltaFrequenza
  • Chunks with high frequencies in this collection
    are more readable
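The slide does not say how the distribution is turned into weights. A minimal sketch, assuming the simplest mapping in which the weight of a chunk type is its relative frequency in the reference collection:

```python
from collections import Counter
from typing import Dict, Iterable

def frequency_weights(corpus_chunk_types: Iterable[str]) -> Dict[str, float]:
    """Weight of each chunk type = its relative frequency in a corpus.
    Assumption: the authors' actual mapping from frequency to weight
    is not given in the slides."""
    counts = Counter(corpus_chunk_types)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("corpus contains no chunks")
    return {ctype: n / total for ctype, n in counts.items()}
```

With this choice frequent chunk types receive larger weights, in line with the idea that frequent chunks are more readable.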

11
The Readability Index: formula
  • The final Readability Index is given by a
    weighted mean
  • For our initial experiments, each of the three
    weights has value 1/3
  • We are evaluating several training approaches for
    learning the values from texts
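The weighted mean can be sketched as follows. The function and parameter names are illustrative, and the sketch assumes the three sub-indexes have already been brought to a common scale, which the slides do not detail.

```python
from typing import Sequence

def readability_index(gulpease: float, chunk: float, chunk_type: float,
                      weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted mean of the three sub-indexes.
    Default weights are 1/3 each, as in the initial experiments."""
    w_g, w_c, w_t = weights
    total = w_g + w_c + w_t
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total keeps this a mean even if weights do not sum to 1.
    return (w_g * gulpease + w_c * chunk + w_t * chunk_type) / total
```

With equal weights the result is the plain average of the three values; learned weights would simply replace the default tuple.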

12
The Understandability Index: De Mauro
  • The Understandability Index measures the
    complexity related to the lexicon
  • The index is based on the De Mauro basic Italian
    dictionary
  • Contains the 4700 most used lemmas
  • Three sections: basic vocabulary, highly used
    words vocabulary, and less used words vocabulary

Nb = number of words in the basic vocabulary
Nh = number of words in the highly used vocabulary
Nl = number of words in the less used vocabulary
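The formula that combines these counts is not part of the transcript, so the sketch below only shows how Nb, Nh, and Nl could be gathered. The three word sets passed in are placeholders supplied by the caller, not the real De Mauro lists, and a real system would match lemmas rather than lowercase word forms.

```python
from typing import Dict, Iterable, Set

def vocabulary_counts(words: Iterable[str], basic: Set[str],
                      high_use: Set[str], low_use: Set[str]) -> Dict[str, int]:
    """Count how many words fall in each vocabulary section.
    'other' collects words found in none of the three sections.
    Assumption: words are matched by lowercase form, without lemmatisation."""
    counts = {"Nb": 0, "Nh": 0, "Nl": 0, "other": 0}
    for word in words:
        w = word.lower()
        if w in basic:
            counts["Nb"] += 1
        elif w in high_use:
            counts["Nh"] += 1
        elif w in low_use:
            counts["Nl"] += 1
        else:
            counts["other"] += 1
    return counts
```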
13
Detecting and reporting readability issues
  • Indexes are useful for giving a measure of the
    complexity of the text
  • Indexes say nothing about exactly where the
    critical parts of the text are, or how to fix them
  • SPARTA2 detects and reports readability issues
  • Analyses the structure and the lexicon of the
    sentences
  • Plug-in based
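As a rough illustration of a plug-in based checker, each plug-in below is a function that takes a sentence and returns a list of warning messages, and a runner applies every plug-in to every sentence. The names, the word-count rule, and its threshold are hypothetical. SPARTA2's actual plug-in interface is not described in the slides.

```python
from typing import Callable, List, Sequence, Tuple

# A plug-in maps one sentence to zero or more warning messages.
Plugin = Callable[[str], List[str]]

def long_sentence(max_words: int = 25) -> Plugin:
    """Hypothetical plug-in: warn when a sentence exceeds max_words."""
    def check(sentence: str) -> List[str]:
        n = len(sentence.split())
        if n > max_words:
            return [f"sentence has {n} words (limit {max_words})"]
        return []
    return check

def run_plugins(sentences: Sequence[str],
                plugins: Sequence[Plugin]) -> List[Tuple[int, str]]:
    """Apply every plug-in to every sentence.
    Returns (sentence index, message) pairs so issues can be located."""
    report: List[Tuple[int, str]] = []
    for idx, sentence in enumerate(sentences):
        for plugin in plugins:
            for message in plugin(sentence):
                report.append((idx, message))
    return report
```

Keeping the sentence index in each report entry addresses the point above: unlike a global index, a warning says where the problem is.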

14
SPARTA2 Architecture
  • The server is composed of four Java modules
  • CHAOS parser
  • The indexes generator
  • The sentence analyser
  • The warning and advice generator
  • The client runs inside Word 2007
  • C# and .NET
  • SmartTag

15
SPARTA2 User interface
16
SPARTA2 User interface
17
SPARTA2 A use case (hidden slide)
18
Conclusions
  • SPARTA2: a tool supporting the authoring of
    highly accessible texts
  • Computes indexes about the complexity of Italian
    texts
  • Exploits the syntactic structure reported by a
    shallow parser
  • Detects readability issues, providing
    warnings and advice
  • Simple to use
  • Integrated with Word 2007

19
Future work
  • Validation of our indexes: the parameters a, b,
    c, ?, ?, and ?
  • More plug-ins
  • More languages

20
http://arcslab.dei.polimi.it/
Thank you!