A. Colombo, L. Sbattella, and R. Tedesco

1
Department of Electronics and Information
Adaptable, Relational, and Cognitive Software LABoratory
(ARCSLAB)
The authoring of highly accessible texts

A. Colombo, L. Sbattella, and R. Tedesco

Adaptive Content Processing Conference (ACP
2008). November 7th, Amsterdam
2
Motivation

The effort needed to understand a text deeply
influences its accessibility
In particular, for persons with congenital or
acquired cognitive or learning syndromes
Long sentences, unusual words, and complex linguistic
structures can be very hard to understand
Measuring the readability and understandability
of documents is useful
SPARTA2 supports the authoring phase
Integrated with commonly used writing tools
Easy to use

3
Readability formulas

The problem of readability has been under investigation
since the 1920s
Several formulas have since been proposed
Two key aspects in measuring readability:
structure and semantics
The main goal of such formulas was simplicity
Today, NLP tools make it possible to include complex
syntactic structures as a parameter of new
readability formulas

4
SPARTA2

Provides two kinds of information
Numeric estimations of the complexity
Warnings and advice about phrase structures
that are hard to understand
SPARTA2 makes a distinction between readability
and understandability
Readability evaluates the
structure of sentences
Understandability captures the lexical aspects

5
The Readability Index: Gulpease Index

The Readability Index is composed of three
sub-indexes:
Gulpease Index
Chunk Index
Chunk Type Index
Gulpease Index: a widely used readability formula
for the Italian language (Lucisano and
Piemontese, 1988)

NC = number of characters in the text
NW = number of words in the text
NS = number of sentences in the text
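
The formula itself is not reproduced in this transcript. As a minimal sketch, the commonly published form of the Gulpease index is 89 + (300 * NS - 10 * NC) / NW; whether the slide showed this raw form or a rescaled variant is an assumption, and the function name below is illustrative only.

```python
def gulpease(nc: int, nw: int, ns: int) -> float:
    """Gulpease index in its commonly published form.

    nc: number of characters (letters) in the text
    nw: number of words in the text
    ns: number of sentences in the text
    """
    if nw <= 0:
        raise ValueError("the text must contain at least one word")
    return 89 + (300 * ns - 10 * nc) / nw


# "Il libro è sul tavolo." has 17 letters, 5 words, 1 sentence.
score = gulpease(nc=17, nw=5, ns=1)
```

Higher values indicate easier text. Very short inputs such as the one above can exceed 100, so the score is most meaningful on passages of some length.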
6
The Readability Index: Gulpease Index

Similar to the classic Flesch formula for the
English language
Widely adopted in the literature on readability
Simple to implement
But the deep structure of the sentences is not
taken into account

7
The Readability Index: Chunks

Analysing the structure of a sentence
Full parsing
Shallow parsing (chunking)
E.g. the book is on the table
We choose the shallow parsing approach

[NP The book] [VP is] [PP on] [NP the table]
8
The Readability Index: Chunk Indexes

The Chunk Index and the Chunk Type Index take into
account the structure of the sentences in terms
of chunks
Chunks are extracted with the CHAOS Italian language
shallow parser (Tor Vergata University, Rome)
The Chunk Index:

NK = number of chunks in the text
NS = number of sentences in the text
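
The Chunk Index formula is not part of this transcript. Given the two quantities defined above, one natural reading is the average number of chunks per sentence, NK / NS. That reading, and the absence of any rescaling, are assumptions; the type aliases and function name below are hypothetical.

```python
from typing import List, Tuple

Chunk = Tuple[str, str]      # (chunk type, chunk text)
Sentence = List[Chunk]       # one chunked sentence


def chunks_per_sentence(sentences: List[Sentence]) -> float:
    """Average number of chunks per sentence: NK / NS (assumed form)."""
    ns = len(sentences)
    if ns == 0:
        raise ValueError("at least one sentence is required")
    nk = sum(len(sentence) for sentence in sentences)
    return nk / ns


# The example sentence from the chunking slide, already chunked by hand.
example = [[("NP", "The book"), ("VP", "is"), ("PP", "on"), ("NP", "the table")]]
average = chunks_per_sentence(example)
```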
9
The Readability Index: Chunk Indexes

Different chunk types could have different
readability
The Chunk Type Index is based on the
distribution of chunk types in the text

NK = number of chunks in the text
Ni = number of chunks of type i in the text
wi = the weight assigned to chunks of type i
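
The Chunk Type Index formula is likewise missing from the transcript. A plausible form consistent with the three symbols above is the weighted average sum(wi * Ni) / NK. The sketch below implements that assumed form; the function name and the weight values are invented for illustration and are not taken from the presentation.

```python
from collections import Counter
from typing import Dict, Iterable


def chunk_type_score(chunk_types: Iterable[str], weights: Dict[str, float]) -> float:
    """Weighted average of chunk-type weights: sum(w_i * N_i) / NK (assumed form)."""
    counts = Counter(chunk_types)          # N_i for each chunk type i
    nk = sum(counts.values())              # NK, total number of chunks
    if nk == 0:
        raise ValueError("at least one chunk is required")
    # Chunk types without a weight contribute 0 (a design choice of this sketch).
    return sum(weights.get(t, 0.0) * n for t, n in counts.items()) / nk


# Hypothetical weights, for illustration only.
toy_weights = {"NP": 1.0, "VP": 0.8, "PP": 0.5}
score = chunk_type_score(["NP", "VP", "PP", "NP"], toy_weights)
```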
10
The Readability Index: Chunk Indexes

The weights wi were calculated by analysing the
statistical distribution of the chunk types
We analysed 3686 documents from AltaFrequenza
Chunks with high frequencies in this
collection are more readable
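
The transcript does not say how frequencies are turned into weights. One simple possibility, shown here purely as an assumption, is to scale each chunk type's corpus frequency by that of the most frequent type, so that the most common type gets weight 1.0 and rarer types get proportionally less.

```python
from collections import Counter
from typing import Dict, Iterable


def weights_from_corpus(corpus_chunk_types: Iterable[str]) -> Dict[str, float]:
    """Derive per-type weights from chunk-type frequencies in a reference corpus.

    Assumed scheme: weight = count of the type / count of the most frequent type.
    """
    counts = Counter(corpus_chunk_types)
    if not counts:
        raise ValueError("the reference corpus contains no chunks")
    top = max(counts.values())
    return {chunk_type: n / top for chunk_type, n in counts.items()}


# Toy stand-in for the chunk types observed in a reference collection.
toy_corpus = ["NP", "NP", "NP", "NP", "VP", "VP", "PP"]
toy_weights = weights_from_corpus(toy_corpus)
```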

11
The Readability Index: formula

The final Readability Index is given by a
weighted mean of the three sub-indexes
In our initial experiments, each of the three
weights is set to 1/3
We are evaluating several training approaches for
learning the weight values from texts
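
The weighted mean described on this slide can be sketched as follows. The sketch assumes the three sub-indexes have already been brought to a common scale, which the transcript does not confirm; the function and parameter names are illustrative.

```python
from typing import Sequence


def readability_index(
    gulpease: float,
    chunk_index: float,
    chunk_type_index: float,
    weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Weighted mean of the three sub-indexes (equal weights by default)."""
    if len(weights) != 3:
        raise ValueError("exactly three weights are required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    values = (gulpease, chunk_index, chunk_type_index)
    # Dividing by the total keeps the result a mean even if weights do not sum to 1.
    return sum(w * v for w, v in zip(weights, values)) / total


combined = readability_index(60.0, 90.0, 30.0)
```

With equal weights this reduces to the arithmetic mean; learned weights would simply replace the default tuple.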

12
The Understandability Index: De Mauro

The Understandability Index measures the
complexity related to the lexicon
The index is based on the De Mauro basic Italian
dictionary
Contains the 4700 most used lemmas
Three sections: basic vocabulary, highly used
words vocabulary, and less used words vocabulary

Nb = number of words in the basic vocabulary
Nh = number of words in the highly used vocabulary
Nl = number of words in the less used vocabulary
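
The formula that combines these counts is not in the transcript, so the sketch below covers only the counting step: it tallies how many lemmas of a text fall into each vocabulary section. The section labels, the `unknown` bucket for lemmas outside the dictionary, and the toy vocabulary are all assumptions made for illustration; they are not entries from the De Mauro dictionary.

```python
from typing import Dict, Iterable


def count_by_section(lemmas: Iterable[str], vocabulary: Dict[str, str]) -> Dict[str, int]:
    """Count lemmas per vocabulary section.

    vocabulary maps a lemma to "basic", "high" or "low"; lemmas that are
    not listed are counted under "unknown".
    """
    counts = {"basic": 0, "high": 0, "low": 0, "unknown": 0}
    for lemma in lemmas:
        section = vocabulary.get(lemma.lower(), "unknown")
        counts[section] += 1
    return counts


# Placeholder vocabulary: the lemma-to-section assignments are invented.
toy_vocabulary = {"lemma_a": "basic", "lemma_b": "basic", "lemma_c": "high", "lemma_d": "low"}
toy_counts = count_by_section(
    ["lemma_a", "lemma_b", "Lemma_C", "lemma_d", "lemma_x"], toy_vocabulary
)
```

Here `toy_counts["basic"]`, `toy_counts["high"]` and `toy_counts["low"]` play the roles of Nb, Nh and Nl.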
13
Detecting and reporting readability issues

Indexes are useful for giving a measure of the
complexity of the text
Indexes say nothing about precisely where the
critical parts of the text are, or how to fix them
SPARTA2 detects and reports readability issues:
it analyses the structure and the lexicon of the
sentences
Plug-in based
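
As a rough illustration of what a plug-in based detector pipeline could look like, the sketch below runs a list of independent detector functions over the sentences and collects their warnings. This is not SPARTA2's actual plug-in interface, which the transcript does not describe; every name here, and the 25-word threshold, is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Issue:
    sentence_index: int   # position of the offending sentence in the text
    message: str          # human-readable warning


# A detector (plug-in) maps a list of sentences to a list of issues.
Detector = Callable[[List[str]], List[Issue]]


def long_sentence_detector(max_words: int = 25) -> Detector:
    """Build a detector that flags sentences longer than max_words words."""
    def detect(sentences: List[str]) -> List[Issue]:
        issues = []
        for index, sentence in enumerate(sentences):
            length = len(sentence.split())
            if length > max_words:
                issues.append(Issue(index, f"sentence has {length} words (limit {max_words})"))
        return issues
    return detect


def run_detectors(sentences: List[str], detectors: List[Detector]) -> List[Issue]:
    """Run every registered detector and return the issues in text order."""
    issues: List[Issue] = []
    for detector in detectors:
        issues.extend(detector(sentences))
    return sorted(issues, key=lambda issue: issue.sentence_index)


sentences = ["A short sentence.", " ".join(["word"] * 30) + "."]
found = run_detectors(sentences, [long_sentence_detector(25)])
```

Unlike a single numeric index, each reported issue points at a specific sentence, which is what lets an authoring tool show where the text needs attention.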

14
SPARTA2 Architecture