Yvan Rose - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Yvan Rose

1
The Phon and PhonBank initiatives: A scheme for
sharing and archiving child language phonology
data

Yvan Rose
Memorial University of Newfoundland
Laboratoire Dynamique Du Langage, UMR 5596 CNRS
- Université Lumière Lyon 2
Brian MacWhinney
Carnegie Mellon University

2
Presentation roadmap

Why study child language?
Empirical challenges
A promising solution
Phon, a software program for the study of
phonological development
The PhonBank database project
Potential

3
Why study language development?

Special gift: universals
How is language acquired so easily by children?
Typology, variation and their origins
Emergentism: processes that show up in the course
of language development
Sometimes, no correlates in language typology
Language disorders
Developmental speech problems
Atypical outcomes (e.g. stammered speech)
Acquired language disorders
Socialization, literacy, maintenance, ...

4
What everyone needs

Most current hypotheses must be tested against a
large body of data.
Lots of data in comparable format, from, e.g.
Different languages
Different acquisition contexts
Typical versus disordered speech
Monolingual versus multilingual populations
Different age periods
Different methods of investigation
Cross-sectional (e.g. variation studies)
Longitudinal (e.g. developmental studies)

5
CHILDES: a good departure point

Child Language Data Exchange System
http://childes.psy.cmu.edu
Founded in 1984 in Concord, MA
Director: Brian MacWhinney (macw_at_mac.com)
Programmers: Leonid Spektor, Franklin Chen
Components of CHILDES
DATABASE: over 190 ( 80 TalkBank) corpora
CHAT: system for speech notation and coding
CLAN: software suite for analysis
Impact
4500 members worldwide
2000 articles based on CHILDES data

6
However...

Current CHILDES support for phonology almost
non-existent
No decent database of phonological development
Database of phonological data is far from
satisfactory
No automated method of data compilation
Elaboration of such a database extremely
difficult
No tools facilitating corpus creation
No data exchange standard established

7
Proposed solutions

CHILDES extension into phonological realm
Phon (Rose and colleagues): Software for
transcription, compilation and analysis of
phonological data
Specialized for research in acquisition
Standardized for data sharing
PhonBank (MacWhinney and Rose): Publicly available
database on phonological development
Multilingual
Several acquisition contexts and periods

8
The Phon project

Prerequisite to the building of PhonBank
Multi-disciplinary team at Memorial University of
Newfoundland
Close collaboration with Brian's team at Carnegie
Mellon University
Design and implementation criteria
Reliability
Simplicity / flexibility / adaptability
Analytical neutrality
Compatibility
Availability

9
Phon: an overview

Intuitive graphical user interface
Dynamic interaction between software and user
Flexible project customization functions
Support for multiple alphabets (IPA, Roman,
Arabic, Chinese, Japanese, Cyrillic, Algonquin)
User-defined data fields
Functionality for phonological data
Descriptive features for transcribed segments
Support for both segmental and prosodic info
Easily expandable modular architecture
FREE!!!!!

10
Phon: technical aspects

Cross-platform compatible (Macintosh, Windows,
Unix/Linux platforms)
Programmed in Java
100% Unicode compliant
Support for playback of audio and video recorded
data in several formats
Data storage in XML format
Compatible with the CHILDES TalkBank schema
Data can be accessed by other applications
IPA transcription standards supported
Open-source
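
As a rough illustration of what "data can be accessed by other applications" could mean in practice, here is a minimal Python sketch that reads records from an XML session using only the standard library. The element and attribute names (session, record, orthography, target, actual) are invented for this example and are not the actual Phon or TalkBank schema; the sample record is illustrative data.

```python
import xml.etree.ElementTree as ET

# Hypothetical session layout, invented for illustration.
# This is NOT the real Phon/TalkBank schema.
SAMPLE = """<session id="demo">
  <record speaker="CHI">
    <orthography>banana</orthography>
    <target>bənænə</target>
    <actual>bænə</actual>
  </record>
</session>"""


def read_records(xml_text):
    """Return one dict per <record> element found in the session."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("record"):
        records.append({
            "speaker": rec.get("speaker"),
            "orthography": rec.findtext("orthography"),
            "target": rec.findtext("target"),
            "actual": rec.findtext("actual"),
        })
    return records


for r in read_records(SAMPLE):
    print(r["speaker"], r["orthography"], r["target"], r["actual"])
```

Because the storage is plain XML with Unicode text, any language with an XML parser could do the same; a real reader would follow the published schema rather than these placeholder names.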

11
Phon: general interface
Project navigator
Records for linguistic data and other
annotations
Multimedia centre
Session metadata
Navigation between records
12
User management
Password-protected access
Task management
13
Multimedia data segmentation

Enables the delimitation of the recorded portions
that are relevant for research
Functionality to edit the segments identified

Segment playback
Segment export (AIFF/WAVE formats)
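
To make the idea of segment export concrete, the following Python sketch copies a time-delimited portion of a WAVE recording into a new file using the standard wave module. It is an illustration only, not Phon's own (Java) implementation; the function name export_segment and the demo values are assumptions, and AIFF is not covered.

```python
import io
import wave


def export_segment(src, dst, start_s, end_s):
    """Copy the audio between start_s and end_s (in seconds) from src to dst.

    src and dst may be file names or binary file-like objects.
    """
    with wave.open(src, "rb") as reader:
        rate = reader.getframerate()
        reader.setpos(int(start_s * rate))
        frames = reader.readframes(int((end_s - start_s) * rate))
        with wave.open(dst, "wb") as writer:
            # Reuse channel count, sample width and rate from the source.
            writer.setparams(reader.getparams())
            writer.writeframes(frames)


# Demo on one second of generated silence (mono, 16-bit, 8000 Hz).
recording = io.BytesIO()
with wave.open(recording, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8000)
recording.seek(0)

clip = io.BytesIO()
export_segment(recording, clip, 0.25, 0.75)

with wave.open(io.BytesIO(clip.getvalue()), "rb") as r:
    print(r.getnframes(), "frames at", r.getframerate(), "Hz")
```

Working in frames rather than bytes keeps the cut aligned to whole samples regardless of channel count or sample width.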
14
Phonetic transcription
J'aime les ours en peluche bruns ("I like brown teddy bears")
??m le zu?s ? p?l?? b?
? nus py? ba

Support for
Multiple-blind transcriptions (using user ID and
task management functionality)
Phonetic dictionaries of target forms (e.g. CMU
Pronouncing Dictionary)

15
Merging of multiple-blind transcriptions
Selection of most accurate transcriptions
Access and comparison of all transcriptions of
target and actual forms
Refinement of selected transcriptions (if needed)
16
Segmentation of transcribed utterances

Multiple-word utterances often must be divided
into smaller portions
Access to precise domains of analysis
Enables analysis at several levels (e.g.
utterance, phrase, word, morpheme, ...)
Example of division into lexical items

17
Syllabification

Phonological research must consider prosodic
factors, including
Number of segments in syllables
Shapes of syllables (e.g. CV, CVC, CCV)
Positions of segments within syllables
Positions of syllables within the word
Stressed versus unstressed status of syllables
Manual coding is tedious and time-consuming
We need a reliable, automatic system

18
Syllabification algorithm (Hedlund & O'Brien 2004)

Automatic parsing of segmental strings into
syllables
Several parses possible based on parameters
modifiable by the user (no theoretical bias
imposed)
Possibility to test different hypotheses for
target and actual forms
Labelling of syllables and their segments for:
Word-level prosodic information
Syllable constituency
Manual modification of spurious results
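
The idea of parameter-driven parsing can be illustrated with a small Python sketch. This is not the Hedlund & O'Brien algorithm; it is a simple onset-maximizing parser written for this example, in which the list of legal onset clusters is the user-modifiable parameter. The vowel inventory, the default cluster list and the test string "kontra" are all assumptions made for illustration.

```python
# Toy vowel inventory; an assumption made for this example.
VOWELS = set("aeiouəæ")


def syllabify(segments, legal_onsets=("pr", "br", "tr", "kr", "pl", "bl", "kl")):
    """Split segments into syllables and label each segment as
    O (onset), N (nucleus) or C (coda).

    Every vowel is treated as a nucleus. Between two nuclei, one consonant
    goes to the following onset, or two if they form a cluster listed in
    legal_onsets. Strings without any vowel yield no syllables.
    """
    nuclei = [i for i, seg in enumerate(segments) if seg in VOWELS]
    syllables, start = [], 0
    for k, n in enumerate(nuclei):
        if k + 1 < len(nuclei):
            cluster = segments[n + 1:nuclei[k + 1]]
            onset_len = min(1, len(cluster))
            if len(cluster) >= 2 and "".join(cluster[-2:]) in legal_onsets:
                onset_len = 2
            end = nuclei[k + 1] - onset_len
        else:
            end = len(segments)
        syllable = [(seg, "O") for seg in segments[start:n]]
        syllable.append((segments[n], "N"))
        syllable += [(seg, "C") for seg in segments[n + 1:end]]
        syllables.append(syllable)
        start = end
    return syllables


def render(syllables):
    """Join syllables with dots for display."""
    return ".".join("".join(seg for seg, _ in syl) for syl in syllables)


print(render(syllabify(list("bənænə"))))
print(render(syllabify(list("kontra"))))                   # "tr" allowed as an onset
print(render(syllabify(list("kontra"), legal_onsets=())))  # no complex onsets allowed
```

Changing legal_onsets yields a different parse of the same string, which is the sense in which no theoretical bias is built in: the analysis follows whatever parameters the user supplies.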

19
Syllabification and modification interface
Syllable constituent labels and colour codes
Labelling modifiable through contextual menus
20
Alignment of target and actual forms

Several investigations require systematic
comparisons
Segment by segment (e.g. /bənænə/ → b _ _ ænə)
Syllable by syllable (e.g. /e?p??k?t/ e _ _ _ _
ko _ )
Comparisons not always easy to obtain

[Figure: a wrong alignment and a valid alignment of the syllables e and ko]
21
Alignment algorithm (Maddocks 2005)

Segments and syllables aligned based on their
featural similarity
Dynamic programming: complex problems solved
through resolution of their simpler sub-parts
e.g. e?p??k?t/eko: (e?/e) (e?p??/eko) (e?p??k?t/eko)
Rewards and penalties
Reward example: alignment of stressed syllables
Penalty example: alignment with nothing (empty
featural set)
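
A minimal Python sketch of feature-based alignment by dynamic programming follows. It is not the Maddocks (2005) algorithm: the feature sets, the similarity formula and the gap penalty are assumptions chosen for illustration, and stress-based rewards are left out. The example aligns a target "banana" with a truncated child form, with the schwa vowels assumed.

```python
# Hypothetical feature sets, for illustration only.
FEATURES = {
    "b": {"consonant", "labial", "voiced", "stop"},
    "n": {"consonant", "coronal", "voiced", "nasal"},
    "ə": {"vowel", "central", "mid"},
    "æ": {"vowel", "front", "low"},
}
GAP_PENALTY = -1  # penalty for aligning a segment with nothing


def similarity(a, b):
    """Reward shared features, penalize features that differ."""
    fa, fb = FEATURES[a], FEATURES[b]
    return 2 * len(fa & fb) - len(fa ^ fb)


def align(target, actual):
    """Return a list of (target_segment, actual_segment) pairs; "_" marks a gap."""
    n, m = len(target), len(actual)
    # score[i][j] = best score for aligning target[:i] with actual[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP_PENALTY
    for j in range(1, m + 1):
        score[0][j] = j * GAP_PENALTY
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(target[i - 1], actual[j - 1]),
                score[i - 1][j] + GAP_PENALTY,   # target segment deleted
                score[i][j - 1] + GAP_PENALTY,   # actual segment inserted
            )
    # Walk back from the full problem to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j] ==
                score[i - 1][j - 1] + similarity(target[i - 1], actual[j - 1])):
            pairs.append((target[i - 1], actual[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP_PENALTY:
            pairs.append((target[i - 1], "_"))
            i -= 1
        else:
            pairs.append(("_", actual[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


pairs = align("bənænə", "bænə")
print(" ".join(t for t, _ in pairs))
print(" ".join(a for _, a in pairs))
```

Each cell of the table reuses the solutions to shorter prefixes, which is the "simpler sub-parts" idea; tuning the reward and penalty values changes which alignment wins.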

22
Effects of rewards and penalties
23
Algorithm optimization

Problem: syllable alignment in different corpora
requires different parameter settings, which are
difficult to adjust manually
Solution: genetic algorithm (GA)
1- generate alignments from a representative
corpus
2- revise results manually
3- GA automatically optimizes parameters based on
manually revised corpus
96% → 98% efficiency on English corpus
85% → 96% efficiency on Dutch corpus (initially
generated with English settings)
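
The three steps can be sketched in Python as a toy genetic algorithm. Everything here is an assumption made for illustration: the "manually revised corpus" is four invented items, each with two candidate alignments summarized as (stressed matches, gaps, mismatches), and the parameters being tuned are three weights. The actual Phon optimizer and its parameters are not described in the slides.

```python
import random

# Toy "manually revised corpus" (invented data): each item holds candidate
# alignments as (stressed_matches, gaps, mismatches) plus the index of the
# candidate that a human reviser accepted.
CORPUS = [
    ([(1, 2, 0), (0, 1, 2)], 0),
    ([(1, 3, 0), (1, 1, 1)], 1),
    ([(0, 2, 1), (1, 2, 2)], 1),
    ([(2, 4, 0), (1, 1, 0)], 0),
]


def choose(weights, candidates):
    """Pick the candidate with the best reward-minus-penalty score."""
    w_stress, w_gap, w_mis = weights
    scores = [w_stress * s - w_gap * g - w_mis * m for s, g, m in candidates]
    return scores.index(max(scores))


def fitness(weights):
    """Fraction of corpus items where the automatic choice matches the revision."""
    hits = sum(choose(weights, cands) == gold for cands, gold in CORPUS)
    return hits / len(CORPUS)


def evolve(generations=30, pop_size=20, seed=0):
    """Return (best_weights, best_fitness_per_generation)."""
    rng = random.Random(seed)
    pop = [tuple(rng.uniform(0, 5) for _ in range(3)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]  # selection: the fitter half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = tuple(rng.choice(pair) for pair in zip(a, b))          # crossover
            child = tuple(max(0.0, w + rng.gauss(0, 0.3)) for w in child)  # mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print("best weights:", best)
print("agreement with revised corpus:", fitness(best))
```

Because the fitter half of each generation is carried over unchanged, the best agreement score never decreases from one generation to the next. Re-running the search against a manually revised sample from another language would be the analogue of retuning English settings for Dutch.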

24
Modification of spurious alignments

Alignment algorithm provides reliable results for
preliminary analyses
Remaining cases must be aligned manually

Select syllable
Add to alignment
Complete alignment
25
Query language

PhonBasic (Hedlund & O'Brien 2004)
Characteristics
Selectors and predicates: terms commonly used by
linguists
Syllable, stressed, voiced, labial, ...
Boolean connectives
Custom predicates
Prevocalic: LApp, Onset
Postvocalic: Coda, RApp, OEHS
Sample of queries pre-installed
Memorization of recent queries
Saving and sharing of queries
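
The combination of predicates and Boolean connectives can be illustrated with a short Python sketch. This is not PhonBasic syntax, which the slides do not show; the record fields, the helper names (has, position, AND, OR, NOT, select) and the sample segments are all hypothetical.

```python
# Hypothetical segment records; field names are assumptions for illustration.
SEGMENTS = [
    {"ipa": "b", "features": {"voiced", "labial"}, "position": "Onset"},
    {"ipa": "p", "features": {"labial"}, "position": "Onset"},
    {"ipa": "n", "features": {"voiced", "coronal"}, "position": "Coda"},
    {"ipa": "m", "features": {"voiced", "labial"}, "position": "Coda"},
]


def has(feature):
    """Predicate: the segment carries the given feature."""
    return lambda seg: feature in seg["features"]


def position(name):
    """Predicate: the segment occupies the given syllable position."""
    return lambda seg: seg["position"] == name


def AND(*preds):
    return lambda seg: all(p(seg) for p in preds)


def OR(*preds):
    return lambda seg: any(p(seg) for p in preds)


def NOT(pred):
    return lambda seg: not pred(seg)


def select(segments, pred):
    """Return the IPA symbols of all segments satisfying the predicate."""
    return [seg["ipa"] for seg in segments if pred(seg)]


# A "custom predicate" built from simpler ones and saved under a name.
voiced_labial_onset = AND(has("voiced"), has("labial"), position("Onset"))

print(select(SEGMENTS, voiced_labial_onset))
print(select(SEGMENTS, OR(position("Coda"), NOT(has("voiced")))))
```

Naming a composed predicate, as with voiced_labial_onset, is one way saved and shared queries could be built up from linguist-friendly terms.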

26
Query results

Visualization from within the application
Generation of textual reports
Recording session
Exemplification of a given process
Time period
Exemplification of an acquisition stage
Entire database
Establishment of a learning curve
Exportation of results
Text format (Unicode encoding)
CSV format (e.g. for Excel, Access, FileMaker Pro, ...)
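
As an illustration of exporting query results as Unicode CSV, here is a minimal Python sketch using the standard csv module. The column names and the sample row are assumptions made for this example, not Phon's actual export layout.

```python
import csv
import io

# Hypothetical column layout for exported query results.
FIELDS = ["session", "orthography", "target", "actual"]


def export_results(rows, stream):
    """Write query results as CSV with a header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Invented sample result, for illustration only.
rows = [
    {"session": "demo", "orthography": "banana",
     "target": "bənænə", "actual": "bænə"},
]

buffer = io.StringIO()
export_results(rows, buffer)
print(buffer.getvalue())
```

When writing to disk instead of a buffer, opening the file with open(path, "w", newline="", encoding="utf-8-sig") adds a byte-order mark, which helps spreadsheet programs such as Excel recognize the IPA characters as UTF-8.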

27
Future functionality

Support for importation of existing corpora
Addition of dictionaries of target forms
supporting other languages/dialects
Incorporation of basic statistical functions
(using already-existing Java packages)
Schema/graph generation
Bar graphs (e.g. to illustrate the relative
prominence of patterns)
Line graphs (e.g. to illustrate learning curves)

28
Future functionality

Interoperability with Praat and/or SFS
Basic goal: compilation of acoustic parameters
relative to phonological domains
Alignment of transcriptions with
waveforms/spectrograms (TextGrid-like function)
Exportation of samples for speech analysis
Importation of acoustic measurement data
Web interface
Data sharing at a distance
Query of PhonBank without the need of downloading
corpora
Automatic detection of patterns

29
Timeline

Late July / early August, 2005
Release of the first complete version of Phon 1.0
(beta) for Macintosh, Windows, Linux, UNIX
Partial compatibility with existing CHILDES
corpora
August - October, 2005
Testing/debugging of the beta version
Extension of CHILDES compatibility
November, 2005
Official release of Phon 1.0 at BUCLD
Beginning of the PhonBank initiative

30
PhonBank project

Project leaders
Brian MacWhinney (CMU)
Yvan Rose (MUN)
Barbara Davis (Texas-Austin)
Rodrigue Byrne (MUN)
Research consortium
26 collaborators, 16 languages
Monolingual, bilingual, clinical, babbling,
second language, ...
Awaiting results from grant application to NIH

31
Immediate potential

Scientific exchanges between researchers working
in related areas made easier
Research based on
Much stronger empirical base
Combination of various experimental methods
Systematic comparisons of various corpora
Within and across languages
Within and across populations
Within and across age groups

32
Long term potential

Better understanding of
Language acquisition process
Developmental and acquired language disorders
Contribution to development of more adequate
theoretical models
Establishment of more accurate baselines for
early detection of language delays/disorders
More rapid and efficient educational and
therapeutic interventions

33
Thanks for your attention
34
Acknowledgements

People at MUN
David Graham, Dean of Arts
Robert Lucas, Dean of Science
Jim Black, Associate Dean of Arts
Barbara Cox and her team, Office of Research
Marguerite MacKenzie, Head of Linguistics, as
well as all members of the department
Wolfgang Banzhaf, Head of Computer Science
For their feedback and encouragement: Éliane
Lebel, Heather Goad, Paula Fikkert, Clara Levelt,
Katherine Demuth, Mark Johnson, Carrie Dyck, Phil
Branigan, Brian MacWhinney, Bryan Gick, Sophie
Wauquier-Gravelines, Sharon Inkelas, Conxita
Lleó, Sónia Frota, Maria João Freitas, Ronald
Sprouse, Joe Pater, John Archibald, Éliane Lebel,
Susana Correia, Laetitia Almeida, Teresa da
Costa, Barbara Davis, Christophe dos Santos,
Sophie Kern, Christine Champdoizeau, Jennifer
Parsons, Carla Dunphy, Lindsay Babcock, Allison
Strong, Megan Maloney, Marina Vigário ... hoping
that we didn't forget anyone

35
Acknowledgements

The team behind Phon
Rod Byrne, Todd Wareham, Gregory Hedlund, Philip
O'Brien, Keith Maddocks
The CHILDES computer guys
Franklin Chen (Carnegie Mellon University)
Leonid Spektor
Financial support
Arts Faculty, Memorial University (Y. Rose)
VP Research, Memorial University (Y. Rose)
Social Sciences and Humanities Research Council
of Canada (J. Brittain, C. Dyck, Y. Rose & M.
MacKenzie)
Natural Sciences and Engineering Research Council
of Canada (T. Wareham)
National Science Foundation (B. MacWhinney)
Canada Foundation for Innovation (Y. Rose)