Transcript and Presenter's Notes

Title: Yvan Rose


1
The Phon and PhonBank initiatives: A scheme for
sharing and archiving child language phonology
data
  • Yvan Rose
  • Memorial University of Newfoundland
  • Laboratoire Dynamique Du Langage, UMR 5596 CNRS
    - Université Lumière Lyon 2
  • Brian MacWhinney
  • Carnegie Mellon University

2
Presentation roadmap
  • Why study child language?
  • Empirical challenges
  • A promising solution
  • Phon, a software program for the study of
    phonological development
  • The PhonBank database project
  • Potential

3
Why study language development?
  • Special gift: universals
  • How is language acquired so easily by children?
  • Typology, variation and their origins
  • Emergentism: processes that show up in the course
    of language development
  • Sometimes, no correlates in language typology
  • Language disorders
  • Developmental speech problems
  • Atypical outcomes (e.g. stammered speech)
  • Acquired language disorders
  • Socialization, literacy, maintenance, ...

4
What everyone needs
  • Most current hypotheses must be tested against a
    large body of data.
  • Lots of data in comparable format, from, e.g.
  • Different languages
  • Different acquisition contexts
  • Typical versus disordered speech
  • Monolingual versus multilingual populations
  • Different age periods
  • Different methods of investigation
  • Cross-sectional (e.g. variation studies)
  • Longitudinal (e.g. developmental studies)

5
CHILDES: a good departure point
  • Child Language Data Exchange System
  • http://childes.psy.cmu.edu
  • Founded in 1984 in Concord, MA
  • Director: Brian MacWhinney (macw_at_mac.com)
  • Programmers: Leonid Spektor, Franklin Chen
  • Components of CHILDES
  • DATABASE: over 190 ( 80 TalkBank) corpora
  • CHAT: system for speech notation and coding
  • CLAN: software suite for analysis
  • Impact
  • 4500 members worldwide
  • 2000 articles based on CHILDES data

6
However...
  • Current CHILDES support for phonology is almost
    non-existent
  • No decent database of phonological development
  • Database of phonological data is far from
    satisfactory
  • No automated method of data compilation
  • Elaboration of such a database extremely
    difficult
  • No tools facilitating corpus creation
  • No data exchange standard established

7
Proposed solutions
  • CHILDES extension into phonological realm
  • Phon (Rose and colleagues): Software for
    transcription, compilation and analysis of
    phonological data
  • Specialized for research in acquisition
  • Standardized for data sharing
  • PhonBank (MacWhinney and Rose): Publicly available
    database on phonological development
  • Multilingual
  • Several acquisition contexts and periods

8
The Phon project
  • Prerequisite to the building of PhonBank
  • Multi-disciplinary team at Memorial University of
    Newfoundland
  • Close collaboration with Brian's team at Carnegie
    Mellon University
  • Design and implementation criteria
  • Reliability
  • Simplicity / flexibility / adaptability
  • Analytical neutrality
  • Compatibility
  • Availability

9
Phon: an overview
  • Intuitive graphical user interface
  • Dynamic interaction between software and user
  • Flexible project customization functions
  • Support for multiple alphabets (IPA, Roman,
    Arabic, Chinese, Japanese, Cyrillic, Algonquin)
  • User-defined data fields
  • Functionality for phonological data
  • Descriptive features for transcribed segments
  • Support for both segmental and prosodic info
  • Easily expandable modular architecture
  • FREE!!!!!

10
Phon: technical aspects
  • Cross-platform compatible (Macintosh, Windows,
    Unix/Linux platforms)
  • Programmed in Java
  • 100% Unicode compliant
  • Support for playback of audio and video recorded
    data in several formats
  • Data storage in XML format
  • Compatible with the CHILDES TalkBank schema
  • Data can be accessed by other applications
  • IPA transcription standards supported
  • Open-source
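
As a rough illustration of why XML storage lets other applications read the data, the sketch below parses a small record with Python's standard library. The element and attribute names (session, record, speaker, orthography, target, actual) are invented for this example; they are assumptions, not the TalkBank schema and not Phon's actual file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout, made up for illustration only.
SAMPLE = """<session id="demo">
  <record speaker="CHI">
    <orthography>banana</orthography>
    <target>banana</target>
    <actual>bana</actual>
  </record>
</session>"""


def read_records(xml_text):
    """Return one dictionary per <record> element found in the XML text."""
    root = ET.fromstring(xml_text)
    return [
        {
            "speaker": rec.get("speaker"),
            "orthography": rec.findtext("orthography"),
            "target": rec.findtext("target"),
            "actual": rec.findtext("actual"),
        }
        for rec in root.iter("record")
    ]


records = read_records(SAMPLE)
```

Any language with an XML parser could read the same file, which is the point of the "data can be accessed by other applications" bullet.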

11
Phon: general interface
Project navigator
Records for linguistic data and other
annotations
Multimedia centre
Session metadata
Navigation between records
12
User management
Password-protected access
Task management
13
Multimedia data segmentation
  • Enables the delimitation of the recorded portions
    that are relevant for research
  • Functionality to edit the segments identified

Segment playback
Segment export (AIFF/WAVE formats)
14
Phonetic transcription
J'aime les ours en peluche bruns ('I like brown teddy bears')
??m le zu?s ? p?l?? b?
? nus py? ba
  • Support for
  • Multiple-blind transcriptions (using user ID and
    task management functionality)
  • Phonetic dictionaries of target forms (e.g. CMU
    Pronouncing Dictionary)

15
Merging of multiple-blind transcriptions
Selection of most accurate transcriptions
Access and comparison of all transcriptions of
target and actual forms
Refinement of selected transcriptions (if needed)
16
Segmentation of transcribed utterances
  • Multiple-word utterances often must be divided
    into smaller portions
  • Access to precise domains of analysis
  • Enables an analysis of several levels (e.g.
    utterance, phrase, word, morpheme, ...)
  • Example of division into lexical items

17
Syllabification
  • Phonological research must consider prosodic
    factors, including
  • Number of segments in syllables
  • Shapes of syllables (e.g. CV, CVC, CCV)
  • Positions of segments within syllables
  • Positions of syllables within the word
  • Stressed versus unstressed status of syllables
  • Manual coding is tedious and time-consuming
  • We need a reliable, automatic system

18
Syllabification algorithm (Hedlund & O'Brien 2004)
  • Automatic parsing of segmental strings into
    syllables
  • Several parses possible based on parameters
    modifiable by the user (no theoretical bias
    imposed)
  • Possibility to test different hypotheses for
    target and actual forms
  • Labelling of syllables and their segments for
  • Word-level prosodic information
  • Syllable constituency
  • Manual modification of spurious results
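
To make the idea of parameter-driven parsing concrete, here is a minimal sketch of a syllabifier in Python. It is not the Hedlund & O'Brien algorithm, whose details the slides do not give. It assumes a simple onset-maximization strategy, plain ASCII letters instead of IPA, and two hypothetical user parameters: `legal_onsets` (which consonant clusters may start a syllable) and `max_onset` (the longest onset allowed).

```python
VOWELS = set("aeiou")  # assumption: ASCII vowels stand in for IPA nuclei


def syllabify(segments, legal_onsets, max_onset=2):
    """Split segments into syllables labelled onset / nucleus / coda.

    legal_onsets is a set of consonant tuples allowed as complex onsets;
    a single consonant is always accepted as an onset.
    """
    nuclei = [i for i, seg in enumerate(segments) if seg in VOWELS]
    if not nuclei:
        return []
    starts = [0]  # word-initial consonants all join the first syllable
    for prev, nxt in zip(nuclei, nuclei[1:]):
        cluster = segments[prev + 1:nxt]
        take = 0
        # Give the following nucleus the longest legal onset available.
        for k in range(min(max_onset, len(cluster)), 0, -1):
            if k == 1 or tuple(cluster[len(cluster) - k:]) in legal_onsets:
                take = k
                break
        starts.append(nxt - take)
    syllables = []
    for n, (start, nuc) in enumerate(zip(starts, nuclei)):
        end = starts[n + 1] if n + 1 < len(starts) else len(segments)
        syllables.append({
            "onset": segments[start:nuc],
            "nucleus": [segments[nuc]],
            "coda": segments[nuc + 1:end],
        })
    return syllables


def shape(syllable):
    """Return the CV shape of a syllable, e.g. 'CVC'."""
    return "C" * len(syllable["onset"]) + "V" + "C" * len(syllable["coda"])
```

Changing the parameters changes the parse: with `legal_onsets={("t", "r")}`, the string "ekstra" is split as VCC + CCV, whereas with an empty set it is split as VCCC + CV. This mirrors the possibility of testing different hypotheses for target and actual forms.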

19
Syllabification and modification interface
Syllable constituent labels: colour-coded
Labelling modifiable through contextual menus
20
Alignment of target and actual forms
  • Several investigations require systematic
    comparisons
  • Segment by segment (e.g. /b?næn?/ b _ _ æn?)
  • Syllable by syllable (e.g. /e?p??k?t/ e _ _ _ _
    ko _ )
  • Comparisons not always easy to obtain

Figure: a wrong alignment versus a valid alignment of target and actual syllables
21
Alignment algorithm (Maddocks 2005)
  • Segments and syllables aligned based on their
    featural similarity
  • Dynamic programming: complex problems solved
    through resolution of their simpler sub-parts,
    e.g. e?p??k?t/eko: (e?/e) (e?p??/eko)
    (e?p??k?t/eko)
  • Rewards and penalties
  • Reward example: alignment of stressed syllables
  • Penalty example: alignment with nothing (empty
    featural set)
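
The dynamic-programming idea can be sketched in a few lines of Python. This is a generic sequence-alignment sketch in the style of Needleman-Wunsch, not the Maddocks (2005) implementation. The feature sets, the similarity measure (shared features minus differing features) and the `gap_penalty` value are all assumptions made for illustration, and the reward for aligning stressed syllables is left out.

```python
# Toy feature sets (assumption): a real system would cover the full IPA.
FEATURES = {
    "b": {"consonant", "labial", "stop", "voiced"},
    "p": {"consonant", "labial", "stop"},
    "n": {"consonant", "coronal", "nasal", "voiced"},
    "k": {"consonant", "dorsal", "stop"},
    "a": {"vowel", "low", "voiced"},
    "o": {"vowel", "mid", "voiced"},
}


def similarity(a, b):
    """Reward shared features, penalize features found in only one segment."""
    fa, fb = FEATURES[a], FEATURES[b]
    return len(fa & fb) - len(fa ^ fb)


def align(target, actual, gap_penalty=-2):
    """Return (score, pairs); None marks alignment with nothing."""
    n, m = len(target), len(actual)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        score[0][j] = j * gap_penalty
    # Each cell reuses the best scores of the three smaller sub-problems.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(target[i - 1], actual[j - 1]),
                score[i - 1][j] + gap_penalty,   # target segment deleted
                score[i][j - 1] + gap_penalty,   # actual segment inserted
            )
    # Trace back from the final cell to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j] ==
                score[i - 1][j - 1] + similarity(target[i - 1], actual[j - 1])):
            pairs.append((target[i - 1], actual[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap_penalty:
            pairs.append((target[i - 1], None))
            i -= 1
        else:
            pairs.append((None, actual[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


total, pairs = align("banana", "bana")
```

Several alignments can tie on score when no prosodic information is used; which one the traceback returns then depends only on the order of the checks, which is one reason additional rewards such as stress matter.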

22
Effects of rewards and penalties
23
Algorithm optimization
  • Problem: syllable alignment in different corpora
    requires different parameter settings, which are
    difficult to adjust manually
  • Solution: genetic algorithm (GA)
  • 1- generate alignments from a representative
    corpus
  • 2- revise results manually
  • 3- GA automatically optimizes parameters based on
    manually revised corpus
  • 96% → 98% efficiency on English corpus
  • 85% → 96% efficiency on Dutch corpus (initially
    generated with English settings)
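
The three steps above can be sketched as a small genetic algorithm. The slides do not describe the actual GA, so everything here is an assumption: the toy corpus `REVISED`, the single-threshold `fitness` function, and the selection, crossover and mutation settings. In practice the fitness function would re-run the aligner with the candidate parameters and count how many alignments match the manually revised corpus.

```python
import random

# Toy stand-in for a manually revised corpus (assumption):
# (similarity score of a syllable pair, whether the reviser aligned it).
REVISED = [(0.9, True), (0.7, True), (0.4, False), (0.2, False), (0.6, True)]


def fitness(params):
    """Count manual decisions reproduced by a single-threshold rule."""
    threshold = params[0]
    return sum((score >= threshold) == aligned for score, aligned in REVISED)


def evolve(fitness, n_params, pop_size=30, generations=40,
           mutation_rate=0.2, low=0.0, high=1.0, seed=0):
    """Return the fittest parameter vector found."""
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    population = [[rng.uniform(low, high) for _ in range(n_params)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]  # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(genes) for genes in zip(mother, father)]
            # Mutation: occasionally nudge a gene, staying within bounds.
            child = [min(high, max(low, g + rng.gauss(0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


best = evolve(fitness, n_params=1)
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.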

24
Modification of spurious alignments
  • Alignment algorithm provides reliable results for
    preliminary analyses
  • Remaining cases must be aligned manually

Select syllable
Add to alignment
Complete alignment
25
Query language
  • PhonBasic (Hedlund & O'Brien 2004)
  • Characteristics
  • Selectors and predicates: terms commonly used by
    linguists
  • Syllable, stressed, voiced, labial, ...
  • Boolean connectives
  • Custom predicates
  • Prevocalic: LApp, Onset
  • Postvocalic: Coda, RApp, OEHS
  • Sample of queries pre-installed
  • Memorization of recent queries
  • Saving and sharing of queries
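
The combination of selectors, predicates and Boolean connectives can be illustrated with a short Python sketch. This is not PhonBasic syntax, which the slides do not show. The `Segment` class, the helper names (`has`, `position`, `AND`, `OR`, `NOT`, `select`) and the feature labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    symbol: str
    features: frozenset
    position: str          # e.g. "Onset", "Nucleus", "Coda"
    stressed: bool = False


def has(feature):
    """Predicate: the segment carries the given descriptive feature."""
    return lambda seg: feature in seg.features


def position(label):
    """Predicate: the segment occupies the given syllable position."""
    return lambda seg: seg.position == label


def AND(*preds):
    return lambda seg: all(p(seg) for p in preds)


def OR(*preds):
    return lambda seg: any(p(seg) for p in preds)


def NOT(pred):
    return lambda seg: not pred(seg)


def select(segments, pred):
    """Selector: keep the segments that satisfy the predicate."""
    return [seg for seg in segments if pred(seg)]


# A toy word "ban" with hand-assigned labels (assumption).
WORD = [
    Segment("b", frozenset({"consonant", "labial", "voiced"}), "Onset", True),
    Segment("a", frozenset({"vowel", "voiced"}), "Nucleus", True),
    Segment("n", frozenset({"consonant", "coronal", "nasal", "voiced"}), "Coda", True),
]

# Voiced consonants that are either labial or in coda position.
query = AND(has("consonant"), has("voiced"), OR(has("labial"), position("Coda")))
matches = [seg.symbol for seg in select(WORD, query)]
```

Because each query is an ordinary value, it can be stored under a name and reused, which loosely parallels the saving and sharing of queries mentioned in the list.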

26
Query results
  • Visualization from within the application
  • Generation of textual reports
  • Recording session
  • Exemplification of a given process
  • Time period
  • Exemplification of an acquisition stage
  • Entire database
  • Establishment of a learning curve
  • Exportation of results
  • Text format (Unicode encoding)
  • CSV format (e.g. for Excel, Access, FileMaker Pro, ...)
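
As an illustration of the export step, the sketch below writes query results as comma-separated values with Python's standard `csv` module. The column names and the sample rows are invented for this example and do not reflect Phon's actual report layout.

```python
import csv
import io


def export_results(rows, fieldnames, stream):
    """Write a header line followed by one CSV line per result row."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical query results.
ROWS = [
    {"session": "s1", "target": "banana", "actual": "bana"},
    {"session": "s2", "target": "banana", "actual": "banana"},
]

buffer = io.StringIO()
export_results(ROWS, ["session", "target", "actual"], buffer)
```

When writing to a file instead of an in-memory buffer, opening it with `open(path, "w", newline="", encoding="utf-8")` keeps phonetic symbols intact; the `utf-8-sig` encoding is a common choice when the file will be opened in Excel.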

27
Future functionality
  • Support for importation of existing corpora
  • Addition of dictionaries of target forms
    supporting other languages/dialects
  • Incorporation of basic statistical functions
    (using already-existing Java packages)
  • Schema/graph generation
  • Bar graphs (e.g. to illustrate the relative
    prominence of patterns)
  • Line graphs (e.g. to illustrate learning curves)

28
Future functionality
  • Interoperability with Praat and/or SFS
  • Basic goal: compilation of acoustic parameters
    relative to phonological domains
  • Alignment of transcriptions with
    waveforms/spectrograms (TextGrid-like function)
  • Exportation of samples for speech analysis
  • Importation of acoustic measurement data
  • Web interface
  • Data sharing at a distance
  • Query of PhonBank without the need to download
    corpora
  • Automatic detection of patterns

29
Timeline
  • Late July / early August, 2005
  • Release of the first complete version of Phon 1.0
    (beta) for Macintosh, Windows, Linux, UNIX
  • Partial compatibility with existing CHILDES
    corpora
  • August - October, 2005
  • Testing/debugging of the beta version
  • Extension of CHILDES compatibility
  • November, 2005
  • Official release of Phon 1.0 at BUCLD
  • Beginning of the PhonBank initiative

30
PhonBank project
  • Project leaders
  • Brian MacWhinney (CMU)
  • Yvan Rose (MUN)
  • Barbara Davis (Texas-Austin)
  • Rodrigue Byrne (MUN)
  • Research consortium
  • 26 collaborators, 16 languages
  • Monolingual, bilingual, clinical, babbling,
    second language, ...
  • Awaiting results from grant application to NIH

31
Immediate potential
  • Scientific exchanges between researchers working
    in related areas made easier
  • Research based on
  • Much stronger empirical base
  • Combination of various experimental methods
  • Systematic comparisons of various corpora
  • Within and across languages
  • Within and across populations
  • Within and across age groups

32
Long term potential
  • Better understanding of
  • Language acquisition process
  • Developmental and acquired language disorders
  • Contribution to development of more adequate
    theoretical models
  • Establishment of more accurate baselines for
    early detection of language delays/disorders
  • More rapid and efficient educational and
    therapeutic interventions

33
Thanks for your attention
34
Acknowledgements
  • People at MUN
  • David Graham, Dean of Arts
  • Robert Lucas, Dean of Science
  • Jim Black, Associate Dean of Arts
  • Barbara Cox and her team, Office of Research
  • Marguerite MacKenzie, Head of Linguistics, as
    well as all members of the department
  • Wolfgang Banzhaf, Head of Computer Science
  • For their feedback and encouragement: Éliane
    Lebel, Heather Goad, Paula Fikkert, Clara Levelt,
    Katherine Demuth, Mark Johnson, Carrie Dyck, Phil
    Branigan, Brian MacWhinney, Bryan Gick, Sophie
    Wauquier-Gravelines, Sharon Inkelas, Conxita
    Lleó, Sónia Frota, Maria João Freitas, Ronald
    Sprouse, Joe Pater, John Archibald, Susana
    Correia, Laetitia Almeida, Teresa da Costa,
    Barbara Davis, Christophe dos Santos, Sophie
    Kern, Christine Champdoizeau, Jennifer Parsons,
    Carla Dunphy, Lindsay Babcock, Allison Strong,
    Megan Maloney, Marina Vigário (hoping that we
    didn't forget anyone)

35
Acknowledgements
  • The team behind Phon
  • Rod Byrne, Todd Wareham, Gregory Hedlund, Philip
    O'Brien, Keith Maddocks
  • The CHILDES computer guys
  • Franklin Chen (Carnegie Mellon University)
  • Leonid Spektor
  • Financial support
  • Arts Faculty, Memorial University (Y. Rose)
  • VP Research, Memorial University (Y. Rose)
  • Social Sciences and Humanities Research Council
    of Canada (J. Brittain, C. Dyck, Y. Rose & M.
    MacKenzie)
  • Natural Sciences and Engineering Research Council
    of Canada (T. Wareham)
  • National Science Foundation (B. MacWhinney)
  • Canada Foundation for Innovation (Y. Rose)