TKE 2005

1
Towards a Text Mining Driven Approach for
Terminology Construction
  • Valentina Ceausu, Sylvie Desprès
  • CRIP 5, René Descartes University

2
Overview
3
Why a terminology of road accidents?
  • Exploited by a case-based reasoning (CBR) system
  • Case base (collection of source cases)
  • Created from accident scenarios
  • Accident scenarios: natural-language descriptions
    of sets of similar accidents
  • Created by experts in road safety
  • New problem (target case)
  • Created from accident reports
  • Accident reports: created by policemen

4
Scope and available resources
  • Scope
  • To compare cases created from accident reports
    with cases created from accident scenarios
  • Problem: scenarios and reports are created by
    different communities
  • Available resources
  • Meta-model to represent accidents
  • Ontology of road accidents (Protégé-2000)
  • To solve the problem
  • Create a terminology of road accidents from a
    set of accident reports

5
Knowledge extraction: pattern recognition algorithm
  • Available corpus: 250 accident reports from in
    and around Lille
  • Goal: to extract knowledge from natural-language
    corpora
  • Recognition of lexical patterns
  • Pattern: an association of lexical types
  • Nominal (Noun, Preposition, Noun)
  • Verbal (Verb, Preposition, Noun)
  • Input
  • Annotated corpora (TreeTagger, Cordial)
  • Output
  • A large number of word regroupings
  • Refining approaches

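Extract of Accident Report:
Le cycle de marque GO SPORT conduit par M XXXXXXXXXXXXXXXXld d'Auteuil,
vient du carrefour des Anciens Combattants et se dirige vers l'ave
Robert Schuman. Au niveau du N° 31 du dit boulevard le cycle s'arrête
sur le côté droit du côté des num XXXXXXXXXXXXXXXXXe long des véhicules
en stationnement se préparant à traverser vers le num
XXXXXXXXXXXXXXXXXcycle et sur le passage piétons. Lorsque le cycle
commence sa manoeuvre la voiture de marque Volkswagen N° 381 LTL 75
conduite par Me XXXXXXXXXXXXXXXXcule, vient et se dirige dans le même
sens de progression que le cycle, heurte de son avant la roue arrière
du vélo. Suite au choc le cycliste est blessé légèrement. Transport à
l'hôpital A.Paré à Boulogne par les sapeurs pompiers locaux. Non admis.
Le changement de direction sans précaution de la part du cycliste et la
non maîtrise de son véhicule de la part de l'automobiliste semblent
être à l'origine de l'accident.

(English translation, with anonymized spans marked [redacted]: The GO
SPORT bicycle ridden by Mr [redacted] d'Auteuil comes from the Anciens
Combattants junction and heads towards Avenue Robert Schuman. Level
with No. 31 of the said boulevard, the bicycle stops on the right-hand
side, on the side of the [redacted] along the parked vehicles,
preparing to cross towards the [redacted] and on the pedestrian
crossing. When the bicycle begins its manoeuvre, the Volkswagen car
No. 381 LTL 75, driven by [redacted], travelling in the same direction
as the bicycle, strikes the rear wheel of the bicycle with its front.
Following the impact the cyclist is slightly injured. Transported to
the A. Paré hospital in Boulogne by the local fire brigade. Not
admitted. The cyclist's careless change of direction and the
motorist's failure to control the vehicle appear to be the cause of
the accident.)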
6
Lexical patterns and corresponding regroupings
  • Lexical patterns
  • Noun, Noun
  • Noun, Preposition, Noun
  • Noun, Preposition, Adjective
  • Verb, Preposition, Noun
  • Verb, Preposition, Adjective
  • Corresponding regroupings
  • accident, agent (accident, policeman)
  • usager de route (road user)
  • groupe de piéton (group of pedestrians)
  • trottoir de droite (right-hand pavement)
  • diriger vers place (direct to square)
  • virer à gauche (turn left)
  • virer à droite (turn right)

7
Apriori algorithm (1/3)
  • Association rule extraction
  • Agrawal & Srikant, 1994
  • Adaptation to text mining: Maedche & Staab, 2000
  • Basic association rule algorithm
  • Set of transactions
  • Set of words
  • véhicule, conducteur (vehicle, driver)
  • Association (X => Y)
  • X and Y are word regroupings
  • X: conducteur (driver); Y: de véhicule (of
    vehicle)

8
Apriori algorithm (2/3)
  • Linguistic rule: word co-occurrences
  • Quality measures
  • Thresholds defined by the user
  • An expert intervenes to select threshold
    values
  • Support and confidence exceed user-defined
    thresholds => association rule
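
The slide names the two quality measures without defining them. For reference, the definitions below are the standard ones from the association rule literature; whether the authors used exactly this variant is an assumption, and the symbols T (the set of transactions), s_min and c_min (the user-defined thresholds) are introduced here only for illustration.

```latex
\[
\mathrm{support}(X) = \frac{|\{\, t \in T : X \subseteq t \,\}|}{|T|},
\qquad
\mathrm{support}(X \Rightarrow Y) = \mathrm{support}(X \cup Y)
\]
\[
\mathrm{confidence}(X \Rightarrow Y)
  = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}
\]
\[
X \Rightarrow Y \text{ is retained if }
\mathrm{support}(X \Rightarrow Y) > s_{\min}
\text{ and }
\mathrm{confidence}(X \Rightarrow Y) > c_{\min}
\]
```

Support measures how often the two regroupings occur together in the corpus; confidence measures how often Y appears given that X appears.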

9
Apriori algorithm (3/3)
  • Steps of the Apriori algorithm
  • Generate the association set (according to
    patterns)
  • For each association
  • Determine support
  • Determine confidence
  • Output association rules that exceed user-defined
    confidence and support
  • Apriori output
  • véhicule, automobile (vehicle, car)
  • volant, véhicule (steering wheel,
    vehicle)
  • conducteur, véhicule (driver, vehicle)
  • conducteur, camion (driver, truck)
  • conducteur, cyclomoteur (driver, moped)
  • Output interpretation
  • Domain terms
  • trottoir de droite (right-hand pavement)
  • Relations
  • conducteur, véhicule (driver, vehicle)
  • Type of relations
  • IS-A
  • véhicule, automobile (vehicle, car)
  • PART-OF
  • volant, véhicule (steering wheel, vehicle)
  • Functional
  • conducteur, propriétaire (driver, owner)
  • conducteur, véhicule (driver, vehicle)
  • Particular form
  • conducteur, camion (driver, truck)
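
The three steps listed on this slide can be sketched as follows. This is a simplified pairwise pass, not the full level-wise Apriori algorithm with candidate pruning, and it generates candidates from plain co-occurrence rather than from the lexical patterns the slide refers to. The function name mine_rules, the toy transactions, and the threshold values are hypothetical.

```python
from itertools import permutations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (X, Y, support, confidence)


def mine_rules(transactions: List[Set[str]],
               min_support: float,
               min_confidence: float) -> List[Rule]:
    """Keep the rules X => Y whose support and confidence exceed the thresholds."""
    n = len(transactions)

    # Step 1: generate the association set (here: ordered pairs of words
    # that co-occur in at least one transaction).
    candidates = set()
    for t in transactions:
        candidates.update(permutations(sorted(t), 2))

    rules = []
    for x, y in sorted(candidates):
        # Step 2: determine support and confidence for each association.
        both = sum(1 for t in transactions if x in t and y in t)
        with_x = sum(1 for t in transactions if x in t)
        support = both / n
        confidence = both / with_x
        # Step 3: output the rules that exceed the user-defined thresholds.
        if support > min_support and confidence > min_confidence:
            rules.append((x, y, support, confidence))
    return rules


# Toy transactions (sets of words), invented for illustration.
transactions = [
    {"conducteur", "véhicule"},
    {"conducteur", "véhicule", "volant"},
    {"conducteur", "camion"},
    {"véhicule", "automobile"},
]
rules = mine_rules(transactions, min_support=0.4, min_confidence=0.6)
for x, y, s, c in rules:
    print(f"{x} => {y}  support={s:.2f}  confidence={c:.2f}")
```

The sketch only ranks pairs. Deciding whether a retained pair is an IS-A, PART-OF, or functional relation remains an interpretation step, as the slide indicates.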

10
Refining the set of verbal syntagms (1/4)
  • Verbal syntagms: instances of verbal patterns
  • Verb class identification
  • Verb class: the set of regroupings generated
    by the same verb
  • Two-term regroupings: diriger vers (direct to),
    venir de (come from)
  • Three-term regroupings
  • Instances of (Verb, Preposition, Argument)
    patterns
  • Extensions of two-term regroupings
  • venir de gauche (come from left); diriger vers
    infrastructure (direct to infrastructure)
  • Large number of three-term regroupings
  • Extremely fine level of granularity
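
Verb class identification, as defined above, amounts to grouping regroupings by the verb that generates them. A minimal sketch, assuming each regrouping is a space-separated string whose first token is the verb lemma; the function name verb_classes is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def verb_classes(regroupings: List[str]) -> Dict[str, List[str]]:
    """Group two-term and three-term regroupings by their leading verb."""
    classes = defaultdict(list)
    for regrouping in regroupings:
        verb = regrouping.split()[0]  # assumption: the verb lemma comes first
        classes[verb].append(regrouping)
    return dict(classes)


regroupings = [
    "venir de gauche",
    "diriger vers infrastructure",
    "venir de",
    "diriger vers",
    "venir par droite",
]
classes = verb_classes(regroupings)
for verb, members in classes.items():
    print(verb, "->", members)
```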

11
Refining the set of verbal syntagms (2/4)
  • Using a domain model to refine the set of verbal
    syntagms
  • Extensions of three-term associations can be
    organized in homogeneous lists
  • Direction (direction): droite (right), gauche
    (left), devant (in front of)
  • Lieu (place): usine (factory),
    parc (park), domicile (home)
  • Humain (human): enfant (child),
    piéton (pedestrian), personne (person)
  • Each list is associated with a concept of the
    ontology of road accidents
  • Ontology previously created from experts'
    knowledge
  • Manual intervention to assign lists to concepts

12
Refining the set of verbal syntagms (3/4): the
venir (to come) class
  • venir de hau bourdin (come from hau bourdin)
  • venir de i (come from i)
  • venir de abbaye (come from abbey)
  • venir de résidence (come from residence)
  • venir de rue (come from street)
  • venir de gauche (come from left)
  • venir par (come by)
  • venir par droite (come by right)
  • venir vers enfant (come towards child)
  • After eliminating noise and instances:
  • venir de lieu (come from place)
  • venir de infrastructure (come from
    infrastructure)
  • venir de direction (come from direction)
  • venir par direction (come by direction)
  • venir vers humain (come towards human)
13
Refining the set of verbal syntagms (4/4)
  • Decreases the number of three-term regroupings
  • Many arguments are assigned to the same concept
  • Eliminates parasitic regroupings and noise
  • Created lists will not contain terms from outside
    the domain
  • diriger vers 12 (direct to 12): 12 will
    not be included in a list
  • Drawback: valuable regroupings are eliminated if
    the created lists are incomplete

14
Text mining driven terminology construction
15
Linguistic analysis integrating text mining
results
  • Input of the linguistic analysis phase
  • Syntex and Cordial output
  • Goal of this phase
  • Selection of domain terms
  • Identification of lexical relations
  • Difficulties of this phase
  • Manual processing is difficult for large corpora
  • No information available to guide the selection
  • To solve these difficulties
  • Integrate Apriori results
  • Selection of terms
  • Identification of lexical relations

16
Linguistic analysis
17
Normalization phase integrating text mining
results
  • Input from the linguistic analysis phase
  • Previously selected terms
  • Lexical relations between terms
  • Goal
  • Definition of terminological concepts
  • Semantic relations modeling
  • Difficulties
  • No information for semantic relations
  • To solve difficulties
  • Integrate lexical relations
  • Integrate previously identified verb classes
  • Integrate non-taxonomic relations provided by
    Apriori

18
Formalization phase integrating text mining
results
19
Conclusion
  • Semi-automatic approach to building a terminology
  • Construction process supported by text mining
    results
  • Association rule results guide the selection of
    terms
  • Lexical patterns improve work with Linguae module
  • Identify non-taxonomic relations
  • Results obtained are more general
  • Syntex output: SE DIRIGER vers la Commune de
    Wahagnies (direct to the commune of Wahagnies)
  • Text mining output: diriger vers lieu (direct to
    a place)
  • Semantic relation modeling
  • Guided by verbs of the domain
  • Apriori output

20
Future work
  • Tools in the preprocessing phase
  • Definition and identification of syntactic
    patterns
  • New heuristics to generate associations
  • Using other quality measures to rank extracted
    rules
  • Towards an automatic approach to assign lists of
    terms to ontology concepts
  • Towards identifying functional and structural
    properties

21
Thank you
  • ceausu_at_math-info.univ-paris5.fr