Title: Geographic reference analysis for geographic document querying
1. Geographic reference analysis for geographic document querying
- F. Bilhaut, T. Charnois, P. Enjalbert, Y. Mathet
- {bilhaut, charnois, enjalbert, mathet}_at_info.unicaen.fr
- GREYC, CNRS UMR 6072
- University of Caen
2. The "GéoSem" project
- Passage extraction from geographical documents
- From a query to a ranked set of passages
- Queries are concerned with
- - time
- - phenomenon
- - space
3. Excerpt from "Hérin" corpus
- From 1965 to 1985, the number of high-school
students has increased by 70, but at different
rhythms and intensities depending on academies and
departments. Lower in South-West and Massif
Central, moderate in Brittany and Paris, the rise
has been considerable in Mid-West and Alsace.
Also occurs the schooling duration increase which
was more important in departments where, in the
middle of the 60's, study continuation after
primary school was far from being systematic.
6. Excerpt from "Hérin" corpus
- From 1965 to 1985, the number of high-school
students has increased by 70, but at different
rhythms and intensities depending on academies and
departments. Lower in South-West and Massif
Central, moderate in Brittany and Paris, the rise
has been considerable in Mid-West and Alsace.
Also occurs the schooling duration increase which
was more important in departments where, in the
middle of the 60's, study continuation after
primary school was far from being systematic.
Time
Phenomenon
Space
7. Queries
- Which passages address educational difficulties
in the west of France in the 50's?
- Which passages address variations of the number
of pupils in rural areas?
- Which passages address the Calvados district?
8. Queries
- Which passages address educational difficulties
in the west of France in the 50's?
- Which passages address variations of the number
of pupils in the Paris area?
- Which passages address the Calvados district?
9. Some Significant Spatial Expressions
- Paris
- in north of France
- from south of Loire
- Some seaboard towns
- The quarter of / Fifteen / All the districts in north of France
- Some seaboard towns of Normandy
- The most rural districts situated from south of Loire
10. The type "zone": a georeferenced area anchored in
a named place
- Paris
- in north of France
- Normandy
- From Normandy to Alsace
- from south of Loire
11. The LocGeo type
- The canonical form: quantification, type, zone
- Quant: the quantification
- Type: an administrative type, possibly with a qualification
- Zone: a position and a named geo. entity
- Examples:
- - The quarter of / districts in north of France
- - Fifteen / All / Some seaboard towns of Normandy
- - The most rural districts situated from south of Loire
- - Some seaboard towns
13. Semantic Representation
Paris
zone
  egn
    ty_zone: town
    nom: Paris
    coord
      Lat: 45.633333
      Long: 5.733333
  loc: internal
14. Semantic Representation
Some seaboard towns in north of Normandy
locgeo
  quant
    type: relative
  type
    ty_zone: town
    geo: seaboard
  zone
    egn
      ty_zone: region
      nom: Normandy
    loc: internal
    position: north
15. Implementation and (first) Results
- Tokenisation and morphological analysis
- A DCG performs syntactic and semantic analysis together:
- - the grammar contains 160 rules
- - an internal lexical base of 200 entries
- - a gazetteer of 100000 named places (France)
- 900 expressions recognised and analysed from a
geographical corpus (200 text pages)
- Good results, but a precise, quantitative
evaluation remains to be done
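
The analysis chain listed above (tokenisation, lexicon lookup, gazetteer lookup, construction of a semantic structure) can be illustrated with a small Python sketch. It is a flat token classifier, not a DCG, and it is not the authors' implementation: the tiny lexicon, the gazetteer, the English word lists and the fixed value `loc: internal` are all assumptions made for illustration.

```python
import re

# Hypothetical miniature resources. The deck reports a lexical base of
# 200 entries and a gazetteer of 100000 named places; these are stand-ins.
QUANTIFIERS = {"some": "relative", "all": "exhaustive"}
ZONE_TYPES = {"towns": "town", "districts": "district"}
QUALIFIERS = {"seaboard", "rural"}
POSITIONS = {"north", "south", "east", "west"}
GAZETTEER = {"normandy": "region", "paris": "town"}
STOPWORDS = {"in", "of", "the"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def analyse(text):
    """Map an expression onto a nested quant / type / zone structure."""
    sem = {}
    for tok in tokenize(text):
        if tok in STOPWORDS:
            continue
        if tok in QUANTIFIERS:
            sem["quant"] = {"type": QUANTIFIERS[tok]}
        elif tok in QUALIFIERS:
            sem.setdefault("type", {})["geo"] = tok
        elif tok in ZONE_TYPES:
            sem.setdefault("type", {})["ty_zone"] = ZONE_TYPES[tok]
        elif tok in POSITIONS:
            sem.setdefault("zone", {})["position"] = tok
        elif tok in GAZETTEER:
            zone = sem.setdefault("zone", {})
            zone["egn"] = {"ty_zone": GAZETTEER[tok], "nom": tok.capitalize()}
            zone["loc"] = "internal"  # assumption: always an internal location
        else:
            raise ValueError(f"unknown token: {tok}")
    return {"locgeo": sem}


result = analyse("Some seaboard towns in north of Normandy")
```

Here `result` has the same nesting as the semantic representation given for that expression: a `quant`, a `type` and a `zone` whose `egn` names Normandy as a region. A grammar-based analyser would additionally enforce word order, which this sketch ignores.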
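16. Semantic matching: Why?
[Diagram: a query, "Which passages address Paris?", is matched against passages from the corpora (Text A, Text B). The spatial expressions shown are "the south of a Bordeaux-Genève line", "the northern half of France", "In Paris and Toulouse" and "In Ile de France region"; the candidate matches are numbered 1 to 3.]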
17. Semantic matching: How?
- Spatial compatibility
- - Is the zone denoted by the passage spatially
compatible with that of the query? (Is there
at least an intersection?)
- Relevance degree
- - If this zone is compatible, how relevant is
it w.r.t. the query?
- - - probability
- - - granularity
18. Compatibility computation
- Q1) Which passages address Paris?
- P1) the capital city: YES (gazetteer)
- P2) big cities in France: YES (gazetteer + computation)
- P3) the northern half of France: YES (GIS + computation)
- P4) south of a Bordeaux-Genève line: NO (GIS + computation)
19. "the northern half of France"
20. "the south of a Bordeaux-Genève line"
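
As a rough illustration of the geometric part of the compatibility test, the Python sketch below checks whether Paris falls in "the northern half of France" and "south of a Bordeaux-Genève line". It rests on several assumptions that are not in the slides: the query zone is reduced to a single point, latitude and longitude are treated as planar coordinates, the city coordinates are approximate, and the dividing latitude `FRANCE_MID_LATITUDE` is a hypothetical value chosen for the example. A real system would intersect polygons in a GIS.

```python
# Approximate coordinates (latitude, longitude) in decimal degrees.
PARIS = (48.8566, 2.3522)
TOULOUSE = (43.6047, 1.4442)
BORDEAUX = (44.8378, -0.5792)
GENEVE = (46.2044, 6.1432)

# Assumption: the "northern half" is everything above this latitude.
FRANCE_MID_LATITUDE = 46.5


def in_northern_half(point):
    """True if the point lies north of the assumed dividing latitude."""
    return point[0] > FRANCE_MID_LATITUDE


def south_of_line(point, a=BORDEAUX, b=GENEVE):
    """True if the point lies south of the straight line through a and b.

    The line's latitude is interpolated at the point's longitude, which
    treats degrees as planar coordinates (adequate at this scale only).
    """
    lat, lon = point
    (lat_a, lon_a), (lat_b, lon_b) = a, b
    slope = (lat_b - lat_a) / (lon_b - lon_a)
    line_lat = lat_a + slope * (lon - lon_a)
    return lat < line_lat


paris_p3 = in_northern_half(PARIS)   # compatible with P3
paris_p4 = south_of_line(PARIS)      # not compatible with P4
```

Under these assumptions Paris is compatible with the northern half of France and incompatible with the zone south of the Bordeaux-Genève line, in agreement with the YES and NO answers given for P3 and P4.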
21. Relevance degree (1): Quantification
- Query "Calvados" (French district)
- P1 "The quarter of districts in north of France": r = 25%, rank 3
- P2 "All districts in north of France": r = 100%, rank 1
- P3 "Some districts in north of France": r = i/n = 5/52 = 9.6%, rank 4
- P4 "Fifteen districts in north of France": r = i/n = 15/52 = 29%, rank 2
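
The ranking by quantification can be reproduced with a few lines of Python. This is a sketch, not the authors' code: it reads n = 52 as the number of districts in the zone and i = 5 as the numeric value given to "Some", which is how the slide's fractions appear to be built but is not stated explicitly.

```python
from fractions import Fraction

N_DISTRICTS = 52  # n in the slide's fractions (assumed: districts in the zone)
SOME = 5          # i used on the slide for the quantifier "Some"

# Ratio r of the zone's districts covered by each quantifier.
passages = {
    "The quarter of": Fraction(1, 4),
    "All": Fraction(1),
    "Some": Fraction(SOME, N_DISTRICTS),
    "Fifteen": Fraction(15, N_DISTRICTS),
}


def rank_by_ratio(ratios):
    """Rank names by decreasing ratio; rank 1 is the most relevant."""
    ordered = sorted(ratios, key=ratios.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


ranks = rank_by_ratio(passages)
percentages = {name: round(float(r) * 100, 1) for name, r in passages.items()}
```

The resulting order is "All", "Fifteen", "The quarter of", "Some", matching ranks 1 to 4 on the slide. Exact fractions are used so that the comparison of 15/52 with 1/4 does not depend on floating-point rounding.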
22. Relevance degree (2): Granularity
- Levels: country, region, district, city, "zone"
- "the northern half of France" ("zone")
- "Basse Normandie" (region)
- "Calvados" (district)
- "Caen" (city)
23.
locgeo(locgeo(det:Det..type:Type..Zone)) -->
    prep, det(Det), type(Type), zone(Zone).
det(Sem) --> [X], {lexique(X,XR,det,Sem)}.
type(X) --> typeQualif(X).
type(ty_zone:N) --> nomtype(N).
typeQualif(ty_zone:N..Q) --> option, nomtype(N), prep, qualif(Q).
nomtype(Sem) --> [X], {lexique(X,XR,nom,Sem)}.
zone(X) --> egn(X).
egn(egn(ty_zone:T..nom:Y..coord:C)) -->
    ls_lexiconExtDCG(np, type_sem:egn..type_zone:T..nom:Y..coord:C).
egn(egn(ty_zone:T..nom:Y)) -->
    [X], {lexique(X,XR,np, type_sem:egn..type_zone:T..nom:Y)}.
24.
lexique(quelque,quelque,det,type_sem:relatif..type:relatif_qualifie..nb:'qualitatiffaible').
lexique(tout,tout,le,det,type_sem:exhaustif).
lexique(région,région,nom,type_sem:zone(administrative)..nom_zone:région).
lexique(ville,ville,nom,type_sem:zone(administrative)..nom_zone:ville).
lexique('Bretagne','Bretagne',np,type_sem:egn..type_zone:région..nom:'Bretagne').