Title: ISO/TC37/SC4/TDG6 Language Resource Ontologies
1. ISO/TC37/SC4/TDG6 Language Resource Ontologies
- 2008-09-27, Pisa
- HASIDA Koiti
- hasida.k_at_aist.go.jp
- CfSR, AIST, Japan
2. Ontologization
- reformulation in terms of ontology
- provide a standard way to convert annotations to labeled directed graphs
  - DCR, LAF, LMF, FS, MAF, SemAF, SynAF, MLIF, etc.
- Cf. LMF and MAF have UML-based schemas.
- RDF rather than XML as the base description and modeling tool
  - standard semantic interpretation for RDF
- highlight semantics rather than syntax
3. Purposes of Ontologization
- interoperability
- among ISO/TC37 standards
- with ontologies from elsewhere
- with any data containing linguistic content
- RDF data are easier to integrate than XML data.
- e.g., external annotation of texts in SMIL data without including linguistic descriptions in the SMIL specification
- fuller formalization of IS specifications
- semantic extension of DCR
4. Semantic Extension of DCR
- sorts of DCs
- unary predicate → class
- binary relation → property
- symmetric binary relation, etc.
- types of the domain (1st argument) and the range (2nd argument) of binary relations (properties)
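The mapping from DC sorts to ontology constructs can be sketched in Python as follows. This is a minimal illustration, assuming each data category arrives as a name, a sort, and optional domain and range types; the `DataCategory` record, the `dc:` prefix, and the DC names `antonym` and `Word` are hypothetical and not taken from any DC registry. The sketch also assumes that every binary relation becomes an object property.

```python
from typing import NamedTuple, Optional

Triple = tuple[str, str, str]

class DataCategory(NamedTuple):
    """Hypothetical, simplified record for one data category."""
    name: str
    sort: str                          # "unary", "binary" or "symmetric"
    domain_type: Optional[str] = None  # type of the 1st argument
    range_type: Optional[str] = None   # type of the 2nd argument

def dc_to_triples(dc: DataCategory) -> list[Triple]:
    """Map one DC to RDFS/OWL triples according to its sort."""
    subject = f"dc:{dc.name}"
    if dc.sort == "unary":
        return [(subject, "rdf:type", "owl:Class")]   # unary predicate -> class
    if dc.sort not in ("binary", "symmetric"):
        raise ValueError(f"unknown sort: {dc.sort}")
    # binary relation -> property (assumed here to be an object property)
    triples = [(subject, "rdf:type", "owl:ObjectProperty")]
    if dc.sort == "symmetric":
        triples.append((subject, "rdf:type", "owl:SymmetricProperty"))
    if dc.domain_type is not None:
        triples.append((subject, "rdfs:domain", f"dc:{dc.domain_type}"))
    if dc.range_type is not None:
        triples.append((subject, "rdfs:range", f"dc:{dc.range_type}"))
    return triples

for triple in dc_to_triples(DataCategory("antonym", "symmetric", "Word", "Word")):
    print(*triple)
```

A DC whose range is a literal type would call for `owl:DatatypeProperty` instead. The sketch leaves that case out.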
5. XML Mess
- Semantic interpretation of XML is not standardized but defined ad hoc.
- Many inconsistent standards on overlapping issues.
- Huge standards containing many different manners of semantic interpretation
  - e.g., MPEG-7: > 2000 pages
6. RDF
- Resource Description Framework
- labeled directed graph
- W3C recommendation: http://www.w3.org/RDF/
- Schemas are provided by RDFS, OWL, etc.
- textual representation
- XML, N3, etc.
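7. RDF Graph
(figure: an RDF graph with four labeled arcs)
- <http://www.example.org/people#fred> m:hasEmail <mailto:fred@example.com>
- <http://www.example.org/people#fred> m:givenName "Fred"
- <http://www.example.org/people#fred> m:attending <http://meetings.example.com/cal#m1>
- <http://meetings.example.com/cal#m1> m:homePage <http://meetings.example.com/m1/hp>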
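The RDF graph on this slide consists of four triples. The minimal Python sketch below writes them in N-Triples, one of the line-based textual representations of RDF. The slide does not show which namespace IRI the prefix `m:` stands for, so `http://meetings.example.com/ns#` is a placeholder assumption.

```python
M = "http://meetings.example.com/ns#"   # placeholder namespace for the m: prefix

def iri(value: str) -> str:
    """An IRI term in N-Triples syntax."""
    return f"<{value}>"

def literal(value: str) -> str:
    """A plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

FRED = iri("http://www.example.org/people#fred")
MEETING = iri("http://meetings.example.com/cal#m1")

TRIPLES = [
    (FRED, iri(M + "hasEmail"), iri("mailto:fred@example.com")),
    (FRED, iri(M + "givenName"), literal("Fred")),
    (FRED, iri(M + "attending"), MEETING),
    (MEETING, iri(M + "homePage"), iri("http://meetings.example.com/m1/hp")),
]

def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) terms, one triple per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)

print(to_ntriples(TRIPLES), end="")
```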
8. Conversion of XML to RDF
- anyURI- and IDREF(S)-type attributes → object property (link)
- other attributes → datatype property
- embedded elements → object/datatype property
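The three conversion rules above can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under stated assumptions, not a normative mapping:

- The attribute sets `URI_ATTRS` and `IDREF_ATTRS` are hard-coded here. A real converter would read them off the schema.
- An element's node is named after its `id` attribute, or otherwise is a generated blank node.
- Element text goes into `rdf:value`.
- Property names stay as bare local names.
- The sample XML is made up.

```python
import itertools
import xml.etree.ElementTree as ET

Triple = tuple[str, str, str]

# Assumption: in a real converter these sets would come from the XML schema.
URI_ATTRS = {"href"}            # attributes of type xs:anyURI
IDREF_ATTRS = {"ref", "refs"}   # attributes of type IDREF / IDREFS

def literal(text: str) -> str:
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def xml_to_triples(root: ET.Element) -> list[Triple]:
    fresh = itertools.count()
    names: dict[int, str] = {}
    triples: list[Triple] = []

    def node(elem: ET.Element) -> str:
        """Name an element's node: <#id> if it has an id, else a blank node."""
        key = id(elem)
        if key not in names:
            xml_id = elem.get("id")
            names[key] = f"<#{xml_id}>" if xml_id else f"_:b{next(fresh)}"
        return names[key]

    def visit(elem: ET.Element) -> None:
        subj = node(elem)
        for attr, value in elem.attrib.items():
            if attr == "id":
                continue                      # already used to name the node
            if attr in URI_ATTRS:             # anyURI attribute -> object property
                triples.append((subj, attr, f"<{value}>"))
            elif attr in IDREF_ATTRS:         # IDREF(S) -> one link per referenced id
                for ref in value.split():
                    triples.append((subj, attr, f"<#{ref}>"))
            else:                             # other attribute -> datatype property
                triples.append((subj, attr, literal(value)))
        text = (elem.text or "").strip()
        if text:
            triples.append((subj, "rdf:value", literal(text)))
        for child in elem:
            if len(child) == 0 and not child.attrib:
                # text-only embedded element -> datatype property
                triples.append((subj, child.tag, literal((child.text or "").strip())))
            else:
                # structured embedded element -> object property to the child's node
                triples.append((subj, child.tag, node(child)))
                visit(child)

    visit(root)
    return triples

SAMPLE = """<sentence id="s1">
  <token id="t1" pos="DET">The</token>
  <token id="t2" pos="NN">clock</token>
  <phrase cat="NP" refs="t1 t2"/>
</sentence>"""

for triple in xml_to_triples(ET.fromstring(SAMPLE)):
    print(*triple)
```

Here the IDREFS attribute `refs` turns into two links from the phrase node to the token nodes. The plain attributes `pos` and `cat` stay literal-valued.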
9. ISO 24610 Feature Structure
- typed feature structure as in HPSG, etc.
- ISO 24610-1 Feature Structure Representation
- ISO 24610-2 Feature System Declaration
- labeled directed graph
- AVM (attribute-value matrix)
- textual encoding by XML
10. FS Graph as RDF Graph
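(figure: the feature structure of "la pomme" as a labeled directed graph. A SPECIFIER arc leads to a node with POS determiner and ORTH la. A HEAD arc leads to a node with POS noun and ORTH pomme. The AGR arcs of both nodes point to one shared node with NUMBER singular.)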
11. FS in AVM
[ SPECIFIER [ POS  determiner
              ORTH la
              AGR  [1] [ NUMBER singular ] ]
  HEAD      [ POS  noun
              ORTH pomme
              AGR  [1] ] ]
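The shared AGR value (tag 1) is what makes this FS a graph rather than a tree. The minimal Python sketch below builds the "la pomme" structure with a single shared AGR object and flattens it into (node, feature, value) triples. It assumes nested dicts as the in-memory FS format, which is chosen here only for illustration. The generated node names such as `n0` are hypothetical.

```python
FS = dict  # feature name -> atomic value (str) or embedded FS (dict)

def fs_to_triples(fs: FS) -> list[tuple[str, str, str]]:
    """Flatten an FS into (node, feature, value) triples.

    Embedded FSs that are the same Python object (structure sharing)
    get a single node. Names are assigned before recursing, so cyclic
    structures also terminate.
    """
    names: dict[int, str] = {}
    triples: list[tuple[str, str, str]] = []

    def visit(sub: FS) -> str:
        key = id(sub)
        if key in names:              # reentrant path: reuse the existing node
            return names[key]
        names[key] = f"n{len(names)}"
        for feature, value in sub.items():
            target = visit(value) if isinstance(value, dict) else value
            triples.append((names[key], feature, target))
        return names[key]

    visit(fs)
    return triples

agr = {"NUMBER": "singular"}  # the value tagged 1, shared by both AGR features
la_pomme = {
    "SPECIFIER": {"POS": "determiner", "ORTH": "la", "AGR": agr},
    "HEAD": {"POS": "noun", "ORTH": "pomme", "AGR": agr},
}

for triple in fs_to_triples(la_pomme):
    print(*triple)
```

Both AGR arcs end at the same node. If the two AGR values were separate but equal dicts, the result would have two distinct nodes, which is the difference between token identity and mere type identity in feature structures.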
12. Ontologies Subsume Feature Systems
- Features are partial functions, whereas RDF properties are relations in general (possibly partial functions).
- Usual feature systems have no taxonomy of features, whereas usual ontologies have taxonomies of properties (e.g., via rdfs:subPropertyOf).
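Both differences can be made concrete with a small Python sketch over a set of triples:

- A property behaves like a feature only if no subject has two different values for it.
- An `rdfs:subPropertyOf` declaration makes a triple with the sub-property entail the same triple with the super-property. This follows RDFS entailment rule rdfs7, plus transitivity from rule rdfs5.

The property names `orth`, `hasSynonym`, and `relatedTo` are hypothetical.

```python
Triple = tuple[str, str, str]

def is_functional(triples: set[Triple], prop: str) -> bool:
    """True if no subject has two different values for prop (a partial function)."""
    seen: dict[str, str] = {}
    for s, p, o in triples:
        if p == prop:
            if s in seen and seen[s] != o:
                return False
            seen[s] = o
    return True

def close_subproperties(triples: set[Triple]) -> set[Triple]:
    """Add triples entailed by rdfs:subPropertyOf (rules rdfs5 and rdfs7) up to a fixpoint."""
    result = set(triples)
    while True:
        sub = {(s, o) for s, p, o in result if p == "rdfs:subPropertyOf"}
        # rdfs7: (x a y) and (a subPropertyOf b) => (x b y)
        new = {(x, b, y) for a, b in sub for x, p, y in result if p == a}
        # rdfs5: subPropertyOf is transitive
        new |= {(a, "rdfs:subPropertyOf", c) for a, b in sub for b2, c in sub if b == b2}
        if new <= result:
            return result
        result |= new

data = {
    ("w1", "orth", "apple"),
    ("w1", "hasSynonym", "w2"),
    ("w1", "hasSynonym", "w3"),
    ("hasSynonym", "rdfs:subPropertyOf", "relatedTo"),
}
print(is_functional(data, "orth"), is_functional(data, "hasSynonym"))
```

Here `orth` qualifies as a feature, while `hasSynonym` is a relation with no feature counterpart.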
13. Feature-System Declaration
<fsDecl type="word" baseTypes="sign">
  <fsDescr>The fundamental type for individual words</fsDescr>
  <fDecl name="orth">
    <fDescr>The orthographic representation for this word</fDescr>
    <vRange><string/></vRange>
  </fDecl>
</fsDecl>
(figure: the declaration as an RDF graph)
- word rdfs:subClassOf sign
- word rdfs:comment "The fundamental type for individual words"
- orth rdf:type owl:FunctionalProperty
- orth rdfs:domain word
- orth rdfs:range string
- orth rdfs:comment "The orthographic representation for this word"
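The translation on this slide can be sketched with Python's standard `xml.etree.ElementTree`. This minimal sketch handles only the `fsDecl`, `fsDescr`, `fDecl`, `fDescr`, and `vRange` elements used in the example and ignores XML namespaces. It follows the slide in declaring each feature an `owl:FunctionalProperty` and in using the tag of the `vRange` child (here `string`) as the range.

```python
import xml.etree.ElementTree as ET

Triple = tuple[str, str, str]

FSD = """<fsDecl type="word" baseTypes="sign">
  <fsDescr>The fundamental type for individual words</fsDescr>
  <fDecl name="orth">
    <fDescr>The orthographic representation for this word</fDescr>
    <vRange><string/></vRange>
  </fDecl>
</fsDecl>"""

def fsdecl_to_triples(fs_decl: ET.Element) -> list[Triple]:
    """Translate one fsDecl element into RDFS/OWL triples."""
    fs_type = fs_decl.get("type")
    triples: list[Triple] = []
    # baseTypes may list several supertypes separated by whitespace
    for base in fs_decl.get("baseTypes", "").split():
        triples.append((fs_type, "rdfs:subClassOf", base))
    descr = fs_decl.findtext("fsDescr")
    if descr:
        triples.append((fs_type, "rdfs:comment", f'"{descr.strip()}"'))
    for f_decl in fs_decl.findall("fDecl"):
        name = f_decl.get("name")
        # a feature is a partial function, hence a functional property
        triples.append((name, "rdf:type", "owl:FunctionalProperty"))
        triples.append((name, "rdfs:domain", fs_type))
        f_descr = f_decl.findtext("fDescr")
        if f_descr:
            triples.append((name, "rdfs:comment", f'"{f_descr.strip()}"'))
        v_range = f_decl.find("vRange")
        if v_range is not None:
            for value_type in v_range:        # e.g. <string/>
                triples.append((name, "rdfs:range", value_type.tag))
    return triples

for triple in fsdecl_to_triples(ET.fromstring(FSD)):
    print(*triple)
```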
14. Constraint (Conditional)
<cond>
  <fs>
    <f name="inv"><binary value="true"/></f>
  </fs>
  <then/>
  <fs>
    <f name="aux"><binary value="true"/></f>
    <f name="vform"><symbol value="fin"/></f>
  </fs>
</cond>
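(figure: the constraint as a graph. An arc labeled cond connects the antecedent, X with inv true, to the consequent, X with aux true and vform fin.)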
SWRL representation: inv(?X, true) -> aux(?X, true) ^ vform(?X, fin)
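Read as a rule, the constraint adds aux = true and vform = fin to every node X that has inv = true. The Python sketch below applies that rule by forward chaining over (node, feature, value) triples. It only illustrates what the SWRL rule entails and does not implement SWRL. The node names `v1` and `v2` are hypothetical.

```python
Triple = tuple[str, str, str]

def apply_inv_rule(triples: set[Triple]) -> set[Triple]:
    """Forward-chain the rule inv(?X, true) -> aux(?X, true) ^ vform(?X, fin)."""
    derived = set(triples)
    for x, feature, value in triples:
        if feature == "inv" and value == "true":   # antecedent matches with ?X = x
            derived.add((x, "aux", "true"))        # first consequent atom
            derived.add((x, "vform", "fin"))       # second consequent atom
    return derived

facts = {("v1", "inv", "true"), ("v2", "inv", "false")}
result = apply_inv_rule(facts)
print(sorted(result - facts))  # [('v1', 'aux', 'true'), ('v1', 'vform', 'fin')]
```

The consequent never asserts inv, so a single pass already reaches the fixpoint. A rule whose conclusions could feed its own premises would have to be applied repeatedly until nothing new is derived.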
15. FS Ontologization (Summary)
- RDF subsumes FS.
- Use ontologies for feature-system declarations.
- SWRL to encode constraints
- Defaults are outside of ontology.
16. ISO 24612 Linguistic Annotation Framework
17. GrAF in RDF
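(figure: the tokens "The" and "clock" as nodes of rdf:type TOKEN, annotated with POS DET and BASE THE, and with POS NN and BASE CLOCK, respectively, plus a node of rdf:type NP annotated with NUMBER SING)
- possibly stand-off annotation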
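The GrAF graph on this slide can be written down as plain triples, as in the Python sketch below. Several details are hypothetical assumptions, since the slide does not label them:

- the node identifiers `tok1`, `tok2`, and `np1`
- the use of `rdf:value` for the surface form
- the property `covers` linking the NP node to its tokens

Because each annotation is just a set of triples about node identifiers, it can be kept apart from the primary data, which is what makes stand-off annotation straightforward.

```python
Triple = tuple[str, str, str]

def token(node: str, surface: str, pos: str, base: str) -> list[Triple]:
    """Triples for one token node (node naming is hypothetical)."""
    return [
        (node, "rdf:type", "TOKEN"),
        (node, "rdf:value", f'"{surface}"'),   # assumption: surface form via rdf:value
        (node, "POS", pos),
        (node, "BASE", base),
    ]

graph: list[Triple] = token("tok1", "The", "DET", "THE") + token("tok2", "clock", "NN", "CLOCK")
graph += [
    ("np1", "rdf:type", "NP"),
    ("np1", "NUMBER", "SING"),
    ("np1", "covers", "tok1"),   # hypothetical edge label
    ("np1", "covers", "tok2"),
]

def annotations_of(graph: list[Triple], node: str) -> dict[str, list[str]]:
    """Collect the property values attached to one node, in insertion order."""
    result: dict[str, list[str]] = {}
    for s, p, o in graph:
        if s == node:
            result.setdefault(p, []).append(o)
    return result

print(annotations_of(graph, "np1"))
# {'rdf:type': ['NP'], 'NUMBER': ['SING'], 'covers': ['tok1', 'tok2']}
```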
18. SemAF-DActs
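(figure: UML class diagram relating Dialogue, Turn, Agent (in the roles sender, addressee, and overhearer), Utterance, and DialogueAct, with multiplicities and a func.dep. (functional dependence) association)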
19. TODOs (projects in TDG6?)
- include ontologies in documents
- FSD
- just check UML (as long as no property hierarchy is necessary)
- LMF, MAF
- finish ontologization (possibly in UML)
- SynAF
- ontologize from scratch, forgetting XML
- DCR, SemAF-Time, SemAF-DActs, MLIF, etc.
20. Issues
- Who should ontologize individual WIs?
- ontologize future WIs from the beginning
- TDG6 should exemplify how.
- Whether and how should ontologization be made mandatory?
- Where to include ontologies of ongoing WIs?
- depending on their stages (WD, CD, ...)
- How to keep ontologizing DCs?
- replace DC metamodel by ontology?
- modify ISOcat?