Title: From thesauri to ontologies: semantic standards for law
1 From thesauri to ontologies: semantic standards for law
- Daniela Tiscornia
- tiscornia_at_ittig.cnr.it
2 Index of arguments
- 1. Linguistic barriers
- 2. Language-dependent approaches
- - traditional tools (thesauri, keywords)
- - metadata
- - lexicons
- 3. Language-independent tools: ontologies
- 4. The Lois database
- 5. Conclusions
3 Linguistic barriers
- hamper
- - access to content for non-expert users
- - semantic interoperability in e-government
- - cross-lingual legal information searching
- - commercial exploitation of Public Sector Information
4 References
- Fellbaum C. (editor), WordNet: An electronic lexical database, Cambridge, MA: The MIT Press, 1998, 305, downloadable from http://mitpress.mit.edu/book-home.tcl?isbn=026206197X.
- Gangemi A., Guarino N., Masolo C., Oltramari A., Sweetening WordNet with DOLCE, AI Magazine 24(3), Fall 2003, 13-24.
- Legal Knowledge and Information Systems, Proceedings of JURIX Conferences, Amsterdam, IOS Press.
- Gangemi A., Sagri M.T., Tiscornia D., Metadata for Content Description in Legal Information, Workshop Legal Ontologies, ICAIL2003, Edinburgh. In press for Journal of Artificial Intelligence and Law, Kluwer.
- Hirst G., Ontology and the lexicon. In Staab, Steffen and Studer, Rudi (editors), Handbook on Ontologies in Information Systems, Berlin: Springer, 2003, p. 14.
5 Limits of language-based retrieval tools
- Terminology vs. common language: the Italian Code on Data Protection doesn't contain the term 'privacy'
- Polysemy: the Italian term 'ordine' (order) has 4 legal senses
- Cross-lingual IR: the Italian term 'diritto' means both 'right' and 'law'.
6 Thesauri
- Vertical (systematic) thesauri (e.g. Eurovoc): mono-hierarchic tree structure of terms interlinked by broader-term, narrower-term and related-term relationships (BT, NT, RT); no distinction is made between terms and concepts, and the semantic specification of relations is missing.
- Horizontal thesauri (e.g. the Italgiure semantic area): unstructured collection of terms without any distinction among words, concepts, types or parts of speech.
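The mono-hierarchic structure of a vertical thesaurus can be sketched in a few lines of Python. This is only an illustration: the terms below are hypothetical and are not taken from Eurovoc, and the function names are invented for the example. The sketch also shows the limit noted above: the links are bare strings, with nothing stating what kind of relation BT or NT actually expresses.

```python
# Hypothetical mono-hierarchic thesaurus: each term has at most one broader term (BT).
# The terms are illustrative only and are not taken from Eurovoc.
BROADER = {
    "consumer law": "civil law",
    "contract law": "civil law",
    "civil law": "law",
}


def narrower_terms(term):
    """Return the narrower terms (NT) of `term`, i.e. the terms whose BT is `term`."""
    return sorted(t for t, bt in BROADER.items() if bt == term)


def expand_query(term):
    """Expand a query term with all its narrower terms, depth first."""
    expanded = [term]
    for nt in narrower_terms(term):
        expanded.extend(expand_query(nt))
    return expanded


print(expand_query("civil law"))
```

The expansion works purely on term strings, so a polysemous term would be expanded in the same way whatever sense the user had in mind.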
7 Semantic metadata
- Semantic metadata are expected to support search engines for legal information retrieval, providing legal knowledge to include in their search strategies
- Conceptual search strategies based on keywords still lack a clear semantics of terms, and this prevents conceptual query expansion
- there is no semantic relationship between the information needs of the user and the information content of documents, apart from text pattern matching
8 Sense distinction
- In EU legislative texts, four senses of 'worker' are defined:
- - any worker as defined in Article 3 (a) of Directive 89/391/EEC who habitually uses display screen equipment as a significant part of his normal work
- - any person employed by an employer, including trainees and apprentices but excluding domestic servants
- - any person carrying out an occupation on board a vessel, including trainees and apprentices, but excluding port pilots and shore personnel carrying out work on board a vessel at the quayside
- - any person who, in the Member State concerned, is protected as an employee under national employment law and in accordance with national practice
- The corresponding lexical entry is defined as follows:
- - a person who works at a specific occupation
9 From words to concepts
- A semantic theory requires an ontology of all the concepts or predicates expressed by the words of a language
- Concepts are organized in structures that represent knowledge about the world
- Lexicons map words to concepts: words are lexicalizations of a concept; a concept can be represented by many terms (words or phrases) in multiple languages; one term can identify several concepts
- A lexicon is a bridge between a language and the knowledge expressed in that language (Sowa 2000), but it is still language dependent!
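The many-to-many mapping between terms and concepts can be illustrated with a minimal Python sketch. The concept identifiers (`C_RIGHT`, `C_LAW`) and the function name are hypothetical, introduced only for this example; the Italian term 'diritto' and its two English renderings come from the earlier slide.

```python
from collections import defaultdict

# (term, language, concept id). The concept ids are hypothetical placeholders.
LEXICALIZATIONS = [
    ("diritto", "it", "C_RIGHT"),
    ("diritto", "it", "C_LAW"),
    ("right", "en", "C_RIGHT"),
    ("law", "en", "C_LAW"),
]

concepts_of = defaultdict(set)  # (term, language) -> concept ids
terms_of = defaultdict(set)     # concept id -> (term, language) pairs
for term, lang, concept in LEXICALIZATIONS:
    concepts_of[(term, lang)].add(concept)
    terms_of[concept].add((term, lang))


def translate(term, source_lang, target_lang):
    """Return target-language terms that share at least one concept with `term`."""
    results = set()
    for concept in concepts_of.get((term, source_lang), ()):
        results |= {t for t, lang in terms_of[concept] if lang == target_lang}
    return sorted(results)


print(translate("diritto", "it", "en"))
```

One Italian term yields two English candidates because it lexicalizes two concepts; choosing between them requires knowing which concept is meant, which the lexicon alone does not say.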
10 Describing concepts
- By lexical and semantic relations (e.g. WordNet)
- By semantic roles among predicates (verbs) and their arguments (FrameNet)
- By properties linked by formal relations (ontologies)
11 WordNets Family
- WordNet (WN) (freeware, American English) (Cognitive Science Laboratory, Princeton University)
- EuroWordNet (EWN) (proprietary, European languages) (ILC, Institute of Computational Linguistics, Pisa, for the Italian language)
- ItalWordNet (IWN) (Italian part of EWN) (IRST-ICT-Trento)
- Jur-(Ital)WordNet (JWN) (C.N.R. Project: ITTIG, Institute of Theory and Techniques for Legal Information; ILC; LOA, Laboratory of Applied Ontology)
12 FrameNet (Fillmore 1997)
- FrameNet is a frame-semantic description of lexical items based on semantically tagged corpora. Semantic roles (case roles, thematic roles, theta roles) characterize the semantic relation that a predicate can have to its arguments
- Mapping between the syntactic constituents of a sentence and the frame semantic elements
13 Thematic roles
- Based on
- - syntactic patterns: subj, verb, object
- - semantic patterns: agent, action (the action or state associated with the verb, the participants, the roles of participants)
- - ontological assumptions: role as participant relation and roles as ontological classes (Guarino 2004)
14 Concept-oriented vs. context-sensitive representation
- The traditional 'standardisation oriented' and 'concept centred' approach, where (ideally) only one term is assigned to a concept, has proved to fail in cross-lingual conceptualizations
- 'Termino-ontographers' need an intermediate structure of the domain, to distinguish language-independent concepts and relations from those which are not language-independent (Kerremans and Temmerman, 2004)
15 The importance of context
- Term extraction, term definition and inter-term relation identification need to be anchored to the contexts of use
- In law, legislative definitions are contexts which have prescriptive force. This fact influences the determination of the number of senses of terms, and the setting of equivalences between legal concepts and lexical concepts
16 Lightweight ontologies
- Lexicons are considered lightweight ontologies, linguistic expansions of the description of a way of perceiving reality, with limited formal modelling.
- It is possible that a lexicon with a semantic hierarchy might serve as the basis for a useful ontology, and that an ontology may serve as a grounding for a lexicon. This is particularly the case in technical domains, in which vocabulary and ontology are more closely tied than in more general domains (Hirst 2003).
17 The proposed approach
- Define a shareable conceptual model based on a semantic structure (classes of concepts and of semantically constrained relations).
- Concepts in the model are lexicalized by a multilingual lexicon which provides a source of legal semantic metadata (e.g. the Lois database),
- locally and dynamically incremented,
- integrated with existing resources,
- to support a semantically structured 'Google for Law'.
18 The Lois project
- The Lois project (EDC 22161) aims at developing a multi-language legal thesaurus based on WordNet and EuroWordNet technology
- WordNet lexicons belong to the class of computational lexicons that aim at making word content machine-understandable via a highly structured semantic representation of concepts. Concepts are represented by synsets: sets of all the terms expressing the same conceptual area, linked by a semantic relation of meaning equivalence. A synset is a set of one or more uninflected word forms (lemmas) with the same part of speech that can be interchanged in a certain context.
- Cross-lingual equivalence relations are made explicit in the so-called Inter-Lingual-Index (ILI). The ILI is the superset of all concepts from all wordnets, and the concepts from the individual wordnets are linked to one or more ILI records by means of equivalence relations.
- The ILI is an unordered list of concepts, i.e., it does not have any internal structuring. The reason behind this is the assumption that each language imposes its own language-specific structural constraints on the concepts. Therefore, any ordering of ILI concepts needs to be retrieved from knowledge bases that link into the ILI (or from ontological classification).
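The way synsets from different wordnets meet in the ILI can be sketched as follows. This is a simplified illustration, not the Lois or EuroWordNet data model: the synset and ILI identifiers, the `Synset` class and the function name are all hypothetical. Only the relation names `eq_synonym` and `eq_near_synonym` appear in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """A set of lemmas with the same part of speech, in one language."""
    language: str
    lemmas: tuple
    pos: str


# Hypothetical synsets from two language-specific wordnets.
SYNSETS = {
    "en_worker_1": Synset("en", ("worker",), "n"),
    "it_lavoratore_1": Synset("it", ("lavoratore",), "n"),
    "en_contract_1": Synset("en", ("contract",), "n"),
    "it_contratto_1": Synset("it", ("contratto",), "n"),
}

# The ILI itself is an unordered collection of records with no internal structure.
ILI = {"ili_worker", "ili_contract"}

# Equivalence links: (synset id, relation, ILI record id).
EQ_LINKS = [
    ("en_worker_1", "eq_synonym", "ili_worker"),
    ("it_lavoratore_1", "eq_synonym", "ili_worker"),
    ("en_contract_1", "eq_synonym", "ili_contract"),
    ("it_contratto_1", "eq_near_synonym", "ili_contract"),
]


def cross_lingual_equivalents(synset_id, target_language):
    """Find synsets in `target_language` linked to the same ILI record(s)."""
    ili_ids = {ili for sid, _rel, ili in EQ_LINKS if sid == synset_id}
    return sorted(
        sid
        for sid, _rel, ili in EQ_LINKS
        if ili in ili_ids
        and sid != synset_id
        and SYNSETS[sid].language == target_language
    )


print(cross_lingual_equivalents("en_worker_1", "it"))
```

Because the ILI carries no hierarchy of its own, any ordering of the records has to come from the wordnets or from an ontology linked to them, as the last bullet above states.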
19 Lois Architecture
20 Multiple Levels in Legal Language
(Diagram labels: Philosophy of Law; Lexical Data Base; Judges' discourse; EU-National Legal Concept; Legislators' language)
21 National legal WNs
- The Lexical Data Base conceptualizes general language entities pertaining to legal theory and legal dogmatics (structured according to the EWN methodology).
- The Legislative Data Base (EU-National Legal Concept) is populated by concepts defined in European and national legislation.
22 LEXDB: Lexical DB
- Lexical legal concepts: 1944 ILI records.
- first nucleus translated from the Italian JurWN: 739 synsets
- new concepts selected by legal experts provided by the Universities of Vienna, Evora, Praha and Sheffield.
23 The Lexical Data Base
- synsets are linked by
- - internal relations: lexical relations (syn, antonym, near-syn) and semantic relations (hyper, hypo, role, instance, etc.)
- - cross-lingual relations: eq_synonym, eq_near_synonym, eq_has_hyperonym, eq_has_hyponym, etc.
24 EULX: EU lexical concepts
- Terminology automatically extracted from EU texts, where the terms occur without an explicit definition
- The selection has been automated by analysing the English EU directives, extracting salient terms, mapping them to WordNet and selecting only the ones with one legal meaning in WordNet
- This process has created an automatic import of terms with gloss, and a plug-in synonym relation into WordNet
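The selection step described above (keep only terms with exactly one legal meaning) might look roughly like the following Python sketch. It is an assumption-laden illustration: the slides do not say how a WordNet sense is judged to be legal, so the sense inventory below is a hypothetical stand-in with a hand-set flag, and the function name is invented.

```python
# Hypothetical sense inventory standing in for WordNet:
# term -> list of (sense id, is_legal). The is_legal flags are set by hand here;
# how the project decided that a sense is legal is not stated in the slides.
SENSES = {
    "consumer": [("consumer_1", True)],
    "order": [("order_1", True), ("order_2", True), ("order_3", False)],
    "table": [("table_1", False)],
}


def select_single_legal_sense(terms, senses=SENSES):
    """Keep only the terms that have exactly one legal sense in the inventory."""
    selected = {}
    for term in terms:
        legal = [sid for sid, is_legal in senses.get(term, []) if is_legal]
        if len(legal) == 1:
            selected[term] = legal[0]
    return selected


print(select_single_legal_sense(["consumer", "order", "table", "vessel"]))
```

Terms with several legal senses are left out rather than guessed at, which keeps the automatic import unambiguous at the cost of coverage.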
25 EULG: EU legal concepts
- Concepts from EU directives with explicit definition, obtained by a process of semi-automatic alignment of the EU directives in the different languages: 2332 ILI records
26 NATLG: National legal concepts
- Concepts defined in national legislation within the domain of consumer law, or implementing the EU legislation in that domain, automatically or manually extracted.
27 EU-National Legal Concept
(Diagram labels: EU-National legal document; National Legislation; ID; Celex Def.s: about 2478 concepts; Implemented_as; National Legal Concepts; eq_synonym, eq_near_synonym, has_hyperonym)
28 Kinds of equivalence in Lois
- 1. Between lexical concepts
- - near-equivalence
- - hypo/hyper-equivalence
- - functional equivalence
- 2. Between legal concepts
- - legal equivalence
32 Consumer Protection Law: structuring the domain (I)
- Lexical Def. (ILI gloss): worker_1: a person who works at a specific occupation.
- EU Def.s
- - 8. 2005-02-02 worker_2: any person who, in the Member State concerned, is protected as an employee under national employment law and in accordance with national practice
- - 23. 2005-02-02 worker_3: any person carrying out an occupation on board a vessel, including trainees and apprentices, but excluding port pilots and shore personnel carrying out work on board a vessel at the quayside
- - 22. 2005-02-02 worker_4: any person employed by an employer, including trainees and apprentices but excluding domestic servants
- - 21. 2005-02-02 worker_5: any worker as defined in Article 3 (a) of Directive 89/391/EEC who habitually uses display screen equipment as a significant part of his normal work.
(Diagram relation labels: Has_hyper, Has_hyper)
33 Consumer Protection Law: structuring the domain (II)
(Diagram nodes: EU concept 'device'; national concept 'medical device'; national concept 'active implantable medical device'. Relation labels: Implemented-as, Near-syn, Has_hyper (three times))
34 Consumer Protection Law: structuring the domain (III)
(Diagram nodes: Core Ontology 'physical object' and 'social object'; EU concept 'device'; national concept 'medical device'; national concept 'active implantable medical device'. Relation labels: Has_hyper (twice))
35 The Core Legal Ontology as ordering principle
- Creating ILI records from WordNet high-level concepts.
- Creating ILI records from the upper concepts of the IT-LEXDB linked to the Core Legal Ontology (together with their CLO links), used as hypernyms in local hierarchies.
- Linking WordNet high-level concepts to CLO categories.
36 Why do we need a core legal ontology as ordering principle?
- Disadvantages
- - Manually performed
- - Limited improvement of searching capabilities
- Advantages
- - Aid in harmonizing lexical concepts proposed by national legal experts and existing lexical resources
- - Added value: future use of the lexical resources in Semantic Tagging, Information Extraction, Ontology building, Knowledge-Based Systems.
37 DOLCE D&S and the Core Legal Ontology (CLO)
- DOLCE (a Descriptive Ontology for Linguistic and Cognitive Engineering) is a foundational ontology (FO) originally developed in the EU WonderWeb project
- DOLCE, extended by means of the Descriptions and Situations (D&S) ontology, is suited to conceptualizing domains (such as law) that are mainly constituted by non-physical (mental, social) objects.
- A Description in DOLCE D&S is a social object which represents a conceptualization. Unlike physical objects, social objects depend on some agentive physical object that is able to conceive them. Descriptions have typical components, called concepts. A Concept is also a social object, which is defined by a description and can be used in other descriptions. Figures, or social individuals (either agentive or not), are other social objects defined by descriptions. Typical agentive figures are societies, organizations, and in general all socially constructed persons. (Gangemi et al. 2005)
38 DOLCE
39 CLO Categories
In CLO a norm is a Legal Description which has components such as a Task (the set of actions the norm aims to regulate), legal roles (played by the legal subjects involved) and parameters, such as temporal and physical locations. Legal Descriptions are satisfied by Situations (Fattispecie) composed of entities pertaining to the real world (Legal Subjects such as Persons, Bodies, etc.) and of the Behaviours performed by them.
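The components listed above can be laid out as a small data structure. The following Python sketch is only a reading aid under strong assumptions: the class and field names are hypothetical, the example norm and situation are invented, and the `satisfies` check (all roles filled, all task actions performed) is a deliberately naive stand-in for the satisfaction relation, which CLO defines formally.

```python
from dataclasses import dataclass, field


@dataclass
class LegalDescription:
    """A norm: a task, the legal roles involved, and parameters."""
    name: str
    task: set                    # actions the norm aims to regulate
    legal_roles: set             # roles played by legal subjects
    parameters: dict = field(default_factory=dict)  # e.g. temporal or physical location


@dataclass
class Situation:
    """A state of affairs: who plays which role, and what they do."""
    role_players: dict           # legal role -> legal subject
    behaviours: set              # behaviours performed by the subjects


def satisfies(situation, description):
    """Naive check: every role is filled and every task action is performed."""
    roles_filled = description.legal_roles <= set(situation.role_players)
    task_performed = description.task <= situation.behaviours
    return roles_filled and task_performed


# Invented example data, for illustration only.
sale_norm = LegalDescription(
    name="sale of goods",
    task={"deliver goods", "pay price"},
    legal_roles={"seller", "buyer"},
    parameters={"place": "Italy"},
)
complete = Situation({"seller": "Shop Ltd", "buyer": "Anna"}, {"deliver goods", "pay price"})
partial = Situation({"seller": "Shop Ltd"}, {"deliver goods"})

print(satisfies(complete, sale_norm), satisfies(partial, sale_norm))
```

Keeping the description separate from the situation mirrors the split between the norm as a social object and the real-world entities and behaviours it is matched against.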
40 'Translating' legal concepts
- The Italian term 'contratto' is, in terms of CLO concepts, a legal description, an information content and a physical object (the material support of the information content).
- A legal institution, for instance the Prime Minister, is a figure, created by norms, but it is also a social role.
41 Comparing WordNet High-level and CLO classes
(Diagram: class chains for 'Artificial Person', most specific class first)
- WN: Artificial Person, Person, Being2, Living thing1, Object1, Physical entity1, Entity
- CLO: Artificial Person, Social Figure, Social concept, Non-Physical Object, Endurant, Entity
42 Comparing WordNet High-level and CLO classes
(Diagram: class chains for 'Lease', most specific class first)
- WN: Lease, contract, written agreement, agreement1, statement1, message2, communication2, abstraction, abstract entity
- CLO: Lease, contract, social description, social concept, non-physical Endurant, Endurant, Entity
43 Comparing WordNet High-level and CLO classes
(Diagram: class chains for 'Consumer', most specific class first)
- WN: Consumer, User1, Person, Being2, Living thing1, Object1, Physical entity1, Entity
- CLO: Consumer, Social Role, Social Concept, Non-Physical Endurant, Endurant, Entity
44 Conclusions (I): the importance of semantic metadata
- Structural documentary standards (Legal XML) must be integrated with semantic ones for the description of content, to achieve a high level of semantic interoperability between sectors in order to
- - improve communication between areas and services of the Public Administration
- - make it possible for the user to access information and to make that information available for further use by other sections of the Public Administration
- - develop easy-to-access tools to incorporate and organize the data the users themselves are asked to supply.
45 Conclusions (II): the role of the semantic lexicon
- A semantic lexicon for law should be a source of semantic metadata, shared by multilingual and multinational legal information systems.
- It needs to be based on a common conceptual model of legal and world knowledge
- The Lois project aims at defining a methodology to achieve this goal.
46 Conclusions (III): lessons learned
- One of the main methodological points to be faced is the harmonization between
- - lexical and legislative concepts,
- - linguistic and ontological levels,
- - domain and world entities,
- and the integration between
- - new and existing resources,
- - manual and semi-automatic procedures.
47 References
- Breuker, J. and Hoekstra, R. (2004), Epistemology and ontology in core ontologies exemplified by two core ontologies for law: FOLaw and LRI-Core. In Coront-Wes Ekaw 2004.
- Gangemi, A., Sagri, M.-T., Tiscornia, D. (2005), A Constructive Framework for Legal Ontologies. In Law and the Semantic Web (Benjamins, Casanovas, Breuker and Gangemi eds.), Springer Verlag, 2005.
- Gangemi, A., Guarino, N., Masolo, C., Oltramari, A., Schneider, L. (2002), Sweetening Ontologies with DOLCE. In Proceedings of EKAW 2002.
- Hirst, G. (2004), Ontology and the Lexicon. In Handbook on Ontologies (Staab and Studer eds.), Springer, 2004.
- Kerremans, K. and Temmerman, R. (2004), Towards Multilingual, Termontological Support in Ontology Engineering. In Proceedings of Termino 2004, workshop on Terminology, 2004.
- Peters, W., Sagri, M. T., Tiscornia, D., The Structuring of Legal Knowledge in LOIS. In Artificial Intelligence and Law Journal, forthcoming.
- Vossen, P., Peters, W. and Díez-Orzas, P. (1997), The Multilingual design of the EuroWordNet Database. In Mahesh, K. (ed.), Ontologies and multilingual NLP, Proceedings of IJCAI-97 workshop, Nagoya, Japan, August 23-29.