L'INFORMAZIONE GIURIDICA

CONSIGLIO NAZIONALE DELLE RICERCHE. Istituto di Teoria e Tecniche dell'Informazione Giuridica ... (ITTIG CNR) Florence, Italy ...

1
A SEMANTIC MODELING APPROACH FOR THE RETRIEVAL OF
LEGISLATION AND DOCTRINE: AN ITALIAN EXPERIENCE
Ginevra Peruginelli, Enrico Francesconi
Institute of Legal Information Theory and Techniques
(ITTIG CNR), Florence, Italy
8th Int. Conf. Law Via The Internet, 26 October 2007
2
Contents
  • The Project of a Portal to Legal Literature
  • Phases
  • Standards for legal document identification
  • Software architecture of the Portal to Legal
    Literature
  • Standard implementation
  • Metadata generation

3
LEGAL INFORMATION CONTENT
  • Legislation
  • Case law
  • Administrative acts
  • Legal literature

Links between these legal sources are strongly needed.
4
PROJECT FOR A PORTAL TO LEGAL LITERATURE
Developed as a feasibility study by the Institute
of Legal Information Theory and Techniques

Phase 1
  • Develop integrated tools for generating and
    capturing metadata for structured and
    unstructured web documents

Phase 2
  • Providing multilingual access to foreign legal
    resources

Phase 3
  • Enhancement of the existing solutions oriented to
    semantic searching
  • Implementation of specific facilities to support
    legal users of the Portal in semantic querying

Phase 4
  • Linking legal sources in a shared web environment
  • A resolution mechanism for legal doctrine
5
THE 1st PHASE: A FEDERATION SYSTEM
  • Data of the Portal can be divided into two
    classes:
    - structured data (repositories in libraries,
      using specific metadata schemes)
    - web documents (usually without any particular
      metadata scheme), selected by a group of ITTIG
      legal experts. They are used to:
      - perform a selective exploration of the web
      - train the modules aimed at selecting and
        classifying web documents

Different data sources are retrieved by exploiting
a common metadata scheme, the Dublin Core (DC)
Metadata Scheme, and harvested according to the
OAI-PMH protocol.
6
THE 2nd PHASE: MULTILINGUAL ACCESS
Aim: to foster and facilitate worldwide
communication in the legal academic world, in the
legal professional sector, in the business world and
in public administration services to citizens.
Obstacles:
1) Complexity and richness of each legal language
2) Differences between legal concepts inherent to
the diverse national legal systems
System-bound nature of legal terminology
Example:
- In England a mortgagee becomes a conditional
  owner of the property mortgaged to him, but not
  its possessor.
- In Spain and in France the hypothécaire gains
  neither ownership nor possession of the mortgaged
  property unless he enforces the mortgage.
7
CROSS LANGUAGE RETRIEVAL OF LEGAL INFORMATION
  • Querying and retrieving multi-language documents
    involves problems of query translation
  • Especially in the legal domain, a word in a
    native query language can be ambiguous
  • A word can have different translations in a
    target language, each corresponding to a legal
    category in the target legal system

8
QUERY EXAMPLE
  • Italian user query: "Give me back all the
    documents related to dolo"

[Figure: in the Italian system the query term "dolo" retrieves documents related to dolo. For the English system "dolo" is an ambiguous word: it translates to "fraud" (private law), retrieving documents related to fraud, or to "malice" (criminal law), retrieving documents related to malice.]

Query contextualization is a key issue for
focused multi-language document retrieval.
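
The role of the category of law in query translation can be illustrated with a minimal Python sketch. The lexicon structure and the function name translate_query are illustrative assumptions, not part of the Portal; only the two senses of "dolo" come from the slide.

```python
# Hypothetical bilingual lexicon: (Italian term, category of law) -> English term.
# Only the two senses of "dolo" shown on the slide are included.
LEXICON = {
    ("dolo", "private law"): "fraud",
    ("dolo", "criminal law"): "malice",
}


def translate_query(term, category=None):
    """Return the candidate English translations of an Italian legal term.

    With a category of law the translation is unambiguous; without one,
    every known sense is returned and the query stays ambiguous.
    """
    if category is not None:
        translation = LEXICON.get((term, category))
        return [translation] if translation else []
    return sorted(t for (source, _), t in LEXICON.items() if source == term)


print(translate_query("dolo"))                  # every sense of the term
print(translate_query("dolo", "criminal law"))  # contextualized query
```

Keying the lexicon on the pair (term, category) makes the category a required part of an unambiguous lookup, which mirrors the point of the slide.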
9
THE 3rd PHASE: SEMANTIC QUERYING
  • Assigning high-quality metadata (dc:subject:
    category of law)
  • Adoption of DC metadata
  • Query contextualization according to a
    category of law
  • Exploitation of a controlled legal vocabulary
    allowing query interpretation based on
    approximation

10
DC METADATA AND RETRIEVAL PRECISION
DC metadata (in particular dc:subject) mainly
enhance precision in retrieval.
[Figure: the query "Evidence", contextualized by the category "Criminal Procedure", is compared with two documents whose dc:description is "Evidence"; their dc:subject values are "Criminal Procedure" and "Civil Procedure".]
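
The precision effect can be sketched in a few lines of Python. The record layout and the search function are assumptions made for illustration; the field values are the ones shown on the slide.

```python
# Toy records; the keys mirror the Dublin Core elements named on the slide.
RECORDS = [
    {"id": 1, "description": "Evidence", "subject": "Criminal Procedure"},
    {"id": 2, "description": "Evidence", "subject": "Civil Procedure"},
]


def search(records, keyword, subject=None):
    """Match the keyword against dc:description, optionally filtering on dc:subject."""
    hits = [r for r in records if keyword.lower() in r["description"].lower()]
    if subject is not None:
        hits = [r for r in hits if r["subject"] == subject]
    return [r["id"] for r in hits]


print(search(RECORDS, "Evidence"))                        # keyword only
print(search(RECORDS, "Evidence", "Criminal Procedure"))  # keyword + category
```

Without the category both records match the keyword; with it, the civil procedure record is filtered out, so fewer irrelevant documents are returned.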
11
LEGAL VOCABULARY AND RETRIEVAL RECALL
A controlled legal vocabulary mainly enhances
recall in retrieval.
[Figure: the query "Evidence", contextualized by the category "Criminal Procedure", finds no document (NO DOC) when the document with dc:subject "Criminal Procedure" has dc:description "witnesses". Related terms from a controlled vocabulary (Proof, Criminal Invest., Witness) let the query reach that document.]
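
A minimal Python sketch of vocabulary-based query expansion, under stated assumptions: the RELATED_TERMS table stands in for the controlled legal vocabulary, the function names are hypothetical, and matching is a plain substring test.

```python
# Stand-in for the controlled vocabulary. The slide abbreviates the second
# related term as "Criminal Invest."; the full form used here is an assumption.
RELATED_TERMS = {"evidence": ["proof", "criminal investigation", "witness"]}

RECORDS = [
    {"id": 3, "description": "witnesses", "subject": "Criminal Procedure"},
]


def expand(keyword):
    """Return the keyword followed by its related terms, all lower-cased."""
    key = keyword.lower()
    return [key] + RELATED_TERMS.get(key, [])


def search(records, keyword, subject, use_vocabulary=True):
    """Match any query term against dc:description within one dc:subject."""
    terms = expand(keyword) if use_vocabulary else [keyword.lower()]
    return [
        r["id"]
        for r in records
        if r["subject"] == subject
        and any(t in r["description"].lower() for t in terms)
    ]


print(search(RECORDS, "Evidence", "Criminal Procedure", use_vocabulary=False))
print(search(RECORDS, "Evidence", "Criminal Procedure"))
```

The literal query misses the record because "Evidence" does not occur in its description; the expanded query reaches it through the related term "witness".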
12
THE 4th PHASE: LINKING LEGAL SOURCES
  • URL: the subset of URIs that identify resources
    via a representation of their primary access
    mechanism (e.g., their network location). No
    longer adequate.
  • URN: an unambiguous identifier, allowing
    references to be expressed in a stable way,
    independently of the document's physical location

Goal: a system of references based on a
resolution mechanism (RDS, Resolver Discovery
Service) able to retrieve the corresponding object
regardless of the resource's physical location.
13
THE URN-NIR STANDARD FOR LEGISLATION
  • The URN-NIR is defined as a combination of
    elements, according to a specific grammar
  • Examples
    - Decree of the Ministry of Finance of 20.12.99:
      urn:nir:ministero.finanze:decreto:1999-12-20
    - Communications and Health Inter-ministerial
      Regulation, 9 September 1998:
      urn:nir:ministero.comunicazioni+ministero.salute:regolamento:1998-09-09

  • Parser: available on-line, automatically
    detects references within laws
  • Resolution service: resolves URNs into URLs
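
The two services can be sketched in Python under clearly stated assumptions: elements are taken to be colon-separated, joint issuing authorities are joined with "+", and only authority, act type and date are handled (the full grammar has more elements). The resolver table and its example.org URL are hypothetical; a real resolution service would consult a registry.

```python
from datetime import date


def build_urn_nir(authorities, act_type, issued):
    """Compose urn:nir:<authority[+authority...]>:<act type>:<ISO date>."""
    return "urn:nir:{}:{}:{}".format(
        "+".join(authorities), act_type, issued.isoformat()
    )


# Hypothetical lookup table standing in for the resolution service.
RESOLVER_TABLE = {
    "urn:nir:ministero.finanze:decreto:1999-12-20":
        "http://example.org/acts/ministero-finanze-1999-12-20.html",
}


def resolve(urn):
    """Return the current URL registered for a URN, or None if unknown."""
    return RESOLVER_TABLE.get(urn)


urn = build_urn_nir(["ministero.finanze"], "decreto", date(1999, 12, 20))
print(urn)
print(resolve(urn))
```

Because citations store only the URN, moving a document means updating one entry in the resolver table rather than every reference to it.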
14
DEVELOPMENT OF A STANDARD FOR LEGAL LITERATURE
  • Many identifiers available (ISBN, ISSN, BICI,
    SICI, DOI)
  • But
    - an intelligent, unique, self-explanatory
      standard for bibliographic materials has not
      been developed so far
  • Approach
    - Agreement on a common document identification
      technique and a standard data scheme
    - A cooperative layer for the availability of
      common services and shared information

15
GRAMMAR OF THE STANDARD FOR IDENTIFYING LEGAL
LITERATURE
The grammar of URN-Legal Doctrine reflects the
features of legal doctrine material.
Example: Sorge, C. and Bergfelder, M. (2004),
"Signatures by Electronic Agents: A Legal
Perspective", in Cevenini, C. (editor), The Law
and Electronic Agents: Proceedings of the LEA
workshop, Bologna, June 2004, pp. 141-153
urnlldsorgebergfeldersignatures.by.electronic.agents.a.legal.perspective200406the.law.and.electronic.agents.proceeding.of141@inproceedings
The URN can be created automatically from the
bibliographic citation and be easily adopted by
all providers of legal doctrine.
16
SERVICES OF OPENURL RESOLVER
  • Search authors in other databases
  • Check how many citations
  • Look for the author's email

Which library owns the printed review?
Appropriate full text
If the full text is not available, document
delivery service?
17
WHAT IS NEEDED
  • From the users' point of view there is a need
    for:
    - Knowledge-based tools for supporting semantic
      searching
    - Persistent identifiers for reliable access to
      digital objects in a context-sensitive way
    - Navigation across legislation, case law and
      literature materials

18
FEDERATED DATA
  • Data of the prototype can be divided into two
    classes
  • Structured data
  • Web documents

19
Metadata Approach
  • Different data sources are described using a
    common metadata scheme (DC metadata, XML encoded,
    as target bridging format)
  • Integration and uniform view on different data
    sources
  • Implementation of search facilities based on
    those standards
  • Metadata for structured resources
    - Proprietary metadata mapping towards
      - Dublin Core simple
      - DCMI Cite (journal articles)
  • Metadata for Web documents
    - specific module generating a subset of DC
      metadata, without requiring data providers to
      adopt a predefined format
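
The mapping of proprietary metadata towards simple Dublin Core can be sketched as a crosswalk table in Python. The provider field names (titolo, autore, materia, anno) and the wrapping record element are hypothetical; the Dublin Core namespace URI is the standard one for the DC 1.1 element set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical crosswalk from one provider's field names to simple Dublin Core.
CROSSWALK = {
    "titolo": "title",
    "autore": "creator",
    "materia": "subject",
    "anno": "date",
}


def to_dublin_core(record):
    """Map a proprietary record (a dict) to an XML element with dc:* children."""
    root = ET.Element("record")
    for field, value in record.items():
        element = CROSSWALK.get(field)
        if element is None:
            continue  # fields without a Dublin Core counterpart are dropped
        values = value if isinstance(value, list) else [value]
        for item in values:
            ET.SubElement(root, "{%s}%s" % (DC_NS, element)).text = str(item)
    return root


sample = {"titolo": "Il dolo", "autore": ["Rossi", "Bianchi"], "collocazione": "A12"}
print(ET.tostring(to_dublin_core(sample), encoding="unicode"))
```

Each data provider only needs its own crosswalk table; the rest of the system sees a single target format.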

20
The architecture of the federation system
[Figure: architecture of the federation system. Data Providers: Structured Data Repositories (Publisher, Library, DOGI catalogs) and WWW Documents. Service Provider: a harvester selecting metadata records from structured data providers (OAI-PMH), with DC mapping; a focused crawler selecting documents of interest (HTML pages) from web sites, with an automatic metadata generator; the resulting metadata (DC-Qualified, DC-XML) go to the Indexer, which builds the Index used by the Portal.]
21
Structured Resource Management: the Open
Archives Initiative (OAI)
  • Interoperability among structured resources
    through Metadata Exchange
  • OAI Protocol for Metadata Harvesting (OAI-PMH)
  • Data Providers are repositories that expose
    structured metadata through OAI-PMH
  • Service Provider uses OAI-PMH to harvest
    metadata
  • OAI-PMH is a set of six verbs that are invoked
    within HTTP.

23
OAI-PMH verbs
  • Identify:
    http://localhost:8080/OaiDataProvider/servlet?verb=Identify
  • ListMetadataFormats:
    http://localhost:8080/OaiDataProvider/servlet?verb=ListMetadataFormats&metadataPrefix=oai_dc
  • ListSets:
    http://localhost:8080/OaiDataProvider/servlet?verb=ListSets
  • ListIdentifiers:
    http://localhost:8080/OaiDataProvider/servlet?verb=ListIdentifiers&metadataPrefix=oai_dc
  • ListRecords:
    http://localhost:8080/OaiDataProvider/servlet?verb=ListRecords&metadataPrefix=oai_dc
  • GetRecord:
    http://localhost:8080/OaiDataProvider/servlet?verb=GetRecord&identifier=oai:dogi.ittig.cnr.it/1999Z0529&metadataPrefix=oai_dc
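
A small Python helper shows how such requests are composed: a base URL, the verb, and the verb's arguments as an HTTP query string. The function name oai_request is hypothetical, and the base URL is the local test address from the slide, assumed to be http://localhost:8080/OaiDataProvider/servlet.

```python
from urllib.parse import urlencode

# Local test endpoint assumed from the slide; replace with a real data provider.
BASE_URL = "http://localhost:8080/OaiDataProvider/servlet"

# The six verbs defined by OAI-PMH.
VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request(verb, **arguments):
    """Build the request URL for one OAI-PMH verb with its arguments."""
    if verb not in VERBS:
        raise ValueError("unknown OAI-PMH verb: %s" % verb)
    return BASE_URL + "?" + urlencode({"verb": verb, **arguments})


print(oai_request("Identify"))
print(oai_request("ListRecords", metadataPrefix="oai_dc"))
```

urlencode percent-encodes reserved characters such as ":" and "/" in argument values, which matters for record identifiers passed to GetRecord.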

24
Selection and Harvesting of Structured Resources
Metadata
  • Selection
    - Bibliographic repositories in libraries
    - UDC (Universal Decimal Classification)
    - DDC (Dewey Decimal Classification)
    - 34X notations plus a subset of additional
      classes
  • Harvesting
    - Structured repositories are adapted as OAI Data
      Providers
    - OAI Protocol for Metadata Harvesting (OAI-PMH)

25
The architecture of the federation system
[Figure: architecture diagram, identical to the one on slide 20.]
26
A Focused Crawler to Select Legal Web Documents
  • The exploration of the web started from a subset
    of sites of interest, selected by a group of
    legal experts
  • We then adopted a policy of following the
    hyperlinks with a high probability of leading to
    documents of interest
  • This probability is estimated by an automatic
    classifier applied to a set of words in the
    vicinity of the hyperlink.
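
The link-following policy can be sketched as a scored frontier in Python. This is only an illustration: the word list LEGAL_WORDS is a crude stand-in for the trained classifier, and the threshold value and function names are assumptions.

```python
import heapq

# Stand-in for the trained classifier: a hypothetical list of indicative words.
LEGAL_WORDS = {"law", "court", "decree", "legal", "doctrine", "legislation"}


def link_score(context_words):
    """Estimate how likely a hyperlink leads to a document of interest.

    The score is the fraction of words near the link that are indicative.
    """
    words = [w.lower() for w in context_words]
    if not words:
        return 0.0
    return sum(w in LEGAL_WORDS for w in words) / len(words)


def rank_links(links, threshold=0.2):
    """links: iterable of (url, words near the hyperlink). Best links first."""
    frontier = []
    for url, context in links:
        score = link_score(context)
        if score >= threshold:
            # heapq is a min-heap, so the score is negated for best-first order
            heapq.heappush(frontier, (-score, url))
    return [heapq.heappop(frontier)[1] for _ in range(len(frontier))]


links = [
    ("http://example.org/a", ["constitutional", "court", "decree"]),
    ("http://example.org/b", ["buy", "cheap", "tickets"]),
    ("http://example.org/c", ["legal", "news", "and", "sport"]),
]
print(rank_links(links))
```

Links scoring below the threshold are never queued, which keeps the crawl focused on the neighbourhood of the seed sites.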

27
Metadata for Web Documents
  • Metadata for web documents are a key issue in
    improving search facilities
  • Different approaches have been proposed in the
    literature:
    - Reliable mapping of metadata originally
      included in web documents, combined with the
      collaboration of the authors
    - Automatic metadata generation on the basis of
      keywords supplied by the authors

28
Automatic DC Metadata Generation for Web
Documents
  • Our approach
    - automatic generation of a DC metadata subset
    - it supports the intellectual activity of
      organizing web documents
  • Main criteria used for DC metadata generation
    - properties of the documents (URL as
      dc:identifier)
    - mapping between particular tags (<title> HTML
      tag for dc:title)
    - HTML meta tags (e.g. meta description for
      dc:description)
  • Particular attention to document classification
    (dc:subject). We have tested two machine learning
    approaches for document classification:
    - Naive Bayes
    - Multiclass Support Vector Machine (Vapnik,
      Crammer and Singer)
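
The three generation criteria (URL, title tag, meta description) can be sketched with Python's standard html.parser module. The class and function names are hypothetical, and dc:subject is left out because it comes from the classifier rather than from the markup.

```python
from html.parser import HTMLParser


class DCExtractor(HTMLParser):
    """Collect a small Dublin Core subset from an HTML page."""

    def __init__(self, url):
        super().__init__()
        self.metadata = {"dc:identifier": url}  # document property: its URL
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.metadata["dc:description"] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            # title text may arrive in several chunks, so append
            self.metadata["dc:title"] = self.metadata.get("dc:title", "") + data


def generate_dc(url, html):
    """Return the generated DC subset for one web document."""
    parser = DCExtractor(url)
    parser.feed(html)
    return parser.metadata


page = ('<html><head><title>Diritto penale</title>'
        '<meta name="description" content="Notes on criminal law">'
        '</head><body><p>...</p></body></html>')
print(generate_dc("http://example.org/doc.html", page))
```

Pages without a title or a meta description simply yield a smaller subset, so data providers are not required to follow any predefined format.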

29
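Naive Bayes Classifier
[Figure: a web document dj is passed to the Naive Bayes classifier, which computes P(c0 | dj), P(c1 | dj), ..., P(cn-1 | dj) for the classes C0, C1, ..., Cn-1 and produces a class ranking.]
dj = (w0j, ..., wkj, ...)
Bag of words: wkj is a function of the frequency
of the k-th word in document dj.
Naive hypothesis: words in a document occur
independently of each other given the class.
Bayes' theorem: P(ci | dj) = P(dj | ci) P(ci) / P(dj)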
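
Under the naive hypothesis the class-conditional likelihood factorizes over words, which gives the ranking rule below. This is a sketch assuming the multinomial variant, in which wkj is the raw count of the k-th vocabulary word in dj; the symbol t_k for that word and the hat notation are introduced here and do not appear on the slide.

```latex
\begin{align}
P(d_j \mid c_i) &\propto \prod_{k} P(t_k \mid c_i)^{\,w_{kj}} \\
\hat{c}(d_j) &= \arg\max_{i}\; P(c_i) \prod_{k} P(t_k \mid c_i)^{\,w_{kj}}
\end{align}
```

The denominator P(dj) of Bayes' theorem is the same for every class, so it can be dropped when the classes are only ranked.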
30
Support Vector Machine (SVM)
  • Given a training set of positive (S+) and
    negative (S-) examples, SVM determines the
    surface si which divides S+ from S- with the
    widest margin
  • Linear case

[Figure: linear case, with the surface si separating the positive examples S+ from the negative examples S-.]
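
For the linear, separable case the widest-margin surface is the solution of the standard hard-margin problem below. The notation is an assumption, not taken from the slides: w and b define the hyperplane, and y_j is +1 for examples in S+ and -1 for examples in S-.

```latex
\begin{align}
\min_{\mathbf{w},\,b}\quad & \tfrac{1}{2}\,\lVert \mathbf{w} \rVert^{2} \\
\text{subject to}\quad & y_j\,(\mathbf{w}\cdot\mathbf{d}_j + b) \ge 1
  \quad \text{for every training example } \mathbf{d}_j
\end{align}
```

The distance between the two classes along the normal to the hyperplane is 2/||w||, so minimizing ||w|| maximizes the margin.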
31
Multiclass Support Vector Machine (MSVM)
32
Document Representation
  • A document is represented by a vector of term
    weights dj = (w1, ..., wT); three different
    types of weights have been tested:
    - Binary weights (presence/absence)
    - Term frequency weight (tf)
    - TF-IDF weight (tfidf, penalizes terms occurring
      in many different documents, as they are less
      discriminative)
  • Pre-processing to improve the statistical quality
    of terms:
    - Stemming (reduction of terms to their
      morphological root)
    - Stopword removal (deletion of very frequent
      terms)
    - Digits and non-alphanumeric characters
      represented by a unique special character
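
The three weighting schemes can be compared with a short Python sketch. The exact formulas used in the experiments are not given in the slides; the tf-idf variant here (raw count times the natural log of N over document frequency) is one common choice and is an assumption, as is the function name.

```python
import math
from collections import Counter


def term_weights(docs, scheme="tfidf"):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # document frequency: in how many documents each term occurs
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        if scheme == "binary":
            vector = {term: 1 for term in tf}
        elif scheme == "tf":
            vector = dict(tf)
        elif scheme == "tfidf":
            vector = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        else:
            raise ValueError("unknown scheme: %s" % scheme)
        vectors.append(vector)
    return vectors


docs = [["legge", "penale", "legge"], ["legge", "civile"]]
for scheme in ("binary", "tf", "tfidf"):
    print(scheme, term_weights(docs, scheme))
```

A term that occurs in every document, such as "legge" here, gets a tf-idf weight of zero because it does not help to tell the documents apart.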

33
The Experiments
  • Data set of 2478 documents from Web sites of
    interest, belonging to 11 classes (categories of
    law):
    - c0 Environmental law
    - c1 Administrative law
    - c2 Constitutional law
    - c3 Ecclesiastic law
    - c4 European law
    - c5 Computer law
    - c6 International law
    - c7 Labour law
    - c8 Criminal law
    - c9 Private law
    - c10 Taxation law

34
Train and Test of the Classifier
  • Training
    - All the 2478 examples have been used to train
      the classifier
  • Test
    - Two experiments to calculate
      - Train accuracy
      - LOO (Leave-One-Out) accuracy (test of the
        classifier's generalization capability)
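
The leave-one-out procedure can be sketched generically in Python: each example is held out once, the classifier is trained on the rest, and the held-out example is used as the test. The nearest-centroid classifier and the one-dimensional toy data are stand-ins chosen to keep the example short; they are not the MSVM or the data set of the experiments.

```python
def leave_one_out_accuracy(examples, train, predict):
    """examples: list of (features, label). Returns the fraction classified
    correctly when each example is tested on a model trained without it."""
    correct = 0
    for i, (features, label) in enumerate(examples):
        model = train(examples[:i] + examples[i + 1:])
        correct += predict(model, features) == label
    return correct / len(examples)


def train_centroids(examples):
    """Stand-in classifier: the mean feature value of each class."""
    sums = {}
    for x, y in examples:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict_centroid(model, x):
    """Assign x to the class with the nearest centroid."""
    return min(model, key=lambda y: abs(model[y] - x))


data = [(1.0, "a"), (1.2, "a"), (0.9, "a"), (3.0, "b"), (3.1, "b"), (1.1, "b")]
print(leave_one_out_accuracy(data, train_centroids, predict_centroid))
```

Because the tested example never takes part in training, the resulting figure estimates how the classifier behaves on unseen documents, unlike the train accuracy.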

35
Train Accuracy
36
Test of the MSVM classifier using the LOO approach
LOO accuracy: 74.7%
37
The architecture of the federation system
[Figure: architecture diagram, identical to the one on slide 20.]
38
User Access Modalities
  • Advanced search: Metadata-Based Document Querying
    (MBDQ)
  • Simple search: Keyword-Based (KBDQ) and
    Category-Based (CBDQ) Document Querying
  • Both query modalities can be implemented using a
    Legal Vocabulary
  • Semantics and Legal Vocabulary are exploited to
    enhance precision and recall in retrieval

39
Conclusions and Future Developments
  • The Portal to Italian Legal Literature is the
    result of a federative architecture for
    integrating structured repositories and web
    documents in a single point of access
  • A Dublin Core metadata approach has been used to
    give a uniform view of different data
  • The federation system combines
    - the harvesting of structured data using OAI-PMH
    - the gathering and automatic qualification of
      web documents using a machine learning approach
  • CLIR facilities are expected to be studied, for
    developing user interface facilities based on
    multilingual ontological support.

40
Announcement
  • On the basis of the experiences of legislative
    standard projects (NIR, Metalex, Akoma Ntoso,
    LexDania, eLaw)
  • Workshop on Legislative XML
    - Within the Jurix 2007 Conference
      (http://www.jurix2007.org)
    - December 15th, 2007, Leiden University, The
      Netherlands
    - Deadline for submission of position papers:
      23rd November 2007
  • Topics
    - Unique identification of (parts of) sources of
      law, URI and URN
    - Lowest grain size of identifiable elements in
      sources of law
    - What set of metadata should be part of an
      interchange format
    - Ontologies for legislation

41
Thank you for your attention !
peruginelli@ittig.cnr.it francesconi@ittig.cnr.it