Title: Integration of complementary archaeological sources
1 Integration of complementary archaeological sources
- Martin Doerr
- Maria Theodoridou
- ICS-FORTH, Heraklion, Crete, Greece
- Kurt Schaller
- Magistrat der Stadt Wien, Geschäftsgruppe Kultur und Wissenschaft, Stadtarchäologie, Wien, Austria
2 Outline
- Problem statement, working context
- Objective
- Approach
- Technical description
- Results
- Conclusion, future work
3 Project VBI ERAT LVPA: The Internet Tracks of the Roman She-Wolf
- Traditional corpora
  - very high quality, difficult to maintain, difficult to search, uncorrelated to complementary resources
- New Database Projects
  - varying quality, overlapping contents, continuously updated, easy to search, uncorrelated between each other
- Altogether
  - A conglomerate of highly interrelated archaeological sources
  - of overwhelming detail and volume
- Ubi-erat-lupa: A European Culture 2000 Project
  - An aggregation of complementary scientific databases and corpora describing finds with inscriptions and iconography of the Roman era
  - to create a body of unique archaeological knowledge in digital form.
4 VBI ERAT LVPA: Objective
- creation of a global index about a set of semi-autonomous sources for global access to the unified knowledge
- integration of complementary information under a common ontology/schema and identification of common elements in different sources
- development of an integration algorithm that converges to the best state of knowledge under continuous update
- creation of a research tool for formulating queries of archaeological content to detect contextual relationships that cannot be derived from interpreting the sources in isolation
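The last objective, detecting relationships that no single source shows, can be illustrated with a small sketch. It assumes that statements from all sources are held as (subject, property, object) tuples and that identifiers carry their source as a prefix. The sample identifiers, the corpus entry names and the helper `cross_source_links` are invented for illustration and are not taken from the project.

```python
from collections import defaultdict
from itertools import combinations

# Invented sample statements; "lupa.1" etc. are hypothetical identifiers.
TRIPLES = [
    ("lupa.1", "P70B.is documented in", "corpus-entry-A"),
    ("arachne.2", "P70B.is documented in", "corpus-entry-A"),
    ("lupa.3", "P70B.is documented in", "corpus-entry-B"),
    ("lupa.4", "P70B.is documented in", "corpus-entry-B"),
]


def source_of(identifier):
    """Return the source prefix of an identifier such as 'lupa.1'."""
    return identifier.split(".", 1)[0]


def cross_source_links(triples, prop="P70B.is documented in"):
    """Find pairs of items from different sources that share one object of `prop`."""
    by_object = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == prop:
            by_object[obj].add(subject)

    links = []
    for obj, subjects in sorted(by_object.items()):
        for a, b in combinations(sorted(subjects), 2):
            # Only pairs that span two sources are new knowledge.
            if source_of(a) != source_of(b):
                links.append((a, b, obj))
    return links


if __name__ == "__main__":
    for a, b, ref in cross_source_links(TRIPLES):
        print(f"{a} and {b} are both documented in {ref}")
```

The pair found under corpus-entry-B is dropped because both items come from the same source, so the link is already visible without integration.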
5 Approach
- Develop a semantic network based on the CIDOC CRM model to integrate the complementary archaeological sources
- Data relevant to global querying over all contents are extracted, transformed and stored in an RDF repository that is incrementally updated over time.
- Integration in two phases
  - source schema is intellectually interpreted in terms of the CIDOC model
  - non-canonical data reported to respective source
  - mistakes in sources removed, quality of source improved
  - actual data automatically transformed and stored into an RDF repository
  - an a posteriori data cleaning process removes as many duplicates as can be (semi-)automatically detected
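The transformation step described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the field names, the `FIELD_TO_PROPERTY` table and both functions are hypothetical, and a plain Python set stands in for the RDF repository. Only the property labels are taken from the mapping slides.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical result of the manual schema interpretation:
# source field name -> CIDOC CRM property label.
FIELD_TO_PROPERTY: Dict[str, str] = {
    "title": "P102F.has title",
    "type": "P2F.has type",
    "location": "P55F.has current location",
}


def record_to_triples(source: str, record: Dict[str, str]) -> Tuple[List[Triple], List[str]]:
    """Turn one source record into triples and list the fields to report back."""
    subject = f"{source}.{record['id']}"
    triples: List[Triple] = []
    report: List[str] = []
    for field, value in record.items():
        if field == "id":
            continue
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None or not value.strip():
            # Unmapped or empty fields go back to the source maintainers.
            report.append(field)
            continue
        triples.append((subject, prop, value.strip()))
    return triples, report


def update_repository(repository: Set[Triple], triples: List[Triple]) -> int:
    """Add only statements not yet stored; return how many were new."""
    new = [t for t in triples if t not in repository]
    repository.update(new)
    return len(new)
```

Because `update_repository` skips statements that are already stored, re-running the transformation after a source has changed adds only the new statements, which is one simple way to read "incrementally updated".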
6 The CIDOC CRM: Top-level Entities relevant for Integration
[Diagram. Entities shown: E55 Types, E39 Actor, E41 Appellations, E31 Document, E5 Event. Link labels: "refer to / identify", "affect or / refer to"]
7 The CIDOC CRM: VBI-ERAT-LVPA Repository Indexing
8 Complementary archaeological sources
- Stone data bases
  - Lupa: 7,000 archaeological records, City of Vienna, Austria
  - Arachne: 40,000 archaeological records, Antike Plastik, Cologne
- Name data bases
  - ONOMASTICON PROVINCIARVM EVROPAE LATINARVM (OPEL)
  - Information about the amount and distribution of Roman names in the European provinces of the empire, City of Vienna, Austria
- Epigraphic corpora
  - CIL: Corpus Inscriptionum Latinarum
  - AE: L'Année Epigraphique
  - Inscriptions Clauss/Slaby, University of Frankfurt
- Thesauri / Dictionaries
  - TGN: Getty Thesaurus of Geographic Names
  - Alexandria DL Gazetteer: 5,000,000 current place names (web service)
  - Barrington Atlas of the Greek and Roman World: Map-by-Map Directory provides information about every place or feature in the Atlas
9 Mapping stone data bases to CIDOC-CRM
[Diagram: mapping graph. Node label: POLUPA.5. Property labels: P102F.has title, P1F.is identified by, P2F.has type, P106B.forms part of, P70B.is documented in, P12B.was present at, P7F.took place at, P55F.has current location, P89F.falls within]
10 Mapping stone data bases to CIDOC-CRM
[Diagram: mapping graph. Node label: POLUPA.5. Property labels: P65F.shows visual item, P150F.shows characters, P151F.has transcription, P152F.has clear text, P1F.is identified by, P70B.is documented in, P106B.forms part of]
11 Mapping epigraphic corpora to CIDOC-CRM
[Diagram: mapping graph. Property labels: P150F.shows characters, P151F.has transcription, P1F.is identified by, P70B.is documented in, P106B.forms part of]
12 Mapping OPEL to CIDOC-CRM
[Diagram: mapping graph. Property labels: P67B.is referred to by, P70B.is documented in, P65B.is shown by, P139F.has alternative form, P2F.has type, P106B.forms part of, P12B.was present at, P7F.took place at]
13 Integration Into One Resource
[Diagram. Node label: POLUPA.5. Source groups shown: Stone data bases, Name data bases, Epigraphic corpora, Thesauri/Dictionaries]
14 Identity Problem
- Two approaches
  - a) avoid taking two different items for the same one: use a local id, where uniqueness is guaranteed
  - b) try to find global names with a high chance to match.
- Lupa solution is a)
  - We give a serial number to any new object we insert
  - We use the serial number of the source database.
  - Example P.O arachne.45305
  - or P.O lupa.4501
- We maintain local ids in the global index as valid names and remove detected duplicates continuously.
- Cost-benefit optimization of over- and under-identification!
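Approach a) can be sketched as an index that keeps every local id valid and records merges when duplicates are detected. The class `GlobalIndex`, its method names and the `index` namespace for newly minted ids are hypothetical. The identifier format is simplified to `source.serial`, loosely following the examples above.

```python
class GlobalIndex:
    """Keeps local ids as valid names and records detected duplicates."""

    def __init__(self):
        self._next_serial = 1
        # Maps each known id to the id it was merged into (itself if unmerged).
        self._merged_into = {}

    def register_source_id(self, source, serial):
        """Reuse the serial number of the source database as the local id."""
        local_id = f"{source}.{serial}"
        self._merged_into.setdefault(local_id, local_id)
        return local_id

    def mint(self, namespace="index"):
        """Give a fresh serial number to an object that has no source id."""
        local_id = f"{namespace}.{self._next_serial}"
        self._next_serial += 1
        self._merged_into[local_id] = local_id
        return local_id

    def resolve(self, local_id):
        """Follow recorded merges to the id currently standing for the item."""
        while self._merged_into[local_id] != local_id:
            local_id = self._merged_into[local_id]
        return local_id

    def merge(self, duplicate, keep):
        """Record that `duplicate` names the same item as `keep`."""
        self._merged_into[self.resolve(duplicate)] = self.resolve(keep)


if __name__ == "__main__":
    index = GlobalIndex()
    a = index.register_source_id("arachne", 45305)
    b = index.register_source_id("lupa", 4501)
    index.merge(a, b)  # illustrative only: no claim that these records match
    print(index.resolve(a))
```

Merged ids are never deleted, so references that still use the old local id keep resolving. Under-identification (missed duplicates) can be repaired later at low cost, whereas a wrong merge would have to be undone, which is the cost-benefit trade-off the slide points to.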
15 Reactive Data Cleaning: Initial Data
[Diagram: graph before cleaning. Node label: POARACHNE.80581. Link labels: has title (twice), has type, is identified by (twice), shows visual item (twice)]
16 Reactive Data Cleaning: Result
[Diagram: graph after cleaning. Link labels: has title (twice), has type, is identified by (twice), shows visual item (once)]
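The effect shown on the two data cleaning slides, where a duplicated "shows visual item" link collapses into one, can be reproduced with a small sketch. It assumes the repository is a set of (subject, property, object) tuples and that detected duplicates are given as a mapping from the dropped id to the kept id. The node ids and values below are invented, and `clean` is a hypothetical helper.

```python
def clean(triples, same_as):
    """Rewrite duplicate nodes to the kept node and drop repeated statements."""

    def canonical(node):
        seen = set()
        # Follow merge chains; `seen` guards against accidental cycles.
        while node in same_as and node not in seen:
            seen.add(node)
            node = same_as[node]
        return node

    # A set keeps each rewritten statement only once.
    return {(canonical(s), p, canonical(o)) for s, p, o in triples}


if __name__ == "__main__":
    initial = {
        ("src_a.1", "has title", "Title from source A"),
        ("src_a.1", "shows visual item", "image.9"),
        ("src_b.7", "has title", "Title from source B"),
        ("src_b.7", "has type", "relief"),
        ("src_b.7", "shows visual item", "image.9"),
    }
    result = clean(initial, {"src_a.1": "src_b.7"})
    for triple in sorted(result):
        print(triple)
```

Both titles survive on the merged node because they are different statements. Only the identical "shows visual item" link is stored once.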
17 VBI ERAT LVPA: Results
- A method and architecture for integration of diverse archaeological corpora on the Roman stone monuments under the CIDOC CRM model.
- We developed an efficient way for place name recognition
- We are developing a research tool suitable for formulating queries and drawing conclusions on archaeological data
  - detection of contextual relationships that cannot be derived from interpreting the sources in isolation
- a method of identifying epigraphic references and finds
- test bed for the CIDOC CRM model: proved its adequacy
- First large-scale integration project of multiple complementary resources as a global index to the original sources
18 Future work
- integrate more data sources
- support a mechanism to visualize a source
- support an automatic mapping process so that archaeologists will be able to maintain the system by themselves.