Title: Mapping Museum Metadata to the CRM in Sculpteur
1Mapping Museum Metadata to the CRM in Sculpteur
- Presented By Patrick Sinclair
- IAM, University of Southampton, UK
- pass_at_ecs.soton.ac.uk
2Contents
- Sculpteur Overview
- The Semantic Layer
- Based on the CRM ontology
- Our experiences with the CRM within Sculpteur
- Navigating the semantic layer
3SCULPTEUR
3 million euros, 3-year project (May 2002 to May 2005)
Uffizi Gallery
4Problem Domain
5Existing Systems
6Overview of Sculpteur
- Integrated concept, metadata and content based browsing and retrieval of museum information
- Use of semantic web technologies will enable
- Enhanced navigation and exploration
- Automatic data augmentation
- Flexible systems integration and interoperability
- Content based retrieval
- Image content analysis
- 3D object analysis
- E-learning
7Searching by content
- Find objects that have a pattern / colour similar to this image
- Find vases that are a similar shape to this one
- Find paintings with cracks like this
- What type of mould was used to make this figurine?
8The Semantic Layer
- Motivation
- The mapping process
- The concept browser
9Museums are rich in information
- Museums already have information about
- Works of art (title, medium, state of restoration)
- Creators of works of art (name, date of birth)
- Dates (artist date of birth, date of photograph)
- Locations (where a work of art is stored, artist's country of origin)
- Digital representations (images, models, angle of lighting, full or sub image)
- Museums are creating large numbers of 2D images and 3D models and movies
10Questions using concepts
- Which artists have lived, worked or were born in Normandy?
- Which countries do I have 18th century paintings from?
- What is the oldest Polynesian wooden object?
- Find images of the reverse of paintings which have undergone restoration
11Navigating the Semantic Layer
12Data Augmentation
- Information about museum collections is sometimes incomplete, e.g.
- Dates when paintings were created
- Places where artists were born
- Augmentation agents will automatically obtain missing information from the semantic web
- Not much semantic web around!
- Initial agent uses natural language processing on unstructured web pages
13Benefits
- New ways to search
- Scientists, Public, Curators, Teachers
- Graphical presentation
- Searching by example
- Easy to use
- Sharing of information between museums
- Searching collections owned by different organisations
- Support for multiple languages
- Users can still use traditional and familiar approaches
14First Prototype
15CIDOC Conceptual Reference Model (CRM)
- Ontology for documenting artefacts within the museum and gallery domain
- 81 classes and 139 properties
- 10 years of development
- Draft ISO standard
16CIDOC CRM in Sculpteur
- Comprehensive description of art domain
- Model based on formal principles of ontology
- Established and constantly being revised
- Enhances communication performance with partners
- Multilingual functionality possible
17Ontology Mapping
CIDOC Conceptual Reference Model
Knowledge Base
Instances
Actors
Events
Objects
Background knowledge / Authorities
Derived knowledge
Sources and metadata
Source: ICS-FORTH
18Mapping Subset
- Museums identify areas of interest
- Subset of CRM ontology provided to each museum
partner depending on the concepts present in the
legacy metadata
19Mapping in Sculpteur
- Held an ontology workshop for museum partners
- Determine ontological scopes
- Select domains of interest
- Map metadata to CRM
- Museums had trouble performing mapping
- Technical partners attempted to create initial mapping
- Contacted a CRM expert, Patrick Le Boeuf
20Mapping Example
Table                      Field  Type
Local art object metadata  Width  Integer
DB Schema
Master Table  Primary key    Connection key to secondary table
Art object    Art object id  Local object id
CRM
Domain                   Link                Range
E84.Information Carrier  P43F.has dimension  E54.Dimension
E54.Dimension            P2F.has type        E55.Type
E54.Dimension            P91F.has unit       E58.Measurement Unit
E54.Dimension            P90F.has value      E60.Number
- Tool generates subset of CRM ontology from
mapping table
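The subset-generation step can be sketched roughly as follows. This is a minimal illustration, not the project's tool: it assumes the mapping table is available as (domain, link, range) rows, and the names `WIDTH_MAPPING` and `ontology_subset` are hypothetical.

```python
# Hypothetical in-memory form of the mapping table: one row per
# (domain, link, range) step, copied from the Width example on this slide.
WIDTH_MAPPING = [
    ("E84.Information Carrier", "P43F.has dimension", "E54.Dimension"),
    ("E54.Dimension", "P2F.has type", "E55.Type"),
    ("E54.Dimension", "P91F.has unit", "E58.Measurement Unit"),
    ("E54.Dimension", "P90F.has value", "E60.Number"),
]


def ontology_subset(mapping_rows):
    """Collect the CRM classes and properties that a mapping table uses."""
    classes = set()
    properties = set()
    for domain, link, range_ in mapping_rows:
        classes.add(domain)
        classes.add(range_)
        properties.add(link)
    return sorted(classes), sorted(properties)


classes, properties = ontology_subset(WIDTH_MAPPING)
```

Here a single metadata field already pulls in five classes and four properties, which gives a feel for how quickly the subset grows as more fields are mapped.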
21Extending the CRM
- Different degree of specificity between CRM and museum datasets
- Extended E55.Type to model activities not covered by CRM
- Added concepts and relationships
- E.g. painting, has_related_artists
22Extending the CRM (cont.)
- Handling mapping of multiple metadata fields to one CRM concept
- Modifications performed in Protégé
- Ontology represented in RDF Schema
Metadata                 Domain                   Link                     Range
MarksInscriptionsAuthor  E84.Information_Carrier  P65F.shows_visual_item   E37.Mark
                         E37.Mark                 P105F.right_held_by      E39.Actor
                         E39.Actor                P131F.is_identified_by   E82.Actor_Appellation
AssociatedName           E84.Information_Carrier  P62F.depicts             E7.Activity
                         E7.Activity              P11F.had_participant     E39.Actor
                         E39.Actor                P131F.is_identified_by   E82.Actor_Appellation
23Populating Concept Browser
- Tools for populating concept browser with instance information
- Collect instances from museum databases
- Structure information according to ontology
- Script is used to generate instances from museum database
- Instances stored as RDF
- Generated at installation
- Used within Concept Browser
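An instance-generation script of this kind might look like the sketch below. It is an assumption-laden illustration rather than the Sculpteur script: the namespaces `CRM` and `DATA`, the table and column names, and the fixed unit "cm" are all invented for the example, and `sqlite3` stands in for the project's MySQL database. It reuses the Width chain from the Mapping Example slide and writes N-Triples lines.

```python
import sqlite3

# Assumed namespaces; the real Sculpteur URIs are not given in the slides.
CRM = "http://example.org/crm#"
DATA = "http://example.org/museum/"


def width_triples(object_id, width, unit):
    """Expand one Width value into N-Triples lines along the CRM chain."""
    obj = f"<{DATA}object/{object_id}>"
    dim = f"<{DATA}object/{object_id}/width>"
    return [
        f"{obj} <{CRM}P43F.has_dimension> {dim} .",
        f'{dim} <{CRM}P90F.has_value> "{width}" .',
        f"{dim} <{CRM}P91F.has_unit> <{DATA}unit/{unit}> .",
    ]


# In-memory stand-in for a museum database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE art_object (art_object_id INTEGER, width INTEGER)")
conn.executemany("INSERT INTO art_object VALUES (?, ?)", [(1, 120), (2, None)])

lines = []
query = "SELECT art_object_id, width FROM art_object ORDER BY art_object_id"
for object_id, width in conn.execute(query):
    if width is None:  # empty legacy fields produce no instances
        continue
    lines.extend(width_triples(object_id, width, "cm"))
conn.close()
```

Only the first row yields triples here, because the second has no width recorded.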
24Mapping in Sculpteur
- Complex process!
- Close collaboration between museums, technical partners and external experts.
- Expert assistance crucial for creating and validating mappings.
- Iterative and involved
- Collaboration has been time and effort consuming, but essential for achieving an accurate and meaningful mapping
25Mapping problems
- Getting to grips with the CRM
- Understanding the legacy data
- Mapping the process of photographing objects
- Mapping is complex for museum partners: simple fields map to long chains
- E.g. School name
E84 Information Carrier  P108B was produced by                 E12 Production Event
E12 Production Event     P14 carried out by                    E21 Person
E21 Person               P107B is current or former member of  E74 Group
E74 Group                P2 has type                           E55 Type (value "school")
E74 Group                P131 is identified by                 E82 Actor Appellation
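One way to make long chains easier to review is a mechanical check that every step starts from a class already reached by an earlier step. The sketch below is a hypothetical helper, not part of the Sculpteur tooling; `SCHOOL_NAME_CHAIN` is the School name example written as (domain, link, range) rows.

```python
SCHOOL_NAME_CHAIN = [
    ("E84 Information Carrier", "P108B was produced by", "E12 Production Event"),
    ("E12 Production Event", "P14 carried out by", "E21 Person"),
    ("E21 Person", "P107B is current or former member of", "E74 Group"),
    ("E74 Group", "P2 has type", "E55 Type"),
    ("E74 Group", "P131 is identified by", "E82 Actor Appellation"),
]


def disconnected_steps(chain):
    """Return the steps whose domain was not reached by an earlier step."""
    if not chain:
        return []
    reached = {chain[0][0]}  # the first domain is the root of the chain
    problems = []
    for step in chain:
        domain, _link, range_ = step
        if domain not in reached:
            problems.append(step)
        reached.add(range_)
    return problems


# Dropping the second step leaves E21 Person unreachable from the root.
broken = SCHOOL_NAME_CHAIN[:1] + SCHOOL_NAME_CHAIN[2:]
gaps = disconnected_steps(broken)
```

The check only covers connectivity. Whether a link is semantically the right one still needs a CRM expert.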
26Mapping Problems (cont.)
- Complex to determine relevant mapping
- Many of the fields in the database are empty
- Several fields can contain similar information, but from different perspectives
- Inconsistencies in legacy information: missing entries, merged fields, misspelling, excessive creativity of database maintainer
27Unresolved issues
- Merged fields
- e.g. materials field can also describe creation event
- Some metadata fields contain complex/non-atomic values that express relationships between records
- Data not handled in CRM
- e.g. genealogical data in Uffizi database
28Generating Instances Problems
- Each museum has own metadata legacy system and format
- Imported into Sculpteur MySQL relational database
- Metadata delivered in various formats (XML, CSV, database dumps) and requires pre-processing
- Includes format transformation and cleaning/consistency checking
- Manual process that is then encapsulated within a software tool (customised for each partner) so process can be repeated against larger datasets
- Tools not ready to be deployed at user sites, so requires metadata and mappings in advance
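As an illustration of the consistency-checking step, the sketch below counts blank values per column in a CSV delivery, which is one quick way to see which fields are too empty to be worth mapping. The function name, column names and sample data are invented for the example. The actual per-partner tools are not described in the slides.

```python
import csv
import io
from collections import Counter


def empty_field_counts(csv_text):
    """Count, per column, how many rows have a blank or missing value."""
    counts = Counter()
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        for column, value in row.items():
            if column is None:  # surplus cells beyond the header row
                continue
            if value is None or not value.strip():
                counts[column] += 1
    return dict(counts)


# Hypothetical sample delivery with whitespace-only and empty cells.
SAMPLE = "id,title,materials\n1,Vase, \n2,,bronze\n3,Figurine,\n"
report = empty_field_counts(SAMPLE)
```

A report like this could be produced once by hand and then folded into the repeatable per-partner tool.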
29Concept Browser
- Show the ontology information graphically
- Support novice users browsing domain
- Extending TouchGraph
- Open source
- Provides a dynamic graph layout interface
- Jena toolkit for ontology and instance information
30Concept Browser (cont.)
- Concept Browser
- Show associated relations when a user selects a specific concept
- Collapse concepts in the sub/super hierarchies
- Explicitly display sub/super relations between relations
- Allow users to view instances for a given class
- Display control lists as a dynamic graph
31Navigation Facilities
- Provide navigation tools to help the user browse the ontology
- Buttons
- Who - E39.Actor
- What - E18.Physical Stuff
- When - E50.Date and E52.Time-span
- Where - E53.Place
- How - E7.Activity
- List of Concept Names
- Tree View
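The button shortcuts can be pictured as a lookup from a question word to one or more entry classes, optionally widened to their subclasses. The sketch below is only an illustration of that idea: `SUBCLASS_OF` is a small, simplified fragment of the hierarchy chosen for the example, and the function name is hypothetical.

```python
# Button labels and their CRM entry points, as listed on the slide.
BUTTONS = {
    "Who": ["E39.Actor"],
    "What": ["E18.Physical Stuff"],
    "When": ["E50.Date", "E52.Time-span"],
    "Where": ["E53.Place"],
    "How": ["E7.Activity"],
}

# Simplified, illustrative fragment of the class hierarchy (child -> parent).
SUBCLASS_OF = {
    "E21.Person": "E39.Actor",
    "E74.Group": "E39.Actor",
    "E40.Legal Body": "E74.Group",
}


def classes_for_button(label):
    """Return the button's entry classes plus all their known subclasses."""
    selected = set(BUTTONS[label])
    changed = True
    while changed:  # repeat until no new subclass is added
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in selected and child not in selected:
                selected.add(child)
                changed = True
    return sorted(selected)


who = classes_for_button("Who")
```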
32Property View
33Querying for Instances
34Controlled Lists
35Concept browser problems
- Extending TouchGraph
- e.g. properties require overlapping edges
- Usability
- First evaluation users had trouble
- RDF slow to load, too many instances!
- First prototype
- Only basic information stored
- Unconnected to images and other rich information
in legacy system
36Control List Problems
- Control lists contain inconsistent values
- E.g. top and toop
- Ideally chosen by museum: collaboration required to properly understand semantics of the data
- Inferring may result in new instances not in legacy database
- May require further inference
- e.g. use a geographical ontology for location
- Process difficult to automate: dependent on structure and semantics of metadata, and these semantics are not explicit in the data
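Inconsistent control-list values such as "top" and "toop" can at least be flagged automatically for a curator to review, even if the choice of the correct value stays with the museum. The sketch below uses Python's standard `difflib` for this. The function name and the 0.8 similarity cutoff are assumptions made for the example.

```python
import difflib


def near_duplicates(values, cutoff=0.8):
    """Pair each value with earlier, similar-looking values in the list."""
    pairs = []
    seen = []
    for value in values:
        # Compare only against values already seen, so each pair appears once.
        for match in difflib.get_close_matches(value, seen, n=3, cutoff=cutoff):
            pairs.append((match, value))
        seen.append(value)
    return pairs


suspects = near_duplicates(["top", "bottom", "toop", "left"])
```

String similarity only catches spelling variants. Values that differ in meaning but look alike, or look different but mean the same, still need the museum's judgement.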
37Concept Browser next steps
- Evaluation results
- Ontology simplification
- Ontology shortcuts
- Interested in suggestions!
38Demo Videos
39More Information
- Try Sculpteur yourself!
- http://piltdownman.it-innovation.soton.ac.uk/
- Sculpteur User Interest Group
- http://www.sculpteurweb.org/html/sig.htm