Title: COLLATE
1 COLLATE: Collaboratory for Annotation, Indexing and Retrieval of Digitized Historical Archive Material
DELOS International Cooperation Workshop, May 30, 2003
- Ingo Frommholz
- Fraunhofer IPSI, Darmstadt
- frommholz_at_ipsi.fraunhofer.de
- http://ipsi.fraunhofer.de/
2 Digital Libraries in Cultural Heritage
- Valuable historic document collections exist, but are scattered in national archives
  - Sources mostly not available online
  - Difficult-to-use database referencing systems
  - Lack of content-based indexing and access
- Valuable expert domain knowledge exists, but is mostly inaccessible to externals
  - Tacit knowledge, insufficiently documented
  - Professional communities lack technology support for collaborative knowledge working
3 The COLLATE Project (IST-1999-20882)
- Constructing a Collaborative Information Space
- Preserve historic documents in a distributed multimedia repository
  - European historic film documentation (1920s and 1930s)
  - Historic film censorship (legal docs, applications, decisions, correspondence, etc.), press material (articles), photos (stills, portraits), film posters, digital film/video fragments
  - XML metadata (cataloguing and content indexing)
- Ensure accessibility
  - Work environment for content indexing and annotation
  - Content- and context-based retrieval
- Evaluate acceptability
  - Preservation case studies by film experts
  - Empirical studies of real-life user behavior
4 Partners
- Content providers / pilot users
  - Deutsches Filminstitut DIF, Frankfurt, Germany
  - Filmarchiv Austria, Vienna, Austria
  - Národní filmový archiv, Prague, Czechia
- Technology developers
  - Fraunhofer IPSI, Darmstadt, Germany
  - University of Bari, LACAM Lab, Bari, Italy
  - Sword ICT S.r.l., Bari, Italy
- Evaluation partner
  - Risø National Laboratory, Systems Analysis Dept, Denmark
5 Why a Cultural Collaboratory?
- Support existing work processes in cultural sciences
  - Interpretative content analysis of documents
  - Reconstruct unity of cultural phenomena, interlinking scattered knowledge sources
- Offer new knowledge working environment
  - Organize collaborative work
  - Bring together divergent user communities and roles
- Create enhanced cultural information services
  - Raise awareness and visibility of cultural archives
6 Censorship / Registration Cards
7 Newspaper Articles
8 Conceptual Integration: COLLATE Ontology
[Figure: layered ontology diagram; recoverable labels by level]
- Generic Level (ABC-Model): Collate Entity, Location, Temporality, Abstraction, Actuality
- Cultural Heritage Domain Level (CIDOC CRM, FRBR): Manifestation, Work, Situation, Action, Event, Agent
- Film Archive Subdomain Level (LC TGM II, FIAF Classification): Form, Genre and Physical Characteristics, Moving Image, Film Situation, Film Event, Film Activity, Film Agent
- COLLATE Application Level (Collate Keywords): Film Censorship Agent, Film Censorship Activity, Film Censorship Event, Censorship Document, Film- and Censorship Topic
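The layering above can be illustrated with a small lookup structure. This is only a sketch: the level names and concept labels are taken from the slide, but the parent links (for example Film Censorship Agent under Film Agent under Agent) are an assumption inferred from the labels, and `ONTOLOGY`, `ancestors` and `is_a` are hypothetical names, not part of COLLATE.

```python
from typing import Optional

# concept -> (level, parent concept).
# ASSUMPTION: parent links are guessed from the concept names on the slide.
ONTOLOGY: dict[str, tuple[str, Optional[str]]] = {
    "Agent": ("Cultural Heritage Domain Level", None),
    "Film Agent": ("Film Archive Subdomain Level", "Agent"),
    "Film Censorship Agent": ("COLLATE Application Level", "Film Agent"),
    "Event": ("Cultural Heritage Domain Level", None),
    "Film Event": ("Film Archive Subdomain Level", "Event"),
    "Film Censorship Event": ("COLLATE Application Level", "Film Event"),
}


def ancestors(concept: str) -> list[str]:
    """Return the chain of more general concepts, nearest first."""
    chain: list[str] = []
    parent = ONTOLOGY[concept][1]
    while parent is not None:
        chain.append(parent)
        parent = ONTOLOGY[parent][1]
    return chain


def is_a(concept: str, candidate: str) -> bool:
    """True if `concept` equals `candidate` or specializes it."""
    return concept == candidate or candidate in ancestors(concept)
```

With such a structure, a query for a general concept like Agent could also match documents indexed with the more specific Film Censorship Agent.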
9 Model of the Concept Film Life Cycle
[Figure: graph of the film life cycle; recoverable labels]
- Nodes: Work x, Filmcopy Ax, Filmcopy Bx, filmcreationx, censorshipx, originalversionx, shortedversionx, Directingx, censorshipdecisionx
- Relations: hasParticipant, hasAction, precedes, follows, involves, hasResult, realizesWork
10 System Architecture (OAIS)
11 Collaboration in COLLATE
12 Discourse Structures
- Discourses represent "extended communication between two or more participants in a shared context" (Rich & Sidner, 1998)
- Establishing a discourse context
- Modeling discourse as interrelated nested annotations
- Annotation thread reflects scientific discourse
- Typed links (discourse structure relations, DSR) between
  - Document and annotation
  - Annotation of annotations
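The idea of nested annotations with typed links can be sketched as a small tree structure. This is a minimal illustration, not the COLLATE data model: the class and field names are hypothetical, and the relation names "interpretation" and "counterargument" are borrowed from the retrieval example later in the talk.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Annotation:
    author: str
    text: str
    relation: str  # typed link (DSR) to the annotated document or annotation
    replies: list["Annotation"] = field(default_factory=list)

    def reply(self, author: str, text: str, relation: str) -> "Annotation":
        """Annotate this annotation, extending the thread by one level."""
        child = Annotation(author, text, relation)
        self.replies.append(child)
        return child


@dataclass
class Document:
    title: str
    annotations: list[Annotation] = field(default_factory=list)

    def annotate(self, author: str, text: str, relation: str) -> Annotation:
        """Attach a top-level annotation to the document."""
        note = Annotation(author, text, relation)
        self.annotations.append(note)
        return note


def walk(notes: list[Annotation], depth: int = 0) -> Iterator[tuple[int, Annotation]]:
    """Yield (depth, annotation) pairs in depth-first thread order."""
    for note in notes:
        yield depth, note
        yield from walk(note.replies, depth + 1)
```

A thread is then built by calling `annotate` on the document and `reply` on each annotation, and `walk` recovers the discourse in reading order.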
13 Communication Acts: Discourse Structure Relations
14 Semantic Web Integration: COLLATE RDF(S)
15 Document Retrieval in COLLATE
- For a query q, a ranking of documents is returned. For this purpose, a retrieval weight r is calculated for each document.
- Documents are ranked according to descending retrieval weights
- The retrieval is based on the documents' metadata (given by film scientists or extracted from the digitized documents) and on the annotation thread.
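The ranking step described above can be sketched as follows. The slides do not say how the retrieval weight r is computed, so `retrieval_weight` here is a stand-in (the fraction of query terms found in the text); the function names and the sample document ids are hypothetical.

```python
import re


def retrieval_weight(query: str, text: str) -> float:
    """ASSUMED weighting: fraction of query terms that occur in the text."""
    terms = set(re.findall(r"[a-z]+", query.lower()))
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(terms & words) / len(terms) if terms else 0.0


def rank(query: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Return (document id, weight) pairs sorted by descending weight."""
    scored = [(doc_id, retrieval_weight(query, text))
              for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Each text stands for a document's metadata plus its annotation thread.
documents = {
    "d1": "censorship decision, keyword: obscene actions",
    "d2": "censorship decision with a political background",
    "d3": "film poster",
}
ranking = rank("political censorship", documents)
```

Here `ranking` lists d2 first, then d1, then d3, because d2 contains both query terms, d1 only one, and d3 none.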
16 Context-based Retrieval in COLLATE
- In COLLATE, we deal with the discourse context.
- A document is seen in the light of its interpretations
- We also consider at which point of the discourse a statement is made and what relation exists between the statement and the entity this statement refers to.
- Example: Consider the query for all censorship decisions made for political reasons.
17 Query "censorship decisions for political reasons": Metadata Only
[Figure: document with its annotation thread; recoverable content]
- Keyword: ... <controlled_keyword>obscene actions</controlled_keyword> ...
- Cataloguing: ... <assessors_chairman>Oberregierungsrat Dr Becker Beisitzer Justizrat Dr. Rosenthal...</assessors_chairman> ...
  ... <filmtitle>Kuhle Wampe</filmtitle> ...
- Interpretation (Document Interpretation): "I think the reasons mentioned here are not the real reasons. I see a political background as the main reason."
- Counterargument: "I disagree. There were a lot of similar decisions with the same argumentation. Of course, there might be a political background, but I think this is not the main reason in this case."
- Retrieval weight: 0.01
18 Query "censorship decisions for political reasons": Metadata and Interpretation
[Figure: document with its annotation thread; recoverable content]
- Keyword: ... <controlled_keyword>obscene actions</controlled_keyword> ...
- Cataloguing: ... <assessors_chairman>Oberregierungsrat Dr Becker Beisitzer Justizrat Dr. Rosenthal...</assessors_chairman> ...
  ... <filmtitle>Kuhle Wampe</filmtitle> ...
- Interpretation (Document Interpretation): "I think the reasons mentioned here are not the real reasons. I see a political background as the main reason."
- Counterargument: "I disagree. There were a lot of similar decisions with the same argumentation. Of course, there might be a political background, but I think this is not the main reason in this case."
- Retrieval weight: 0.32
19 Query "censorship decisions for political reasons": Analysis of Discourse Structure Relations
[Figure: document with its annotation thread; recoverable content]
- Keyword: ... <controlled_keyword>obscene actions</controlled_keyword> ...
- Cataloguing: ... <assessors_chairman>Oberregierungsrat Dr Becker Beisitzer Justizrat Dr. Rosenthal...</assessors_chairman> ...
  ... <filmtitle>Kuhle Wampe</filmtitle> ...
- Interpretation (Document Interpretation): "I think the reasons mentioned here are not the real reasons. I see a political background as the main reason."
- Counterargument: "I disagree. There were a lot of similar decisions with the same argumentation. Of course, there might be a political background, but I think this is not the main reason in this case."
- Retrieval weight: 0.19
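The three slides above show the same document scored in three ways: from metadata only, from metadata plus annotations, and with the relation between annotations taken into account. The sketch below imitates that qualitative behaviour only. It does not reproduce the weights on the slides, the scores are not normalized, and the scoring rule, the `DAMPING` constant and all names are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Note:
    text: str
    relation: str  # link type to the annotated object
    replies: list["Note"] = field(default_factory=list)


def match(terms: set[str], text: str) -> float:
    """Fraction of query terms that occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(terms & words) / len(terms) if terms else 0.0


DAMPING = 0.5  # ASSUMED: how strongly a counterargument weakens its target


def flat_support(note: Note, terms: set[str]) -> float:
    """Count every annotation as evidence, ignoring the link types."""
    return match(terms, note.text) + sum(flat_support(r, terms) for r in note.replies)


def dsr_support(note: Note, terms: set[str]) -> float:
    """Let counterarguments reduce the support of the note they answer."""
    value = match(terms, note.text)
    for reply in note.replies:
        strength = dsr_support(reply, terms)
        if reply.relation == "counterargument":
            value *= 1.0 - DAMPING * min(strength, 1.0)
        else:
            value += strength
    return value


def weight(metadata: str, thread: list[Note], terms: set[str], mode: str) -> float:
    """mode is 'metadata', 'flat' (metadata + annotations) or 'dsr'."""
    base = match(terms, metadata)
    if mode == "metadata":
        return base
    support = flat_support if mode == "flat" else dsr_support
    return base + sum(support(note, terms) for note in thread)


terms = {"political"}
metadata = "obscene actions Kuhle Wampe"
thread = [Note("I see a political background as the main reason.", "interpretation",
               [Note("There might be a political background, but it is not the main reason.",
                     "counterargument")])]
scores = {mode: weight(metadata, thread, terms, mode)
          for mode in ("metadata", "flat", "dsr")}
```

In this toy setup the metadata alone gives no match, the flat mode counts both annotations as support, and the relation-aware mode lands in between because the counterargument halves the support of the interpretation. That ordering mirrors the slides, where the relation-aware weight is higher than the metadata-only weight but lower than the one that ignores link types.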
20 COLLATE User Interface
21 Current State
- A first prototype was delivered to the archives and is used by them
- A second prototype will be delivered soon, introducing discourse structure relations and advanced collaboration features to the users
- A third prototype will contain context-based retrieval
22 Outlook
- Evaluate collaborative approach and context-based retrieval
- Apply COLLATE technology in other domains?
23 More information? http://www.collate.de