Title: Developing Digital Libraries
II PGL DB CONFERENCE ORLANDO, June/2004
TecBD
PUC-Rio
Developing Digital Libraries Using Data Warehousing and Data Mining Techniques
Cássia Blondet Baruque, Rubens Nascimento Melo
1) INTRODUCTION
2) DATA WAREHOUSE (DL ?? DW)
3) DWING AND DATA LIBRARYING (DL DWing ?? DLing)
4) DATA MINING IN THE ETL/OLAP PROCESSES (DLing DMing ?? IDLing)
5) CONCLUSION
1. INTRODUCTION
- IS THE INTERNET A DL?
- DL?
- DL DEVELOPMENT
- RELATED WORK
1. INTRODUCTION
- IS THE INTERNET A DL?
- one of the biggest information repositories (CHA95)
- heterogeneous database
- structured, semi-structured and unstructured data
- difficult search and information retrieval
- is not a DL but may be its source
1. INTRODUCTION
- DL?
- new research area (focused on organizing documents on the Web
and making them easier to access)
- controversial term (SCHARTZ2001)
- a large collection of objects in diverse digital formats,
persistent, managed and well organized in a catalog accessed
through the Web (this work focuses on the textual format)
1. INTRODUCTION
- DL DEVELOPMENT
- Integrate the distributed Web-based multimedia content
(CORBA, mediators, agents)
- Facilitate access to information
(semantic aids: ontology and data mining)
1. INTRODUCTION
- RELATED WORK
- Stanford Digital Libraries Technology
- Illinois Digital Library Initiative Project
- Alexandria Digital Library Project
- University of Michigan Digital Library Project
- THERE IS NO COMPREHENSIVE SOLUTION THAT ADDRESSES
THE MAIN ISSUES OF DIGITAL LIBRARY DEVELOPMENT
2. DATA WAREHOUSE
- DL ?? DW
- DL DWing ?? DLing
- DLing DMing ?? IDLing
2. DATA WAREHOUSE
- DL ?? DW (Inmon)
- DW is a database
- subject-oriented
- integrated
- non-volatile
- time-variant
- to support management decisions
2. DATA WAREHOUSE
- DL ?? DW (Inmon)
- DL is a DW!
- subject-oriented
- integrated
- non-volatile
- time-variant
- to support management decisions
3. DWING AND DATA LIBRARYING
- DL ?? DW
- DL DWing ?? DLing
- DLing DMing ?? IDLing
3. DWING AND DATA LIBRARYING
How to integrate complex data?
3. DWING AND DATA LIBRARYING
How to facilitate access to information?
4. DATA MINING IN THE ETL/OLAP PROCESSES
- DL ?? DW
- DL DWing ?? DLing
- DLing DMing ?? IDLing
4. DATA MINING IN THE ETL PROCESS
The documents are searched for on the Web, filtered
and then selected. The ontology is the input to
these search and filtering processes.
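The selection step above can be sketched as follows. This is only an illustration under stated assumptions: the ontology is reduced to a hypothetical mapping from concepts to indicator terms, the candidate documents are hard-coded instead of being fetched from the Web, and names such as `ONTOLOGY` and `select_documents` are invented for the example rather than taken from the presented system.

```python
# Hypothetical ontology: concept -> terms that signal it (illustrative only).
ONTOLOGY = {
    "data warehousing": {"warehouse", "olap", "etl"},
    "data mining": {"mining", "clustering", "classification"},
}


def matched_concepts(text, ontology):
    """Return the ontology concepts whose terms occur in the text."""
    words = set(text.lower().split())
    return sorted(c for c, terms in ontology.items() if words & terms)


def select_documents(candidates, ontology, min_concepts=1):
    """Keep (url, text) candidates that match at least min_concepts concepts."""
    selected = []
    for url, text in candidates:
        concepts = matched_concepts(text, ontology)
        if len(concepts) >= min_concepts:
            selected.append({"url": url, "concepts": concepts})
    return selected


# Stand-in for documents found by a Web search (assumed sample data).
candidates = [
    ("http://example.org/a", "An ETL pipeline feeding an OLAP warehouse"),
    ("http://example.org/b", "Recipes for a quick dinner"),
    ("http://example.org/c", "Clustering and classification for text mining"),
]
selected = select_documents(candidates, ONTOLOGY)
```

Here the ontology plays both roles named on the slide: its terms could seed the search queries, and the same terms decide which retrieved documents are kept.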
4. DATA MINING IN THE ETL PROCESS
After being selected, the documents must be
catalogued. A data mining algorithm is applied
to the selected documents and then the Dublin
Core metadata (bibliographic and authority
records) is generated.
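A minimal sketch of the cataloguing step follows. The slide does not say which data mining algorithm is used, so a plain term-frequency count stands in for it here; that substitution, the stop-word list, the sample document and the function names are all assumptions. The dictionary keys are element names from the Dublin Core element set.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "for", "is", "are", "on"}


def top_terms(text, k=3):
    """Stand-in for the mining step: the k most frequent non-stop-words."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    return [term for term, _ in Counter(words).most_common(k)]


def dublin_core_record(url, title, creator, text):
    """Build a bibliographic record keyed by Dublin Core element names."""
    return {
        "identifier": url,
        "title": title,
        "creator": creator,
        "subject": top_terms(text),  # mined keywords become subject terms
        "format": "text/html",       # assumption: textual Web documents only
        "type": "Text",
    }


record = dublin_core_record(
    "http://example.org/dw-intro",
    "Introduction to Data Warehouses",
    "Unknown Author",
    "A warehouse stores integrated data. The warehouse data is loaded by ETL. "
    "ETL jobs feed the warehouse.",
)
```

An authority record could be derived in the same way, for example by normalising the `creator` values across records, but that part is not sketched here.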
4. DATA MINING IN THE ETL PROCESS
Both the library catalogue and the library records
are loaded into the DW. Documents not authorized
for copying are not stored.
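The loading rule can be illustrated as below. The sketch assumes one possible reading of the slide: every document gets a catalogue entry, but its content is copied into the warehouse only when copying is authorized. The in-memory `warehouse` dictionary and the `copy_authorized` flag are hypothetical stand-ins for the real DW tables and rights information.

```python
def load_into_dw(records, warehouse):
    """Load catalogue entries; keep document content only when copying is allowed."""
    for rec in records:
        # The catalogue entry is always loaded so the document stays findable.
        warehouse["catalogue"].append(
            {"identifier": rec["identifier"], "title": rec["title"]}
        )
        # The document itself is stored only if copying is authorized.
        if rec.get("copy_authorized", False):
            warehouse["documents"][rec["identifier"]] = rec["content"]
    return warehouse


warehouse = {"catalogue": [], "documents": {}}
records = [
    {"identifier": "doc-1", "title": "Open report", "content": "full text",
     "copy_authorized": True},
    {"identifier": "doc-2", "title": "Restricted paper", "content": "full text",
     "copy_authorized": False},
]
load_into_dw(records, warehouse)
```

Under this reading, a user can still discover the restricted paper through the catalogue and follow its identifier back to the original source.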
4. DATA MINING IN THE OLAP PROCESS
The users' access patterns are analysed. This
analysis may trigger a refresh of the DL.
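One way such an analysis could trigger a refresh is sketched below. A simple request count stands in for the OLAP or mining analysis, which the slide does not specify; the log format, the `min_requests` threshold and the function name are assumptions made for the example.

```python
from collections import Counter


def subjects_to_refresh(access_log, catalogue_subjects, min_requests=2):
    """Return requested subjects that the DL does not cover yet.

    A subject qualifies when users asked for it at least min_requests times
    and no catalogued document carries it.
    """
    demand = Counter(entry["subject"] for entry in access_log)
    return sorted(
        subject
        for subject, count in demand.items()
        if count >= min_requests and subject not in catalogue_subjects
    )


# Assumed sample of user requests and of subjects already in the catalogue.
access_log = [
    {"user": "u1", "subject": "olap"},
    {"user": "u2", "subject": "text mining"},
    {"user": "u3", "subject": "text mining"},
    {"user": "u1", "subject": "olap"},
    {"user": "u2", "subject": "ontologies"},
]
catalogue_subjects = {"olap", "etl"}
refresh = subjects_to_refresh(access_log, catalogue_subjects)
```

The subjects returned here could then be fed back into the search and filtering stage so that new documents are collected for them.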
5. CONCLUSION (Web ? LO repositories)
Data Librarying
Learning Objects Data Librarying
THE END