A New OLAP Aggregation Based on the AHC Technique

1
DOLAP 2004
A New OLAP Aggregation Based on the AHC Technique
R. Ben Messaoud, O. Boussaid, S. Rabaséda
Laboratoire ERIC, Université de Lyon 2
5, avenue Pierre-Mendès-France, 69676 Bron Cedex, France
http://eric.univ-lyon2.fr
2
Complex data
  • Definition
  • Data are considered complex if they are:
  • Multi-format: information can be supported by
    different kinds of data (numeric, symbolic, texts,
    images, sounds, videos, ...)
  • Multi-structure: structured, unstructured or
    semi-structured (relational databases, XML
    documents, ...)
  • Multi-source: data come from different sources
    (distributed databases, web, ...)
  • Multi-modal: the same information can be
    described differently (data in different
    languages, ...)
  • Multi-version: data are updated through time
    (temporal databases, periodical inventories, ...)

3
General context
  • Complex data
  • Huge volumes of complex data
  • Warehousing complex data
  • OLAP facts as complex objects
  • Analyze complex data
  • Current OLAP tools are not suited to processing
    complex data
  • Data mining is able to process complex data like
    images, texts, videos
  • Coupling OLAP and data mining
  • Analyze complex data on-line
  • New operator: OpAC, Operator of Aggregation by
    Clustering (AHC)

4
Outline
  • Complex data and general context
  • Related work: coupling OLAP and data mining
  • Objectives of the proposed operator
  • Formalization of the operator
  • Implementation and demonstration
  • Conclusion and future works

5
Related work
  • Three approaches for coupling OLAP and data
    mining:
  • First approach: extending the query languages of
    decision support systems
  • Second approach: adapting the multidimensional
    environment to classical data mining techniques
  • Third approach: adapting data mining methods to
    multidimensional data

6
Related work
  • These works showed that:
  • Associating data mining with OLAP is a promising
    way to support rich analysis tasks
  • Data mining is able to extend the analysis power
    of OLAP
  • Use data mining to enhance OLAP tools in order to
    process complex data
  • OpAC: a new OLAP operator based on a data mining
    technique

7
Objectives
  • Classic OLAP aggregation vs. OpAC aggregation
  • Classic OLAP:
  • Summarizes numerical data into a smaller number of
    values
  • Computes additive measures (Sum, Average, Max,
    Min, ...)

Example: Sales cube
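
As a rough illustration of classic aggregation, the sketch below rolls a tiny sales fact table up along one dimension at a time. The fact rows, the `roll_up` helper and the dimension layout are hypothetical and chosen only for this example; they are not taken from the presentation.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, city, year, amount). Illustrative values only.
facts = [
    ("pen", "Lyon", 2003, 10.0),
    ("pen", "Lyon", 2004, 20.0),
    ("pen", "Paris", 2004, 30.0),
    ("book", "Lyon", 2004, 40.0),
]


def roll_up(facts, key_index, agg):
    """Group the amounts by one dimension and reduce each group with `agg`."""
    groups = defaultdict(list)
    for fact in facts:
        groups[fact[key_index]].append(fact[3])
    return {key: agg(values) for key, values in groups.items()}


def mean(values):
    return sum(values) / len(values)


sales_by_product = roll_up(facts, 0, sum)   # Sum along the product dimension
max_by_city = roll_up(facts, 1, max)        # Max along the city dimension
avg_by_year = roll_up(facts, 2, mean)       # Average along the year dimension
```

Each aggregate is a single number computed from numbers, which is exactly what stops working once the cells hold images or texts.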
8
Objectives
  • Classic OLAP aggregation vs. OpAC aggregation
  • OpAC aggregation
  • What about aggregating complex objects?
  • How to aggregate images, texts or videos with
    classic OLAP tools?
  • Complex objects are not additive OLAP measures

Example: Images cube
9
Objectives
  • How to aggregate complex objects?
  • Using a data mining technique: AHC (Agglomerative
    Hierarchical Clustering)
  • The AHC aggregates data
  • The hierarchical aspect of the AHC
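
A minimal sketch of agglomerative hierarchical clustering may help make both points concrete: each merge aggregates individuals, and the ordered list of merges is the hierarchy. The Euclidean distance and average linkage used here are assumptions for illustration; the presentation does not say which metric or criterion OpAC uses by default, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def ahc(points):
    """Return the merge history as a list of (cluster_a, cluster_b, distance)."""
    clusters = [(i,) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under the linkage criterion.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Delete the larger index first so the smaller one stays valid.
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return history


def cut(points, history, k):
    """Replay the merges until only k clusters remain."""
    clusters = {(i,) for i in range(len(points))}
    for a, b, _ in history:
        if len(clusters) <= k:
            break
        clusters -= {a, b}
        clusters.add(a + b)
    return sorted(sorted(c) for c in clusters)


points = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
history = ahc(points)
```

Cutting the same history at different values of `k` gives coarser or finer aggregates without re-running the clustering, which loosely mirrors moving between hierarchical levels.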

10
Objectives
[Figure: Images cube; Entropy and Homogeneity axes, each with the levels Very high, High, Medium, Low, Very low]
11
Formalization
  • $D_i$: the $i$-th dimension of a data cube $C$
  • $h_{ij}$: the $j$-th hierarchical level of the
    dimension $D_i$
  • $g_{ijt}$: the $t$-th modality of $h_{ij}$
  • The set of individuals:

$\Omega \subset \{ g_{ijt} \mid g_{ijt} \in h_{ij} \}$
  • The set of variables:
  • The dimension retained for individuals cannot
    generate variables
  • Only one hierarchical level of a dimension is
    allowed to generate variables

12
Formalization
  • Evaluation tools
  • Minimize the intra-cluster distances
  • Maximize the inter-cluster distances
  • Inter and intra-cluster inertia
  • $A_1, A_2, \dots, A_k$ is a partition of the set of individuals $\Omega$
  • $P(A_i)$ is the weight of $A_i$
  • $G(A_i)$ is the center of gravity of $A_i$
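
The slide's own inertia formulas did not survive in this transcript. As a hedged sketch, the code below uses the textbook definitions, which may differ in detail from the authors': intra-cluster inertia is the weighted sum of squared distances from each individual to the center of gravity G(Ai) of its cluster, and inter-cluster inertia is the sum over clusters of P(Ai) times the squared distance from G(Ai) to the global center of gravity. Uniform weights (1/n per individual) and the squared Euclidean distance are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def center_of_gravity(points, weights):
    """Weighted mean of a list of points."""
    total = sum(weights)
    dim = len(points[0])
    return tuple(
        sum(w * p[d] for p, w in zip(points, weights)) / total for d in range(dim)
    )


def inertias(partition):
    """Return (intra, inter, total) inertia for a partition A1..Ak.

    `partition` is a list of clusters, each a list of points.
    Assumption: every individual carries the same weight 1/n.
    """
    all_points = [p for cluster in partition for p in cluster]
    n = len(all_points)
    g = center_of_gravity(all_points, [1.0 / n] * n)

    intra = 0.0
    inter = 0.0
    for cluster in partition:
        weights = [1.0 / n] * len(cluster)
        g_i = center_of_gravity(cluster, weights)   # G(Ai)
        p_i = sum(weights)                          # P(Ai)
        intra += sum(w * sq_dist(p, g_i) for p, w in zip(cluster, weights))
        inter += p_i * sq_dist(g_i, g)

    total = sum(sq_dist(p, g) for p in all_points) / n
    return intra, inter, total


partition = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
intra, inter, total = inertias(partition)
```

Under these definitions the total inertia splits exactly into intra plus inter, so minimizing intra-cluster distances and maximizing inter-cluster distances are two views of the same objective for a fixed data set.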

13
Formalization
  • Individuals:
  • Modalities from the dimension of images
  • Variables:
  • L1 normalized values of images for all possible
    modalities of the entropy dimension
  • L1 normalized values of images for all possible
    modalities of the homogeneity dimension
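
One possible reading of this setup, sketched under stated assumptions: each image becomes a row, and each modality of the entropy and homogeneity dimensions becomes a column holding the measure value of the matching fact. The fact tuples, the five level names and the `build_table` helper are hypothetical; the presentation does not show the actual data layout.

```python
LEVELS = ["very low", "low", "medium", "high", "very high"]

# Hypothetical cube facts: (image, entropy level, homogeneity level, measure).
facts = [
    ("img1", "high", "low", 0.8),
    ("img2", "low", "high", 0.3),
    ("img3", "high", "low", 0.7),
]


def build_table(facts):
    """Build an individuals-by-variables table from cube facts.

    Individuals are the image modalities; variables are the modalities of
    the entropy and homogeneity dimensions (one hierarchical level each).
    """
    variables = [("entropy", lvl) for lvl in LEVELS]
    variables += [("homogeneity", lvl) for lvl in LEVELS]
    column = {var: i for i, var in enumerate(variables)}

    table = {}
    for image, entropy, homogeneity, measure in facts:
        row = table.setdefault(image, [0.0] * len(variables))
        row[column[("entropy", entropy)]] += measure
        row[column[("homogeneity", homogeneity)]] += measure
    return variables, table


variables, table = build_table(facts)
```

The rows of `table` can then be handed to a clustering routine as the points to group, so that images with similar entropy and homogeneity profiles end up in the same aggregate.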

14
Formalization
  • Results
  • Exploits the cube's facts describing images to
    construct groups of similar complex objects
  • Highlights significant groups of objects by a
    clustering technique
  • Clusters (aggregates) are defined from both
    dimensions and measures of a data cube
  • Implementation of a prototype

15
Implementation
  • Prototype
  • Data loading module
  • Connects to a data cube on Analysis Services of
    MS SQL Server
  • Uses MDX queries to import information about the
    cube's structure
  • Extracts the data selected by the user
  • Parameter setting interface
  • Helps the user extract individuals and
    variables from the cube
  • Selects modalities and measures
  • Defines the clustering problem
  • Clustering module
  • Lets the user define clustering parameters
    such as the dissimilarity metric and the
    aggregation criterion
  • Constructs the AHC
  • Plots the results of the AHC on a dendrogram

16
Implementation
  • Images dataset
  • 3000 images collected from the web
  • Semantic annotation: description, subject and
    theme
  • Texture descriptors such as:
  • ENT: Entropy
  • CON: Contrast
  • L1: Normalized Medium Color Characteristic
  • Three color channels: RGB

17
Implementation
Demonstration
18
Conclusion
  • OpAC is one possible way to perform on-line
    analysis over complex data
  • OpAC aggregates complex objects
  • Aggregates (clusters) are defined from both
    dimensions and measures of a data cube
  • Prototype available at:
  • http://bdd.univ-lyon2.fr/?pagelogicielid5

19
Future works
  • The current evaluation tool may have some
    limitations
  • Use other indicators to evaluate the
    quality of partitions
  • Assist the user in finding the best number of clusters
  • Exploit the aggregates generated by OpAC in order
    to reorganize the cube's dimensions
  • Get a new cube with remarkable regions
  • Use other data mining techniques to enhance
    OLAP with explanation and prediction
    capabilities

20
The End