Towards Data Mining Without Information on Knowledge Structure

1
Towards Data Mining Without Information on Knowledge Structure
Alexandre Vautier, Marie-Odile Cordier and René Quiniou
Université de Rennes 1, INRIA Rennes - Bretagne Atlantique
  • Wednesday, September 19th, 2007

2
Usual KD Process
  • User needs
  • A data mining task
  • Domain knowledge

[Figure: KD process pipeline: Data → Selection → Target Data → Preprocessing → Preprocessed Data → Transformation → Transformed Data → Data Mining → Models → Interpretation/Evaluation → Knowledge]
3
Usual KD Process
What can a user extract from data without domain knowledge?
4
Application context: Network Alarms
  • Represent network alarms
  • Understand network behavior
  • Detect new DDoS attacks
  • An alarm is composed of:
    • A directed link between two IP addresses
    • A date
    • A severity (low, med, high), related to the link rate
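The alarm structure listed above can be sketched as a small record type. This is only an illustration: the names `Alarm`, `Severity`, `source`, `destination` and `day`, as well as the sample values, are assumptions and do not come from the slides.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Severity(Enum):
    LOW = "low"
    MED = "med"
    HIGH = "high"


@dataclass(frozen=True)
class Alarm:
    source: str          # IP address the directed link starts from
    destination: str     # IP address the directed link points to
    day: date            # date of the alarm
    severity: Severity   # related to the link rate


# Hypothetical sample values, for illustration only.
alarms = [
    Alarm("192.168.2.1", "192.168.2.5", date(2005, 11, 1), Severity.LOW),
    Alarm("1.2.3.4", "192.168.2.5", date(2005, 11, 7), Severity.HIGH),
]
high = [a for a in alarms if a.severity is Severity.HIGH]
```

Freezing the dataclass keeps alarms hashable, so they can be stored in sets when models and data are compared.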

6
Application context: Network Alarms
  • Generalized links: M1 = 192.168.2.1 → *, * → 192.168.2.5, ...
  • Sequences: M2 = 1.5.5.* → 2.2.3.* > 2.2.3.* → 1.2.3.4, ...
  • Clustering on date and severity: M3 = (11/01/05 to 11/03/05, low), (11/07/05 to 11/15/05, high)
[Figure: Alarms → Data Mining Algorithms → Models]
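As a rough illustration of how a generalized-link model such as M1 could cover alarm links, the sketch below matches each endpoint against a pattern. It assumes that `*` acts as a wildcard on IP addresses; the function name `covers` and the sample links are hypothetical.

```python
from fnmatch import fnmatchcase


def covers(generalized_link, link):
    """True if both endpoints of the link match the patterns of the generalized link."""
    src_pattern, dst_pattern = generalized_link
    src, dst = link
    return fnmatchcase(src, src_pattern) and fnmatchcase(dst, dst_pattern)


# Two generalized links in the spirit of M1 (the wildcard syntax is an assumption).
m1 = [("192.168.2.1", "*"), ("*", "192.168.2.5")]

# Hypothetical observed links (source, destination).
links = [
    ("192.168.2.1", "10.0.0.7"),
    ("10.0.0.9", "192.168.2.5"),
    ("10.0.0.9", "10.0.0.7"),
]
covered = [l for l in links if any(covers(g, l) for g in m1)]
```

Here the first two links are covered by M1 and the third one is left as an exception.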
7
Objectives
  • Goal: search for models that fit the given data
  • Current assumption: the user has sufficient knowledge to
    • define the type of model
    • choose the relevant DM algorithm
  • Our proposition: relax the current assumption by
    • automatically executing DM algorithms to extract models from data
    • evaluating the resulting models in a generic manner, to propose the best-suited model(s) to the user
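The proposition above (run several DM algorithms automatically, then score and rank their models in the same way) can be sketched as follows. Everything here is an assumption made for illustration: the two toy "algorithms", the `placeholder_cost` function and the sample data are hypothetical, and the cost is only a stand-in, not the evaluation used by the authors.

```python
from collections import Counter

# Each hypothetical "algorithm" is a pair (extract, covers):
# extract builds a model from the data, covers tells whether the model explains an item.
ALGORITHMS = {
    "most_common_value": (
        lambda data: Counter(data).most_common(1)[0][0],
        lambda model, x: x == model,
    ),
    "interval": (
        lambda data: (min(data), max(data)),
        lambda model, x: model[0] <= x <= model[1],
    ),
}


def placeholder_cost(data, model, covers):
    """Stand-in score: size of the model text plus a penalty per uncovered item."""
    uncovered = sum(1 for x in data if not covers(model, x))
    return len(repr(model)) + 10 * uncovered


def rank_models(data, algorithms, cost):
    """Run every algorithm on the data and sort the models by increasing cost."""
    results = []
    for name, (extract, covers) in algorithms.items():
        model = extract(data)
        results.append((cost(data, model, covers), name, model))
    return sorted(results, key=lambda r: r[0])


ranking = rank_models([3, 3, 3, 3, 7], ALGORITHMS, placeholder_cost)
```

The point of the sketch is the shape of the loop: the user supplies only the data, and the same cost function is applied to models of different types.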

8
Framework
  • DM algorithm specifications
  • Data specification
  • Unification of specifications

Model extraction → Generic evaluation → Model ranking
9
Schemas for specification
  • Enhanced algebraic specifications (types, operations and equations)
  • Category theory (Mac Lane, 1942)
  • Sketches (Ehresmann, 1965)
  • Use specification inheritance

10
Data specification: Network Alarm Schema
  • Node: a type
  • Edge:
    • A function
    • A relation
  • Green dotted edges (projections): Cartesian product
  • Red dashed edges (inclusions): union
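A schema of this kind can be pictured as a typed graph. The sketch below is a minimal illustration, not the authors' representation: the classes `Schema` and `Edge`, the edge names and the particular network alarm schema built at the end are all assumptions, since the original figure is not available in the transcript.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    name: str
    source: str
    target: str
    kind: str  # "function", "relation", "projection" or "inclusion"


@dataclass
class Schema:
    nodes: set = field(default_factory=set)    # each node is a type
    edges: list = field(default_factory=list)

    def add_edge(self, name, source, target, kind):
        self.nodes.update((source, target))
        self.edges.append(Edge(name, source, target, kind))

    def product_components(self, node):
        """Types reached by projection edges: node is their Cartesian product."""
        return [e.target for e in self.edges
                if e.source == node and e.kind == "projection"]

    def union_members(self, node):
        """Types with an inclusion edge into node: node is their union."""
        return [e.source for e in self.edges
                if e.target == node and e.kind == "inclusion"]


# Hypothetical alarm schema: Alarm = Link x Date x Severity, Link = IP x IP.
schema = Schema()
schema.add_edge("link", "Alarm", "Link", "projection")
schema.add_edge("date", "Alarm", "Date", "projection")
schema.add_edge("severity", "Alarm", "Severity", "projection")
schema.add_edge("source", "Link", "IP", "projection")
schema.add_edge("destination", "Link", "IP", "projection")
for level in ("Low", "Med", "High"):
    schema.add_edge(level.lower(), level, "Severity", "inclusion")
```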

12
DM Algorithm specification: Generalized edges
[Figure: schema of the generalized-edges algorithm, with parts labelled DM algorithm, Model type and Covering relation]
14
Schema unification
[Figure: how can the Data Type of the data schema be unified with the Abstract Data Type expected by a DM algorithm?]
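One simple way to picture unification is as a search for a mapping from the nodes of the abstract schema to the nodes of the data schema that preserves edges. The brute-force sketch below is a deliberate simplification under that assumption, not the authors' unification procedure; the function `unify` and both toy schemas are hypothetical. It tries every injective assignment, so its running time grows exponentially with the number of nodes.

```python
from collections import Counter
from itertools import permutations


def unify(abstract_nodes, abstract_edges, data_nodes, data_edges):
    """Return every injective node mapping under which all abstract edges exist in the data schema.

    Edges are (source, target, kind) triples; repeated triples count as parallel edges.
    """
    available = Counter(data_edges)
    matches = []
    for image in permutations(data_nodes, len(abstract_nodes)):
        mapping = dict(zip(abstract_nodes, image))
        needed = Counter((mapping[s], mapping[t], kind) for s, t, kind in abstract_edges)
        if all(available[edge] >= count for edge, count in needed.items()):
            matches.append(mapping)
    return matches


# Hypothetical abstract input of a graph-based algorithm: an Edge has two Vertex projections.
abstract_nodes = ["Edge", "Vertex"]
abstract_edges = [("Edge", "Vertex", "projection")] * 2

# Hypothetical data schema for network alarms.
data_nodes = ["Alarm", "Link", "IP", "Date", "Severity"]
data_edges = [
    ("Alarm", "Link", "projection"),
    ("Alarm", "Date", "projection"),
    ("Alarm", "Severity", "projection"),
    ("Link", "IP", "projection"),
    ("Link", "IP", "projection"),
]
matches = unify(abstract_nodes, abstract_edges, data_nodes, data_edges)
```

In this toy case the only match sends Edge to Link and Vertex to IP, which is the kind of correspondence that lets a graph algorithm run on alarm links.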
17
Framework
  • DM algorithm specifications
  • Data specification
  • Unification of specifications

Model extraction → Generic evaluation → Model ranking
18
Generic evaluation
  • Compare different kinds of model
  • Inspired by Kolmogorov complexity: the complexity
    of an object x is the size s(p) of the shortest
    program p that outputs x when executed on a
    universal machine f
  • $C_f(x) = \min \{ s(p) : f(p) = x \}$
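The definition can be made concrete on a deliberately tiny machine. Kolmogorov complexity on a universal machine is not computable, so the sketch below replaces f with a hypothetical, non-universal `toy_machine` and simply enumerates programs by increasing size; the program syntax and the alphabet are assumptions chosen for the example.

```python
from itertools import product


def toy_machine(program):
    """Hypothetical machine: 'N*text' outputs text N times, anything else outputs itself."""
    head, star, tail = program.partition("*")
    if star and head.isdigit():
        return tail * int(head)
    return program


def complexity(machine, alphabet, x, max_size):
    """Size of the shortest program (up to max_size symbols) that outputs x, or None."""
    for size in range(1, max_size + 1):
        for symbols in product(alphabet, repeat=size):
            if machine("".join(symbols)) == x:
                return size
    return None


# "abababab" has 8 symbols but is produced by the 4-symbol program "4*ab".
c = complexity(toy_machine, "ab*24", "abababab", max_size=8)
```

A regular object gets a complexity well below its length, while an irregular one can only be produced by spelling it out.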

19
Generic evaluation
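  • Complexity of data d in a schema S relative to a model m, with covering relation $c : M \to D$
  • $K(d, m, S) = k(M) + k(D) + k(c) + k(m \mid M) + k(d \mid m, c, D)$, the sum of the complexities of:
    • $k(M)$: the model structure
    • $k(D)$: the data structure
    • $k(c)$: the covering relation
    • $k(m \mid M)$: the model
    • $k(d \mid m, c, D)$: the data knowing m, c and D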

20
Path Indexing: Covering Relation Decomposition
  • Null decomposition
    • $c : M \to D$
    • $k(d \mid m, c, D) = k(d \mid c(m)) + k(d \setminus c(m) \mid D)$
  • Decomposition relying on relation composition
    • $c = s \circ t : M \to D$, with $t : M \to A$ and $s : A \to D$
    • $c(m) = s \circ t(m)$
    • $k(d \mid m, s \circ t, D) = k(a \mid t(m)) + k(d \mid s(a)) + k(d \setminus s(a) \mid D)$
[Figure: the model m in M, its image t(m) and the element a in A, and the images s(a) and c(m) compared with the data d in D]
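The null decomposition can be illustrated with a toy coding scheme. In the sketch below, the part of d covered by c(m) is encoded by pointing into c(m), and the remaining items are encoded by pointing into the whole type D. The uniform cost of log2(n) bits per item, the set-based representation and the names `index_cost` and `data_cost` are assumptions made for this example, not the authors' encoding.

```python
from math import log2


def index_cost(n_items, n_choices):
    """Bits needed to designate n_items elements, each among n_choices alternatives."""
    return n_items * log2(n_choices) if n_items else 0.0


def data_cost(d, covered, domain):
    """Toy k(d | m, c, D): items explained by c(m) plus exceptions encoded in D.

    d       -- the observed data, as a set
    covered -- c(m), the set of items the model covers
    domain  -- all possible items of type D
    """
    explained = d & covered
    exceptions = d - covered
    return index_cost(len(explained), len(covered)) + index_cost(len(exceptions), len(domain))


domain = set(range(16))
d = {0, 1, 2, 3, 9}

tight = data_cost(d, {0, 1, 2, 3}, domain)   # 4 items among 4, 1 exception among 16
loose = data_cost(d, domain, domain)         # 5 items among 16, no exception
```

With these numbers the tight model costs 12 bits and the model covering everything costs 20 bits, so a model that covers the data closely is preferred even though it leaves one exception.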
23
Experiments
  • Extraction of clusters, generalized edges, and
    sequences
  • Dataset: 10,000 alarms
  • Duration: 400 seconds (excluding the running time
    of the DM algorithms)
  • 6 operational algorithms
  • Experiments on datasets generated by models
  • Network alarms from a real network

24
Discussions
  • Unification
    • Exponential in time with respect to the number of
      nodes in a schema
  • Generic evaluation
    • Linear in time and space
  • Adapt the evaluation method
    • User defined
    • According to a model visualization
    • According to local data instead of global data

25
What do schemas bring to Data Mining?
  • Describe data and DM algorithms with a common
    language
  • Allow the data structure to be unified with the
    input of DM algorithms
  • Provide a way to compute the model complexity
    relative to a type in a schema
  • Provide a way to compute the data complexity
    relative to
    • A model
    • A covering relation and its decomposition
  • Can be implemented efficiently

26
  • Thank you!