Title: ChiMerge Discretization
Slide 1: ChiMerge Discretization
Sample  F   K
1       1   1
2       3   2
3       7   1
4       8   1
5       9   1
6       11  2
7       23  2
8       37  1
9       39  2
10      45  1
11      46  1
12      59  1
(F is the numeric attribute to be discretized; K is the class label, 1 or 2.)
- A statistical approach to data discretization that uses the class label K.
- Applies the chi-square test to each pair of adjacent intervals to judge whether their class distributions are similar; intervals the test cannot tell apart are merged.
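A minimal Python sketch of the statistic for one pair of adjacent intervals, assuming each interval is represented by its list of per-class counts (for example `[count of K = 1, count of K = 2]`). The function name `chi_square` is illustrative, not from the slides, and a term whose expected count is 0 is taken as 0, which is the convention this example follows.

```python
def chi_square(row_a, row_b):
    """Chi-square statistic for two adjacent intervals.

    row_a and row_b are per-class counts, e.g. [count of K=1, count of K=2].
    Terms whose expected count is 0 are skipped (taken as 0).
    """
    n = sum(row_a) + sum(row_b)                          # table total N
    col_totals = [a + b for a, b in zip(row_a, row_b)]   # class totals C_j
    chi2 = 0.0
    for row in (row_a, row_b):
        row_total = sum(row)                             # interval total R_i
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n         # E_ij = R_i * C_j / N
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2


print(chi_square([0, 1], [1, 0]))  # different classes -> 2.0
print(chi_square([1, 0], [1, 0]))  # same class -> 0.0
```

A low value means the two intervals have similar class distributions and are candidates for merging; a high value means the class depends on which interval a value falls in.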
Slide 2: ChiMerge Discretization Example
- Sort the values of the attribute you want to discretize (in this example, attribute F).
- Start with every unique value of the attribute in its own interval. Each boundary lies halfway between neighboring values, e.g. 5 = (3 + 7)/2; 0 and 60 serve as the outer bounds.

Interval    Sample  F   K
0,2         1       1   1
2,5         2       3   2
5,7.5       3       7   1
7.5,8.5     4       8   1
8.5,10      5       9   1
10,17       6       11  2
17,30       7       23  2
30,38       8       37  1
38,42       9       39  2
42,45.5     10      45  1
45.5,52.5   11      46  1
52.5,60     12      59  1
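The sorting and initial-interval step can be sketched as follows, assuming the midpoint rule for boundaries that the interval table follows. The helper `initial_intervals` is hypothetical, and it does not produce the outer bounds (0 and 60), which are chosen by hand in the example.

```python
def initial_intervals(values):
    """Return the sorted unique values and the cut points between them.

    Every unique value starts in its own interval; each cut point lies
    halfway between two neighboring unique values.
    """
    unique_values = sorted(set(values))
    cut_points = [(lo + hi) / 2 for lo, hi in zip(unique_values, unique_values[1:])]
    return unique_values, cut_points


F = [1, 3, 7, 8, 9, 11, 23, 37, 39, 45, 46, 59]
unique_values, cut_points = initial_intervals(F)
print(cut_points)
# [2.0, 5.0, 7.5, 8.5, 10.0, 17.0, 30.0, 38.0, 42.0, 45.5, 52.5]
```

Using `set` means repeated attribute values share one initial interval, matching "every unique value in its own interval".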
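Slide 3: ChiMerge Discretization Example
- Begin by calculating the chi-square statistic for every pair of adjacent intervals. Each pair is summarized as a contingency table of class counts: K1 and K2 are the numbers of samples with K = 1 and K = 2.

Sample  F   K
1       1   1
2       3   2
3       7   1
4       8   1
5       9   1
6       11  2
7       23  2
8       37  1
9       39  2
10      45  1
11      46  1
12      59  1

Samples 2 and 3:
Sample  K1  K2  Total
2       0   1   1
3       1   0   1
Total   1   1   2

Samples 3 and 4:
Sample  K1  K2  Total
3       1   0   1
4       1   0   1
Total   2   0   2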
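Building the contingency table for two adjacent intervals can be sketched like this, assuming each interval is given as the list of K labels of its samples. The function name `contingency_table` is hypothetical.

```python
from collections import Counter


def contingency_table(labels_a, labels_b, classes):
    """2 x k table of class counts for two adjacent intervals.

    Each row is one interval; each column counts the samples of one class.
    """
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    return [[counts_a[c] for c in classes], [counts_b[c] for c in classes]]


classes = [1, 2]  # the two values of K in this example
print(contingency_table([2], [1], classes))  # samples 2 and 3: [[0, 1], [1, 0]]
print(contingency_table([1], [1], classes))  # samples 3 and 4: [[1, 0], [1, 0]]
```

`Counter` returns 0 for a class that does not occur, which gives the zero cells in the tables; row and column totals follow from summing rows and columns.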
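Slide 4: ChiMerge Discretization Example
Samples 2 and 3:
Sample  K1  K2  Total
2       0   1   1
3       1   0   1
Total   1   1   2

Expected counts $E_{ij} = \frac{R_i}{N} C_j$, where $A_{ij}$ is the observed count, $R_i$ the row total, $C_j$ the column total and $N$ the table total:
$E_{11} = \tfrac{1}{2}\cdot 1 = 0.5,\quad E_{12} = \tfrac{1}{2}\cdot 1 = 0.5,\quad E_{21} = \tfrac{1}{2}\cdot 1 = 0.5,\quad E_{22} = \tfrac{1}{2}\cdot 1 = 0.5$
$\chi^2 = \sum_{i}\sum_{j}\frac{(A_{ij}-E_{ij})^2}{E_{ij}} = \frac{(0-0.5)^2}{0.5} + \frac{(1-0.5)^2}{0.5} + \frac{(1-0.5)^2}{0.5} + \frac{(0-0.5)^2}{0.5} = 2$

Samples 3 and 4:
Sample  K1  K2  Total
3       1   0   1
4       1   0   1
Total   2   0   2

$E_{11} = \tfrac{1}{2}\cdot 2 = 1,\quad E_{12} = \tfrac{1}{2}\cdot 0 = 0,\quad E_{21} = \tfrac{1}{2}\cdot 2 = 1,\quad E_{22} = \tfrac{1}{2}\cdot 0 = 0$
$\chi^2 = \frac{(1-1)^2}{1} + \frac{(0-0)^2}{0} + \frac{(1-1)^2}{1} + \frac{(0-0)^2}{0} = 0$ (terms with a zero expected count are taken as 0)

Threshold: significance level 0.1 with df = (2 - 1)(2 - 1) = 1, read from a chi-square distribution table. Merge two intervals if $\chi^2 < 2.706$.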
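The threshold above comes from a chi-square table. For df = 1 it can also be computed with only the Python standard library, because a chi-square variable with one degree of freedom is the square of a standard normal variable. The helper `chi2_critical_df1` is hypothetical; for other degrees of freedom a statistics library would be needed (for example SciPy's `scipy.stats.chi2.ppf(1 - alpha, df)`).

```python
from statistics import NormalDist


def chi2_critical_df1(alpha):
    """Upper-tail critical value of the chi-square distribution with df = 1.

    A chi-square variable with one degree of freedom is Z**2 for a standard
    normal Z, so P(chi2 > c) = alpha when c = z**2 with z the (1 - alpha/2)
    quantile of Z.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


threshold = chi2_critical_df1(0.1)
print(round(threshold, 3))  # 2.706
print(2.0 < threshold, 0.0 < threshold)  # both example pairs fall below it
```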
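Slide 5: ChiMerge Discretization Example
- Calculate the chi-square value for every pair of adjacent intervals.
- Merge the intervals with the smallest chi-square values.

Interval    Sample  F   K   Chi2 with next interval
0,2         1       1   1   2
2,5         2       3   2   2
5,7.5       3       7   1   0
7.5,8.5     4       8   1   0
8.5,10      5       9   1   2
10,17       6       11  2   0
17,30       7       23  2   2
30,38       8       37  1   2
38,42       9       39  2   2
42,45.5     10      45  1   0
45.5,52.5   11      46  1   0
52.5,60     12      59  1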
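The first round of chi-square values can be reproduced with a short sketch. It assumes intervals are stored as per-class count lists and that a zero expected count contributes 0; the names `chi_square`, `intervals` and `to_merge` are illustrative.

```python
def chi_square(row_a, row_b):
    """Chi-square statistic for two intervals given their per-class counts."""
    n = sum(row_a) + sum(row_b)
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    chi2 = 0.0
    for row in (row_a, row_b):
        for observed, col_total in zip(row, col_totals):
            expected = sum(row) * col_total / n
            if expected > 0:  # zero expected count contributes 0
                chi2 += (observed - expected) ** 2 / expected
    return chi2


F = [1, 3, 7, 8, 9, 11, 23, 37, 39, 45, 46, 59]
K = [1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 1, 1]
classes = [1, 2]

# Every F value is unique here, so each sample starts as its own interval,
# represented by its per-class counts.
intervals = [[1 if k == c else 0 for c in classes] for _, k in sorted(zip(F, K))]
chis = [chi_square(a, b) for a, b in zip(intervals, intervals[1:])]
print(chis)
# [2.0, 2.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0]

smallest = min(chis)
to_merge = [i for i, x in enumerate(chis) if x == smallest]
print(to_merge)  # boundary i joins interval i with interval i + 1: [2, 3, 5, 9, 10]
```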
Slide 6: ChiMerge Discretization Example
Intervals after merging every adjacent pair with Chi2 = 0:

Interval  Sample  F   K   Chi2 with next interval
0,2       1       1   1   2
2,5       2       3   2   4
5,10      3       7   1
          4       8   1
          5       9   1   5
10,30     6       11  2
          7       23  2   3
30,38     8       37  1   2
38,42     9       39  2   4
42,60     10      45  1
          11      46  1
          12      59  1
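Slide 7: ChiMerge Discretization Example
Intervals after merging the pairs with the smallest value, Chi2 = 2 (0,2 with 2,5 and 30,38 with 38,42):

Interval  Sample  F   K   Chi2 with next interval
0,5       1       1   1
          2       3   2   1.875
5,10      3       7   1
          4       8   1
          5       9   1   5
10,30     6       11  2
          7       23  2   1.33
30,42     8       37  1
          9       39  2   1.875
42,60     10      45  1
          11      46  1
          12      59  1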
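Slide 8: ChiMerge Discretization Example
Intervals after merging 10,30 with 30,42 (Chi2 = 1.33):

Interval  Sample  F   K   Chi2 with next interval
0,5       1       1   1
          2       3   2   1.875
5,10      3       7   1
          4       8   1
          5       9   1   3.93
10,42     6       11  2
          7       23  2
          8       37  1
          9       39  2   3.93
42,60     10      45  1
          11      46  1
          12      59  1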
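The rounded values on slides 7 and 8 can be checked exactly with rational arithmetic. This sketch assumes the class counts [K1, K2] read off the tables and the zero-expected-count convention used above; `chi_square_exact` is a hypothetical name.

```python
from fractions import Fraction


def chi_square_exact(row_a, row_b):
    """Chi-square statistic with exact rational arithmetic.

    row_a and row_b are per-class counts [K1, K2]; terms with a zero
    expected count contribute 0.
    """
    n = sum(row_a) + sum(row_b)
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    total = Fraction(0)
    for row in (row_a, row_b):
        for observed, col_total in zip(row, col_totals):
            expected = Fraction(sum(row) * col_total, n)
            if expected:
                total += (observed - expected) ** 2 / expected
    return total


# Class counts [K1, K2] of adjacent intervals on slides 7 and 8
print(chi_square_exact([1, 1], [3, 0]))  # 0,5 vs 5,10    -> 15/8  (1.875)
print(chi_square_exact([0, 2], [1, 1]))  # 10,30 vs 30,42 -> 4/3   (1.33)
print(chi_square_exact([3, 0], [1, 3]))  # 5,10 vs 10,42  -> 63/16 (3.9375)
print(chi_square_exact([1, 3], [3, 0]))  # 10,42 vs 42,60 -> 63/16 (3.9375)
```

So the 3.93 shown for both pairs on slide 8 is 3.9375 truncated to two decimals.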
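Slide 9: ChiMerge Discretization Example
Intervals after merging 0,5 with 5,10 (Chi2 = 1.875):

Interval  Sample  F   K   Chi2 with next interval
0,10      1       1   1
          2       3   2
          3       7   1
          4       8   1
          5       9   1   2.72
10,42     6       11  2
          7       23  2
          8       37  1
          9       39  2   3.93
42,60     10      45  1
          11      46  1
          12      59  1

- No adjacent pair has a chi-square value below the merge threshold (2.72 and 3.93 both exceed about 2.706 for significance level 0.1 and df = 1), so merging stops. The final intervals for F are 0,10; 10,42; and 42,60.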
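Putting the steps together, the whole walkthrough can be sketched as one loop. This is an illustration under stated assumptions, not a reference implementation: the names `chi_square` and `chimerge` are hypothetical, the default threshold 2.706 assumes significance level 0.1 and df = 1, each round merges every adjacent pair tied at the minimum (as slides 5 to 9 do), and intervals are reported by their lowest and highest F value rather than by midpoint boundaries.

```python
import math


def chi_square(row_a, row_b):
    """Chi-square statistic for two intervals given their per-class counts."""
    n = sum(row_a) + sum(row_b)
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    chi2 = 0.0
    for row in (row_a, row_b):
        for observed, col_total in zip(row, col_totals):
            expected = sum(row) * col_total / n
            if expected > 0:  # zero expected count contributes 0
                chi2 += (observed - expected) ** 2 / expected
    return chi2


def chimerge(values, labels, threshold=2.706):
    """Bottom-up merging following the rounds shown on the slides.

    Returns (lowest value, highest value, class counts) per final interval.
    """
    classes = sorted(set(labels))
    # Start with one interval per unique value.
    intervals = []
    for v in sorted(set(values)):
        counts = [sum(1 for x, k in zip(values, labels) if x == v and k == c)
                  for c in classes]
        intervals.append((v, v, counts))

    while len(intervals) > 1:
        chis = [chi_square(a[2], b[2]) for a, b in zip(intervals, intervals[1:])]
        smallest = min(chis)
        if smallest >= threshold:  # no pair is similar enough to merge
            break
        # Merge every adjacent pair whose chi-square ties the minimum.
        merged = [intervals[0]]
        for chi, nxt in zip(chis, intervals[1:]):
            if math.isclose(chi, smallest):
                lo, _, counts = merged[-1]
                merged[-1] = (lo, nxt[1], [a + b for a, b in zip(counts, nxt[2])])
            else:
                merged.append(nxt)
        intervals = merged
    return intervals


F = [1, 3, 7, 8, 9, 11, 23, 37, 39, 45, 46, 59]
K = [1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 1, 1]
for lo, hi, counts in chimerge(F, K):
    print(lo, hi, counts)
# 1 9 [4, 1]
# 11 39 [1, 3]
# 45 59 [3, 0]
```

The three groups correspond to the intervals 0,10; 10,42; and 42,60. For this data set, merging only the single lowest pair per step (leftmost on ties) ends with the same three intervals, so the tie-handling choice does not change the result here.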