Chapter 11: The Data Survey - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Chapter 11: The Data Survey
  • Supplemental Material

Jussi Ahola, Laboratory of Computer and
Information Science
2
Contents
  • Information theoretic measures and their
    calculation
  • Features used in the data survey
  • Cases

3
Good references
  • Claude E. Shannon and Warren Weaver, The
    Mathematical Theory of Communication
  • Thomas M. Cover and Joy A. Thomas, Elements of
    Information Theory
  • David J.C. MacKay, Information Theory,
    Probability and Neural Networks

4
Entropy
  • A measure of information content or uncertainty
    (see the formula sketch below)
  • H(X) ≥ 0, with equality iff p_i = 1 for one i
  • H(X) is maximal when p_i is the same for every i
    (uniform distribution)
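
For reference, the standard Shannon definition these bullets rely on,
written out in LaTeX since the formula itself is not reproduced in the
transcript (base-2 logarithms, matching Hmax = log2(11) = 3.459 on the
later slides):

    H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i ,
    \qquad 0 \le H(X) \le \log_2 n ,

with H(X) = 0 iff p_i = 1 for one i, and H(X) = \log_2 n iff p_i = 1/n
for every i.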

5
Calculating entropy
6
Calculating entropy
BIN 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
X 2 4 7 5 3 4 4 4 3 3 1
P(X) 0.05 0.1 0.175 0.125 0.075 0.100 0.100 0.100 0.075 0.075 0.025
BIN 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Y 1 1 2 2 2 1 7 7 9 5 3
P(Y) 0.025 0.025 0.05 0.05 0.05 0.025 0.175 0.175 0.225 0.125 0.075
MEASURE ACTUAL NORMALIZED
Hmax(X) = Hmax(Y) 3.459 1
H(X) 3.334 0.964
H(Y) 3.067 0.887
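
The figures above can be reproduced from the bin counts alone. Below is a
minimal Python sketch, not part of the original slides; it assumes base-2
logarithms and 40 records per signal, which is consistent with
Hmax = log2(11) = 3.459 and the quoted entropies.

    from math import log2

    # Bin counts from the slide: 11 bins, 40 records per signal.
    x_counts = [2, 4, 7, 5, 3, 4, 4, 4, 3, 3, 1]
    y_counts = [1, 1, 2, 2, 2, 1, 7, 7, 9, 5, 3]

    def entropy(counts):
        """Shannon entropy (in bits) of the distribution given by bin counts."""
        n = sum(counts)
        return -sum((c / n) * log2(c / n) for c in counts if c > 0)

    h_max = log2(len(x_counts))                      # log2(11) = 3.459
    h_x, h_y = entropy(x_counts), entropy(y_counts)  # 3.334, 3.067

    print(f"Hmax = {h_max:.3f}")
    print(f"H(X) = {h_x:.3f}, normalized {h_x / h_max:.3f}")  # 3.334, 0.964
    print(f"H(Y) = {h_y:.3f}, normalized {h_y / h_max:.3f}")  # 3.067, 0.887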
7
Joint and conditional entropies and mutual
information
  • Joint entropy H(X,Y) describes the information
    content of the data set as a whole
  • Conditional entropy H(X|Y) measures the average
    uncertainty that remains about X when Y is known
  • Mutual information I(X;Y) = H(X) - H(X|Y) measures
    the amount of information that Y conveys about X,
    or vice versa

8
Calculating conditional entropy
BIN 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
P(y) 0.025 0.025 0.05 0.05 0.05 0.025 0.175 0.175 0.225 0.125 0.075
P(x|y) 1 1 0.5 0.5 0.5 1 0.143 0.143 0.111 0.2 0.333
BIN 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
P(x) 0.05 0.1 0.175 0.125 0.075 0.1 0.1 0.1 0.075 0.075 0.025
P(y|x) 0.5 0.25 0.143 0.2 0.333 0.25 0.25 0.25 0.333 0.333 1
MEASURE ACTUAL NORMALIZED
H(X,Y) 5.322 1
H(X|Y) 2.255 0.676
H(Y|X) 1.988 0.648
I(X;Y) 1.079 0.3518
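
A Python sketch reproducing the table above (again, not from the original
slides). Since H(X,Y) = 5.322 = log2(40) and each P(x|y) entry equals
1/(count in that y bin), the slide evidently treats every one of the 40
records as a distinct (x, y) pair; that assumption is what the code
encodes, and the normalizing denominators H(X) and H(Y) are inferred from
the normalized column.

    from math import log2

    x_counts = [2, 4, 7, 5, 3, 4, 4, 4, 3, 3, 1]   # per-bin counts of X (N = 40)
    y_counts = [1, 1, 2, 2, 2, 1, 7, 7, 9, 5, 3]   # per-bin counts of Y (N = 40)
    N = sum(x_counts)

    def entropy(counts):
        n = sum(counts)
        return -sum((c / n) * log2(c / n) for c in counts if c > 0)

    h_x, h_y = entropy(x_counts), entropy(y_counts)   # 3.334, 3.067

    # Assumption: the joint distribution is uniform over the N records, so
    # H(X,Y) = log2(N) and, within a y bin of c records, P(x|y) = 1/c.
    h_xy = log2(N)                                                    # 5.322
    h_x_given_y = sum((c / N) * log2(c) for c in y_counts if c > 0)   # 2.255
    h_y_given_x = sum((c / N) * log2(c) for c in x_counts if c > 0)   # 1.988
    i_xy = h_x - h_x_given_y                    # 1.079 (= h_y - h_y_given_x)

    print(f"H(X,Y) = {h_xy:.3f}")
    print(f"H(X|Y) = {h_x_given_y:.3f}, normalized {h_x_given_y / h_x:.3f}")  # 0.676
    print(f"H(Y|X) = {h_y_given_x:.3f}, normalized {h_y_given_x / h_y:.3f}")  # 0.648
    print(f"I(X;Y) = {i_xy:.3f}, normalized {i_xy / h_y:.3f}")                # 0.352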
9
Relationships of entropies
(Figure: diagram relating H(X,Y), H(X), H(Y), H(X|Y), I(X;Y), and H(Y|X))
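
The relationships the diagram depicts can be written out explicitly
(standard identities, consistent with the numbers on the previous slide):

    H(X,Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y) ,
    \qquad I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)
                  = H(X) + H(Y) - H(X,Y) .

Numerically, 3.334 + 1.988 = 3.067 + 2.255 = 5.322 and
3.334 - 2.255 = 3.067 - 1.988 = 1.079.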
10
Features
  • Entropies calculated from raw input and output
    signal states
  • Signal H(X), H(Y): indicates how much entropy there
    is in a single input/output signal of the data set,
    without regard to the output/input signal(s);
    ratio sH/sHmax

11
Features
  • Channel H(X), H(Y): measures the average
    information per signal at the input/output of the
    communication channel; ratio cH/sHmax
  • Channel H(X|Y), H(Y|X): the reverse/forward entropy
    measures how much uncertainty remains about the
    input/output when the output/input is known;
    ratio cH(X|Y)/sHmax, cH(Y|X)/sHmax

12
Features
  • Channel H(X,Y): the average uncertainty over the
    data set as a whole; ratio cH(X,Y)/cH(X)cH(Y)
  • Channel I(X;Y): the amount of mutual information
    between input and output; ratio cI(X;Y)/cH(Y)
    (see the sketch after this list)
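
A minimal Python sketch of how the listed features might be assembled into
normalized ratios. It is not from the original slides; the function name
and the exact denominators are one reading of the ratio expressions quoted
above, and the cH(X,Y) ratio is omitted because its denominator is unclear
in the transcript.

    from math import log2

    def survey_features(s_h_x, s_h_y, c_h_x, c_h_y,
                        c_h_x_given_y, c_h_y_given_x, c_i_xy, n_bins):
        """Normalized data-survey features built from signal (s) and
        channel (c) entropies, all in bits; sHmax = log2(n_bins)."""
        s_h_max = log2(n_bins)
        return {
            "signal H(X) / sHmax":    s_h_x / s_h_max,
            "signal H(Y) / sHmax":    s_h_y / s_h_max,
            "channel H(X) / sHmax":   c_h_x / s_h_max,
            "channel H(Y) / sHmax":   c_h_y / s_h_max,
            "channel H(X|Y) / sHmax": c_h_x_given_y / s_h_max,
            "channel H(Y|X) / sHmax": c_h_y_given_x / s_h_max,
            "channel I(X;Y) / cH(Y)": c_i_xy / c_h_y,
        }

    # Example with the entropies from slides 6 and 8
    # (signal and channel entropies set equal here for illustration).
    for name, value in survey_features(3.334, 3.067, 3.334, 3.067,
                                       2.255, 1.988, 1.079, n_bins=11).items():
        print(f"{name}: {value:.3f}")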

13
Case 1: CARS
  • 8 variables describing different car properties
    (brand, weight, cubic inch size, production year, etc.)
  • Three subtasks: predicting origin, brand, and weight

14
Case 1: CARS
15
Case 1: CARS
16
Case 1: CARS
17
Case 1: CARS
  • Entropic analysis confirmed a number of
    intuitions about the data that would be difficult
    to obtain by other means
  • Only a simple model is needed

18
Case 1: CARS
19
Case 1: CARS
20
Case 1: CARS
21
Case 1: CARS
  • Requires a complex model, and even then the
    prediction cannot be made with complete certainty
  • Different brands can be predicted with different
    levels of certainty

22
Case 1: CARS
23
Case 1: CARS
24
Case 1: CARS
25
Case 1: CARS
  • Some form of generalized model has to be built
  • The survey provides the information needed for
    designing the model

26
Case 2: CREDIT
  • The data included information from a credit card
    survey
  • The objective was to build an effective credit card
    solicitation program

27
Case 2: CREDIT
28
Case 2: CREDIT
29
Case 2: CREDIT
30
Case 2: CREDIT
31
Case 2: CREDIT
32
Case 2: CREDIT
33
Case 2: CREDIT
  • It was possible to determine that a model good
    enough to solve the problem could be built
  • Such a model would have to be rather complex, even
    with the balanced data set

34
Case 3: SHOE
  • The data described the buying behaviour of a running
    shoe manufacturer's customers
  • The objective was to predict and target customers
    who fit the profile of potential members of the
    buyers' program

35
Case 3: SHOE
36
Case 3: SHOE
37
Case 3: SHOE
  • A moderately good, but quite complex, model could
    be built
  • Not a useful predictor in the real world, because
    new shoe styles are introduced so frequently