Decision Making with Uncertainty and Data Mining (PowerPoint Presentation)

1
Decision Making with Uncertainty and Data Mining

David L. Olson, University of Nebraska
Desheng Wu, University of Science and Technology of China
ADMA 2005, Wuhan, China, 22-24 July 2005

2
Decision Making under Uncertainty

Uncertainty exists in data
Imprecise data
Missing data
Human subjectivity
Fuzzy set theory
A means to reflect uncertainty
Grey related analysis (interval and vague data)
A type of fuzzy set data

3
Monte Carlo Simulation

Analytic models preferred
But simulation needed if
High levels of uncertainty make analytic models too messy to calculate
High levels of complexity make analytic models intractable

4
Fuzzy Simulation

Fuzzy input often expressed in trapezoidal form
Minimum, range of most likely values, maximum
Triangular and interval forms are special cases
Can be analyzed through Monte Carlo
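The sketch below shows one way a trapezoidal input could feed a Monte Carlo run. It draws values from a trapezoid treated as a probability density that rises on [a, b], stays flat on [b, c], and falls on [c, d]. Reading the membership shape as a density is an assumption made here for illustration, and the names `sample_trapezoidal`, `a`, `b`, `c`, `d` are hypothetical. Triangular inputs (b = c) and interval inputs (a = b, c = d) come out as special cases.

```python
import math
import random


def sample_trapezoidal(a, b, c, d, rng=random):
    """Draw one value from a trapezoidal density on [a, d] with a flat top on [b, c].

    Assumes a <= b <= c <= d. Triangular (b == c) and interval
    (a == b and c == d, i.e. uniform) inputs are special cases.
    """
    if d == a:                                # degenerate: a crisp number
        return a
    h = 2.0 / ((d - a) + (c - b))             # plateau height that makes total area 1
    f_b = h * (b - a) / 2.0                   # probability mass below b
    f_c = f_b + h * (c - b)                   # probability mass below c
    u = rng.random()
    if u < f_b:                               # rising edge: invert the quadratic CDF
        return a + math.sqrt(2.0 * u * (b - a) / h)
    if u <= f_c:                              # flat top: the CDF is linear here
        return b + (u - f_b) / h
    return d - math.sqrt(2.0 * (1.0 - u) * (d - c) / h)  # falling edge
```

For example, `sample_trapezoidal(0.4, 0.5, 0.7, 0.8, random.Random(7))` draws a value for a weight whose most likely range is 0.5 to 0.7. Sampling by inverting the CDF needs only one uniform draw per value, which keeps each simulation run cheap.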

5
Fuzzy Distribution Forms

[Figures: trapezoidal, triangular, and interval distribution forms]
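All three forms can be written as one membership function over a fuzzy number (a, b, c, d). The grade is 0 outside [a, d], rises linearly on [a, b], equals 1 on [b, c], and falls linearly on [c, d]. A triangle sets b = c, and an interval sets a = b and c = d. This is the standard trapezoidal membership definition; the function names are illustrative.

```python
def trapezoidal_membership(x, a, b, c, d):
    """Membership grade of x in the trapezoidal fuzzy number (a, b, c, d), a <= b <= c <= d."""
    if x < a or x > d:
        return 0.0                    # outside the support
    if b <= x <= c:
        return 1.0                    # core: the "most likely" range
    if x < b:
        return (x - a) / (b - a)      # rising edge (here a <= x < b, so b > a)
    return (d - x) / (d - c)          # falling edge (here c < x <= d, so d > c)


def triangular_membership(x, a, m, d):
    return trapezoidal_membership(x, a, m, m, d)      # single most likely value m


def interval_membership(x, lo, hi):
    return trapezoidal_membership(x, lo, lo, hi, hi)  # 1 inside [lo, hi], 0 outside
```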

6
Grey Related Analysis

Deng 1982
Means to incorporate uncertainty
Incomplete or unknown elements
Interval numbers
Standardize through norms
Transform index values through product operations
Minimize distance to ideal, maximize distance from nadir
Simple, practical
Doesn't require large sample sizes; nonparametric
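The sketch below illustrates the grey relational idea with Deng's classic grey relational grade, computed against an ideal reference using distinguishing coefficient rho = 0.5. The steps are:

1. Normalize each criterion to [0, 1] so that 1 is always best.
2. Measure each alternative's deviation from the all-ones reference.
3. Turn each deviation into a coefficient and take a weighted average.

The sketch works on crisp numbers, such as one sampled draw from interval data. It is the textbook formulation, not necessarily the exact interval model used in this presentation, and the function and parameter names are hypothetical.

```python
def grey_relational_grades(matrix, weights, benefit, rho=0.5):
    """Deng-style grey relational grade of each row against an ideal reference.

    matrix:  rows = alternatives, columns = criteria (crisp numbers)
    weights: one weight per criterion (normalized here to sum to 1)
    benefit: True if larger is better on that criterion, False if smaller is better
    """
    n_crit = len(weights)
    # 1. Normalize each column to [0, 1] so that 1 is always best.
    norm = [[0.0] * n_crit for _ in matrix]
    for k in range(n_crit):
        col = [row[k] for row in matrix]
        lo, hi = min(col), max(col)
        span = hi - lo
        for i, x in enumerate(col):
            if span == 0:
                norm[i][k] = 1.0          # every alternative ties on this criterion
            elif benefit[k]:
                norm[i][k] = (x - lo) / span
            else:
                norm[i][k] = (hi - x) / span
    # 2. Absolute deviation from the ideal reference sequence (all ones).
    deltas = [[abs(1.0 - v) for v in row] for row in norm]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:                        # every alternative is ideal everywhere
        return [1.0] * len(matrix)
    # 3. Grey relational coefficients, then a weighted average per alternative.
    total_w = sum(weights)
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(w * c for w, c in zip(weights, coeffs)) / total_w)
    return grades
```

A grade of 1 means the alternative coincides with the ideal on every criterion. Smaller grades mean it lies farther from the ideal.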

7
Demonstration: MCDM

Multiple Criteria Decision Making
Modern decision making is complex
Need to balance tradeoffs among conflicting criteria (attributes, objectives, goals)
Fuzzy MCDM
Alternative scores on each criterion are uncertain
Criterion weights vary across group members

8
Implementations of Fuzzy Multiattribute Idea

Fuzzy theory
Dubois & Prade 1980
Rough sets
Pawlak 1982
Grey sets
Interval analysis: Moore 1966, 1979
Deng 1982
Vague sets: Gau & Buehrer 1993
Probability theory
Pearl 1988

9
PROMETHEE

J.P. Brans, P. Vincke, B. Mareschal (Belgium)
basically a workable ELECTRE
PROMETHEE I: partial order
PROMETHEE II: full ranking
GAIA: graphical (concordance analysis)
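Below is a minimal sketch of PROMETHEE II's complete ranking. For each ordered pair of alternatives, it aggregates the weighted preference of one over the other. It then ranks alternatives by net flow, which is positive flow minus negative flow. The usual criterion (Type I) serves as the preference function here; the function names and the data in the check below are hypothetical.

```python
def promethee_ii(scores, weights, maximize, pref):
    """Net outranking flow of each alternative (PROMETHEE II complete ranking).

    scores[i][k]: performance of alternative i on criterion k
    weights[k]:   criterion weight (normalized here to sum to 1)
    maximize[k]:  True if larger is better on criterion k
    pref:         preference function mapping a difference d to [0, 1]
    """
    n = len(scores)
    total_w = sum(weights)

    def pi(a, b):                          # aggregated preference of a over b
        s = 0.0
        for k, w in enumerate(weights):
            d = scores[a][k] - scores[b][k]
            if not maximize[k]:            # for "minimize" criteria, smaller is better
                d = -d
            s += w * pref(d)
        return s / total_w

    phi = []
    for a in range(n):
        plus = sum(pi(a, b) for b in range(n) if b != a) / (n - 1)   # positive flow
        minus = sum(pi(b, a) for b in range(n) if b != a) / (n - 1)  # negative flow
        phi.append(plus - minus)
    return phi


def usual(d):                              # Type I: 1 if strictly better, else 0
    return 1.0 if d > 0 else 0.0
```

For instance, `promethee_ii([[3, 5], [2, 4], [1, 1]], [0.5, 0.5], [True, True], usual)` yields net flows of 1, 0, and -1, which ranks the first alternative best. Net flows always sum to zero. PROMETHEE I would instead keep the positive and negative flows separate, which gives only a partial order.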

10
Criteria Scales

I - 0 if indifferent or worse, 1 if better
II - 0 if not better by more than parameter q, 1 if better by more than q
III - d is the degree to which one alternative is better than another
0 if not better by parameter q
d/p if between q and p, 1 if d > p
IV - step: 0 if d < q, 0.5 if q < d < p, 1 if d > p
V - slope
VI - normal
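The six criterion types can be sketched as preference functions of the difference d. These follow the standard Brans-Vincke textbook forms, which is an assumption on two points. The textbook Type III has no indifference threshold q and returns d/p for 0 < d <= p, which differs slightly from the slide's wording. Type V is written here as (d - q)/(p - q). The Gaussian spread s is likewise an assumed parameter name.

```python
import math

# Standard PROMETHEE preference functions P(d) for a difference d
# (d > 0 means the first alternative is better). q = indifference threshold,
# p = strict-preference threshold (q < p), s = Gaussian spread.

def type_i(d):                 # usual criterion
    return 1.0 if d > 0 else 0.0

def type_ii(d, q):             # quasi-criterion: a step at q
    return 1.0 if d > q else 0.0

def type_iii(d, p):            # linear preference up to p, no indifference zone
    if d <= 0:
        return 0.0
    return min(d / p, 1.0)

def type_iv(d, q, p):          # level criterion: 0, then 0.5, then 1
    if d <= q:
        return 0.0
    return 0.5 if d <= p else 1.0

def type_v(d, q, p):           # linear between q and p (the "slope" form)
    if d <= q:
        return 0.0
    return min((d - q) / (p - q), 1.0)

def type_vi(d, s):             # Gaussian ("normal") criterion
    if d <= 0:
        return 0.0
    return 1.0 - math.exp(-d * d / (2.0 * s * s))
```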

11
PROMETHEE Criteria

Type II: interval
Type III: triangular
Type V: trapezoidal
PROMETHEE doesn't use a value function
But it demonstrates the incorporation of fuzzy input into MCDM

12
Demo Model

Group Decision
Conservative, Liberal, Business
Energy Options
S1 Nuclear
S2 Coal
S3 Conservation
S4 Import
Criteria
C1 Cost (minimize)
C2 Pollution (minimize)
C3 Risk of catastrophe (minimize)
C4 Energy Independence (maximize)

13
Weights for Each Group Member: Trapezoidal (Grey Related)
Criteria: C1 Cost, C2 Pollution, C3 Risk, C4 Independence
Conservative: C1 (0.4, 0.5, 0.7, 0.8); C2 (0, 0, 0.05, 0.1); C3 (0, 0.03, 0.05, 0.15); C4 (0.05, 0.1, 0.15, 0.25)
Liberal: C1 (0.05, 0.1, 0.15, 0.2); C2 (0.2, 0.4, 0.5, 0.6); C3 (0.2, 0.3, 0.4, 0.6); C4 (0.03, 0.05, 0.1, 0.15)
Business: C1 (0.25, 0.27, 0.29, 0.3); C2 (0.12, 0.15, 0.2, 0.25); C3 (0.16, 0.2, 0.25, 0.3); C4 (0.25, 0.3, 0.35, 0.4)

14
Cost Scores for Each Group Member: Trapezoidal
Alternatives: S11 Nuclear, S12 Coal, S13 Conserve, S14 Import
Conservative: S11 (0, 0.05, 0.1, 0.2); S12 (0.3, 0.4, 0.5, 0.7); S13 (0.6, 0.75, 0.85, 0.9); S14 (0.6, 0.7, 0.75, 0.8)
Liberal: S11 (0.3, 0.5, 0.6, 0.8); S12 (0.5, 0.6, 0.7, 0.9); S13 (0.6, 0.7, 0.85, 0.95); S14 (0.6, 0.7, 0.8, 0.9)
Business: S11 (0.4, 0.5, 0.6, 0.7); S12 (0.7, 0.75, 0.85, 0.9); S13 (0.8, 0.9, 0.95, 1.0); S14 (0.75, 0.8, 0.85, 0.9)

15
Method: Wu, Olson, Liang

Use grey related analysis
Inputs are uncertain
Use the alpha-cut method to convert trapezoidal numbers into intervals
Simulate
Very complex preference model
Know the distribution of uncertainty
Possibility that different alternatives may turn out to be preferred
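The alpha-cut of a trapezoidal fuzzy number (a, b, c, d) is the interval of values whose membership is at least alpha: [a + alpha(b - a), d - alpha(d - c)]. The slides do not say which alpha level was used, so alpha stays a parameter in this sketch. `alpha_cut` is an illustrative name.

```python
def alpha_cut(a, b, c, d, alpha):
    """Interval (lower, upper) of the trapezoidal fuzzy number (a, b, c, d) at level alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Walk up the rising edge from a toward b, and down the falling edge from d toward c.
    return a + alpha * (b - a), d - alpha * (d - c)
```

Take the Conservative weight on C1, (0.4, 0.5, 0.7, 0.8), as an example. Alpha = 0 gives the full support [0.4, 0.8], alpha = 1 gives the core [0.5, 0.7], and alpha = 0.5 gives [0.45, 0.75].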

16
Simulation Output
Group          Nuclear  Coal   Conserve  Import
Conservative   0        0      0.99      0.01
Liberal        0.07     0.34   0.39      0.20
Business       0.24     0.36   0.19      0.21
Consensus      0.02     0.18   0.47      0.07

17
Monte Carlo Simulation of Grey Related Data

Given interval data
Draw a uniform random number
Take the value that proportion of the way from minimum to maximum
Do this for every interval number
These become crisp numbers for this sample
Calculate outcomes (value)
Get a probabilistic picture of outcomes in a complex system involving uncertainty (grey related intervals)
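The sampling steps above can be sketched as follows. Each interval becomes a crisp value lo + u(hi - lo), with u drawn uniformly from [0, 1), and each run tallies the winning alternative. To keep the sketch short, a run's outcome is a plain weighted sum. That scoring rule is an assumption standing in for whatever model the system uses, and all names are hypothetical.

```python
import random
from collections import Counter


def crisp_sample(interval, rng):
    """Turn an interval (lo, hi) into one crisp value lo + u * (hi - lo), u ~ U[0, 1)."""
    lo, hi = interval
    return lo + rng.random() * (hi - lo)


def simulate_best(scores, weights, runs=1000, seed=0):
    """Share of runs in which each alternative has the highest outcome.

    scores[name] = list of (lo, hi) intervals, one per criterion
    weights      = list of (lo, hi) intervals, one per criterion
    Outcome per run = weighted sum of crisp samples (an assumed scoring rule).
    """
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(runs):
        w = [crisp_sample(iv, rng) for iv in weights]        # fresh crisp weights
        totals = {name: sum(wk * crisp_sample(iv, rng) for wk, iv in zip(w, ivs))
                  for name, ivs in scores.items()}           # fresh crisp scores
        wins[max(totals, key=totals.get)] += 1
    return {name: wins[name] / runs for name in scores}
```

As an example, suppose the sketch gets only the C1 and C2 intervals for Antonio and Fabio from the hiring example, plus the matching weight intervals. Antonio then wins every run, because his lowest possible scores exceed Fabio's highest on both criteria.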

18
Demonstration: Olson & Wu

Hiring decision
Multiple criteria, six applicants
Criteria
C1 Experience in business
C2 Experience in function
C3 Education
C4 Leadership
C5 Adaptability
C6 Age
C7 Aptitude for Teamwork

19
Alternative Performance Matrix
Applicant  C1-bus    C2-funct  C3-educ   C4-lead   C5-adapt  C6-age    C7-team
Antonio    .65-.85   .75-.95   .25-.45   .45-.85   .05-.45   .45-.75   .75-1.0
Fabio      .25-.45   .05-.25   .65-.85   .30-.65   .30-.75   .05-.25   .05-.45
Alberto    .45-.65   .20-.80   .65-.85   .50-.80   .35-.90   .20-.45   .75-1.0
Fernand    .85-1.0   .35-.75   .65-.85   .15-.65   .30-.70   .45-.80   .35-.70
Isabel     .50-.95   .65-.95   .45-.65   .65-.95   .05-.50   .45-.80   .50-.90
Rafaela    .65-.85   .15-.35   .45-.65   .25-.75   .05-.45   .45-.80   .10-.55

20
Grey Related Weights
Criteria Weights
C1 Experience-Business 0.20-0.35
C2 Experience-Job Function 0.30-0.55
C3 Educational Background 0.05-0.30
C4 Leadership Capacity 0.25-0.50
C5 Adaptability 0.15-0.45
C6 Age 0.05-0.30
C7 Aptitude for Teamwork 0.25-0.55

21
Grey Related data

Weights: intervals
Scores: intervals
Used the grey related model to identify the best alternative in each simulation run
Best = best average weighted distance to the reference point
Reflects both minimum distance to ideal and maximum distance from nadir
Ran 1,000 replications for each of 10 seeds
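The sketch below shows one way to reproduce the shape of this experiment. It uses the score intervals from the Alternative Performance Matrix and the grey related weight intervals, with 1,000 runs per seed and 10 seeds.

Two assumptions drive the per-run scoring:

- The slides do not give the exact grey related formula, so scoring is swapped for a TOPSIS-style relative closeness: weighted Euclidean distance to the ideal and to the nadir.
- Every criterion, including age, is taken to be scored so that higher is better.

Because of these substitutions, the output is not expected to match the probabilities reported in this presentation. All function names are hypothetical.

```python
import math
import random
from collections import Counter

# Score intervals (lo, hi) for C1..C7, copied from the Alternative Performance Matrix.
SCORES = {
    "Antonio": [(.65, .85), (.75, .95), (.25, .45), (.45, .85), (.05, .45), (.45, .75), (.75, 1.0)],
    "Fabio":   [(.25, .45), (.05, .25), (.65, .85), (.30, .65), (.30, .75), (.05, .25), (.05, .45)],
    "Alberto": [(.45, .65), (.20, .80), (.65, .85), (.50, .80), (.35, .90), (.20, .45), (.75, 1.0)],
    "Fernand": [(.85, 1.0), (.35, .75), (.65, .85), (.15, .65), (.30, .70), (.45, .80), (.35, .70)],
    "Isabel":  [(.50, .95), (.65, .95), (.45, .65), (.65, .95), (.05, .50), (.45, .80), (.50, .90)],
    "Rafaela": [(.65, .85), (.15, .35), (.45, .65), (.25, .75), (.05, .45), (.45, .80), (.10, .55)],
}
# Weight intervals (lo, hi) for C1..C7, copied from the Grey Related Weights slide.
WEIGHTS = [(.20, .35), (.30, .55), (.05, .30), (.25, .50), (.15, .45), (.05, .30), (.25, .55)]


def draw(interval, rng):
    lo, hi = interval
    return lo + rng.random() * (hi - lo)          # uniform point inside the interval


def best_in_run(rng):
    w = [draw(iv, rng) for iv in WEIGHTS]
    total = sum(w)
    w = [x / total for x in w]                    # normalize sampled weights to sum to 1
    crisp = {name: [draw(iv, rng) for iv in ivs] for name, ivs in SCORES.items()}
    ks = range(len(WEIGHTS))
    ideal = [max(v[k] for v in crisp.values()) for k in ks]   # best sampled value per criterion
    nadir = [min(v[k] for v in crisp.values()) for k in ks]   # worst sampled value per criterion

    def closeness(v):
        d_ideal = math.sqrt(sum((w[k] * (v[k] - ideal[k])) ** 2 for k in ks))
        d_nadir = math.sqrt(sum((w[k] * (v[k] - nadir[k])) ** 2 for k in ks))
        return d_nadir / (d_ideal + d_nadir)      # 1 = at the ideal, 0 = at the nadir

    return max(crisp, key=lambda name: closeness(crisp[name]))


def probabilities_of_best(runs=1000, seeds=range(10)):
    wins, total_runs = Counter(), 0
    for seed in seeds:
        rng = random.Random(seed)                 # one stream per seed
        for _ in range(runs):
            wins[best_in_run(rng)] += 1
            total_runs += 1
    return {name: wins[name] / total_runs for name in SCORES}
```

Calling `probabilities_of_best()` returns each applicant's share of the 10,000 runs. Running it per seed instead would give min and max values across seeds, like those tabulated in the presentation.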

22
Probabilities of Best
                  Antonio  Fabio  Alberto  Fernand  Isabel  Rafaela
Crisp grey        -        -      -        -        X       -
Interval avg      0.358    0      0.189    0.047    0.410   0
Interval min      0.336    0      0.168    0.040    0.384   0
Interval max      0.393    0      0.210    0.053    0.429   0
Trapezoidal avg   0.354    0      0.189    0.044    0.409   0
Trapezoidal min   0.328    0      0.171    0.035    0.382   0
Trapezoidal max   0.381    0      0.206    0.051    0.424   0

23
Implications

Crisp Grey Related
Isabel is the best choice
Antonio very close
Alberto, Fernando not far back
SIMULATION
Isabel's probability of being best is 0.41
Antonio 0.35, Alberto 0.19, Fernando 0.05
Fabio and Rafaela never won
Simulation provides a better picture

24
Simulation of Grey Related Data in Data Mining

Decision tree analysis (PolyAnalyst)
Real credit card data
1,000 observations (900 training, 100 test)
140 default, 860 no problem
65 available explanatory variables (26 used)
Due to class imbalance, initial models were degenerate
Called all test cases OK
Differential cost models were also degenerate
Called all test cases default

25
Fuzzified Data

Of 26 explanatory variables
5 binary
1 categorical
20 continuous
Fuzzified into 3 categories each
Case by case, roughly equally sized categories
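Equal-frequency (tertile) binning is a simple way to get roughly equally sized low, mid, and high categories. The presentation says the cut points were chosen case by case, so the automatic rule in this sketch is an assumption. Tied values can also make the groups unequal. The function name is illustrative.

```python
def tertile_categories(values):
    """Map each continuous value to 'low', 'mid', or 'high' using equal-frequency cut points."""
    ranked = sorted(values)
    n = len(ranked)
    cut1 = ranked[n // 3]            # roughly the 1/3 quantile
    cut2 = ranked[(2 * n) // 3]      # roughly the 2/3 quantile
    labels = []
    for v in values:
        if v < cut1:
            labels.append("low")
        elif v < cut2:
            labels.append("mid")
        else:
            labels.append("high")
    return labels
```

For example, the values 0 through 8 split into three groups of three.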

26
Decision Tree Models

Minimum support: 1
PolyAnalyst allowed
Optimistic split of criteria
Pessimistic split of criteria
A different decision tree model on each run

27
Continuous Data Output

Varied degree of perturbation (uncertainty)
Continuous Data
Many models overlapping
Three unique decision trees
Used a total of 8 explanatory variables
Categorical Data
Four unique decision trees
Used a total of 7 explanatory variables

28
Continuous Model 1

Bal/Pay ratio < 6.44: NO
Bal/Pay ratio ≥ 6.44:
    Utilization < 1.54: Default
    Utilization ≥ 1.54:
        AvgPayment < 3.91: NO
        AvgPayment ≥ 3.91: Default
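Continuous Model 1 can be read as nested if/else rules, as in the sketch below. It assumes the inputs are on whatever scale the tree was trained on, since the slides give no units. It also assumes a value exactly at a threshold goes to the ≥ branch. The function and argument names are illustrative.

```python
def continuous_model_1(bal_pay_ratio, utilization, avg_payment):
    """Continuous Model 1 as nested rules; returns 'Default' or 'NO' (no default)."""
    if bal_pay_ratio < 6.44:
        return "NO"
    if utilization < 1.54:                         # here bal_pay_ratio >= 6.44
        return "Default"
    return "NO" if avg_payment < 3.91 else "Default"   # utilization >= 1.54
```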

29
Continuous Model 2

Bal/Pay ratio < 6.44: NO
Bal/Pay ratio ≥ 6.44: Default

30
Continuous Model 3

Bal/Pay ratio < 6.44: NO
Bal/Pay ratio ≥ 6.44:
    Utilization < 1.54: Default
    Utilization ≥ 1.54:
        AvgRevolvePay < 2.28: Default
        AvgRevolvePay ≥ 2.28: NO

31
Categorical Model 1

Bal/Pay ratio high:
    CreditLine high:
        CalcIntRate I mid: NO
        CalcIntRate I NOT mid: Default
    CreditLine NOT high: Default
Bal/Pay ratio NOT high: NO

32
Categorical Model 2

Bal/Pay ratio high:
    CreditLine low:
        ChangeLine mid:
            PurchBal low: Default
            PurchBal NOT low: NO
        ChangeLine low: NO
        ChangeLine high: Default
    CreditLine high:
        CalcIntRate I mid: NO
        CalcIntRate I NOT mid: Default
    CreditLine mid: Default
Bal/Pay ratio NOT high: NO

33
Categorical Model 3

Bal/Pay ratio high: Default
Bal/Pay ratio NOT high: NO

34
Categorical Model 4

Bal/Pay ratio high
CreditLine low
ChangeLine mid
PurchBal low Default
PurchBal NOT low NO
ChangeLine low NO
Residence 0 Default
Residence 1 or 2 NO
ChangeLine high Default
CreditLine high
CalcIntRate I mid NO
CalcIntRate I NOT mid Default
CreditLine mid Default
Bal/Pay ratio NOT high NO

35
Continuous 1 Coincidence Matrix
            Model 0   Model 1   Total
Actual 0    43        16        59
Actual 1    14        27        41
Total       57        43        100
Accuracy = (43 + 27) / 100 = 0.70

36
Simulation Output: Continuous 1 (Crystal Ball, test-set accuracy)

37
Continuous 1

Simulated accuracy on 100 test observations, 1,000 simulation runs
Perturbation   Accuracy range
±0.25          0.67-0.73
±0.50          0.65-0.74
±1             0.62-0.75
±2             0.58-0.74
±3             0.57-0.74
±4             0.56-0.75
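The perturbation experiment can be sketched in three steps:

1. Add uniform noise in [-h, h] to every input of each test record.
2. Classify the noisy record with the tree.
3. Record accuracy, the share of the coincidence matrix on its diagonal, and repeat over many runs.

Applying same-width noise to every variable is an assumption about how the perturbation was done. Function names are hypothetical, and the tree is restated here so the block runs on its own.

```python
import random


def continuous_model_1(bal_pay_ratio, utilization, avg_payment):
    # Same rules as Continuous Model 1, coded as 1 = Default, 0 = no default.
    if bal_pay_ratio < 6.44:
        return 0
    if utilization < 1.54:
        return 1
    return 0 if avg_payment < 3.91 else 1


def perturbed_accuracy(records, labels, model, half_width, rng):
    """Share of records classified correctly after adding U(-h, h) noise to each input."""
    correct = 0
    for x, y in zip(records, labels):
        noisy = [v + rng.uniform(-half_width, half_width) for v in x]
        if model(*noisy) == y:
            correct += 1
    return correct / len(records)


def accuracy_range(records, labels, model, half_width, runs=1000, seed=0):
    """Lowest and highest accuracy observed across the simulation runs."""
    rng = random.Random(seed)
    accs = [perturbed_accuracy(records, labels, model, half_width, rng) for _ in range(runs)]
    return min(accs), max(accs)
```

With half-width 0, every run reproduces the crisp accuracy. For four hypothetical test records on which the crisp tree gets three right, that accuracy is 0.75. Widening h spreads the range in the same way the table above shows.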

38
Mean Model Accuracy Measured on Test Set
Model  Crisp  ±0.25  ±0.50  ±1.00  ±2.00  ±3.00  ±4.00
Con1   0.70   0.70   0.70   0.68   0.67   0.66   0.65
Con2   0.67   0.67   0.67   0.67   0.67   0.66   0.66
Con3   0.71   0.71   0.70   0.69   0.67   0.67   0.66
CON    0.693  0.693  0.690  0.680  0.670  0.667  0.657
Cat1   0.70   0.70   0.68   0.67   0.66   0.66   0.65
Cat2   0.70   0.70   0.70   0.69   0.68   0.67   0.67
Cat3   0.70   0.70   0.70   0.69   0.69   0.68   0.67
Cat4   0.70   0.70   0.70   0.69   0.68   0.67   0.67
CAT    0.700  0.700  0.700  0.688  0.678  0.670  0.665

39
Inferences