Title: Introduction to Bayesian Belief Nets
1Introduction to Bayesian Belief Nets
- Russ Greiner
- Dept of Computing Science
- Alberta Ingenuity Centre for Machine Learning
- University of Alberta
- http://www.cs.ualberta.ca/greiner/bn.html
2(Timeline figure: 1980, 1990, 1996)
4Motivation
- Gates says (LA Times, 28 Oct 1996):
- Microsoft's competitive advantage is its expertise in Bayesian networks
- Current Products
- Microsoft Pregnancy and Child Care (MSN)
- Answer Wizard (Office, ...)
- Print Troubleshooter
- Excel Workbook Troubleshooter
- Office 95 Setup Media Troubleshooter
- Windows NT 4.0 Video Troubleshooter
- Word Mail Merge Troubleshooter
5Motivation (II)
- US Army: SAIP (Battalion Detection from SAR, IR, ... Gulf War)
- NASA: Vista (DSS for Space Shuttle)
- GE: Gems (real-time monitor for utility generators)
- Intel: (infer possible processing problems from end-of-line tests on semiconductor chips)
- KIC:
- medical: sleep disorders, pathology, trauma care, hand and wrist evaluations, dermatology, home-based health evaluations
- DSS for capital equipment: locomotives, gas-turbine engines, office equipment
6Motivation (III)
- Lymph-node pathology diagnosis
- Manufacturing control
- Software diagnosis
- Information retrieval
- Types of tasks
- Classification/Regression
- Sensor Fusion
- Prediction/Forecasting
7Outline
- Existing uses of Belief Nets (BNs)
- How to reason with BNs
- Specific Examples of BNs
- Contrast with Rules, Neural Nets, ...
- Possible applications of BNs
- Challenges
- How to reason efficiently
- How to learn BNs
9Objectives: Decision Support System
- Determine
- which tests to perform
- which repair to suggest
- based on costs, sensitivity/specificity, ...
- Use all sources of information
- symbolic (discrete observations, history, ...)
- signal (from sensors)
- Handle partial information
- Adapt to track fault distribution
10Underlying Task
- Situation: Given observations $O_1 = v_1, \ldots, O_k = v_k$
- (symptoms, history, test results, ...)
- what is the best DIAGNOSIS $Dx_i$ for the patient?
- Seldom Completely Certain
11Underlying Task, II
- Situation: Given observations $O_1 = v_1, \ldots, O_k = v_k$
- (symptoms, history, test results, ...)
- what is the best DIAGNOSIS $Dx_i$ for the patient?
- Challenge: How to express Probabilities?
12How to deal with Probabilities
- Sufficient: atomic events
- $P(Dx = u, O_1 = v_1, \ldots, O_k = v_k, \ldots, O_N = v_N)$ for all $2^{1+N}$ values $u \in \{T, F\}$, $v_j \in \{T, F\}$
- But even if binary $Dx$, 20 binary observations $\Rightarrow$ $> 2{,}097{,}000$ numbers!
13Problems with Atomic Events
- Representation is not intuitive
- $\Rightarrow$ Should make connections explicit
- use local information
- $P(\text{Jaundice} \mid \text{Hepatitis})$, $P(\text{LightDim} \mid \text{BadBattery})$, ...
- Too many numbers: $O(2^N)$
- Hard to store
- Hard to use
- Must add $2^r$ values to marginalize $r$ variables
- Hard to learn
- Takes $O(2^N)$ samples to learn $2^N$ parameters
- $\Rightarrow$ Include only necessary connections
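To make the growth concrete, here is a minimal sketch that counts table entries for the atomic-event representation. The function names are illustrative, not from the slides; the only inputs are the counts quoted above (one binary diagnosis plus 20 binary observations).

```python
def joint_table_size(num_binary_vars: int) -> int:
    """Number of atomic events over num_binary_vars binary variables."""
    return 2 ** num_binary_vars


def marginalization_terms(num_summed_out: int) -> int:
    """Number of joint entries added to sum out that many binary variables."""
    return 2 ** num_summed_out


# One binary diagnosis plus 20 binary observations, as in the slides.
print(joint_table_size(1 + 20))   # 2097152
# Summing out 5 binary variables touches 2**5 entries per query value.
print(marginalization_terms(5))   # 32
```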
15Hepatitis Example
H: Hepatitis, J: Jaundice, B: (positive) Blood test
- Want $P(H=1 \mid J=0, B=1)$
- ..., $P(H=1 \mid B=1, J=1)$, $P(H=1 \mid B=0, J=0)$, ...
16Encoding Causal Links
- Node: Variable
- Link: Causal dependency
17Encoding Causal Links
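H: $P(H=1) = 0.05$
B:
$h$   $P(B=1 \mid H=h)$
1     0.95
0     0.03
J:
$h$   $b$   $P(J=1 \mid H=h, B=b)$
1     1     0.8
1     0     0.8
0     1     0.3
0     0     0.3
- $P(J \mid H, B=0) = P(J \mid H, B=1)$ for all values of $J$, $H$ $\Rightarrow$ $P(J \mid H, B) = P(J \mid H)$
- J is INDEPENDENT of B, once we know H
- Don't need the B $\to$ J arc!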
19Encoding Causal Links
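H: $P(H=1) = 0.05$
B:
$h$   $P(B=1 \mid H=h)$
1     0.95
0     0.03
J:
$h$   $P(J=1 \mid H=h)$
1     0.8
0     0.3
- $P(J \mid H, B=0) = P(J \mid H, B=1)$ for all values of $J$, $H$ $\Rightarrow$ $P(J \mid H, B) = P(J \mid H)$
- J is INDEPENDENT of B, once we know H
- Don't need the B $\to$ J arc!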
20Sufficient Belief Net
H: $P(H=1) = 0.05$
B:
$h$   $P(B=1 \mid H=h)$
1     0.95
0     0.03
J:
$h$   $P(J=1 \mid H=h)$
1     0.8
0     0.3
- Requires: $P(H=1)$ known
- $P(J=1 \mid H=1)$ known
- $P(B=1 \mid H=1)$ known
- (Only 5 parameters, not 7)
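The five parameters are enough to answer the query posed earlier, $P(H=1 \mid J=0, B=1)$. The sketch below does this by brute-force enumeration over H; the CPT numbers are copied from the slides, while the function and variable names are illustrative assumptions.

```python
# CPTs copied from the slides: P(H=1), P(B=1 | H=h), P(J=1 | H=h).
P_H1 = 0.05
P_B1_GIVEN_H = {1: 0.95, 0: 0.03}
P_J1_GIVEN_H = {1: 0.8, 0: 0.3}


def bernoulli(p_one: float, value: int) -> float:
    """Probability of `value` for a binary variable with P(value=1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(h: int, j: int, b: int) -> float:
    """P(H=h, J=j, B=b) = P(H=h) * P(J=j | H=h) * P(B=b | H=h)."""
    return (bernoulli(P_H1, h)
            * bernoulli(P_J1_GIVEN_H[h], j)
            * bernoulli(P_B1_GIVEN_H[h], b))


def posterior_h1(j: int, b: int) -> float:
    """P(H=1 | J=j, B=b), normalizing over the two values of H."""
    numerator = joint(1, j, b)
    return numerator / (numerator + joint(0, j, b))


print(round(posterior_h1(j=0, b=1), 4))  # 0.3226
```

Even with a positive blood test, the low prior and the absence of jaundice keep the posterior at roughly one in three.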
21Factoring
- B does depend on J:
- If $J=1$, then likely that $H=1$ $\Rightarrow$ $B=1$
- N.b., B and J ARE correlated a priori: $P(J \mid B) \neq P(J)$
- GIVEN H, they become uncorrelated: $P(J \mid B, H) = P(J \mid H)$
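Both claims can be checked numerically on the hepatitis net. This is a small sketch using the CPT values from the slides; the helper `prob` and its keyword interface are illustrative assumptions.

```python
from itertools import product

# CPTs copied from the slides.
P_H1 = 0.05
P_B1_GIVEN_H = {1: 0.95, 0: 0.03}
P_J1_GIVEN_H = {1: 0.8, 0: 0.3}


def pick(p_one, value):
    return p_one if value == 1 else 1.0 - p_one


def joint(h, j, b):
    return pick(P_H1, h) * pick(P_J1_GIVEN_H[h], j) * pick(P_B1_GIVEN_H[h], b)


def prob(**fixed):
    """Sum the joint over every assignment consistent with `fixed`."""
    total = 0.0
    for h, j, b in product((0, 1), repeat=3):
        assignment = {"h": h, "j": j, "b": b}
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(h, j, b)
    return total


p_j = prob(j=1)
p_j_given_b = prob(j=1, b=1) / prob(b=1)
p_j_given_bh = prob(j=1, b=1, h=1) / prob(b=1, h=1)
p_j_given_h = prob(j=1, h=1) / prob(h=1)

print(round(p_j, 4), round(p_j_given_b, 4))          # 0.325 0.6125
print(round(p_j_given_bh, 4), round(p_j_given_h, 4))  # 0.8 0.8
```

A positive blood test nearly doubles the probability of jaundice a priori, yet adds nothing once H is known.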
22Factored Distribution
- Symptoms independent, given Disease
- ReadingAbility and ShoeSize are dependent,
- $P(\text{ReadAbility} \mid \text{ShoeSize}) \neq P(\text{ReadAbility})$
- but become independent, given Age
- $P(\text{ReadAbility} \mid \text{ShoeSize}, \text{Age}) = P(\text{ReadAbility} \mid \text{Age})$
23Naïve Bayes
- Classification Task
- Given $O_1 = v_1, \ldots, O_n = v_n$
- Find $h_i$ that maximizes $P(H = h_i \mid O_1 = v_1, \ldots, O_n = v_n)$
24Naïve Bayes (cont)
- Normalizing term
- (No need to compute, as same for all $h_i$)
- Easy to use for Classification
- Can use even if some $v_j$'s are not specified
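A minimal sketch of such a classifier, assuming the standard naive Bayes factorization $P(H=h_i \mid O_1=v_1, \ldots, O_n=v_n) \propto P(H=h_i) \prod_j P(O_j=v_j \mid H=h_i)$. The formula itself did not survive in the slide text, so this form, the function name, and the dictionary layout are assumptions. Unspecified observations are simply left out of the product.

```python
from math import prod


def naive_bayes_posterior(prior, likelihood, observed):
    """Return P(H=h | observed) for every hypothesis h.

    prior:      {h: P(H=h)}
    likelihood: {obs_name: {h: P(O=1 | H=h)}} for binary observations
    observed:   {obs_name: 0 or 1}; unobserved variables are omitted
    """
    scores = {}
    for h, p_h in prior.items():
        terms = [
            likelihood[name][h] if value == 1 else 1.0 - likelihood[name][h]
            for name, value in observed.items()
        ]
        scores[h] = p_h * prod(terms)
    # The normalizer is the same for all h; it is only needed for probabilities.
    normalizer = sum(scores.values())
    return {h: score / normalizer for h, score in scores.items()}


# Numbers reuse the hepatitis CPTs from the slides.
prior = {"hepatitis": 0.05, "healthy": 0.95}
likelihood = {
    "jaundice": {"hepatitis": 0.8, "healthy": 0.3},
    "blood_test": {"hepatitis": 0.95, "healthy": 0.03},
}
full = naive_bayes_posterior(prior, likelihood, {"jaundice": 0, "blood_test": 1})
partial = naive_bayes_posterior(prior, likelihood, {"blood_test": 1})
print(max(full, key=full.get), max(partial, key=partial.get))  # healthy hepatitis
```

With only the blood test observed, hepatitis is the more probable class; adding the negative jaundice finding flips the decision.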
25Bigger Networks
$P(I=1)$: 0.20
$P(H=1)$: 0.32
$g$   $lt$   $P(H=1 \mid g, lt)$
1     1      0.82
1     0      0.10
0     1      0.45
0     0      0.04
$h$   $P(J=1 \mid h)$
1     0.8
0     0.3
$h$   $P(B=1 \mid h)$
1     0.98
0     0.01
- Intuition: Show CAUSAL connections
- GeneticPH CAUSES Hepatitis; Hepatitis CAUSES Jaundice
26Belief Nets
- DAG structure
- Each node $\leftrightarrow$ variable $v$
- $v$ depends (only) on its parents
- conditional prob: $P(v_i \mid \text{parent}_i = \langle 0, 1, \ldots \rangle)$
- $v$ is INDEPENDENT of non-descendants,
- given assignments to its parents
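One possible way to hold this structure in code: each node stores its parents and a CP-table, and the joint probability of a full assignment is the product of the per-node conditionals. The `Node` class and its field names are illustrative assumptions; the example net is the hepatitis net from the earlier slides.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    parents: tuple   # names of parent variables, in CPT key order
    cpt: dict        # {tuple of parent values: P(node=1 | parents)}


def joint_probability(nodes, assignment):
    """P(assignment) = product over nodes of P(v_i | parents of v_i)."""
    p = 1.0
    for node in nodes:
        parent_values = tuple(assignment[parent] for parent in node.parents)
        p_one = node.cpt[parent_values]
        p *= p_one if assignment[node.name] == 1 else 1.0 - p_one
    return p


hepatitis_net = [
    Node("H", (), {(): 0.05}),
    Node("B", ("H",), {(1,): 0.95, (0,): 0.03}),
    Node("J", ("H",), {(1,): 0.8, (0,): 0.3}),
]
# P(H=1, B=1, J=0) = 0.05 * 0.95 * (1 - 0.8)
print(joint_probability(hepatitis_net, {"H": 1, "B": 1, "J": 0}))
```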
27Less Trivial Situations
- N.b., obs1 is not always independent of obs2 given H
- E.g., FamilyHistoryDepression causes MotherSuicide and Depression
- MotherSuicide causes Depression (w/ or w/o F.H.Depression)
- Here, $P(D \mid MS, FHD) \neq P(D \mid FHD)$ !
- Can be done using Belief Network,
- but need to specify (number of parameters):
- $P(FHD)$: 1
- $P(MS \mid FHD)$: 2
- $P(D \mid MS, FHD)$: 4
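Written out, the factorization behind these three tables and the resulting parameter count are as follows. This is a worked restatement of the numbers above for binary variables, not an additional result from the slides.

```latex
\begin{align*}
P(FHD, MS, D) &= P(FHD)\, P(MS \mid FHD)\, P(D \mid MS, FHD) \\
\underbrace{1}_{P(FHD)} + \underbrace{2}_{P(MS \mid FHD)} + \underbrace{4}_{P(D \mid MS, FHD)} &= 7 = 2^3 - 1
\end{align*}
```

Because every node depends on all of its predecessors, the net needs as many parameters as the full joint over three binary variables; the savings seen in the hepatitis net come only from missing arcs.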
28Example: Car Diagnosis
29MammoNet
30ALARM
- A Logical Alarm Reduction Mechanism
- 8 diagnoses, 16 findings, ...
31Troop Detection
32ARCO1: Forecasting Oil Prices
33ARCO1: Forecasting Oil Prices
34Forecasting Potato Production
35Warning System
36Extensions
- Find best values (posterior distr.) for SEVERAL (> 1) output variables
- Partial specification of input values
- only subset of variables
- only distribution of each input variable
- General Variables
- Discrete, but domain > 2
- Continuous (Gaussian: $x = \sum_i b_i y_i$ for parents $Y$)
- Decision Theory $\Rightarrow$ Decision Nets (Influence Diagrams): Making Decisions, not just assigning probs
- Storing $P(v \mid p_1, p_2, \ldots, p_k)$:
- General CP Tables: $O(2^k)$
- Noisy-Or, Noisy-And, Noisy-Max
- Decision Trees
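To illustrate why a Noisy-Or is cheaper than a general CP table, the sketch below expands a noisy-OR model with k link probabilities (plus an optional leak term) into the full table of $2^k$ rows. It uses the usual noisy-OR definition; the function name and the link probabilities are made-up for illustration.

```python
from itertools import product


def noisy_or_cpt(link_probs, leak=0.0):
    """Expand a noisy-OR model into a full CPT.

    link_probs[i] is the probability that parent i alone, when on,
    turns the child on; leak is the probability that the child is on
    with every parent off.
    Returns {parent assignment tuple: P(child=1 | parents)}.
    """
    cpt = {}
    for parents in product((0, 1), repeat=len(link_probs)):
        p_off = 1.0 - leak
        for is_on, q in zip(parents, link_probs):
            if is_on:
                p_off *= 1.0 - q   # each active parent independently fails to fire
        cpt[parents] = 1.0 - p_off
    return cpt


cpt = noisy_or_cpt([0.8, 0.5, 0.1])   # hypothetical link probabilities
print(len(cpt))  # 8
```

Three parameters generate all eight rows, so an expert (or a learner) only has to supply k numbers instead of $2^k$.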
37Outline
- Existing uses of Belief Nets (BNs)
- How to reason with BNs
- Specific Examples of BNs
- Contrast with Rules, Neural Nets, ...
- Possible applications of BNs
- Challenges
- How to reason efficiently
- How to learn BNs
38Belief Nets vs Rules
- Both have Locality
- Specific clusters (rules / connected nodes)
- WHY? Easier for people to reason CAUSALLY, even if use is DIAGNOSTIC
- BN provide OPTIMAL way to deal with
- Uncertainty
- Vagueness (var not given, or only dist)
- Error
- Signals meeting Symbols
- BN permits different directions of inference
39Belief Nets vs Neural Nets
- Both have graph structure but
- BN: Nodes have SEMANTICS
- Combination Rules: Sound Probability
- NN: Nodes arbitrary
- Combination Rules: Arbitrary
- So harder to
- Initialize NN
- Explain NN
- (But perhaps easier to learn NN from examples only?)
- BNs can deal with
- Partial Information
- Different directions of inference
40Belief Nets vs Markov Nets
- Each uses graph structure
- to FACTOR a distribution
- explicitly specify dependencies, implicitly
independencies
- but subtle differences
- BNs capture causality, hierarchies
- MNs capture temporality
41Uses of Belief Nets (1)
- Medical Diagnosis: Assist/Critique MD
- identify diseases not ruled-out
- specify additional tests to perform
- suggest treatments appropriate/cost-effective
- react to MD's proposed treatment
- Decision Support: Find/repair faults in complex machines
- Device, or Manufacturing Plant, or ...
- based on sensors, recorded info, history, ...
- Preventative Maintenance
- Anticipate problems in complex machines
- Device, or Manufacturing Plant, or ...
- based on sensors, statistics, recorded info, device history, ...
42Uses (cont)
- Logistics Support: Stock warehouses appropriately, based on (estimated) freq. of needs, costs, ...
- Diagnose Software: Find most probable bugs, given program behavior, core dump, source code, ...
- Part Inspection/Classification: based on multiple sensors, background, model of production, ...
- Information Retrieval: Combine information from various sources, based on info from various agents, ...
- General: Partial Info, Sensor fusion
- Classification
- Interpretation
- Prediction
- ...
43Challenge 1: Computational Efficiency
- For a given BN, the general problem is:
- Given $O_1 = v_1, \ldots, O_n = v_n$
- Compute $P(H \mid O_1 = v_1, \ldots, O_n = v_n)$
- If BN is a poly tree, there is an efficient alg.
- If BN is gen'l DAG (> 1 path from X to Y)
- NP-hard in theory
- slow in practice
- Tricks: Get approximate answer (quickly)
- Use abstraction of BN
- Use abstraction of query (range)
(Figure: belief net over nodes D, I, H, J, B)
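As one example of the "approximate answer" trick, the sketch below estimates $P(H=1 \mid J=0, B=1)$ on the small hepatitis net by likelihood weighting. This particular sampling method is a choice made here for illustration; the slides do not say which approximation they have in mind. The CPT numbers come from the earlier hepatitis slides.

```python
import random

# CPTs copied from the hepatitis slides.
P_H1 = 0.05
P_B1_GIVEN_H = {1: 0.95, 0: 0.03}
P_J1_GIVEN_H = {1: 0.8, 0: 0.3}


def likelihood_weighting(j_obs, b_obs, num_samples, rng):
    """Estimate P(H=1 | J=j_obs, B=b_obs) by likelihood weighting."""
    weight_h1 = 0.0
    weight_total = 0.0
    for _ in range(num_samples):
        # Sample the unobserved root from its prior.
        h = 1 if rng.random() < P_H1 else 0
        # Weight the sample by the likelihood of the fixed evidence.
        p_j = P_J1_GIVEN_H[h] if j_obs == 1 else 1.0 - P_J1_GIVEN_H[h]
        p_b = P_B1_GIVEN_H[h] if b_obs == 1 else 1.0 - P_B1_GIVEN_H[h]
        weight = p_j * p_b
        weight_total += weight
        weight_h1 += weight * h
    return weight_h1 / weight_total


estimate = likelihood_weighting(
    j_obs=0, b_obs=1, num_samples=100_000, rng=random.Random(0)
)
print(round(estimate, 2))
```

On a net this small exact enumeration is trivial (it gives about 0.3226); sampling only pays off when the net is too densely connected for exact methods.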
44Challenge 2a: Obtaining Accurate BN
- BN encodes distribution over $n$ variables
- Not $O(2^n)$ values, but only $\sum_i 2^{k_i}$
- (Node $n_i$ binary, with $k_i$ parents)
- Still lots of values! + structure ...
- Qualitative Information
- Structure: What depends on what?
- Easy for people (background knowledge)
- But NP-hard to learn from samples
- Quantitative Information
- Actual CP-tables
- Easy to learn, given lots of examples.
- But people have a hard time
- Qualitative information: knowledge acquisition from human experts
- Quantitative information: simple learning algorithm
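With the structure fixed and complete data available, the "simple learning algorithm" for a CP-table can be as plain as counting relative frequencies. The sketch below shows that maximum-likelihood estimate; the function name and the tiny data set are made up for illustration.

```python
from collections import Counter


def estimate_cpt(samples, child, parents):
    """Maximum-likelihood estimate of P(child=1 | parents) from complete data.

    samples: list of dicts mapping variable name to 0 or 1.
    Returns {parent value tuple: relative frequency of child=1}.
    """
    totals = Counter()
    ones = Counter()
    for sample in samples:
        key = tuple(sample[parent] for parent in parents)
        totals[key] += 1
        ones[key] += sample[child]
    return {key: ones[key] / totals[key] for key in totals}


# Tiny made-up data set, for illustration only.
data = [
    {"H": 1, "J": 1}, {"H": 1, "J": 1}, {"H": 1, "J": 0}, {"H": 1, "J": 1},
    {"H": 0, "J": 0}, {"H": 0, "J": 1}, {"H": 0, "J": 0}, {"H": 0, "J": 0},
]
print(estimate_cpt(data, child="J", parents=("H",)))  # {(1,): 0.75, (0,): 0.25}
```

Parent configurations that never occur in the data get no entry at all, which is one reason priors or pseudo-counts are often added in practice.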
45Notes on Learning
- Mixed Sources: Person provides structure
- Algorithm fills in numbers.
- Just Human Expert: People produce CP-table, as well as structure
- Relatively few values really required
- Esp. if NoisyOr, NoisyAnd, NaiveBayes, ...
- Actual values not that important
- Sensitivity studies
46My Current Work
- Learning Belief Nets
- Model selection
- Challenging myth that MDL is the appropriate criterion
- Learning performance system, not model
- Validating Belief Nets
- Error bars around answers
- Adaptive User Interfaces
- Efficient Vision Systems
- Foundations of Learnability
- Learning Active Classifiers
- Sequential learners
- Condition Based maintenance, Bio-signal interpretation, ...
47Challenge 2b: Maintaining Accurate BN
- The world changes.
- Information in BN may be
- perfect at time $t$
- sub-optimal at time $t + 20$
- worthless at time $t + 200$
- Need to MAINTAIN a BN over time
- using on-going human consultant
- Adaptive BN
- Dirichlet distribution (variables)
- Priors over BNs
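One way to read "Adaptive BN" with Dirichlet distributions: keep pseudo-counts for each CP-table entry and update them as new cases arrive. The sketch below does this for a single binary entry, where the Dirichlet reduces to a Beta. The class name, the starting counts, and the optional `decay` factor for forgetting old evidence are assumptions made for illustration, not something the slides specify.

```python
class BetaCptEntry:
    """One CPT entry P(child=1 | parent config) with a Beta(alpha, beta) prior."""

    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha = alpha   # pseudo-count for child=1
        self.beta = beta     # pseudo-count for child=0

    def update(self, child_value, decay=1.0):
        """Add one observation; decay < 1 down-weights older evidence."""
        self.alpha *= decay
        self.beta *= decay
        if child_value == 1:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def mean(self):
        """Posterior mean estimate of P(child=1 | parent config)."""
        return self.alpha / (self.alpha + self.beta)


# Hypothetical prior belief of about 0.8, worth 10 pseudo-observations.
entry = BetaCptEntry(alpha=8.0, beta=2.0)
for observation in [0, 0, 1, 0, 0]:
    entry.update(observation)
print(round(entry.mean(), 3))  # 0.6
```

The size of the pseudo-counts controls how quickly the entry moves away from the expert's original number.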
48Conclusions
- BNs provide an effective way to
- Represent complicated, inter-related events
- Reason about such situations
- Diagnosis, Explanation, ValueOfInfo
- Explain conclusions
- Mix Symbolic and Numeric observations
- Challenges
- Efficient ways to use BNs
- Ways to create BNs
- Ways to maintain BNs
- Reason about time
49Extra Slides
- AI Seminar
- Friday, noon, CSC3-33
- Free PIZZA!
- http://www.cs.ualberta.ca/ai/ai-seminar.html
- References
- http://www.cs.ualberta.ca/greiner/bn.html
- Crusher Controller
- Formal Framework
- Decision Nets
- Developing the Model
- Why Reasoning is Hard
- Learning Accurate Belief Nets
50References
- http://www.cs.ualberta.ca/greiner/bn.html
- Overview textbooks
- Judea Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.
- Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, 1995. (See esp. Ch 14, 15, 19.)
- General info re BayesNets
- http://www.afit.af.mil:80/Schools/EN/ENG/LABS/AI/BayesianNetworks
- Proceedings: http://www.sis.pitt.edu/dsl/uai.html
- Assoc for Uncertainty in AI: http://www.auai.org/
- Learning
- David Heckerman, A tutorial on learning with Bayesian networks, 1995,
- http://www.research.microsoft.com/research/dtg/heckerma/TR-95-06.htm
- Software
- General: http://bayes.stat.washington.edu/almond/belief.html
- JavaBayes: http://www.cs.cmu.edu/fgcozman/Research/JavaBaye
- Norsys: http://www.norsys.com/
51Decision Net: Test/Buy a Car
52Utility Decision Nets
- Given $c(\text{action}, \text{state}) \in \mathbb{R}$ (cost function)
- $C_p(a) = E_s[c(a, s)] = \sum_{s \in S} p(s \mid \text{obs})\, c(a, s)$
- Best (immediate) action: $a^* = \arg\min_{a \in A} C_p(a)$
- Decision Net (like Belief Net) but
- 3 types of nodes
- chance (like Belief net)
- action: repair, sensing
- cost/utility
- Links for dependency
- Given observations, obs, computes best action, $a^*$
- Sequence of Actions: MDPs, POMDPs, ...
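The expected-cost rule can be sketched in a few lines: compute $C_p(a)$ for every action from the posterior over states, then take the minimizer. The function names, the action names, and the cost table are hypothetical; the posterior reuses the roughly 0.32 hepatitis probability computed from the earlier CPTs.

```python
def expected_cost(action, state_probs, cost):
    """C_p(a) = sum over states s of p(s | obs) * c(a, s)."""
    return sum(p * cost[(action, state)] for state, p in state_probs.items())


def best_action(actions, state_probs, cost):
    """a* = argmin over actions a of C_p(a)."""
    return min(actions, key=lambda a: expected_cost(a, state_probs, cost))


# Hypothetical posterior over the state and a made-up cost table.
state_probs = {"hepatitis": 0.3226, "healthy": 0.6774}
cost = {
    ("treat", "hepatitis"): 10.0, ("treat", "healthy"): 20.0,
    ("wait", "hepatitis"): 100.0, ("wait", "healthy"): 0.0,
}
print(best_action(["treat", "wait"], state_probs, cost))  # treat
```

Here treating costs about 16.8 in expectation against about 32.3 for waiting, so the less probable state still drives the decision because its cost is high.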
53Decision Net: Drill for Oil?
54Formal Framework
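- Always true:
- $P(x_1, \ldots, x_n) = P(x_1)\, P(x_2 \mid x_1)\, P(x_3 \mid x_2, x_1) \cdots P(x_n \mid x_{n-1}, \ldots, x_1)$
- Given independencies,
- $P(x_k \mid x_1, \ldots, x_{k-1}) = P(x_k \mid pa_k)$ for some $pa_k \subseteq \{x_1, \ldots, x_{k-1}\}$
- Hence $P(x_1, \ldots, x_n) = \prod_i P(x_i \mid pa_i)$
- So just connect each $y \in pa_i$ to $x_i$ $\Rightarrow$ DAG structure
Note:
- Size of BN is $\sum_i 2^{|pa_i|}$, so better to use small $pa_i$.
- $pa_i = \{x_1, \ldots, x_{i-1}\}$ is never incorrect, but seldom min'l (so hard to store, learn, reason with, ...)
- Order of variables can make a HUGE difference: can have $|pa_i| = 1$ for one ordering, $|pa_i| = i - 1$ for another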
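A short sketch of how much the parent-set sizes matter, counting CP-table entries as $\sum_i 2^{|pa_i|}$ for binary nodes. The two parent-count profiles are hypothetical extremes chosen for illustration: every non-root node has one parent, versus every node depends on all of its predecessors.

```python
def network_size(parent_counts):
    """Number of CPT entries for binary nodes: sum over i of 2 ** |pa_i|."""
    return sum(2 ** k for k in parent_counts)


n = 20
# Hypothetical good ordering: a root plus n - 1 nodes with a single parent each.
single_parent = [0] + [1] * (n - 1)
# Worst case: node i depends on all i - 1 of its predecessors.
fully_connected = list(range(n))

print(network_size(single_parent))    # 39
print(network_size(fully_connected))  # 1048575
```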
55Developing the Model
- Source of information
- (Human) Expert (s)
- Data from earlier Runs
- Simulator
- Typical Process
- 1. Develop / Refine Initial Prototype
- 2. Test Prototype $\Rightarrow$ Accurate System
- 3. Deploy System
- 4. Update / Maintain System
56Develop/Refine Prototype
- Requires expert
- useful to have data
- Initial Interview(s)
- To establish what relates to what
- Expert time: ½ day
- Iterative process (Gradual refinement)
- To refine qualitative connections
- To establish correct operations
- Expert presents "Good Performance"
- KE implements Expert's claims
- KE tests on examples (real data or expert), and reports to Expert
- Expert time: 1-2 hours / week for ?? weeks (depends on complexity of device, and accuracy of model)
57Why Reasoning is Hard
- BN reasoning may look easy
- Just propagate information from node to node
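$P(Z=t)$: 0.5
$z$   $P(B=t \mid Z=z)$
t     0.0
f     1.0
$z$   $P(A=t \mid Z=z)$
t     1.0
f     0.0
$a$   $b$   $P(C=t \mid a, b)$
t     t     1.0
t     f     0.0
f     t     0.0
f     f     0.0
(Net: Z is the parent of A and of B; A and B are the parents of C)
- $P(A=t) = P(B=t) = \tfrac{1}{2}$
- So, is $P(C=t) = P(A=t, B=t) = P(A=t)\, P(B=t) = \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}$ ?
- Wrong: $P(C=t) = 0$ !
- Need to maintain dependencies! $P(A=t, B=t) = P(A=t)\, P(B=t \mid A=t)$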
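The gap between the naive product and the correct answer can be reproduced by enumerating the four-node net directly. The CPT values are the ones in the tables above; the helper names are illustrative.

```python
from itertools import product

# CPTs from the slide: Z is the common parent of A and B; C is true iff A and B.
P_Z_T = 0.5
P_A_T_GIVEN_Z = {True: 1.0, False: 0.0}
P_B_T_GIVEN_Z = {True: 0.0, False: 1.0}


def p(prob_true, value):
    return prob_true if value else 1.0 - prob_true


def joint(z, a, b, c):
    p_c_true = 1.0 if (a and b) else 0.0
    return (p(P_Z_T, z) * p(P_A_T_GIVEN_Z[z], a)
            * p(P_B_T_GIVEN_Z[z], b) * p(p_c_true, c))


def marginal(**fixed):
    """Sum the joint over every assignment consistent with `fixed`."""
    total = 0.0
    for z, a, b, c in product((True, False), repeat=4):
        values = {"z": z, "a": a, "b": b, "c": c}
        if all(values[k] == v for k, v in fixed.items()):
            total += joint(z, a, b, c)
    return total


naive = marginal(a=True) * marginal(b=True)   # treats A and B as independent
exact = marginal(c=True)                      # keeps the shared cause Z
print(naive, exact)  # 0.25 0.0
```

A and B each hold half the time, but never together, because the same value of Z cannot make both true.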
58Crusher Controller
- Given observations
- History, sensor readings, schedule, ...
- Specify best action for crusher
- stop immediately, increase roller speed by $\Delta$, ...
- Best: minimize expected cost
- Initially: just recommendation to human operator
- Later: Directly implement (some) actions
- Request values of other sensors?
59Approach
- For each state s
- (Good flow, tooth about to enter, ...)
- for each action a
- (Stop immediately, Change p7 0.32, ...)
- determine utility of performing a in s
- (Cost of lost production if stopped,
- of reduced production efficiency if continue, ...)
- Use observations to estimate (dist over) current states
- Infer EXPECTED UTILITY of each action, based on distr.
- Return action with highest Expected Utility
60Details
- Inputs
- Sensor Readings (history)
- Camera, microphone, power-draw
- Parameter settings
- Log files, Maintenance records
- Schedule (maintenance, anticipated load, ...)
- Outputs
- Continue as is
- Adjust parameters
- GapSize, ApronFeederSpeed, 1J_ConveyorSpeed
- Shut down immediately
- Stop adding new material
- Tell operator to look
- State: CrusherEnvironment
- UncrushableThingsNowInCrusher
- TeethMissing
- NextUncrushableEntry
- Control Parameters
61Benefits
- Increase Crusher Effectiveness
- Find best settings for parameters
- To maximize production of well-sized chunks
- Reduce Down Time
- Know when maintenance/repair is critical
- Reduce Damage to Crusher
- Usable Model of Crusher
- Easy to modify when needed
- Training
- Design of next generation
- Prototype for design of control, diagnostician of other machines
62My Background
- PhD, Stanford (Computer Science)
- Representational issues, Analogical Inference
- everything in Logic
- PostDoc at UofToronto (CS)
- Foundations of learnability, logical inference, DB, control theory, ...
- everything in Logic
- Industrial research (Siemens Corporate Research)
- Need to solve REAL problems
- Theory Revision, Navigational systems,
- logic is not be-all-and-end-all!
- Prof at UofAlberta (CS)
- Industrial problems (Siemens, BioTools, Syncrude)
- Foundations of learnability, probabilistic
inference
63Less Trivial Situations
- N.b., obs1 is not always independent of obs2 given H
- E.g., FamilyHistoryDepression causes MotherSuicide and Depression
- MotherSuicide causes Depression (w/ or w/o F.H.Depression)
(Net: FHD is the parent of MS and of D; MS is also a parent of D)
MS:
$f$   $P(MS=1 \mid FHD=f)$
1     0.10
0     0.03
D:
$f$   $m$   $P(D=1 \mid FHD=f, MS=m)$
1     1     0.97
1     0     0.90
0     1     0.08
0     0     0.04
- Here, $P(D \mid MS, FHD) \neq P(D \mid FHD)$ !
- Can be done using Belief Network,
- but need to specify (number of parameters):
- $P(FHD)$: 1
- $P(MS \mid FHD)$: 2
- $P(D \mid MS, FHD)$: 4