Title: Bayesian Network
1 Bayesian Network
- David Grannen
- Mathieu Robin
- Micheal Lynch
- Sohail Akram
- Tolu Aina
2 Bayesianism is a controversial but increasingly
popular approach to statistics that offers many
benefits, although not everyone is persuaded of
its validity.
3 - Bayesian networks are based on a statistical
approach presented by the mathematician Thomas
Bayes in 1763.
- This is an approach for calculating
probabilities among several variables that are
causally related but whose relationships
cannot easily be derived by experimentation.
- Bayes' formula provides the mathematical tool
that combines prior knowledge with current data
to produce a posterior distribution.
4 It most likely seemed to be a complicated
formula that looked something like this:

P(a|b) = L(b|a)P(a) / [L(b|a)P(a) + L(b|not a)P(not a)]

Following a medical example, we have a patient
who is concerned about his/her chances of
experiencing a heart attack. Historical data that
we have:
- 20% of the population experiences heart attacks
- 90% of those who experienced a heart attack are smokers
- 60% of those without a heart attack are smokers

P(heart attack | smoker) =
L(smoker | heart attack) Prior(heart attack) /
[L(smoker | heart attack) Prior(heart attack) +
L(smoker | no heart attack) Prior(no heart attack)]

or P(heart attack | smoker) = (0.90 x 0.20) /
(0.90 x 0.20 + 0.60 x 0.80) ≈ 0.27
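To make the arithmetic concrete, here is a minimal Python sketch of the same calculation (the function name and layout are our own illustration, not part of the original slides):

    # Bayes' rule: P(A|B) = L(B|A)P(A) / [L(B|A)P(A) + L(B|not A)P(not A)]
    def posterior(likelihood, prior, likelihood_not, prior_not):
        numerator = likelihood * prior
        return numerator / (numerator + likelihood_not * prior_not)

    # P(heart attack | smoker) with the slide's numbers:
    # 90% of heart-attack sufferers smoke, 20% prior, 60% of the rest smoke.
    p = posterior(likelihood=0.90, prior=0.20,
                  likelihood_not=0.60, prior_not=0.80)
    print(f"P(heart attack | smoker) = {p:.2f}")  # 0.27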
5 - Bayesian networks are complex diagrams that
organize the body of knowledge in any given area
by mapping out cause-and-effect relationships
among key variables and encoding them with
numbers that represent the extent to which one
variable is likely to affect another.
- This approach allows scientists to combine new
data with their existing knowledge or expertise.
6 - In the late 1980s, on the basis of the work of
Judea Pearl, a professor of computer science at
UCLA, AI researchers discovered that Bayesian
networks offered an efficient way to deal with the
lack or ambiguity of information that had hampered
previous systems.
- Bayesian networks provide "an overarching
graphical framework" that brings together diverse
elements of AI and increases the range of its
likely application to the real world.
7 Bayesian applications
8 - Decision-making using Bayesian methods has many
applications in software. The best-known
example is Microsoft's Office Assistant. When a
user calls up the assistant, Bayesian methods are
used to analyse recent actions in order to
work out what the user is attempting to do, with
this calculation constantly being modified in the
light of new actions.
- Microsoft is the most aggressive in exploiting
the Bayesian approach. The company offers a free Web
service that helps customers diagnose printing
problems with their computers and recommends the
quickest way to resolve them. Another Web service
helps parents diagnose their children's health
problems.
9 - Scott Musman, a computer consultant in Arlington,
Va., recently designed a Bayesian network for the
Navy that can identify enemy missiles, aircraft
or vessels and recommend which weapons could be
used most advantageously against incoming
targets.
- General Electric is using Bayesian techniques to
develop a system that will take information from
sensors attached to an engine and, based on
expert opinion built into the system as well as
vast amounts of data on past engine performance,
pinpoint emerging problems.
10 Representation of Graphical Models
- Graphical models are graphs in which nodes
represent random variables.
- A Bayesian network is a kind of directed graphical
model, which takes into account the
directionality of the arcs (arrows between
nodes).
- An advantage of a directed graphical model is that
one can regard an arc from A to B as indicating
that A "causes" B.

(Diagram: A -> B)
11 Graphical Models (2)
- Along with the graph, it is necessary to specify
the parameters of the model.
- For a directed model, we must specify the
Conditional Probability Distribution (CPD) at
each node.
- If the variables are discrete, this can be
represented as a table (CPT), which lists the
probability that the child node takes on each of
its different values for each combination of
values of its parents.
12 Example: Wet Grass
(Diagram: the wet-grass network, Cloudy -> Sprinkler,
Cloudy -> Rain, Sprinkler -> WetGrass, Rain -> WetGrass)
13 Example: Wet Grass
- Event: the grass is wet. Two causes: rain or sprinkler.
- From the table, Pr(W = true | S = true, R = false) = 0.9.
Each row sums to 1.0, so Pr(W = false | S = true,
R = false) = 0.1.
- Developing inference from the Bayesian network
(see the sketch below).
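As a concrete illustration, the CPTs of this network can be written down directly in Python. A caveat: the slides only show the row Pr(W = true | S = true, R = false) = 0.9; the remaining values below are the standard ones from Kevin Murphy's Bayes-net tutorial, which this example appears to follow (they reproduce the numbers on the next slides), so treat them as an assumption:

    # CPTs for the wet-grass network: Cloudy -> {Sprinkler, Rain} -> WetGrass.
    P_C = {True: 0.5, False: 0.5}    # Pr(Cloudy = c)
    P_S = {True: 0.1, False: 0.5}    # Pr(Sprinkler = T | Cloudy = c)
    P_R = {True: 0.8, False: 0.2}    # Pr(Rain = T | Cloudy = c)
    P_W = {(False, False): 0.0,      # Pr(WetGrass = T | Sprinkler, Rain)
           (True,  False): 0.9,
           (False, True):  0.9,
           (True,  True):  0.99}

    # Each row sums to 1.0, so the "false" entry is just the complement:
    print("Pr(W=F | S=T, R=F) =", 1 - P_W[(True, False)])  # 0.1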
14 Inference
We observe the grass is wet. There are two possible
causes: sprinkler or rain. Which is more likely?

Pr(S=1 | W=1) = Σ_{c,r} Pr(C=c, S=1, R=r, W=1) / Pr(W=1) = 0.2781 / 0.6471
Pr(R=1 | W=1) = Σ_{c,s} Pr(C=c, S=s, R=1, W=1) / Pr(W=1) = 0.4581 / 0.6471

where the normalizing constant is Pr(W=1) = 0.6471.
15 Inference (2)
- Pr(S=1 | W=1) = 0.2781 / 0.6471 = 0.430
- Pr(R=1 | W=1) = 0.4581 / 0.6471 = 0.7079
- It is more likely that the grass is wet because it is raining!
- The example given is bottom-up reasoning in a Bayes
network, from effects to causes (reproduced in the
sketch below). Top-down reasoning is also possible:
using the example above, we can deduce the
probability that the grass is wet given that it is cloudy.
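A minimal inference-by-enumeration sketch in Python, again assuming the standard tutorial CPT values, reproduces both posteriors:

    from itertools import product

    P_C = {True: 0.5, False: 0.5}
    P_S = {True: 0.1, False: 0.5}
    P_R = {True: 0.8, False: 0.2}
    P_W = {(False, False): 0.0, (True, False): 0.9,
           (False, True): 0.9, (True, True): 0.99}

    def joint(c, s, r, w):
        """Pr(C=c, S=s, R=r, W=w), factored along the network's arcs."""
        p = P_C[c]
        p *= P_S[c] if s else 1 - P_S[c]
        p *= P_R[c] if r else 1 - P_R[c]
        p *= P_W[(s, r)] if w else 1 - P_W[(s, r)]
        return p

    # Sum out the hidden variables, with the evidence W = true fixed.
    bools = [True, False]
    p_w  = sum(joint(c, s, r, True) for c, s, r in product(bools, repeat=3))
    p_sw = sum(joint(c, True, r, True) for c, r in product(bools, repeat=2))
    p_rw = sum(joint(c, s, True, True) for c, s in product(bools, repeat=2))

    print(f"Pr(W=1)       = {p_w:.4f}")         # 0.6471
    print(f"Pr(S=1 | W=1) = {p_sw / p_w:.3f}")  # 0.430
    print(f"Pr(R=1 | W=1) = {p_rw / p_w:.3f}")  # 0.708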
16 Inference (cont.)
- Inference is concerned with how we can use
graphical models to efficiently answer
probabilistic queries.
- It uses Bayes' theorem, in the odds form:
- P(B|A) = odds(B|A) / (1 + odds(B|A))
- A prior probability is based on previously
observed data.
- A conditional probability has the form P(B|A).
17 Scenario
- Apartment with a smoke detector
- Smoke detector near the bathroom
- Taking a shower often triggers the detector (smoke
detectors detect steam)
18 Scenario (2)
Nodes of the network:
- B (burn dinner)
- O (plan to go out)
- S (take shower)
- F (electrical fire)
- A (smoke alarm)

(Diagram: network relating these variables to the smoke alarm A)
19 Bayes' theorem
- P(B|A) = odds(B|A) / (1 + odds(B|A))
-        = Likelihood(A|B) odds(B) /
           (1 + Likelihood(A|B) odds(B))
- where Likelihood(A|B) = P(A|B) / P(A|B')
  and odds(B) = P(B) / P(B')
20 Bayes' theorem (2)
- Conditional probabilities specify the degree of
belief in some proposition or propositions based
on the assumption that some other propositions
are true.
- Therefore the theory has no meaning without prior
resolution of the probability of these antecedent
propositions.
21 Approach
- Top down
  - Reasoning from causes to effects: the probability
    that an event will occur, given its prior probability
- Bottom up
  - Reasoning which starts from effects and tries
    to determine the causes
22 Types of inference
- (a) Predictive: a can cause b
- (b) Diagnostic: b is evidence of a
- (c) Intercausal: a and b can cause c;
  a explains c, so it is evidence against b
  (explaining away, Berkson's paradox, or
  "selection bias"; illustrated numerically below)
(Diagrams: (a) a -> b; (b) a -> b, reasoning from b back to a; (c) a -> c <- b)
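A short numeric sketch of explaining away in the v-structure a -> c <- b; all probability values here are illustrative assumptions, chosen only to make the effect visible:

    from itertools import product

    P_a, P_b = 0.1, 0.1                     # a and b are independent a priori
    P_c = {(False, False): 0.001,           # Pr(c = T | a, b), noisy-OR-like
           (True,  False): 0.9,
           (False, True):  0.9,
           (True,  True):  0.99}

    def joint(a, b, c):
        p = (P_a if a else 1 - P_a) * (P_b if b else 1 - P_b)
        return p * (P_c[(a, b)] if c else 1 - P_c[(a, b)])

    bools = [True, False]
    p_c  = sum(joint(a, b, True) for a, b in product(bools, repeat=2))
    p_ac = sum(joint(True, b, True) for b in bools)
    p_bc = sum(joint(a, True, True) for a in bools)

    print(f"Pr(a | c)    = {p_ac / p_c:.3f}")               # 0.526
    print(f"Pr(a | c, b) = {joint(True, True, True) / p_bc:.3f}")
    # 0.109: once b is known, a becomes much less likely; b explains c away.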
23 Example
- The a priori probability of a burglary B is 0.0001.
- The conditional probability of an alarm A given a
burglary, Pr(A|B), is given in the table on the
next slide.
24 Example (2)

  -----------------------------------
             Burglary    No Burglary
  -----------------------------------
  Alarm        0.95         0.01
  No Alarm     0.05         0.99
  -----------------------------------

- What is the value of Pr(B|A)?
25 Example (3)
- Pr(B|A) = odds(B|A) / (1 + odds(B|A))
- where odds(B|A) = Likelihood(A|B) x odds(B)
-                 = [P(A|B) / P(A|B')] x [P(B) / P(B')]
-                 = (0.95 / 0.01) x (0.0001 / 0.9999) ≈ 0.0095
- Pr(B|A) = 0.0095 / 1.0095 ≈ 0.0094
- An alarm implies that a burglary is about 94 times
  more likely than a priori.
26 Bayesian Learning
Sources: "A Tutorial on Learning Bayesian Networks"
by David Heckerman (MSR-TR-95-06); "Learning
Bayesian Networks from Data" by Nir Friedman and
Moises Goldszmidt (Berkeley and SRI International).
27 The easier side to Bayesian Learning

Chorus:
In the theory we can build a sample,
With convergence surely guaranteed,
But beware of autocorrelations,
Or it will take forever to succeed!

Verse 4:
When it runs, ain't it thrillin'
To the last iteration.
It frolics and plays throughout n-space,
Walkin' in a Bayesian Wonderland.

Ending:
Random walkin' in a Bayesian Wonderland.
28 In perspective
29 Where Learning enters the arena
- Bayesian networks can be summarised as:
  - Efficient representations of probability
    distributions
    - Local models
    - Independence
  - Effective representations of probability
    distributions for
    - Computing posterior probabilities
    - Computing the most probable instantiation
    - Decision making
- But there is more, i.e. statistical induction -> Learning
30 The Learning Process
- Done by:
  - Encoding existing expert knowledge in a Bayesian
    network
  - Using a database to update this knowledge,
    creating one or more new Bayesian networks
    (see the sketch below)
- Results in:
  - Refinement of the original knowledge
  - Sometimes the identification of new distinctions
    and relationships
- Robust to errors in the experts' knowledge
31 Similar to Neural Net Learning
- But with the following advantages:
  - We can easily encode expert knowledge,
    increasing the efficiency and accuracy of learning
  - Nodes and arcs in learned Bayesian networks often
    correspond to recognizable distinctions and
    causal relationships
  - Thus it is easier to understand and interpret
    the knowledge encoded in the representation
32 Bayesian Learning: The Problem

                   Known Structure                     Unknown Structure
  Complete Data    Statistical parametric estimation   Discrete optimization over structures
  Incomplete Data  Parametric optimization             Combined
33 Why Learning
- Feasibility of learning
  - Availability of data and computational power
- Need for learning
  - Characteristics of current systems and processes:
    - Defy closed-form analysis
      -> need a data-driven approach for characterisation
    - Scale and change fast
      -> need continuous automatic adaptation
- Examples
  - Communications networks, illegal activities, the
    brain, economic markets
34 Why Learn a Bayesian Network
- Combines knowledge engineering and statistical
  induction
- Covers the whole spectrum from knowledge-intensive
  model construction to data-intensive model induction
- More than a learning black-box:
  - Explanation of outputs
  - Interpretability and modifiability
  - Algorithms for decision making, value of
    information, diagnosis and repair
- Causal representation, reasoning and discovery
  - e.g. does smoking cause cancer?
35 A Simple Example
- Wang presents a simple example in [2] using only
the first four operations, which I reproduce in
abbreviated form here. He begins with the
following 8 statements:
  1. robin -> feathered-creature <1.00, 0.90>
  2. bird -> feathered-creature <1.00, 0.90>
  3. swan -> bird <1.00, 0.90>
  4. swan -> swimmer <1.00, 0.90>
  5. gull -> bird <1.00, 0.90>
  6. gull -> swimmer <1.00, 0.90>
  7. crow -> bird <1.00, 0.90>
  8. crow -> swimmer <0.00, 0.90>
- (Note that giving a statement a frequency of
0.00 simply means that it is not true.) The
system is then asked to evaluate the truth value
of "robin -> swimmer". It comes to the following
conclusions, in this order:
  9. robin -> bird <1.00, 0.45> (1 and 2, abduction)
  10. bird -> swimmer <1.00, 0.45> (3 and 4, induction)
  11. robin -> swimmer <1.00, 0.20> (9 and 10, deduction)
  12. bird -> swimmer <1.00, 0.45> (5 and 6, induction)
  13. bird -> swimmer <1.00, 0.62> (10 and 12, revision)
  14. bird -> swimmer <0.00, 0.45> (7 and 8, induction)
  15. bird -> swimmer <0.67, 0.71> (13 and 14, revision)
  16. robin -> swimmer <0.67, 0.32> (9 and 15, deduction)
- Note that NARS actually comes to a great many
more conclusions than this, but the ones shown
are the ones that actually lead toward the
conclusion. Also, NARS reports conclusions at
both lines 11 and 16, since the guesswork
involved necessarily means it needs to be able to
change its mind, as it were. The final
conclusion, given at line 16, means that two
thirds of the relevant evidence indicates that a
robin can swim, but that this conclusion has
somewhat less than one third of the possible
degree of confidence; both of these items, of
course, indicate the need for more information.
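The derivation can be checked mechanically. Below is a Python sketch of the NARS truth-value functions as we read them from Wang's papers (evidential horizon k = 1); they reproduce every number in the table above, but the exact formulas should be treated as our reconstruction:

    def deduction(f1, c1, f2, c2):
        return f1 * f2, c1 * c2 * (f1 + f2 - f1 * f2)

    def induction(f1, c1, f2, c2):
        """From M->P <f1,c1> and M->S <f2,c2>, conclude S->P."""
        w = f2 * c1 * c2                 # weight of evidence (k = 1)
        return f1, w / (w + 1.0)

    def abduction(f1, c1, f2, c2):
        """From P->M <f1,c1> and S->M <f2,c2>, conclude S->P.
        (Both premise frequencies are 1.00 in this example,
        so the frequency rule is not exercised.)"""
        w = f2 * c1 * c2
        return f1, w / (w + 1.0)

    def revision(f1, c1, f2, c2):
        w1, w2 = c1 / (1 - c1), c2 / (1 - c2)  # confidence -> evidence weight
        w = w1 + w2
        return (f1 * w1 + f2 * w2) / w, w / (w + 1.0)

    f9,  c9  = abduction(1.00, 0.90, 1.00, 0.90)  # robin -> bird    <1.00, 0.45>
    f10, c10 = induction(1.00, 0.90, 1.00, 0.90)  # bird -> swimmer  <1.00, 0.45>
    f11, c11 = deduction(f9, c9, f10, c10)        # robin -> swimmer <1.00, 0.20>
    f13, c13 = revision(f10, c10, f10, c10)       # lines 10 and 12  <1.00, 0.62>
    f14, c14 = induction(0.00, 0.90, 1.00, 0.90)  # crow evidence    <0.00, 0.45>
    f15, c15 = revision(f13, c13, f14, c14)       # lines 13 and 14  <0.67, 0.71>
    f16, c16 = deduction(f9, c9, f15, c15)        # robin -> swimmer <0.67, 0.32>
    print(f"robin -> swimmer <{f16:.2f}, {c16:.2f}>")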
36 A Comparison with another Learning Technique
37 Current Topics
- Time
  - Beyond discrete time and beyond fixed rate
- Causality
  - Removing the assumptions
- Hidden variables
  - Where to place them and how many
- Model evaluation and active learning
  - What parts of the model are suspect, and what
    data, and how much of it, is needed
38 Decision Theory (1)
- What happens when it is time to convert beliefs
  into actions?
- Decision Theory = Probability Theory + Utility Theory
39 Decision Theory (2)
- Decompose a multi-attribute utility function into
  a sum of local utilities
- Each term is a node, which has as parents:
  - The random variables on which it depends
  - The action (control) nodes
- The resulting graph is an influence diagram
- Finally, compute the optimal sequence of actions
  to perform to maximize expected utility (a minimal
  sketch follows)
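A minimal sketch of that last step, choosing the action that maximizes expected utility; the umbrella scenario and all numbers are invented for illustration:

    P_rain = 0.3                          # belief from the Bayesian network

    # Local utility table U(action, weather).
    utility = {("umbrella", "rain"): 70, ("umbrella", "sun"): 80,
               ("none",     "rain"): 0,  ("none",     "sun"): 100}

    def expected_utility(action):
        return (P_rain * utility[(action, "rain")]
                + (1 - P_rain) * utility[(action, "sun")])

    best = max(["umbrella", "none"], key=expected_utility)
    print(best, expected_utility(best))   # umbrella 77.0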
40 Applications (1)
- QMR-DT: a decision-theoretic reformulation of the
  Quick Medical Reference model
41 Some Applications
- Biostatistics: Medical Research Council, Bayesian
  Inference Using Gibbs Sampling (BUGS)
- Data analysis: NASA (AutoClass)
- Collaborative filtering: Microsoft (Microsoft
  Belief Networks - MSBN)
- Fraud detection: AT&T
- Speech recognition: UC Berkeley
42 Applications (2)
- Real-time decisions: NASA's Vista system
- Genetics: linkage analysis
- Speech recognition
- Data compression: density estimation
- Coding: turbocodes
43 Applications: MS Office
- MS Office Assistant: The Lumière Project
- Source: "The Lumière Project: Bayesian User
  Modeling for Inferring the Goals and Needs of
  Software Users", by E. Horvitz, J. Breese, D.
  Heckerman, D. Hovel, K. Rommelse (Microsoft
  Research)
44 MS Office (2)
- User behaviour is monitored to determine
  Assistant actions. Examples:
  - Search
- Focus of attention
- Introspection
- Undesired effects
- Inefficient command sequences
- Domain-specific syntactic and semantic content
45 MS Office (3)
- Portion of a Bayesian net for inferring the
  likelihood that a user needs assistance,
  considering profile information and recent activity