FTP Search and Compare Engine - PowerPoint PPT Presentation

About This Presentation

Provided by: hpg9

Learn more at: https://www.kddresearch.org

Transcript and Presenter's Notes

1
K2 Algorithm Presentation
Learning Bayes Networks from Data
Haipeng Guo
Friday, April 21, 2000
KDD Lab, CIS Department, KSU
2
Presentation Outline

Bayes Networks Introduction
What's K2?
Basic Model and the Score Function
K2 algorithm
Demo

3
Bayes Networks Introduction

A Bayes network B = (Bs, Bp)
A Bayes network structure Bs is a directed
acyclic graph in which nodes represent random
domain variables and arcs between nodes represent
probabilistic dependence.
Bs is augmented by conditional probabilities, Bp,
to form a Bayes network B.

4
Bayes Networks Introduction

Example: Sprinkler

- Bs of the Bayes network: the structure
5
Bayes Networks Introduction
- Bp of the Bayes network: the conditional probabilities for
Season, Sprinkler, Rain, Ground-moist, and Ground-state
6
What's K2?

K2 is an algorithm for constructing a Bayes
network from a database of records.
"A Bayesian Method for the Induction of
Probabilistic Networks from Data", Gregory F.
Cooper and Edward Herskovits, Machine Learning 9,
1992.

7
Basic Model

The problem: to find the most probable
Bayes-network structure given a database.
D: a database of cases
Z: the set of variables represented by D
Bsi, Bsj: two Bayes network structures
containing exactly those variables that are in Z
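
The comparison of two candidate structures can be written out as a short derivation. This is a sketch that uses only the definition of conditional probability and the symbols D, Bsi, and Bsj defined above: because P(D) is common to both structures, it cancels, and the posterior ratio reduces to a ratio of joint probabilities.

```latex
\frac{P(B_{S_i} \mid D)}{P(B_{S_j} \mid D)}
  = \frac{P(B_{S_i}, D) / P(D)}{P(B_{S_j}, D) / P(D)}
  = \frac{P(B_{S_i}, D)}{P(B_{S_j}, D)}
```

Ranking structures therefore only requires a way to compute P(Bs, D).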

8
Basic Model

By computing the ratio P(Bsi | D) / P(Bsj | D) for pairs of Bayes
network structures, we can rank order a set of
structures by their posterior probabilities.

Based on four assumptions, the paper
introduces an efficient formula for computing
P(Bs,D). Let Bs represent an arbitrary Bayes
network structure containing just the variables
in D.

9
Computing P(Bs,D)

Assumption 1: The database variables, which we
denote as Z, are discrete.

Assumption 2: Cases occur independently, given
a Bayes network model.

Assumption 3: There are no cases that have
variables with missing values.

Assumption 4: The density function f(Bp | Bs) is
uniform. Bp is a vector whose values denote the
conditional-probability assignment associated
with structure Bs.

10
Computing P(Bs,D)
Where
D: the dataset; it has m cases (records)
Z: a set of n discrete variables (x1, ..., xn)
ri: a variable xi in Z has ri possible value assignments
Bs: a Bayes network structure containing just the variables in Z
πi: each variable xi in Bs has a set of parents, which we represent with a list of variables πi
qi: there are qi unique instantiations of πi
wij: denotes the jth unique instantiation of πi relative to D
Nijk: the number of cases in D in which variable xi has the value vik and πi is instantiated as wij
Nij: the sum of Nijk over k = 1, ..., ri
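
With the notation above, the closed form from Cooper and Herskovits (1992) can be written as follows. This is a restatement from the cited paper under the four assumptions, not a transcription of the slide, so the typesetting choices are assumptions.

```latex
P(B_S, D) = P(B_S) \prod_{i=1}^{n} \prod_{j=1}^{q_i}
  \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}! ,
\qquad
N_{ij} = \sum_{k=1}^{r_i} N_{ijk}
```

Each factor depends only on one variable xi and its parents πi, which is what lets a search procedure score one node at a time.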
11
Decrease the computational complexity
Three more assumptions to decrease the
computational complexity to polynomial time:
<1> There is an ordering on the nodes such that if xi
precedes xj, then we do not allow structures in
which there is an arc from xj to xi.
<2> There exists a sufficiently tight limit on the number
of parents of any node.
<3> P(πi → xi) and P(πj → xj) are independent when i ≠ j.
12
K2 algorithm: a heuristic search method
Use the following functions:
g(i, πi), where the Nijk are computed relative to πi being the
parents of xi and relative to a database D
Pred(xi) = {x1, ..., xi-1}
It returns the set of nodes that precede xi in
the node ordering.
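
The scoring function can be sketched in Python. In Cooper and Herskovits (1992), g(i, πi) is the per-node factor of P(Bs, D): the product over j of (ri − 1)! / (Nij + ri − 1)!, times the product over k of Nijk!. The data layout (a list of tuples), the argument names, and the use of exact fractions are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def g(data, i, parents, r):
    """Exact score g(i, pi_i) for variable index i and a tuple of parent indices.

    data    -- list of cases, each a tuple of discrete values (no missing values)
    parents -- tuple of column indices forming the parent set pi_i
    r       -- r[i] is the number of possible values of variable i
    """
    n_ijk = Counter()  # (parent instantiation w_ij, value of x_i) -> N_ijk
    for case in data:
        w_ij = tuple(case[p] for p in parents)
        n_ijk[(w_ij, case[i])] += 1

    # Group the N_ijk counts by parent instantiation w_ij.
    by_instantiation = {}
    for (w_ij, _value), count in n_ijk.items():
        by_instantiation.setdefault(w_ij, []).append(count)

    # Instantiations and values that never occur contribute a factor of 1
    # (0! = 1 and (r_i - 1)! / (r_i - 1)! = 1), so they can be skipped.
    score = Fraction(1)
    for counts in by_instantiation.values():
        n_ij = sum(counts)
        term = Fraction(factorial(r[i] - 1), factorial(n_ij + r[i] - 1))
        for count in counts:
            term *= factorial(count)
        score *= term
    return score


# Hypothetical toy data: two binary variables, the second copies the first.
toy = [(0, 0), (0, 0), (1, 1), (1, 1)]
print(g(toy, 1, (), [2, 2]))    # score of variable 1 with no parents
print(g(toy, 1, (0,), [2, 2]))  # score of variable 1 with variable 0 as parent
```

On this toy data the score with the parent is higher than the score without it, which is the comparison the search relies on.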
13
K2 algorithm: a heuristic search method
Input: A set of n nodes, an ordering on the nodes,
an upper bound u on the number of parents a node
may have, and a database D containing m cases.
Output: For each node, a printout of
the parents of the node.
14
K2 algorithm: a heuristic search method
Procedure K2
For i := 1 to n do
    πi := ∅
    Pold := g(i, πi)
    OKToProceed := true
    while OKToProceed and |πi| < u do
        let z be the node in Pred(xi) − πi that maximizes g(i, πi ∪ {z})
        Pnew := g(i, πi ∪ {z})
        if Pnew > Pold then
            Pold := Pnew
            πi := πi ∪ {z}
        else OKToProceed := false
    end while
    write("Node:", xi, "parents of this node:", πi)
end for
end K2
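
The procedure can be sketched as runnable Python. This is an illustrative version under stated assumptions, not the presenter's implementation: variables are column indices, the data is a list of tuples, the score is computed in log form with `math.lgamma`, and the result is returned as a dictionary instead of being printed. The early exit when no candidate parents remain is an added guard.

```python
from collections import Counter
from math import lgamma


def log_g(data, i, parents, r):
    """log g(i, pi_i); lgamma(n + 1) equals log(n!)."""
    n_ijk = Counter()  # (w_ij, value of x_i) -> N_ijk
    n_ij = Counter()   # w_ij -> N_ij
    for case in data:
        w_ij = tuple(case[p] for p in parents)
        n_ijk[(w_ij, case[i])] += 1
        n_ij[w_ij] += 1
    total = 0.0
    for count in n_ij.values():
        total += lgamma(r[i]) - lgamma(count + r[i])
    for count in n_ijk.values():
        total += lgamma(count + 1)
    return total


def k2(data, order, u, r):
    """Greedy K2 search; returns {node: sorted list of chosen parents}."""
    result = {}
    for pos, i in enumerate(order):
        pi = []
        p_old = log_g(data, i, tuple(pi), r)
        predecessors = set(order[:pos])  # Pred(x_i)
        ok_to_proceed = True
        while ok_to_proceed and len(pi) < u:
            remaining = sorted(predecessors - set(pi))
            if not remaining:  # guard: nothing left to try
                break
            z = max(remaining, key=lambda c: log_g(data, i, tuple(pi + [c]), r))
            p_new = log_g(data, i, tuple(pi + [z]), r)
            if p_new > p_old:
                p_old = p_new
                pi.append(z)
            else:
                ok_to_proceed = False
        result[i] = sorted(pi)
    return result


# Hypothetical data: column 1 copies column 0, column 2 is unrelated to both.
cases = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 2
structure = k2(cases, order=[0, 1, 2], u=2, r=[2, 2, 2])
print(structure)
```

Because the search is greedy and depends on the given node ordering, a different ordering can produce a different structure.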
15
Conditional probabilities

Let θijk denote the conditional probability
P(xi = vik | πi = wij), that is, the probability
that xi has value vik, for some k from 1 to ri,
given that the parents of xi, represented by πi,
are instantiated as wij. We call θijk a network
conditional probability.
Let ξ be the four assumptions.
The expected value of θijk
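
As a small worked example, Cooper and Herskovits (1992) give the expected value of a network conditional probability, given D, Bs, and the four assumptions, as (Nijk + 1) / (Nij + ri). The sketch below applies that formula to the counts for one parent instantiation; the function name and the count layout are assumptions made for illustration.

```python
from fractions import Fraction


def expected_thetas(counts):
    """Expected values of theta_ijk for one variable x_i and one parent instantiation w_ij.

    counts -- list of N_ijk for k = 1..r_i (include zeros for unseen values)
    """
    r_i = len(counts)
    n_ij = sum(counts)
    return [Fraction(n_ijk + 1, n_ij + r_i) for n_ijk in counts]


# Hypothetical counts: x_i took its first value 3 times and its second value once.
estimates = expected_thetas([3, 1])
print(estimates)
```

The estimates sum to 1 over k, and a value that never appears in D still receives a nonzero probability.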

16
Demo Example
Input:
The dataset is generated from the following
structure over the nodes x1, x2, x3 (figure not reproduced).
17
Demo Example
Note: use log g(i, πi) instead of g(i, πi)
to save running time.
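
A minimal sketch of the log form, assuming the counts Nijk for one node have already been tabulated as a list of rows (one row per parent instantiation). Taking logs turns the products of factorials into sums, and `math.lgamma` gives log(n!) as `lgamma(n + 1)` without building the factorial itself. The function names and the count layout are assumptions.

```python
from math import factorial, lgamma, log


def g_direct(counts, r_i):
    """g(i, pi_i) computed directly from counts[j][k] = N_ijk."""
    score = 1.0
    for row in counts:
        n_ij = sum(row)
        term = factorial(r_i - 1) / factorial(n_ij + r_i - 1)
        for n_ijk in row:
            term *= factorial(n_ijk)
        score *= term
    return score


def log_g(counts, r_i):
    """log g(i, pi_i) computed as a sum of log-factorials."""
    total = 0.0
    for row in counts:
        n_ij = sum(row)
        total += lgamma(r_i) - lgamma(n_ij + r_i)
        total += sum(lgamma(n_ijk + 1) for n_ijk in row)
    return total


# Hypothetical counts for a binary variable with one binary parent.
table = [[4, 0], [0, 4]]
print(log_g(table, 2), log(g_direct(table, 2)))
```

Since the logarithm is monotonic, comparing log g values selects the same parent as comparing g values.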
