FTP Search and Compare Engine - PowerPoint PPT Presentation

Provided by: hpg9
1
K2 Algorithm Presentation
Learning Bayes Networks from Data
Haipeng Guo
Friday, April 21, 2000
KDD Lab, CIS Department, KSU
2
Presentation Outline
  • Bayes Networks Introduction
  • What's K2?
  • Basic Model and the Score Function
  • K2 algorithm
  • Demo

3
Bayes Networks Introduction
  • A Bayes network B = (Bs, Bp)
  • A Bayes network structure Bs is a directed
    acyclic graph in which nodes represent random
    domain variables and arcs between nodes represent
    probabilistic dependence.
  • Bs is augmented by conditional probabilities, Bp,
    to form a Bayes Network B.

4
Bayes Networks Introduction
  • Example: Sprinkler

- Bs of the Bayes network: the structure
5
Bayes Networks Introduction
- Bp of the Bayes network: the conditional probabilities
[Figure: conditional probability tables for season, sprinkler, rain, ground-moist, and ground-state]
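A structure Bs like the sprinkler example can be held as a mapping from each node to its list of parents. The sketch below is only an illustration: the arcs are an assumption (the slide's figure is not preserved in this transcript), and the helper name `is_acyclic` is hypothetical. It uses the standard-library `graphlib` module (Python 3.9+) to check that the graph is a DAG.

```python
from graphlib import CycleError, TopologicalSorter

# Bs as node -> list of parents. These arcs are ASSUMED for illustration;
# the original figure is not available.
structure = {
    "season": [],
    "sprinkler": ["season"],
    "rain": ["season"],
    "ground_moist": ["sprinkler", "rain"],
    "ground_state": ["ground_moist"],
}


def is_acyclic(parents):
    """Return True if the node -> parents mapping describes a DAG."""
    try:
        # static_order() raises CycleError when the graph contains a cycle.
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


print(is_acyclic(structure))
print(is_acyclic({"a": ["b"], "b": ["a"]}))
```

Bp would then attach one conditional probability table to each node, indexed by the values of that node's parents.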
6
What's K2?
  • K2 is an algorithm for constructing a Bayes
    Network from a database of records
  • A Bayesian Method for the Induction of
    Probabilistic Networks from Data, Gregory F.
    Cooper and Edward Herskovits, Machine Learning 9,
    1992

7
Basic Model
  • The problem: to find the most probable
    Bayes network structure given a database
  • D: a database of cases
  • Z: the set of variables represented by D
  • Bsi, Bsj: two Bayes network structures
    containing exactly those variables that are in Z
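Two candidate structures can be compared through the ratio of their posterior probabilities. The equation on the original slide is not preserved in this transcript; the following reconstruction uses only the definition of conditional probability, with the subscripted symbols standing for Bsi and Bsj above.

```latex
\[
\frac{P(B_{S_i} \mid D)}{P(B_{S_j} \mid D)}
  = \frac{P(B_{S_i}, D) / P(D)}{P(B_{S_j}, D) / P(D)}
  = \frac{P(B_{S_i}, D)}{P(B_{S_j}, D)}
\]
```

Because P(D) cancels, ranking structures only requires the joint probability P(Bs, D).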

8
Basic Model
  • By computing such ratios for pairs of Bayes
    network structures, we can rank-order a set of
    structures by their posterior probabilities.
  • Based on four assumptions, the paper
    introduces an efficient formula for computing
    P(Bs,D). Let Bs represent an arbitrary Bayes
    network structure containing just the variables
    in D.

9
Computing P(Bs,D)
  • Assumption 1: The database variables, which we
    denote as Z, are discrete
  • Assumption 2: Cases occur independently, given
    a Bayes network model
  • Assumption 3: There are no cases that have
    variables with missing values
  • Assumption 4: The density function f(Bp | Bs) is
    uniform. Bp is a vector whose values denote the
    conditional-probability assignment associated
    with structure Bs

10
Computing P(Bs,D)
Where:
  • D: the dataset; it has m cases (records)
  • Z: a set of n discrete variables (x1, ..., xn)
  • ri: a variable xi in Z has ri possible value
    assignments
  • Bs: a Bayes network structure containing just
    the variables in Z
  • πi: each variable xi in Bs has a set of parents,
    which we represent with a list of variables πi
  • qi: there are qi unique instantiations of πi
  • wij: denotes the jth unique instantiation of πi
    relative to D
  • Nijk: the number of cases in D in which variable
    xi has the value vik and πi is instantiated as wij
  • Nij: the sum of Nijk over k = 1, ..., ri
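The formula these symbols belong to is not preserved in this transcript. As a reconstruction, the closed-form result stated in Cooper and Herskovits (1992) under the four assumptions can be written as follows, where π_i denotes the parent list of x_i:

```latex
\[
P(B_S, D) = P(B_S) \prod_{i=1}^{n} \prod_{j=1}^{q_i}
  \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}! ,
\qquad
N_{ij} = \sum_{k=1}^{r_i} N_{ijk}
\]
```

Each inner factor depends only on one variable and one instantiation of its parents, so all of the counts can be gathered in a single pass over D.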
11
Decrease the computational complexity
Three more assumptions to decrease the
computational complexity to polynomial time:
<1> There is an ordering on the nodes such that if xi
precedes xj, then we do not allow structures in
which there is an arc from xj to xi.
<2> There exists a sufficiently tight limit on the
number of parents of any node.
<3> P(πi → xi) and P(πj → xj) are independent
when i ≠ j.
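To see how the first two assumptions shrink the search space, one can count the parent sets that remain admissible for each node. The sketch below is an illustration only; the function names are hypothetical and positions in the ordering are 0-based. For a fixed parent limit the per-node count grows polynomially with the number of predecessors.

```python
from math import comb


def candidate_parent_sets(position, max_parents):
    """Count parent sets for the node at 0-based `position` in the ordering.

    Parents may only be chosen among the `position` predecessors, and a
    set may hold at most `max_parents` of them.
    """
    return sum(comb(position, k) for k in range(max_parents + 1))


def structures_with_ordering(n, max_parents):
    """Count structures over n nodes that respect the ordering and the limit."""
    total = 1
    for position in range(n):
        total *= candidate_parent_sets(position, max_parents)
    return total


print(candidate_parent_sets(2, 2))
print(structures_with_ordering(3, 2))
print(structures_with_ordering(3, 1))
```

The third assumption is what allows each node's parent set to be chosen separately, so the search handles the per-node counts one node at a time rather than their product.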
12
K2 algorithm: a heuristic search method
Use the following functions:
g(i, πi), where the Nijk are relative to πi being the
parents of xi and relative to a database D
Pred(xi) = {x1, ..., xi-1}
It returns the set of nodes that precede xi in
the node ordering
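The scoring function's formula is not preserved in this transcript. The sketch below assumes the per-node factor from Cooper and Herskovits (1992), namely the product over parent instantiations j of (ri - 1)! / (Nij + ri - 1)! times the product over k of Nijk!. The names `g` and `pred`, the tuple-per-case data layout, and the 0-based column indices are illustrative choices, and exact fractions are used so that small results are easy to check by hand.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def pred(order, x):
    """Return the nodes that precede x in the node ordering."""
    return order[:order.index(x)]


def g(data, i, parents, r_i):
    """Exact score of giving column i the parent columns `parents`.

    data: list of cases (tuples of discrete values); r_i: number of
    possible values of column i.
    """
    # N_ijk: cases where the parents take instantiation j and x_i takes value k.
    n_ijk = Counter((tuple(row[p] for p in parents), row[i]) for row in data)
    # N_ij: sum of N_ijk over k.
    n_ij = Counter()
    for (j, _k), count in n_ijk.items():
        n_ij[j] += count
    score = Fraction(1)
    # Instantiations never seen in the data contribute a factor of 1.
    for n in n_ij.values():
        score *= Fraction(factorial(r_i - 1), factorial(n + r_i - 1))
    for count in n_ijk.values():
        score *= factorial(count)
    return score


toy = [(0, 0), (0, 0), (1, 1), (1, 1)]  # made-up cases: column 1 copies column 0
print(g(toy, 1, [], 2))   # 1/30
print(g(toy, 1, [0], 2))  # 1/9
print(pred([0, 1, 2], 2))
```

On the made-up cases the score rises from 1/30 to 1/9 when column 0 is added as a parent of column 1, which is the comparison K2 makes at every step.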
13
K2 algorithm: a heuristic search method
Input: A set of n nodes, an ordering on the nodes,
an upper bound u on the number of parents a node
may have, and a database D containing m cases
Output: For each node, a printout of the parents
of the node
14
K2 algorithm: a heuristic search method
Procedure K2
For i := 1 to n do
    πi := ∅
    Pold := g(i, πi)
    OKToProceed := true
    while OKToProceed and |πi| < u do
        let z be the node in Pred(xi) - πi that maximizes g(i, πi ∪ {z})
        Pnew := g(i, πi ∪ {z})
        if Pnew > Pold then
            Pold := Pnew
            πi := πi ∪ {z}
        else OKToProceed := false
    end while
    write('Node:', xi, 'parents of this node:', πi)
end for
end K2
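The procedure above can be sketched in Python as follows. This is a minimal sketch, not the presenter's demo code: the names `log_g` and `k2`, the 0-based column indices, and the 10-case database are assumptions made for illustration. It scores in log space (log g rather than g), assumes the per-node factor from Cooper and Herskovits (1992), and also stops when no candidate predecessor remains.

```python
from collections import Counter
from math import lgamma


def log_g(data, i, parents, arity):
    """Return log g(i, parents), with the counts N_ijk taken from `data`."""
    r_i = arity[i]
    # N_ijk: cases where the parents take instantiation j and x_i takes value k.
    n_ijk = Counter((tuple(row[p] for p in parents), row[i]) for row in data)
    # N_ij: sum of N_ijk over k.
    n_ij = Counter()
    for (j, _k), count in n_ijk.items():
        n_ij[j] += count
    # lgamma(n + 1) == log(n!); unseen instantiations contribute log 1 = 0.
    score = sum(lgamma(r_i) - lgamma(n + r_i) for n in n_ij.values())
    score += sum(lgamma(count + 1) for count in n_ijk.values())
    return score


def k2(data, order, arity, max_parents):
    """Greedy K2 search; returns {node: list of chosen parents}."""
    result = {}
    for position, i in enumerate(order):
        parents = []
        p_old = log_g(data, i, parents, arity)
        candidates = list(order[:position])  # Pred(x_i) minus chosen parents
        while candidates and len(parents) < max_parents:
            # Pick the predecessor whose addition gives the best score.
            z = max(candidates,
                    key=lambda c: log_g(data, i, parents + [c], arity))
            p_new = log_g(data, i, parents + [z], arity)
            if p_new > p_old:
                p_old = p_new
                parents.append(z)
                candidates.remove(z)
            else:
                break  # OKToProceed := false
        result[i] = parents
    return result


# Illustrative 10-case database over three binary variables (x1, x2, x3).
DATA = [
    (1, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 1), (0, 0, 0),
    (0, 1, 1), (1, 1, 1), (0, 0, 0), (1, 1, 1), (0, 0, 0),
]
learned = k2(DATA, order=[0, 1, 2], arity=[2, 2, 2], max_parents=2)
print(learned)  # {0: [], 1: [0], 2: [1]}
```

With the ordering x1, x2, x3, the sketch selects x1 as the parent of x2 and x2 as the parent of x3, and it rejects x1 as a second parent of x3 because the score drops.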
15
Conditional probabilities
  • Let θijk denote the conditional probabilities
    P(xi = vik | πi = wij), that is, the probability
    that xi has value vik, for some k from 1 to ri,
    given that the parents of xi, represented by πi,
    are instantiated as wij. We call θijk a network
    conditional probability.
  • Let ξ be the four assumptions.
  • The expected value of θijk:
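The expression for this expected value is not preserved in this transcript. As a reconstruction, the result given in Cooper and Herskovits (1992) for the network conditional probability (θ_ijk in the paper's notation, with ξ standing for the four assumptions) is:

```latex
\[
E[\theta_{ijk} \mid D, B_S, \xi] = \frac{N_{ijk} + 1}{N_{ij} + r_i}
\]
```

For example, a binary variable (ri = 2) observed with one value in 4 of the 5 cases for some parent instantiation gets the estimate (4 + 1) / (5 + 2) = 5/7 rather than the raw frequency 4/5.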

16
Demo Example
Input
The dataset is generated from the following
structure:
[Figure: network structure over the nodes x1, x2, x3]
17
Demo Example
Note: use log g(i, πi) instead of g(i, πi)
to save running time
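Beyond running time, working with logarithms also keeps the score representable as a floating-point number. The sketch below is an illustration with made-up counts (1000 cases, a binary variable): the factorial ratio computed directly underflows to zero as a float, while the same quantity in log space stays finite. The helper name `log_factorial` is hypothetical.

```python
from math import factorial, lgamma


def log_factorial(n):
    """Return log(n!) via the log-gamma function."""
    return lgamma(n + 1)


n_ij, r_i = 1000, 2  # made-up counts for illustration

# Direct evaluation of (r_i - 1)! / (N_ij + r_i - 1)! as a float.
direct = factorial(r_i - 1) / factorial(n_ij + r_i - 1)

# The same factor in log space: a difference instead of a quotient.
log_term = log_factorial(r_i - 1) - log_factorial(n_ij + r_i - 1)

print(direct)
print(log_term)
```

In log space the products in g become sums, so scores for large databases can be compared without exact big-integer arithmetic.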