This is an attempt at a title

About This Presentation

Description:

Ryan O'Donnell - Microsoft, Mike Saks - Rutgers, Oded Schramm - Microsoft, Rocco Servedio - Columbia

Slides: 33

Provided by: Ryan1159

Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

1
Decision Trees and Influences
Ryan O'Donnell - Microsoft
Mike Saks - Rutgers
Oded Schramm - Microsoft
Rocco Servedio - Columbia
2
Part I: Decision trees have large influences
3
Printer troubleshooter
[Figure: a decision tree for troubleshooting a printer. Internal nodes ask: Does anything print? Right size paper? Can print from Notepad? Network printer? Printer mis-setup? File too complicated? Driver OK? Leaves: Solved, or Call tech support.]
4
Decision tree complexity

$f : \mathrm{Attr}_1 \times \mathrm{Attr}_2 \times \cdots \times \mathrm{Attr}_n \to \{-1,1\}$.
What's the best decision tree (DT) for f, and how do we find it?
Depth = worst-case # of questions.
Expected depth = avg. # of questions.
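
The two cost measures can be made concrete with a minimal Python sketch. The tuple encoding of a tree and the helper names `depth` and `expected_depth` are illustrative assumptions, not from the talk. The sketch also assumes binary attributes whose two answers are equally likely, and uses a three-bit majority tree as its example.

```python
# A tree is either a leaf (an int, -1 or 1) or a tuple
# (variable_index, subtree_if_minus1, subtree_if_plus1).
# This encoding is a hypothetical choice made for illustration.

def depth(tree):
    """Worst-case number of questions on any root-to-leaf path."""
    if isinstance(tree, int):
        return 0
    _, lo, hi = tree
    return 1 + max(depth(lo), depth(hi))

def expected_depth(tree):
    """Average number of questions when each answer is equally likely."""
    if isinstance(tree, int):
        return 0.0
    _, lo, hi = tree
    return 1 + 0.5 * expected_depth(lo) + 0.5 * expected_depth(hi)

# Majority of three bits: ask x0, then x1, and ask x2 only if they disagree.
MAJ3 = (0, (1, -1, (2, -1, 1)), (1, (2, -1, 1), 1))

print(depth(MAJ3))           # 3
print(expected_depth(MAJ3))  # 2.5
```

The gap between 3 and 2.5 comes from the inputs where the first two answers already agree and the third question is skipped.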

5
Building decision trees

Identify the most influential/decisive/relevant variable.
Put it at the root.
Recursively build DTs for its children.
Almost all real-world learning algorithms are based on this: CART, C4.5, ...
Almost no theoretical (PAC-style) learning algorithms are based on this:
[Blum92, KM93, BBVKV97, PTF-folklore, OS04]: no
[EH89, SJ03]: sorta.
Conjectured to be good for some problems (e.g., percolation [SS04]) but unprovable...
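
The top-down heuristic above can be sketched as a toy brute-force routine for a Boolean function given as a Python callable. This is only an illustration under stated assumptions: the names `influence` and `build` and the tuple tree encoding are hypothetical, "most influential" is taken to mean the largest exact flip probability on the current subcube, and ties go to the lowest index. Practical systems such as CART and C4.5 score variables from sample data instead.

```python
from itertools import product

def influence(f, n, j, fixed):
    """Fraction of inputs consistent with `fixed` where flipping x_j changes f."""
    free = [i for i in range(n) if i not in fixed]
    changed = 0
    for bits in product((-1, 1), repeat=len(free)):
        x = dict(fixed)
        x.update(zip(free, bits))
        point = [x[i] for i in range(n)]
        flipped = list(point)
        flipped[j] = -flipped[j]
        changed += f(tuple(point)) != f(tuple(flipped))
    return changed / 2 ** len(free)

def build(f, n, fixed=None):
    """Greedy tree: leaf (int) or (variable, subtree_if_minus1, subtree_if_plus1)."""
    fixed = fixed or {}
    free = [i for i in range(n) if i not in fixed]
    best = max(free, key=lambda j: influence(f, n, j, fixed), default=None)
    if best is None or influence(f, n, best, fixed) == 0:
        # No free variable matters, so f is constant on this subcube.
        return f(tuple(fixed.get(i, 1) for i in range(n)))
    return (best,
            build(f, n, {**fixed, best: -1}),
            build(f, n, {**fixed, best: 1}))

def maj3(x):
    return 1 if sum(x) > 0 else -1

print(build(maj3, 3))  # (0, (1, -1, (2, -1, 1)), (1, (2, -1, 1), 1))
```

The stopping rule relies on the fact that a subcube is connected by single-bit flips, so zero influence for every free variable means the restricted function is constant.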

6
Boolean DTs

$f : \{-1,1\}^n \to \{-1,1\}$.
$D(f)$ = min depth of a DT for f.
$0 \le D(f) \le n$.

[Figure: a decision tree for Maj3 that queries x1, x2, x3, with leaves labeled -1 and 1.]
7
Boolean DTs

$\{-1,1\}^n$ viewed as a probability space, with the uniform probability distribution.
A uniformly random path down a DT, plus a uniformly random setting of the unqueried variables, defines a uniformly random input.
Expected depth: $\delta(f)$.

8
Influences

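Influence of coordinate j on f = the probability that $x_j$ is relevant for f:
$I_j(f) = \Pr_x[\, f(x) \ne f(x^{(\oplus j)}) \,]$, where $x^{(\oplus j)}$ is x with the j-th coordinate flipped.
$0 \le I_j(f) \le 1$.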
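
As a small illustration, influences can be computed by brute force over all inputs. This Python sketch assumes the reading "flip coordinate j and see whether f changes"; the function names are illustrative, and coordinates are 0-indexed in the code.

```python
from itertools import product

def influence(f, n, j):
    """Pr over uniform x in {-1,1}^n that flipping coordinate j changes f(x)."""
    changed = 0
    for x in product((-1, 1), repeat=n):
        y = x[:j] + (-x[j],) + x[j + 1:]
        changed += f(x) != f(y)
    return changed / 2 ** n

def maj3(x):
    return 1 if sum(x) > 0 else -1

def dictator(x):
    return x[0]

print([influence(maj3, 3, j) for j in range(3)])      # [0.5, 0.5, 0.5]
print([influence(dictator, 3, j) for j in range(3)])  # [1.0, 0.0, 0.0]
```

For Maj3 a coordinate matters exactly when the other two disagree, which happens half the time.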

9
Main question

If a function f has a shallow decision tree,
does it have a variable with significant
influence?

10
Main question

No.
But for a silly reason:
Suppose f is highly biased; say $\Pr[f = 1] = p \ll 1$.
Then for any j,
$I_j(f) = \Pr[f(x) = 1,\ f(x^{(\oplus j)}) = -1] + \Pr[f(x) = -1,\ f(x^{(\oplus j)}) = 1]$
$\le \Pr[f(x) = 1] + \Pr[f(x^{(\oplus j)}) = 1]$
$= p + p$
$= 2p$.

11
Variance

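⇒ Influences are always at most $2\min\{p, q\}$, where $q = 1 - p$.
Analytically nicer expression: $\mathrm{Var}[f]$.
$\mathrm{Var}[f] = E[f^2] - E[f]^2 = 1 - (p - q)^2 = 1 - (2p - 1)^2 = 4p(1 - p) = 4pq$.
$2\min\{p, q\} \le 4pq \le 4\min\{p, q\}$.
It's 1 for balanced functions.
So $I_j(f) \le \mathrm{Var}[f]$, and it is fair to say $I_j(f)$ is significant if it's a significant fraction of $\mathrm{Var}[f]$.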
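
These relations can be sanity-checked exhaustively for small n. The sketch below is a numerical check, not a proof. It enumerates all 256 Boolean functions on three bits and tests that the variance equals 4pq, that every influence is at most 2 min(p, q), and that the variance lies between 2 min(p, q) and 4 min(p, q). The names `stats`, `EPS` and `all_hold` are illustrative.

```python
from itertools import product

N = 3
POINTS = list(product((-1, 1), repeat=N))

def stats(table):
    """Return (p, variance, max influence) for a truth table {point: -1 or 1}."""
    size = len(POINTS)
    p = sum(v == 1 for v in table.values()) / size
    mean = sum(table.values()) / size
    var = 1 - mean ** 2  # E[f^2] = 1 because f takes values in {-1, 1}
    max_inf = max(
        sum(table[x] != table[x[:j] + (-x[j],) + x[j + 1:]] for x in POINTS) / size
        for j in range(N)
    )
    return p, var, max_inf

EPS = 1e-12
all_hold = True
for values in product((-1, 1), repeat=len(POINTS)):
    p, var, max_inf = stats(dict(zip(POINTS, values)))
    q = 1 - p
    low, high = 2 * min(p, q), 4 * min(p, q)
    all_hold = all_hold and abs(var - 4 * p * q) < EPS
    all_hold = all_hold and max_inf <= low + EPS
    all_hold = all_hold and low <= var + EPS and var <= high + EPS
print(all_hold)  # True
```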

12
Main question

If a function f has a shallow decision tree, does it have a variable with influence at least a significant fraction of $\mathrm{Var}[f]$?

13
Notation

$\tau(d) = \min_{f \,:\, D(f) \le d} \ \max_j \ I_j(f) / \mathrm{Var}[f]$.

14
Known lower bounds

Suppose $f : \{-1,1\}^n \to \{-1,1\}$.
An elementary old inequality states $\mathrm{Var}[f] \le \sum_{j=1}^{n} I_j(f)$.
Thus f has a variable with influence at least $\mathrm{Var}[f]/n$.
A deep inequality of [KKL88] shows there is always a coordinate j such that $I_j(f) \ge \mathrm{Var}[f] \cdot \Omega(\log n / n)$.
If $D(f) \le d$ then f really has at most $2^d$ variables.
Hence we get $\tau(d) \ge 1/2^d$ from the first, and $\tau(d) \ge \Omega(d/2^d)$ from KKL.

15
Our result

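$\tau(d) \ge 1/d$.
This is tight; consider the function SEL (figure).
Then $\mathrm{Var}[\mathrm{SEL}] = 1$, $d = 2$, and all three variables have influence ½.
(Forming the recursive version, SEL(SEL, SEL, SEL) etc., gives a Var 1 function with $d = 2^h$ and all influences $2^{-h}$, for any h.)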
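
The tightness claim can be checked numerically. The transcript does not preserve the definition of SEL, so this Python sketch assumes it is the three-bit selector in which the first input chooses which of the other two is output. That assumption matches the stated variance, depth and influences. All function names are illustrative.

```python
from itertools import product

def sel(x1, x2, x3):
    # Assumed definition: x1 selects which of x2, x3 is output.
    return x2 if x1 == 1 else x3

def sel_h(x, h):
    """Recursive SEL on 3**h inputs (h = 0 is a single variable)."""
    if h == 0:
        return x[0]
    k = len(x) // 3
    return sel(sel_h(x[:k], h - 1),
               sel_h(x[k:2 * k], h - 1),
               sel_h(x[2 * k:], h - 1))

def variance(f, n):
    mean = sum(f(x) for x in product((-1, 1), repeat=n)) / 2 ** n
    return 1 - mean ** 2  # E[f^2] = 1 for a function with values in {-1, 1}

def influence(f, n, j):
    changed = 0
    for x in product((-1, 1), repeat=n):
        y = x[:j] + (-x[j],) + x[j + 1:]
        changed += f(x) != f(y)
    return changed / 2 ** n

for h in (1, 2):
    n = 3 ** h
    f = lambda x, h=h: sel_h(x, h)
    print(h, variance(f, n), {influence(f, n, j) for j in range(n)})
# 1 1.0 {0.5}
# 2 1.0 {0.25}
```

Under this definition the depth-h version can be evaluated with 2^h queries: evaluate the selector subtree first, then only the subtree it points to.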

[Figure: SEL]
16
Our actual main theorem

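Given a decision tree for f, let $\delta_j(f) = \Pr[\text{tree queries } x_j]$.
Then
$\mathrm{Var}[f] \le \sum_{j=1}^{n} \delta_j(f)\, I_j(f)$.
Cor: Fix the tree with smallest expected depth.
Then $\sum_j \delta_j(f) = E[\text{depth of a path}] = \delta(f) \le D(f)$.
⇒ $\mathrm{Var}[f] \le \max_j I_j \cdot \sum_j \delta_j = \max_j I_j \cdot \delta(f)$
⇒ $\max_j I_j \ge \mathrm{Var}[f] / \delta(f) \ge \mathrm{Var}[f] / D(f)$.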
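
As an illustration of the inequality (an example, not a proof), the following Python sketch evaluates both sides for a depth-3 tree computing three-bit majority. The tree encoding and the helper names are assumptions made for this sketch. The query probabilities are computed assuming no variable is asked twice on a path.

```python
from itertools import product

# Leaf = int; internal node = (variable, subtree_if_minus1, subtree_if_plus1).
MAJ3 = (0, (1, -1, (2, -1, 1)), (1, (2, -1, 1), 1))

def evaluate(tree, x):
    while not isinstance(tree, int):
        j, lo, hi = tree
        tree = hi if x[j] == 1 else lo
    return tree

def query_probs(tree, n, weight=1.0, probs=None):
    """probs[j] = probability that a uniformly random path queries x_j."""
    probs = [0.0] * n if probs is None else probs
    if not isinstance(tree, int):
        j, lo, hi = tree
        probs[j] += weight
        query_probs(lo, n, weight / 2, probs)
        query_probs(hi, n, weight / 2, probs)
    return probs

def check(tree, n):
    points = list(product((-1, 1), repeat=n))
    f = lambda x: evaluate(tree, x)
    mean = sum(f(x) for x in points) / len(points)
    var = 1 - mean ** 2
    infl = [sum(f(x) != f(x[:j] + (-x[j],) + x[j + 1:]) for x in points) / len(points)
            for j in range(n)]
    delta = query_probs(tree, n)
    rhs = sum(d * i for d, i in zip(delta, infl))
    return var, delta, infl, rhs

var, delta, infl, rhs = check(MAJ3, 3)
print(var, delta, infl, rhs)  # 1.0 [1.0, 1.0, 0.5] [0.5, 0.5, 0.5] 1.25
```

Here the variance is 1 and the weighted sum of influences is 1.25. The corollary's bound on the largest influence is 1 / 2.5 = 0.4, consistent with the actual value 0.5.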

17
Proof

Pick a random path in the tree. This gives some set of variables, $P = (x_{J_1}, \dots, x_{J_T})$, along with an assignment to them, $\beta_P$.
Call the remaining set of variables $\bar{P}$ and pick a random assignment $\beta_{\bar{P}}$ for them too.
Let X be the (uniformly random) string given by combining these two assignments, $(\beta_P, \beta_{\bar{P}})$.
Also, define $J_{T+1}, \dots, J_n = \bot$.

18
Proof

Let $\beta'_P$ be an independent random assignment to the variables in P.
Let $Z = (\beta'_P, \beta_{\bar{P}})$.
Note: Z is also uniformly random.

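[Figure: a root-to-leaf path querying $x_{J_1}, x_{J_2}, x_{J_3}, \dots, x_{J_T}$, with $J_{T+1} = \dots = J_n = \bot$. The strings X and Z are shown side by side: they may differ on the queried coordinates and agree on the unqueried ones.]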
19
Proof

Finally, for $t = 0, \dots, T$, let $Y_t$ be the same string as X, except that Z's assignments ($\beta'_P$) for variables $x_{J_1}, \dots, x_{J_t}$ are swapped in.
Note: $Y_0 = X$, $Y_T = Z$.
Y_0 = X = (-1,  1, -1, ...,  1,  1, -1, 1, -1)
Y_1     = ( 1,  1, -1, ...,  1,  1, -1, 1, -1)
Y_2     = ( 1, -1, -1, ...,  1,  1, -1, 1, -1)
...
Y_T = Z = ( 1, -1, -1, ..., -1,  1, -1, 1, -1)
Also define $Y_{T+1} = \dots = Y_n = Z$.

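20
$\mathrm{Var}[f] = E[f^2] - E[f]^2$
$= E[f(X)f(X)] - E[f(X)f(Z)]$
$= E[f(X)f(Y_0) - f(X)f(Y_n)]$
$= \sum_{t=1}^{n} E[f(X)\,(f(Y_{t-1}) - f(Y_t))]$
$\le \sum_{t=1}^{n} E[\,|f(Y_{t-1}) - f(Y_t)|\,]$
$= \sum_{t=1}^{n} 2 \Pr[f(Y_{t-1}) \ne f(Y_t)]$
$= \sum_{t=1}^{n} \sum_{j=1}^{n} \Pr[J_t = j] \cdot 2 \Pr[f(Y_{t-1}) \ne f(Y_t) \mid J_t = j]$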

21
Proof

$\sum_{t} \sum_{j} \Pr[J_t = j] \cdot 2 \Pr[f(Y_{t-1}) \ne f(Y_t) \mid J_t = j]$
Utterly Crucial Observation:
Conditioned on $J_t = j$, $(Y_{t-1}, Y_t)$ are jointly distributed exactly as $(W, W')$, where W is uniformly random, and W' is W with the jth bit rerandomized.

22
[Figure: the path diagram and the strings X, Y_0, Y_1, Y_2, ..., Y_T = Z from the preceding slides, shown together.]
23
Proof

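$\sum_{t} \sum_{j} \Pr[J_t = j] \cdot 2 \Pr[f(Y_{t-1}) \ne f(Y_t) \mid J_t = j]$
$= \sum_{t} \sum_{j} \Pr[J_t = j] \cdot 2 \Pr[f(W) \ne f(W')]$
$= \sum_{t} \sum_{j} \Pr[J_t = j] \, I_j(f)$
$= \sum_{j} I_j \sum_{t} \Pr[J_t = j]$
$= \sum_{j} I_j \, \delta_j$.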

24
Part II: Lower bounds for monotone graph properties
25
Monotone graph properties

Consider graphs on v vertices; let $n = \binom{v}{2}$.
"Nontrivial monotone graph property":
"nontrivial property" = a (nonempty, nonfull) subset of all v-vertex graphs
"graph property" = closed under permutations of the vertices (⇒ no edge is distinguished)
"monotone" = adding edges can only put you into the property, not take you out
e.g. Contains-A-Triangle, Connected, Has-Hamiltonian-Path, Non-Planar, Has-at-least-n/2-edges, ...
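
The three conditions can be checked mechanically on a small case. This Python sketch tests Contains-A-Triangle on v = 4 vertices by enumerating all 64 graphs. The representation of a graph as a frozenset of edges and the helper names are assumptions made for illustration.

```python
from itertools import combinations, permutations, product

V = 4
EDGES = list(combinations(range(V), 2))  # n = C(4, 2) = 6 potential edges

def has_triangle(present):
    """present: frozenset of edges (i, j) with i < j."""
    return any(
        {(a, b), (a, c), (b, c)} <= present
        for a, b, c in combinations(range(V), 3)
    )

GRAPHS = [frozenset(e for e, bit in zip(EDGES, bits) if bit)
          for bits in product((0, 1), repeat=len(EDGES))]

def relabel(graph, perm):
    """Apply a vertex permutation and keep each edge as a sorted pair."""
    return frozenset(tuple(sorted((perm[i], perm[j]))) for i, j in graph)

nontrivial = any(map(has_triangle, GRAPHS)) and not all(map(has_triangle, GRAPHS))
invariant = all(has_triangle(g) == has_triangle(relabel(g, p))
                for g in GRAPHS for p in permutations(range(V)))
monotone = all(has_triangle(g | {e}) or not has_triangle(g)
               for g in GRAPHS for e in EDGES)
print(nontrivial, invariant, monotone)  # True True True
```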

26
Aanderaa-Karp-Rosenberg conj.

Every nontrivial monotone graph property has $D(f) = n$.
Rivest-Vuillemin-75: $\ge v^2/16$.
Kleitman-Kwiatkowski-80: $\ge v^2/9$.
Kahn-Saks-Sturtevant-84: $\ge n/2$; $= n$ if v is a prime power.
Topology + group theory!
Yao-88: $= n$ in the bipartite case.

27
Randomized DTs

Have "coin flip" nodes in the trees that cost nothing.
Or, a probability distribution over deterministic DTs.
Note: We want both 0-sided error and worst-case input.
$R(f)$ = min, over randomized DTs that compute f with 0 error, of max over inputs x, of the expected # of queries.
The expectation is only over the DT's internal coins.

28
$D(\mathrm{Maj}_3) = 3$.
Pick two inputs at random, check if they're the same. If not, check the 3rd.
⇒ $R(\mathrm{Maj}_3) \le 8/3$.
Let f = recursive-Maj3 = Maj3(Maj3, Maj3, Maj3), etc.
For the depth-h version ($n = 3^h$),
$D(f) = 3^h$.
$R(f) \le (8/3)^h$.
(Not best possible!)
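
The 8/3 figure and its recursive version can be reproduced exactly. This Python sketch computes the expected number of queries of the "evaluate two random children, and the third only on disagreement" strategy on every input, using exact fractions. The function names `rmaj` and `worst_case` are illustrative.

```python
from fractions import Fraction
from itertools import product

def rmaj(x, h):
    """Return (value, expected number of queries) for recursive Maj3 of depth h."""
    if h == 0:
        return x[0], Fraction(1)
    k = len(x) // 3
    kids = [rmaj(x[i * k:(i + 1) * k], h - 1) for i in range(3)]
    value = 1 if sum(v for v, _ in kids) > 0 else -1
    total = Fraction(0)
    for last in range(3):  # which child is left for last, each with probability 1/3
        a, b = [kids[i] for i in range(3) if i != last]
        total += a[1] + b[1]
        if a[0] != b[0]:   # the first two disagree, so the third is needed
            total += kids[last][1]
    return value, total / 3

def worst_case(h):
    n = 3 ** h
    return max(rmaj(x, h)[1] for x in product((-1, 1), repeat=n))

print(worst_case(1))  # 8/3
print(worst_case(2))  # 64/9
```

On an input where exactly two of the three bits agree, the agreeing pair is picked first with probability 1/3, giving 2 · (1/3) + 3 · (2/3) = 8/3.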

[Figure: Maj3]
29
Randomized AKR / Yao conj.

Yao conjectured in '77 that every nontrivial monotone graph property f has $R(f) \ge \Omega(v^2)$.

Lower bound $\Omega(\cdot)$        Who
$v$                                Yao-77
$v \log^{1/12} v$                  Yao-87
$v^{5/4}$                          King-88
$v^{4/3}$                          Hajnal-91
$v^{4/3} \log^{1/3} v$             Chakrabarti-Khot-01
$\min\{v/p,\ v^2/\log v\}$         Fried.-Kahn-Wigd.-02
$v^{4/3} / p^{1/3}$                us

30
Outline

Extend the main inequality to the p-biased case. (Then LHS is 1.)
Use Yao's minmax principle: show that under p-biased $\{-1,1\}^n$, $\delta = \sum_j \delta_j$ = avg # queries is large for any tree.
Main inequality: max influence is small ⇒ $\delta$ is large.
Graph property ⇒ all variables have the same influence.
Hence: sum of influences is small ⇒ $\delta$ is large.
[OS04]: f monotone ⇒ sum of influences $\le \sqrt{\delta}$.
Hence: sum of influences is large ⇒ $\delta$ is large.
So either way, $\delta$ is large.

31
Generalizing the inequality

$\mathrm{Var}[f] \le \sum_{j=1}^{n} \delta_j(f)\, I_j(f)$.
Generalizations (which basically require no proof change):
holds for randomized DTs
holds for randomized "subcube partitions"
holds for functions on any product probability space $f : \Omega_1 \times \cdots \times \Omega_n \to \{-1,1\}$ (with the notion of "influence" suitably generalized)
holds for real-valued functions with (necessary) loss of a factor, at most $\sqrt{\delta}$

32
Closing thought

It's funny that our bound gets stuck roughly at the same level as Hajnal / Chakrabarti-Khot, $n^{2/3} \approx v^{4/3}$.
Note that $n^{2/3}$, I believe, cannot be improved by more than a log factor merely for monotone transitive functions, due to [BSW04].
Thus to get better than $v^{4/3}$ for monotone graph properties, you must use the fact that it's a graph property.
Chakrabarti-Khot does definitely use the fact that it's a graph property (all sorts of graph packing lemmas).
Or do they? Since they get stuck at essentially $v^{4/3}$, I wonder if there's any chance their result doesn't truly need the fact that it's a graph property...
