Title: Correlation Immune Functions and Learning
1 Correlation Immune Functions and Learning
- Lisa Hellerstein
- Polytechnic Institute of NYU
- Brooklyn, NY
- Includes joint work with Bernard Rosell (AT&T), Eric Bach and David Page (U. of Wisconsin), and Soumya Ray (Case Western)
2 Identifying relevant variables from random examples
x                        f(x)
(1,1,0,0,0,1,1,0,1,0)    1
(0,1,0,0,1,0,1,1,0,1)    1
(1,0,0,1,0,1,0,0,1,0)    0
3 Technicalities
- Assume random examples drawn from uniform distribution over $\{0,1\}^n$
- Have access to source of random examples
4 Detecting that a variable is relevant
- Look for dependence between input variables and output
- If $x_i$ irrelevant: $P(f=1 \mid x_i=1) = P(f=1 \mid x_i=0)$
- If $x_i$ relevant: $P(f=1 \mid x_i=1) \ne P(f=1 \mid x_i=0)$ for previous function $f$
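The dependence test on this slide can be sketched in Python. This is only an illustration under stated assumptions: the target `target` (an AND of the first two variables), the sample size, and the helper names `draw_examples` and `conditional_gap` are hypothetical and do not come from the talk.

```python
import random

def draw_examples(f, n, m, rng):
    """Draw m uniform random examples (x, f(x)) over {0,1}^n."""
    examples = []
    for _ in range(m):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        examples.append((x, f(x)))
    return examples

def conditional_gap(examples, i):
    """Estimate P(f=1 | x_i=1) - P(f=1 | x_i=0) from labeled examples."""
    ones = [y for x, y in examples if x[i] == 1]
    zeros = [y for x, y in examples if x[i] == 0]
    if not ones or not zeros:
        return 0.0
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

# Hypothetical target: f = x1 AND x2 (indices 0 and 1); the rest are irrelevant.
def target(x):
    return x[0] & x[1]

rng = random.Random(0)
examples = draw_examples(target, n=6, m=20000, rng=rng)
gaps = [conditional_gap(examples, i) for i in range(6)]
```

For this target the estimated gap is close to 1/2 for the two relevant variables and close to 0 for the others, so a simple threshold separates them.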
5 Unfortunately
- $x_i$ relevant: $P(f=1 \mid x_i=1) = 1/2 = P(f=1 \mid x_i=0)$
- $x_i$ irrelevant: $P(f=1 \mid x_i=1) = 1/2 = P(f=1 \mid x_i=0)$
- Finding a relevant variable is easy for some functions. Not so easy for others.
6 How to find the relevant variables
- Suppose you know $r$ (# of relevant vars)
- Assume $r \ll n$
- (Think of $r = \log n$)
- Get $m$ random examples, where $m = \mathrm{poly}(2^r, \log n, 1/\delta)$
- With probability $> 1-\delta$, have enough info to determine which $r$ variables are relevant
- All other sets of $r$ variables can be ruled out
7 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
f (1, 1, 0, 1, 1, 0, 1, 0, 1, 0)
1 (0, 1, 1, 1, 1, 0, 1, 1, 0, 0) 0 (1, 1,
1, 0, 0, 0, 0, 0, 0, 0) 1 (0, 0, 0, 1, 1,
0, 0, 0, 0, 0) 0 (1, 1, 1, 0, 0, 0, 1, 1,
1, 1) 0
8 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
f (1, 1, 0, 1, 1, 0, 1, 0, 1, 0)
1 (0, 1, 1, 1, 1, 0, 1, 1, 0, 0) 0 (1, 1,
1, 0, 0, 0, 0, 0, 0, 0) 1 (0, 0, 0, 1, 1,
0, 0, 0, 0, 0) 0 (1, 1, 1, 0, 0, 0, 1, 1,
0, 1) 0
9 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
f (1, 1, 0, 1, 1, 0, 1, 0, 1, 0)
1 (0, 1, 1, 1, 1, 0, 1, 1, 0, 0) 0 (1, 1,
1, 0, 0, 0, 0, 0, 0, 0) 1 (0, 0, 0, 1, 1,
0, 0, 0, 0, 0) 0 (1, 1, 1, 0, 0, 0, 1, 1,
0, 1) 0
x3, x5, x9 cant be the relevant variables
10 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
f (1, 1, 0, 1, 1, 0, 1, 0, 1, 0)
1 (0, 1, 1, 1, 1, 0, 1, 1, 0, 0) 0 (1, 1,
1, 0, 0, 0, 0, 0, 0, 0) 1 (0, 0, 0, 1, 1,
0, 0, 0, 0, 0) 0 (1, 1, 1, 0, 0, 0, 1, 1,
1, 1) 0
x1, x3, x10 ok
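The elimination step on slides 9 and 10 can be sketched as a consistency check: a candidate set of variables is ruled out as soon as two examples agree on those variables but have different labels. The helper name `consistent` is hypothetical; the data is copied from the slides.

```python
def consistent(examples, subset):
    """Return True if no two examples agree on `subset` but differ in label."""
    seen = {}
    for x, y in examples:
        key = tuple(x[i] for i in subset)
        # setdefault stores y the first time a key is seen and returns the stored label.
        if seen.setdefault(key, y) != y:
            return False
    return True

first_four = [
    ((1, 1, 0, 1, 1, 0, 1, 0, 1, 0), 1),
    ((0, 1, 1, 1, 1, 0, 1, 1, 0, 0), 0),
    ((1, 1, 1, 0, 0, 0, 0, 0, 0, 0), 1),
    ((0, 0, 0, 1, 1, 0, 0, 0, 0, 0), 0),
]
slide9 = first_four + [((1, 1, 1, 0, 0, 0, 1, 1, 0, 1), 0)]
slide10 = first_four + [((1, 1, 1, 0, 0, 0, 1, 1, 1, 1), 0)]

# Indices are 0-based: x3, x5, x9 -> (2, 4, 8); x1, x3, x10 -> (0, 2, 9).
ruled_out = not consistent(slide9, (2, 4, 8))
still_ok = consistent(slide10, (0, 2, 9))
```

On the slide 9 data, the third and fifth examples agree on x3, x5, x9 but have labels 1 and 0, which is why that set is eliminated.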
11
- Naïve algorithm: Try all combinations of $r$ variables. Time $n^r$
- Mossel, O'Donnell, Servedio [STOC 2003]
- Algorithm that takes time $n^{cr}$ where $c \approx .704$
- Subroutine: Find a single relevant variable
- Still open: Can this bound be improved?
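A minimal sketch of the naïve algorithm, assuming it is implemented as an exhaustive search over all size-r subsets with a label-consistency check. The function names and the XOR target below are hypothetical; this is not the Mossel, O'Donnell, Servedio algorithm.

```python
from itertools import combinations, product

def consistent(examples, subset):
    """Return True if no two examples agree on `subset` but differ in label."""
    seen = {}
    for x, y in examples:
        key = tuple(x[i] for i in subset)
        if seen.setdefault(key, y) != y:
            return False
    return True

def naive_relevant_sets(examples, n, r):
    """Return every size-r subset of indices consistent with the examples.

    Tries all C(n, r) subsets, which is the source of the n^r running time.
    """
    return [s for s in combinations(range(n), r) if consistent(examples, s)]

# Hypothetical target on n = 5 variables: f = x1 XOR x3 (indices 0 and 2).
n, r = 5, 2
examples = [(x, x[0] ^ x[2]) for x in product((0, 1), repeat=n)]
candidates = naive_relevant_sets(examples, n, r)
```

Here the full truth table is used as the example set, so exactly one candidate survives. With random examples, more than one subset may remain until enough examples have been seen.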
12
- If output of $f$ is dependent on $x_i$, can detect dependence (whp) in time $\mathrm{poly}(n, 2^r)$ and identify $x_i$ as relevant.
- Problematic functions
- Every variable is independent of output of $f$
- $P[f=1 \mid x_i=0] = P[f=1 \mid x_i=1]$ for all $x_i$
- Equivalently, all degree 1 Fourier coeffs $= 0$
- Functions with this property said to be CORRELATION-IMMUNE
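For small n the defining condition can be checked exhaustively. The sketch below computes the two conditional probabilities exactly under the uniform distribution; the function names are hypothetical and the brute-force enumeration is only practical for small n.

```python
from fractions import Fraction
from itertools import product

def cond_prob(f, n, i, b):
    """Exact P[f=1 | x_i=b] under the uniform distribution on {0,1}^n."""
    xs = [x for x in product((0, 1), repeat=n) if x[i] == b]
    return Fraction(sum(f(x) for x in xs), len(xs))

def is_correlation_immune(f, n):
    """True if P[f=1 | x_i=0] == P[f=1 | x_i=1] for every variable x_i."""
    return all(cond_prob(f, n, i, 0) == cond_prob(f, n, i, 1) for i in range(n))

def parity2(x):
    return x[0] ^ x[1]

def and2(x):
    return x[0] & x[1]

parity_is_ci = is_correlation_immune(parity2, 2)
and_is_ci = is_correlation_immune(and2, 2)
```

Parity of two variables passes the check, while AND of two variables does not, since P[f=1 | x1=1] = 1/2 and P[f=1 | x1=0] = 0.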
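13
- $P[f=1 \mid x_i=0] = P[f=1 \mid x_i=1]$ for all $x_i$
- Geometrically:
[Figure: e.g. n=2, the square with vertices 00, 01, 10, 11]
14
- $P[f=1 \mid x_i=0] = P[f=1 \mid x_i=1]$ for all $x_i$
- Geometrically:
[Figure: Parity(x1,x2) on the square with vertices 00, 01, 10, 11, each vertex labeled 0 or 1]
15
- $P[f=1 \mid x_i=0] = P[f=1 \mid x_i=1]$ for all $x_i$
- Geometrically:
[Figure: the same labeled square, split into the faces X1=1 and X1=0]
16
[Figure: the same labeled square, split into the faces X2=0 and X2=1]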
17
- Other correlation-immune functions besides parity?
- $f(x_1,\ldots,x_n) = 1$ iff $x_1 = x_2 = \cdots = x_n$
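The all-equal function on this slide can be checked by counting. Under the uniform distribution each half-cube (x_i=0 and x_i=1) has the same size, so equal counts of positive inputs in the two halves mean equal conditional probabilities. The helper `positives_by_bit` is a hypothetical name used only for this sketch.

```python
from itertools import product

def all_equal(x):
    """1 iff every bit of x has the same value."""
    return int(len(set(x)) == 1)

def positives_by_bit(f, n, i):
    """Count inputs with f(x)=1, split by the value of x_i: [count for x_i=0, count for x_i=1]."""
    counts = [0, 0]
    for x in product((0, 1), repeat=n):
        if f(x):
            counts[x[i]] += 1
    return counts

# For each n from 2 to 6, the split of positive inputs for every variable.
results = {n: [positives_by_bit(all_equal, n, i) for i in range(n)] for n in range(2, 7)}
```

The only positive inputs are the all-zeros and all-ones vectors, so every variable sees exactly one positive input on each side.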
18
- Other correlation-immune functions besides parity?
- All reflexive functions
19
- Other correlation-immune functions besides parity?
- All reflexive functions
- More
20 Correlation-immune functions and decision tree learners
- Decision tree learners in ML
- Popular machine learning approach (CART, C4.5)
- Given set of examples of Boolean function, build a decision tree
- Heuristics for decision tree learning
- Greedy, top-down
- Differ in way choose which variable to put in node
- Pick variable having highest gain
- $P[f=1 \mid x_i=1] = P[f=1 \mid x_i=0]$ means 0 gain
- Correlation-immune functions problematic for decision tree learners
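To illustrate the zero-gain problem, the sketch below computes entropy-based information gain for every variable on a full truth table. It assumes "gain" means information gain in the ID3/C4.5 sense; the helper names and the two example functions are hypothetical.

```python
from itertools import product
from math import log2

def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    q = sum(labels) / len(labels)
    if q == 0 or q == 1:
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)

def info_gain(examples, i):
    """Entropy of the labels minus the weighted entropy after splitting on x_i."""
    gain = entropy([y for _, y in examples])
    for b in (0, 1):
        part = [y for x, y in examples if x[i] == b]
        gain -= len(part) / len(examples) * entropy(part)
    return gain

n = 3
parity_table = [(x, x[0] ^ x[1] ^ x[2]) for x in product((0, 1), repeat=n)]
and_table = [(x, x[0] & x[1] & x[2]) for x in product((0, 1), repeat=n)]
parity_gains = [info_gain(parity_table, i) for i in range(n)]
and_gains = [info_gain(and_table, i) for i in range(n)]
```

Every variable of the parity function has gain 0, so a greedy learner has no basis for preferring a relevant variable over an irrelevant one. The AND function gives every variable a strictly positive gain.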
21
- Lookahead
- Skewing: An efficient alternative to lookahead for decision tree induction. IJCAI 2003 [Page, Ray]
- Why skewing works: learning difficult Boolean functions with greedy tree learners. ICML 2005 [Rosell, Hellerstein, Ray, Page]
22 Story Part One
23
- How many difficult functions?
- More than $2^{2^{n-1}}$ fns
24
- How many different hard functions?
- More than $2^{2^{n/2}}$ fns
- SOMEONE MUST HAVE STUDIED THESE FUNCTIONS BEFORE
27 Story Part Two
28
- I had lunch with Eric Bach
29
- Roy, B. K. 2002. A Brief Outline of Research on Correlation Immune Functions. In Proceedings of the 7th Australian Conference on Information Security and Privacy (July 03 - 05, 2002). L. M. Batten and J. Seberry, Eds. Lecture Notes in Computer Science, vol. 2384. Springer-Verlag, London, 379-394.
30 Correlation-immune functions
- k-correlation immune function
- For every subset $S$ of the input variables s.t. $1 \le |S| \le k$: $P[f \mid S] = P[f]$
- [Xiao, Massey 1988] Equivalently, all Fourier coefficients of degree $i$ are 0, for $1 \le i \le k$
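The Fourier characterization can be sketched by brute force for small n. The code below assumes f takes values in {0,1} and uses the characters (-1)^(sum of x_i for i in S); for nonempty S, a coefficient vanishes under this convention exactly when it vanishes under the ±1 convention. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def fourier_coefficient(f, n, S):
    """E_x[f(x) * (-1)^(sum of x_i for i in S)] for uniform x, with f valued in {0,1}."""
    total = 0
    for x in product((0, 1), repeat=n):
        sign = -1 if sum(x[i] for i in S) % 2 else 1
        total += f(x) * sign
    return Fraction(total, 2 ** n)

def correlation_immunity_order(f, n):
    """Largest k such that all coefficients of degree 1..k are 0 (0 if none)."""
    k = 0
    for d in range(1, n + 1):
        if any(fourier_coefficient(f, n, S) != 0 for S in combinations(range(n), d)):
            break
        k = d
    return k

def parity3(x):
    return x[0] ^ x[1] ^ x[2]

def all_equal3(x):
    return int(x[0] == x[1] == x[2])

def and2(x):
    return x[0] & x[1]

orders = (
    correlation_immunity_order(parity3, 3),
    correlation_immunity_order(all_equal3, 3),
    correlation_immunity_order(and2, 2),
)
```

Parity on three variables has order 2, the all-equal function on three variables has order 1, and AND has order 0 (it is not correlation-immune).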
31
- Siegenthaler's Theorem
- If $f$ is $k$-correlation immune, then the GF[2] polynomial for $f$ has degree at most $n-k$.
32
- Siegenthaler's Theorem [1984]
- If $f$ is $k$-correlation immune, then the GF[2] polynomial for $f$ has degree at most $n-k$.
- Algorithm of Mossel, O'Donnell, Servedio [STOC 2003] based on this theorem
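The degree bound can be checked on small examples by computing the GF[2] polynomial (algebraic normal form) of f with the standard in-place Möbius transform. This is an illustrative sketch with hypothetical function names, not the Mossel, O'Donnell, Servedio algorithm; the values of k in the comments were obtained by brute force for these specific functions.

```python
def anf_degree(f, n):
    """Degree of the GF[2] polynomial of f, computed via the Möbius transform.

    Returns 0 for the constant-zero function (a convention chosen for this sketch).
    """
    # Truth table indexed by an integer mask: bit i of mask is the value of x_i.
    table = [f(tuple((mask >> i) & 1 for i in range(n))) for mask in range(2 ** n)]
    for i in range(n):
        for mask in range(2 ** n):
            if mask & (1 << i):
                table[mask] ^= table[mask ^ (1 << i)]
    # table[mask] is now the coefficient of the monomial with variables {i : bit i of mask set}.
    return max((bin(mask).count("1") for mask in range(2 ** n) if table[mask]), default=0)

def parity3(x):      # 2-correlation immune, n = 3, so the bound is n - k = 1
    return x[0] ^ x[1] ^ x[2]

def all_equal3(x):   # 1-correlation immune, n = 3, so the bound is n - k = 2
    return int(x[0] == x[1] == x[2])

def and3(x):         # not correlation immune (k = 0), so the bound is n - k = 3
    return x[0] & x[1] & x[2]

degrees = (anf_degree(parity3, 3), anf_degree(all_equal3, 3), anf_degree(and3, 3))
```

In all three cases the computed degree meets the bound n - k with equality.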
33 End of Story
34 Non-uniform distributions
- Correlation-immune functions are defined wrt the uniform distribution
- What if distribution is biased?
- e.g. each bit 1 with probability ¾
35 $f(x_1,x_2) = \mathrm{parity}(x_1,x_2)$, each bit 1 with probability 3/4
- $P[f=1 \mid x_1=1] \ne P[f=1 \mid x_1=0]$
36 $f(x_1,x_2) = \mathrm{parity}(x_1,x_2)$, each bit 1 with probability 1/4
- $P[f=1 \mid x_1=1] \ne P[f=1 \mid x_1=0]$
- For added irrelevant variables, the two probabilities would be equal
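The parity example under a biased distribution can be worked out exactly. The sketch below uses p = 3/4 and adds one irrelevant variable x3 to show the contrast; the function name `cond_prob_biased` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def cond_prob_biased(f, n, i, b, p):
    """Exact P[f=1 | x_i=b] when each bit is 1 independently with probability p."""
    total = Fraction(0)
    for x in product((0, 1), repeat=n):
        if x[i] != b or not f(x):
            continue
        weight = Fraction(1)
        for j, bit in enumerate(x):
            if j != i:  # conditioning on x_i, so its own probability is left out
                weight *= p if bit else 1 - p
        total += weight
    return total

def parity12(x):
    """Parity of the first two variables; any further variables are irrelevant."""
    return x[0] ^ x[1]

p = Fraction(3, 4)
relevant = (cond_prob_biased(parity12, 3, 0, 1, p), cond_prob_biased(parity12, 3, 0, 0, p))
irrelevant = (cond_prob_biased(parity12, 3, 2, 1, p), cond_prob_biased(parity12, 3, 2, 0, p))
```

For the relevant variable x1 the two conditional probabilities are 1/4 and 3/4. For the irrelevant variable x3 both equal 3/8.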
37 Correlation-immunity wrt p-biased distributions
- Definitions
- $f$ is correlation-immune wrt distribution $D$ if $P_D[f=1 \mid x_i=1] = P_D[f=1 \mid x_i=0]$ for all $x_i$
- p-biased distribution $D_p$: each bit set to 1 independently with probability $p$
- For all p-biased distributions $D$, $P_D[f=1 \mid x_i=1] = P_D[f=1 \mid x_i=0]$ for all irrelevant $x_i$
38
- Lemma: Let $f(x_1,\ldots,x_n)$ be a Boolean function with $r$ relevant variables. Then $f$ is correlation immune w.r.t. $D_p$ for at most $r-1$ values of $p$.
- Pf: Correlation immune wrt $D_p$ means $P[f=1 \mid x_i=1] - P[f=1 \mid x_i=0] = 0$ (*) for all $x_i$.
- Consider fixed $f$ and $x_i$. Can write lhs of (*) as polynomial $h(p)$.
39
- e.g. $f(x_1,x_2,x_3) = \mathrm{parity}(x_1,x_2,x_3)$, p-biased distribution $D_p$
- $h(p) = P_{D_p}[f=1 \mid x_1=1] - P_{D_p}[f=1 \mid x_1=0] = (p^2 + (1-p)^2) - (p(1-p) + (1-p)p)$
- If add irrelevant variable, this polynomial doesn't change
- $h(p)$ for arbitrary $f$, variable $x_i$, has degree $\le r-1$, where $r$ is number of variables.
- $f$ correlation-immune wrt at most $r-1$ values of $p$, unless $h(p)$ identically 0 for all $x_i$.
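As a worked check of the parity example (a simplification added here, not taken from the slide): given $x_1=1$, $f=1$ exactly when $x_2$ and $x_3$ agree, and given $x_1=0$, $f=1$ exactly when they differ. The resulting polynomial simplifies to a perfect square.

```latex
\begin{align*}
h(p) &= \bigl(p^2 + (1-p)^2\bigr) - \bigl(p(1-p) + (1-p)p\bigr) \\
     &= p^2 - 2p(1-p) + (1-p)^2 \\
     &= \bigl(p - (1-p)\bigr)^2 \\
     &= (2p-1)^2 .
\end{align*}
```

So $h$ has degree $2 = r-1$ for $r=3$, and its only root is $p = 1/2$: three-variable parity is correlation-immune with respect to $D_p$ only for the uniform distribution.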
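40
- $h(p) = P_{D_p}[f=1 \mid x_i=1] - P_{D_p}[f=1 \mid x_i=0]$
- $P_{D_p}[f=1 \mid x_i=1] = \sum_{d} w_d \, p^d (1-p)^{n-1-d}$
- where $w_d$ is number of inputs $x$ for which $f(x)=1$, $x_i=1$, and $x$ contains exactly $d$ additional 1s.
- i.e. $w_d$ = number of positive assignments of $f_{x_i \leftarrow 1}$ of Hamming weight $d$
- Similar expression for $P_{D_p}[f=1 \mid x_i=0]$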
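41
- $P_{D_p}[f=1 \mid x_i=1] - P_{D_p}[f=1 \mid x_i=0] = \sum_{d} (w_d - r_d) \, p^d (1-p)^{n-1-d}$
- where $w_d$ = number of positive assignments of $f_{x_i \leftarrow 1}$ of Hamming weight $d$
- $r_d$ = number of positive assignments of $f_{x_i \leftarrow 0}$ of Hamming weight $d$
- Not identically 0 iff $w_d \ne r_d$ for some $d$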
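The counts w_d and r_d can be tabulated directly. The sketch below assumes the difference of conditional probabilities is the sum over d of (w_d - r_d) p^d (1-p)^(n-1-d), which follows from the definitions of w_d and r_d; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def weight_counts(f, n, i, b):
    """counts[d] = number of x with f(x)=1, x_i=b, and exactly d ones among the other n-1 bits."""
    counts = [0] * n
    for x in product((0, 1), repeat=n):
        if x[i] == b and f(x):
            counts[sum(x) - b] += 1
    return counts

def h(f, n, i, p):
    """h(p) = sum over d of (w_d - r_d) * p^d * (1-p)^(n-1-d)."""
    w = weight_counts(f, n, i, 1)
    r = weight_counts(f, n, i, 0)
    return sum((w[d] - r[d]) * p ** d * (1 - p) ** (n - 1 - d) for d in range(n))

def parity3(x):
    return x[0] ^ x[1] ^ x[2]

w = weight_counts(parity3, 3, 0, 1)
r = weight_counts(parity3, 3, 0, 0)
h_quarter = h(parity3, 3, 0, Fraction(1, 4))
h_half = h(parity3, 3, 0, Fraction(1, 2))
```

For three-variable parity and x1, the counts are w = [1, 0, 1] and r = [0, 2, 0]. They differ for every d, so h is not identically 0 even though h(1/2) = 0.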
42 Property of Boolean functions
- Lemma: If $f$ has at least one relevant variable, then for some relevant variable $x_i$, $w_d \ne r_d$ for some $d$
- where
- $w_d$ = number of positive assignments of $f_{x_i \leftarrow 1}$ of Hamming weight $d$
- $r_d$ = number of positive assignments of $f_{x_i \leftarrow 0}$ of Hamming weight $d$
43
- How much does it help to have access to examples from different distributions?
44
- How much does it help to have access to examples from different distributions?
- Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions [Hellerstein, Rosell, Bach, Ray, Page]
45
- Even if $f$ is not correlation-immune wrt $D_p$, may need very large sample to detect relevant variable if value of $p$ very near root of $h(p)$
- Lemma: If $h(p)$ not identically 0, then for some value of $p$ in the set $\{1/(r+1), 2/(r+1), 3/(r+1), \ldots, (r+1)/(r+1)\}$, $|h(p)| \ge 1/(r+1)^{r-1}$
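The lemma can be checked exactly on a small case. The sketch below evaluates h(p) for three-variable parity at the grid points j/(r+1) for j = 1..r (restricting to points strictly inside (0,1) is an assumption of this sketch) and compares the largest |h(p)| with 1/(r+1)^(r-1). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def h(f, n, i, p):
    """Exact P[f=1 | x_i=1] - P[f=1 | x_i=0] under the p-biased distribution D_p."""
    total = Fraction(0)
    for x in product((0, 1), repeat=n):
        if not f(x):
            continue
        weight = Fraction(1)
        for j, bit in enumerate(x):
            if j != i:
                weight *= p if bit else 1 - p
        # Positive inputs with x_i=1 add to h, those with x_i=0 subtract.
        total += weight if x[i] else -weight
    return total

def parity3(x):
    return x[0] ^ x[1] ^ x[2]

r = 3
grid = [Fraction(j, r + 1) for j in range(1, r + 1)]
values = [abs(h(parity3, 3, 0, p)) for p in grid]
bound = Fraction(1, (r + 1) ** (r - 1))
```

Here the values are 1/4, 0, 1/4 at p = 1/4, 1/2, 3/4, and the bound is 1/16. The middle grid point is a root of h, but the other two clear the bound comfortably.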
46
- Algorithm to find a relevant variable
- Uses examples from distributions $D_p$, for $p = 1/(r+1), 2/(r+1), 3/(r+1), \ldots, (r+1)/(r+1)$
- Sample size $\mathrm{poly}((r+1)^r, \log n, \log 1/\delta)$
- Essentially same algorithm found independently by Arpe and Mossel, using very different techniques
- Another algorithm to find a relevant variable
- Based on proving (roughly) that if choose random $p$, then $h^2(p)$ likely to be reasonably large. Uses prime number theorem.
- Uses examples from $\mathrm{poly}(2^r, \log 1/\delta)$ distributions $D_p$.
- Sample size $\mathrm{poly}(2^r, \log n, \log 1/\delta)$
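A rough sketch in the spirit of the first algorithm: sample from several p-biased distributions and report a variable whose estimated conditional gap is clearly nonzero. The grid j/(r+1) for j = 1..r, the threshold of 0.1, the sample size, and all function names are assumptions made for illustration; the actual algorithm and its sample-size analysis are in the cited paper.

```python
import random

def draw_biased(f, n, p, m, rng):
    """Draw m examples (x, f(x)) with each bit set to 1 independently with probability p."""
    examples = []
    for _ in range(m):
        x = tuple(int(rng.random() < p) for _ in range(n))
        examples.append((x, f(x)))
    return examples

def estimated_gap(examples, i):
    """Empirical P[f=1 | x_i=1] - P[f=1 | x_i=0]."""
    ones = [y for x, y in examples if x[i] == 1]
    zeros = [y for x, y in examples if x[i] == 0]
    if not ones or not zeros:
        return 0.0
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

def find_relevant_variable(f, n, r, m, threshold, rng):
    """Return an index whose estimated gap exceeds `threshold` at some grid p, else None."""
    for j in range(1, r + 1):
        examples = draw_biased(f, n, j / (r + 1), m, rng)
        for i in range(n):
            if abs(estimated_gap(examples, i)) > threshold:
                return i
    return None

# Hypothetical target: parity of the last three of n = 8 variables (indices 5, 6, 7).
def target(x):
    return x[5] ^ x[6] ^ x[7]

rng = random.Random(1)
found = find_relevant_variable(target, n=8, r=3, m=20000, threshold=0.1, rng=rng)
```

Under the uniform distribution this target gives every variable a gap of 0, but at p = 1/4 the relevant variables have a true gap of 1/4, which the sample estimate separates from the irrelevant ones.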
48 Summary
- Finding relevant variables (junta-learning)
- Correlation-immune functions
- Learning from p-biased distributions
49 Moral of the Story
- Handbook of integer sequences can be useful in doing literature search
- Eating lunch with the right person can be much more useful