Correlation Immune Functions and Learning Lisa Hellerstein Polytechnic

Correlation Immune Functions and Learning Lisa Hellerstein Polytechnic

Correlation Immune Functions and Learning Lisa Hellerstein Polytechnic Institute of NYU Brooklyn, NY Includes joint work with Bernard Rosell (AT&T), Eric Bach and David Page (U. of Wisconsin), and Soumya Ray (Case Western) Identifying relevant variables from random examples f ( x1, x 2,..., x10) x 2 x 7 x10

x (1,1,0,0,0,1,1,0,1,0) (0,1,0,0,1,0,1,1,0,1) (1,0,0,1,0,1,0,0,1,0) f(x) 1 1 0 2 Technicalities Assume random examples drawn from uniform distribution over {0,1}n

Have access to source of random examples 3 Detecting that a variable is relevant Look for dependence between input variables and output If xi irrelevant P(f=1|xi=1) = P(f=1|xi=0) If xi relevant

P(f=1|xi=1) P(f=1|xi=0) for previous function f 4 Unfortunately f ( x1, x 2,..., x10) parity( x 2, x 7, x10) xi relevant xi irrelevant P(f=1|xi=1) = 1/2 = P(f=1|xi=0) P(f=1|xi=1) = 1/2 = P(f=1|xi=0) Finding a relevant variable easy for some functions. Not so easy for others.

5 How to find the relevant variables Suppose you know r (# of relevant vars) Assume r << n (Think of r = log n) Get m random examples, where m = poly(2r ,log n,1/) With probability > 1-, have enough info to determine which r variables are relevant All other sets of r variables can be ruled out 6

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 (1, 1, 0, 1, 1, 0, 1, 0, 1, 0) (0, 1, 1, 1, 1, 0, 1, 1, 0, 0) (1, 1, 1, 0, 0, 0, 0, 0, 0, 0) (0, 0, 0, 1, 1, 0, 0, 0, 0, 0) (1, 1, 1, 0, 0, 0, 1, 1, 1, 1) f 1 0 1 0

0 7 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 (1, 1, 0, 1, 1, 0, 1, 0, 1, 0) (0, 1, 1, 1, 1, 0, 1, 1, 0, 0) (1, 1, 1, 0, 0, 0, 0, 0, 0, 0) (0, 0, 0, 1, 1, 0, 0, 0, 0, 0) (1, 1, 1, 0, 0, 0, 1, 1, 0, 1) f 1

0 1 0 0 8 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 (1, 1, 0, 1, 1, 0, 1, 0, 1, 0) (0, 1, 1, 1, 1, 0, 1, 1, 0, 0) (1, 1, 1, 0, 0, 0, 0, 0, 0, 0) (0, 0, 0, 1, 1, 0, 0, 0, 0, 0) (1, 1, 1, 0, 0, 0, 1, 1, 0, 1)

f 1 0 1 0 0 x3, x5, x9 cant be the relevant variables 9 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 (1, 1, 0, 1, 1, 0, 1, 0, 1, 0)

(0, 1, 1, 1, 1, 0, 1, 1, 0, 0) (1, 1, 1, 0, 0, 0, 0, 0, 0, 0) (0, 0, 0, 1, 1, 0, 0, 0, 0, 0) (1, 1, 1, 0, 0, 0, 1, 1, 1, 1) f 1 0 1 0 0 x1, x3, x10 ok 10

Nave algorithm: Try all combinations of r variables. Time nr Mossel, ODonnell, Servedio [STOC 2003] Algorithm that takes time ncr where c .704 Subroutine: Find a single relevant variable Still open: Can this bound be improved? 11 If output of f is dependent on xi, can detect dependence (whp) in time poly(n, 2r) and identify xi as relevant. Problematic Functions Every variable is independent of output of f

P[f=1|xi=0] = P[f=1|xi=1] for all xi Equivalently, all degree 1 Fourier coeffs = 0 Functions with this property said to be CORRELATION-IMMUNE 12 P[f=1|xi=0] = P[f=1|xi=1] for all xi Geometrically: 10 11 00

01 e.g. n=2 13 P[f=1|xi=0] = P[f=1|xi=1] for all xi Geometrically: 1 10 11

0 00 01 0 Parity(x1,x2) 1 14 P[f=1|xi=0] = P[f=1|xi=1] for all xi

Geometrically: 1 10 11 0 00 01 0 X1=1

X1=0 1 15 P[f=1|xi=0] = P[f=1|xi=1] for all xi 1 X2=0 X2=1

10 11 0 00 01 0 1 16

Other correlation-immune functions besides parity? f(x1,,xn) = 1 iff x1 = x2 = = xn 17 Other correlation-immune functions besides parity? All reflexive functions f(x) f(x) for all x 18

Other correlation-immune functions besides parity? All reflexive functions f(x) f(x) for all x More 19 Correlation-immune functions and decision tree learners Decision tree learners in ML Popular machine learning approach (CART, C4.5) Given set of examples of Boolean function, build a

decision tree Heuristics for decision tree learning Greedy, top-down Differ in way choose which variable to put in node Pick variable having highest gain P[f=1|xi=1] = P[f=1|xi=0] means 0 gain Correlation-immune functions problematic for

decision tree learners 20 Lookahead Skewing: An efficient alternative to lookahead for decision tree induction. IJCAI 2003 [Page, Ray] Why skewing works: learning difficult Boolean functions with greedy tree learners. ICML 2005 [Rosell, Hellerstein, Ray, Page] 21

Story Part One 22 How many difficult functions? n # fns 0 2 1 2

2 4 3 18 4 5 648 3140062 More than n-1 2

2 23 How many different hard functions? n # fns 0 2 1 2

2 4 3 18 4 5 648 3140062 More than n/2 2 2

SOMEONE MUST HAVE STUDIED THESE FUNCTIONS BEFORE 24 25 26 Story Part Two 27 I had lunch with Eric Bach

28 Roy, B. K. 2002. A Brief Outline of Research on Correlation Immune Functions. In Proceedings of the 7th Australian Conference on information Security and Privacy (July 03 - 05, 2002). L. M. Batten and J. Seberry, Eds. Lecture Notes In Computer Science, vol. 2384. Springer-Verlag, London, 379-394. 29 Correlation-immune functions

k-correlation immune function For every subset S of the input variables s.t. 1 |S| k P[f | S] = P[f] [Xiao, Massey 1988] Equivalently, all Fourier coefficients of degree i are 0, for 1ik 30 Siegenthalers Theorem If f is k-correlation immune, then the GF[2] polynomial for f has degree at most n-k.

31 Siegenthalers Theorem [1984] If f is k-correlation immune, then the GF[2] polynomial for f has degree at most n-k. Algorithm of Mossel, ODonnell, Servedio [STOC 2003] based on this theorem 32 End of Story

33 Non-uniform distributions Correlation-immune functions are defined wrt the uniform distribution What if distribution is biased? e.g. each bit 1 with probability 34 f(x1,x2) = parity(x1,x2) each bit 1 with probability 3/4 x

parity(x) P[x] 00 0 1/16 01 1

3/16 10 1 3/16 11 0 9/16

P[f=1|x1=1] P[f=1|x1=0] 35 f(x1,x2) = parity(x1,x2) p=1 with probability 1/4 x parity(x) P[x] 00 0

1/16 01 1 3/16 10 1 3/16

11 0 9/16 P[f=1|x1=1] P[f=1|x1=0] For added irrelevant variables, would be equal 36 Correlation-immunity wrt p-biased distributions Definitions

f is correlation-immune wrt distribution D if PD[f=1|xi=1] = PD[f=1|xi=0] for all xi p-biased distribution Dp: each bit set to 1 independently with probability p For all p-biased distributions D, PD[f=1|xi=1] = PD[f=1|xi=0] for all irrelevant xi 37 Lemma: Let f(x1,,xn) be a Boolean function with r relevant variables. Then f is correlation immune w.r.t. Dp for at most r-1

values of p. Pf: Correlation immune wrt Dp means P[f=1|xi=1] P[f=1|xi=0] = 0 (*) for all xi. Consider fixed f and xi. Can write lhs of (*) as polynomial h(p). 38 e.g. f(x1,x2, x3) = parity(x1,x2, x3) p-biased distribution Dp h(p) = PDp[f=1|x1=1] - PDp[f=1|x1=0] = ( p2 + p(1-p) ) ( p(1-p) + (1-p)p )

If add irrelevant variable, this polynomial doesnt change h(p) for arbitrary f, variable xi, has degree <= r1, where r is number of variables. f correlation-immune wrt at most r-1 values of p, unless h(p) identically 0 for all xi. 39 h(p) = PDp[f=1|xi=1] -PDp[f=1|xi=0] n 1 d PDp [f 1 | x i 1] wdp (1 p)

n 1 d d 0 where wd is number of inputs x for which f(x)=1, xi=1, and x contains exactly d additional 1s. i.e. wd = number of positive assignments of fxi<-1 of Hamming weight d Similar expression for PDp[f=1|xi=0] 40 PDp[f=1|xi=1] - PDp[f=1|xi=0] =

n 1 PDp [ f 1 | xi 1] ( wd rd ) p d (1 p ) n 1 d d 0 where wd = number of positive assignments of fxi<-1 of Hamming weight d rd = number of positive assignments of fxi<-0 of Hamming weight d Not identically 0 iff wd rd for some d 41

Property of Boolean functions Lemma: If f has at least one relevant variable, then for some relevant variable xi, and some d, wd rd for some d where wd = number of positive assignments of fxi<-1 of Hamming weight d rd = number of positive assignments of fxi<-0 of Hamming weight d 42

How much does it help to have access to examples from different distributions? 43 How much does it help to have access to examples from different distributions? Exploiting Product Distributions to Identify Hellerstein, Rosell, Bach, Page,

Ray Relevant Variables of Correlation Immune Exploiting Product Distributions to Identify Relevant Functions Variables of Correlation Immune Functions [Hellerstein, Rosell, Bach, Ray, Page] 44 Even if f is not correlation-immune wrt Dp, may need very large sample to detect relevant variable if value of p very near root of h(p)

Lemma: If h(p) not identically 0, then for some value of p in the set { 1/(r+1),2/(r+1),3/(r+1), (r+1)/(r+1) }, h(p) 1/(r+1)r-1 45 Algorithm to find a relevant variable Uses examples from distributions Dp, for p = 1/(r+1),2/(r+1),3/(r+1), (r+1)/(r+1) sample size poly((r+1) r, log n, log 1/) [Essentially same algorithm found independently by Arpe and Mossel, using very different techniques]

Another algorithm to find a relevant variable Based on proving (roughly) that if choose random p, then h2(p) likely to be reasonably large. Uses prime number theorem. Uses examples from poly(2r, log 1/ ) distributions Dp. Sample size poly(2r, log n, log 1/ ) 46 Better algorithms? 47 Summary Finding relevant variables (junta-learning)

Correlation-immune functions Learning from p-biased distributions 48 Moral of the Story Handbook of integer sequences can be useful in doing literature search Eating lunch with the right person can be much more useful 49

Recently Viewed Presentations

  • Key Principles for Interpreting the Holy Scriptures (Biblical

    Key Principles for Interpreting the Holy Scriptures (Biblical

    Luke 24:27 "And beginning with Moses and all the Prophets, he interpreted to them in all the Scriptures the things concerning himself." (ESV) καὶ ἀρξάμενος ἀπὸ Μωϋσέως καὶ ἀπὸ πάντων τῶν προφητῶν . διερμήνευσεν
  •  SSER Ltd. A. Popperwell 1. The following drawing
  • &quot;Dulce et Decorum Est&quot; by Wilfred Owen

    "Dulce et Decorum Est" by Wilfred Owen

    Anger is like… The wind is like… Feeling like… Moving like… Scattering like… Blooming like… p . Wednesday, February 27 . Madonna used similes! Pick five similes to extend by answering Who? What? When? Why? How? Example: The breeze is...
  • The Peopling of Newfoundland and Labrador

    The Peopling of Newfoundland and Labrador

    Easton was never brought to justice and eventually used his fortune to buy the title, Marquis of Savoy. Being at that time a handsome man around 40, according to contemporary descriptions, he crowned his career by marrying a very wealthy...
  • Lesson Study - University of South Florida

    Lesson Study - University of South Florida

    does your current lesson design and planning process provide the framework for the implementation. of . ... Assist the team in following the directions on the slide to complete the Team Roles portion on the Lesson Study Team template.
  • Analysis of Algorithms - Emory University

    Analysis of Algorithms - Emory University

    Tahoma Arial Wingdings Times New Roman Symbol Blueprint 1_Blueprint Microsoft Visio Drawing Tries Preprocessing Strings Standard Tries Analysis of Standard Tries Word Matching with a Trie Compressed Tries Compact Representation Suffix Trie Analysis of Suffix Tries Encoding Trie (1) Encoding...
  • Basel III and Future Banking Systems - Instruct

    Basel III and Future Banking Systems - Instruct

    Basel III, an enhancement of Basel II (con't) An effective implementation of Basel III will demonstrate to regulators, customers, and shareholders that the bank is recovering well from the global banking crisis of 2008.
  • academy.spotonmedics.nl

    academy.spotonmedics.nl

    Een methode om de strategische planning te bepalen is OGSM. Een kader dat een top-down benadering gebruikt om te definiëren wat de Objectives, Goals, Strategies en Measures (OGSM) zijn van je praktijk. De output is een helder, eenvoudig en beknopt...