Time-Space Hardness for Learning Problems

Avishay Tal (Stanford)
Based on joint works with Sumegha Garg, Gillat Kol & Ran Raz

Learning in the Streaming Model

[figure: a black box emits a stream of labeled examples, e.g. (0,1,1,0,1), (0,0,1,0,1), (1,1,0,1,0); the learner reads them one at a time]

The learner sees a stream of examples [Shamir 2014, Steinhardt-Valiant-Wager 2015]; aka Online Learning.

Examples of Learning Problems

Parity Learning: f(a) = ⟨a, x⟩ mod 2 for an unknown x ∈ {0,1}^n.
DNF Learning: f is a small-size DNF formula.
Decision Tree Learning: f is a small-size decision tree.
Junta Learning: f depends on only ℓ of the n input bits.

Parity Learning Problem

x ∈ {0,1}^n is chosen uniformly at random; x is unknown to the learner.
Given a stream of examples (a_1, b_1), (a_2, b_2), ..., where each a_t ∈ {0,1}^n is uniform and b_t = ⟨a_t, x⟩ mod 2, the learner needs to learn x with high probability.

Algorithms for Parity Learning

1. Gaussian Elimination: O(n²) memory bits, O(n) samples (a sketch follows below).
2. Trying all possibilities: O(n) memory bits, 2^{O(n)} samples.

Raz's Breakthrough

Theorem [Raz16]: Any algorithm for parity learning requires either Ω(n²) memory bits or an exponential number of samples.
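A minimal Python sketch of algorithm 1, assuming the stream is provided by a sample() callback; the bit-packing of equations into (n+1)-bit integers is an implementation choice, not from the talk:

    import random

    def parity_learn_gaussian(n, sample):
        """Learn x in {0,1}^n from streamed examples (a, <a,x> mod 2) by
        Gaussian elimination over GF(2): O(n^2) memory bits, O(n) samples
        in expectation.  Each equation is packed into an (n+1)-bit int:
        bits 0..n-1 hold the coefficients, bit n holds the right-hand side."""
        mask = (1 << n) - 1
        rows = {}                                   # pivot index -> equation
        while len(rows) < n:
            a, b = sample()                         # one streamed example
            eq = sum(bit << i for i, bit in enumerate(a)) | (b << n)
            for piv in sorted(rows, reverse=True):  # forward-reduce by pivots
                if (eq >> piv) & 1:
                    eq ^= rows[piv]
            if eq & mask:                           # linearly independent?
                rows[(eq & mask).bit_length() - 1] = eq
        x = [0] * n
        for piv in sorted(rows):                    # back-substitute
            row = rows[piv]
            val = (row >> n) & 1
            for j in range(piv):
                if (row >> j) & 1:
                    val ^= x[j]
            x[piv] = val
        return x

    # usage: recover a random secret from random examples
    n = 16
    secret = [random.randrange(2) for _ in range(n)]
    def sample():
        a = [random.randrange(2) for _ in range(n)]
        return a, sum(ai * si for ai, si in zip(a, secret)) % 2
    assert parity_learn_gaussian(n, sample) == secret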

Sparse Parities

Could we learn better if we knew that x is ℓ-sparse (i.e., |x| ≤ ℓ)?
Note: any ℓ-sparse parity is also: a 2^{ℓ}-size DNF formula, a 2^{ℓ}-size decision tree, a junta on ℓ variables.
Lower bounds for learning ℓ-sparse parities ⟹ lower bounds for learning all of the above.

Upper Bounds

1. Trying all possibilities: n^{O(ℓ)} samples, O(ℓ log n) memory bits.
2. Record and Eliminate (like Gaussian Elim.; a sketch follows below):
   i. Record O(ℓ log n) equations in memory.
   ii. Check which of all possible ℓ-sparse vectors satisfies the recorded equations.
   O(ℓ log n) samples, O(n · ℓ log n) memory bits.
3. Algorithm #3: O(n) memory and ℓ^{O(ℓ)} · poly(n) samples.
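A minimal sketch of "Record and Eliminate", again assuming a sample() callback; the constant in the m = Θ(ℓ log n) recorded equations is a hypothetical choice:

    import math
    import random
    from itertools import combinations

    def sparse_parity_learn(n, ell, sample, m=None):
        """Record m = O(ell log n) streamed equations (O(n * ell * log n)
        memory bits), then brute-force over all <= ell-sparse candidates,
        n^O(ell) of them, and return one consistent with every equation."""
        if m is None:
            m = 4 * ell * max(1, math.ceil(math.log2(n)))  # hypothetical constant
        eqs = [sample() for _ in range(m)]                 # recorded (a, b) pairs
        for k in range(ell + 1):
            for support in combinations(range(n), k):
                if all(sum(a[i] for i in support) % 2 == b for a, b in eqs):
                    return [1 if i in support else 0 for i in range(n)]

    # usage: learn a 3-sparse secret over n = 30 bits
    n, ell = 30, 3
    secret_support = random.sample(range(n), ell)
    def sample():
        a = [random.randrange(2) for _ in range(n)]
        return a, sum(a[i] for i in secret_support) % 2
    print(sparse_parity_learn(n, ell, sample))   # 1s exactly on secret_support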

Can we learn ℓ-sparse parities in O(n) memory and a polynomial number of samples? No!

Theorem [Kol-Raz-T17]: Any algorithm for ℓ-sparse parity learning requires either Ω(n · ℓ^{0.99}) memory bits or ℓ^{Ω(ℓ)} samples.
In particular, for ℓ = ω(log n), ℓ-sparse parity learning requires either super-linear memory or a super-polynomial number of samples.

Motivation: Cryptography [Raz 16, Valiant-Valiant 16]

Applications to Bounded Storage Crypto:

Encryption/Decryption scheme with:
- Key length: n
- Encryption/Decryption time: O(n)
- Unconditional security, if the attacker's memory size is at most o(n²)

Previous works assumed that the attacker's memory size is at most linear in the time needed for encryption/decryption.

Motivation: Cryptography [Raz 16, Valiant-Valiant 16, Kol-Raz-T 16]

Applications to Bounded Storage Crypto. In the second part of the talk, an improved scheme:
- Key length and Encryption/Decryption time: much smaller than the attacker's memory bound
- Unconditionally secure against attackers whose memory size is super-linear in the encryption/decryption time

Motivation: Complexity Theory

Time-space lower bounds have been studied in many models [Beame-Jayram-Saks 98, Ajtai 99, Beame-Saks-Sun-Vee 00, Fortnow 97,

Fortnow-Lipton-van Melkebeek-Viglas 05, Williams 06, ...].
Main difference: the online model is easier to prove lower bounds against, since the input is read only once.

The Branching Program (BP) Model

[figure: a layered directed graph; (width) = vertices per layer, (length) = number of layers; edges labeled by examples (a, b)]

Each layer represents a time step.
Each vertex represents a memory state of the learner.
Each non-leaf vertex has 2^{n+1} outgoing edges, one for each possible example (a, b) ∈ {0,1}^n × {0,1}.
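The correspondence "vertex = memory state" can be made concrete: any one-pass learner with w memory bits unrolls into a BP of width at most 2^w. A minimal sketch, where delta is a hypothetical transition function (state, a, b) ↦ next state:

    from itertools import product

    def unroll_to_bp(delta, n, length, start=0):
        """Unroll a streaming learner into the layers of a branching program.
        Layer t holds every memory state reachable after t examples, so the
        BP's width is the size of the largest layer (at most 2**w for a
        learner with w memory bits)."""
        layers = [{start}]
        for _ in range(length):
            nxt = set()
            for state in layers[-1]:
                # one outgoing edge per possible example (a, b)
                for a in product((0, 1), repeat=n):
                    for b in (0, 1):
                        nxt.add(delta(state, a, b))
            layers.append(nxt)
        return layers

    # e.g. a learner whose single memory bit tracks the XOR of the labels
    widths = [len(layer) for layer in unroll_to_bp(lambda s, a, b: s ^ b, n=3, length=4)]
    print(widths)   # [1, 2, 2, 2, 2]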

The Branching Program (BP) Model (cont.)

A sequence of random examples defines a computation path in the BP.
The path finally reaches a leaf and outputs x̃, a guess for the value of x.
The program is successful if x̃ = x.

Affine Branching Programs (ABP)

[figure: a branching program whose vertices carry sets of linear equations]

An ABP is a BP where each vertex v remembers a set E_v of linear equations in the variables x_1, ..., x_n, such that, if v is reached by the computation path then all equations in E_v are satisfied (by the true unknown x).

Accurate Affine BPs

Let v_i be the vertex reached by the computational path of the ABP in layer i; v_i is a random variable that depends on x and on the examples.
P_{x|v} = the distribution of x conditioned on reaching a specific vertex v in layer i.
Accurate ABP: for every v, P_{x|v} is close to uniform over the set of (ℓ-sparse) solutions to the equations E_v.
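The accuracy condition refers to the uniform distribution over sparse solutions of E_v; for tiny parameters its support can be enumerated by brute force. A sketch, with a hypothetical encoding of each equation as a pair (mask, b) meaning the sum of x_i over i in mask equals b (mod 2):

    from itertools import combinations

    def sparse_solutions(n, ell, equations):
        """Enumerate all <= ell-sparse x in {0,1}^n satisfying every equation.
        An accurate ABP keeps P_{x|v} close to uniform over this set."""
        sols = []
        for k in range(ell + 1):
            for support in combinations(range(n), k):
                s = set(support)
                if all(len(s & set(mask)) % 2 == b for mask, b in equations):
                    sols.append([1 if i in s else 0 for i in range(n)])
        return sols

    # e.g. x_0 + x_1 = 1 and x_2 = 0, over 2-sparse vectors in {0,1}^4
    print(sparse_solutions(4, 2, [((0, 1), 1), ((2,), 0)]))
    # -> [1,0,0,0], [0,1,0,0], [1,0,0,1], [0,1,0,1]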

Proof Plan

We follow Raz's two-step plan:
1. Simulate any BP for sparse parity learning with an accurate ABP.
2. Prove that an accurate ABP for sparse parity learning must be either wide or long.

Fix some parameter r. In the ABP, all vertices will be labeled with at most r equations. Once we reach a vertex with r equations in the ABP, we declare success.

Proof Highlights: Simulation Part

Layer by layer, we convert the BP to an ABP. For i = 1, ..., m, we convert the i-th layer of the program.

Every vertex v in the i-th layer is split into many vertices by regrouping the edges entering v.

[figure: a vertex v in layer i with its incoming edges from layer i-1, split into several copies]

Regrouping

We partition the edges going into v into (not too many) groups, and associate with each group a set of accurate equations.

[figure: edges from layer i-1 into layer i, partitioned into groups]

Main Lemma

Each edge e going into v remembers a set of equations E_e.

[figure: the edges entering v, each labeled by its set of equations]

Main Lemma: Either
1. there exists an equation that is shared by many of the edges, or
2. conditioned on passing through one of the edges, x is close to uniform (over all ℓ-sparse vectors).

Regrouping from the Main Lemma

[figure: the edges entering v, regrouped according to shared equations]

Applying the main lemma recursively, we find a large fraction of the edges with common equations such that, conditioned on passing through one of these edges, x is close to uniform over all (ℓ-sparse) solutions to these equations.

Proof on White Board

Lower Bounds on the Affine BP

Recall: all subspaces in the affine BP are defined by at most r equations. Success = having learned r equations.

[figure: an affine branching program; a fixed node v labeled by its equations]

Fix a node v in the affine BP with r linearly independent equations. By a [Raz16]-style argument, the probability of reaching v is exponentially small. To succeed with high probability, the width must therefore be exponentially large.

Proof on White Board

Conclusion: First Part

Main Theorem: Learning ℓ-sparse parities requires either Ω(n · ℓ^{0.99}) memory bits or ℓ^{Ω(ℓ)} samples.
Implies the same bounds for learning:
- 2^{ℓ}-size DNF formulas
- 2^{ℓ}-size decision trees
- juntas on ℓ variables
Open: proving tight samples-memory hardness for learning DNFs, decision trees, or juntas.

Lower Bounds more Generally

Q: Can we generalize the lower bounds to hold for problems not involving parities?
[Raz17, Moshkovitz-Moshkovitz17, Moshkovitz-Moshkovitz18]: Yes.
A new and general proof technique (we shall focus on Raz's proof technique).
As a special case: a new proof of the memory-samples lower bound for parity learning.
[Garg-Raz-T18, Beame-Oveis Gharan-Yang18]: further generalizations of the method & more applications.

A Learning Problem as a Matrix

X, A: finite sets. X: the concept class.

A: the set of possible samples. M : X × A → {-1, 1}: a matrix.
x ∈ X is chosen uniformly at random. A learner tries to learn x from a stream (a_1, b_1), (a_2, b_2), ..., where each a_t ∈ A is uniform and b_t = M(x, a_t).

Extractor-Based Lower Bounds for Learning

Thm [Garg-Raz-T18]: Assume that any submatrix of M spanning a 2^{-k} fraction of the rows and a 2^{-ℓ} fraction of the columns has bias of at most 2^{-r}. Then, any learning algorithm for the learning problem defined by M requires either:
- Ω(k · ℓ) memory bits, or
- 2^{Ω(r)} samples.

(Independently, [Beame-Oveis Gharan-Yang18] obtained a similar result.)
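To make the bias condition concrete, here is a small empirical sketch for the parity learning matrix M(x, a) = (-1)^{⟨a,x⟩}; the fractions, trial count, seed, and tiny n are arbitrary illustration choices:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 8
    N = 2 ** n

    # all vectors in {0,1}^n, as rows of an N x n matrix
    V = ((np.arange(N)[:, None] >> np.arange(n)) & 1).astype(np.int8)
    # parity learning matrix: M[x, a] = (-1)^<a, x>
    M = 1 - 2 * ((V @ V.T) % 2)

    def submatrix_bias(row_frac, col_frac, trials=100):
        """Average bias |mean of entries| over random submatrices of the
        given row/column fractions; for a good extractor this is small."""
        r, c = int(N * row_frac), int(N * col_frac)
        biases = []
        for _ in range(trials):
            rows = rng.choice(N, size=r, replace=False)
            cols = rng.choice(N, size=c, replace=False)
            biases.append(abs(M[np.ix_(rows, cols)].mean()))
        return float(np.mean(biases))

    print(submatrix_bias(1 / 8, 1 / 8))   # small: random submatrices are unbiased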

Applications of the Extractor-Based Theorem

- Learning parities.
- Learning sparse parities, and its implications.
- Learning from low-degree equations: a learner tries to learn x ∈ {0,1}^n from random polynomial equations of degree at most d over F_2: requires either Ω(n²/d) memory or 2^{Ω(n/d)} samples.
- Learning low-degree polynomials: a learner tries to learn an n-variate multilinear polynomial p of degree at most d over F_2, from random evaluations of p over {0,1}^n: requires either Ω(n^{d+1}) memory or 2^{Ω(n)} samples.
- And more.

Technique to Prove the Extractor Property

M : A × X → {-1, 1}: the learning matrix.
Defn: We say that the columns of M are γ-almost orthogonal if for each column M_x, at most a γ fraction of the other columns M_{x'} have |⟨M_x, M_{x'}⟩| > γ · |A|.
Claim: Suppose the columns of M are γ-almost orthogonal, for γ = 2^{-Ω(r)}. Then, learning requires either Ω(r²) memory bits or 2^{Ω(r)} samples.
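For the parity matrix the definition holds in the extreme: distinct columns are exactly orthogonal, since ⟨M_x, M_{x'}⟩ = Σ_a (-1)^{⟨a, x⊕x'⟩} = 0 for x ≠ x'. A self-contained check at tiny n:

    import numpy as np

    n = 6
    N = 2 ** n
    V = ((np.arange(N)[:, None] >> np.arange(n)) & 1).astype(np.int8)
    M = 1 - 2 * ((V @ V.T) % 2)           # M[x, a] = (-1)^<a, x>

    # Gram matrix of the columns: G[x, x'] = <M_x, M_x'>
    G = M.T.astype(np.int64) @ M
    off_diag = G[~np.eye(N, dtype=bool)]
    print(G[0, 0])                         # = |A| = 2^n on the diagonal
    print(np.abs(off_diag).max())          # = 0: distinct columns are orthogonal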

Recall: The Branching Program Model

[figure: a layered branching program; (width) = vertices per layer, (length) = number of layers]

Each layer represents a time step.
Each vertex represents a memory state of the learner.
Each non-leaf vertex has one outgoing edge for each possible example (a, b).

Proof Overview

P_{x|v} = the distribution of x conditioned on reaching a specific vertex v.

|X| = 2^n.
Significant vertices: v s.t. ||P_{x|v}||_2 ≥ 2^{ℓ} · 2^{-n}.
Pr(v) = the probability that the path reaches v.
We prove: If v is significant, Pr(v) ≤ 2^{-Ω(k·ℓ)}. Hence, to succeed, the program must have at least 2^{Ω(k·ℓ)} significant vertices.
T = the same as the computational path, but it stops when atypical things happen (stopping rules); the probability that T stops is exponentially small.

Proof Overview (cont.)

Goal: If v is significant, Pr(v) ≤ 2^{-Ω(k·ℓ)}.
Progress Function: For layer i, define a progress measure Z_i over the vertices of layer i, such that:
1) Z_0 is small.
2) Z_i grows very slowly (as long as the number of steps is at most 2^{Ω(r)}).
3) If a significant vertex in layer i is reached with noticeable probability, then Z_i is large.
Hence: If v is significant, Pr(v) is exponentially small.

Open Problems

- Optimal tradeoffs for DNFs, juntas, decision trees. What are the limits of the extractor-based lower bounds for these problems?
- Characterize the memory-samples complexity in terms of properties of the learning matrix M.
- Generalize to real-valued domains.
- Generalize to k passes over the stream (some progress).

Open Problem: Understanding Neural Nets

Expressiveness and learnability are empirically different in neural nets. Consider the following experiment:
- Generate (input, output) pairs from a depth-2 NN with a fixed structure & randomly chosen weights.
- Try to learn the weights from the (input, output) pairs using stochastic gradient descent. This usually fails.
Can this be explained by the low memory of the learner?
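A minimal numpy sketch of this experiment; the dimensions, ReLU activations, learning rate, and single-pass SGD loop are hypothetical illustration choices, not parameters from the talk:

    import numpy as np

    rng = np.random.default_rng(0)
    d, h, samples, lr = 20, 50, 10_000, 0.01   # hypothetical sizes & step size

    # teacher: depth-2 (one hidden layer) ReLU net, fixed random weights
    W_t = rng.normal(size=(h, d)) / np.sqrt(d)
    v_t = rng.normal(size=h) / np.sqrt(h)
    X = rng.normal(size=(samples, d))
    y = np.maximum(X @ W_t.T, 0) @ v_t          # the (input, output) pairs

    # student: same architecture, fresh init, trained by streaming SGD
    W = rng.normal(size=(h, d)) / np.sqrt(d)
    v = rng.normal(size=h) / np.sqrt(h)
    for x, target in zip(X, y):                 # one pass, one example at a time
        pre = W @ x
        hid = np.maximum(pre, 0)
        err = hid @ v - target                  # squared-loss residual
        grad_v = err * hid
        grad_W = err * np.outer(v * (pre > 0), x)
        v -= lr * grad_v
        W -= lr * grad_W

    mse = np.mean((np.maximum(X @ W.T, 0) @ v - y) ** 2)
    print("MSE after one pass:", mse)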
Thank you!