A Quarter-Century of Efficient Learnability - PowerPoint PPT Presentation

1
A Quarter-Century of Efficient Learnability
  • Rocco Servedio
  • Columbia University

Valiant 60th Birthday Symposium, Bethesda,
Maryland, May 30, 2009
2
1984
and of course...
3
(No Transcript)
4
Probably Approximately Correct learning [Valiant84]

[Valiant84] presents a range of learning models and
oracles.
  • Concept class C of Boolean functions over
    domain X (typically X = {0,1}^n or R^n)
  • Unknown target concept c ∈ C to be learned
    from examples
  • Unknown and arbitrary distribution D over X;
    D models the (possibly complex) world

Learner has access to i.i.d. draws from D,
labeled according to c: pairs (x, c(x)), where each
x belongs to X and is drawn i.i.d. from D.

5
PAC learning concept class C
  • Learner's goal:
      • come up with a hypothesis h that will
        have high accuracy on future examples.

Efficiently:
  • For any target function c ∈ C,
  • for any distribution D over X,
  • with probability at least 1 − δ, the learner outputs a
    hypothesis h that is ε-accurate w.r.t. D, i.e.
    Pr_{x ~ D}[h(x) ≠ c(x)] ≤ ε.

Algorithm must be computationally efficient:
it should run in time poly(n, 1/ε, 1/δ).
6
So, what can be learned efficiently?
The PAC model and its variants provide a clean
theoretical framework for studying the
computational complexity of learning
problems. From [Valiant84]:
"The results of learnability theory would then
indicate the maximum granularity of the single
concepts that can be acquired without
programming."
"This paper attempts to explore
the limits of what is learnable as allowed by
algorithmic complexity."
"The identification of
these limits is a major goal of the line of work
proposed in this paper."
7
25 years of efficient learnability
Valiant didn't just ask the question "what can be
learned efficiently?"; he did a great deal
towards answering it. (Highlight some of these
contributions and how the field has evolved since
then.)
In the rest of the 1980s, Valiant and
colleagues gave remarkable results on the
abilities and limitations of computationally
efficient learning algorithms. This work
introduced research directions and questions that
continue to be intensively studied to this day.
  • Rest of talk: survey some
      • positive results (algorithms)
      • negative results (two flavors of hardness
        results)

8
Positive results: learning k-DNF
Theorem [Valiant84]: k-DNF is learnable in
polynomial time for any k = O(1).
View a k-DNF (the slide illustrates k = 2) as a
disjunction over "metavariables", one per
conjunction of at most k literals, and learn the
disjunction using elimination.
25 years later, improving this to superconstant k
is still a major open question! Much has been
learned in trying for this improvement.
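
The metavariable-plus-elimination idea can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names, the encoding of a term as (index, bit) pairs, and the 2-DNF target are all assumptions made for the example. The learner starts from every conjunction of at most k literals and deletes each one that some negative example satisfies.

```python
from itertools import combinations, product

def all_terms(n, k):
    """Every conjunction of 1..k literals over x_0..x_{n-1}.
    A term is a tuple of (index, required_bit) pairs; each term is one metavariable."""
    terms = []
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for bits in product((0, 1), repeat=size):
                terms.append(tuple(zip(idxs, bits)))
    return terms

def satisfies(x, term):
    return all(x[i] == b for i, b in term)

def learn_kdnf(examples, n, k):
    """Elimination: drop every term that is satisfied by some negative example."""
    terms = all_terms(n, k)
    for x, label in examples:
        if label == 0:
            terms = [t for t in terms if not satisfies(x, t)]
    return terms  # hypothesis = OR of the surviving terms

def predict(terms, x):
    return int(any(satisfies(x, t) for t in terms))

# Hypothetical target 2-DNF over 4 variables: (x0 AND x1) OR (NOT x2 AND x3)
def target(x):
    return int((x[0] and x[1]) or ((not x[2]) and x[3]))

# For a deterministic demo the sample is the whole cube rather than random draws.
sample = [(x, target(x)) for x in product((0, 1), repeat=4)]
hypothesis = learn_kdnf(sample, n=4, k=2)
```

There are O((2n)^k) candidate terms, which is polynomial in n only for constant k; that is where the k = O(1) restriction in the theorem comes from.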
9
Poly-time PAC learning, general distributions
  • Decision lists (greedy alg.) [Rivest87]
  • Halfspaces (poly-time LP) [Littlestone87, BEHW89, ...]
  • Parities, integer lattices (Gaussian elim.)
    [HelmboldSloanWarmuth92, FischerSimon92]
  • Restricted types of branching programs (DL +
    parities) [ErgunKumarRubinfeld95,
    BshoutyTamonWilson98]
  • Geometric concept classes (random projections)
    [BshoutyChenHomer94, BGMST98, Vempala99, ...]
  • and more
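
As one concrete instance from the list above, parities can be learned by solving a linear system over GF(2): each labeled example (x, y) is the equation a·x = y (mod 2) in the unknown coefficient vector a. The sketch below is an illustrative implementation under that assumption; the function name `learn_parity` and the hidden parity `secret` are hypothetical and not taken from the cited papers.

```python
from itertools import product

def learn_parity(examples, n):
    """Return a 0/1 list a with sum(a[i] * x[i]) % 2 == y for every example,
    or None if no parity is consistent with the examples."""
    # Augmented rows [x_0, ..., x_{n-1} | y], reduced over GF(2).
    rows = [list(x) + [y] for x, y in examples]
    pivots = []  # pivots[r] is the pivot column of row r
    r = 0
    for col in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue  # no pivot in this column: a[col] is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col]:
                rows[i] = [p ^ q for p, q in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    # A leftover row reading 0 = 1 means the sample fits no parity.
    if any(row[n] for row in rows[r:]):
        return None
    a = [0] * n  # free variables are set to 0
    for row_idx, col in enumerate(pivots):
        a[col] = rows[row_idx][n]
    return a

# Hypothetical hidden parity x0 XOR x2 XOR x3 over 5 bits.
secret = [1, 0, 1, 1, 0]
sample = [(x, sum(s * xi for s, xi in zip(secret, x)) % 2)
          for x in product((0, 1), repeat=5)]
learned = learn_parity(sample, 5)
```

With fewer examples the system may be underdetermined; the sketch then returns one consistent parity (free variables set to 0), which is what a consistent-hypothesis learner needs.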






10
General-distribution PAC learning, cont.
  • Quasi-poly / sub-exponential-time learning:
      • poly-size decision trees [EhrenfeuchtHaussler89,
        Blum92]
      • poly-size DNF [Bshouty96, TaruiTsukiji99,
        KlivansS01]
      • intersections of few poly(n)-weight halfspaces
        [KlivansODonnellS02]
  • PTF method (halfspaces + metavariables): link
    with complexity theory

(Figure: a decision tree with internal nodes x3, x1, x5, x1, x5, x4 and leaves labeled 1 / -1, next to a DNF drawn as an OR of three ANDs over literals of variables among x1, ..., x7.)
11
Distribution-specific learning
  • Theorem [KearnsLiValiant87]: monotone Boolean
    functions can be weakly learned (accuracy
    1/2 + Ω(1/n)) in poly time under the uniform
    distribution on {0,1}^n
  • Ushered in study of algorithms for
    uniform-distribution and distribution-specific
    learning: halfspaces [Baum90], DNF [Verbeurgt90,
    Jackson95], decision trees [KushilevitzMansour93],
    AC0 [LinialMansourNisan89, FurstJacksonSmith91],
    extended AC0 [JacksonKlivansS02], juntas
    [MosselODonnellS03], general monotone functions
    [BshoutyTamon96, BlumBurchLangford98,
    ODonnellWimmer09], monotone decision trees
    [ODonnellS06], intersections of halfspaces
    [BlumKannan94, Vempala97, KwekPitt98,
    KlivansODonnellS08], convex sets, much more
  • Key tool: Fourier analysis of Boolean functions
  • Recently come full circle on monotone functions:
      • [ODonnellWimmer09]: poly time,
        accuracy 1/2 + Ω(log n / √n); optimal! (by
        [BlumBurchLangford98])
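
A weak learner for monotone functions can be very simple. The sketch below assumes the standard structural fact behind the [KearnsLiValiant87] theorem, namely that for any monotone function one of the constants 0, 1 or a single variable x_i already has a noticeable advantage over 1/2 under the uniform distribution; the slide itself does not spell out the algorithm. The function names and the majority target are hypothetical.

```python
from itertools import product

def weak_learn_monotone(examples, n):
    """Pick the best of: constant 0, constant 1, or a single variable x_i,
    judged by empirical agreement with the labels."""
    candidates = [("const0", lambda x: 0), ("const1", lambda x: 1)]
    candidates += [(f"x{i}", lambda x, i=i: x[i]) for i in range(n)]

    def agreement(h):
        return sum(h(x) == y for x, y in examples) / len(examples)

    name, h = max(candidates, key=lambda c: agreement(c[1]))
    return name, h, agreement(h)

# Hypothetical monotone target: majority of the first three of five bits.
def maj3(x):
    return int(x[0] + x[1] + x[2] >= 2)

# For a deterministic demo the "sample" is the whole cube, i.e. the exact
# uniform distribution; a real learner would use random draws instead.
cube = [(x, maj3(x)) for x in product((0, 1), repeat=5)]
name, h, acc = weak_learn_monotone(cube, 5)
```

On this target both constants agree with the labels half the time, while each relevant variable agrees on 3/4 of the cube, so a single variable is selected.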

12
Other variants
  • After [Valiant84], efficient learning algorithms
    studied in many settings
  • Learning in the presence of noise: malicious
    [Valiant85], agnostic [KearnsSchapireSellie93],
    random misclassification [AngluinLaird87], ...
  • Related models: Exact learning from queries and
    counterexamples [Angluin87], Statistical Query
    Learning [Kearns93], many others
  • PAC-style analyses of unsupervised learning
    problems: learning discrete distributions
    [KMRRSS94], learning mixture distributions
    [Dasgupta99, AroraKannan01], many others
  • Evolvability framework [Valiant07, Feldman08, ...]
  • Nice algorithmic results in all these settings.

13
Limits of efficient learnability: is proper
learning feasible?
Proper learning: a learning algorithm for class C
must use hypotheses from C.
  • There are efficient proper learning algorithms
    for conjunctions, disjunctions, halfspaces,
    decision lists, parities, k-DNF, k-CNF.
  • What about k-term DNF: can we learn it
    using k-term DNF as hypotheses?

14
Proper learning is computationally hard
Theorem [PittValiant87]: If RP ≠ NP, then
no poly-time algorithm can learn 3-term DNF
using 3-term DNF hypotheses.
Given a graph G, the reduction produces a
distribution over labeled examples such that a
high-accuracy 3-term DNF exists iff G is
3-colorable.
Note: can learn 3-term DNF in
poly time using 3-CNF hypotheses! Often a
change of representation can make a difficult
learning task easy.
(Figure: the reduction maps a graph to a distribution over labeled examples such as (011111, +), (001111, -), (101111, +), (010111, -), (110111, +), (011101, -).)
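
The example strings on this slide match the usual form of the graph-to-examples reduction: each vertex gives a positive example with a single 0, and each edge gives a negative example with two 0s. The sketch below assumes that form; the 6-vertex graph, the coloring, and all function names are hypothetical, chosen only so that the output reproduces the examples visible on the slide (with vertices numbered from 0).

```python
def coloring_to_examples(n, edges):
    """Vertex i -> positive example with a single 0 at position i;
    edge (i, j) -> negative example with 0s at positions i and j."""
    def point(zeros):
        return "".join("0" if v in zeros else "1" for v in range(n))
    positives = [(point({i}), "+") for i in range(n)]
    negatives = [(point({i, j}), "-") for i, j in edges]
    return positives + negatives

def terms_from_coloring(n, coloring):
    """One term per color: the AND of x_v over all vertices v NOT of that color."""
    return [[v for v in range(n) if coloring[v] != c]
            for c in sorted(set(coloring))]

def dnf_predict(terms, x):
    return "+" if any(all(x[v] == "1" for v in t) for t in terms) else "-"

# Hypothetical graph on 6 vertices and a proper 3-coloring of it.
edges = [(0, 1), (0, 2), (0, 4)]
coloring = [0, 1, 1, 2, 1, 0]
examples = coloring_to_examples(6, edges)
terms = terms_from_coloring(6, coloring)
```

A proper 3-coloring yields a 3-term DNF consistent with every example: a vertex example satisfies the term of its own color, while an edge example has a 0 in every term because its two endpoints have different colors. The converse direction (a consistent 3-term DNF gives a 3-coloring) is what makes proper learning hard.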
15
From 1987...
This work showed computational barriers to
learning with restricted representations in
general, not just proper learning.
Theorem [PittValiant87]: Learning k-term DNF
using (2k-3)-term DNF hypotheses is
hard.
Opened the door to a whole range of hardness
results: "class C is hard to learn using
hypotheses from class H".
16
... to 2009
  • Great progress in recent years using
    sophisticated machinery from hardness of
    approximation.
  • [ABFKP04] Hard to learn n-term DNF using
    n^100-size OR-of-halfspace hypotheses.
    [Feldman06] Holds even if learner can make
    membership queries to target function.
  • [KhotSaket08] Hard to (even weakly) learn
    intersection of 2 halfspaces using 100 halfspaces
    as hypothesis.
  • If data is corrupted with 1% noise, then:
      • [FeldmanGopalanKhotPonnuswami08] Hard to (even
        weakly) learn an AND using an AND as hypothesis.
        Same for halfspaces.
      • [GopalanKhotSaket07, Viola08] Hard to (even
        weakly) learn a parity even using degree-100
        GF(2) polynomials as hypotheses.
  • Active area with lots of ongoing work.

17
Representation-Independent Hardness
Suppose there are no hypothesis restrictions:
any poly-size circuit OK. Are there learning
problems that are still hard for computational
reasons?
Yes:
  • [Valiant84] Existence of pseudorandom functions
    [GoldreichGoldwasserMicali84] implies that
    general Boolean circuits are
    (representation-independently) hard to learn.

18
PKC and hardness of learning
  • Key insight of [KearnsValiant89]: Public-key
    cryptosystems ⇒ hard-to-learn functions.
  • Adversary can create labeled examples of
    the decryption function by herself, so it must
    not be learnable from labeled examples, or else
    the cryptosystem would be insecure!
  • Theorem [KearnsValiant89]: Simple classes of
    functions (NC1, TC0, poly-size DFAs) are
    inherently hard to learn.

Theorem [Regev05, KlivansSherstov06]: Really
simple functions (poly-size OR of halfspaces)
are inherently hard to learn.
Closing the gap:
Can these results be extended to show that DNF
are inherently hard to learn? Or are DNF
efficiently learnable?
19
Efficient learnability: Model and Results
  • Valiant
      • provided an elegant model for the computational
        study of learning
      • followed this up with foundational results on
        what is (and isn't) efficiently learnable
  • These fundamental questions continue to be
    intensively studied and cross-fertilize other
    topics in TCS.

Thank you, Les!