- Fast Co-Training under Weak Dependence via Stream-Based Active Learning
[abstract]
Ilias Diakonikolas,
Mingchen Ma,
Lisheng Ren,
and
Christos Tzamos
International Conference on Machine Learning (ICML 2024)
Selected for Oral Presentation
Co-training is a classical semi-supervised learning method that, under reasonable assumptions, requires only a small number of labeled examples.
Despite extensive literature on the topic,
very few hypothesis classes are known to be provably efficiently learnable via co-training,
even under very strong distributional assumptions.
In this work, we study the co-training problem in the stream-based active learning model.
We show that a range of natural concept classes are efficiently learnable via co-training, in terms of both label efficiency and computational efficiency.
We provide an efficient reduction from co-training under the standard assumption of weak dependence, in the stream-based active learning model, to online classification.
As a corollary, we obtain efficient co-training algorithms with error-independent label complexity for every concept class efficiently learnable in the mistake-bound online model.
Our framework also gives co-training algorithms with label complexity $\tilde{O}(d\log (1/\epsilon))$ for any concept class with VC dimension $d$,
though in general this reduction is not computationally efficient.
Finally, using additional ideas from online learning, we design the first efficient co-training algorithms with label complexity $\tilde{O}(d^2\log (1/\epsilon))$ for several concept classes,
including unions of intervals and homogeneous halfspaces.
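For intuition, the sketch below simulates a toy two-view data stream with a generic disagreement-driven query rule: a label is requested only when the two per-view predictors disagree. This is only an illustration of the stream-based active co-training setting; the class-conditional Gaussian views, the Perceptron-style updates, and the disagreement rule are hypothetical choices, not the reduction from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
# Toy two-view stream: given the label y, each view is drawn (here independently)
# from a class-conditional Gaussian. Weak dependence relaxes this independence;
# this sketch only illustrates a query rule, not the paper's algorithm.
mu1, mu2 = rng.standard_normal(d), rng.standard_normal(d)

def draw_example():
    y = rng.choice([-1, 1])
    x1 = y * mu1 + rng.standard_normal(d)   # view 1
    x2 = y * mu2 + rng.standard_normal(d)   # view 2
    return x1, x2, y

# Hypothetical per-view predictors maintained by two online learners.
h1, h2 = rng.standard_normal(d), rng.standard_normal(d)

queries = 0
for _ in range(10_000):
    x1, x2, y = draw_example()
    if np.sign(h1 @ x1) != np.sign(h2 @ x2):   # views disagree: query the label
        queries += 1
        h1 += y * x1                            # e.g. a Perceptron-style update
        h2 += y * x2
print("labels queried:", queries)
```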
- SQ Lower Bounds for Non-Gaussian Component Analysis with Weaker Assumptions
[abstract]
Ilias Diakonikolas,
Daniel M. Kane,
Lisheng Ren,
and
Yuxin Sun
Advances in Neural Information Processing Systems (NeurIPS 2023)
We study the complexity of Non-Gaussian Component Analysis (NGCA) in the Statistical Query (SQ) model.
Prior work developed a methodology for proving SQ lower bounds for NGCA that has been applicable in a wide range of contexts.
In particular, it was known that for any univariate distribution $A$ satisfying certain conditions,
distinguishing between a standard multivariate Gaussian and a distribution
that behaves like $A$ in a random hidden direction and like a standard Gaussian in the orthogonal complement is SQ-hard.
The required conditions were that (1) $A$ matches many low-order moments with a standard Gaussian,
and (2) the chi-squared norm of $A$ with respect to the standard Gaussian is finite.
While the moment-matching condition is clearly necessary for hardness, the chi-squared condition was only required for technical reasons.
In this work, we establish that the latter condition is indeed not necessary.
In particular, we prove near-optimal SQ lower bounds for NGCA under the moment-matching condition only.
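For concreteness, the snippet below samples from a toy NGCA instance: a univariate distribution $A$ (here a Rademacher stand-in, which matches the first three Gaussian moments) is planted along a random hidden direction, with a standard Gaussian in the orthogonal complement. The specific choices are illustrative assumptions, not the construction used in the lower bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                                    # ambient dimension
v = rng.standard_normal(n)
v /= np.linalg.norm(v)                    # random hidden direction

def sample_A(size):
    # Placeholder for the univariate distribution A; in hard instances A is
    # chosen to match many low-order moments of N(0, 1).
    return rng.choice([-1.0, 1.0], size=size)   # Rademacher: matches 3 moments

def sample_ngca(m):
    # Standard Gaussian in the orthogonal complement of v, distribution A along v.
    g = rng.standard_normal((m, n))
    g -= np.outer(g @ v, v)               # remove the component along v
    return g + np.outer(sample_A(m), v)   # plant A along the hidden direction

X_null = rng.standard_normal((5000, n))   # pure Gaussian samples
X_alt = sample_ngca(5000)                 # NGCA samples
# Low-order moments along any fixed direction look nearly identical,
# which is what drives the SQ hardness.
print((X_null @ v).mean(), (X_alt @ v).mean())
```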
- Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression under Gaussian Marginals
[abstract]
[arxiv]
Ilias Diakonikolas,
Daniel M. Kane, and Lisheng Ren
International Conference on Machine Learning (ICML 2023)
We study the task of agnostically learning halfspaces under the Gaussian distribution.
Specifically, given labeled examples $(\mathbf{x},y)$ from an unknown distribution on $\mathbb{R}^n \times \{\pm 1 \}$,
whose marginal distribution on $\mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary,
the goal is to output a hypothesis with 0-1 loss $\mathrm{OPT}+\epsilon$, where $\mathrm{OPT}$ is the 0-1 loss of the best-fitting halfspace.
We prove a near-optimal computational hardness result for this task, under the widely believed
sub-exponential time hardness of the Learning with Errors (LWE) problem. Prior hardness results are either
qualitatively suboptimal or apply to restricted families of algorithms. Our techniques extend to
yield near-optimal lower bounds for related problems, including ReLU regression.
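As a small illustration of the agnostic setting, the sketch below draws Gaussian-marginal examples, assigns labels that disagree with a reference halfspace on a 10% fraction (so OPT is at most 0.1), and evaluates the 0-1 loss of candidate hypotheses. The labeling rule is a hypothetical stand-in for the arbitrary labels allowed by the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 100_000
X = rng.standard_normal((m, n))            # standard Gaussian marginal on x

# Hypothetical labels: agree with a reference halfspace w_star except on a
# random 10% of examples, so OPT (best halfspace 0-1 loss) is at most 0.1.
w_star = np.array([1.0, 0.0, 0.0])
y = np.sign(X @ w_star)
flip = rng.random(m) < 0.10
y[flip] *= -1

def zero_one_loss(w):
    return np.mean(np.sign(X @ w) != y)

# The agnostic goal: output any hypothesis whose 0-1 loss is OPT + eps.
print("loss of w_star:", zero_one_loss(w_star))        # ~0.10, upper bound on OPT
print("loss of a random halfspace:", zero_one_loss(rng.standard_normal(n)))
```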
- Cryptographic Hardness of Learning Halfspaces with Massart Noise
[abstract]
[arxiv]
Ilias Diakonikolas,
Daniel M. Kane,
Pasin Manurangsi, and Lisheng Ren
Advances in Neural Information Processing Systems (NeurIPS 2022)
We study the complexity of PAC learning halfspaces in the presence of Massart noise.
In this problem, we are given i.i.d. labeled examples
$(\mathbf{x}, y) \in \mathbb{R}^N \times \{ \pm 1\}$,
where the distribution of $\mathbf{x}$ is arbitrary
and the label $y$ is a Massart corruption
of $f(\mathbf{x})$, for an unknown halfspace $f: \mathbb{R}^N \to \{ \pm 1\}$,
with flipping probability $\eta(\mathbf{x}) \leq \eta < 1/2$.
The goal of the learner is to compute a hypothesis with small 0-1 error.
Our main result is the first computational hardness result for this learning problem.
Specifically, assuming the (widely believed) subexponential-time hardness
of the Learning with Errors (LWE) problem, we show that no polynomial-time
Massart halfspace learner can achieve error better than $\Omega(\eta)$,
even if the optimal 0-1 error is small, namely
${\rm OPT} = 2^{-\log^{c} (N)}$
for any universal constant $c \in (0, 1)$.
Prior work had provided qualitatively similar evidence of hardness in the
Statistical Query model. Our computational hardness result
essentially resolves the polynomial PAC learnability of Massart halfspaces,
by showing that known efficient learning algorithms
for the problem are nearly best possible.
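To make the noise model concrete, the sketch below generates Massart-corrupted halfspace labels: each label $f(\mathbf{x})$ is flipped independently with a point-dependent probability $\eta(\mathbf{x}) \leq \eta$. The particular choice of $\eta(\mathbf{x})$ here is a hypothetical example; in the model it may be chosen adversarially.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, eta = 4, 10_000, 0.3                 # noise upper bound eta < 1/2

w = rng.standard_normal(N)                 # unknown halfspace f(x) = sign(<w, x>)
X = rng.standard_normal((m, N))            # the x-marginal may be arbitrary
clean = np.sign(X @ w)

# Massart noise: a per-point flip probability eta(x) <= eta, chosen by an
# adversary.  This particular eta(x), largest near the decision boundary,
# is only a hypothetical illustration.
eta_x = eta * np.exp(-np.abs(X @ w))
flips = rng.random(m) < eta_x
y = np.where(flips, -clean, clean)

print("empirical flip rate:", flips.mean())   # below eta on average
```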
- SQ Lower Bounds for Learning Single Neurons with Massart Noise
[abstract]
[arxiv]
Ilias Diakonikolas,
Daniel M. Kane,
Lisheng Ren,
and
Yuxin Sun
Advances in Neural Information Processing Systems (NeurIPS 2022)
We study the problem of PAC learning a single neuron in the presence of Massart noise.
Specifically, for a known activation function $f: \mathbb{R} \to \mathbb{R}$, the learner is given
access to labeled examples $({\bf x}, y) \in \mathbb{R}^d \times \mathbb{R}$, where the marginal distribution of ${\bf x}$ is
arbitrary and the corresponding label $y$ is a Massart corruption of $f(\langle {\bf w}, {\bf x} \rangle)$.
The goal of the learner is to output a hypothesis $h: \mathbb{R}^d \to \mathbb{R}$ with small squared loss.
For a range of activation functions, including ReLUs,
we establish super-polynomial Statistical Query (SQ) lower bounds for this learning problem.
In more detail, we prove that no efficient SQ algorithm can approximate
the optimal error within any constant factor.
Our main technical contribution is a novel SQ-hard construction
for learning $\{ \pm 1\}$-weight Massart halfspaces on the Boolean hypercube
that is interesting in its own right.
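As an illustration of the data model (not of the SQ-hard construction), the sketch below generates labels that are Massart corruptions of a ReLU neuron $f(\langle \mathbf{w}, \mathbf{x} \rangle)$ and evaluates the squared loss of candidate hypotheses. The constant corruption rate and the negated corrupted labels are hypothetical choices; the model allows the adversary to pick both.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, eta = 5, 10_000, 0.2

relu = lambda z: np.maximum(z, 0.0)        # activation f
w = rng.standard_normal(d)                 # unknown target weights
X = rng.standard_normal((m, d))            # the x-marginal may be arbitrary
clean = relu(X @ w)

# Massart corruption: each label equals the clean value with probability
# 1 - eta(x) and may be set arbitrarily otherwise; here eta(x) is constant
# and corrupted labels are simply negated.
eta_x = np.full(m, eta)
corrupt = rng.random(m) < eta_x
y = np.where(corrupt, -clean, clean)

def squared_loss(v):
    return np.mean((relu(X @ v) - y) ** 2)

print("loss of the target w:", squared_loss(w))
print("loss of the zero hypothesis:", squared_loss(np.zeros(d)))
```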
- Hardness of Learning a Single Neuron with Adversarial Label Noise
Ilias Diakonikolas,
Daniel M. Kane,
Pasin Manurangsi, and Lisheng Ren
Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS 2022)
Selected for Oral Presentation