- SQ Lower Bounds for Non-Gaussian Component Analysis with Weaker Assumptions
[abstract]
Ilias Diakonikolas,
Daniel M. Kane,
Lisheng Ren,
and
Yuxin Sun
To appear in Advances in Neural Information Processing Systems (NeurIPS 2023)
We study the complexity of Non-Gaussian Component Analysis (NGCA) in the Statistical Query (SQ) model.
Prior work developed a methodology to prove SQ lower bounds for NGCA that has been applicable to a wide range of contexts.
In particular, it was known that for any univariate distribution $A$ satisfying certain conditions,
distinguishing between a standard multivariate Gaussian and a distribution
that behaves like $A$ in a random hidden direction and like a standard Gaussian in the orthogonal complement, is SQ-hard.
The required conditions were that (1) $A$ matches many low-order moments with a standard Gaussian,
and (2) the chi-squared norm of $A$ with respect to the standard Gaussian is finite.
While the moment-matching condition is clearly necessary for hardness, the chi-squared condition was only required for technical reasons.
In this work, we establish that the latter condition is indeed not necessary.
In particular, we prove near-optimal SQ lower bounds for NGCA under the moment-matching condition only.
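The planted NGCA distribution described above can be simulated directly: pick a random hidden unit direction, draw standard Gaussians, and overwrite the component along the hidden direction with a draw from $A$. The sketch below is illustrative only; `sample_A` is a caller-supplied placeholder for the univariate distribution $A$ and is not part of the paper.

```python
import numpy as np

def sample_ngca(n_samples, dim, sample_A, rng=None):
    """Samples that behave like the univariate distribution A along a
    random hidden direction v, and like a standard Gaussian on the
    orthogonal complement of v.

    `sample_A(k)` must return k i.i.d. draws from A (illustrative hook).
    """
    rng = np.random.default_rng(rng)
    # Random hidden unit direction.
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    # Start with a standard Gaussian in every direction...
    x = rng.standard_normal((n_samples, dim))
    # ...then replace the component along v with a draw from A.
    proj = x @ v
    x += np.outer(sample_A(n_samples) - proj, v)
    return x, v
```

The SQ-hardness result says that when $A$ matches many low-order moments of $\mathcal{N}(0,1)$, samples from this distribution are hard to distinguish from purely Gaussian ones.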
- Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression under Gaussian Marginals
[abstract]
[arxiv]
Ilias Diakonikolas,
Daniel M. Kane, and Lisheng Ren
Proceedings of the 40th International Conference on Machine Learning (ICML 2023)
We study the task of agnostically learning halfspaces under the Gaussian distribution.
Specifically, given labeled examples $(\mathbf{x},y)$ from an unknown distribution on $\mathbb{R}^n \times \{\pm 1 \}$,
whose marginal distribution on $\mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary,
the goal is to output a hypothesis with 0-1 loss $\mathrm{OPT}+\epsilon$, where $\mathrm{OPT}$ is the 0-1 loss of the best-fitting halfspace.
We prove a near-optimal computational hardness result for this task, under the widely believed
sub-exponential time hardness of the Learning with Errors (LWE) problem. Prior hardness results are either
qualitatively suboptimal or apply to restricted families of algorithms. Our techniques extend to
yield near-optimal lower bounds for related problems, including ReLU regression.
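For concreteness, the evaluation metric in this agnostic setting is the 0-1 loss of a halfspace hypothesis; a minimal helper (not from the paper, just the standard definition) is:

```python
import numpy as np

def zero_one_loss(w, X, y):
    """0-1 loss of the halfspace x -> sign(<w, x>) on labeled data (X, y).

    This is only the metric from the abstract: the hardness result is
    about *finding* a hypothesis with loss OPT + eps, which no simple
    helper can do.
    """
    preds = np.sign(X @ w)
    preds[preds == 0] = 1  # break ties toward +1
    return np.mean(preds != y)
```

Here $\mathrm{OPT}$ is the minimum of this quantity over all halfspaces $w$, and the labels $y$ may be arbitrary since no assumption is placed on the conditional distribution of $y$ given $\mathbf{x}$.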
- Cryptographic Hardness of Learning Halfspaces with Massart Noise
[abstract]
[arxiv]
Ilias Diakonikolas,
Daniel M. Kane,
Pasin Manurangsi, and Lisheng Ren
Advances in Neural Information Processing Systems (NeurIPS 2022)
We study the complexity of PAC learning halfspaces in the presence of Massart noise.
In this problem, we are given i.i.d. labeled examples
$(\mathbf{x}, y) \in \mathbb{R}^N \times \{ \pm 1\}$,
where the distribution of $\mathbf{x}$ is arbitrary
and the label $y$ is a Massart corruption
of $f(\mathbf{x})$, for an unknown halfspace $f: \mathbb{R}^N \to \{ \pm 1\}$,
with flipping probability $\eta(\mathbf{x}) \leq \eta < 1/2$.
The goal of the learner is to compute a hypothesis with small 0-1 error.
Our main result is the first computational hardness result for this learning problem.
Specifically, assuming the (widely believed) subexponential-time hardness
of the Learning with Errors (LWE) problem, we show that no polynomial-time
Massart halfspace learner can achieve error better than $\Omega(\eta)$,
even if the optimal 0-1 error is small, namely
${\rm OPT} = 2^{-\log^{c} (N)}$
for any universal constant $c \in (0, 1)$.
Prior work had provided qualitatively similar evidence of hardness in the
Statistical Query model. Our computational hardness result
essentially resolves the polynomial PAC learnability of Massart halfspaces,
by showing that known efficient learning algorithms
for the problem are nearly best possible.
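The Massart noise model in this abstract can be simulated as follows: each clean label $f(\mathbf{x})$ is flipped independently with probability $\eta(\mathbf{x}) \leq \eta < 1/2$, where the flip probability may depend adversarially on the point. In this sketch `eta_fn` is an illustrative stand-in for the adversary's choice, not a construction from the paper.

```python
import numpy as np

def massart_labels(X, w, eta_fn, rng=None):
    """Labels y that are Massart corruptions of f(x) = sign(<w, x>).

    Each clean label is flipped independently with probability
    eta_fn(x), which the model requires to satisfy eta_fn(x) <= eta < 1/2.
    """
    rng = np.random.default_rng(rng)
    clean = np.sign(X @ w)
    clean[clean == 0] = 1  # break ties toward +1
    flips = rng.random(len(X)) < np.array([eta_fn(x) for x in X])
    return np.where(flips, -clean, clean)
```

The subtlety making the problem hard is exactly this point-dependence: a uniform flip rate $\eta(\mathbf{x}) \equiv \eta$ (random classification noise) is efficiently learnable, while the adversarial choice of $\eta(\mathbf{x})$ is what the LWE-based lower bound exploits.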
- SQ Lower Bounds for Learning Single Neurons with Massart Noise
[abstract]
[arxiv]
Ilias Diakonikolas,
Daniel M. Kane,
Lisheng Ren,
and
Yuxin Sun
Advances in Neural Information Processing Systems (NeurIPS 2022)
We study the problem of PAC learning a single neuron in the presence of Massart noise.
Specifically, for a known activation function $f: \mathbb{R} \to \mathbb{R}$, the learner is given
access to labeled examples $({\bf x}, y) \in \mathbb{R}^d \times \mathbb{R}$, where the marginal distribution of ${\bf x}$ is
arbitrary and the corresponding label $y$ is a Massart corruption of $f(\langle {\bf w}, {\bf x} \rangle)$.
The goal of the learner is to output a hypothesis $h: \mathbb{R}^d \to \mathbb{R}$ with small squared loss.
For a range of activation functions, including ReLUs,
we establish super-polynomial Statistical Query (SQ) lower bounds for this learning problem.
In more detail, we prove that no efficient SQ algorithm can approximate
the optimal error within any constant factor.
Our main technical contribution is a novel SQ-hard construction
for learning $\{ \pm 1\}$-weight Massart halfspaces on the Boolean hypercube
that is interesting in its own right.
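The regression analogue of Massart noise used in this abstract can likewise be sketched: with probability $1 - \eta(\mathbf{x})$ the label equals $f(\langle \mathbf{w}, \mathbf{x} \rangle)$, and otherwise it is replaced by an adversarial value. Both `eta_fn` and `corrupt_fn` below are illustrative hooks for the adversary, not constructions from the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def single_neuron_massart(X, w, eta_fn, corrupt_fn, rng=None):
    """Single-neuron labels under Massart noise: each label is the clean
    value relu(<w, x>) except with probability eta_fn(x) <= eta < 1/2,
    in which case corrupt_fn supplies an adversarial replacement."""
    rng = np.random.default_rng(rng)
    clean = relu(X @ w)
    corrupted = rng.random(len(X)) < np.array([eta_fn(x) for x in X])
    return np.where(corrupted, corrupt_fn(clean), clean)

def squared_loss(h_vals, y):
    """The loss the learner's hypothesis h is measured by."""
    return np.mean((h_vals - y) ** 2)
```

The SQ lower bound says that even for this simple ReLU model, no efficient SQ algorithm can get squared loss within any constant factor of the optimum.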
- Hardness of Learning a Single Neuron with Adversarial Label Noise
Ilias Diakonikolas,
Daniel M. Kane,
Pasin Manurangsi, and Lisheng Ren
Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS 2022)
Selected for Oral Presentation