Derivation of Soft Thresholding Operator / Proximal Operator of $ {L}_{1} $ Norm
Olivia Zamora
I was going through the derivation of soft thresholding at .
It says the three unique solutions for
$\operatorname{arg min}_x \|x-b\|_2^2 + \lambda\|x\|_1$ are given by
$\operatorname{arg min} \|x-b\|^2 + \lambda x$ assuming $x > 0$
$\operatorname{arg min} \|x-b\|^2 - \lambda x$ assuming $x < 0$
$\operatorname{arg min} \|x-b\|^2 = 0 $ assuming $x = 0$
I am confused by the third case. For $x = 0$, how can the solution be zero? I mean, in that case I will have $\operatorname{arg min} \|x-b\|^2$; if I solve this, I get the solution $x = b$, so how come it is $x = 0$?
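For concreteness, here is a quick numerical check of the third case (the values $b = 0.3$ and $\lambda = 1$ are made up): even though $\|x-b\|^2$ alone is minimized at $x = b$, the penalized objective is minimized at $x = 0$.

```python
import numpy as np

# Made-up values: with f(x) = (x - b)^2 + lam*|x|, the minimizer is 0
# whenever |b| <= lam/2, even though (x - b)^2 alone is minimized at x = b.
b, lam = 0.3, 1.0

xs = np.linspace(-2.0, 2.0, 40001)       # fine grid that includes x = 0
f = (xs - b) ** 2 + lam * np.abs(xs)     # the penalized objective
x_star = xs[np.argmin(f)]

print(x_star)  # ~0.0: the l1 penalty pulls the minimizer to zero
```

Here $f(0) = b^2 = 0.09$ beats $f(b) = \lambda|b| = 0.3$, so $x = b$ is no longer optimal once the penalty is included.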
13 Answers
I wrote a more detailed derivation of the soft-thresholding operator, following the source you mention and others. I hope it helps.
Soft thresholding is just the proximal mapping of the $l_1$-norm. Let $f(x) = \lambda\|x\|_1$; then the proximal mapping of $f$ is defined as
\begin{equation} \operatorname{prox}_f(x) = \operatorname{argmin}_z\{\frac{1}{2}\|x-z\|^2_2 + \lambda\|z\|_1 \} \end{equation}
The optimality condition for the previous problem is
\begin{equation} 0 \in \nabla(\frac{1}{2}\|x-z\|^2_2) + \partial(\lambda\|z\|_1) \Leftrightarrow 0 \in z-x + \lambda\partial\|z\|_1 \end{equation}
The $l_1$-norm is separable, so we can consider each of its components separately. Consider first the case $z_i \neq 0$. There the subdifferential reduces to a single element, $\partial |z_i| = \{\operatorname{sign}(z_i)\}$, and the optimum $z_i^*$ is obtained from
\begin{equation} 0 = z_i-x_i + \lambda \operatorname{sign}(z_i) \Leftrightarrow z_i^* = x_i - \lambda \operatorname{sign}(z_i^*) \end{equation}
Note also that if $z_i^* < 0$, then $x_i < -\lambda$ and equivalently if $z_i^* > 0 \Rightarrow x_i > \lambda$. Thus, $|x_i| > \lambda$ and $\operatorname{sign}(z_i^*) = \operatorname{sign}(x_i)$. Substituting in the previous equation we get
\begin{equation} z_i^* = x_i - \lambda \operatorname{sign}(x_i) \end{equation}
In the case where $z_i = 0$, the subdifferential of $|z_i|$ is the whole interval $[-1,1]$ and the optimality condition is
\begin{equation} 0 \in -x_i + \lambda[-1,1] \Leftrightarrow x_i \in [-\lambda,\lambda] \Leftrightarrow |x_i| \leq \lambda \end{equation}
Putting it all together we get
\begin{equation} [\operatorname{prox}_f(x)]_i = z_i^* = \left\{ \begin{array}{lr} 0 & \text{if } |x_i| \leq \lambda \\ x_i - \lambda \operatorname{sign}(x_i) & \text{if } |x_i| > \lambda \end{array}\right. \end{equation}
The previous equation can also be written as
\begin{align*} [\operatorname{prox}_f(x)]_i &= \operatorname{sign}(x_i)\max(|x_i|-\lambda, 0) \\ &= \operatorname{sign}(x_i)(|x_i|-\lambda)_+ \end{align*}
where $(\cdot)_+$ denotes the positive part.
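The closed form above translates directly into a few lines of NumPy; here is a minimal sketch (the function name `soft_threshold` is mine):

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * ||.||_1, applied componentwise:
    sign(x) * max(|x| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([-3.0, -0.5, 0.0, 0.2, 2.0])
print(soft_threshold(x, 1.0))
# entries with |x_i| <= 1 are zeroed; the rest shrink toward 0 by 1
```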
Suppose $\hat{x} \in \mathbb R$ and we want to find a minimizer of $f(x) = |x| + \frac{1}{2t}(x - \hat{x})^2$. We want to make $|x|$ small without straying too far from $\hat{x}$.
Consider the case where $\hat{x} > 0$.
In this case there's no reason to let $x$ be negative: if $x$ is negative, then $-x$ is closer to $\hat{x}$ and has the same absolute value as $x$.
So we need to minimize the quadratic function $g(x) = x + \frac{1}{2t} (x - \hat{x})^2$ over $[0,\infty)$. The graph of $g$ is an upward-opening parabola with vertex at $x = \hat{x} - t$, so the minimizer over $[0,\infty)$ is $\max(\hat{x} - t, 0)$.
The case where $\hat{x} < 0$ is similar.
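This case analysis can be sketched and cross-checked numerically (the helper name `prox_abs` and the values $\hat{x} = 1.7$, $t = 0.5$ are mine):

```python
import numpy as np

def prox_abs(x_hat, t):
    """Minimizer of |x| + (x - x_hat)^2 / (2t), by the case analysis above."""
    if x_hat > 0:
        # minimize g(x) = x + (x - x_hat)^2/(2t) over [0, inf):
        # the parabola's vertex is at x_hat - t, clipped at 0
        return max(x_hat - t, 0.0)
    if x_hat < 0:
        # symmetric case: vertex at x_hat + t, clipped at 0 from above
        return min(x_hat + t, 0.0)
    return 0.0

# cross-check against a brute-force grid search
x_hat, t = 1.7, 0.5
xs = np.linspace(-5.0, 5.0, 200001)
brute = xs[np.argmin(np.abs(xs) + (xs - x_hat) ** 2 / (2 * t))]
print(prox_abs(x_hat, t), brute)  # both close to 1.2
```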
I wonder about something.
One could write the Soft Thresholding as:
$$ x - \lambda \operatorname{sgn} \left( x \right) \min \left( \left| \frac{x}{\lambda} \right|, 1 \right) $$
This is of the form $ x - \lambda g \left( \frac{x}{\lambda} \right) $.
Using the trick done in my answer to Closed Form Solution of $ \arg \min_{x} {\left\| x - y \right\|}_{2}^{2} + \lambda {\left\| x \right\|}_{2} $ - Tikhonov Regularized Least Squares - it might suggest that $ \operatorname{sgn} \left( x \right) \min \left( \left| x \right|, 1 \right) $ is the projection onto the $ {L}_{1} $ unit ball (which doesn't have a closed-form solution).
So what's wrong here?
Remark
As noted below, the formula actually gives the projection onto the dual ball (the $ {L}_{\infty} $ unit ball), not the $ {L}_{1} $ ball.
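This resolution can be checked numerically via the Moreau decomposition, $\operatorname{prox}_{\lambda \|\cdot\|_1}(x) = x - \Pi_{\lambda B_\infty}(x)$: componentwise, $\lambda \operatorname{sgn}(x)\min(|x/\lambda|,1)$ is just clipping to $[-\lambda, \lambda]$, i.e. projection onto the (scaled) dual ball. A quick sketch (the test values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
lam = 0.7

# soft thresholding: prox of lam * ||.||_1
soft = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# x - lam*sgn(x)*min(|x/lam|, 1) is x minus componentwise clipping to
# [-lam, lam], i.e. x minus the projection onto the dual (l_inf) ball
via_dual = x - np.clip(x, -lam, lam)

print(np.allclose(soft, via_dual))  # True
```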