Derivation of Soft Thresholding Operator / Proximal Operator of $ {L}_{1} $ Norm
Olivia Zamora
I was going through the derivation of soft thresholding at .
It says the three unique solutions for
$\operatorname{arg min}_x \|x-b\|_2^2 + \lambda\|x\|_1$ are given by
$\operatorname{arg min} \|x-b\|^2 + \lambda x$ assuming $x > 0$
$\operatorname{arg min} \|x-b\|^2 - \lambda x$ assuming $x < 0$
$\operatorname{arg min} \|x-b\|^2 = 0 $ assuming $x = 0$
I am confused by the third case. For $x = 0$, how can the solution be zero? I mean, in that case I will have $\operatorname{arg min} \|x-b\|^2$; if I solve this, I get the solution $x = b$, so how come it is $x = 0$?
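For concreteness, here is a quick numerical check of the third case (the values $b = 0.3$ and $\lambda = 1$ are made up): even though $\|x-b\|^2$ alone is minimized at $x = b$, the penalized objective is minimized at $x = 0$.

```python
import numpy as np

# Made-up values: with f(x) = (x - b)^2 + lam*|x|, the minimizer is 0
# whenever |b| <= lam/2, even though (x - b)^2 alone is minimized at x = b.
b, lam = 0.3, 1.0

xs = np.linspace(-2.0, 2.0, 40001)       # fine grid that includes x = 0
f = (xs - b) ** 2 + lam * np.abs(xs)     # the penalized objective
x_star = xs[np.argmin(f)]

print(x_star)  # ~0.0: the l1 penalty pulls the minimizer to zero
```

Here $f(0) = b^2 = 0.09$ beats $f(b) = \lambda|b| = 0.3$, so $x = b$ is no longer optimal once the penalty is included.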
13 Answers
I wrote a more detailed derivation of the soft-thresholding operator, following the source you mention and others. I hope it helps.
Soft thresholding is just the proximal mapping of the $l_1$-norm. Let $f(x) = \lambda\|x\|_1$; then the proximal mapping of $f$ is defined as
\begin{equation} \operatorname{prox}_f(x) = \operatorname{argmin}_z\{\frac{1}{2}\|x-z\|^2_2 + \lambda\|z\|_1 \} \end{equation}
The optimality condition for the previous problem is
\begin{equation} 0 \in \nabla(\frac{1}{2}\|x-z\|^2_2) + \partial(\lambda\|z\|_1) \Leftrightarrow 0 \in z-x + \lambda\partial\|z\|_1 \end{equation}
The $l_1$-norm is separable, so we can consider each of its components separately. Consider first the case $z_i \neq 0$. There the subdifferential reduces to a single element, $\partial |z_i| = \{\operatorname{sign}(z_i)\}$, and the optimum $z_i^*$ is obtained from
\begin{equation} 0 = z_i-x_i + \lambda \operatorname{sign}(z_i) \Leftrightarrow z_i^* = x_i - \lambda \operatorname{sign}(z_i^*) \end{equation}
Note also that if $z_i^* < 0$, then $x_i < -\lambda$ and equivalently if $z_i^* > 0 \Rightarrow x_i > \lambda$. Thus, $|x_i| > \lambda$ and $\operatorname{sign}(z_i^*) = \operatorname{sign}(x_i)$. Substituting in the previous equation we get
\begin{equation} z_i^* = x_i - \lambda \operatorname{sign}(x_i) \end{equation}
In the case where $z_i = 0$, the subdifferential of $|z_i|$ is the whole interval $[-1,1]$ and the optimality condition is
\begin{equation} 0 \in -x_i + \lambda[-1,1] \Leftrightarrow x_i \in [-\lambda,\lambda] \Leftrightarrow |x_i| \leq \lambda \end{equation}
Putting it all together we get
\begin{equation} [\operatorname{prox}_f(x)]_i = z_i^* = \left\{ \begin{array}{lr} 0 & \text{if } |x_i| \leq \lambda \\ x_i - \lambda \operatorname{sign}(x_i) & \text{if } |x_i| > \lambda \end{array}\right. \end{equation}
The previous equation can also be written as
\begin{align*} [\operatorname{prox}_f(x)]_i &= \operatorname{sign}(x_i)\max(|x_i|-\lambda, 0) \\ &= \operatorname{sign}(x_i)(|x_i|-\lambda)_+ \end{align*}
where $(\cdot)_+$ denotes the positive part.
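The closed form above translates directly into a few lines of NumPy; here is a minimal sketch (the function name `soft_threshold` is mine):

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * ||.||_1, applied componentwise:
    sign(x) * max(|x| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([-3.0, -0.5, 0.0, 0.2, 2.0])
print(soft_threshold(x, 1.0))
# entries with |x_i| <= 1 are zeroed; the rest shrink toward 0 by 1
```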
Suppose $\hat{x} \in \mathbb R$ and we want to find a minimizer of $f(x) = |x| + \frac{1}{2t}(x - \hat{x})^2$. We want to make $|x|$ small without straying too far from $\hat{x}$.
Consider the case where $\hat{x} > 0$.
In this case there's no reason to let $x$ be negative: if $x$ is negative, then $-x$ is closer to $\hat{x}$ and has the same absolute value as $x$.
So we need to minimize the quadratic function $g(x) = x + \frac{1}{2t} (x - \hat{x})^2$ over $[0,\infty)$. The graph of $g$ is an upward-opening parabola with vertex at $x = \hat{x} - t$, so the minimizer over $[0,\infty)$ is $\max(\hat{x} - t, 0)$.
The case where $\hat{x} < 0$ is similar.
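This case analysis can be sketched and cross-checked numerically (the helper name `prox_abs` and the values $\hat{x} = 1.7$, $t = 0.5$ are mine):

```python
import numpy as np

def prox_abs(x_hat, t):
    """Minimizer of |x| + (x - x_hat)^2 / (2t), by the case analysis above."""
    if x_hat > 0:
        # minimize g(x) = x + (x - x_hat)^2/(2t) over [0, inf):
        # the parabola's vertex is at x_hat - t, clipped at 0
        return max(x_hat - t, 0.0)
    if x_hat < 0:
        # symmetric case: vertex at x_hat + t, clipped at 0 from above
        return min(x_hat + t, 0.0)
    return 0.0

# cross-check against a brute-force grid search
x_hat, t = 1.7, 0.5
xs = np.linspace(-5.0, 5.0, 200001)
brute = xs[np.argmin(np.abs(xs) + (xs - x_hat) ** 2 / (2 * t))]
print(prox_abs(x_hat, t), brute)  # both close to 1.2
```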
I wonder about something.
One could write the Soft Thresholding as:
$$ x - \lambda \operatorname{sgn} \left( x \right) \min \left( \left| \frac{x}{\lambda} \right|, 1 \right) $$
This is of the form $ x - \lambda g \left( \frac{x}{\lambda} \right) $.
Using the trick done in my answer to Closed Form Solution of $ \arg \min_{x} {\left\| x - y \right\|}_{2}^{2} + \lambda {\left\| x \right\|}_{2} $ - Tikhonov Regularized Least Squares - it might suggest that $ \operatorname{sgn} \left( x \right) \min \left( \left| x \right|, 1 \right) $ is the projection onto the $ {L}_{1} $ unit ball (which doesn't have a closed-form solution).
So what's wrong here?
Remark
As noted below, the formula actually gives the projection onto the dual ball (the $ {L}_{\infty} $ unit ball), not the $ {L}_{1} $ ball.
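This resolution can be checked numerically via the Moreau decomposition, $\operatorname{prox}_{\lambda \|\cdot\|_1}(x) = x - \Pi_{\lambda B_\infty}(x)$: componentwise, $\lambda \operatorname{sgn}(x)\min(|x/\lambda|,1)$ is just clipping to $[-\lambda, \lambda]$, i.e. projection onto the (scaled) dual ball. A quick sketch (the test values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
lam = 0.7

# soft thresholding: prox of lam * ||.||_1
soft = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# x - lam*sgn(x)*min(|x/lam|, 1) is x minus componentwise clipping to
# [-lam, lam], i.e. x minus the projection onto the dual (l_inf) ball
via_dual = x - np.clip(x, -lam, lam)

print(np.allclose(soft, via_dual))  # True
```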