
Why is log-of-sum-of-exponentials $f(x)=\log\left(\sum_{i=1}^n e^{x_i}\right)$ a convex function for $x \in \mathbb R^n$?

Writer Emily Wong

How to prove $f(x)=\log\left(\displaystyle\sum_{i=1}^n e^{x_i}\right)$ is a convex function?

EDIT1: For the above function $f(x)$, the following inequalities hold:

$$\max\{x_1,x_2,\ldots,x_n\}\leqslant f(x)\leqslant\max\{x_1,x_2,\ldots,x_n\}+\log n$$
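These bounds are easy to check numerically. A minimal sketch, assuming NumPy is available (the helper name `logsumexp` is mine, not from the question):

```python
import numpy as np

def logsumexp(x):
    # Numerically stable log-sum-exp: shift by max(x) before exponentiating.
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

rng = np.random.default_rng(0)
x = rng.normal(size=5) * 10.0
f_val = logsumexp(x)
lower = np.max(x)
upper = np.max(x) + np.log(len(x))
# The sandwich bounds: max(x) <= f(x) <= max(x) + log n
assert lower <= f_val <= upper + 1e-12
```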

and I have tried to prove its convexity from the definition of a convex function using the above inequalities, but that approach didn't work.

EDIT2: I have posted my answers below.


6 Answers


Proof:

Let $u_i=e^{x_i}$ and $v_i=e^{y_i}$. Then $$f(\theta x+(1-\theta)y)=\log\left(\sum_{i=1}^n e^{\theta x_i + (1-\theta)y_i}\right)=\log\left(\sum_{i=1}^n u_i^{\theta} v_i^{1-\theta}\right).$$

From Hölder's inequality:

$$\sum_{i=1}^n x_i y_i \le \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}} \cdot \left(\sum_{i=1}^n |y_i|^q\right)^{\frac{1}{q}},$$ where $1/p+1/q=1$.

Applying this inequality to $f(\theta x+(1-\theta)y)$: $$\log\left(\sum_{i=1}^n u_i^{\theta} v_i^{1-\theta}\right) \le \log\left[\left(\sum_{i=1}^n u_i^{\theta \cdot \frac{1}{\theta}}\right)^{\theta} \cdot \left(\sum_{i=1}^n v_i^{(1-\theta) \cdot \frac{1}{1-\theta}}\right)^{1-\theta}\right].$$ The right-hand side reduces to:

$$\theta \log\sum_{i=1}^n u_i+(1-\theta)\log\sum_{i=1}^n v_i.$$

Here I take $\theta=\frac{1}{p}$ and $1-\theta=\frac{1}{q}$.

This shows that $f(\theta x+(1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$, so $f$ is convex.
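The convexity inequality just derived can be spot-checked numerically; a small sketch, assuming NumPy (the variable names are mine):

```python
import numpy as np

def f(x):
    # log-sum-exp, evaluated stably by shifting by the max
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

rng = np.random.default_rng(1)
x = rng.normal(size=4)
y = rng.normal(size=4)
theta = 0.3
lhs = f(theta * x + (1 - theta) * y)
rhs = theta * f(x) + (1 - theta) * f(y)
# Convexity: f(θx + (1-θ)y) <= θ f(x) + (1-θ) f(y)
assert lhs <= rhs + 1e-12
```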


Since $f$ is continuous, it is enough to show midpoint convexity: $$\frac{1}{2} \log \left(\sum \exp x_i\right) + \frac{1}{2}\log \left(\sum \exp y_i\right)\ge \log \left(\sum \exp\frac{x_i+y_i}{2}\right)$$ or, with the substitutions $\exp\frac{x_i}{2} = a_i$ and $\exp\frac{y_i}{2} = b_i$, $$\left(\sum a_i^2\right)^{\frac{1}{2}}\left(\sum b_i^2\right)^{\frac{1}{2}}\ge \sum a_i b_i,$$ which is the Cauchy–Schwarz inequality.


Another way to prove the convexity of this function is to verify the defining inequality directly: a function is convex if and only if, for every $t \in [0,1]$,

$$f(tX+(1-t)Y) \le t f(X) + (1-t)f(Y)$$

Now let $X = (X_1, X_2, \ldots, X_n)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$, and take $t = \dfrac{1}{2}$.

$$f(tX+(1-t)Y) = \log\left(\sum_{i=1}^{n} e^{\frac{X_i+Y_i}{2}}\right)$$

$$\text{RHS} = \frac{1}{2} \log\left(\sum_{i = 1}^{n} e^{X_i}\right)+ \frac{1}{2} \log\left(\sum_{i = 1}^{n} e^{Y_i}\right)$$

$$\text{RHS} = \frac{1}{2} \log\left(\sum_{i = 1}^{n} e^{X_i}\sum_{i = 1}^{n} e^{Y_i}\right)$$

Exponentiating both sides, the desired inequality LHS $\le$ RHS becomes $$\left(\sum_{i=1}^{n} e^{\frac{X_i+Y_i}{2}}\right)^2 \le \left(\sum_{i=1}^{n} e^{X_i}\right)\left(\sum_{i=1}^{n} e^{Y_i}\right),$$ which is the Cauchy–Schwarz inequality applied to the vectors with entries $e^{X_i/2}$ and $e^{Y_i/2}$. This establishes midpoint convexity; combined with the continuity of $f$, it follows that $f$ is convex.


This answer is similar to the answer written by @Nicholas, but I'm including more details.

A nice fact about the logSumExp function $f$ is that its gradient is the softmax function $S$:$$ \nabla f(x) = S(x) = \begin{bmatrix} \frac{e^{x_1}}{e^{x_1} + \cdots + e^{x_n}} \\ \vdots \\ \frac{e^{x_n}}{e^{x_1} + \cdots +e^{x_n}} \end{bmatrix}. $$The Hessian of $f$ is the matrix $S'(x)$, and a nice fact about the softmax function is that$$ S'(x) = \text{diag}(S(x)) - S(x) S(x)^T. $$If we can show that $S'(x)$ is positive semidefinite, it will follow that $f$ is convex.

In other words, we need to show that if $v \in \mathbb R^n$, then $v^T S'(x) v \geq 0$. But notice that\begin{align} & v^T S'(x) v \geq 0 \\ \iff & v^T \text{diag}(S(x)) v \geq v^T S(x) S(x)^T v \\ \iff & \sum_{i=1}^n \left( \frac{e^{x_i}}{e^{x_1} + \cdots + e^{x_n}}\right) v_i^2 \geq (S(x)^T v )^2 \\ \iff & \sum_{i=1}^n \left( \frac{e^{x_i}}{e^{x_1} + \cdots + e^{x_n}}\right) v_i^2 \geq \left( \sum_{i=1}^n v_i \cdot \frac{e^{x_i}}{e^{x_1} + \cdots + e^{x_n}} \right)^2 \\ \iff & \left(\sum_{i=1}^n e^{x_i} v_i^2 \right) \left(\sum_{i=1}^n e^{x_i}\right) \geq \left(\sum_{i=1}^n v_i e^{x_i} \right)^2 \end{align}This last inequality is true, as can be seen by applying the Cauchy-Schwarz inequality to the vectors$$ a = \begin{bmatrix} \sqrt{e^{x_1}} \\ \vdots \\ \sqrt{e^{x_n}} \end{bmatrix}, \quad b = \begin{bmatrix} v_1 \sqrt{e^{x_1}} \\ \vdots \\ v_n \sqrt{e^{x_n}} \end{bmatrix}. $$
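The positive semidefiniteness of the Hessian $\operatorname{diag}(S(x)) - S(x)S(x)^T$ can also be confirmed numerically; a minimal sketch assuming NumPy (the helper name `softmax` is mine):

```python
import numpy as np

def softmax(x):
    # Softmax, shifted by max(x) for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(2)
x = rng.normal(size=5)
S = softmax(x)
# Hessian of log-sum-exp at x: diag(S) - S S^T
H = np.diag(S) - np.outer(S, S)
eigs = np.linalg.eigvalsh(H)
# All eigenvalues are nonnegative, i.e. H is positive semidefinite.
assert np.all(eigs >= -1e-12)
```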


For a twice-differentiable multivariate function, convexity is equivalent to the Hessian matrix being positive semidefinite. That is, you can compute $\nabla^2 f(\mathbf{x})$ here and show it is positive semidefinite.

This can be proved using the Cauchy–Schwarz inequality, as shown here.


I have a preference for proving that the Hessian matrix is positive semidefinite, as fully developed by @littleO. Once you have shown that the gradient is the softmax function $S(x)$ and that the Hessian has the expression $$\nabla^2 f(x) = \operatorname{diag}(S(x)) - S(x)S(x)^T,$$ you can simply use the convexity of the real map $t\mapsto t^2$ to conclude that $\nabla^2 f(x)$ is positive semidefinite: since $S(x)^T v$ is a convex combination of the coordinates of the vector $v$ (that is, $S(x)_k\geq 0$ and $\sum_{k=1}^n S(x)_k = 1$), one has $$v^T S(x) S(x)^T v = (S(x)^T v)^2 = \left(\sum_{k=1}^n S(x)_k v_k\right)^2 \leq \sum_{k=1}^n S(x)_k v_k^2 = v^T \operatorname{diag}(S(x)) v.$$
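The Jensen step for $t \mapsto t^2$ with the softmax weights can be checked numerically in a few lines; a sketch assuming NumPy (the helper name `softmax` is mine):

```python
import numpy as np

def softmax(x):
    # Softmax, shifted by max(x) for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(3)
x = rng.normal(size=6)
v = rng.normal(size=6)
S = softmax(x)
# Jensen for t -> t^2 with weights S: (sum_k S_k v_k)^2 <= sum_k S_k v_k^2
lhs = (S @ v) ** 2
rhs = S @ (v ** 2)
assert lhs <= rhs + 1e-12
```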

