How do you differentiate a matrix equation with respect to a vector?
Emily Wong
I'm having lots of trouble piecing this together (from The Elements of Statistical Learning by Hastie, Tibshirani and Friedman): the text writes $$RSS(\beta) = (\mathbf y - \mathbf X\beta)^T(\mathbf y - \mathbf X\beta) \tag{2.4}$$ and then, "[d]ifferentiating w.r.t. $\beta$", arrives at the normal equations $$\mathbf X^T(\mathbf y - \mathbf X\beta) = 0. \tag{2.5}$$
I don't understand the step "[d]ifferentiating w.r.t. $\beta$": specifically, how do you calculate the derivative of an equation involving matrix products and transposes with respect to a vector? Is this standard matrix calculus?
2 Answers
$\newcommand{\mat}[1]{\mathbf{#1}}$ Yes: you can solve this using the standard tools of matrix calculus.
In particular, we can use the following rules, each of which you can confirm componentwise (a quick numerical sanity check follows the list):
- If matrix $\mat{A}$ does not depend on the entries in vector $\mat x$, then $\frac{\partial}{\partial \mat x}(\mat A\mat x)=\mat A$.
- If the matrices $\mat A$ and $\mat B$ may depend on the entries in vector $\mat x$, then $\frac{\partial }{\partial \mat x}(\mat A\mat B) = \frac{\partial \mat A}{\partial \mat x}\mat B + \mat A\frac{\partial \mat B}{\partial \mat x}$. This holds even in the special case that $\mat{A}$ and $\mat {B}$ are vectors.
- The derivative of a sum is the sum of the derivatives.
- The derivative of the transpose is the transpose of the derivative.
- The transpose of a sum is the sum of the transposes.
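To see the first rule concretely, here is a minimal numerical check via finite differences, assuming NumPy is available; the names (`A`, `x`, `jac`) are illustrative, not from the book.

```python
# Finite-difference check of the rule d(Ax)/dx = A, where A does not depend on x.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))  # fixed matrix, independent of x
x = rng.normal(size=3)
eps = 1e-6

# Build the Jacobian of f(x) = A @ x one column at a time via central differences.
jac = np.empty((4, 3))
for j in range(3):
    e = np.zeros(3)
    e[j] = eps
    jac[:, j] = (A @ (x + e) - A @ (x - e)) / (2 * eps)

assert np.allclose(jac, A)  # the Jacobian of Ax is A itself
```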
Hence, given that the matrix $\mat{X}$ and the vector $\mat{y}$ do not depend on $\beta$, we find the following results:
- $\frac{\partial }{\partial \beta}(\mat y - \mat X \beta ) = -\mat X$
- $\frac{\partial }{\partial \beta}(\mat y - \mat X \beta )^\top = -\mat X^\top$
- $$\begin{align*}\frac{\partial }{\partial \beta}(\mat y - \mat X \beta )^\top(\mat y - \mat X \beta) &= -\mat X^\top (\mat y - \mat X \beta) - (\mat y - \mat X \beta)^\top \mat X\\&=-\mat X^\top \mat y + \mat X^\top\mat X\beta - \mat y^\top \mat X + \beta^\top\mat X^\top \mat X \\ &= (-\mat X^\top \mat y + \mat X^\top\mat X\beta) + (-\mat X^\top \mat y + \mat X^\top\mat X\beta)^\top \\ &= 2(-\mat X^\top \mat y + \mat X^\top \mat X\beta) &\{\text{writing the derivative as a column vector, we identify each term with its transpose}\}\\ &= -2\mat{X}^\top(\mat y - \mat X \beta)&\{\text{factor out }-\mat{X}^\top\} \\\end{align*}$$
And this last term is equal to zero if and only if
$$\mat X^\top (\mat y - \mat X \beta) = 0.$$
This system comprises the “normal equations”, which identify the value of $\beta$ at which the quadratic $(\mat y - \mat X \beta)^\top(\mat y - \mat X \beta)$ has its critical point.
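As a sanity check, here is a short sketch (assuming NumPy; all names are illustrative) that verifies the derived gradient against finite differences and confirms that solving the normal equations makes the residual orthogonal to the columns of $\mat X$.

```python
# Check 1: the gradient of RSS(b) = (y - X b)^T (y - X b) matches -2 X^T (y - X b).
# Check 2: solving the normal equations X^T X b = X^T y zeroes X^T (y - X b).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))
y = rng.normal(size=10)
beta = rng.normal(size=3)

rss = lambda b: (y - X @ b) @ (y - X @ b)

eps = 1e-6
grad_fd = np.array([
    (rss(beta + eps * np.eye(3)[j]) - rss(beta - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
])
assert np.allclose(grad_fd, -2 * X.T @ (y - X @ beta), atol=1e-4)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # assumes X^T X is invertible
assert np.allclose(X.T @ (y - X @ beta_hat), 0)  # the normal equations hold
```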
Regardless of how you define differentiation w.r.t. $\beta$, the expected rules still apply.
\begin{align} RSS(\beta) &= (\mathbf y- \mathbf X\beta)^T (\mathbf y- \mathbf X\beta) \\ RSS'(\beta) &= (\mathbf 0 - \mathbf X)^T (\mathbf y- \mathbf X\beta) + (\mathbf y- \mathbf X\beta)^T (\mathbf 0 - \mathbf X) \\ &= -\mathbf X^T (\mathbf y- \mathbf X\beta) - (\mathbf y- \mathbf X\beta)^T \mathbf X \\ \end{align}
Notice that $(\mathbf y- \mathbf X\beta)^T \mathbf X$ is a row vector: it is the transpose of the column vector $\mathbf X^T (\mathbf y- \mathbf X\beta)$. Writing the derivative as a column vector (the usual gradient convention, made precise at the end of this answer), we identify the two. Hence
\begin{align} (\mathbf y- \mathbf X\beta)^T \mathbf X &= \left((\mathbf y- \mathbf X\beta)^T \mathbf X \right)^T \\ &= \mathbf X^T (\mathbf y- \mathbf X\beta) \end{align}
In which case $$RSS'(\beta) = -2\mathbf X^T (\mathbf y- \mathbf X\beta)$$
and equation $(2.5)$ follows.
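One way to see the formula in action (a toy sketch, assuming NumPy; the step size, iteration count, and names are choices of this example, not part of the answer): follow the negative of $RSS'(\beta)$ and check that the iterates reach the least-squares solution.

```python
# Gradient descent on RSS using the derivative RSS'(beta) = -2 X^T (y - X beta).
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
y = rng.normal(size=30)

beta = np.zeros(3)
lr = 0.01  # hand-picked step size, small enough for this problem
for _ in range(5000):
    grad = -2 * X.T @ (y - X @ beta)
    beta -= lr * grad  # move against the gradient

beta_star, *_ = np.linalg.lstsq(X, y, rcond=None)  # reference least-squares fit
assert np.allclose(beta, beta_star, atol=1e-6)
```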
For completeness, here is the definition of the derivative being used. Let $F : \mathbb R^p \to \mathbb R$ and let $\mathbf u$ be a unit vector in $\mathbb R^p$. Then we can define the directional derivative $$F_{\mathbf u}(\beta) = \lim_{\delta \to 0} \frac{1}{\delta}(F(\beta + \delta \mathbf u) - F(\beta))$$
If there is a vector $\mathbf g$ such that $F_{\mathbf u}(\beta) = \mathbf g^T \mathbf u$ for every unit vector $\mathbf u$, then $\mathbf g$ is the gradient of $F$ at $\beta$ and we write $\dfrac{d}{d\beta}F(\beta) = \mathbf g$.
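A quick numerical illustration of this definition (assuming NumPy; names are illustrative): for the RSS above, the directional derivative along any unit $\mathbf u$ matches $\mathbf g^T \mathbf u$ with $\mathbf g = -2\mathbf X^T(\mathbf y - \mathbf X\beta)$.

```python
# Directional derivatives of F(beta) = (y - X beta)^T (y - X beta) along random
# unit vectors u, compared with g . u for the gradient g = -2 X^T (y - X beta).
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
beta = rng.normal(size=3)

F = lambda b: (y - X @ b) @ (y - X @ b)
g = -2 * X.T @ (y - X @ beta)

delta = 1e-6
for _ in range(5):
    u = rng.normal(size=3)
    u /= np.linalg.norm(u)  # unit direction
    F_u = (F(beta + delta * u) - F(beta)) / delta  # the limit definition, approximately
    assert np.isclose(F_u, g @ u, atol=1e-3)
```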