How do you differentiate a matrix equation with respect to a vector?
Emily Wong
I'm having lots of trouble piecing this together (from The Elements of Statistical Learning by Hastie, Tibshirani and Friedman): the text writes $$RSS(\beta) = (\mathbf y - \mathbf X\beta)^T(\mathbf y - \mathbf X\beta) \tag{2.4}$$ and then, "[d]ifferentiating w.r.t. $\beta$", arrives at the normal equations $$\mathbf X^T(\mathbf y - \mathbf X\beta) = 0. \tag{2.5}$$
I don't understand the step "[d]ifferentiating w.r.t. $\beta$": specifically, how do you calculate the derivative of an equation involving matrix products and transposes with respect to a vector? Is this standard matrix calculus?
2 Answers
$\newcommand{\mat}[1]{\mathbf{#1}}$ Yes: you can solve this using the standard tools of matrix calculus.
In particular, we can use the following rules, each of which you can confirm componentwise (a quick numerical sanity check follows the list):
- If matrix $\mat{A}$ does not depend on the entries in vector $\mat x$, then $\frac{\partial}{\partial \mat x}(\mat A\mat x)=\mat A$.
- If the matrices $\mat A$ and $\mat B$ may depend on the entries in vector $\mat x$, then $\frac{\partial }{\partial \mat x}(\mat A\mat B) = \frac{\partial \mat A}{\partial \mat x}\mat B + \mat A\frac{\partial \mat B}{\partial \mat x}$. This holds even in the special case that $\mat{A}$ and $\mat {B}$ are vectors.
- The derivative of a sum is the sum of the derivatives.
- The derivative of the transpose is the transpose of the derivative.
- The transpose of a sum is the sum of the transposes.
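To see the first rule concretely, here is a minimal numerical check via finite differences, assuming NumPy is available; the names (`A`, `x`, `jac`) are illustrative, not from the book.

```python
# Finite-difference check of the rule d(Ax)/dx = A, where A does not depend on x.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))  # fixed matrix, independent of x
x = rng.normal(size=3)
eps = 1e-6

# Build the Jacobian of f(x) = A @ x one column at a time via central differences.
jac = np.empty((4, 3))
for j in range(3):
    e = np.zeros(3)
    e[j] = eps
    jac[:, j] = (A @ (x + e) - A @ (x - e)) / (2 * eps)

assert np.allclose(jac, A)  # the Jacobian of Ax is A itself
```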
Hence, given that the matrix $\mat{X}$ and the vector $\mat{y}$ do not depend on $\beta$, we find the following results:
- $\frac{\partial }{\partial \beta}(\mat y - \mat X \beta ) = -\mat X$
- $\frac{\partial }{\partial \beta}(\mat y - \mat X \beta )^\top = -\mat X^\top$
- $$\begin{align*}\frac{\partial }{\partial \beta}(\mat y - \mat X \beta )^\top(\mat y - \mat X \beta) &= -\mat X^\top (\mat y - \mat X \beta) - (\mat y - \mat X \beta)^\top \mat X\\&=-\mat X^\top \mat y + \mat X^\top\mat X\beta - \mat y^\top \mat X + \beta^\top\mat X^\top \mat X \\ &= (-\mat X^\top \mat y + \mat X^\top\mat X\beta) + (-\mat X^\top \mat y + \mat X^\top\mat X\beta)^\top \\ &= 2(-\mat X^\top \mat y + \mat X^\top \mat X\beta) &\{\text{writing the derivative as a column vector, we identify each term with its transpose}\}\\ &= -2\mat{X}^\top(\mat y - \mat X \beta)&\{\text{factor out }-\mat{X}^\top\} \\\end{align*}$$
And this last term is equal to zero if and only if
$$\mat X^\top (\mat y - \mat X \beta) = 0.$$
This system comprises the “normal equations”, which identify the value of $\beta$ at which the quadratic $(\mat y - \mat X \beta)^\top(\mat y - \mat X \beta)$ has its critical point.
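As a sanity check, here is a short sketch (assuming NumPy; all names are illustrative) that verifies the derived gradient against finite differences and confirms that solving the normal equations makes the residual orthogonal to the columns of $\mat X$.

```python
# Check 1: the gradient of RSS(b) = (y - X b)^T (y - X b) matches -2 X^T (y - X b).
# Check 2: solving the normal equations X^T X b = X^T y zeroes X^T (y - X b).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))
y = rng.normal(size=10)
beta = rng.normal(size=3)

rss = lambda b: (y - X @ b) @ (y - X @ b)

eps = 1e-6
grad_fd = np.array([
    (rss(beta + eps * np.eye(3)[j]) - rss(beta - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
])
assert np.allclose(grad_fd, -2 * X.T @ (y - X @ beta), atol=1e-4)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # assumes X^T X is invertible
assert np.allclose(X.T @ (y - X @ beta_hat), 0)  # the normal equations hold
```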
Regardless of how you define differentiation w.r.t. $\beta$, the expected rules still apply.
\begin{align} RSS(\beta) &= (\mathbf y- \mathbf X\beta)^T (\mathbf y- \mathbf X\beta) \\ RSS'(\beta) &= (\mathbf 0 - \mathbf X)^T (\mathbf y- \mathbf X\beta) + (\mathbf y- \mathbf X\beta)^T (\mathbf 0 - \mathbf X) \\ &= -\mathbf X^T (\mathbf y- \mathbf X\beta) - (\mathbf y- \mathbf X\beta)^T \mathbf X \\ \end{align}
Notice that $(\mathbf y- \mathbf X\beta)^T \mathbf X$ is a row vector: it is the transpose of the column vector $\mathbf X^T (\mathbf y- \mathbf X\beta)$. Writing the derivative as a column vector (the usual gradient convention, made precise at the end of this answer), we identify the two. Hence
\begin{align} (\mathbf y- \mathbf X\beta)^T \mathbf X &= \left((\mathbf y- \mathbf X\beta)^T \mathbf X \right)^T \\ &= \mathbf X^T (\mathbf y- \mathbf X\beta) \end{align}
In which case $$RSS'(\beta) = -2\mathbf X^T (\mathbf y- \mathbf X\beta)$$
and equation $(2.5)$ follows.
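One way to see the formula in action (a toy sketch, assuming NumPy; the step size, iteration count, and names are choices of this example, not part of the answer): follow the negative of $RSS'(\beta)$ and check that the iterates reach the least-squares solution.

```python
# Gradient descent on RSS using the derivative RSS'(beta) = -2 X^T (y - X beta).
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
y = rng.normal(size=30)

beta = np.zeros(3)
lr = 0.01  # hand-picked step size, small enough for this problem
for _ in range(5000):
    grad = -2 * X.T @ (y - X @ beta)
    beta -= lr * grad  # move against the gradient

beta_star, *_ = np.linalg.lstsq(X, y, rcond=None)  # reference least-squares fit
assert np.allclose(beta, beta_star, atol=1e-6)
```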
For completeness, here is the definition of the derivative being used. Let $F : \mathbb R^p \to \mathbb R$ and let $\mathbf u$ be a unit vector in $\mathbb R^p$. Then we can define the directional derivative $$F_{\mathbf u}(\beta) = \lim_{\delta \to 0} \frac{1}{\delta}(F(\beta + \delta \mathbf u) - F(\beta))$$
If there is a vector $\mathbf g$ such that $F_{\mathbf u}(\beta) = \mathbf g^T \mathbf u$ for every unit vector $\mathbf u$, then $\mathbf g$ is the gradient of $F$ at $\beta$ and we write $\dfrac{d}{d\beta}F(\beta) = \mathbf g$.
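A quick numerical illustration of this definition (assuming NumPy; names are illustrative): for the RSS above, the directional derivative along any unit $\mathbf u$ matches $\mathbf g^T \mathbf u$ with $\mathbf g = -2\mathbf X^T(\mathbf y - \mathbf X\beta)$.

```python
# Directional derivatives of F(beta) = (y - X beta)^T (y - X beta) along random
# unit vectors u, compared with g . u for the gradient g = -2 X^T (y - X beta).
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
beta = rng.normal(size=3)

F = lambda b: (y - X @ b) @ (y - X @ b)
g = -2 * X.T @ (y - X @ beta)

delta = 1e-6
for _ in range(5):
    u = rng.normal(size=3)
    u /= np.linalg.norm(u)  # unit direction
    F_u = (F(beta + delta * u) - F(beta)) / delta  # the limit definition, approximately
    assert np.isclose(F_u, g @ u, atol=1e-3)
```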