Velvet Star Monitor

Standout celebrity highlights with iconic style.

general

Proof that Bellman update is a contraction

Writer Mia Lopez
$\begingroup$

Following the proof that Bellman update is a contraction from Instructor's resources to Artificial Intelligence: A Modern Approach I fail to understand 1 step. Some definitions and proofs that were used in the final step:


definition of max norm: $||U||=\max_s|U(s)|$

definition of Bellman update: $U(s) = R(s) + \gamma\max_{a\in A(s)}\sum_{s'}P(s'|s,a)U(s')$

$U_i$ - vector of utilities for all states at the $i$th iteration.

statement: $|max_a f(a) - max_a g(a)| \leq max_a|f(a)-g(a)|$


$|(BU_i-BU'_i)(s)| = |R(s) + \gamma\underset{a\in A(s)}{\max}|\sum_{s'}P(s'|s,a)U_i(s') - R(s) \\- \gamma\max_{a\in A(s)}\sum_{s'}P(s'|s,a)U'_i(s')| \\= \gamma|\max_{a\in A(s)}\sum_{s'}P(s'|s,a)U_i(s')-\max_{a\in A(s)}\sum_{s'}P(s'|s,a)U'_i(s')| \\ \leq \gamma \max_{a\in A(s)}|\sum_{s'}P(s'|s,a)U_i(s')-\sum_{s'}P(s'|s,a)U'_i(s')| \\ = \gamma|\sum_{s'}P(s'|s,a^*(s))U_i(s')-\sum_{s'}P(s'|s,a^*(s))U'_i(s')|\\ = \gamma|\sum_{s'}P(s'|s,a^*(s))(U_i(s')-U'_i(s')|$


then the final part of proof where we use the above: $||BU_i-BU'_i|| = \max_s|(BU_i-BU'_i)(s)| \\ \leq \gamma\max_s|\sum_{s'}P(s'|s,a^*(s))(U_i(s')-U'_i(s'))| \\ \leq \gamma\max_s|U_i(s)-U'_i(s)|\\ =\gamma||U_i-U'_i||$

How do I get the inequality $\gamma\max_s|\sum_{s'}P(s'|s,a^*(s))(U_i(s')-U'_i(s'))| \leq \gamma\max_s|U_i(s)-U'_i(s)|$?

$\endgroup$

1 Answer

$\begingroup$

It's a bit of an old question, but thought I'd answer it, anyways.

Note that (a) $$ P(s'|s,a^*(s)) \ge 0, \sum_{s'} P(s'|s,a^*(s)) = 1 $$ and (b), that $$ U_i(s') - U_i'(s') \le \max_s | U_i(s) - U_i'(s)| $$ using both of these statements yields $$ \begin{align} \sum_{s'} \left(P(s'|s,a^*(s))(U_i(s') - U_i'(s'))\right) &\le \sum_{s'}\left( P(s'|s,a^*(s))\max_s | U_i(s) - U_i'(s)|\right)\\ &=\left( \sum_{s'}P(s'|s,a^*(s))\right)\max_s | U_i(s) - U_i'(s)|\\ &=\max_s | U_i(s) - U_i'(s)| \end{align} $$ for all $s$, from which the final result follows.

$\endgroup$

Your Answer

Sign up or log in

Sign up using Google Sign up using Facebook Sign up using Email and Password

Post as a guest

By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy