2nov2024
Commit eea09ec9b8 (parent be7844b4f3): 15 changed files with 35749 additions and 63 deletions
@@ -57,7 +57,7 @@ This is a Markov Process but we also have a reward function! We also have a discount factor.
Value function
- The value function v(s) gives the long-term value of (being in) state s
- The state value function v(s) of an MRP is the expected return starting from state s: $v(s) = E[G_t | S_t = s]$ (see the numerical sketch after the images below)
![[Pasted image 20241030103519.png]]
![[Pasted image 20241030103706.png]]
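A minimal numerical sketch of this definition, assuming a made-up 3-state MRP (the transition matrix, rewards, and discount below are illustrative only): the Bellman equation for an MRP in matrix form, $v = R + \gamma P v$, can be solved directly for the exact state values.

```python
import numpy as np

# Hypothetical 3-state MRP: P[s, s'] are transition probabilities,
# R[s] is the expected immediate reward in state s (all values made up).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.0, 1.0]])   # state 2 is absorbing
R = np.array([1.0, -2.0, 0.0])
gamma = 0.9                        # discount factor

# v = R + gamma * P v  =>  (I - gamma * P) v = R, solved exactly below.
v = np.linalg.solve(np.eye(3) - gamma * P, R)
print(v)   # v[s] = E[G_t | S_t = s] for each state s
```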
@@ -111,8 +111,8 @@ The state-value function v𝜋(s) of an MDP is the expected return starting from state s, and then following policy 𝜋
The action-value function q𝜋(s,a) is the expected return starting from state s, taking action a, and then following policy 𝜋 $$q_{\pi}(s,a) = E_{\pi}[G_t | S_t = s, A_t = a]$$
![[Pasted image 20241030105022.png]]
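Reading that definition directly, q𝜋(s,a) can be estimated by averaging sampled returns: take a in s once, then act according to 𝜋. A rough Monte Carlo sketch, assuming a hypothetical `env.step(state, action)` sampler returning `(next_state, reward, done)` and a `policy(state)` function; neither name comes from the notes.

```python
def mc_q_estimate(env, policy, s, a, gamma=0.9, episodes=1000, max_steps=100):
    """Monte Carlo estimate of q_pi(s, a) = E_pi[G_t | S_t = s, A_t = a]:
    start in s, take a, then follow policy pi and average the discounted returns."""
    total = 0.0
    for _ in range(episodes):
        state, action = s, a
        ret, discount = 0.0, 1.0
        for _ in range(max_steps):
            state, reward, done = env.step(state, action)  # hypothetical sampler
            ret += discount * reward
            discount *= gamma
            if done:
                break
            action = policy(state)  # all actions after the first come from pi
        total += ret
    return total / episodes
```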
- The state-value function can again be decomposed into immediate reward plus discounted value of successor state $$v_{\pi}(s) = E_{\pi}[R_{t+1} + \gamma v_{\pi}(S_{t+1}) | S_t = s]$$
- The action-value function can similarly be decomposed $$q_{\pi}(s, a) = E_{\pi}[R_{t+1} + \gamma q_{\pi}(S_{t+1}, A_{t+1}) | S_t = s, A_t = a]$$ (see the policy-evaluation sketch after the images below)
![[Pasted image 20241030105148.png]]![[Pasted image 20241030105207.png]]
![[Pasted image 20241030105216.png]]
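These one-step lookaheads are what iterative policy evaluation repeats until the values stop changing. A small tabular sketch, assuming arrays `P` of shape (S, A, S') for transition probabilities, `R` of shape (S, A) for expected rewards, and a stochastic policy `pi` of shape (S, A); these names and shapes are assumptions, not from the notes.

```python
import numpy as np

def policy_evaluation(P, R, pi, gamma=0.9, tol=1e-8):
    """Apply the two Bellman expectation backups until convergence:
      q_pi(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * v_pi(s')
      v_pi(s)   = sum_a pi(a|s) * q_pi(s,a)
    """
    v = np.zeros(P.shape[0])
    while True:
        q = R + gamma * (P @ v)        # one-step lookahead for every (s, a)
        v_new = (pi * q).sum(axis=1)   # average over actions chosen by pi
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q
        v = v_new
```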
Putting it all together:
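Presumably the combination here is substituting one one-step lookahead into the other. Written out with the usual MDP notation ($\mathcal{P}^a_{ss'}$ for transition probabilities and $\mathcal{R}^a_s$ for expected reward, an assumption about the slides' notation), the Bellman expectation equations become
$$v_{\pi}(s) = \sum_{a} \pi(a|s) \Big( \mathcal{R}^a_s + \gamma \sum_{s'} \mathcal{P}^a_{ss'} \, v_{\pi}(s') \Big)$$
$$q_{\pi}(s,a) = \mathcal{R}^a_s + \gamma \sum_{s'} \mathcal{P}^a_{ss'} \sum_{a'} \pi(a'|s') \, q_{\pi}(s',a')$$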