vault backup: 2024-10-31 13:26:43

This commit is contained in:
Marco Realacci 2024-10-31 13:26:43 +01:00
parent 4aaef01b22
commit 382617bf06
59 changed files with 490 additions and 148 deletions

View file

@ -8,17 +8,49 @@
"type": "tabs", "type": "tabs",
"children": [ "children": [
{ {
"id": "ba82a2242fc4a714", "id": "1716ddb2952202fc",
"type": "leaf", "type": "leaf",
"state": { "state": {
"type": "markdown", "type": "markdown",
"state": { "state": {
"file": "Autonomous Networking/notes/q&a.md", "file": "Biometric Systems/notes/6. Face recognition 2D.md",
"mode": "source", "mode": "source",
"source": false "source": false
}, },
"icon": "lucide-file", "icon": "lucide-file",
"title": "q&a" "title": "6. Face recognition 2D"
}
},
{
"id": "1226605da2216595",
"type": "leaf",
"state": {
"type": "pdf",
"state": {
"file": "Biometric Systems/slides/LEZIONE6_Face recognition2D.pdf",
"page": 55,
"left": -9,
"top": 494,
"zoom": 0.7
},
"icon": "lucide-file-text",
"title": "LEZIONE6_Face recognition2D"
}
},
{
"id": "f5709629447d8f2e",
"type": "leaf",
"state": {
"type": "pdf",
"state": {
"file": "Biometric Systems/slides/Biometric_System___Notes.pdf",
"page": 20,
"left": 127,
"top": 389,
"zoom": 1.3
},
"icon": "lucide-file-text",
"title": "Biometric_System___Notes"
} }
} }
], ],
@ -78,7 +110,8 @@
} }
], ],
"direction": "horizontal", "direction": "horizontal",
"width": 300 "width": 300,
"collapsed": true
}, },
"right": { "right": {
"id": "11560c155f3d8f6e", "id": "11560c155f3d8f6e",
@ -165,6 +198,7 @@
}, },
"left-ribbon": { "left-ribbon": {
"hiddenItems": { "hiddenItems": {
"smart-second-brain:Open S2B Chat": false,
"obsidian-ocr:Search OCR": false, "obsidian-ocr:Search OCR": false,
"switcher:Open quick switcher": false, "switcher:Open quick switcher": false,
"graph:Open graph view": false, "graph:Open graph view": false,
@ -178,54 +212,54 @@
"obsidian-git:Open Git source control": false "obsidian-git:Open Git source control": false
} }
}, },
"active": "b5d8a3515919e28a", "active": "1716ddb2952202fc",
"lastOpenFiles": [ "lastOpenFiles": [
"Pasted image 20241031104526.png",
"Biometric Systems/slides/LEZIONE6_Face recognition2D.pdf",
"Biometric Systems/notes/6. Face recognition 2D.md",
"Pasted image 20241031104206.png",
"Pasted image 20241031102640.png",
"Pasted image 20241031102321.png",
"Pasted image 20241031100207.png",
"Biometric Systems/slides/Biometric_System___Notes.pdf",
"Pasted image 20241031091853.png",
"Pasted image 20241031085606.png",
"Pasted image 20241031084659.png",
"Autonomous Networking/notes/9 Markov processes.md",
"Autonomous Networking/notes/7.1 K-Armed bandit problem.md",
"Autonomous Networking/notes/7.2 10 arm testbed - optimism in face of uncertainty.md",
"Autonomous Networking/notes/2 RFID.md",
"Autonomous Networking/slides/2 RFID.pdf",
"Autonomous Networking/slides/9markovprocess.pdf",
"Autonomous Networking/slides/AutonomousNet-Class11-2122-Performance_of_action_selection_methods_UCB.pdf",
"Autonomous Networking/slides/AutonomousNet-Class10-2122-Multiarmed_bandit.pdf",
"K-Armed bandit problem.md",
"Autonomous Networking/notes/7 RL.md",
"Autonomous Networking/notes/q&a.md", "Autonomous Networking/notes/q&a.md",
"Pasted image 20241030165705.png",
"Autonomous Networking/slides/7 RL1.pdf",
"Pasted image 20241030154413.png",
"Autonomous Networking/notes/6 Internet of Things.md",
"Autonomous Networking/notes/5 Drones.md",
"Pasted image 20241030144246.png",
"Biometric Systems/notes/4. Face detection.md",
"Autonomous Networking/slides/AutonomousNet-Class13-2122-Optimal_policy_and_Qlearning.pdf",
"Biometric Systems/slides/LEZIONE5_NEW_More about face localization.pdf",
"Foundation of data science/slides/Untitled.md",
"Autonomous Networking/notes/3 WSN MAC.md",
"conflict-files-obsidian-git.md", "conflict-files-obsidian-git.md",
"Foundation of data science/notes/3 Multi Class Binary Classification.md", "Foundation of data science/notes/3 Multi Class Binary Classification.md",
"Foundation of data science/notes/2 Logistic Regression.md", "Foundation of data science/notes/2 Logistic Regression.md",
"Foundation of data science/images/Pasted image 20241029130844.png",
"Foundation of data science/images/Pasted image 20241029125726.png",
"Foundation of data science/images/Pasted image 20241029123613.png",
"Foundation of data science/images/Pasted image 20241029122255.png",
"Chats/New Chat.md", "Chats/New Chat.md",
"Chats", "Chats",
"Autonomous Networking/notes/8.md",
"Autonomous Networking/notes/2 RFID.md",
"Autonomous Networking/notes/4 WSN Routing.md", "Autonomous Networking/notes/4 WSN Routing.md",
"Autonomous Networking/notes/5 Drones.md",
"Autonomous Networking/notes/6 Internet of Things.md",
"Foundation of data science/notes/Logistic Regression for C > 1.md", "Foundation of data science/notes/Logistic Regression for C > 1.md",
"Foundation of data science/notes/Logistic Regression.md", "Foundation of data science/notes/Logistic Regression.md",
"Foundation of data science/notes/1 CV Basics.md", "Foundation of data science/notes/1 CV Basics.md",
"Foundation of data science/images/Pasted image 20241025161824.png",
"Foundation of data science/images",
"Foundation of data science/images/Pasted image 20241025165411.png",
"Foundation of data science/images/Pasted image 20241025165317.png",
"Foundation of data science/images/Pasted image 20241025165130.png",
"Foundation of data science/images/Pasted image 20241025163314.png",
"Foundation of data science/images/Pasted image 20241025163040.png",
"Autonomous Networking/notes/3 WSN MAC.md",
"Autonomous Networking/notes/7 RL.md",
"Autonomous Networking/slides/AutonomousNet-Class11-2122-Performance_of_action_selection_methods_UCB.pdf",
"Biometric Systems/notes/4. Face recognition.md",
"Biometric Systems/slides/LEZIONE5_NEW_More about face localization.pdf",
"Autonomous Networking/slides/7 RL1.pdf",
"Autonomous Networking/slides/6 IoT.pdf",
"Biometric Systems/notes/2. Performance indexes.md", "Biometric Systems/notes/2. Performance indexes.md",
"Biometric Systems/slides/LEZIONE3_Affidabilita_del_riconoscimento.pdf",
"Biometric Systems/slides/LEZIONE2_Indici_di_prestazione.pdf",
"Biometric Systems/notes/3. Recognition Reliability.md", "Biometric Systems/notes/3. Recognition Reliability.md",
"Autonomous Networking/slides/5 Drones.pdf",
"Biometric Systems/slides/LEZIONE4_Face introduction and localization.pdf",
"Foundation of data science/slides/Untitled.md",
"Autonomous Networking/notes/4 WSN pt. 2.md", "Autonomous Networking/notes/4 WSN pt. 2.md",
"Biometric Systems/notes/1. Introduction.md", "Biometric Systems/notes/1. Introduction.md",
"Autonomous Networking/notes/3 WSN.md",
"BUCA/Queues.md",
"Biometric Systems/final notes/2. Performance indexes.md",
"().md",
"a.md",
"Untitled.canvas" "Untitled.canvas"
] ]
} }

Binary files not shown (30 new images, 10 KiB to 150 KiB).

View file

@ -32,7 +32,7 @@ RL is learning what to do, it presents two main characteristics:
- take actions that affects the state - take actions that affects the state
Difference from other ML Difference from other ML
- no supervisor - **no supervisor**
- feedback may be delayed - feedback may be delayed
- time matters - time matters
- agent action affects future decisions - agent action affects future decisions
@ -44,13 +44,6 @@ Learning online
- we expect agents to get things wrong, to refine their understanding as they go - we expect agents to get things wrong, to refine their understanding as they go
- the world is not static, agents continuously encounter new situations - the world is not static, agents continuously encounter new situations
RL applications:
- self driving cars
- engineering
- healthcare
- news recommendation
- ...
Rewards Rewards
- a reward is a scalar feedback signal (a number) - a reward is a scalar feedback signal (a number)
- reward Rt indicates how well the agent is doing at step t - reward Rt indicates how well the agent is doing at step t
@ -63,12 +56,12 @@ communication in battery free environments
- positive rewards if the queried device has new data - positive rewards if the queried device has new data
- else negative - else negative
Challenge: #### Challenges:
- tradeoff between exploration and exploitation - tradeoff between exploration and exploitation
- to obtain a lot of reward a RL agent must prefer action that it tried in the past - to obtain a lot of reward a RL agent must prefer action that it tried in the past
- but better actions may exist... So the agent has to exploit! - but better actions may exist... So the agent has to exploit!
exploration vs exploitation dilemma: ##### exploration vs exploitation dilemma:
- comes from incomplete information: we need to gather enough information to make best overall decisions while keeping the risk under control - comes from incomplete information: we need to gather enough information to make best overall decisions while keeping the risk under control
- exploitation: we take advantage of the best option we know - exploitation: we take advantage of the best option we know
- exploration: test new decisions - exploration: test new decisions
@ -108,6 +101,7 @@ one or more of these components
- used to evaluate the goodness/badness of states - used to evaluate the goodness/badness of states
- values are prediction of rewards - values are prediction of rewards
- $V_\pi(s) = \mathbb{E}_\pi[\gamma R_{t+1} + \gamma^2 R_{t+2} + \dots \mid S_t = s]$ - $V_\pi(s) = \mathbb{E}_\pi[\gamma R_{t+1} + \gamma^2 R_{t+2} + \dots \mid S_t = s]$
- better explained later
- **Model:** - **Model:**
- predicts what the environment will do next - predicts what the environment will do next
- may predict the resultant next state and/or the next reward - may predict the resultant next state and/or the next reward
@ -125,102 +119,3 @@ back to the original problem:
- negative if it has no data - negative if it has no data
- what to do if the device has lost data? - what to do if the device has lost data?
- state? - state?
### Exploration vs exploitation trade-off
- Rewards evaluate actions taken
- evaluative feedback depends on the action taken
- no active exploration
Let's consider a simplified version of an RL problem: K-armed bandit problem.
- K different options
- every time need to chose one
- maximize expected total reward over some time period
- analogy with slot machines
- the levers are the actions
- which level gives the highest reward?
- Formalization
- set of actions A (or "arms")
- reward function R that follows an unknown probability distributions
- only one state
- ...
Example: doctor treatment
- doctor has 3 treatments (actions), each of them has a reward.
- for the doctor to decide which action to take is best, we must define the value of taking each action
- we call these values the action values (or action value function)
- action value: ...
Each action has a reward defined by a probability distribution.
- the red treatment has a bernoulli probability
- the yellow treatment binomial
- the blue uniform
- the agent does not know the distributions!
- the estimated action for action a is the sum of rewards observed divided by the total time the action has been taken (add formula ...)
- 1predicate denotes the random variable (1 if true else 0)
- greedy action:
- doctors assign the treatment they currently think is the best
- ...
- the greedy action is computed as the argmax of Q values
- greedy always exploits current knowledge
- epsilon-greedy:
- with a probability epsilon sometimes we explore
- 1-eps probability: we chose best greedy action
- eps probability: we chose random action
exercises ...
exercise 2: k-armed bandit problem.
K = 4 actions, denoted 1,2,3 and 4
eps-greedy selection
initial Q estimantes = 0 for all a.
Initial sequenze of actions and rewards is:
A1 = 1 R1 = 1
A2 = 2 R2 = 2
A3 = 2 R3 = 2
A4 = 2 R4 = 2
A5 = 3 R5 = 0
---
step A1: action 1 selected. Q of action 1 is 1
step A2: action 2 selected. Q(1) = 1, Q(2) = 1
step A3: action 2 selected. Q(1) = 1, Q(2) = 1.5
step A4: action 2. Q(1) = 1, Q(2) = 1.6
step A5: action 3. Q(1) = 1, Q(2) = 1.6, Q(3) = 0
For sure A2 and A5 are epsilon cases, system didn't chose the one with highest Q value.
A3 and A4 can be both greedy and epsilon case.
#### Incremental formula to estimate action-value
- to simplify notation we concentrate on a single action
- Ri denotes the reward received after the i(th) selection of this action. Qn denotes the estimate of its action value after it has been selected n-1 times (add Qn formula ...)
- given $Q_n$ and the reward $R_n$, the new average of rewards can be computed by (add formula with simplifications...) $Q_{n+1} = Q_{n} + \frac{1}{n}[R_n - Q_n]$
- NewEstimate <- OldEstimate + StepSize (Target - OldEstimate)
- Target - OldEstimate is the error
Pseudocode for bandit algorithm:
```
Initialize for a = 1 to k:
Q(a) = 0
N(a) = 0
Loop forever:
with probability 1-eps:
A = argmax_a(Q(a))
else:
A = random action
R = bandit(A) # returns the reward of the action A
N(A) = N(A) + 1
Q(A) = Q(A) + 1/N(A) * (R - Q(A))
```
Nonstationary problem: rewards probabilities change over time.
- in the doctor example, a treatment may not be good in all conditions
- the agent (doctor) is unaware of the changes, he would like to adapt to it
An option is to use a fixed step size. We remove the 1/n factor and add an $\alpha$ constant factor between 0 and 1.
And we get $Q_{n+1} = (1-\alpha)^{n}Q_1 + \sum_{i=1}^{n}{\alpha(1 - \alpha)^{(n-i)} R_i}$
... ADD MISSING PART ...

View file

@ -0,0 +1,122 @@
### K-Armed bandit problem
- Rewards evaluate actions taken
- evaluative feedback depends on the action taken
- no active exploration
Let's consider a simplified version of an RL problem: K-armed bandit problem.
- K different options
- each time we need to choose one
- maximize expected total reward over some time period
- analogy with slot machines
- the levers are the actions
- which lever gives the highest reward?
- **Formalization**
- set of actions A (or "arms")
- reward function R that follows an unknown probability distribution
- only one state
- at each step t, agent selects an action A
- environment generates reward
- goal to maximize cumulative reward
Example: doctor treatment
- doctor has 3 treatments (actions), each of them has a reward.
- for the doctor to decide which action is best, we must define the value of taking each action
- we call these values the action values (or action value function)
- **action value:** $$q_{*}(a)=\mathbb{E}[R_{t} \mid A_{t}=a]$$
Each action has a reward defined by a probability distribution.
- the red treatment has a Bernoulli probability
- the yellow treatment binomial
- the blue uniform
- the agent does not know the distributions!
![[Pasted image 20241030165705.png]]
- **the estimated action value** $Q_t(a)$ for action $a$ is the sum of the rewards observed when taking $a$, divided by the number of times $a$ has been taken: $$Q_{t}(a)=\frac{\sum_{i=1}^{t-1}R_{i}\cdot\mathbb{1}_{A_{i}=a}}{\sum_{i=1}^{t-1}\mathbb{1}_{A_{i}=a}}$$ where $\mathbb{1}_{A_{i}=a}$ is 1 if $A_{i}=a$ and 0 otherwise
- **greedy action:**
- doctors assign the treatment they currently think is the best
- greedy action is the action that currently has the largest estimated action value $$A_{t}=argmax(Q_{t}(a))$$
- greedy always exploits current knowledge
- **epsilon-greedy:**
- with probability epsilon we sometimes explore
- with probability 1-eps: we choose the best greedy action
- with probability eps: we choose a random action (a minimal sketch follows this list)
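A minimal sketch of ε-greedy action selection, assuming NumPy; the array `Q` of estimated action values, `eps` and the example numbers are illustrative, not part of the lecture.

```python
import numpy as np

def epsilon_greedy(Q, eps, rng=np.random.default_rng()):
    """Pick an action index given the estimated action values Q.

    With probability eps we explore (uniform random action),
    otherwise we exploit the current greedy action (largest Q).
    """
    if rng.random() < eps:
        return int(rng.integers(len(Q)))   # explore
    return int(np.argmax(Q))               # exploit

# example: 4 arms, 10% exploration
print(epsilon_greedy(np.array([1.0, 1.6, 0.0, 0.0]), eps=0.1))
```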
#### Exercise 1
In ε-greedy action selection, for the case of two actions and ε=0.5, what is the probability that the greedy action is selected?
*We have two actions. With probability 0.5 we exploit and pick the greedy action directly.
But when exploration happens, the greedy action may still be selected!
With probability 0.5 we pick a random action, which can be either of the two, so in the random case we select the greedy action with probability 0.5 * 0.5 = 0.25.
Finally, we select the greedy action with probability 0.5 + 0.25 = 0.75.*
#### Exercise 2
Consider a K-armed bandit problem.
K = 4 actions, denoted 1, 2, 3 and 4
The agent uses eps-greedy action selection
Initial Q estimates are 0 for all actions: $$Q_{1}(a)=0$$
The initial sequence of actions and rewards is:
A1 = 1 R1 = 1
A2 = 2 R2 = 2
A3 = 2 R3 = 2
A4 = 2 R4 = 2
A5 = 3 R5 = 0
On some of those time steps, the epsilon case may have occurred, causing an action to be selected at random. On which time steps did this definitely occur?
On which time steps could this possibly have occurred?
***Answer***
To answer, we need to compute the sequence of action-value estimates.
In the table, Qa means the Q value of action a after that step.
| steps | Q1 | Q2 | Q3 | Q4 |
| -------------- | --- | ---- | --- | --- |
| A1 \| action 1 | 1 | 0 | 0 | 0 |
| A2 \| action 2 | 1 | 1 | 0 | 0 |
| A3 \| action 2 | 1 | 1.5 | 0 | 0 |
| A4 \| action 2 | 1 | 1.66 | 0 | 0 |
| A5 \| action 3 | 1 | 1.66 | 0 | 0 |
step A1: action 1 selected. Q of action 1 becomes 1
step A2: action 2 selected. Q(1) = 1, Q(2) = 1
step A3: action 2 selected. Q(1) = 1, Q(2) = 1.5
step A4: action 2. Q(1) = 1, Q(2) = 1.6
step A5: action 3. Q(1) = 1, Q(2) = 1.6, Q(3) = 0
A2 and A5 are definitely epsilon cases: the system didn't choose the action with the highest Q value.
A3 and A4 could be either greedy or epsilon cases.
#### Incremental formula to estimate action-value
- idea: compute incrementally the action values, to avoid doing it every time
- to simplify notation we concentrate on a single action on the next examples
- $R_{i}$ denotes the reward received after the $i$-th selection of this action.
- $Q_{n}$ denotes the estimate of its action value after it has been selected $n-1$ times: $$Q_{n}=\frac{R_{1}+R_{2}+\dots+R_{n-1}}{n-1}$$
- given $Q_{n}$ and the reward $R_{n}$, the new average of rewards can be computed by $$Q_{n+1}=\frac{1}{n}\sum_{i=1}^nR_{i}$$
General formula: NewEstimate <- OldEstimate + StepSize (Target - OldEstimate), i.e. $$Q_{n+1} = Q_{n} + \frac{1}{n}\left[R_{n} - Q_{n}\right]$$ where (Target - OldEstimate) is the error.
Pseudocode for bandit algorithm:
```
Initialize for a = 1 to k:
Q(a) = 0
N(a) = 0
Loop forever:
with probability 1-eps:
A = argmax_a(Q(a))
else:
A = random action
R = bandit(A) # returns the reward of the action A
N(A) = N(A) + 1
Q(A) = Q(A) + 1/N(A) * (R - Q(A))
```
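The pseudocode above translates into a short runnable script. This is only a sketch under assumptions of mine: the Gaussian reward model in `bandit`, the seed and the number of steps are not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
k, eps, steps = 4, 0.1, 10_000
true_means = rng.normal(0.0, 1.0, k)     # assumed (unknown to the agent) reward means
Q = np.zeros(k)                          # estimated action values
N = np.zeros(k)                          # how many times each action was taken

def bandit(a):
    # assumed reward model: Gaussian noise around the arm's true mean
    return rng.normal(true_means[a], 1.0)

for _ in range(steps):
    if rng.random() < eps:
        A = int(rng.integers(k))         # explore: random action
    else:
        A = int(np.argmax(Q))            # exploit: greedy action
    R = bandit(A)
    N[A] += 1
    Q[A] += (R - Q[A]) / N[A]            # incremental sample-average update

print("true means:", np.round(true_means, 2))
print("estimates :", np.round(Q, 2))
```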
#### Nonstationary problem:
Rewards probabilities change over time.
- in the doctor example, a treatment may not be good in all conditions
- the agent (doctor) is unaware of the changes and would like to adapt to them; maybe a treatment works well only in a specific season.
An option is to use a fixed step size: we remove the 1/n factor and use a constant factor $\alpha$ between 0 and 1, $Q_{n+1} = Q_{n} + \alpha\left[R_{n} - Q_{n}\right]$.
Expanding the recursion we get $$Q_{n+1} = (1-\alpha)^{n}Q_1 + \sum_{i=1}^{n}{\alpha(1 - \alpha)^{(n-i)} R_i}$$
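A tiny sketch comparing the two step sizes on a drifting reward (the drift model and the numbers are invented for illustration): the constant-α estimate tracks the change, while the sample average lags behind.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, Q_const, Q_avg, n = 0.1, 0.0, 0.0, 0

for t in range(2000):
    mean = 1.0 if t < 1000 else 3.0      # reward distribution shifts halfway through
    R = rng.normal(mean, 0.5)
    n += 1
    Q_avg += (R - Q_avg) / n             # 1/n step size (sample average)
    Q_const += alpha * (R - Q_const)     # constant step size alpha

print(f"sample average: {Q_avg:.2f}   constant alpha: {Q_const:.2f}")
```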
#### Optimistic initial values
Initial action values can be used as a simple way to encourage exploration!
This way we can make the agent explore more at the beginning and explore less after a while. This is cool! (A small sketch follows.)
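A possible illustration, as a sketch (the +5 initialization and the Bernoulli arms are assumptions of mine): starting every estimate well above any obtainable reward forces even a purely greedy agent to try all arms at least once before settling.

```python
import numpy as np

rng = np.random.default_rng(2)
p = np.array([0.2, 0.5, 0.8])            # assumed Bernoulli success rates of 3 arms
Q = np.full(3, 5.0)                      # optimistic: far above any achievable reward
N = np.zeros(3)

for _ in range(500):
    A = int(np.argmax(Q))                # purely greedy, no epsilon needed
    R = float(rng.random() < p[A])       # Bernoulli reward
    N[A] += 1
    Q[A] += (R - Q[A]) / N[A]            # estimate decays toward the true value

print("pulls per arm:", N, "estimates:", np.round(Q, 2))
```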

View file

@ -15,7 +15,7 @@
![[Pasted image 20241025084755.png]] ![[Pasted image 20241025084755.png]]
.. add slide ...
![[Pasted image 20241025084830.png]] ![[Pasted image 20241025084830.png]]
#### Experiments #### Experiments

View file

@ -0,0 +1,132 @@
MDPs are a classical formalization of sequential decision making, where actions influence not just immediate rewards, but also subsequent situations (states) and through those future rewards
MDPs involve delayed rewards and the need to trade off immediate and delayed rewards
Whereas in bandits we estimated the value $q_*(a)$ of each action $a$, in MDPs we estimate the value $q_*(s,a)$ of each action $a$ in each state $s$, or we estimate the value $v_*(s)$ of each state $s$ given optimal action selection
- MDPs are meant to be a straightforward framing of the problem of learning from interaction to achieve a goal.
- The agent and environment interact at each of a sequence of discrete time steps, t = 0, 1, 2, 3, . . .
- At each timestep, the agent receives some representation of the environment state and on that basis selects an action
- One time step later, in part as a consequence of its action, the agent receives a numerical reward and finds itself in a new state
- Markov decision processes formally describe an environment for reinforcement learning
- Where the environment is fully observable
- i.e. The current state completely characterises the process
- Almost all RL problems can be formalised as MDPs
- e.g. Bandits are MDPs with one state
Markov property
“The future is independent of the past given the present”
![[Pasted image 20241030102226.png]]![[Pasted image 20241030102243.png]]
- A Markov process (or markov chain) is a memoryless random process, i.e. a sequence of random states S1, S2, ... with the Markov property.
- S finite set of states
- P is a state transition probability matrix
- then $P_{ss'} = \mathbb{P}[S_{t+1}=s' \mid S_{t}=s]$
Example
![[Pasted image 20241030102420.png]]
![[Pasted image 20241030102722.png]]
#### Markov Reward Process
This is a Markov Process but we also have a reward function! We also have a discount factor.
Markov Reward Process is a tuple ⟨S, P, R, γ⟩
- S is a (finite) set of states
- P is a state transition probability matrix, $P_{ss'} = \mathbb{P}[S_{t+1}=s' \mid S_{t}=s]$
- R is a reward function, $R_{s} = \mathbb{E}[R_{t+1} \mid S_{t} = s]$
- γ is a discount factor, $γ ∈ [0, 1]$
![[Pasted image 20241030103041.png]]
![[Pasted image 20241030103114.png]]
- The discount $γ ∈ [0, 1]$ is the present value of future rewards
- The value of receiving reward R after k + 1 time-steps is $γ^kR$
- This values immediate reward above delayed reward
- γ close to 0 leads to ”short-sighted” evaluation
- γ close to 1 leads to ”far-sighted” evaluation
Most Markov reward and decision processes are discounted. Why?
- mathematical convenience
- avoids infinite returns in cyclic Markov processes
- uncertainty about the future may not be fully represented
- if the reward is financial, immediate rewards may earn more interest than delayed rewards
- animal/human behaviour shows a preference for immediate rewards
- it is sometimes possible to use undiscounted Markov reward processes (gamma = 1), e.g. if all sequences terminate
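A quick numerical illustration of the effect of γ on the return $G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots$ (the reward sequence is made up):

```python
# discounted return for a made-up reward sequence
rewards = [-2, -2, -2, 10]
for gamma in (0.0, 0.5, 0.9, 1.0):
    G = sum(gamma**k * r for k, r in enumerate(rewards))
    print(f"gamma={gamma}: return={G:.2f}")   # small gamma is short-sighted, gamma~1 far-sighted
```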
Value function
- The value function v(s) gives the long-term value of (being in) state s
- The state value function $v(s)$ of an MRP is the expected return starting from state $s$: $v(s) = \mathbb{E}[G_{t} \mid S_{t} = s]$
![[Pasted image 20241030103519.png]]
![[Pasted image 20241030103706.png]]
it is a prediction of the rewards obtainable from the next states
![[Pasted image 20241030103739.png]]
![[Pasted image 20241030103753.png]]
- The value function can be decomposed into two parts:
- immediate reward $R_{t+1}$
- discounted value of the successor state $\gamma v(S_{t+1})$
![[Pasted image 20241030103902.png]]
#### Bellman Equation for MRPs
$v(s) = \mathbb{E}[R_{t+1} + \gamma v(S_{t+1}) \mid S_{t} = s]$
![[Pasted image 20241030104056.png]]
- Bellman equation averages over all the possibilities, weighting each by its probability of occurring
- The value of the start state must be equal the (discounted) value of the expected next state, plus the reward expected along the way
![[Pasted image 20241030104229.png]]
4.3 is the reward I get when exiting the state (-2), plus the discount times the value of each possible next state, etc.
![[Pasted image 20241030104451.png]]
#### Solving the Bellman Equation
- The Bellman equation is a linear equation
- it can be solved directly (a small numerical sketch follows this list)![[Pasted image 20241030104537.png]]
- complexity $O(n^3)$
- many iterative methods
- dynamic programming
- monte-carlo evaluation
- temporal-difference learning
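A minimal numerical sketch of the direct solution $v = (I - \gamma P)^{-1} R$; the 3-state transition matrix, rewards and γ below are invented, not the lecture's example.

```python
import numpy as np

# assumed 3-state MRP: transition matrix P, reward vector R, discount gamma
P = np.array([[0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9],
              [0.5, 0.0, 0.5]])
R = np.array([-1.0, -2.0, 10.0])
gamma = 0.9

# Bellman equation in matrix form: v = R + gamma * P v  =>  (I - gamma*P) v = R
v = np.linalg.solve(np.eye(3) - gamma * P, R)   # direct O(n^3) solution
print(np.round(v, 2))
```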
#### MDP
![[Pasted image 20241030104632.png]]
![[Pasted image 20241030104722.png]]
Before we only had transition probabilities, now we also have actions to choose from. But how do we choose?
We have policies: a distribution over actions given the states: $$\pi(a \mid s)= \mathbb{P}[A_{t}=a \mid S_{t}=s]$$
- policy fully defines the behavior of the agent
- MDP policies depend on the current state (not the history)
- policies are stationary (time-independent, depend only on the state but not on the time)
##### Value function
The state-value function $v_{\pi}(s)$ of an MDP is the expected return starting from state $s$ and then following policy $\pi$: $$v_{\pi}(s) = \mathbb{E}_{\pi}[G_{t} \mid S_{t}=s]$$
The action-value function $q_{\pi}(s,a)$ is the expected return starting from state $s$, taking action $a$, and then following policy $\pi$: $$q_{\pi}(s,a)= \mathbb{E}_{\pi}[G_{t} \mid S_{t}=s, A_{t}=a]$$
![[Pasted image 20241030105022.png]]
- The state-value function can again be decomposed into immediate reward plus discounted value of the successor state $$v_{\pi}(s) = \mathbb{E}_{\pi}[R_{t+1} + \gamma v_{\pi}(S_{t+1}) \mid S_{t} = s]$$
- The action-value function can similarly be decomposed $$q_{\pi}(s, a) = \mathbb{E}_{\pi}[R_{t+1} + \gamma q_{\pi}(S_{t+1}, A_{t+1}) \mid S_{t} = s, A_{t} = a]$$
![[Pasted image 20241030105148.png]]![[Pasted image 20241030105207.png]]
![[Pasted image 20241030105216.png]]
Putting it all together
(very important, remember it)
![[Pasted image 20241030105234.png]]
![[Pasted image 20241030105248.png]]
as we can see, an action does not necessarily lead to a specific state.
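Based on the Bellman expectation equation above, here is a small sketch of iterative policy evaluation for a toy MDP; the two states, two actions, dynamics and the uniform random policy are all invented for illustration.

```python
import numpy as np

# toy MDP: P[a, s, s'] transition probabilities, R[a, s] expected rewards (assumed values)
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.1, 0.9],
               [0.6, 0.4]]])
R = np.array([[0.0, 1.0],
              [2.0, -1.0]])
pi = np.full((2, 2), 0.5)    # pi[s, a]: uniform random policy
gamma = 0.9

v = np.zeros(2)
for _ in range(200):
    # v_pi(s) = sum_a pi(a|s) * (R(s,a) + gamma * sum_s' P(s'|s,a) * v(s'))
    v = np.einsum('sa,as->s', pi, R + gamma * P @ v)
print(np.round(v, 2))
```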
Example: Gridworld
two actions lead back to the same state, two actions go to a different state
![[Pasted image 20241030112555.png]]
![[Pasted image 20241030113041.png]]
![[Pasted image 20241030113135.png]]
C is low because from C I get a reward of 0 wherever I go.

Binary file not shown.

View file

@ -0,0 +1,159 @@
![[Pasted image 20241030133828.png]]
In 2D face recognition we mainly have two problems to solve:
- building discriminative and representative features
- it is hard to obtain a linear separation between classes
- in some cases we have to combine linear classifiers (AdaBoost) or use non-linear classifiers (much more complex)
- building a classifier able to generalize to objects never seen during training
Here we represent faces inside digital images, encoded either as two-dimensional matrices (w × h) or as one-dimensional vectors (d = w × h). The high dimensionality of the data (e.g. a 100 × 100 image has dimension 10000) immediately raises a first problem, the "curse of dimensionality": as dimensionality grows, the volume of the space grows so fast that the data become sparse, which makes classification hard.
When statistical tools such as machine learning models are used, data sparsity drastically reduces their predictive power, since many more examples are needed to generalize their predictive rules.
> [!PDF|yellow] [[LEZIONE6_Face recognition2D.pdf#page=7&selection=6,0,6,17&color=yellow|LEZIONE6_Face recognition2D, p.7]]
> > Some consequences
>
> read the consequences
An alternative is to represent an image in a feature space:
- Gabor filters
- discrete cosine transform (DCT)
- local binary pattern (LBP) operator
- fractal encoding
#### PCA
A possible solution to the curse of dimensionality: PCA.
A statistical method that reduces the high dimensionality of a space by mapping the data into another space with a much smaller dimensionality, while minimizing the loss of information.
New axes, orthogonal to each other, are identified and the data are projected onto them so that their **variance is maximized**. Orthogonality lets us discard correlated, and therefore redundant, components.
In image space, the principal components are orthogonal when they are the *eigenvectors of the covariance matrix*.
The goal is to drop features with very low variance, which are shared by all samples and therefore of little use for discriminating between classes.
Computing the optimal k:
Given a training set TS of m samples of n dimensions, we compute
- the mean vector $\hat{x}$ of the training set $$\hat{x}=\frac{1}{m}\sum_{i=1}^{m}x_{i}$$
- the covariance matrix C $$C=\frac{1}{m}\sum_{i=1}^m(x_{i}-\hat{x})(x_{i}-\hat{x})^T$$
- the size of C is (n x n)
- the new k-dimensional space is given by the projection matrix whose columns are the k eigenvectors of C corresponding to the k largest eigenvalues of C. Plotting the eigenvalues shows the variance along the eigenvectors.
> [!PDF|red] [[LEZIONE6_Face recognition2D.pdf#page=14&selection=0,0,2,10&color=red|LEZIONE6_Face recognition2D, p.14]]
> > PCA and Eigenfaces
>
> example
![[Pasted image 20241030142613.png]]
from this we obtain the mean element (the average face):
![[Pasted image 20241030142635.png]]
problem: all the faces must be well centered!!
![[Pasted image 20241030142717.png]]
eigenvectors (note how they look completely different from the original faces)
these are what we need to extract
Projection: performed by multiplying the transposed projection matrix $\varphi_{k}^T$ by the original vector (a small numerical sketch follows the images below).
![[Pasted image 20241030142934.png]]
![[Pasted image 20241030143149.png]]
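A minimal NumPy sketch of the procedure above; the image size, the number of samples and k are invented, and a real eigenfaces pipeline would of course start from aligned face images rather than random data.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 100, 32 * 32, 20            # m samples, n = w*h pixels, k components (assumed)
X = rng.random((m, n))                # stand-in for the training faces, one row per image

x_mean = X.mean(axis=0)               # mean face
Xc = X - x_mean                       # centered samples
C = (Xc.T @ Xc) / m                   # covariance matrix, (n x n)

eigvals, eigvecs = np.linalg.eigh(C)  # eigh because C is symmetric
order = np.argsort(eigvals)[::-1]     # sort by decreasing variance
phi_k = eigvecs[:, order[:k]]         # projection matrix: top-k eigenvectors (eigenfaces)

Y = Xc @ phi_k                        # project each face: y = phi_k^T (x - x_mean)
print(Y.shape)                        # (100, 20): k-dimensional representation
```

For real images (n in the tens of thousands) the eigenvectors are usually obtained from the much smaller m × m matrix $X_c X_c^T$ and then mapped back, but the idea is the same.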
##### Problems with PCA
- lack of discriminative power
- the separation along the eigenvectors also depends on intra-class differences
- but the method is unsupervised
- in the presence of PIE variations, the model may pick those up as facial components and thus fail to separate the classes correctly
A possible solution: Fisherfaces, an application of FLD (Fisher's Linear Discriminant), often mentioned in the context of Linear Discriminant Analysis (LDA).
### LDA
Some of PCA's problems come from the fact that we work with the data in an unsupervised way, i.e. without taking the class of each sample into account.
A supervised solution is offered by LDA, a method similar to PCA that minimizes the intra-class distance while trying to maximize the distance between classes.
In the image the reduction is from 2 dimensions to 1. In cases like this it is not enough to drop from two axes to one: the axes are rotated to maximize the separation between classes. What matters is that the coordinates remain orthogonal to each other.
![[Pasted image 20241030144246.png]]
The best subspace is the one where we can separate the classes along the 1D axis; in the worst one, as we can see, we definitely cannot.
##### A possible approach
- as in PCA, we start from a set of m samples of n dimensions
- unlike PCA, we split the set according to the labels that represent the classes, $PTS=\{P_{1},P_{2},P_{3},\dots,P_{s}\}$, where $P_{i}$ is a class of cardinality $m_{i}$
- we try to obtain a scalar $y$ by projecting the sample $x$ onto a line, so that $y=w^Tx$
- clearly $x$ is a point in the 2D image
- $y$ is instead the projected point in the new 1D space
Consider two classes $P_1$ and $P_{2}$, with $m_{1}$ and $m_{2}$ vectors respectively:
- consider the mean vectors of each class in the two spaces: $$\mu_{i}=\frac{1}{m_{i}}\sum_{j=1}^{m_{i}}x_{j}$$
- in the new space we have (the tilde indicates the new space) $$\tilde{\mu}_{i}=\frac{1}{m_{i}}\sum_{j=1}^{m_{i}}y_{j}=\frac{1}{m_{i}}\sum_{j=1}^{m_{i}}w^Tx_{j}=w^T\mu_{i}$$
- we can choose the distance between the projected mean vectors (i.e. in the new space) as a function that we clearly want to maximize: $$J(w)=|\tilde{\mu}_{1}-\tilde{\mu}_{2}|=|w^T(\mu_{1}-\mu_{2})|$$
![[Pasted image 20241031085606.png]]
here, however, note that on one axis we get an excellent separability between classes, while on the other we get a larger distance between the two mean vectors.
So how do we handle this? Let's look for a better criterion.
##### Scatter matrices
We have to take the spread of the classes into account.
We have to maximize the ratio between the between-class variance and the within-class variance:
- **within-class variance**
- indicates how the vectors are distributed/spread around the center of their own class
- **between-class variance**
- indicates how the class centers are distributed around the overall center (ok, this could be explained better, but it's 9:00 AM)
##### Approach
- as in PCA, we start from a set of m samples of n dimensions
- unlike PCA, we split the set according to the labels that represent the classes, $PTS=\{P_{1},P_{2},P_{3},\dots,P_{s}\}$, where $P_{i}$ is a class of cardinality $m_{i}$
- for each class $P_{i}$ we compute the mean vector and the "mean of means": $$\mu_{i}=\frac{1}{m_{i}}\sum_{j=1}^{m_{i}}x_{j}$$$$\mu_{TS}=\frac{1}{m}\sum_{i=1}^{S}m_{i}\mu_{i}$$
- we compute the covariance matrix of class $P_{i}$: $$C_{i}=\frac{1}{m_{i}}\sum_{j=1}^{m_{i}}(x_{j}-\mu_{i})(x_{j}-\mu_{i})^T$$
- we can now compute the within-class scatter matrix $$S_{W}=\sum_{i=1}^Sm_{i}C_{i}$$
- and the between-class scatter matrix $$S_{B}=\sum_{i=1}^Sm_{i}(\mu_{i}-\mu_{TS})(\mu_{i}-\mu_{TS})^T$$ for two classes it is also defined as $$(\mu_{1}-\mu_{2})(\mu_{1}-\mu_{2})^T$$
- the same definitions hold in the new space, with $\tilde{S}_{W}$ and $\tilde{S}_{B}$ and $y=w^Tx$; the scatter of the projected class $P_{i}$ is defined as $$\tilde{s}_{i}^2=\sum_{y \in P_{i}}(y-\tilde{\mu}_{i})^2$$ and the sum of all the $\tilde{s}_{i}^2$ (over all classes) is the within-class scatter
- we now want to maximize the difference between the mean vectors, normalized by a measure of the within-class scatter
- with two classes, the criterion is the linear function $w^Tx$ that maximizes $$J(w)=\frac{|\tilde{\mu}_{1}-\tilde{\mu}_{2}|^2}{\tilde{s}_{1}^2+\tilde{s}_{2}^2}$$
- we look for a projection where samples of the same class are projected very close to each other, while the class means are as far apart as possible (a small sketch of the two-class case follows the image)![[Pasted image 20241031091853.png]]
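A small sketch of the two-class Fisher criterion described above, on invented 2D data; in the multi-class case one would instead solve the generalized eigenproblem $S_{B}w=\lambda S_{W}w$.

```python
import numpy as np

rng = np.random.default_rng(0)
P1 = rng.normal([0.0, 0.0], 1.0, size=(100, 2))   # assumed samples of class 1
P2 = rng.normal([4.0, 1.0], 1.0, size=(100, 2))   # assumed samples of class 2

mu1, mu2 = P1.mean(axis=0), P2.mean(axis=0)
S_W = (P1 - mu1).T @ (P1 - mu1) + (P2 - mu2).T @ (P2 - mu2)   # within-class scatter

# the w maximizing J(w) is proportional to S_W^{-1} (mu1 - mu2)
w = np.linalg.solve(S_W, mu1 - mu2)
w /= np.linalg.norm(w)

y1, y2 = P1 @ w, P2 @ w                           # 1D projections y = w^T x
print("projected means:", round(y1.mean(), 2), round(y2.mean(), 2))
```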
### Feature spaces
- we can extract features by applying filters or transforms to the images
- we can then use one of the methods just described if the extracted features have a high dimensionality
Which are the best features to extract? Let's look at a few operators.
##### Bubbles
- a human observer can immediately guess the gender, identity, age and expression by looking at a face
- the experiment consists in showing the image with a bubble pattern on top of it and checking whether the human observer and a biometric system can still classify it correctly![[Pasted image 20241031100207.png]]
- at the end of the experiment we can tell which elements of the face (i.e. which features) a human being takes into account, or which ones a biometric system uses, to classify
#### Wavelet
Provides a time-frequency representation of the signal, i.e. it gives time and frequency information at the same time.
![[Pasted image 20241031102321.png]]
#### Gabor filters
A linear filter used in edge detection, texture analysis and feature extraction, believed to be similar to the visual perceptual system of us humans. A 2D Gabor filter is nothing more than a Gaussian kernel modulated by a sinusoidal plane wave.
> [!PDF|yellow] [[LEZIONE6_Face recognition2D.pdf#page=43&selection=0,2,11,7&color=yellow|LEZIONE6_Face recognition2D, p.43]]
> > An example of filters: Gabor filters
>
> see the slides
![[Pasted image 20241031102640.png]]
A feature vector is obtained by performing several convolutions with a set of Gabor filters (a Gabor filter bank) with different orientations and scales (in the image the orientations are on the x axis and the scales on the y axis).
We therefore obtain different filters by changing the parameters:
- λ: frequency, controls the thickness of the stripe; the larger it is, the thicker the stripe
- θ: orientation, controls the rotation of the stripe and is expressed as an angle
- γ: aspect ratio, controls the height of the stripe; the larger it is, the smaller the height
- σ: bandwidth, controls the overall scale, which also determines the number of stripes; increasing this value gives more stripes with less distance between them (a small filter-bank sketch follows this list)
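A sketch of a small Gabor filter bank built with OpenCV's `cv2.getGaborKernel`; the kernel size, the five scales by eight orientations layout, the sigma/lambda ratio and the sampling grid are assumptions to be tuned, not values from the lecture.

```python
import cv2
import numpy as np

def gabor_bank(ksize=31, wavelengths=(4, 6, 8, 10, 12), n_orient=8):
    """Build a list of Gabor kernels at several scales and orientations."""
    kernels = []
    for lambd in wavelengths:                 # wavelength ~ stripe thickness
        for i in range(n_orient):
            theta = i * np.pi / n_orient      # orientation angle
            # args: ksize, sigma, theta, lambd, gamma, psi
            k = cv2.getGaborKernel((ksize, ksize), 0.56 * lambd, theta, lambd, 0.5, 0)
            kernels.append(k)
    return kernels

def gabor_features(img, kernels, step=8):
    """Convolve with every kernel and sample the responses on a coarse grid."""
    feats = [cv2.filter2D(img, cv2.CV_32F, k)[::step, ::step].ravel() for k in kernels]
    return np.concatenate(feats)

face = np.zeros((64, 64), np.float32)         # stand-in for an aligned face crop
print(gabor_features(face, gabor_bank()).shape)
```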
On very large images this procedure leads to a very high dimensionality; possible solutions are:
- apply the filter on a subset of pixels, for example on a grid ⇒ reduced dimensionality, but possible loss of salient information (due, for instance, to rotations)
- apply the filter on all pixels, but then keep only the points with peak values ⇒ alignment problems
###### But how do we perform the convolution?
- fixed grid
- we may miss important points
- grid defined by hand so that the vertices are close to the important features, such as the eyes
#### EBGM (Elastic Bunch Graph Matching)
This method uses graphs: for each subject we have a collection of graphs (hence the name "bunch"), one per pose, where:
- the edges are labeled (weighted) with the distance between the nodes
- the nodes contain a set of responses of different Gabor filters (typically 5 different frequencies and 8 different orientations), stored in a structure called a "Jet", and are positioned on important points such as the nose, the eyes and the mouth
![[Pasted image 20241031104206.png]]
![[Pasted image 20241031104526.png]]
nobody uses it today.

Binary files not shown (one additional binary file and 17 new images, 8.4 KiB to 204 KiB).