
Q: explain the problem of energy consumption in sensor networks

As sensors run on batteries, energy consumption is a serious problem: we want the batteries to last as long as possible, and it is often challenging to replace or recharge them. Energy consumption is caused by several things:

  • overhearing
  • overemitting
  • idle listening
  • collisions
  • overhead caused by control packets
  • continuous operation
  • transmission distance

To achieve low energy consumption it is very important to define good MAC and routing strategies. At the MAC layer we can use protocols such as S-MAC, which lets sensors sleep most of the time when they are not communicating. S-MAC works by letting sensors perform carrier sensing only for a small fraction of the time while idle. To make this work, neighboring nodes need to be synchronized with each other, so that they can do carrier sensing at the same time.
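
A back-of-the-envelope sketch of why duty cycling saves energy (the 10% listen/sleep split and the current draws below are illustrative assumptions, not S-MAC defaults):

```python
# Average radio current under an S-MAC-style listen/sleep schedule.
listen_ms, sleep_ms = 100, 900            # 10% duty cycle over a 1 s frame (assumed)
i_listen_ma, i_sleep_ma = 20.0, 0.02      # assumed radio currents in listen and sleep mode

def avg_current(listen_ms, sleep_ms, i_listen, i_sleep):
    """Time-weighted average current over one listen/sleep frame."""
    frame = listen_ms + sleep_ms
    return (listen_ms * i_listen + sleep_ms * i_sleep) / frame

always_on = i_listen_ma
duty_cycled = avg_current(listen_ms, sleep_ms, i_listen_ma, i_sleep_ma)
print(f"always on: {always_on:.2f} mA, duty cycled: {duty_cycled:.2f} mA "
      f"(~{always_on / duty_cycled:.0f}x longer battery lifetime)")
```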

Q: Challenges of routing in wireless sensor networks

routing protocols must be:

  • scalable, to support networks of very different sizes; performance should not degrade as the size increases
  • able to handle a wide range of node densities
  • aware of the limited resources of each node
    • low computation capability
    • cannot use too much energy
    • nodes may not even have a global ID
  • fault tolerant
    • a single node failure should not destroy the entire network
  • able to support mobility, as some nodes may be mobile

A good routing protocol should also guarantee that the network lifetime is as long as possible. Energy consumption is very important, as we have seen before; for this reason, depending on the needs, we can have different kinds of routing protocols:
  • proactive
  • reactive
  • geo-based

Q: Explain the difference between Framed Slotted Aloha and Tree Slotted Aloha protocols in RFID systems

Both protocols are based on Slotted ALOHA: a frame is divided into time slots, and each tag randomly chooses a slot in which to answer, so as to reduce collisions. In Framed Slotted Aloha the number of slots per frame is always the same. If two (or more) tags choose the same slot, they create a collision; to resolve it, the reader issues a new query. In TSA, instead, for each collision slot s a new child frame with a smaller number of slots is issued, and only the tags that transmitted in slot s transmit in that child frame. TSA improves the system efficiency because the probability of a collision is lower. For both protocols, however, good performance requires an estimate of the number of tags to identify, since the number of slots must be chosen accordingly: with too many slots a lot of time is wasted in idle slots, with too few slots there are many collisions.
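
A rough Monte Carlo sketch comparing the two protocols (the function names are mine, and the child-frame size in the TSA part uses the true number of colliding tags, which in practice would be an estimate):

```python
import random

def fsa_identify(num_tags, frame_size, rng=random):
    """Framed Slotted Aloha: repeat fixed-size frames until every tag is identified.
    Returns the total number of slots used."""
    total_slots, remaining = 0, num_tags
    while remaining > 0:
        slots = [0] * frame_size
        for _ in range(remaining):            # each unidentified tag picks a slot at random
            slots[rng.randrange(frame_size)] += 1
        remaining -= sum(1 for s in slots if s == 1)   # singleton slots succeed
        total_slots += frame_size
    return total_slots

def tsa_identify(num_tags, frame_size, rng=random):
    """Tree Slotted Aloha: for each collision slot, open a child frame that only
    the tags which collided in that slot will use."""
    if num_tags == 0:
        return 0
    slots = [0] * frame_size
    for _ in range(num_tags):
        slots[rng.randrange(frame_size)] += 1
    total = frame_size
    for count in slots:
        if count >= 2:                        # collision slot -> resolve it in a child frame
            total += tsa_identify(count, max(2, count), rng)
    return total

random.seed(0)
n, f, runs = 100, 128, 200
avg_fsa = sum(fsa_identify(n, f) for _ in range(runs)) / runs
avg_tsa = sum(tsa_identify(n, f) for _ in range(runs)) / runs
print(f"FSA: {avg_fsa:.1f} slots (efficiency {n/avg_fsa:.2f}), "
      f"TSA: {avg_tsa:.1f} slots (efficiency {n/avg_tsa:.2f})")
```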

Q: in a Slotted Aloha protocol for RFID systems, how is the tag population participating in the intermediate frames estimated?

Main issues:

  • total number of tags to identify is not known
  • initial frame size is set to a predefined value (e.g. 128)
  • the size of the following (child) frames is estimated from the expected number of tags per collision slot:
tags\_per\_collision\_slot=\frac{estimated\_total\_num\_of\_tags - identified\_tags}{collision\_slots}

The key issue is that we don't know the total number of tags. It can be estimated with the Chebyshev inequality, but for very large tag populations the estimate can be inaccurate.
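
A sketch of a Chebyshev-style estimator (often attributed to Vogt): pick the tag count whose expected numbers of empty, singleton and collision slots are closest to what was actually observed in the frame. The numbers in the example call are made up:

```python
def expected_slot_counts(n_tags, frame_size):
    """Expected numbers of (empty, singleton, collision) slots when n_tags tags
    each pick one of frame_size slots uniformly at random."""
    empty = frame_size * (1 - 1 / frame_size) ** n_tags
    single = n_tags * (1 - 1 / frame_size) ** (n_tags - 1)
    return empty, single, frame_size - empty - single

def estimate_tags(obs_empty, obs_single, obs_collision, frame_size, max_tags=1000):
    """Return the n minimizing the distance between expected and observed slot counts.
    Each collision slot holds at least 2 tags, which gives the search's lower bound."""
    best_n, best_dist = 0, float("inf")
    for n in range(obs_single + 2 * obs_collision, max_tags + 1):
        e, s, c = expected_slot_counts(n, frame_size)
        dist = (e - obs_empty) ** 2 + (s - obs_single) ** 2 + (c - obs_collision) ** 2
        if dist < best_dist:
            best_n, best_dist = n, dist
    return best_n

# Example: a 128-slot frame observed 40 empty, 60 singleton and 28 collision slots.
print(estimate_tags(40, 60, 28, 128))
```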

Q: explain the binary splitting protocol for RFID systems (discuss its performance)

All tags have a counter, initially set to 0, and only tags whose counter is 0 respond. When the reader sends the first query, every tag responds. Every time a collision is generated, each colliding tag randomly adds 0 or 1 to its counter, splitting the colliding tags into two sets, while the other tags increment their counter. When a single tag or no tag responds, all tags decrement their counter. Since each collision splits the responding tags into two sets, the process can be seen as a binary tree, and counting the nodes of the tree gives the expected number of queries:

BS_{tot}(n)=\begin{cases}1,n\le1\\ 1+\sum_{k=0}^{n}\binom{n}{k}\left(\frac12\right)^{k}\left(1-\frac12\right)^{n-k}\left(BS_{tot}\left(k\right)+BS_{tot}\left(n-k\right)\right),n>1\end{cases}
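
A sketch of this recursion in Python (function name is mine): note that the k = 0 and k = n terms of the sum contain BS_{tot}(n) itself, so they are moved to the left-hand side and the equation is solved for BS_{tot}(n) before recursing.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def bs_total(n: int) -> float:
    """Expected number of queries (tree nodes) binary splitting needs to identify n tags."""
    if n <= 1:
        return 1.0
    p = [comb(n, k) * 0.5 ** n for k in range(n + 1)]   # P(exactly k tags join the first subset)
    # Splits with 1 <= k <= n-1 strictly shrink both subproblems.
    middle = sum(p[k] * (bs_total(k) + bs_total(n - k)) for k in range(1, n))
    # k = 0 and k = n leave all n tags together, reproducing BS_tot(n) on the right-hand
    # side; solving for BS_tot(n) gives the expression below.
    return (1.0 + 2.0 * p[0] * bs_total(0) + middle) / (1.0 - 2.0 * p[0])

print(bs_total(2))   # 5.0 expected queries for two colliding tags
print(bs_total(16))  # expected queries to identify 16 tags
```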

Q: explain the differences between proactive and reactive routing in sensor networks. Discuss the advantages and disadvantages

Q: Define the agent state and the environment state and explain how these two states differ. Give a practical example

The environment state is the actual state of the environment: it is the full description of the current situation and contains everything related to the environment, regardless of whether the agent is able to observe it; often only a small part of it is observable by the agent. The agent state is the agent's own view of the environment, and it is a function of the history: S_{t} = f(H_{t}). The agent state is what the policy uses to take the next decision. The distinction is important because the agent has to learn to make good decisions with limited information. For example, in a sensor network an agent running on one node only knows its own readings and the packets it has received (agent state), while the environment state also includes the channel conditions and the internal state of every other node, which the agent cannot observe.

Q: Explain the exploitation-exploration dilemma

The exploitation/exploration dilemma is the problem of finding the best compromise between the two. An agent wants to exploit the actions that are known to bring positive rewards, but without exploring it may never learn which actions are actually the best, so it also needs to explore. If the agent explores too much, though, it chooses suboptimal actions too many times.

Q: Mention and briefly explain three different strategies for action selection in reinforcement learning

  • greedy: the agent always exploits the action with the highest estimated action value.
  • \epsilon-greedy: the greedy action is selected with probability 1-\epsilon, while with probability \epsilon a random action is selected. This helps the agent explore and find the actions with the best values.
  • UCB: this method is based on the "optimism in the face of uncertainty" principle: if we are unsure about something, we optimistically assume it is good. Actions are therefore chosen not only based on the estimated reward, but also based on the uncertainty of that estimate. For each action the agent keeps a confidence interval in which it believes the true mean reward lies, and it optimistically uses the interval's upper bound as the action value, so that actions it is less sure about are explored more.
  • Optimistic initial values: the action-value estimates are initialized to values higher than any realistic reward, so every action looks promising at first and gets tried at least a few times; as real rewards are observed, the estimates settle toward their true values.
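
A minimal sketch of the greedy, \epsilon-greedy and UCB selection rules as Python functions (parameter names and the default values of epsilon and c are assumptions, not values from the course):

```python
import math
import random

def greedy(q_values):
    """Always exploit: pick the action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore a random action, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return greedy(q_values)

def ucb(q_values, counts, t, c=2.0):
    """UCB: score each action by its optimistic upper bound Q(a) + c*sqrt(ln t / N(a)),
    where t is the total number of steps so far and N(a) how often action a was selected."""
    def upper_bound(a):
        if counts[a] == 0:          # untried actions are maximally uncertain: try them first
            return float("inf")
        return q_values[a] + c * math.sqrt(math.log(t) / counts[a])
    return max(range(len(q_values)), key=upper_bound)
```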

Q: Suppose \lambda=0.5 and the following sequence of rewards is received: R_{1}=-1, R_{2}=2, R_{3}=6, R_{4}=3, R_{5}=2, with T=5. What are G_{0}, G_{1}, \dots, G_{5}? Hint: work backwards

Using G_{t} = R_{t+1} + \lambda G_{t+1} and working backwards:

G_{5} = 0
G_{4} = R_{5} + \frac{1}{2} G_{5} = 2
G_{3} = R_{4} + \frac{1}{2} G_{4} = 3 + \frac{1}{2}\cdot 2 = 4
G_{2} = R_{3} + \frac{1}{2} G_{3} = 6 + \frac{1}{2}\cdot 4 = 8
G_{1} = R_{2} + \frac{1}{2} G_{2} = 2 + \frac{1}{2}\cdot 8 = 6
G_{0} = R_{1} + \frac{1}{2} G_{1} = -1 + \frac{1}{2}\cdot 6 = 2
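
The same computation done programmatically, working backwards from G_5 (a quick check, not part of the original exercise):

```python
# G_t = R_{t+1} + lambda * G_{t+1}, computed backwards from G_T = 0.
rewards = [-1, 2, 6, 3, 2]               # R_1 .. R_5
lam = 0.5
G = [0.0] * (len(rewards) + 1)           # G_0 .. G_5, with G_5 = 0
for t in range(len(rewards) - 1, -1, -1):
    G[t] = rewards[t] + lam * G[t + 1]   # rewards[t] is R_{t+1}
print(G)                                 # [2.0, 6.0, 8.0, 4.0, 2.0, 0.0]
```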

Q: Imagine a network of 10 sensor nodes deployed across an area to monitor environmental conditions, such as temperature, humidity, or pollutant levels. Each sensor node has a different, but unknown, data quality score and a battery level that fluctuates due to environmental factors and usage over time. Your goal is to design a strategy that balances exploration and exploitation to maximize cumulative data quality while conserving battery resources.

Actions = {query sensor 1, ..., query sensor 10}

Reward should be a function of data quality and battery level. We consider the data quality dq and the battery level bl as floating-point values between 0 and 1.

R_{t}=\alpha *dq-\beta*bl, with \alpha and \beta being arbitrary parameters that can be set to define the importance of data quality and battery level.

States: one state

Agent: estimates action values with a sample average

Agent's policy: \epsilon-greedy
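
A minimal sketch of this single-state bandit agent (ALPHA, BETA, EPSILON and the placeholder sensor readings are assumptions made for the example):

```python
import random

NUM_SENSORS = 10
ALPHA, BETA, EPSILON = 1.0, 0.5, 0.1        # assumed weights and exploration rate

q_estimates = [0.0] * NUM_SENSORS           # sample-average action values
counts = [0] * NUM_SENSORS

def reward(dq, bl):
    """Reward as defined above: R_t = alpha * dq - beta * bl, with dq, bl in [0, 1]."""
    return ALPHA * dq - BETA * bl

def choose_sensor():
    """epsilon-greedy over the 10 'query sensor i' actions."""
    if random.random() < EPSILON:
        return random.randrange(NUM_SENSORS)
    return max(range(NUM_SENSORS), key=lambda a: q_estimates[a])

def update(sensor, r):
    """Incremental sample-average update: Q <- Q + (r - Q) / N."""
    counts[sensor] += 1
    q_estimates[sensor] += (r - q_estimates[sensor]) / counts[sensor]

# One interaction step; dq and bl would come from the queried sensor in a real deployment.
s = choose_sensor()
dq, bl = random.random(), random.random()   # placeholder readings
update(s, reward(dq, bl))
```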

Q: Explain the Bellman Expectation Equation for the value of a state V^\pi

Basic principle: the value of a state is the expected immediate reward obtained when leaving that state, plus the discounted value of the successor states.

Backup diagram: from state s the policy \pi branches over the possible actions a, and each action branches over the successor states s' according to P_{ss'}^a.

v_{\pi}(s)=\sum_{a \in A}\pi(a|s)q_{\pi}(s, a)=\sum_{a \in A}\pi(a|s)\left( R_{s}^a+\gamma \sum_{s' \in S}P_{ss'}^a v_{\pi}(s') \right)
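
A small numerical check of this equation via iterative policy evaluation, on a made-up 3-state, 2-action MDP with a uniform random policy (all transition probabilities and rewards below are invented for illustration):

```python
import numpy as np

# P[a, s, s'] = transition probability, R[s, a] = expected immediate reward R_s^a.
P = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],   # action 0
    [[0.1, 0.9, 0.0], [0.5, 0.0, 0.5], [0.0, 0.0, 1.0]],   # action 1
])
R = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
pi = np.full((3, 2), 0.5)     # uniform random policy pi(a|s)
gamma = 0.9

# Repeatedly apply the Bellman expectation equation until v stops changing:
# v(s) = sum_a pi(a|s) * (R_s^a + gamma * sum_s' P_ss'^a * v(s'))
v = np.zeros(3)
for _ in range(500):
    q = R + gamma * np.einsum("ast,t->sa", P, v)   # q_pi(s, a) given the current v
    v = (pi * q).sum(axis=1)                       # v_pi(s) = sum_a pi(a|s) q_pi(s, a)
print(v)
```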
