Hi, I'm quite inexperienced regarding Reinforcement Learning so forgive me if my question is trivial :). I have a quick question about the continue predictor.
In a typical Gym environment with an agent following a random policy, I've seen things like
```python
for _ in range(num_episodes):                                    # 1
    # First observation of an episode                            # 2
    obs, info = gym_env.reset()                                  # 3
                                                                 # 4
    done = False                                                 # 5
    while not done:                                              # 6
        action = gym_env.action_space.sample()                   # 7
        observation, reward, done, _, _ = gym_env.step(action)   # 8
```
The continue predictor is supposed to predict whether an episode will terminate or not. The way I see it, for each non-episode-initializing step (lines 7-8) we get (see the small sketch after this list for how I line these up):
- an action, $a_t$
- a reward resulting from the action, $r_t$
- a "next" observation as a result of the action, $x_t$
- a "done" (or, alternatively, continue) flag indicating whether the episode has terminated, $c_t$
My question is: do we use $x_t$ to predict $c_t$? More specifically, does the stochastic posterior incorporate $x_t$ so that the "model state" (concatenation of deterministic state and stochastic state) is used to predict $c_t$?
Another way of asking the question: do we use the observation retrieved at the same step at which we also receive the continue flag to predict that continue flag? I.e., in the line `observation, reward, done, _, _ = gym_env.step(action)`, is `observation` incorporated into the stochastic state, which is then used to help predict `done`?
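To make the question concrete, here is a very rough toy sketch of the structure I have in mind (all names, layer sizes, and the simplification of the posterior to a plain linear layer are my own, not taken from the actual model):

```python
import torch
import torch.nn as nn

# Toy sketch of my mental model of one RSSM-style step (my own names, not the repo's).
# The question is whether the posterior z_t sees x_t, so that the model state
# [h_t, z_t] is what the continue head uses to predict c_t.
class ToyRSSMStep(nn.Module):
    def __init__(self, obs_dim, act_dim, det_dim=64, stoch_dim=16):
        super().__init__()
        self.gru = nn.GRUCell(stoch_dim + act_dim, det_dim)        # deterministic path
        self.posterior = nn.Linear(det_dim + obs_dim, stoch_dim)   # q(z_t | h_t, x_t)?
        self.continue_head = nn.Linear(det_dim + stoch_dim, 1)     # p(c_t | h_t, z_t)?

    def forward(self, h_prev, z_prev, a_prev, x_t):
        # h_t = f(h_{t-1}, z_{t-1}, a_{t-1})
        h_t = self.gru(torch.cat([z_prev, a_prev], dim=-1), h_prev)
        # Does the posterior incorporate x_t like this?
        z_t = self.posterior(torch.cat([h_t, x_t], dim=-1))
        # ...so that the concatenated model state predicts the continue flag?
        c_t_logit = self.continue_head(torch.cat([h_t, z_t], dim=-1))
        return h_t, z_t, c_t_logit
```

The line building `z_t` is exactly the part I'm unsure about: whether $x_t$ enters there, so that the concatenated $(h_t, z_t)$ is what the continue head sees when predicting $c_t$.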
Thanks in advance!