I have discovered that the PPO agent does not treat a truncation signal as the end of an episode.
In the current code, the truncation signal only triggers value bootstrapping.
To confirm this, I tested the agent with two custom cartpole environments:
Environment A: sends a termination signal only when the state of the cartpole is unstable.
Environment B: sends a termination signal both when the state is unstable and when a truncation occurs.
Both environments have a very short maximum number of time steps (10) so that the effect of truncation is clearly visible.
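For reference, environment B could be built from environment A roughly as follows. This is only a minimal sketch, assuming a Gymnasium-style `step` that returns `(obs, reward, terminated, truncated, info)`; `TruncationAsTermination` and `StubTimeLimitEnv` are hypothetical names used for illustration, not the actual environments from the test.

```python
class StubTimeLimitEnv:
    """Hypothetical stand-in for environment A: never terminates, truncates after max_steps."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return 0.0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = False                    # the stub never becomes "unstable"
        truncated = self.t >= self.max_steps  # time limit reached
        return 0.0, 1.0, terminated, truncated, {}


class TruncationAsTermination:
    """Hypothetical wrapper for environment B: also reports a truncation as a termination."""

    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # The truncated flag is kept, so time-limit bootstrapping can still be triggered.
        return obs, reward, terminated or truncated, truncated, info
```

With `max_steps=10`, the wrapped environment returns `terminated=True` and `truncated=True` on the tenth step, while the unwrapped one returns only `truncated=True`.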
I couldn't find a meaningful performance difference, but the value loss was very different (A: purple, B: green).
As you can see, A has a much larger value loss than B.
I reason that this is because A correctly performs 10-step bootstrapping, so propagation through the value network is slow and the errors are smaller (also considering that the initial output of the network is almost zero).
However, B performs a larger n-step bootstrapping across multiple episodes, which leads to relatively larger errors.
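The difference between bootstrapping within one episode and across episode boundaries can be illustrated with a toy advantage computation. This is a minimal sketch of generalized advantage estimation, not skrl's actual implementation; the function `gae` and its `episode_end` argument are assumptions made for illustration. The accumulated advantage is reset only at steps flagged in `episode_end`, so a truncated step that is left unflagged lets the estimate run on into the next episode.

```python
def gae(rewards, values, last_value, episode_end, gamma=0.99, lam=0.95):
    """Toy generalized advantage estimation over a single rollout.

    episode_end[t] is True when step t is the last step of an episode;
    the bootstrap value and the accumulated advantage are cut there.
    """
    n = len(rewards)
    advantages = [0.0] * n
    running = 0.0
    for t in reversed(range(n)):
        next_value = values[t + 1] if t + 1 < n else last_value
        not_end = 0.0 if episode_end[t] else 1.0
        delta = rewards[t] + gamma * not_end * next_value - values[t]
        running = delta + gamma * lam * not_end * running
        advantages[t] = running
    return advantages


# Rollout of 4 steps containing two 2-step episodes, each ended by truncation.
rewards = [1.0, 1.0, 1.0, 1.0]
values = [0.0, 0.0, 0.0, 0.0]  # value network output close to zero at the start
truncated = [False, True, False, True]
terminated = [False, False, False, False]

# Truncation ignored as an episode end: returns accumulate across both episodes.
adv_ignore = gae(rewards, values, 0.0, terminated, gamma=1.0, lam=1.0)
# Truncation treated as an episode end: accumulation stops at each boundary.
adv_cut = gae(rewards, values, 0.0, truncated, gamma=1.0, lam=1.0)

print(adv_ignore)  # [4.0, 3.0, 2.0, 1.0]
print(adv_cut)     # [2.0, 1.0, 2.0, 1.0]
```

With `gamma = lam = 1` and zero value estimates, the first case yields targets that span both episodes, while the second stays within each 2-step episode.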
What skrl version are you using?
1.3.0
What ML framework/library version are you using?
PyTorch 2.4.0+cu118
Additional system information
Python 3.10.15, Windows 11