Truncated signal not treated as end of episode in PPO agent #224

asdfGuest · 2024-11-13T15:25:40Z

Description

I have discovered that the PPO agent does not treat a truncation signal as the end of an episode.
In the current code, the truncation signal only triggers value bootstrapping.
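To make the distinction concrete, here is a minimal sketch of generalized advantage estimation in plain Python. It is not skrl's implementation: the helper `gae`, its `next_values` argument, and the `cut_on_truncation` flag are hypothetical names introduced only for illustration. It assumes that a truncated step should still bootstrap from the value of the next state, while the advantage recursion should stop at any episode boundary, whether terminated or truncated.

```python
def gae(rewards, values, next_values, terminated, truncated,
        gamma=0.99, lam=0.95, cut_on_truncation=True):
    """Generalized advantage estimation over a flat rollout (hypothetical helper).

    next_values[t] is V(s_{t+1}) for the state that actually followed step t,
    i.e. the final observation of the episode when step t ended it.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # Bootstrap from V(s_{t+1}) unless the episode truly terminated.
        bootstrap = 0.0 if terminated[t] else next_values[t]
        delta = rewards[t] + gamma * bootstrap - values[t]
        # Stop the recursion at an episode boundary. With
        # cut_on_truncation=False only termination counts as a boundary,
        # which is the behaviour described in this issue.
        episode_end = terminated[t] or (cut_on_truncation and truncated[t])
        carry = 0.0 if episode_end else running
        running = delta + gamma * lam * carry
        advantages[t] = running
    return advantages


# Two 2-step episodes: the first is truncated, the second terminates.
rewards = [1.0, 1.0, 1.0, 1.0]
zeros = [0.0, 0.0, 0.0, 0.0]
terminated = [False, False, False, True]
truncated = [False, True, False, False]

cut = gae(rewards, zeros, zeros, terminated, truncated, gamma=1.0, lam=1.0)
leak = gae(rewards, zeros, zeros, terminated, truncated, gamma=1.0, lam=1.0,
           cut_on_truncation=False)
print(cut)   # [2.0, 1.0, 2.0, 1.0]
print(leak)  # [4.0, 3.0, 2.0, 1.0]
```

With `cut_on_truncation=False`, the advantages of the first episode include rewards from the second one, which is the cross-episode accumulation this report is about.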

To confirm this, I tested the agent with two custom cartpole environments:

Environment A: sends a termination signal only when the state of the cartpole is unstable.
Environment B: sends a termination signal both when the state is unstable and when a truncation occurs.

Both environments have a very short maximum episode length (10 time steps) so that the effect of truncation is clearly visible.
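For readers who want to reproduce the setup, the difference between the two environments can be sketched as a thin wrapper. This is an illustrative sketch, not the code used for the experiment: `TimeLimitStub`, `TruncationAsTermination`, and `rollout_flags` are hypothetical names, the stub replaces the real cartpole dynamics, and the only assumption about the interface is a Gymnasium-style `step` that returns `(obs, reward, terminated, truncated, info)`.

```python
class TimeLimitStub:
    """Stand-in for environment A: truncates after max_steps (hypothetical)."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0, {}

    def step(self, action):
        self.t += 1
        terminated = False  # the pole never becomes unstable in this stub
        truncated = self.t >= self.max_steps
        return float(self.t), 1.0, terminated, truncated, {}


class TruncationAsTermination:
    """Environment B: also reports termination when a truncation occurs."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward, terminated or truncated, truncated, info


def rollout_flags(env, steps):
    """Collect (terminated, truncated) pairs, resetting at episode ends."""
    env.reset()
    flags = []
    for _ in range(steps):
        _, _, terminated, truncated, _ = env.step(0)
        flags.append((terminated, truncated))
        if terminated or truncated:
            env.reset()
    return flags


flags_a = rollout_flags(TimeLimitStub(max_steps=10), 10)
flags_b = rollout_flags(TruncationAsTermination(TimeLimitStub(max_steps=10)), 10)
print(flags_a[-1])  # (False, True)
print(flags_b[-1])  # (True, True)
```

At the tenth step, A reports only a truncation while B reports both flags, so an agent that resets its return computation only on termination sees an episode boundary in B but not in A.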

I did not actually find a meaningful difference in performance, but the value loss was very different.

(Plot of the value loss: A in purple, B in green.)

As you can see, A has a much larger value loss than B.
I reason that this is because B correctly performs 10-step bootstrapping: the value network propagates slowly and the errors stay small (also considering that the initial output of the network is almost zero).
However, A performs a larger n-step bootstrapping across multiple episodes, leading to relatively larger errors.

What skrl version are you using?

1.3.0

What ML framework/library version are you using?

PyTorch 2.4.0+cu118

Additional system information

Python 3.10.15, Windows 11

asdfGuest added the bug label Nov 13, 2024
