
Workaround for environment max step limit of 200. #107

Open · wants to merge 1 commit into master
Conversation


@sedand commented Sep 12, 2017

The MountainCar-v0 environment has a default max_episode_steps limit of 200.
See: https://github.com/openai/gym/blob/master/gym/envs/__init__.py
This makes learning nearly impossible if we do not continue playing past 200 steps until a reward is received. (The first episode takes ~4000 steps to complete successfully.)
Hacky solution: Ignore the 'done' flag and use the real "done logic" from mountain_car.py
done = bool(position >= self.goal_position)

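The "real done logic" above can be written as a small helper. This is a sketch of the check, not the author's actual patch; `0.5` is the `goal_position` constant used by MountainCar-v0's `mountain_car.py`:

```python
# Goal position used by MountainCar-v0 (see mountain_car.py in gym).
GOAL_POSITION = 0.5

def reached_goal(position: float) -> bool:
    """The environment's true termination condition, independent of the
    200-step TimeLimit: the car has reached (or passed) the goal."""
    return bool(position >= GOAL_POSITION)

# In the training loop, ignore the wrapper's `done` and use this instead:
#   next_state, reward, done, _ = env.step(action)
#   done = reached_goal(next_state[0])  # position is the first state component
```

With this, an episode only ends when the car actually climbs the hill, regardless of how many steps that takes.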
@dennybritz (Owner)

Looks good, but why did it train without this change if there really is a step limit?

@sedand (Author) commented Sep 13, 2017

Good question; for me it doesn't. With gym commit a3683a7 (Sep 5), episodes break after 200 steps, which produces the following plots:

[plot: ep_limit_200_learn — episode lengths capped at the 200-step limit]

Seems to me like this is also the described default behavior: openai/gym#336
... but this still doesn't explain why it worked/works for you ;)

@sedand (Author) commented Sep 13, 2017

Found it.
Since gym v0.7.4 the environment is returned as a 'TimeLimit' object (v0.7.3 still works).
The gym changelog states some backwards incompatibility issues for v0.7.0, but not that monitor wrappers are used by default. ( https://github.com/openai/gym#what-s-new )
This in turn leads to the step limit enforcement, as now a monitor is attached by default.

This also breaks the DQN Breakout Playground.ipynb on env.get_action_meanings():
AttributeError: 'TimeLimit' object has no attribute 'get_action_meanings'

I've found a simpler workaround for the step limit: override the private variable on the TimeLimit-wrapped env. So I don't think you should merge my pull request.

env = gym.envs.make("MountainCar-v0")
env._max_episode_steps = 4000

You can reproduce it by checking out gym commit 1d85657 (v0.7.3) and verifying that everything works, while it breaks on v0.7.4.
https://github.com/openai/gym/releases
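The enforcement described here can be sketched with a minimal stand-in for gym's TimeLimit wrapper. The names mirror gym's real class, but this is an illustrative mock (with a made-up `NeverDoneEnv`), not gym's actual code:

```python
class TimeLimit:
    """Minimal mock of gym's TimeLimit wrapper (illustrative, not gym's code)."""

    def __init__(self, env, max_episode_steps=200):
        self.env = env                        # the unwrapped environment
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        # The wrapper forces done=True once the step cap is hit, which is
        # why training breaks at 200 steps even before the goal is reached.
        if self._elapsed_steps >= self._max_episode_steps:
            done = True
        return obs, reward, done, info

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()


class NeverDoneEnv:
    """Dummy env that never terminates on its own."""

    def step(self, action):
        return 0.0, -1.0, False, {}

    def reset(self):
        return 0.0


env = TimeLimit(NeverDoneEnv())
env._max_episode_steps = 4000     # the workaround: raise the private limit
env.reset()
done, steps = False, 0
while not done:
    _, _, done, _ = env.step(0)
    steps += 1
print(steps)  # 4000: done now only fires at the raised cap
```

This also shows why accessing the inner env directly (e.g. via `env.env`) sidesteps the limit entirely: the cap lives in the wrapper, not the environment.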

@markusdumke (Contributor)

You can also use env = gym.make("MountainCar-v0").env which has no time limit instead of env = gym.envs.make("MountainCar-v0") (Stackoverflow).

@HardPlant

@markusdumke this worked, thanks!

@ghost commented Oct 28, 2020

Or you can do: [screenshot of a code snippet]
