
Workaround for environment max step limit of 200. #107

Open · wants to merge 1 commit into master
Conversation


@sedand commented Sep 12, 2017

The MountainCar-v0 environment has a default max_episode_steps limit of 200.
See: https://github.com/openai/gym/blob/master/gym/envs/__init__.py
This makes learning nearly impossible if we do not continue playing past 200 steps until a reward is received. (The first episode takes ~4000 steps to complete successfully.)
Hacky solution: Ignore the 'done' flag and use the real "done logic" from mountain_car.py
done = bool(position >= self.goal_position)

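The "real done logic" above can be written as a small helper. This is a sketch of the check, not the author's actual patch; `0.5` is the `goal_position` constant used by MountainCar-v0's `mountain_car.py`:

```python
# Goal position used by MountainCar-v0 (see mountain_car.py in gym).
GOAL_POSITION = 0.5

def reached_goal(position: float) -> bool:
    """The environment's true termination condition, independent of the
    200-step TimeLimit: the car has reached (or passed) the goal."""
    return bool(position >= GOAL_POSITION)

# In the training loop, ignore the wrapper's `done` and use this instead:
#   next_state, reward, done, _ = env.step(action)
#   done = reached_goal(next_state[0])  # position is the first state component
```

With this, an episode only ends when the car actually climbs the hill, regardless of how many steps that takes.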
@dennybritz (Owner)

Looks good, but why did it train without this change if there really is a step limit?

@sedand (Author) commented Sep 13, 2017

Good question; for me it doesn't. With gym commit a3683a7 (Sep 5), episodes break after 200 steps, which produces the following plots:

[plot: ep_limit_200_learn — episode lengths capped at the 200-step limit]

Seems to me like this is also the described default behavior: openai/gym#336
... but this still doesn't explain why it worked/works for you ;)

@sedand (Author) commented Sep 13, 2017

Found it.
Since gym v0.7.4 the environment is returned as a 'TimeLimit' object (v0.7.3 still works).
The gym changelog states some backwards incompatibility issues for v0.7.0, but not that monitor wrappers are used by default. ( https://github.com/openai/gym#what-s-new )
This in turn leads to the step limit enforcement, as now a monitor is attached by default.

This also breaks the DQN Breakout Playground.ipynb on env.get_action_meanings():
AttributeError: 'TimeLimit' object has no attribute 'get_action_meanings'

I've found a simpler workaround for the step limit: override the private variable on the TimeLimit-wrapped env. So I don't think you should merge my pull request.

env = gym.envs.make("MountainCar-v0")
env._max_episode_steps = 4000

You can reproduce it by checking out gym commit 1d85657 (v0.7.3) and verifying that everything works, while it breaks on v0.7.4.
https://github.com/openai/gym/releases
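The enforcement described here can be sketched with a minimal stand-in for gym's TimeLimit wrapper. The names mirror gym's real class, but this is an illustrative mock (with a made-up `NeverDoneEnv`), not gym's actual code:

```python
class TimeLimit:
    """Minimal mock of gym's TimeLimit wrapper (illustrative, not gym's code)."""

    def __init__(self, env, max_episode_steps=200):
        self.env = env                        # the unwrapped environment
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        # The wrapper forces done=True once the step cap is hit, which is
        # why training breaks at 200 steps even before the goal is reached.
        if self._elapsed_steps >= self._max_episode_steps:
            done = True
        return obs, reward, done, info

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()


class NeverDoneEnv:
    """Dummy env that never terminates on its own."""

    def step(self, action):
        return 0.0, -1.0, False, {}

    def reset(self):
        return 0.0


env = TimeLimit(NeverDoneEnv())
env._max_episode_steps = 4000     # the workaround: raise the private limit
env.reset()
done, steps = False, 0
while not done:
    _, _, done, _ = env.step(0)
    steps += 1
print(steps)  # 4000: done now only fires at the raised cap
```

This also shows why accessing the inner env directly (e.g. via `env.env`) sidesteps the limit entirely: the cap lives in the wrapper, not the environment.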

@markusdumke (Contributor)

You can also use env = gym.make("MountainCar-v0").env which has no time limit instead of env = gym.envs.make("MountainCar-v0") (Stackoverflow).

@HardPlant

@markusdumke this worked, thanks!

@ghost commented Oct 28, 2020

Or you can do: [screenshot of a code snippet]
