[GSoC-Week2] Batch implementation of environments #145
Work in progress at #146. Right now it is just single-threaded CPU code. Now we will need to think about parallelization. I am not clear on how the whole RL pipeline will actually work with multi-threading / GPU. Can environments run asynchronously within a batch? Does it mean that I am totally confused at this point 🤔
Last week:
I implemented a simple multi-agent environment called CollectGemsUndirectedMultiAgent and tested a random policy on it. I also implemented a lightweight REPL-only workflow for testing this environment. It lets us not only play the environment, but also record these sessions and replay them at arbitrary frame rates, all inside the Julia REPL.
This week:
So far, all the environment structs provide a single instance. This week, I am going to implement a batch version for a grid world environment.
As long as at least some algorithms can train on a batch of environments in parallel (and such algorithms do exist, for example some evolution-based algorithms), it is beneficial to have an abstraction for executing batch environments. A single environment instance can then be obtained by setting the batch size to one (the implementation must ensure that this does not incur a substantial cost over the present single-instance environments). Only once we have such an abstraction can we leverage multi-threading, GPUs, or distributed computing to increase performance via parallelism.
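To make the batch abstraction concrete, here is a minimal sketch of what a batched grid world could look like. Everything in it is an assumption made for illustration: the names BatchGridWorld, num_envs, MOVES and step!, the toy reward rule, and the choice to store per-environment state along the last array dimension are hypothetical and are not taken from the package or from #146.

```julia
# Hypothetical sketch: none of these names come from the actual package.
struct BatchGridWorld
    height::Int
    width::Int
    agent_position::Matrix{Int}  # 2 x num_envs; column i holds (row, col) of environment i
    reward::Vector{Float64}      # one reward slot per environment
end

# Convenience constructor: every agent starts in the top-left cell (1, 1).
function BatchGridWorld(height::Int, width::Int, num_envs::Int)
    return BatchGridWorld(height, width, ones(Int, 2, num_envs), zeros(Float64, num_envs))
end

num_envs(env::BatchGridWorld) = length(env.reward)

# Assumed action encoding: 1 = up, 2 = down, 3 = left, 4 = right.
const MOVES = ((-1, 0), (1, 0), (0, -1), (0, 1))

# Advance a single environment of the batch by one step.
function step!(env::BatchGridWorld, env_id::Int, action::Int)
    d_row, d_col = MOVES[action]
    row = clamp(env.agent_position[1, env_id] + d_row, 1, env.height)
    col = clamp(env.agent_position[2, env_id] + d_col, 1, env.width)
    env.agent_position[1, env_id] = row
    env.agent_position[2, env_id] = col
    # Toy reward rule: 1.0 when the agent is in the bottom-right cell, else 0.0.
    env.reward[env_id] = (row == env.height && col == env.width) ? 1.0 : 0.0
    return nothing
end

# Advance the whole batch, one action per environment.
function step!(env::BatchGridWorld, actions::AbstractVector{Int})
    @assert length(actions) == num_envs(env)
    for env_id in 1:num_envs(env)
        step!(env, env_id, actions[env_id])
    end
    return nothing
end
```

With this layout, a single environment is just `BatchGridWorld(height, width, 1)`, and stepping it goes through the same code path as a larger batch, so the only extra cost is a loop of length one.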
I will start by trying to implement CPU multi-threading for the batch environments, and worry about GPUs later. It is reasonably safe to assume that a good enough multi-threaded CPU implementation of batch environments will be faster than a single-threaded CPU implementation of a single environment. By contrast, more tests are needed before we can confidently justify the potential gains from a GPU implementation of batch grid world environments. One reason is that most grid world environments are already quite cheap to simulate on the CPU, unlike some game-based environments that may require rendering a frame of pixels for each step.
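As a rough illustration of the CPU multi-threading idea, the sketch below steps a toy batch with `Threads.@threads`. The BatchCorridor type and the step_one!, step_serial! and step_threaded! functions are hypothetical names invented for this example. The sketch assumes that each environment only reads and writes its own slot of the batch state, which is what makes the loop iterations independent and free of data races.

```julia
using Base.Threads: @threads

# Hypothetical toy batch: each environment is a 1-D corridor with cells 1..len.
struct BatchCorridor
    len::Int
    position::Vector{Int}  # position[i] is the agent's cell in environment i
end

# Every agent starts in cell 1.
BatchCorridor(len::Int, num_envs::Int) = BatchCorridor(len, ones(Int, num_envs))

# Step environment i; action is -1 (left) or +1 (right).
function step_one!(env::BatchCorridor, i::Int, action::Int)
    env.position[i] = clamp(env.position[i] + action, 1, env.len)
    return nothing
end

# Baseline: step all environments on a single thread.
function step_serial!(env::BatchCorridor, actions::Vector{Int})
    for i in 1:length(env.position)
        step_one!(env, i, actions[i])
    end
    return nothing
end

# Same work, with loop iterations split across the available threads.
# Iteration i only touches position[i], so iterations do not interfere.
function step_threaded!(env::BatchCorridor, actions::Vector{Int})
    @threads for i in 1:length(env.position)
        step_one!(env, i, actions[i])
    end
    return nothing
end
```

Julia has to be started with more than one thread for the threaded version to run in parallel, for example with `julia --threads 4` or by setting the `JULIA_NUM_THREADS` environment variable. For an environment this cheap, the threading overhead may well outweigh the gain at small batch sizes, so whether it pays off is something to measure rather than assume.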
@findmyway @jonathan-laurent feel free to drop your suggestions/comments :)