Dear Robin,
This is not a bug report, but rather a new feature request.
We know that theta is updated after every interaction the agent has with the bandit. What I want to ask is whether it is possible to save the "trained" agent, together with its theta, for later use on another dataset. The idea is that the trained agent then acts as an oracle / ground truth of the environment, and I would like to add a full-information benchmark model based on this oracle. That way, I can see the maximum reward I could theoretically obtain if I initialize my offline evaluation with this oracle, without knowing the ground truth until the end of my simulation.
Basically, to achieve this, I need to save the trained agents with their thetas, break the theta-updating chain, and hold the thetas unchanged when the agents are used on another dataset (see the sketch below).
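To make the idea concrete, here is a minimal Python sketch of what I have in mind. It is hypothetical and not based on the package's actual API: a LinUCB-style agent with a `frozen` flag that breaks the update chain, plus pickling to persist the trained thetas.

```python
# Hypothetical sketch (not the package's real API): a LinUCB-style agent
# whose theta updates can be frozen after training.
import pickle
import numpy as np

class LinUCBAgent:
    def __init__(self, n_features, alpha=1.0):
        self.alpha = alpha
        self.A = np.eye(n_features)     # feature covariance matrix
        self.b = np.zeros(n_features)   # reward-weighted feature sums
        self.frozen = False             # when True, theta stays fixed

    @property
    def theta(self):
        # theta = A^{-1} b
        return np.linalg.solve(self.A, self.b)

    def score(self, x):
        # UCB score: x^T theta + alpha * sqrt(x^T A^{-1} x)
        x = np.asarray(x, dtype=float)
        A_inv = np.linalg.inv(self.A)
        return x @ self.theta + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        if self.frozen:                 # break the theta-updating chain
            return
        x = np.asarray(x, dtype=float)
        self.A += np.outer(x, x)
        self.b += reward * x

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump(self, f)

    @staticmethod
    def load(path):
        with open(path, "rb") as f:
            return pickle.load(f)


# Usage idea: train on the first dataset, save, then reuse as a frozen oracle.
# agent = LinUCBAgent(n_features=5)
# ... run the simulation on dataset 1, calling agent.update(...) ...
# agent.save("trained_agent.pkl")
#
# oracle = LinUCBAgent.load("trained_agent.pkl")
# oracle.frozen = True   # thetas held unchanged on the new dataset
```

Something along these lines (a save/load mechanism plus a way to switch off updating) is what I would need from the library.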
Thank you so much for your help!
Best,
Han