diff --git a/img/learn_to_learn.png b/img/learn_to_learn.png
new file mode 100644
index 0000000..7ad6b90
Binary files /dev/null and b/img/learn_to_learn.png differ
diff --git a/img/mdp_distribution.png b/img/mdp_distribution.png
new file mode 100644
index 0000000..a985e49
Binary files /dev/null and b/img/mdp_distribution.png differ
diff --git a/tutorial.ipynb b/tutorial.ipynb
index 6f0bc61..35e1cae 100644
--- a/tutorial.ipynb
+++ b/tutorial.ipynb
@@ -186,7 +186,19 @@
     "<br>\n",
     "Fixed tasks for evaluation ❗<br>\n",
-    "Randomly sampled tasks from a task distribution for meta-training ❗<br>\n",
     "\n",
     "We generate them from the original, nominal optics, adding a random scaling factor to the quadrupole strengths.\n",
     "\n",
-    "$\\implies$ What is the difference in episode length between the benchmark policy and PPO?<br>\n",
     "$\\implies$ Look at the cumulative episode length, which policy takes longer?<br>\n",
     "$\\implies$ Compare both cumulative rewards, which reward is higher and why?<br>\n",
-    "$\\implies$ Look at the final reward (-10*RMS(BPM readings)) and consider the convergence (in red) and termination conditions mentioned before. What can you say about how the episode was ended?<br>"
+    "$\\implies$ Look at the final reward (-10*RMS(BPM readings)) and consider the successful (in red) and unsuccessful termination conditions mentioned before. What can you say about how the episode was ended?<br>"
    ]
   },
   {
@@ -555,10 +574,14 @@
     "- We have a meta policy $\\phi(\\theta)$, where $\\theta$ are the weights of a neural network. The meta policy starts untrained $\\phi_0$.\n",
     "\n",
-    "We randomly sample a number of tasks (in our case 8 different lattices, called `meta-batch-size` in the code) from a task distribution, each one with its particular initial task policy $\\varphi_{0}^i=\\phi_0$.\n",
+    "\n",
+    "We randomly sample a number of tasks $i$ (in our case $i\\in \\{1,\\dots,8\\}$ different lattices, called `meta-batch-size` in the code) from a task distribution, each one with its particular initial task policy $\\varphi_{0}^i=\\phi_0$.\n",
     "\n",
     "