Guidelines for choosing rw.sd values #142
-
I'm not sure how to choose the values for the `rw.sd` argument of `mif2`, nor how to decide which parameters need their random-walk perturbations modified. Many thanks! Marie
-
Good question, @MarieAugerMethe, and another excellent candidate for the FAQ! To explain, it's necessary to give some background on how the algorithm works. Even if you understand this already, the text here can be a rough draft of the FAQ entry that will result. As ever, your feedback will help me help not only you yourself, but other users too, by helping me improve the FAQ and other documentation.

IF2 (implemented as `mif2`) maximizes the likelihood by iterated particle filtering: at each observation time, the parameters are subjected to random perturbations whose magnitudes are controlled by `rw.sd`, and these magnitudes are drawn down gradually as the iterations proceed.

Some insight into the algorithm's effectiveness is afforded by considering that the effect of the added perturbations is to smooth the likelihood surface. This smoothing out of the local structure is a double-edged sword. On the one hand, it makes the large-scale structure of the surface plainer, which is useful when one is far from the MLE. On the other hand, it makes it impossible to find the exact MLE when it is close by.

The key thing to understand is that there are two parameter-space scales at work. First, there is the scale dictated by the distance between your starting guess and the MLE. This can be hard to know a priori, obviously. Second, there is the scale over which the log likelihood function changes appreciably. [Recall that a unit change in the log likelihood is considered "appreciable".] Again, this can be hard to know a priori, and it can be very different in different regions of parameter space. In particular, it can be quite different in the lowlands far from the MLE than it is in the high country near the maximum of the likelihood. Moreover, there is a tension between these scales: if you choose the random-walk intensities too small, it will take more IF2 iterations to reach the MLE; if you choose them too large, the random-walk noise will obscure the fine structure of the likelihood surface.

So, back to your question: what to do? My usual practice is to follow a rule of thumb. Since parameters in the sorts of models I have worked with tend to be rates (or probabilities), it is plausible to imagine that multiplicative perturbations on the order of a few percent to the rates (or odds ratios) will lead to relatively small effects on the model behavior. Of course, this is famously not the case in general! Nevertheless, it suggests perturbing the rates (or odds ratios) on the log scale with random increments on the order of a few percent. The idea here is to err on the side of small perturbations, counting on cheap computing power to perform the IF2 iterations needed to achieve the MLE. At any rate, this is the reasoning behind the usual choice I make of setting the random-walk intensities to a few percent (e.g., 0.02) on the log scale.

The second part of your question has to do with identifying the parameters for which one wants to modify the random perturbations. Commonly, one has some parameters that affect only the initial values (i.e., the values at t = t0) of the latent Markovian state process. Clearly, it is useless to perturb these parameters when t > t0. Indeed, it's worse than useless, since the perturbations will impede the progress of such parameters toward their ML values. Declaring such parameters to be initial value parameters using `ivp` results in perturbations that are only applied at t = t0. More generally, there may be parameters for which perturbations should only be applied at certain times. The `rw.sd` function allows one to specify (via a simple R expression in terms of time t) which parameters fall into this category and when and how much they are to be perturbed.

I hope this helps. As I say, let's work toward refining this advice into a useful contribution to the FAQ!
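To make this concrete, here is a minimal sketch of a `mif2` call, using the Gompertz toy model that ships with pomp. The settings (0.02 intensities, 1000 particles, 50 iterations) are illustrative only, not recommendations:

```r
library(pomp)   # assumes pomp >= 2, which provides gompertz() and rw_sd()

## The built-in Gompertz example has regular parameters r, K, sigma, tau and
## an initial-value parameter X_0; it declares log transformations, so the
## perturbations below act on the log (estimation) scale.
gompertz() |>
  mif2(
    Np = 1000,                  # number of particles
    Nmif = 50,                  # number of IF2 iterations
    cooling.fraction.50 = 0.5,  # perturbations cool to 50% of their initial size after 50 iterations
    rw.sd = rw_sd(
      r = 0.02,                 # a few percent on the log scale
      sigma = 0.02,
      tau = 0.02,
      X_0 = ivp(0.02)           # initial-value parameter: perturbed only at t = t0
    )                           # K is not named, so it is left unperturbed
  ) -> mf
```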
-
Thank you @kingaa! Again, what a fantastic answer. From the paper, I had a similar understanding of IF2, but your description clarifies many points for me, and I had completely missed the information on the scale of the parameters. Just a few things to confirm (both are maybe slight tangents, sorry). The first is more curiosity: my understanding was that the perturbation was the difference between IF2 and IF1, but is that right? A more important question is whether we should set the values in the `rw.sd` on the natural scale or on the estimation scale when parameter transformations are used (I am assuming parameter transformation is the only method available to constrain the parameter space?). Thank you again! Marie
-
No, both IF1 and IF2 use perturbations of the parameters, drawn down gradually. In fact, these are quite similar between the two. The chief differences between IF1 and IF2 are:

- IF1 condenses the information from each filtering pass into a single updated point estimate of the parameters, whereas IF2 carries the whole swarm of perturbed parameter values forward from one iteration to the next;
- IF2 enjoys the stronger theoretical support and, in practice, typically outperforms IF1 (see Ionides et al., 2015, PNAS).
As for your second question: thank you again for bringing up an important point I had forgotten to mention. The parameter transformations take the parameters to and from the "estimation scale", i.e., the parameter space visible to the estimation algorithm, whatever it is. In any of the estimation algorithms in pomp, supplying transformations (via the `partrans` argument) causes the search to be performed on the estimation scale.

However, it is confusing to keep track of two scales. For this reason, pomp allows you, the user, to interact with the parameters on the "natural" or "model" scale, i.e., the scale on which you've written the model. In particular, the parameter values you supply, and those that are returned to you, are expressed on the natural scale.

To answer your question as directly as possible: the `rw.sd` values are interpreted on the estimation scale, since that is the scale on which the perturbations are applied.

As an aside, it would be nice if pomp gave you a way of specifying the `rw.sd` on the natural scale, but at present it does not.

Also, you speculate that parameter transformation is the only method available to enforce constraints on the parameter space. It is the only method for which the user is given facilities, but there is no hindrance to using the barrier method, for example.

By the way, if you can suggest edits to make the help pages more transparent on these points, I would be grateful!
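To make the two scales concrete, here is a small sketch; the parameter names (`beta`, a positive rate; `rho`, a probability) are hypothetical stand-ins for whatever model you are fitting:

```r
library(pomp)

## 'beta' (a positive rate) and 'rho' (a probability) are hypothetical
## parameter names.  parameter_trans() declares how parameters move between
## the natural (model) scale and the estimation scale:
pt <- parameter_trans(log = "beta", logit = "rho")

## Supplied to a pomp object (say 'po') via the partrans argument, this makes
## any estimation algorithm search on (log(beta), logit(rho)).  A call like
##
##   mif2(po, Np = 1000, Nmif = 50, cooling.fraction.50 = 0.5,
##        rw.sd = rw_sd(beta = 0.02, rho = 0.02))
##
## then applies random-walk perturbations with sd 0.02 to log(beta) and to
## logit(rho), i.e., multiplicative perturbations of roughly 2% to beta and
## to the odds ratio rho/(1 - rho), while coef() still reports estimates on
## the natural scale.
```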
-
Ok, got it! Thanks @kingaa! Here are some slight wording changes that I think simplify the explanation (at least to me) and hopefully are still true. I have bolded the changes. One thing I'm not sure how to make clearer is that the perturbation occurs at the time steps of the model (so if I'm modeling the population size of a species every year, you would get different values for the weights of the particles at each year, right?), while the cooling occurs at the iteration level (right?). I tried my best, but maybe it's more complicated now? And actually, I'm not 100% sure whether the starting values of the new iterations are as I described... As a note, in addition to adding this to the FAQ, which would be fantastic, I would maybe add a line on the getting started with pomp page, under the mif2 box, linking to that FAQ page.

IF2 (implemented as `mif2` in pomp) perturbs the parameters with random noise at each time step of the model and cools (i.e., gradually reduces) the magnitude of these perturbations from one iteration to the next, each iteration starting from the parameter values the previous one ended with.

Some insight into the algorithm's effectiveness is afforded by considering that the effect of the added perturbations is to smooth the likelihood surface. This smoothing out of the local structure is a double-edged sword. On the one hand, it makes the large-scale structure of the surface plainer, which is useful when one is far from the MLE. On the other hand, it makes it impossible to find the exact MLE when it is close by.

The key thing to understand is that there are two parameter-space scales at work. First, there is the scale dictated by the distance between your starting guess and the MLE. This can be hard to know a priori, obviously. Second, there is the scale over which the log likelihood function changes appreciably. [Recall that a unit change in the log likelihood is considered "appreciable".] Again, this can be hard to know a priori, and it can be very different in different regions of parameter space. In particular, it can be quite different in the lowlands far from the MLE than it is in the high country near the maximum of the likelihood. Moreover, there is a tension between these scales: if you choose the random-walk intensities too small, it will take more IF2 iterations to reach the MLE; if you choose them too large, the random-walk noise will obscure the fine structure of the likelihood surface.

**So what values should you use for the intensities?** My usual practice is to follow a rule of thumb. Since parameters in the sorts of models I have worked with tend to be rates (or probabilities), it is plausible to imagine that multiplicative perturbations on the order of a few percent to the rates (or odds ratios) will lead to relatively small effects on the model behavior. Of course, this is famously not the case in general! Nevertheless, it suggests perturbing the rates (or odds ratios) on the log scale with random increments on the order of a few percent. The idea here is to err on the side of small perturbations, counting on cheap computing power to perform the IF2 iterations needed to achieve the MLE. At any rate, this is the reasoning behind the usual choice I make of setting the random-walk intensities to a few percent (e.g., 0.02) on the log scale.

**In addition,** one has some parameters that affect only the initial values (i.e., the values at t = t0) of the latent Markovian state process. Clearly, it is useless to perturb these parameters when t > t0. Indeed, it's worse than useless, since the perturbations will impede the progress of such parameters toward their ML values. Declaring such parameters to be initial value parameters using `ivp` results in perturbations that are only applied at t = t0. More generally, there may be parameters for which perturbations should only be applied at certain times. The `rw.sd` function allows one to specify (via a simple R expression in terms of time t) which parameters fall into this category and when and how much they are to be perturbed.

Thanks!! Marie
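To illustrate that last point, here is a sketch of a time-dependent `rw.sd` specification (the parameter names are hypothetical):

```r
library(pomp)   # assumes pomp >= 2, which provides rw_sd() and ivp()

## Within rw_sd(), each expression may be written in terms of the vector of
## observation times, 'time'; the expressions are only evaluated inside mif2().
rw_sd(
  a = 0.02,                          # perturbed at every observation time
  b = ifelse(time < 2000, 0.02, 0),  # perturbed only at times before 2000
  x_0 = ivp(0.02)                    # ivp(s): shorthand for ifelse(time == time[1], s, 0)
)
```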