Resilient seeds for batched replication #113
Merged
Prework
Related GitHub issues and pull requests
Summary
In `targets`, target-specific pseudo-random number generator seeds are deterministic and depend on the target names. So far, these default seeds have applied to entire batches of replications in `tarchetypes` target factories like `tar_rep()` and `tar_map_rep()`. As an undesirable consequence, seed assignment changes whenever the batching structure changes (i.e. when `batches` and `reps` change while the total number of replications `batches * reps` remains constant).

This PR assigns a special seed to each replicate. These new seeds depend on the parent target name and the replicate's overall index across all batches. As long as `batches * reps` remains constant, these seeds will not change if you change the batching structure. In other words, `tar_rep(name = x, command = rnorm(1), batches = 100, reps = 1, ...)` now has the same output as `tar_rep(name = x, command = rnorm(1), batches = 10, reps = 10, ...)`. Seeds are available in the output of most target factories through the `"tar_seed"` column.

The affected functions are:
- `tar_rep()`
- `tar_rep2()`
- `tar_map_rep()`
- `tar_map2_count()`
- `tar_map2_size()`
- `tar_render_rep()`
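
The batching invariance described in the summary can be illustrated with a small base R sketch. This is not the code in this PR: `toy_rep_seed()` and `run_batches()` are hypothetical helpers, and the toy hash below merely stands in for whatever seed derivation `tarchetypes` actually uses. The only idea it demonstrates is that a seed computed from the target name and the overall replicate index does not depend on how replicates are grouped into batches.

```r
# Hypothetical helper (not part of targets or tarchetypes): map a target name
# and an overall replicate index to an integer seed with a toy hash.
toy_rep_seed <- function(name, index) {
  codes <- utf8ToInt(paste(name, as.integer(index), sep = "_"))
  # Position-weighted sum of character codes, kept inside the integer range.
  as.integer(sum(codes * seq_along(codes)) %% 2147483647)
}

# Hypothetical stand-in for a batched replication target: every replicate
# sets its own seed and then runs the command rnorm(1).
run_batches <- function(name, batches, reps) {
  out <- vector("list", batches)
  for (batch in seq_len(batches)) {
    # Overall index of each replicate in this batch, counted across batches.
    index <- (batch - 1L) * as.integer(reps) + seq_len(reps)
    seeds <- vapply(index, function(i) toy_rep_seed(name, i), integer(1))
    values <- vapply(
      seeds,
      function(seed) {
        set.seed(seed) # note: this changes the global RNG state
        rnorm(1)
      },
      numeric(1)
    )
    out[[batch]] <- data.frame(
      tar_batch = batch,
      tar_rep = seq_len(reps),
      tar_seed = seeds,
      value = values
    )
  }
  do.call(rbind, out)
}

many_small <- run_batches("x", batches = 100, reps = 1)
few_large <- run_batches("x", batches = 10, reps = 10)

# Same seeds and same simulated values under both batching structures.
same_seeds <- identical(many_small$tar_seed, few_large$tar_seed)
same_values <- identical(many_small$value, few_large$value)
```

Under a per-batch seeding scheme, the second call would draw all ten values of a batch from a single seed, so the two results would generally differ. The column names `tar_batch`, `tar_rep`, and `tar_seed` are used here only to mimic the shape of the output.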
For `tar_map2_count()` and `tar_map2_size()`, it is also possible to generate your own seeds in `command1` and use them in `command2`. Similarly, you can supply seeds in `tar_render_rep()` via the `params` argument and then use them in the R Markdown report. The same applies to `tar_quarto_rep()` with `execute_params`. `tar_quarto_rep()` is ready on the `tarchetypes` side for seed resilience, but Quarto itself does not currently have a flag to set seeds.
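
The pattern of generating seeds upstream and consuming them downstream might look like the following base R sketch. The functions `make_tasks()`, `run_task()`, and `run_rows()` are hypothetical stand-ins for what a user could place in `command1` and `command2`; they are not `tarchetypes` functions, and the column name `my_seed` is an arbitrary choice.

```r
# Hypothetical stand-in for command1: one row per downstream task,
# each carrying a user-chosen seed.
make_tasks <- function(n) {
  data.frame(task = seq_len(n), my_seed = 1000L + seq_len(n))
}

# Hypothetical stand-in for command2: restore the row's seed before
# drawing random numbers, so the result depends only on that row.
run_task <- function(task, my_seed) {
  set.seed(my_seed)
  data.frame(task = task, my_seed = my_seed, value = rnorm(1))
}

# Run every row of a (possibly partial) task table.
run_rows <- function(rows) {
  do.call(rbind, Map(run_task, rows$task, rows$my_seed))
}

tasks <- make_tasks(6)

# Process all rows at once, then again split into two uneven groups.
whole <- run_rows(tasks)
split_up <- rbind(run_rows(tasks[1:2, ]), run_rows(tasks[3:6, ]))

# The grouping of rows does not affect the simulated values.
same_values <- identical(whole$value, split_up$value)
```

Because each row restores its own seed, the values do not depend on which rows happen to be processed together. The same idea would carry over to a parameterized report: pass the seed in as a parameter and call `set.seed()` on it near the top of the document.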