Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding %doFuture% parallel framework for the sim_gs_n.R code and upda… #249

Merged
merged 7 commits into from
May 28, 2024

Conversation

cmansch
Copy link
Collaborator

@cmansch cmansch commented May 13, 2024

Implementation for sim_gs_n() to have a parallel adaptor similar to sim_fixed_n() (xref: #110).
A seed is now set prior to the function call instead of passed as an argument.
Function now uses %doFuture% to run iterations of n_sim in parallel.

Addresses open issue (xref: #205).

Sequential backend still available (and default) to avoid any delays in development.

Please update "Remove seed argument from sim_gs_n() in (xref: Merck/gsDesign2#288) as part of the implementation.

New pull request created after (xref: #229) merged.
See old PR as (xref: #230)

@jdblischak jdblischak self-requested a review May 14, 2024 01:19
@cmansch cmansch self-assigned this May 14, 2024
R/sim_gs_n.R Show resolved Hide resolved
R/sim_gs_n.R Outdated Show resolved Hide resolved
Removed a commented out line of code that is not needed.
jdblischak

This comment was marked as resolved.

@jdblischak

This comment was marked as resolved.

message("doFuture may exhibit suboptimal performance when using a doParallel backend.")
} else {
message("Backend uses sequential processing.")
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add test cases for both the doFuture and doParallel backends. I'm pretty sure the GitHub Actions machines have 2 cores available. And if we're worried about CRAN, we can always add skip_on_cran() for these tests of the parallelization method

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on adding tests. CRAN enforces the max 2 core policy strictly so to avoid any potential trouble, always using skip_on_cran() is a good idea, just in case someone accidentally added something that used all cores.

@jdblischak
Copy link
Collaborator

Might be worth a brief mention in the vignette on parallelization that now sim_gs_n() supports the same parallelization behavior as sim_fixed_n()

@cmansch

This comment was marked as resolved.

…rror handling to "stop" in sim_fixed_n for parity.
…lobal parameters to avoid note from R CMD check. Adding test as an input to the foreach loop to handle the test in multisession Windows.
@jdblischak

This comment was marked as resolved.

@jdblischak

This comment was marked as resolved.

@jdblischak
Copy link
Collaborator

Any idea why we are now getting the error object 'results' not found? We didn't edit the vignettes, so I assume this is due to the change in the error handling behavior of sim_fixed_n().

Confirmed. When I revert sim_fixed_n() to use .errorhandling = "pass", I can build the vignettes locally.

@jdblischak
Copy link
Collaborator

I confirmed the error handling behavior of sim_gs_n(). It immediately fails when running in parallel and returns a single error, similar to sequential processing. I used Example 8 (multiple tests per cut), which isn't supported yet, and thus should fail.

library("future")
library("simtrial")

source("tests/testthat/helper-sim_gs_n.R")

ia1_test <- list(
 test1 = create_test(wlr, weight = fh(rho = 0, gamma = 0)),
 test2 = create_test(wlr, weight = fh(rho = 0, gamma = 0.5)),
 test3 = create_test(rmst, tau = 20)
)
ia2_test <- list(
 test1 = create_test(wlr, weight = fh(rho = 0, gamma = 0)),
 test2 = create_test(wlr, weight = early_zero(6))
)
fa_test <- list(
 test1 = create_test(wlr, weight = fh(rho = 0, gamma = 0)),
 test3 = create_test(milestone, ms_time = 20)
)

sim_gs_n(
 n_sim = 3,
 sample_size = 400,
 enroll_rate = test_enroll_rate(),
 fail_rate = test_fail_rate(),
 test = list(ia1 = ia1_test, ia2 = ia2_test, fa = fa_test),
 cut = test_cutting()
)
## Backend uses sequential processing.
## Loading required package: foreach
## Error: attempt to apply non-function

plan("multisession")

sim_gs_n(
  n_sim = 3,
  sample_size = 400,
  enroll_rate = test_enroll_rate(),
  fail_rate = test_fail_rate(),
  test = list(ia1 = ia1_test, ia2 = ia2_test, fa = fa_test),
  cut = test_cutting()
)
## Using 16 cores with backend cluster
## Error: attempt to apply non-function

Copy link
Collaborator

@jdblischak jdblischak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Still needs some examples and tests of sim_gs_n() running in parallel.

Also, given that updating sim_fixed_n() is causing problems, maybe we should separate the sim_fixed_n() updates into a separate PR.

@cmansch
Copy link
Collaborator Author

cmansch commented May 17, 2024

I have reverted the sim_fixed_n() error handling and created a new issue.

Updating with a parallel example will come soon.

@cmansch
Copy link
Collaborator Author

cmansch commented May 24, 2024

Commit added with an example using parallel processing that will be skipped by CRAN similar to in sim_fixed_n(). Similarly, a parallel test has been added that performs the logrank test with a multisession backend. The other future backends do not work on all machines and are therefore omitted from tests.

Copy link
Collaborator

@jdblischak jdblischak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @cmansch! I just had one minor comment about the test format, but I don't want to delay merging this PR. We can always address it in a future PR if desired

weight = fh(rho = 0, gamma = 0)
)
plan("sequential")
expected <- data.frame(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hard-coding the expected output, you could instead set the seed again and run the exact same call to sim_gs_n() in sequential mode. That would be more succinct and wouldn't require any manual updating if we update the output of sim_gs_n()

@LittleBeannie
Copy link
Collaborator

Thanks @cmansch for leading this pull request, and I extend my thanks to everyone who has contributed to this exceptional effort. I will proceed with merging it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants