From d1b9a51ea51f8a6eec3ada1041c54553293fdedc Mon Sep 17 00:00:00 2001 From: cmansch <50928670+cmansch@users.noreply.github.com> Date: Mon, 25 Sep 2023 09:07:53 -0400 Subject: [PATCH 1/4] New vignette for sim_fixed_n() in parallel Adding a new vignette to show users how to set up the backend topologies for parallel runs of sim_fixed_n(). The schema.png is included in the vignette. --- vignettes/parallel.Rmd | 266 +++++++++++++++++++++++++++++++++++++++++ vignettes/schema.png | Bin 0 -> 21668 bytes 2 files changed, 266 insertions(+) create mode 100644 vignettes/parallel.Rmd create mode 100644 vignettes/schema.png diff --git a/vignettes/parallel.Rmd b/vignettes/parallel.Rmd new file mode 100644 index 00000000..308a8044 --- /dev/null +++ b/vignettes/parallel.Rmd @@ -0,0 +1,266 @@ +--- +title: "Simulating time-to-event trials in parallel" +output: rmarkdown::html_vignette +bibliography: simtrial.bib +vignette: "%\\VignetteIndexEntry{Simulating time-to-event trials in parallel} %\\VignetteEngine{knitr::rmarkdown}\n" +--- +

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Overview

This vignette demonstrates the ability to implement `sim_fixed_n()` using user-defined backends to parallelize simulations. We will consider the backends supported by the [`future`](https://future.futureverse.org/) framework.

The backends supported by the future package include:

* `sequential` - default and non-parallel backend.
* `multisession` - uses multiple background R sessions on a single machine.
* `multicore` - uses multiple forked R processes on a single non-Windows machine outside of RStudio.
* `cluster` - supports external R sessions across multiple machines.

You can also choose other backend types supported by additional future extension packages, such as the HPC job scheduler backends from `future.batchtools`.

The function `sim_fixed_n()` provides a simulation workflow for a two-arm trial with a single endpoint.
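In its simplest form, a call might look like the sketch below (a hedged example, not from the vignette itself: it uses only the argument names discussed in this vignette, keeps `n_sim` small for a quick test run, and leaves the remaining arguments at their package defaults):

```{r minimal-call, eval=FALSE}
library(simtrial)

# A small toy run: 10 simulated trials of 500 participants,
# each analyzed once 150 events have accrued
sim_fixed_n(
  n_sim = 10,
  sample_size = 500,
  target_event = 150,
  timing_type = 2 # Time until targeted event count achieved
)
```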
We can vary the parameters of the trial using different functions outlined in the documentation.
This function now provides users the opportunity to implement their simulations using the previously described parallel backends to accelerate the computation.

## Background

Without specifying a backend, `sim_fixed_n()` will execute sequentially.
The sequential execution will run all `n_sim` iterations within the same process or session of R.
In order to execute in parallel, we must define the environment prior to calling the function.
Setting your seed prior to calling the function will ensure that the results are reproducible.

Suppose you want to investigate the duration of a trial under two possible enrollment strategies.
Both enrollment patterns are piecewise, but have varying durations and rates.

```{r libraries, message=FALSE, warning=FALSE}
library(simtrial)
library(tibble)
library(future)
library(doFuture)
```

```{r set-plan-sequential, echo=FALSE}
plan("sequential") # ensure that the backend is sequential
```

```{r enrollments, fig.height=4, fig.width=6, fig.align="center"}
set.seed(1)

n <- 5000

enroll_rate1 <- tibble(rate = c(5, 20, 10), duration = c(100, 150, 150))

enroll_rate2 <- tibble(rate = c(10, 15, 30), duration = c(150, 175, 75))

x1 <- rpw_enroll(n = n, enroll_rate = enroll_rate1)
x2 <- rpw_enroll(n = n, enroll_rate = enroll_rate2)

plot(x1, 1:n, "l", col = "blue",
  xlim = c(0, max(x1, x2)),
  main = "Piecewise enrollments",
  xlab = "Time",
  ylab = "Enrollment"
)
lines(x2, 1:n, col = "orange")
legend(250, 1500, legend = c("Enrollment 1", "Enrollment 2"),
  col = c("blue", "orange"),
  lty = c(1, 1))
```

We see that *Enrollment 2* enrolls individuals more quickly than *Enrollment 1* at onset.
Later, *Enrollment 1* will outpace *Enrollment 2* before eventually being overtaken again.
As such, we want to consider how the duration of the study changes under these enrollments.
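Part of this behavior can be anticipated with a back-of-the-envelope check. The helper below is a sketch written for this vignette (it is not a simtrial function); it inverts the cumulative piecewise rates to find when the *expected* enrollment reaches a target count, assuming enrollment follows the rates exactly:

```{r expected-enrollment, eval=FALSE}
# Expected time to enroll `target` participants under a piecewise model:
# accumulate expected counts per interval, then interpolate linearly
# within the interval where the target is crossed.
expected_enroll_time <- function(enroll_rate, target) {
  cum_n <- cumsum(enroll_rate$rate * enroll_rate$duration)
  cum_t <- cumsum(enroll_rate$duration)
  i <- which(cum_n >= target)[1]
  prev_n <- if (i > 1) cum_n[i - 1] else 0
  prev_t <- if (i > 1) cum_t[i - 1] else 0
  prev_t + (target - prev_n) / enroll_rate$rate[i]
}

expected_enroll_time(enroll_rate1, 3000) # 100 + 2500 / 20 = 225
expected_enroll_time(enroll_rate2, 3000) # 150 + 1500 / 15 = 250
```

So, in expectation, the first strategy reaches 3000 enrolled slightly sooner than the second; the simulated trial durations below also depend on the event process.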
## The sequential run

Naively, we can execute these simulations sequentially.
We set a target of 3000 individuals enrolled in total, with the trial ending after 700 events have been observed.
We use `timing_type = 2` to return the correct trial duration.

```{r confirm-sequential}
set.seed(1)

n_sim <- 200

start_sequential <- proc.time()

seq_result1 <- sim_fixed_n(
  n_sim = n_sim,
  sample_size = 3000,
  target_event = 700,
  enroll_rate = enroll_rate1,
  timing_type = 2 # Time until targeted event count achieved
)

seq_result2 <- sim_fixed_n(
  n_sim = n_sim,
  sample_size = 3000,
  target_event = 700,
  enroll_rate = enroll_rate2,
  timing_type = 2 # Time until targeted event count achieved
)

duration_sequential <- proc.time() - start_sequential
```

A message automatically appears in the console that indicates what backend is being used for processing.

The calls to `proc.time()` allow us to evaluate the computation time of these procedures.
This function provides three outputs; we will focus on the user and elapsed times.
User time represents the CPU time spent evaluating the function, and elapsed time represents the "wall clock" time the end user spends waiting for the results.

```{r sequential-time}
print(duration_sequential)
```

We can see that the CPU time is `r sprintf("%.2f", duration_sequential[[1]])` seconds and the elapsed time is `r sprintf("%.2f", duration_sequential[[3]])` seconds.
These provide our baseline for the computation time.

As you may have anticipated, for this lower number of events, enrollment 2 has a shorter average duration
(`r sprintf("%.1f", mean(seq_result2$duration))`)
than enrollment 1
(`r sprintf("%.1f", mean(seq_result1$duration))`).

We also see that there is a distinction between the duration of the study under the proposed enrollment strategies.
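When budgeting larger runs, it can help to convert the timing above into a rough per-trial cost. A small sketch (the factor of two reflects the two `sim_fixed_n()` calls of `n_sim` trials each):

```{r per-trial-cost, eval=FALSE}
# Rough wall-clock seconds per simulated trial across the two calls above
duration_sequential[["elapsed"]] / (2 * n_sim)
```

Multiplying this figure by a planned number of simulations gives a crude estimate of the sequential run time to expect.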
```{r sequential-display-results, eval=FALSE, echo=FALSE}
seq_result1 %>% head(5) %>% kable(digits = 2)
seq_result2 %>% head(5) %>% kable(digits = 2)
```

## Setting up a parallel backend

If we instead wanted to run more simulations for each enrollment, we could expect the time needed to run our simulations to increase.
As we vary and increase the number of parameter inputs that we consider, we expect the simulation process to take longer still.
To help combat the growing computational burden, we can run these simulations in parallel using the `multisession` backend available to us in `plan()`.

The multisession backend will automatically use all available cores by default; the number of available cores can be checked with `parallelly::availableCores()` (re-exported by the future package) and adjusted through the `workers` argument.
To initialize our backend, we change our plan.

```{r multisession}
plan(multisession, workers = availableCores())
```

## Execution in parallel

Once we have configured the backend details, we can execute the same code as before to automatically distribute the `n_sim` simulations across the available cores.

```{r confirm-multisession}
set.seed(1)

start_sequential <- proc.time()

seq_result1m <- sim_fixed_n(
  n_sim = n_sim,
  sample_size = 3000,
  target_event = 700,
  enroll_rate = enroll_rate1,
  timing_type = 2 # Time until targeted event count achieved
)

seq_result2m <- sim_fixed_n(
  n_sim = n_sim,
  sample_size = 3000,
  target_event = 700,
  enroll_rate = enroll_rate2,
  timing_type = 2 # Time until targeted event count achieved
)

duration_sequential <- proc.time() - start_sequential
```

```{r time-parallel}
print(duration_sequential)
```

We can see that the CPU time is `r sprintf("%.2f", duration_sequential[[1]])` seconds and the elapsed time is `r sprintf("%.2f", duration_sequential[[3]])` seconds.
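To quantify the benefit, the two elapsed times can be compared directly. Note that the chunks above reuse the name `duration_sequential` for both runs, so a direct comparison requires saving the two timings under separate names first; a sketch with the hypothetical names `t_seq` and `t_par`:

```{r speedup, eval=FALSE}
# Hypothetical objects: t_seq and t_par are the proc.time() differences
# captured from the sequential and multisession runs, respectively
t_seq[["elapsed"]] / t_par[["elapsed"]] # values above 1 indicate a speedup
```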
The user time here appears to be drastically reduced because of how R keeps track of time; only the time used by the parent process, not the child processes, is reported as user time.
Therefore, we compare the elapsed time to see the real-world impact of the parallelization.

To change the implementation back to a sequential backend, we simply reset the plan.

```{r plan-sequential}
plan(sequential)
```

Because we set the same seed, we can also verify that the simulation results are identical and that the backend type does not affect the results. Below, it is clear that the results from our sequential and multisession backends match completely.

```{r compare-results}
sum(seq_result1 != seq_result1m)
sum(seq_result2 != seq_result2m)
```

*Note:* A parallel implementation may not always be faster than a serial implementation.
If there is substantial overhead associated with executing in parallel, sequential evaluation may be faster.
For a low number of simulations or available cores, it may be preferable to continue computation in serial rather than parallel.
We leave it to the end user to determine this difference based on the resources available to them.

## A nested parallel example

We provide an additional example using a nested parallel structure for users with more extensive resources, such as high-performance computing clusters, available to them.
Because these resources are not commonly available, we do not execute the code below in this vignette.
Consider that you have two accessible nodes, each with three cores (shown in the diagram below).

![Available Resource Schematic](schema.png){width=90%}

Ideally, all available resources will be used when executing the simulations.
To do this, we need to correctly define our backend using `plan()` and run the same code as previously.
The different structures, or topologies, for a backend can be configured; a more in-depth explanation is given in the [future topologies vignette](https://future.futureverse.org/articles/future-3-topologies.html).
Our example below closely follows theirs.

In the snippet below, we consider the two nodes named `n1` and `n2` and create a function to select the number of cores to use on those named nodes.
While trivial here, a courteous user of shared machines would specify fewer than all available cores, and can do so by modifying the code below.
We then implement our backend using a list that follows the hierarchy of the available resources.

```{r nested-topology, eval=FALSE}
nodes <- c("n1", "n2")
customCores <- function() {
  switch(Sys.info()[["nodename"]],
    "n1" = 3L, # Modify here for number of cores on node1
    "n2" = 3L, # Modify here for number of cores on node2
    ## Default:
    availableCores())
}
plan(list(
  tweak(cluster, workers = nodes),
  tweak(multisession, workers = customCores)
))
```

The function `tweak()` is necessary to override future's built-in protection against nested parallelism, which is meant to help avoid overloading one's resources by errantly starting too many processes.
Because of the need to tweak backends, the message echoed to the console for nested backends reflects the highest level of the nested hierarchy.

With the backend in place, we can then run the identical code from before, using all available resources and returning the same results as before.
```{r confirm-cluster, eval=FALSE}
set.seed(1)

enroll_rates <- list(enroll_rate1, enroll_rate2)

seq_resultc <- foreach::foreach(
  i = 1:2,
  .combine = "list",
  .options.future = list(seed = TRUE)
) %dofuture% {
  sim_fixed_n(
    n_sim = n_sim,
    sample_size = 3000,
    target_event = 700,
    enroll_rate = enroll_rates[[i]],
    timing_type = 2 # Time until targeted event count achieved
  )
}
```

Then, we reset the `plan` to sequential to avoid accidentally continuing to execute later calls within these resources.

```{r plan-sequential2, eval=FALSE}
plan(sequential)
```
diff --git a/vignettes/schema.png b/vignettes/schema.png new file mode 100644 index 0000000000000000000000000000000000000000..0ff12a99caa2dffc8d9b14fd799dd7f911393f52 GIT binary patch literal 21668
z;B)Nk{rvq4QN^=o<&v~1k~mwFWI4%da%W@SaN}47w+6v5cx|%E1itOL9u++z{GEmJ zm8HOzGJ6-kUEqKZWqpOYHMl5nL$xHq>zB+NM>tSK_k!&RSh zy;A>dho+D|F>s<7od)0LG@kG`dwd@qExgIXmtz`KTvU`oQF){t$6H}P;)rCbpb}{X z5$aNQNaSF@)Q2cl^iF=M3mV@YzFLc4tA#COr@VQS!H#F&ii(TRz{bq)4&AqJUw3<| zoJo{AHz-DQA}S7Z%@5kM?jF*s)8NAL&)nE2=jNlz6~5NtU!Mr-7i|D~%Uy^HNWUcS zaKZ&t(F?gKW5S%Yw`qB-g+AzKF(9p)=w17=Hg0f4;0 z;Q~r@K5LQa0XiIIkdU61cAANaX))$NQoTFuBXn^a`A5SJl1j)U zSJl;}TQ&#J*7%?+y!@v$)K0&HqDomGm)}b)de5h zr~}hIt^j*xacRkfDWzQgJHe0^xp8j06l#Wdy@eeK|Nf_u&41HRf2uYN6R&f)wFzC+ z*Cv1bIDP5TrF=RB89EsHP-sx(z8x*UlpiXlFQIFbARQ!g-wcPo6txsbhMn=lLbDQ6 zD=Vwb2M&Zj)EE>I22*2aUs+Y9WLO3c)m7R>q+OG`_V^;>w~ zn8AAUGq6c9S7wKG7w(ha=v%>G%&m8geatzpAR|51oIzWBrRExPzFj*cP~Fhk4?N7;$><|jhP(QnkPVo?{T zmX?+Z=SJVcX(ZMz_tTpc!^eYc>luC5%!j4g}6asNfjAAiSCb>)%e*|QaI zRS-mtZ%~8;(|Ee;O~3o5BSVmZ2;7+Vjt0HFb5xFH=uPj~F^!*_4SfV*L@uc@w<}kw zP5MFH*OnP;I53`^CfKV#Ca0&(V3Q?RGl0~NuQyw4twJY}N%FZ--3Dc?yT9KWB;6p! zj!_RA6frnT&FU0`3j00|SM~c-3;?BDiX>MPLCU}6#wLVZHLdX)xpi?00J{q-<^jE( zFyRUIn{{1aMQ?BK3AoSU>N4XE!O;6>*|GOLsKURs99deL8Yt`l$5{zD`^XFjto^an zO)1l7RRg|otPbLmMzFJyIJ;wmxu3UY#yt=vS#M76!jKcu!M2NpUV9&jbBLm~6Si)! 
z3kGYVWLOS6DRG&xReSx|z9MSoU5=if?~g3O63ay39Kvb&jUJhQ5B2sw3ag8ZX|11D zHNhS|w@pd!2KvS}!O&~3SuuCsY*Wj0>g(%E&$SQ{5kYkmab!a;H|Paiqf5MaF9^YU zMd0k7HXDwBD^Fq6*4FO+@ZpqiaBvz-2=U;YF}d<$yT5#~`K{$6@$i=p5nS-8;>V9I;SnU0WyWL=Ncmm`sO$i^ z(_i0CaC1Q%{@lTok^ve^f{knXj2dxboZ--mTh3445@uJ$^XJwBwf^rknUPGoW<#m7 zgSm^hVFNO8>k(1w_q8$H#mx=%NBr7@A}OjfCbfA~+v1f2=kqKrEb;+${oc4PT!OF* z7&(VWDCD*7?2~uFA7j%ZaQu>kx$9t%J#K(xw}2Tc*(4038dl=5b+tW63&HMfHoS6_ zPM+h&ljHS|ZR#vtU0ngZAoB-8+dvT-;_dIP#l*z?vSeL87PLYfw?6$DWPvV|#h{3jQ&Ry;BEstHv=>=1GTPKqf-Jnsd0PMs!MZgYA`7~RgM*{v%gwpa z&v2-zuw(b!iuzU39WL8#|i>Ie_1vU-6=Qe0(%rawdQ;5z{A;fBUdHESw;{pKM36*bm)*ez|)1h%4 zdQSIN02Zn-M2EMCx?jNC+q=85QC$YJx!=+4!O`HZi&Vbtiu%+N#npw*1Y864@(#uZ zPQ@@va0MPeEdUu1Fr&y)niySckQ;wcMParB4nnaX44I2+s}ahY;6CijF8CeOL= z?Jnof3s$t9m%C{0>-KX{_h((ulA_^_2@fio-qtHjU;QI-UoQ{ldg&sZLK1eK2M*sJ zNK$VxDZF#%4(WZCw)08%8I6g?aE6KuhiWVpI`yBRJj>!UCMz&NS#`BVaY;$8sltyO z=E1SCJWFe9z2e4xd>FpQXZ(~)dOIDn)4ox0Y*{2uZT;9`c%;0l$_%!Cj_IBsIU|9U z^%~2Gn3cp1%?qFI^bQX0<|;kcoLKZSjxV?8vvmWu((?;7ctMR+@E*yub`J&x7DRGc5KiOHFn>ckY2{NMyjnc|b>&T&1A<9eBQ*Hpf4o{WT=`>W*f^XcF1IUy&hE*R`hmA zKuh9br3U`u2o9Fpc4BZooNt}KZts9$=xic@R;cSn`1>}#OA94oDlnUq zl66S{ouwuKz@V4}#y(f&Hc$XC^Aywl88BfAb$j*Fi2$BXmS`=_zLCr|LlFK@dqoiM zn=MqP)iV1Pke6<;@@M@wPoF)LHxiHHHsC#6+Re7+m40wKC~`OP;h4AFXh-_Vk`9M3 zIxkQ|DOCV%08e;?knvA_eTAlGW+sO*eqB$ST}%K8bSH`!en|fE5IYHu-(^8|ENo4T zOv(7A7bxh{1vxbCvQvEcSdFLLcl-4hdG;0g;o;#Oi4ScCU!PIdEeM*~K0~YzS@Uy& z(8G^;7OY^33XCeWIqHg;XJ@N!i)OTNGli2Y7nJZcWEPw!+%=RWgrqOm5jCg8jNb@Q zYQt~6=(wdsgBSs95{|@i-#OF}ti?=a11L}VecSbcq;qJVpyc`N31Z+L&#p!AS-o5* zkAJ#~Dt`*k3`aKj?DlPHszc*H2f0<5UHw)MeYW7r{lsb52C;!MA@K-;uwsn2&zO9U zMjqwD)&--DsmOWhcEUipks96SnR{T&%&4{sfw3oe zA0!WK^{@_gyyc((r#Bh!&X_)2(4vm-wZGyHfCa%)nhn(q#He31pkd=-v&WB2kN(IR z)?l;BbWN*5a&d31tX!0Q_wL`O z`MwaQKo|9uG}Ej7X-fVQb&9EToNB7ghHv+7x|>IAK|w*?B$A7Z`e*bW zK2j!Zp+LNzylS56`T-W$M8 z!IB%J9tafPx*48XWyMep2~g<_0sgdwDK-2;3LH{9bC9EFqN1X@-@i9@QQw1pKngXp z5jft1UVI;z1(ACNO-)TMSFR}ho-8wXdj$IejYp;WV9=CIFqNqBQtZc2MdJWw&^E_R 
z+&OGkfLRX5cyKTwMka)+T90pkR+N_Bu0+rwX`(IqpV`z0R|n3I8RynjR+>WeHvT|}tg=H#`H~=5t;RoSG6hoe2 zV%UMH2leYCl2!|xcFcf&TEC7b5f>H~%18WL76Fm~n}uQ-5Wu$GBZhQik4^1IK9G0$>^&2Z(@HDqpnMQhBy^gp}F&*wUn z(gS4?WIadkgx=ib=;CA_8hyA>Nr|FgF#P$>M9oV7sHzzmw=oyi-%K4NO~?Jpn6NNm zidiWS=nMqqB2e3LvtyC-r>kzX1B6hz{zeW@RLbzyUrYl0xR z120A#32nTy)ouc7mLF47`sesrJwJW3^G814wof3qb z3t%(jt?|RRxBVDR_y3Z=xVQ*FJ5$Veg3@B|0(uPMd3eR673 zwlhbk9%^VfR<|})+kL5|F4=MbQfGQAPH z@LB}-NZiq6iN)?I&j+dAXT zSuddbpr-HBCnFZfBGy%X2`rrRDC6_pK!hPNi>1$J*tV&{U4NTGe~`_we+?CLIN7}MIr zeI4=`&ZsoQ+ffw7iRgoN+1+}`Yp2IBpW!H!v(yLC* z%FnUgktR`qYMb5*ey02RYYb9nVOw3WcXicQQA09i1|M9&58M9y%5Jv`d*#SOUVfyL z-c$176rWlUtakjpngwJOl7QjiGz-YpdRV4b-2n_1VmaDZ@qyR|dB}ncp?53YgS#K7 zlc>;)2jr=r6-ek^5Mx1&@5bQO z3(6_Oa?KBugg6(fL&mG!XPE} zt1d3ST)rC*l^Kn4z&4Qp%I9nL=^%feY!P6ClzzVE>%l{P-Gyc*g|(kV2(SlRCuc|A zRFEhwFD{I}!x@TaYM>$dRfZWmtRINlQX(gmmxt%s=lfKpgErCg6WS_jUoAk$O}Fle z7_XJ^9dzjs=FC3L=H47(>**8(Ls8k+XbMrCuMr)&6^On4b23+a0ZC6C&;2OgR`dfi zRHJ`Mk!ox?hwG(ZP{P!7-I9%CH!*aO*gS?&eX|mupxMv%vR=M?zaG^M>gJ?q%z6{1 zsKQcgayM6x=<_~QGaK^k1Btke6Ec`+4A(Vvtx}S;#61B zsNhFYr&cvO&HJpr8>-7+!@(|14b2%y$WC zceKxE8KmsRCcQ3A_utU~Yv@(~@>S&AfwIz{p z0qmo*nm^b^IXsY|q=(@OF>TLu!dADFo z(RQnhXDI~?+Ysi%Ii&6fQ+!f+B8C}0yiz|VC$q)%qF}Um`IXR-!?w3+#f%6vMP>Fl zbX}c|B`pUqb{_y#7y1)JGq)hDbLBN)zC|M0^*0Bf? 
zDKIYHv*NGj-eHIq%h`9$lQ9xC;HR8VdTZ^7P3!A>V1|i($`W0(G5j$9-TV0^nb-9P z`36=!6UdmU5Z_Z?6dvOxNr$WRLsT;X^115nY;hXmb1AQz(PGoa9|kqj?Z?@lF#BFw z)0mZ$sL|giBk$$re$$35-=-=>q9jIbeYDV8auNS-tb#h~J*HTy0 zx-#*9=ip9Kqnh7Hz zy*Ch!B4BTDV&9rbdF|RA@Z{=ZLt*!nc z8D3u!`cnO@J~=&C_;b~8mKDa}=?0EGkhXZYYuYtw+RmCleiQD!H~J+Yzv4z_A4kad zTJ`HZE-La#&?)coey3u#S?Q~ZqW!iBkN`R~JKa-!jn7*XSz#z|92~S4;k_8Zj%`qT z$k~Ke>ukhta28sz{6tH@7ni9ruLG#4^`Ev0q>INajB5*0ZDSU5k=9T*t>Er0bW>@_ zBb(>?u_Ey|QP+4bbKFO*FV7I8EQ4Hqpqc*6T)LzEyi$v3ht9A;A@A3`h1>t#DeeN~ zl0VVUGkPo~x+G-sQuk8*Qe9+^m}Ct280`FRrN5ryX1?VOWo7*d4>7-Il@2oO&LD5i{iCf7c;w``LKh7hCV)Yu<^^# z%Q%xLU2lW%*O6C6r|}!1xg82*z!=e&eSIr>`}_68dxa$KOYeV%%OmFjY8fuSq65^s4wAYg z#l<5{T%_o@xeI?+lr*h0R2qnDb;Lr|)v>Jdk;_9x4DzIT(a}Y}vF^C6A2i_D`|{R_l|I5)EVYYN{|z2YJL;Mi>*69n5>OS~`dPw71!&QklO zpM47=u4U}LW(XKj_z#!XmLv|k+&;k^4$T7tp@Q3O0=``*ap9L+IvKwhhYS8K&a5d+ zwbP~#)qm>#TDvMV?ZQS}kZpRBZrD!Wjov1O9XX?Ml(D!1P_}*X2>?#@P7B?xpS@FJ zekWhk{xos2uL>jsr6VIiwoc|~^uyXiu(eBWA&p5tQFksNLhvChVVWG{oNcTW>xYI9 zv!|=3h)}1938ZqcNfHNaAjLoO5hU5#T%ebrImspyPSP8QHO5Nsr9x7Q^40*1PiLF@ zd+xxaX;iAWl>tudG^ZX0zYsH_VRf>O7K_?vGHX`F`*pKqv_6XAb9&VH&-?Aa^hd%o z3TWVP(;t$E`7PxNx#~9d=i>3pQFxu=&-PG)d}dxcwTV_cNIf*`U&C1dwP6_}SI@xK zgfbgM6z^^w+3ssyoN=zULRHoAl2D$DeB8bpuQg3@S>w}}5<*U@)+`JO zt%dckksd|Qc{l6a*=D+CY=6+O!#TcWWMrfxJkbIigx&xofJU3GM4X%FVH1rNhLVJR zwg0a;<<DD{H2N!1%iq6#wG*a)gf95mXzrkuk-OsU8Pz(SU#h)D zDr%xagC~lsAXMenO45z=Iz-kOz@uu;iG)X(A9GM_vAg*J9J)n>rjS^B27cmWiJ4i6 zjXro3a4`k|8tvpEXZ#qlJAismQc*kymB_G4ZS2B?X&N zc<9G}_7VT(f@+a@&~Wc~b#ZYqK|mX}zQkq<3GL*#2ziuO21T!S3BH1tD;_T_EHn+I z$5Lq=ib1_X_2Cr;DH}F?^YIjR^y62*%i%31zTX|^e#HAV)xG|nC88!7P9R6oRUzE? 
zh~p(^Ym+U}3f|M*hBo%c_^%Xz>wqX8LQ=#f&uVEJQXK$@^*W(wA;Kx{EStC&tF{Pv zw!*OWmtjR2K~vS`*V71)QlM%dx4TGK)%ic~{ifj90K3H)_h7+_np)^>@kO&E0{$>f zH)FR0TGn@ec0xmj({=MZxMR&o~(dPr{NUNCr$B$~kHv~sUI1JyN9k|$8xEKIsl6E?{~-YiPGx{A9}rDWVo zp0;sa#-DxN({fdrBCz;adH{Diyi zHPpW((6)!f=HN^I;Zr=P3lmR`dgral1-X5BZn4>3)+y{-O_@KnzU!+agk@I(>7@?b z{#0nf^2BzQ1vz1sJ7pVj=uy&v!A`{`T_jP;o3VlQ_?SOh0l<1f=t+7^t>!PizuaH| zufuuAC?JD)YfY$BRBFdjIlAR?Q4M@e&3#L?DTigB?i24Y8^OdtEc%6Av?#W|FZ`(= z4HW)^OBnhTciyu&rB35s^TSRF^z)8%!LSu_kf_kwe(I{j+vV{BU$9_5a%{s+vX_IE zNfL6JLSv`@VE2ue&iigS9PdBea@wO*bhuM?b#E{1XmD(4k;!cLIcBPy6){WD};v z%J2~3pb1$|H#*3pve_ksl^wQ!&&}xcomT3Q7UJjmyFs$e0cL$r*>}%kBnH=Af)Dw) ziRBf(pS*HH>DB0~E4a8rmr>^-^Z9q6W7}4$A0mBF@x-mS33-h2Ol2Sb=>zPzX67e* zilKMLha*bay4owdtoKrXkuo`QQu+=RcDCi!ZlqPv!Jyg4BU7$)%q|-FaXx9cSla&#Ai8e!WcEL`0`zT}i1e zY*<5RBq;0s^hu|9wtOT&7-Rqb8#mP_ZR^WZM92!ODeFe&iurO3c@D7~UWkp0gF3(o z?{6=zLl((1a`?K1^G_@L_wRL(9*^-4+4n4C7;7Rqre3`Gv!b>(wUXfN<3l)fd9s-a zY2bILYgUW4xn&#=6|jd{iAa&No7-cE!}l}v+Pk|?+u{N>#Q!qy7rt?l<#0y1@|9Vk z3VN<-n0B<6D>A0JU$X3j6HceHWqzT{cSS%V=0g9I=7;Bh9->)U*ts+g{!Nb}@%`p( zb7;-;=fMtPFAaY*p$(iHH5q^^(^iOk=SLy$JdR(sa3rsFVO%|W))tTU8)6*EvAFfn zK!ph+jxs({6c)CcE8Y<}of~GLG_mn+MC7qTZu)||!N$B-4}K|e)KJcm+}DMvOIw&W z+$e9cm>ziNf$?N%bB9vyMi*pY(^ga}W^T@E@YGaR`ZT2 zdFWKKLlslf!Hk5S;+9=>eOVqsnDBMEiLpihlcLWH#+H_0M-zQFVS4P!sqHc>vkG2c zM#Dmm5-lob{|VD>uHcBKS_iveA?)Q_+*byYhwcSt7`!j8b#G;Ulj8h2&fF(hkBawH z|M27D;%v@g!D~-yVE3fiEXX>gUk|O{AW!F-jYKkqsCCpudVX%Jh#P02I^K~G@aI|o>k>%HMYdxfcicFTfMwxREs{xOxX`7`54XAx4JBaLSif&l`{4O-v##e4?ccQ?I&dRP{I_# zao$C0Yozy(v{5v)+jZfhrA^>?_B@Q&MNcZ~IJWJjPBoy#R4AO6AlNPe{@61%=UCwd z`nF_}8I+A9G*b^!$vc$NSlx+qUYWg%PqXRfikt|&p)0mHn+r-ezB@bLYU$mr2T14H zr1wTF@#>l(tD=T<`Y6EKdV%j$m$_AXY1FwjAE)!w5Y3d=s5&2jo}ko3qgA_Qsuiv-Hl(%SG=CML+gW6N8&CB4p*{c$ZJ7E`@}F+-QbTOX*4^@<51B7nj5-T*r9Dx?UonKan`C`fwG38Ty$MVLvveqX+S$aCnu+t&1ab}CRKkvF8fKy)?iB^Fxn5k z9zYXRm*)TL&p$iu-AwLgj}{`UrfzzlQ=|*B|Ne`$U0Fs$zL%poMl)aISq%%d*+N>ZsNI0kUEmBk4k=8C8 zkior?mOhCFep0kVuxZ-z#&>s0j{65s&WxHQ;q6K8?(UEhc8rnVUx8mEt6tfPWWjr# 
zP?`Saetj;pEGj_#_fq!`GnV*snma1Y^Tewz*jdFo&K}_0A46-`9$g^Wj;;qE#Ymuf~x!q_lYvINUW@FUR%`$8GCz{e` znST_>X;70Lk@_k-JG;KIt;~1BVBf!+4n<)w;}0$BMTq{h3X}%R8PTy4&3!yztW!$@ zMA}-UI^}wz8)HTfxzKx#+wfj(@sx~9*wR|a>{hnygo15S;Fr{`;CJj3@02$|oXcmZ zHO8`n=9`!&A6L||A1FfT>H?Ga*{E$Hmn#aZr>k5S)gZ6+5-HbJb}md@kAfUs@*)FL zU~sCWUY&&FE$xS7HQGsCpgegACB7pS44~-u5FBX+|zsVyH zOfS9O3dlr9f{jU6+_!?Utn)$3?;h~->lavQyPo>nxeGwgmvhh7o8-qUEocws-kqJ! zaWgvd?`AdWO(^E#&7wOm#1;K;iZG|^^k(MmWzmtqpzG@E=e#aL;EvlW-xS%MP^oawD4v+qxt6(yPJ1hAjs1p zH>#j$KM}Y$t13q(D#FVeA6n-=&Y4Npqv|l~YI-tBJ#5c;<80OYM>zujPK931DYTvq zq*=`mdM76Vi+OKHwslybG_}<68|P@MBfbuGo7qByC$MaEa~sks!X5~5mtpagePjHa zyOOz_%#t}cZRY~TNXN_&Xjk0b^kl;NCzbFat%U`vNGToIeara_I$idpEBzmJssFqJ z_k3q$meTi~*baGaB=M+@O2cMR&Mu%7-%>scCF^oJ(W`s0;;dB5ZEYutjpds*7csB4 zlQXek&A<2Gt=~E>3-Wh5pLlq9@)%y3H-}c0m!H0L;MqAaB@VJlUgYtXZ4I0OtF*1# z_GL1HZBVlT^c<}b6NrsI`NN;>W-EFqh)m3xCK&M-zkR5^6XO>Pl+id`EUaU;bhGZ&0o#t^#$;Xs0yngY~exx3`fr462&d1jjC z_O~jrVa0bFZn*V7yKRYN?Rhk1t0zv4}1oSBqco~`Pd@q3H9efQSza!1=j zYihkm4Sb=pPPjJ1dh^JoUXZ9J^%KN5cT&Ai;jF%v-cZCi54kyS$WwJ@zl?msuUNvU zV>F;=D@&>U($5=r9*W*aKa}7+RU5){n;*PQs05w6jfiQa1)pEFNx!4?pg8SuYsplM zm6+982SwiXxxFL&Qk-`2x2<)I4!k`!*N}#!L3f^=JeUmZG6V}qb zqtLEcybr-bEsioPxp4&Y8#Q52=zxYzel93(kf@HFF&yetWa>h?4pN!NKck9ZQ%u0B z?MHJMGOlBg!$`k2gs6o=`<)_&tWnpH*GE6Z>6s_Q4^`zSQl7Ih-^eb(NF8KJ6r(~l z+VKHTnULYbT2wuIXj!9VdG_zm=8TS$-!+>{Z`VkAOk#BX<|D(4TB04+yLRA`j3)&1 zIeH+=Z=C+bOZ8>h-)22`4$@aNY4Z!|pE(aYJoGEqIGXMJ1`MGf!o=BiwZx|Gz9q6S zL7{_`M5c;8Z4a72CO`wn(M@g>cZuje4#w{~)YMnd7Lyx<{dh&b5fZFmiMGH^mAXDp9n(HX5qDa_(RBV7m|h?CCLe(eHpf zw2(RgTy5QJ|N6-r3gG@-#OHT!2y5-6Hd2Ka2&fZ=pv_N(wZaqBF?)G*e?_RG|1$C_ zPVnV~D$lOEO^BWDiV-c>kPg93Dq_i-&|h)h98iR5w#{z7aZLoN}_0fa}YI- zZhifCHh1{@>cT_?r~RidUml-1%8K8XKtd4J`b}4E(F^e`Mb? zUm7JE;H!-5!#$xX0&BNjO6>tOLLdUA{%AsnWT2vkeF{0KU3p+E($n`tn?a%?hcDsW*)bgdy1pZbP!WNeM(&ok`uj|^hx z`8?fRVxK6nQAlNDeGYAsIyG^bOTs#D=gysw*c1g0jG8*BXXPFaSF+i|(k1QUkz~@T z?f;)DOkV$}E@8%`36HjlG6|$FBF&^mw7o?fIRPCyM)9)6^LHq?ESxlDL| zUCwYV3%GomROOS! 
zKQ48bpTcl0M1R;&;5#0^UM#zxhA(X0bK$yV@B>qo2GN6Fz%VPP!^%RI-|0am`Ud?* zwnS+tj*d;sR^Ufg>+hSAMjvJvuVO~eJH_L>E6-X%dw4z}ADd&du0MpM4l!@O?Swt}JgKq`|ve&nxAu^yDva`O=&xNgzxpV5E=hUF*;z%Ck4T&*?4ZrukIs~zG`T*XiveT; zr0=XLK9u7n$RoJE3TfI*6*)p(M3E zRu&fHgiqtpgVRYM9%}O}HyOXXWUb8z%f=mhzV^!2q9?>orhK=`?XgWdu-RfmwMmsd zvRkg-Hz5r6Ib<1A{;FBpBfVY9K>~>v0p8$|(=G$}Q+o~)p8uz}eTbdZ`5`VY?whl= zIQxQ0@-!5IBw??=iia<{2kRwy%Z*s;k2X6ApNDKs4cg)ByY%fQA9PB@9|BE&`+TwY zQV8zJO$hh`U+hGVK_64`Cb81DsWEgx|QbNKM2sHWK16kT}ih>MOufTt(38O`rKl@8bk}L5-PBxEq zuUQkhnmjMdNp5QKnc32_|5V+Pv!deXKpU#Q4i7&(t$Lb-r1_~%uqK<_~t zrPQ`+FCybe!!hVi>r{m&^ehPLQN7^1V*CgcK}-&#tmNiGOWqjz6-5o|Q*^sG_KZ$Oi<8>7kcPz`=uRJB^$n2Ip zCbM7iK?%k=iu8}&*x$T zu^!NOn|NWnhcNTG+zZnChoSuRox{?zZ0$ePItCAozFti2IsLqo;PV_zeFsn9#gyx) zsgs#_>Bj*rT}q-rAKU;OcWR4k2s~CP4hg#?2?={}OJjO0_cVPc27gsVJnzq4Skphu z%A!v<1p7Trq4JFs)215q#7)*Lso1W5xLbgd0`g_JM|Ym!dOfp53=4A@5^065$4s`F zT>;L{`6f#jdzx7(cWpzVMG}mG^eJ{EvmLD$qMW5opMdRBUH{p1F9w&w3=NRd#%mI$ z_<_E@lU5E*&?Aku<$^Xqn1N*HS+RJWKY#xF)Q$&_EztY7IBR9V49c|x+CZ@vY&bu5 zEIDuOBeBRZ*Ckwd&8ZXLO$d6k5Hj-CzjJ5ogGNd+c~#DyQ-eVBqYANndo6-pG+cSx zeHDndJHF=aL6rP0F=Ln6X&cG=mvJp6t#-`5uQFH@=fc0?^wpwujv*G_1A zH_cyPPQ>C;fEJGu$;!lTJbf9zn1E)yQQBtl)Zh(ym*Gy_`&}!Sp2pFWt*WgZ8A5E! 
zp4hJK?0t|Q>M~Kf-vBSSw%`aP=+n@&4!eN7wp0jB;ZO=zSWr1u+`Jt5I&o8X%TOBR z>Px7rk*Y^CPFg{C#sFS4dUHd}3#PQ$CWagvBesnwDvhn(zfSH2Dd{hmfE)5yt@~W^ zZV_60c^{ zA0O+({hSJLy@4o{X%7vl^z5N|n?HTd@v0v6$DYl-WZfT0*tIE$(d2&L5@2zp?P9kdCmvH4Nt%a<>ohAw%u zTrktcNBPEAc?6!%0ro1Z-*QonW~MhQQ)u%GK2Qfa<*xi+tLRnTv2(TF;T(Q(fBA6} zGNI?Z5;XrTPEbgo5i)hsx%D@Bf~UN3g($J6N;vf46*!ULGw@I+#qw^~a(ivyC|zOD z05m9U3f?BuMj@53Gt{h8ObifTfpzo^Ur{;vkQ>#CO?b!*H-TU(o>X<3k8&GshmjLUe>FJ5ok zC{j#S&asn6qA%I{UVisDAZ7LPz|#~>wu`5MyP6f*vTb9;mXaMfRXzHF)8RJsG*4(e zb%pr1Y}HLd;Pnzr;{kdBGfxmVWVrwwF+#l;npK>JN){artrE1dj z24%X%@F=upXD-QON)!UI|`hTvG3V; zZP+opwSXVI2igj{zkf%2=@E});CE}RpN7LF>~e}qO3a{n5M|A)$pOjG942g9dd=Y< zefvnS#yV{~JEj7+$#TxT0<8Sap06#dh9D??@0j5_!EK9MJ^l>X(YK!=t9;W!k;M=;cd1)!26YTpBw z2K=Ed09dU8p{qAR4+sKg5FI-POjdl{;;q6H1M8o0 zX*P@(nK7vcj?LJ)%|#{;4-3Hrk*{Kdxxsrjmzm!kJ-P-6oCUuq`oOqjoD~jji`x_) ze^}{TGV~D_{F;$7D@GmoT34VIF*65UYG^zE=cR@W2zt1IdMytxuj%Fi7I3PHQ0O&3 zHjF+Fpz?9Fe}0Rfu}RnM5&^dGBk0Bin#OOE_A%hRBZB6(wk7C^DVq{#HtfpU;>i0! z;j3_y2#8>Hfp=dEj9utiY~cPeHD!t(weU~pCb(puidlZmbo*CzW3 z4m$YFEsohIxAmO@p*6k}rY=+-oKe^dm%Y8qENyL}>s~y~lnS|DRa$Di1!F3KyL_Bl z%G>a6D}8U1^KQfrn2VjNgSiemNM*zCKuT>2eGK?-w37m%E;nfHPBwq|1oBQFnY#KM zh{cHBU0@X-Unej_V)WifRLVe7+Ao3_8D6;x%g-=l!VBru?||@8$}xD-q{_GSKsolL zqCuP(fcK7DKE$#iOqLY)gqRrEOPTkfF8Av+2*$5+@USQ#KBz(dGqVji&J_$Syw4&) zmk9i!)DHLdV(Feiq+WrZ`f>4T>XBQJCL}2lR}2=)aAYype8rYbNeJ*m$P>0D5;o6M zDtoeSfQknF8(#4*7OqduO#+|@q|MEv1qQOiuT?1a`lSEHx98_9FoQAU%#wUQPAewpAq24h zTy}hiw)=GaKok=xgvUKO#pa4~L$vSGpjuY;` zy~&^zv$k%wSEW2@Q9IYaRm=`wnTZQa1J>=o;Sg&`cLY#>70}vgP+}?*p@`?|l5NeN zLP@Jhn(G<{K(~$Bi#_T9OW-YiX5nDg6`ne53zS|l2?@XCvXV=oq4i~ZtuV&TvXn{r=EN^OE|o+( z6GE82x5h8!B^oa|-viA@3?p&i=JPMJV|`@CP8?5!9ky%c*aQP+Sv@yk%GtAVb@e0O z<}8{+oUfDMg2Q++U>3UubC2jEAG%+^*7XH+`;KS6%TDf~h=d)2D4$%Xb>zT|Gz#$h z?A}A>?zyDk4j%0bFn&9LV}0qL&S{7jWK?sBj506!2{=T}eNcp7&ILIVe=eDxj{yvp zG3XH%tR#PE8?e!6`UVDyp>ZL>vi(VF0tfmfBw&^UVf;!wazXSrAyz*FOyV;vP|3Fe zu~lz92_+Hx`S_ehgCWBVWnC*_l+SG4fFRsSK$$%V$6RQtardI^$T$7A&pZzN@|GfY 
Date: Mon, 25 Sep 2023 09:20:31 -0400 Subject: [PATCH 2/4] Update parallel.Rmd editing vignette index entry --- vignettes/parallel.Rmd | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/vignettes/parallel.Rmd b/vignettes/parallel.Rmd index 308a8044..e39a81ca 100644 --- a/vignettes/parallel.Rmd +++ b/vignettes/parallel.Rmd @@ -2,7 +2,9 @@ title: "Simulating time-to-event trials in parallel" output: rmarkdown::html_vignette bibliography: simtrial.bib -vignette: "%\\VignetteIndexEntry{Simulating time-to-event trials in parallel} %\\VignetteEngine{knitr::rmarkdown}\n" +vignette: > + %\VignetteIndexEntry{Simulating time-to-event trials in parallel} + %\VignetteEngine{knitr::rmarkdown} --- ```{r setup, include=FALSE} From ecb041460a1b4106ea429deed29eb52d30c26617 Mon Sep 17 00:00:00 2001 From: cmansch <50928670+cmansch@users.noreply.github.com> Date: Mon, 25 Sep 2023 09:46:35 -0400 Subject: [PATCH 3/4] Update _pkgdown.yml adding parallel to the articles to fix pkgdown data_articles_index() call not finding new vignette --- _pkgdown.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/_pkgdown.yml b/_pkgdown.yml index 5c7cd2b4..57a1d2d6 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -65,6 +65,7 @@ articles: contents: - arbitraryhazard - modestWLRTVignette + - parallel - pvalue_maxcomboVignette - simtrialroutines - workflow From 2efc9e7c80bdc9c6e9b227e892f917b06b54473b Mon Sep 17 00:00:00 2001 From: Nan Xiao Date: Mon, 25 Sep 2023 14:37:15 -0400 Subject: [PATCH 4/4] Improve parallel vignette style --- vignettes/parallel.Rmd | 220 +++++++++++++++++++++-------------------- 1 file changed, 115 insertions(+), 105 deletions(-) diff --git a/vignettes/parallel.Rmd b/vignettes/parallel.Rmd index e39a81ca..7b2e9dbe 100644 --- a/vignettes/parallel.Rmd +++ b/vignettes/parallel.Rmd @@ -1,7 +1,6 @@ --- title: "Simulating time-to-event trials in
parallel" output: rmarkdown::html_vignette -bibliography: simtrial.bib vignette: > %\VignetteIndexEntry{Simulating time-to-event trials in parallel} %\VignetteEngine{knitr::rmarkdown} @@ -16,7 +15,7 @@ knitr::opts_chunk$set( ## Overview -This vignette demonstrates the ability to implement `sim_fixed_n()` using user-defined backends to parallelize simulations. We will consider the backends supported by the [`future`](https://future.futureverse.org/) framework. +This vignette demonstrates the ability to implement `sim_fixed_n()` using user-defined backends to parallelize simulations. We will consider the backends supported by the [future](https://future.futureverse.org/) framework. The backends supported by the future package include: @@ -25,24 +24,23 @@ The backends supported by the future package include: * `multicore` - uses multiple forked R processes on a single non-Windows machine outside of RStudio. * `cluster` - supports external R sessions across multiple machines. -You can also choose other backend types supported by additional future extension packages, such as the HPC job schedule backends from `future.batchtools`. +You can also choose other backend types supported by additional future extension packages, such as the HPC job scheduler backends from future.batchtools. -The function `sim_fixed_n()` provides a simulation workflow for a two-arm trial with a single endpoint. -We can vary the parameters of the trial using different functions outlined in the documentation. -This function now provides users the opportunity to implement their simulations using the previously described parallel backends to accelerate the computation. +The function `sim_fixed_n()` provides a simulation workflow for a two-arm trial with a single endpoint. +We can vary the parameters of the trial using different functions outlined in the documentation. 
+This function now provides users the opportunity to implement their simulations using the previously described parallel backends to accelerate the computation. ## Background -Without specifying a backend, `sim_fixed_n()` will execute sequentially. -The sequential execution will run all `n_sim` iterations within the same process or session of R. -In order to execute in parallel, we must define the environment prior to calling the function. -Setting your seed prior to calling the function will ensure that the results are reproducible. +Without specifying a backend, `sim_fixed_n()` will execute sequentially. +The sequential execution will run all `n_sim` iterations within the same process or session of R. +In order to execute in parallel, we must define the environment prior to calling the function. +Setting your seed prior to calling the function will ensure that the results are reproducible. Suppose you want to investigate the duration of a trial under two possible enrollments strategies. -Both enrollments are piecewise, but have varying durations and rates. +Both enrollments are piecewise, but have varying durations and rates. 
- -```{r libraries, message=FALSE, warning=FALSE} +```{r dependencies, message=FALSE, warning=FALSE} library(simtrial) library(tibble) library(future) @@ -50,41 +48,45 @@ library(doFuture) ``` ```{r set-plan-sequential, echo=FALSE} -plan("sequential") # ensure that the backend is sequential +plan("sequential") # Ensure that the backend is sequential ``` ```{r enrollments, fig.height=4, fig.width=6, fig.align="center"} set.seed(1) n <- 5000 - enroll_rate1 <- tibble(rate = c(5, 20, 10), duration = c(100, 150, 150)) - enroll_rate2 <- tibble(rate = c(10, 15, 30), duration = c(150, 175, 75)) - x1 <- rpw_enroll(n = n, enroll_rate = enroll_rate1) x2 <- rpw_enroll(n = n, enroll_rate = enroll_rate2) -plot(x1, 1:n, "l", col = "blue", - xlim = c(0, max(x1,x2)), +plot( + x1, 1:n, + type = "l", + col = "blue", + xlim = c(0, max(x1, x2)), main = "Piecewise enrollments", xlab = "Time", ylab = "Enrollment" ) lines(x2, 1:n, col = "orange") -legend(250, 1500, legend=c("Enrollment 1", "Enrollment 2"), - col=c("blue", "orange"), - lty=c(1,1)) +legend( + 250, 1500, + legend = c("Enrollment 1", "Enrollment 2"), + col = c("blue", "orange"), + lty = c(1, 1) +) ``` -We see that *Enrollment 2* enrolls individuals more quickly than *Enrollment 1* at onset. -Later, *Enrollment 1* will outpace *Enrollment 2* before eventually being overtaken again. -As such, we want to consider how the duration of the study changes under these enrollments. -## The sequential run +We see that *Enrollment 2* enrolls individuals more quickly than *Enrollment 1* at onset. +Later, *Enrollment 1* will outpace *Enrollment 2* before eventually being overtaken again. +As such, we want to consider how the duration of the study changes under these enrollments. + +## The sequential run Naively, we can execute these simulations sequentially. -We set the target of a total enrollment of 3000 individuals with the trial ending after observing 700 events. -We use `timing_type = 2` to return the correct trial duration. 
+We set the target of a total enrollment of 3000 individuals with the trial ending after observing 700 events. +We use `timing_type = 2` to return the correct trial duration. ```{r confirm-sequential} set.seed(1) @@ -94,58 +96,62 @@ n_sim <- 200 start_sequential <- proc.time() seq_result1 <- sim_fixed_n( - n_sim = n_sim, - sample_size = 3000, + n_sim = n_sim, + sample_size = 3000, target_event = 700, - enroll_rate = enroll_rate1, - timing_type = 2 # Time until targeted event count achieved -) + enroll_rate = enroll_rate1, + timing_type = 2 # Time until targeted event count achieved +) seq_result2 <- sim_fixed_n( - n_sim = n_sim, - sample_size = 3000, + n_sim = n_sim, + sample_size = 3000, target_event = 700, - enroll_rate = enroll_rate2, - timing_type = 2 # Time until targeted event count achieved -) + enroll_rate = enroll_rate2, + timing_type = 2 # Time until targeted event count achieved +) -duration_sequential <- proc.time() - start_sequential +duration_sequential <- proc.time() - start_sequential ``` -A message automatically appears in the console that indicates what backend is being used for processing. +A message automatically appears in the console indicating which backend is being used for processing. -The calls to `proc.time()` allow us to evaluate the computation time of these procedures. -This function provides three outputs, we will focus on user and elapsed time. -User time represents the CPU time spent evaluating the function and elapsed time represents the "wall clock" time spent by the end user waiting for the results. +The calls to `proc.time()` allow us to evaluate the computation time of these procedures. +This function provides three outputs; we will focus on user and elapsed time. +User time represents the CPU time spent evaluating the function, and elapsed time represents the "wall clock" time spent by the end user waiting for the results. 
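The distinction between user and elapsed time can be illustrated with a toy example independent of the simulation code: sleeping consumes wall-clock time but almost no CPU time.

```r
t0 <- proc.time()
Sys.sleep(1)  # occupies wall-clock time but almost no CPU time
timing <- proc.time() - t0

timing[["user.self"]]  # CPU time used by this R process (near zero here)
timing[["elapsed"]]    # wall-clock time (about one second here)
```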
```{r sequential-time} -print(duration_sequential) +print(duration_sequential) ``` -We can see that the CPU time is `r sprintf("%.2f", duration_sequential[[1]])` and the elapsed time is `r sprintf("%.2f", duration_sequential[[3]])` seconds. -These provide our baseline for the computation time. +We can see that the CPU time is `r sprintf("%.2f", duration_sequential[[1]])` and the elapsed time is `r sprintf("%.2f", duration_sequential[[3]])` seconds. +These provide our baseline for the computation time. -As you may have anticipated, we see that for a lower number of events, enrollment 2 has a shorter average duration of -`r sprintf("%.1f", mean(seq_result2$duration))` +As you may have anticipated, we see that for a lower number of events, enrollment 2 has a shorter average duration of +`r sprintf("%.1f", mean(seq_result2$duration))` than that of enrollment 1, which is `r sprintf("%.1f", mean(seq_result1$duration))`. We also see a clear difference in study duration between the two proposed enrollment strategies. ```{r sequential-display-results, eval=FALSE, echo=FALSE} -seq_result1 %>% head(5) %>% kable(digits=2) -seq_result2 %>% head(5) %>% kable(digits=2) +seq_result1 %>% + head(5) %>% + kable(digits = 2) +seq_result2 %>% + head(5) %>% + kable(digits = 2) ``` ## Setting up a parallel backend If we instead wanted to run more simulations for each enrollment, we could expect the time to run our simulations to increase. -As we vary and increase the number of parameter inputs that we consider, we expect the simulation process to continue to increase in duration. +As we vary and increase the number of parameter inputs that we consider, we expect the simulation process to continue to increase in duration. 
+To help combat the growing computational burden, we can run these simulations in parallel using the `multisession` backend available to us in `plan()`. -We can adjust the default number of cores with the `future` library function `parallelly::availableCores()`. -The multisession backend will automatically use all available cores by default. -To initialize our backend, we change our plan. +We can determine the number of available cores with `parallelly::availableCores()`. +The multisession backend will automatically use all available cores by default. +To initialize our backend, we change our plan. ```{r multisession} plan(multisession, workers = availableCores()) @@ -153,7 +159,7 @@ plan(multisession, workers = availableCores()) ## Execution in parallel -Once we have configured the backend details, we can execute the same code as before to automatically distribute the `n_sim` simulations across the available cores. +Once we have configured the backend details, we can execute the same code as before to automatically distribute the `n_sim` simulations across the available cores. 
```{r confirm-multisession} set.seed(1) @@ -161,38 +167,39 @@ set.seed(1) start_sequential <- proc.time() seq_result1m <- sim_fixed_n( - n_sim = n_sim, - sample_size = 3000, + n_sim = n_sim, + sample_size = 3000, target_event = 700, - enroll_rate = enroll_rate1, - timing_type = 2 # Time until targeted event count achieved -) + enroll_rate = enroll_rate1, + timing_type = 2 # Time until targeted event count achieved +) seq_result2m <- sim_fixed_n( - n_sim = n_sim, - sample_size = 3000, + n_sim = n_sim, + sample_size = 3000, target_event = 700, - enroll_rate = enroll_rate2, - timing_type = 2 # Time until targeted event count achieved -) + enroll_rate = enroll_rate2, + timing_type = 2 # Time until targeted event count achieved +) -duration_sequential <- proc.time() - start_sequential +duration_sequential <- proc.time() - start_sequential ``` ```{r time-parallel} print(duration_sequential) ``` -We can see that the CPU time is `r sprintf("%.2f", duration_sequential[[1]])` and the elapsed time is `r sprintf("%.2f", duration_sequential[[3]])` seconds. -The user time here appears to be drastically reduced because of how R keeps track of time; the time used by the parent process and not the children processes are reported for the user time. -Therefore, we compare the elapsed time to see the real-world impact of the parallelization. +We can see that the CPU time is `r sprintf("%.2f", duration_sequential[[1]])` and the elapsed time is `r sprintf("%.2f", duration_sequential[[3]])` seconds. +The user time here appears to be drastically reduced because of how R keeps track of time; only the time used by the parent process, not the child processes, is reported as user time. +Therefore, we compare the elapsed time to see the real-world impact of the parallelization. + +To change the implementation back to a sequential backend, we simply use what is below. -To change the implementation back to a sequential backend, we simply use what is below. 
```{r plan-sequential} -plan(sequential) +plan(sequential) ``` -We can also verify that the simulation results are identical because of setting a seed and that the backend type will not affect the results. Below, it is clear that the results from our sequential and multisession backends match completely. +We can also verify that, because we set a seed, the simulation results are identical and the backend type does not affect them. Below, it is clear that the results from our sequential and multisession backends match completely. ```{r compare-results} sum(seq_result1 != seq_result1m) @@ -200,36 +207,38 @@ sum(seq_result2 != seq_result2m) ``` *Note:* A parallel implementation may not always be faster than a serial implementation. -If there is substantial overhead associated with executing in parallel, sequential evaluation may be faster. -For a low number of simulations or available cores, it may be preferable to continue computation in serial rather than parallel. -We leave it to the end user to determine this difference based on the resources available to them. - +If there is substantial overhead associated with executing in parallel, sequential evaluation may be faster. +For a low number of simulations or available cores, it may be preferable to continue computation in serial rather than parallel. +We leave it to the end user to determine this difference based on the resources available to them. ## A nested parallel example -We provide an additional example using a nested parallel structure for users with more extensive resources, such as high-performance computing clusters, available to them. +We provide an additional example using a nested parallel structure for users with access to more extensive resources, such as high-performance computing clusters. Because these resources are not commonly available, we will not execute the below code herein. 
-Consider that you have two, accessible nodes, each with three cores (shown in the diagram below). +Consider that you have two accessible nodes, each with three cores (shown in the diagram below). -![Available Resource Schematic](schema.png){width=90%} +```{r schema, echo=FALSE, fig.cap="Available resource schematic.", fig.align="center", out.width="90%"} +knitr::include_graphics("schema.png") +``` -Ideally, all available resources will be used when executing the simulations. -To do this, we need to correctly define our backend using `plan()` and run the same code as previously. +Ideally, all available resources will be used when executing the simulations. +To do this, we need to correctly define our backend using `plan()` and run the same code as previously. The different structures, or topologies, for a backend can be configured; a more in-depth explanation is given in the [future topologies vignette](https://future.futureverse.org/articles/future-3-topologies.html). -Our below example follows closely their example. +Our example below closely follows theirs. -In our below snippet, we consider the two nodes named `n1` and `n2` and create a function to select the number of cores to use on those named nodes. +In our below snippet, we consider the two nodes named `n1` and `n2` and create a function to select the number of cores to use on those named nodes. -While trivial here, a courteous user of shared machines would specify fewer than all available cores and can do such using a modification of the below code. -We then implement our backend using a list that follows the hierarchy of the available resources. +While trivial here, a courteous user of shared machines would specify fewer than all available cores and can do so by modifying the code below. +We then implement our backend using a list that follows the hierarchy of the available resources. 
```{r nested-topology, eval=FALSE} nodes <- c("n1", "n2") -customCores <- function(){ - switch(Sys.info()[["nodename"]], - "n1" = 3L, # Modify here for number of cores on node1 - "n2" = 3L, # Modify here for number of cores on node2 - ## Default: - availableCores()) +customCores <- function() { + switch(Sys.info()[["nodename"]], + "n1" = 3L, # Modify here for number of cores on node1 + "n2" = 3L, # Modify here for number of cores on node2 + ## Default: + availableCores() + ) } plan(list( tweak(cluster, workers = nodes), @@ -237,32 +246,33 @@ )) ``` -The function `tweak` is necessary to override the inherent protection of nested parallelism, meant to help avoid overloading one's resources by errantly starting too many processes. -Because of the need to tweak backends, the message echoed to the console for nested backends reflects the highest level of the nested hierarchy. +The function `tweak()` is necessary to override the built-in protection against nested parallelism, which is meant to help avoid overloading one's resources by errantly starting too many processes. +Because of the need to tweak backends, the message echoed to the console for nested backends reflects the highest level of the nested hierarchy. + +With the backend in place, we can then run the identical code from before, using all available resources and returning the same results as before. -With the backend in place, we then can run the identical code from before using all available resources and return the same results as before. 
```{r confirm-cluster, eval=FALSE} set.seed(1) enroll_rates <- list(enroll_rate1, enroll_rate2) seq_resultc <- foreach::foreach( - i = 1:2, - .combine = "list", - .options.future = list(seed=TRUE) + i = 1:2, + .combine = "list", + .options.future = list(seed = TRUE) ) %dofuture% { sim_fixed_n( - n_sim = n_sim, - sample_size = 3000, + n_sim = n_sim, + sample_size = 3000, target_event = 700, - enroll_rate = enroll_rates[[i]], - timing_type = 2 # Time until targeted event count achieved - ) + enroll_rate = enroll_rates[[i]], + timing_type = 2 # Time until targeted event count achieved + ) } ``` -Then, we reset the `plan` to sequential to avoid accidently continuing to execute later calls within these resources. +Then, we reset the `plan` to sequential to avoid accidentally continuing to execute later calls within these resources. ```{r plan-sequential2, eval=FALSE} -plan(sequential) +plan(sequential) ```
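Finally, as noted above, a courteous user of shared machines may want to reserve capacity rather than claim every core. One way to do this (a sketch of one possible style, not part of the vignette's code) is the `omit` argument of `parallelly::availableCores()`, which avoids hard-coding worker counts.

```r
library(future)

# Leave one core free for other users or processes on the machine
plan(multisession, workers = parallelly::availableCores(omit = 1))

n_workers <- nbrOfWorkers()  # number of workers the current plan will use

plan(sequential)  # reset the backend afterwards
```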