Note that the HTML pages for this class were all made from Rmd, using the distill blog format.

There is also a newer format called Quarto, built by RStudio (now named Posit). Quarto documents are very similar to R Markdown, have broader support for additional programming languages, and will likely eventually replace the R Markdown format.

*The Rmarkdown for this post is [on github](https://github.com/rnabioco/bmsc-7810-pbda/blob/main/_posts/2023-12-14-class-wrap-up-data-analysis-tips-and-resources/resources.Rmd)*

### Caching

### Git and GitHub

Git is a command line tool for version control, which allows us to:

1. roll back code to a previous state if needed

2. branched development, tackling individual issues/tasks

After some accumulation of code, definitely put your GitHub link on your CV/resume.

Check out the quickstart from github: https://docs.github.com/en/get-started/quickstart/hello-world

### Example repos

- [this class](https://github.com/rnabioco/bmsc-7810-pbda)

Bioconductor also includes many annotation and experimental datasets built into R packages and objects (see AnnotationHub and ExperimentHub).

- https://bioconductor.org/
- Use `BiocManager::install()` to install these packages

Read the Guide to RMarkdown for an exhaustive description of the various formats and options for using RMarkdown documents.




You can speed up knitting of your Rmds by using caching to store the results from each chunk, instead of rerunning them each time. Note that if you modify the code chunk, previous caching is ignored.

For each chunk, set `{r, cache = TRUE}`.
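As a minimal sketch, caching can also be switched on once for all chunks from a setup chunk, rather than chunk by chunk. This assumes the knitr package is installed; the `cache.path` directory name is only illustrative.

```r
# Typically placed in the first (setup) chunk of an Rmd.
# Assumes the knitr package is installed.
knitr::opts_chunk$set(
  cache = TRUE,           # store chunk results and reuse them on the next knit
  cache.path = "cache/"   # illustrative directory name for the cached files
)

# Check the default that later chunks will inherit
knitr::opts_chunk$get("cache")
```

An individual chunk can still opt out with `{r, cache = FALSE}`.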

@@ -1618,10 +1645,10 @@

styler, clean up code rea } ")

See also the sessioninfo package, which provides more details:
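A short sketch of both approaches might look like the following. `sessionInfo()` ships with R; the `sessioninfo::session_info()` call is guarded so that it only runs if that package happens to be installed.

```r
# Built into R: R version, platform, and attached packages
base_info <- sessionInfo()
print(base_info)

# sessioninfo reports more per-package detail, such as where each package
# was installed from. Guarded so the example runs without the package.
if (requireNamespace("sessioninfo", quietly = TRUE)) {
  print(sessioninfo::session_info())
}
```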

@@ -1692,94 +1716,79 @@
@@ -1800,14 +1809,17 @@

Benchmarking, with m

 res <- microbenchmark::microbenchmark(
   base = read.csv(path_to_file),
   readr = readr::read_csv(path_to_file),
-  times = 5
+  data.table = data.table::fread(path_to_file),
+  times = 5,
+  unit = "ms"
 )
 print(res, signif = 2)
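For a version that runs on its own, here is a minimal sketch comparing two base R readers on a throwaway file. It assumes the microbenchmark package is installed; the temporary CSV and its contents are invented purely for illustration.

```r
# Assumes the microbenchmark package is installed.
# Create a small temporary CSV to read back (illustrative data only).
path_to_file <- tempfile(fileext = ".csv")
write.csv(data.frame(x = rnorm(1000), y = runif(1000)),
          path_to_file, row.names = FALSE)

# Time each expression 5 times and report in milliseconds
res <- microbenchmark::microbenchmark(
  csv   = read.csv(path_to_file),
  table = read.table(path_to_file, header = TRUE, sep = ","),
  times = 5,
  unit = "ms"
)
print(res, signif = 2)

unlink(path_to_file)  # clean up the temporary file
```

Timings vary between machines and runs, so treat the numbers as relative comparisons rather than fixed values.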

@@ -1822,8 +1834,8 @@

Benchmarking, with m }) p

Debugging R code

R has a debugger built in. You can debug a function:
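As a minimal sketch, flagging and unflagging a function for debugging could look like this. The function `center_values` is a hypothetical example, not something from the class materials.

```r
# A small function to step through (hypothetical example)
center_values <- function(x) {
  m <- mean(x)
  x - m
}

debug(center_values)        # flag the function: the next call opens the debugger
isdebugged(center_values)   # TRUE while the flag is set

# In an interactive session, calling center_values(c(1, 2, 3)) would now pause
# at the first line; at the Browse prompt use n (next), c (continue), Q (quit).

undebug(center_values)      # remove the flag
center_values(c(1, 2, 3))   # runs normally again
```

`debugonce()` flags a function for a single call only, and placing `browser()` inside a function body pauses execution at that exact line.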

@@ -1872,7 +1884,7 @@


Check out jsonlite

 json_file <- "http://api.worldbank.org/country?per_page=10&region=OED&lendingtype=LNX&format=json"
 worldbank_data <- jsonlite::fromJSON(json_file, flatten = TRUE)
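To see what `flatten = TRUE` does without a network request, here is a small offline sketch. It assumes the jsonlite package is installed; the inline JSON is toy data made up for the example.

```r
# Assumes the jsonlite package is installed.
# A small inline JSON string stands in for an API response (toy data).
json_text <- '[
  {"gene": "geneA", "coords": {"chrom": "chr1", "start": 100}},
  {"gene": "geneB", "coords": {"chrom": "chr2", "start": 250}}
]'

# Without flattening, "coords" is a data frame nested inside a column
nested <- jsonlite::fromJSON(json_text)

# With flattening, nested fields become regular columns
flat <- jsonlite::fromJSON(json_text, flatten = TRUE)

names(nested)
names(flat)
```

The flattened column names join the nested levels with a dot, which is usually easier to work with in dplyr pipelines.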
@@ -1977,9 +1989,9 @@

Using R on the command-line

R -e "print('hello')"
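For anything longer than a one-liner, a script file run with `Rscript` is often more convenient. The sketch below shows one way to read command-line arguments; the file name `mean_of_n.R` is hypothetical.

```r
# Contents of a hypothetical script saved as, e.g., mean_of_n.R
# Run from a shell with:  Rscript mean_of_n.R 100
args <- commandArgs(trailingOnly = TRUE)   # arguments after the script name

# Fall back to a default when no argument is supplied
n <- if (length(args) >= 1) as.integer(args[1]) else 10L

set.seed(42)
cat("mean of", n, "random draws:", mean(rnorm(n)), "\n")
```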
Git and Github

Git is a command line tool for version control, which allows us to:

  3. roll back code to a previous state if needed

  4. branched development, tackling individual issues/tasks

  5. collaboration

@@ -2042,11 +2054,11 @@

Git and Github

This can be handled by RStudio as well (new tab next to Connections and Build)

Put your code on GitHub


As you write more code, especially as functions and script pipelines, hosting and documenting them on GitHub is a great way to make them portable and searchable. Even the free tier of GitHub accounts now has private repositories (repos).



If you have any interest in a career in data science/informatics, GitHub is also a common showcase of what (and how well/often) you can code. After some accumulation of code, definitely put your GitHub link on your CV/resume.



Example repos (RBI)


Example repos

  • this class
  • valr
@@ -2068,7 +2080,7 @@

Finding useful packages

 vignette("Gviz")
 # install.packages("eulerr") # from CRAN
-plot(eulerr::euler(list(set1 = c("geneA", "geneB", "geneC"),
+plot(eulerr::euler(list(set1 = c("geneA", "geneB", "geneC"),
                         set2 = c("geneC", "geneD"))))
@@ -2081,7 +2093,7 @@

Finding useful packages





2,000+ R packages dedicated to bioinformatics. Includes a coherent framework of data structures (e.g. SummarizedExperiment) built by dedicated Core members. Also includes many annotation and experimental datasets built into R packages and objects (See AnnotationHub and ExperimentHub)