doomhammerhell/pipeline-r


Pipeline

Data analysis pipeline for the R language.

Install

Since the repo is private for now, git forks are the easiest way to contribute.

Documentation

TBA. See the man/ folder; the help pages are accessible through RStudio.

Examples

TBA - see the examples/ folder

Code format

The pipeline should be written following the Google style guide for R: https://google.github.io/styleguide/Rguide.xml

  • A lot of refactoring has to be done to comply with the Google style guide.
  • Use camelCase for variables and PascalCase for functions and objects.
  • Comments must follow the Roxygen (https://github.com/klutometis/roxygen) syntax, and the code must otherwise be adequately commented so that anyone can understand how the data is processed and how results are generated.
  • Use long, explicit names such as CenterOnMeanOfMean() or correlatedVariableNamesArray.
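As an illustration of these conventions, a function could look like the sketch below. `CenterOnMean()` and its parameter names are hypothetical and not taken from the package; the example only shows PascalCase for the function, camelCase for variables, and Roxygen comments.

```r
#' Center the columns of a numeric matrix on their means.
#'
#' Hypothetical example, not part of the package.
#'
#' @param dataMatrix A numeric matrix with observations in rows.
#' @return A matrix of the same dimensions whose columns have mean zero.
#' @export
CenterOnMean <- function(dataMatrix) {
  if (!is.matrix(dataMatrix) || !is.numeric(dataMatrix)) {
    stop("dataMatrix must be a numeric matrix")
  }
  columnMeans <- colMeans(dataMatrix)
  # Subtract each column's mean from every entry of that column
  centeredMatrix <- sweep(dataMatrix, 2, columnMeans, FUN = "-")
  return(centeredMatrix)
}

# Usage on a small made-up matrix
exampleMatrix <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)
centeredExample <- CenterOnMean(exampleMatrix)
```

The input check is there so that a bad argument fails loudly, which also makes the function easy to cover with an exception test.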

Testing

Every addition must be testable. Tests are written with RUnit: https://cran.r-project.org/web/packages/RUnit/vignettes/RUnit.pdf

    library(RUnit)

    # A function that converts Celsius to Fahrenheit
    c2f <- function(c) return(9/5 * c + 32)

    # A corresponding test function could look like this:
    test.c2f <- function() {
      checkEquals(c2f(0), 32)
      checkEquals(c2f(10), 50)
      # A non-numeric argument must raise an error
      checkException(c2f("xx"))
    }
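One way such a test could be run is through an RUnit test suite. The sketch below is only an assumption about how tests might be organized: the suite name, the temporary directory, and the file name `runit.c2f.R` are hypothetical, and the whole block is skipped when RUnit is not installed.

```r
# Sketch: collect and run RUnit tests from a directory.
if (requireNamespace("RUnit", quietly = TRUE)) {
  library(RUnit)

  # Hypothetical test directory holding a single test file
  testDirectory <- file.path(tempdir(), "pipeline-tests")
  dir.create(testDirectory, showWarnings = FALSE)
  writeLines(c(
    "c2f <- function(c) return(9/5 * c + 32)",
    "test.c2f <- function() {",
    "  checkEquals(c2f(0), 32)",
    "  checkEquals(c2f(10), 50)",
    "  checkException(c2f('xx'))",
    "}"
  ), file.path(testDirectory, "runit.c2f.R"))

  # Files matching testFileRegexp are sourced; functions matching
  # testFuncRegexp are executed as tests
  pipelineTestSuite <- defineTestSuite(
    "pipeline",
    dirs = testDirectory,
    testFileRegexp = "^runit.+\\.[rR]$",
    testFuncRegexp = "^test.+"
  )
  testResult <- runTestSuite(pipelineTestSuite)
  printTextProtocol(testResult)
}
```

Keeping one `runit.*.R` file per function or module makes it easy to see which additions are still missing tests.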

Current functions

Please see the docs for a complete description of the functions.

Preprocessing

Filtering

Dimensionality reduction

Statistics

Time series

Visualization

Formatting

Export

Roadmap

  • Discuss methodology with other teams
  • Research the open source projects already available to determine which functions are redundant and which functions to import or modify for integration in the package
  • Refactor the code
  • Discuss and build robust data-analysis workflows around functions in the package as well as outside the package
  • Preprocessing : Decide on the best practice and how to implement it
  • Time series analysis tools
  • Build an object wrapper (S4? R6?) to help with data management and versioning
  • Build I/O functions
  • Export functions to XLS, CSV, JSON
  • (Future: discuss the possibility of making the package self-sufficient, with no external dependencies; optimize with a compiled library)
  • Add all the R-snippets into functions (https://github.com/jvcasill/R-snippets)
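To make the object wrapper and export items more concrete, here is a minimal sketch using S4 and base R only. Nothing here exists in the package yet: the class `PipelineData`, its slots, and the generic `ExportToCsv()` are hypothetical names chosen for illustration, and S4 is only one of the candidate object systems.

```r
library(methods)

# Hypothetical wrapper: a data frame plus a version label
setClass("PipelineData",
  representation(
    data = "data.frame",
    version = "character"
  )
)

# Hypothetical export generic; XLS and JSON would need their own methods
setGeneric("ExportToCsv", function(object, filePath) {
  standardGeneric("ExportToCsv")
})

setMethod("ExportToCsv", "PipelineData", function(object, filePath) {
  write.csv(object@data, file = filePath, row.names = FALSE)
  invisible(filePath)
})

# Usage with made-up data: write to a temporary file and read it back
exampleData <- new("PipelineData",
  data = data.frame(sampleId = c("a", "b"), value = c(1.5, 2.5)),
  version = "0.1"
)
csvPath <- tempfile(fileext = ".csv")
ExportToCsv(exampleData, csvPath)
reloadedData <- read.csv(csvPath)
```

Keeping the version in a slot next to the data is one possible way to approach versioning; whether that is the right design is part of the discussion the roadmap calls for.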

Readings

Open source tools:

Sublime text workflow for R development:

Markdown:

Other projects of interest

About

Robust workflow for data analysis in R
