Notes

  1. The sanitizer script requires the dplyr R package to run.
  2. The script expects to find the UCI HAR Dataset directory in the current working directory.
  3. The script writes the tidied data to UCI_HAR_tidied.txt.
  4. Datasets are read slowly because read.table is used as the file reader (fread from data.table causes a SIGSEGV under Linux).
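The first two notes are preconditions that can be checked before running anything. The following is a minimal sketch of such a check; `check_prerequisites` is a hypothetical helper, not part of the script, and it assumes the dataset directory is named "UCI HAR Dataset".

```r
# Hypothetical helper (not part of the script): reports whether the
# prerequisites from the notes are met, as a named logical vector.
check_prerequisites <- function(data_dir = "UCI HAR Dataset") {
  c(
    dplyr_installed = requireNamespace("dplyr", quietly = TRUE),
    dataset_found   = dir.exists(data_dir)
  )
}

status <- check_prerequisites()
if (!all(status)) {
  message("Missing prerequisites: ", paste(names(status)[!status], collapse = ", "))
}
```

Returning a logical vector rather than calling `stop()` leaves it to the caller to decide whether a missing prerequisite is fatal.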

How it works

  1. Read and enrich the data set (readEnrichedDataSet function):
     • Read the data set (X_test.txt/X_train.txt), naming its columns (variables) after the feature names in features.txt.
     • Attach a SUBJECT_ID column (from subject_test.txt/subject_train.txt).
     • Attach an ACTIVITY_ID column (from y_test.txt/y_train.txt).
     • Attach an ACTIVITY_NAME column holding the activity name that corresponds to each ACTIVITY_ID (from activity_labels.txt).
  2. Drop unneeded columns and transform column names (extractMeasuresOfInterest function):
     • Retain only the columns whose names contain 'std()' or 'mean()'; meanFreq() columns are dropped.
     • In the retained names, std() becomes standard_deviation and mean() becomes mean_value.
  3. Group the data set by SUBJECT_ID and ACTIVITY_NAME, then calculate the mean of every remaining column within each group.
  4. Write the processed data set to UCI_HAR_tidied.txt (180 observations of 68 variables: SUBJECT_ID, ACTIVITY_NAME, and 66 sensor measurements).
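The read-and-enrich step (readEnrichedDataSet) can be sketched as follows. This is an illustrative reconstruction, not the script's actual code: `read_enriched_data_set` is a hypothetical name, and the directory layout (`<root>/<set>/X_<set>.txt` and so on) is assumed to be the standard UCI HAR layout.

```r
# Hypothetical sketch of readEnrichedDataSet, assuming the layout:
#   <root>/features.txt, <root>/activity_labels.txt,
#   <root>/<set>/X_<set>.txt, y_<set>.txt, subject_<set>.txt
read_enriched_data_set <- function(root, set = c("test", "train")) {
  set <- match.arg(set)
  set_file <- function(prefix) file.path(root, set, paste0(prefix, "_", set, ".txt"))

  # features.txt: column 1 = feature index, column 2 = feature name
  features <- read.table(file.path(root, "features.txt"), stringsAsFactors = FALSE)
  # activity_labels.txt: column 1 = activity id, column 2 = activity name
  activities <- read.table(file.path(root, "activity_labels.txt"), stringsAsFactors = FALSE)

  # check.names = FALSE keeps names such as "tBodyAcc-mean()-X" verbatim
  data_set <- read.table(set_file("X"), col.names = features[[2]], check.names = FALSE)

  data_set$SUBJECT_ID  <- read.table(set_file("subject"))[[1]]
  data_set$ACTIVITY_ID <- read.table(set_file("y"))[[1]]
  # match() finds, for each ACTIVITY_ID, the row of activity_labels holding its name
  data_set$ACTIVITY_NAME <- activities[[2]][match(data_set$ACTIVITY_ID, activities[[1]])]
  data_set
}
```

For example, `read_enriched_data_set("UCI HAR Dataset", "test")` would read the test split, and the train split is read the same way with `"train"`.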
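The column-filtering step (extractMeasuresOfInterest) amounts to a name match plus two substitutions. The sketch below assumes literal (`fixed = TRUE`) matching; the README does not say how the script matches names, and `extract_measures_of_interest` is a hypothetical name.

```r
# Hypothetical sketch of extractMeasuresOfInterest.
extract_measures_of_interest <- function(data_set,
                                         id_cols = c("SUBJECT_ID", "ACTIVITY_NAME")) {
  # fixed = TRUE treats "()" literally, so "meanFreq()" does not match "mean()"
  keep <- grepl("mean()", names(data_set), fixed = TRUE) |
          grepl("std()",  names(data_set), fixed = TRUE)
  result <- data_set[, c(id_cols, names(data_set)[keep]), drop = FALSE]
  # Rename: std() -> standard_deviation, mean() -> mean_value
  names(result) <- sub("std()", "standard_deviation", names(result), fixed = TRUE)
  names(result) <- sub("mean()", "mean_value", names(result), fixed = TRUE)
  result
}
```

Matching the literal `mean()` rather than `mean` is what excludes both the meanFreq() features and the `angle(...Mean...)` features. On the full feature list this should leave the 66 measurement columns mentioned above.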
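The grouping step maps naturally onto dplyr's `group_by()` and `summarise()`. The following is a minimal sketch, assuming dplyr 1.0 or later (for `across()` and `.groups`) and an input that holds only the two grouping columns plus numeric measurement columns; `summarise_by_subject_activity` is a hypothetical name.

```r
library(dplyr)

# Hypothetical sketch: one row per (SUBJECT_ID, ACTIVITY_NAME) pair, with every
# non-grouping column replaced by its mean within that group.
summarise_by_subject_activity <- function(measures) {
  measures %>%
    group_by(SUBJECT_ID, ACTIVITY_NAME) %>%
    # across(everything()) skips the grouping columns inside summarise()
    summarise(across(everything(), mean), .groups = "drop")
}
```

The UCI HAR data covers 30 subjects and 6 activities, so there are at most 30 × 6 = 180 groups, which matches the 180 observations listed for the output file.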
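The final write step can be done with base R's `write.table()`. The README gives only the file name, so `row.names = FALSE` and the default space separator below are assumptions about the script's arguments, and `write_tidied` is a hypothetical name.

```r
# Hypothetical sketch of the write step; argument choices are assumptions.
write_tidied <- function(tidied, path = "UCI_HAR_tidied.txt") {
  # row.names = FALSE avoids an extra unnamed column of row indices
  write.table(tidied, file = path, row.names = FALSE)
  invisible(path)
}
```

A file written this way can be loaded back with `read.table(path, header = TRUE, check.names = FALSE)`; `check.names = FALSE` keeps names such as `tBodyAcc-mean_value-X` from being rewritten by `make.names()`.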