Course Notebook: Big Data Applications and Analytics (F17)

Meetings

  • 09/14/17 Watched video of meeting on JabRef, LaTeX, ShareLaTeX, GitHub
  • 09/18/17 Attended online Zoom meeting on ShareLaTeX, JabRef, GitHub HID format, experiments
  • 09/25/17 Attended online Zoom meeting on Paper 1, ShareLaTeX, GitHub
  • 10/02/17 Attended online Zoom meeting on Paper 1, question about table, figure, JabRef references
  • 10/16/17 Attended online Zoom meeting on Paper 2, and revisions for Paper 1 report.bib and report.tex
  • 11/06/17 Attended online Zoom meeting on Paper 2 and course project
  • 11/20/17 Attended online Zoom meeting on course project, formatting issues, questions about data
  • 11/27/17 Attended online Zoom meeting on course project, Paper 2 question

Logistics

  • 08/25-09/01/17 Read Course Organization and Introduction, and Piazza Folders curation
  • 08/25-09/01/17 Watched Introduction video: i523-organization https://youtu.be/yC3PNkb_9mI
  • 08/25-09/01/17 Watched Getting started with i523-piazza video: https://youtu.be/9hnW-327CMQ
  • 08/25-09/01/17 Posted biography on Piazza, engaged in discussion, asked questions, made comments
  • 08/25-09/01/17 Created user accounts for:
  • 09/07/17 Initialized GitHub repository with README.md, license, .gitignore, and created course notebook
  • 09/08/17 Submitted GitHub pull request to correct minor typos on Course webpage
  • 09/20/17 Updated GitHub HID README, Paper 1 LaTeX templates, report.bib, report.tex
  • 09/25/17 Updated course notebook to include sections on Meetings and Location (latitude, longitude)
  • 10/19/17 Submitted Project proposal abstract, approved by instructor von Laszewski
  • 10/21/17 Submitted Paper 1 to Turnitin for online review, 31 percent similarity rating
  • 11/06/17 Submitted Paper 2 to Turnitin for online review, 20 percent similarity rating
  • 11/07-11/08/17 Project: Obtained dataset on National Drug Survey, created ipynb for project, variable selection
  • 11/09/17 Project: National Drug and Health Survey, data preprocessing, feature selection
  • 11/11/17 Project: NSDUH survey questions, variable selection, construction of data subset
  • 11/12/17 Project: Updated project notebook with NSDUH variable, dataset construction
  • 11/16/17 Project: Loaded NSDUH dataset as data frame, recoded features, prepared data for analysis
  • 11/20/17 Project: Formatted project folder in HID335 to include necessary files, folders, and formats
  • 11/27/17 Paper 2: Resubmitted paper as report2.tex, and addressed issues in issues.tex
  • 12/04/17 Project: Workflow Pipelines readme.md created https://github.com/bigdata-i523/hid335/blob/master/project/readme.md
  • 12/04/17 Project: Report draft submitted to Turnitin had 9 percent similarity index, uploaded to project folder

Theory

  • 08/25-09/01/17 Read Introduction and Watched Course Motivation Video Lectures by Geoffrey Fox:
  • 08/25-09/01/17 Lessons 1-6a: Emerging Technologies, Data Deluge, Jobs, Trends(A-C), Digital Disruption, Computing Models(A)
  • 09/14/17 Lesson 6B: Computing Models (B), Lessons 7, 8, 9
  • 09/15/17 Lesson 16: Course Motivation Conclusions
  • (TO DO) Watch remaining video lessons

Practice

  • 09/07/17 Configured computer to use pyenv python 2.7.13 and pyenv python 3.6.2 on OSX
  • 09/08-09/09/17 Python: Reviewed programming examples, posted to GitHub
  • 09/14/17 Python: Created Pandas IPython notebook for review: Data Structures, Indexing, Functions
  • 10/02/17 Python experiment: Creating plots in Python with Seaborn
  • 10/03/17 Python experiment: Using Pandas to Merge CSV, JSON, and SQL files
  • 10/25/17 Python experiment: Using scikit-learn for k-nearest neighbors (KNeighborsClassifier) classification of Iris dataset
  • 10/28/17 Python experiment: SKLearn Logistic Regression linear model as classifier on Cancer dataset
  • 10/28/17 Python experiment: SKLearn Decision Trees and Random Forest Classification of Cancer dataset
  • 10/28/17 Python experiment: SKLearn Support Vector Machines and SVC Classification with Cancer dataset
  • 11/03/17 Python experiment: SKLearn Neural Networks, Multilayer Perceptron (MLP) Classifier on Cancer data
  • 11/15-11/16/17 Project: Python notebook for cleaning and preparing data from the NSDUH dataset
  • 11/16/17 Python experiment: Pandas Data Cleaning and Preparation, subsetting data frames, export to CSV
  • 11/19/17 Project: Subset NSDUH dataset in Pandas and created aggregated data subset for course project
  • 11/24/17 Project: Wrote get-data.py function to download, unzip, extract NSDUH data, write to csv file
  • 11/24/17 Project: Updated Pandas Data Cleaning and Preparation, to include get-data() function
  • 11/28/17 Project: Python notebooks for Logistic regression Classification of Heroin Use and Opioids Misuse
  • 11/30/17 Project ipynb: Analytics-Classifier Models of Heroin Use, Logistic Regression, Decision Trees, Random Forests
  • 12/01/17 Project ipynb: Analytics-Classifier Models of Prescription Opioid Pain Reliever (PRL) Misuse
  • 12/02/17 Project ipynb: Uploaded Project Data Exploratory Analysis and Data Visualization notebooks

Writing

  • 09/05/17 Topic for Paper 1: Big Data Analytics, Data Mining, Health Informatics: Social Media, Population, Epidemics
  • 09/07/17 Installed LaTeX, Aquamacs, and JabRef; still learning
  • 09/12/17 Topic for Paper 2: Health Coverage, Wearables, Internet of Things
  • 09/20/17 Updated Paper 2 title: Big Health Data, Wearable Electronic Sensors, Addiction Treatment
  • 09/20-09/24/17 Paper 1: Article search, literature review, writing, updating report.bib, report.tex
  • 09/28-09/30/17 Paper 1: Writing body of the text, reviewing, updating report.bib, report.tex
  • 10/04/17 Paper 1: Update abstract, introduction, limitations subsection, conclusion, and report.bib
  • 10/06/17 Paper 1: Reformatted images for Figures, and added Table and Figures to report.tex
  • 10/08/17 Paper 1: Final revisions, integration of Table and Figures into the text of report.tex
  • 10/10/17 Review of HID312: An Overview of Big Data Applications in Mental Health Treatment
  • 10/14-10/15/17 Paper 2: Update report.tex, update references in report.bib, review articles
  • 10/16/17 Paper 1: Updated report.tex and report.bib according to issues in review.pdf
  • 10/21-10/22/17 Paper 2: Review literature, revise report.tex and report.bib
  • 10/29/17 Paper 2: Read and review literature, revise report.tex, add Figures
  • 10/31/17 Paper 2: Writing literature review, revise report.tex, report.bib, add Figures 1-8, add Table 1
  • 11/03-11/05/17 Paper 2: Writing, review, revision of report.tex and report.bib for submission
  • 11/15-11/16/17 Project: NSDUH data codebook includes survey variables and frequency summary
  • 11/20/17 Project: Created report.tex and report.bib files in ShareLaTeX with accompanying files and folders
  • 11/21-11/25/17 Project: Writing Introduction and Method sections in report.tex, adding references to report.bib
  • 11/26/17 Project Report: Added Figure 1 and created Tables 1 and 2 Summaries of Substance Use and Treatment
  • 11/29/17 Project Report: Added Figures 2 through 4, regression scatterplot, logistic classifier variable plots
  • 11/30/17 Project Report: Added Figures 4 through 9, decision trees, random forests, gradient boosting classifiers
  • 12/01/17 Project Report: Writing Introduction on Classifier Algorithms, Describing Results of Classifier Models
  • 12/03/17 Project Report: Finished Results section for Classification of Heroin Use, Rewriting Discussion section
  • 12/03/17 Project Report: Finished Results section for Classification of Prescription Opioid Pain Reliever Misuse
  • 12/03/17 Project Report: Submitted Project report.pdf paper to Turnitin for review; similarity index 9 percent
  • 12/04/17 TODO: Project Report: Rewrite Discussion and Conclusion, Proofread, upload final document version.

Location

  • Address: Galway Drive, Dallas, Texas, 75218
  • Latitude: 32.840054
  • Longitude: -96.697841