- About Project
- About the Data Set
- Resources
- Dependencies
This project has two purposes. The first is to practice data analysis on a well-known data set, so that my results are easier to verify. The second is to become familiar with Python for data analysis, including its plotting, math, and data frame libraries. The end goal of the project is to predict a passenger's survival.
The data set includes 891 entries (observations), and each entry has 11 different variables to describe a passenger.
- survival : Survival (0 = No, 1 = Yes)
- pclass : Ticket class (1 = 1st/Upper, 2 = 2nd/Middle, 3 = 3rd/Lower)
- sex : Sex
- Age : Age in years
- sibsp : # of siblings / spouses aboard the Titanic (siblings = brother/sister/stepbrother/stepsister, spouse = husband/wife)
- parch : # of parents / children aboard the Titanic (parent = mother/father, child = daughter/son/stepdaughter/stepson)
  - Some children travelled only with a nanny, so parch = 0 for them.
- ticket : Ticket number
- fare : Passenger fare
- cabin : Cabin number
- embarked : Port of Embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
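Since `survival` is coded as 0/1, its mean within a group is the share of that group that survived. The sketch below illustrates this with pandas. The helper name `survival_rate_by` is hypothetical, the toy rows are made up for illustration, and the column names are assumed to match the lowercase names listed above (the actual CSV headers may differ in case).

```python
import pandas as pd


def survival_rate_by(df, group_col, target_col="survival"):
    """Return the mean of the 0/1 target per group, i.e. the share that survived."""
    return df.groupby(group_col)[target_col].mean()


# Made-up rows for illustration only; these are not taken from the data set.
toy = pd.DataFrame({
    "survival": [1, 0, 1, 0],
    "pclass": [1, 3, 1, 3],
    "sex": ["female", "male", "male", "female"],
})

rates = survival_rate_by(toy, "pclass")
print(rates)
```

The same helper can be pointed at any categorical column, for example `survival_rate_by(toy, "sex")`.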
- titanic-analysis.ipynb: Jupyter (IPython) notebook containing the analysis
- titanic-data.csv: Titanic data set in CSV format
- pandas: data frame library
- numpy: mathematical library that also provides array structures, which can serve as a lightweight alternative to data frames
- matplotlib: plotting library
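
As a minimal sketch of a first step with these dependencies, the snippet below loads the CSV with pandas and reports each column's type and number of missing values. It assumes `titanic-data.csv` sits in the working directory. The helper name `summarize` is hypothetical and not part of the notebook.

```python
from pathlib import Path

import pandas as pd

# Assumption: the CSV is in the current working directory.
DATA_PATH = Path("titanic-data.csv")


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype and count of missing values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
    })


# Only read the file if it is actually present, so the sketch runs anywhere.
if DATA_PATH.exists():
    passengers = pd.read_csv(DATA_PATH)
    print(passengers.shape)  # (number of rows, number of columns)
    print(summarize(passengers))
```

Checking missing values early is useful because columns with many gaps need to be handled before they can be used for prediction.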