MachineLearning_AppliedStatistics

Imported the necessary libraries

Read the data as a data frame

Performed basic EDA covering the following steps, printing the insights from each one:

a. Shape of the data

b. Data type of each attribute

c. Checking the presence of missing values

d. 5 point summary of numerical attributes

e. Distribution of ‘bmi’, ‘age’ and ‘charges’ columns.

f. Measure of skewness of ‘bmi’, ‘age’ and ‘charges’ columns

g. Checking the presence of outliers in ‘bmi’, ‘age’ and ‘charges’ columns

h. Distribution of categorical columns (including ‘children’)

i. Pair plot that includes all the columns of the data frame
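
The non-plotting EDA steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data frame is a synthetic stand-in, the column names `sex` and `smoker` are assumptions (the README only names ‘bmi’, ‘age’, ‘charges’ and children), and the 1.5 × IQR rule is one common outlier check that may differ from the one the notebook used.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical values, NOT the notebook's dataset).
# Column names "sex" and "smoker" are assumed for illustration.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "sex": rng.choice(["female", "male"], size=n),
    "bmi": rng.normal(30, 6, size=n).round(1),
    "children": rng.integers(0, 4, size=n),
    "smoker": rng.choice(["yes", "no"], size=n, p=[0.2, 0.8]),
    "charges": rng.lognormal(mean=9, sigma=0.8, size=n).round(2),
})

numeric_cols = ["bmi", "age", "charges"]

print(df.shape)                      # (a) number of rows and columns
print(df.dtypes)                     # (b) data type of each attribute
print(df.isnull().sum())             # (c) missing values per column
print(df[numeric_cols].describe())   # (d) min, quartiles and max (plus count, mean, std)

# (f) sample skewness of each numeric column
skewness = df[numeric_cols].skew()
print(skewness)


def iqr_outlier_count(series):
    """Count values lying more than 1.5 * IQR outside the quartiles."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)
    return int(mask.sum())


# (g) number of IQR-rule outliers per numeric column
outliers = {col: iqr_outlier_count(df[col]) for col in numeric_cols}
print(outliers)

# (h) frequency of each category, treating "children" as categorical
for col in ["sex", "smoker", "children"]:
    print(df[col].value_counts())
```

The distribution plots (e) and the pair plot (i) are left out so the sketch runs without a display; with seaborn installed, `seaborn.pairplot(df)` is one way to produce the latter.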

The notebook also examined the following questions using statistical evidence

a. Do charges of people who smoke differ significantly from those of people who don't?

b. Does bmi of males differ significantly from that of females?

c. Is the proportion of smokers significantly different between genders?

d. Is the distribution of bmi across women with no children, one child and two children the same?
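
The README does not say which tests the notebook applied, so the sketch below uses common choices as an assumption: Welch's two-sample t-test for (a) and (b), a chi-square test of independence for (c), and one-way ANOVA for (d). The data is again a synthetic stand-in with assumed column names `sex` and `smoker`, and the 0.05 significance level is illustrative.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (hypothetical values, NOT the notebook's dataset).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "sex": rng.choice(["female", "male"], size=n),
    "bmi": rng.normal(30, 6, size=n),
    "children": rng.integers(0, 4, size=n),
    "smoker": rng.choice(["yes", "no"], size=n, p=[0.2, 0.8]),
    "charges": rng.lognormal(mean=9, sigma=0.8, size=n),
})
# Artificially raise smokers' charges so that test (a) has an effect to detect.
df.loc[df["smoker"] == "yes", "charges"] += 20000

ALPHA = 0.05  # assumed significance level

# (a) Charges of smokers vs. non-smokers: Welch's t-test (unequal variances).
smoker_charges = df.loc[df["smoker"] == "yes", "charges"]
nonsmoker_charges = df.loc[df["smoker"] == "no", "charges"]
t_a, p_a = stats.ttest_ind(smoker_charges, nonsmoker_charges, equal_var=False)

# (b) bmi of males vs. females: Welch's t-test.
male_bmi = df.loc[df["sex"] == "male", "bmi"]
female_bmi = df.loc[df["sex"] == "female", "bmi"]
t_b, p_b = stats.ttest_ind(male_bmi, female_bmi, equal_var=False)

# (c) Smoker proportion by gender: chi-square test on the 2x2 contingency table.
table = pd.crosstab(df["sex"], df["smoker"])
chi2_c, p_c, dof_c, expected_c = stats.chi2_contingency(table)

# (d) bmi of women with 0, 1 and 2 children: one-way ANOVA across the three groups.
women = df[df["sex"] == "female"]
groups = [women.loc[women["children"] == k, "bmi"] for k in (0, 1, 2)]
f_d, p_d = stats.f_oneway(*groups)

p_values = {"a": p_a, "b": p_b, "c": p_c, "d": p_d}
for question, p in p_values.items():
    verdict = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"({question}) p-value = {p:.4f}: {verdict}")
```

In each case the null hypothesis is "no difference" (equal means, or independence of smoking status and gender), and a p-value below the chosen level is read as evidence against it. Because `charges` is typically right-skewed, a rank-based alternative such as `scipy.stats.mannwhitneyu` may be worth comparing for (a).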