-
Notifications
You must be signed in to change notification settings - Fork 0
/
Hypothesis_testing.Rmd
139 lines (80 loc) · 5.42 KB
/
Hypothesis_testing.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
---
title: "Assignment 4. Hypothesis testing"
author: "Saikrishna Javvadi"
student ID: "20236648"
date: "`r Sys.Date()`"
output: word_document
---
## Context
An engineer wishes to compare the mean lifetimes of two types of transistors in an application involving high-temperature performance. A random sample of 30 transistors of type A and a random sample of 30 of type B were tested and the lifetime (minutes) of each transistor recorded.
Create relevant plots and analyse the data appropriately to decide if there evidence of a significant difference in the mean lifetime between the two transistor types.
```{r}
library(infer)
library(tidyverse)
library(tolerance)
```
```{r}
lifetimes_df <- read.csv("lifetimes.csv")
glimpse(lifetimes_df)
```
## Summary Statistics
1. Calculate the mean and standard deviation of lifetime for each type separately
```{r}
lifetimes_df %>% group_by(Type) %>% summarise(n=n(),
mean=mean(Lifetime),
sd=sd(Lifetime))
```
## Boxplot
2. Create side-by-side boxplots of lifetimes by type and **interpret these plots**
```{r}
ggplot(lifetimes_df, aes(x = Type, y = Lifetime, fill = Type)) +
geom_boxplot() + labs(y = "Lifetime")
```
Interpretation: From the Boxplot, It can be observed that for type A the sample is normally distributed and also there exists a potential outlier, whereas for type B the sample is possibly normally distributed but is right skewed as the tail is longer on the right.
It can also be observed that Boxplot for Type A has a median of approximately 1800 and for type B the median is 1600.
On comparing the plots together, it can be concluded that the median lifetime of Type A is better than Type B.
## t-test and 95% Confidence Interval for difference in the population mean
3. Carry out a two sample t-test to compare lifetime of transistors of types A and B
```{r}
lifetimes_df %>%
t_test(Lifetime ~ Type, order = c("A", "B"),
alternative = "two_sided",
mu=0,
conf_level = 0.95,
paired=FALSE,
var.equal=FALSE)
```
4. What are the null and alternative hypotheses for this test?
$H_0$ : The mean difference in Lifetimes between Type A and Type B is 0 \
$H_A$ : The mean difference in Lifetimes between Type A and Type B is not 0
5. Use the p-value to decide whether to reject the null hypothesis or not
Since the p-value is very small i.e <0.05 ; we have sufficient evidence to reject the null hypothesis.
6. Interpret the 95% confidence interval for the mean difference
With 95% confidence, it is likely that the mean difference between Type A and Type B transistors is in the range of 106.7139 and 295.2774. Since there is no zero in this interval, it can be concluded that Type A has larger mean lifetime than Type B transistors.
## 95% Bootstrap CI for difference in means
7. Use the bootstrap to obtain a 95% confidence interval for the population mean difference
8. Why is this interval not exactly the same as the confidence interval?
```{r}
lifetime_bootstrap <- lifetimes_df %>%
specify(response =Lifetime, explanatory = Type) %>%
generate(reps = 1000, type = "bootstrap") %>%
calculate(stat = "diff in means", order = c("A", "B"))
get_ci(lifetime_bootstrap)
lifetime_bootstrap %>% visualize(endpoints = get_ci(lifetime_bootstrap), direction = "between")
```
The confidence interval obtained above was between (106.7139 and 295.2774) ,while after the bootstrap the confidence interval becomes smaller and more precise (111.1225,284.9681). We can observe this difference because before bootstrap there were only 30 samples in each type but after bootstrap the values are more in number(1000) due to repeated sampling which gives a better estimate of the confidence interval.
## Tolerance Intervals
9. Obtain 95%/95% tolerance intervals for the lifetimes of transistors of type A and B separately and **interpret** these intervals carefully.
### Type A
```{r}
normtol.int(lifetimes_df$Lifetime[lifetimes_df$Type=="A"], alpha = 0.05, P = 0.95, side = 2)
```
With 95% confidence it can be estimated that 95% of Type A transistors have a lifetime between 1371.682 and 2183.085
### Type B
```{r}
normtol.int(lifetimes_df$Lifetime[lifetimes_df$Type=="B"], alpha = 0.05, P = 0.95, side = 2)
```
With 95% confidence it can be estimated that 95% of Type B transistors have a lifetime between 1057.674 and 2095.103
## Conclusion
10. Write an overall conclusion and decision as to which type of transistor is to be preferred
Since the null hypothesis is rejected , we can therefore conclude that the mean difference in lifetimes will not be zero , and hence it follows the alternate hypothesis which states that they are unequal. After observing the Confidence Interval values,we can say that the mean difference between A and B is a positive value greater than 0(order A and then B) which means the mean lifetime of Type A, with 95% confidence, is greater than the mean lifetime of Type B. The tolerance interval provides us with a clearer picture of the two types of transistors, i.e. Since 95% of the transistors in Type A lies in a range higher than 95% of the transistors in Type B. Based on these evidences, we can come a conclusion that with respect to lifetimes , Type A transistors can be preferred over Type B transistors.