forked from ujjwalkarn/DataScienceR
common_functions.Rmd
---
Introduction to several commonly used functions in R
--
## with()
The with() function evaluates an expression using the columns of a dataset as variables. It is similar to
DATA= in SAS.
```{r}
# with(data, expression)
# example applying a t-test to a data frame mydata
with(mydata, t.test(y ~ group))
```
Please look at other examples [here](http://stackoverflow.com/questions/21827572/what-is-the-difference-between-with-and-within-in-r) and [here](http://www.magesblog.com/2012/01/say-it-in-r-with-by-apply-and-friends.html).
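The chunk above assumes a data frame `mydata` that is not defined in this document. As a self-contained sketch, the same pattern can be tried on the built-in `mtcars` dataset; the variable names `ratio_long`, `ratio_with`, `tt` and `mt2` are only illustrative. The last line shows `within()`, which returns a modified copy of the data frame instead of the value of the expression.

```r
# mtcars ships with R; mpg, wt and am are columns of that data frame.

# Without with(): every column needs the mtcars$ prefix
ratio_long <- mean(mtcars$mpg / mtcars$wt)

# With with(): column names are looked up inside mtcars first
ratio_with <- with(mtcars, mean(mpg / wt))

# The t-test pattern from above: mpg compared across the two levels of am
tt <- with(mtcars, t.test(mpg ~ am))
tt$p.value

# within() evaluates the expression and returns the data frame with the new column
mt2 <- within(mtcars, mpg_per_wt <- mpg / wt)
head(mt2$mpg_per_wt)
```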
## by()
The by() function splits a data frame by the levels of a factor or factors and applies a function to each subset.
It is similar to BY processing in SAS.
```{r}
# by(data, factorlist, function)
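# example: obtain variable means separately for
# each level of byvar in data frame mydata
# (mean() does not work on a whole data frame, so use colMeans() on the numeric columns)
by(mydata, mydata$byvar, function(x) colMeans(x[sapply(x, is.numeric)]))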
```
Please look [here](http://www.magesblog.com/2012/01/say-it-in-r-with-by-apply-and-friends.html) for more details.
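Since `mydata` is not defined here, the following is a minimal runnable sketch of the same idea on the built-in `iris` dataset. The names `means_by_species` and `means_matrix` are illustrative, and restricting the data to the four numeric columns is a choice made for this example.

```r
# iris ships with R: four numeric measurements plus the factor Species
means_by_species <- by(iris[, 1:4], iris$Species, colMeans)
means_by_species

# Each group's result can be pulled out by its factor level
means_by_species[["setosa"]]

# Stack the per-group results into a matrix with one row per species
means_matrix <- do.call(rbind, means_by_species)
means_matrix
```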
## do.call()
do.call() calls a function once, passing the elements of a list as its arguments. lapply(), by contrast, applies a function separately to each element of a list and returns a list of results.
```{r}
do.call(sum, list(c(1,2,4,1,2), na.rm = TRUE))
#10
lapply(c(1,2,4,1,2), function(x) x + 1)
#2
#3
#5
#2
#3
do.call("+",list(4,5))
#9
```
More examples [here](https://stat.ethz.ch/R-manual/R-devel/library/base/html/do.call.html).
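A common use of do.call() is combining a list of data frames in one step. The sketch below uses made-up data; `pieces`, `combined`, `joined` and `rows_per_piece` are hypothetical names for illustration.

```r
# A list of data frames with identical columns, e.g. one per input file
pieces <- list(
  data.frame(id = 1:2, value = c(10, 20)),
  data.frame(id = 3:4, value = c(30, 40)),
  data.frame(id = 5L,  value = 50)
)

# Equivalent to rbind(pieces[[1]], pieces[[2]], pieces[[3]])
combined <- do.call(rbind, pieces)
nrow(combined)
# 5

# Named list elements are passed as named arguments
joined <- do.call(paste, list("a", "b", sep = "-"))
joined
# "a-b"

# lapply() instead returns one result per list element
rows_per_piece <- lapply(pieces, nrow)
```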
## more()
more() is a user-defined function that is helpful in printing out a large object. Taken from [here](http://stackoverflow.com/questions/21610468/pause-print-of-large-object-when-it-fills-the-screen).
```{r}
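# more() pages through the printed form of an object, `lines` lines at a time.
# At the prompt: Enter = next page, q = quit, t = jump to the last page,
# a digit d = jump to roughly d/10 of the way through the output.
more <- function(expr, lines = 20) {
  out <- capture.output(expr)
  n <- length(out)
  i <- 1
  while (i <= n) {            # <= so that the final line is not skipped
    j <- 0
    while (j < lines && i <= n) {
      cat(out[i], "\n")
      j <- j + 1
      i <- i + 1
    }
    if (i <= n) {
      rl <- readline()
      if (grepl('^ *q', rl, ignore.case = TRUE)) i <- n + 1
      if (grepl('^ *t', rl, ignore.case = TRUE)) i <- n - lines + 1
      if (grepl('^ *[0-9]', rl)) i <- floor(as.numeric(rl) / 10 * n) + 1
    }
  }
  invisible(out)
}

# to print out an object such as data.frame mydf 20 lines at a time, use:
more(mydf)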
```
## options()
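options() sets global options for the R session. For example, it can be used to increase max.print, the limit on how many entries R prints before truncating the output. More info [here](http://stackoverflow.com/questions/6758727/how-to-increase-the-limit-for-max-print-in-r).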
```r
options(max.print=1000000)
```
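options() returns the previous values of whatever it changes, so a setting can be changed temporarily and then restored. A small sketch of that pattern follows; the value 50 and the names `old`, `during` and `after` are arbitrary choices for illustration.

```r
# options() invisibly returns the old values of the options it sets
old <- options(max.print = 50)

# getOption() reads the current value of a single option
during <- getOption("max.print")
during
# 50

# Restore the previous setting by passing the saved list back to options()
options(old)
after <- getOption("max.print")
```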
## To check which columns in the data frame `df` have missing values
```r
colnames(df)[colSums(is.na(df)) > 0]
```
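To see the one-liner in action, here is a sketch on a small made-up data frame. The data and the names `na_counts`, `cols_with_na` and `complete_rows` are invented for this example.

```r
# A toy data frame with missing values in two of its four columns
df <- data.frame(
  id    = 1:4,
  age   = c(23, NA, 31, 40),
  city  = c("Oslo", "Lima", NA, NA),
  score = c(1.5, 2.0, 3.5, 4.0)
)

# Number of missing values per column
na_counts <- colSums(is.na(df))

# Names of the columns that contain at least one NA
cols_with_na <- colnames(df)[na_counts > 0]
cols_with_na
# "age" "city"

# Rows that have no missing values at all
complete_rows <- df[complete.cases(df), ]
```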