-
Notifications
You must be signed in to change notification settings - Fork 0
/
chap14_supp_code.Rmd
553 lines (456 loc) · 25.2 KB
/
chap14_supp_code.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
---
title: "Supplementary Materials to Chapter 14: Persistence of Passive Immunity, Natural Immunity (and Vaccination)"
author: "Amy K. Winter & C. Jessica E. Metcalf"
output:
html_document: default
pdf_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Summary
In this supplement to Chapter 14 titled 'Persistence of Passive Immunity, Natural Immunity (and Vaccination),' we illustrate some examples of fitting models to serological data including i) fitting waning of maternal immunity (with simulated data, generated here); ii) fitting the age profile of seropositivity, across a range of age class designs (with simulated data, available as 'simulated.seroprevalence.profile.csv'); iii) extracting the force of infection (again with 'simulated.seroprevalence.profile.csv'); and iv) exploring a case study for rubella (using existing serology data extracted from the literature, available as 'serology.data.csv' combined with data from United Nations Population Division, 2015 on fertility over age 'asfr.data.csv' and population structure 'female.pop.data.csv').
## Methods for analysis of IgG titers
Following Waaijenborg et al. (2013), we can define a function that captures decay of maternal antibody titers with time (ignoring for simplicity fluctuations in status associated with vaccinated individuals):
```{r maternalImmunity1}
waning <- function(age, m, d, b){
log(m*exp(-d*age)+b)
}
```
We can generate data according to this function assuming that log titers are normally distributed (here parameters reflect those estimated for measles in Waaijenborg et al. (2013)):
```{r maternalImmunity2}
ages <- seq(1,60,length=100)/12
dist.titers <- rnorm(length(ages),waning(ages,m=1.61,d=7.77,b=0.011),sd=1.11)
```
and then write down a likelihood function for this data (returning the negative log likelihood, as minimization is more tractable than maximization):
```{r maternalImmunity3}
like.waning <- function(par,age,titer){
par<- exp(par) #simple way of ensuring all parameters are positive
pred <- waning(age=age, m=par[1], d=par[2], b=par[3])
u <- dnorm(titer, pred,par[4],log=TRUE)
return(-sum(u))
}
```
We can then try and recover these parameters by applying optimization techniques to this data with the function 'optim':
```{r maternalImmunity4}
tmp <- optim(par=log(c(1,1,0.1,1)),like.waning,age=ages,titer=dist.titers,
method="Nelder-Mead")
#check fit has converged - should be 0
print(tmp$convergence)
#compare to the values set in the simulation:
print(cbind(estimated=round(exp(tmp$par),2),true=c(1.61,7.77,0.011,1.11)))
```
We can plot out these results, showing the data (filled points) and the 'true' relationship (grey curve); and the fitted relationship (red curve). This indicates that even if parameters recovered are not precisely those used in the simulation (see above), the curves are similar (and further optimization might also move the parameters closer to the 'true' values):
```{r maternalImmunity5}
plot(ages, dist.titers, xlab="Ages (years)", ylab="Titers", pch=19)
points(ages,waning(ages,m=1.61,d=7.77,b=0.011), type="l",col="grey")
points(ages,waning(ages,m=exp(tmp$par[1]),d=exp(tmp$par[2]),b=exp(tmp$par[3])),
type="l",col="red")
```
## Methods for analysis of seropositivity
We move from analysis of antibody titers (requiring techniques associated with analysis of continuous variables) to considering analysis of seropositivity, proportion of individuals with titers above a defined threshold (generally established as being reflective of protection from infection), usually within an age class. This requires techniques for analysis of binary variables.
The design of serological surveys oftens reflects a choice of age bins (generally dictated by logistical constraints), and the implications of this choice are not always fully considered. In particular, grouping younger ages when the shape of the seropositivity curve is the steepest can result in underestimates of seropositivity across age. We illustrate this using simulated data, based around a model introduced in Metcalf et al. 2012, assuming age contact based around POLYMOD estimates from Mossong et al. 2008, with $R_0$=6, and a crude birth rate set to 30 per 1000 people. For each age (corresponding to individuals at the exact age in years), the simulation provides the number out of 100 individuals that are seropositive, and the number that are seronegative. We use this to estimate the mean proportion that would have been seropositive if data from a serological survey had been available in 5 year and 10 year age bins (corresponding to Figure 2A in Chapter 15).
```{r Estimating Age-Specific seropositivity with Simulated Data}
#Figure 2A Code
sim.data <- read.csv("./data/simulated.seroprevalence.profile.csv")
prop.seropos <- sim.data$Pos/sim.data$Tot
index <- findInterval(1:49,seq(5,45,5))+1
prop.seropos.5year <- rep(NA,10)
for (a in 1:10){
prop.seropos.5year[a] <- mean(prop.seropos[index==a])
}
index <- findInterval(1:49,seq(10,50,10))+1
prop.seropos.10year <- rep(NA,5)
for (a in 1:5){
prop.seropos.10year[a] <- mean(prop.seropos[index==a])
}
mid.age.choices <- list(seq(1,49,1), c(2.5,seq(7,47,5)), c(5,seq(14.5,44.5,10)))
```
To explore the implications of grouping age classes, we plot the proportion seropositive by age, and illustrate the overall pattern by fitting a line using a smoothed spline. As age bins widen across the ages where the seroprevalence curve is concave (ages 5-15 years), our underestimation of the proportion seropositive increases.
```{r plot Figure 2A, fig.width=6, fig.height=6}
par(mar=c(5,4,4,5)+.1, bty="l",pty="s")
plot(mid.age.choices[[1]], prop.seropos, type="p", pch=16, col="black", xlim=c(0,15),
xlab="Age (yrs)", ylab="Proportion seropositive", cex.lab=1.5, cex.axis=1.5)
lines(1:15, predict(smooth.spline(mid.age.choices[[1]][1:15],
prop.seropos[1:15]), 1:15)$y, col="black", lty=1)
points(mid.age.choices[[2]], prop.seropos.5year[1:10], col="dark grey", pch=17, cex=1.2)
lines(2.5:15, predict(smooth.spline(mid.age.choices[[2]][1:7],
prop.seropos.5year[1:7]), 2.5:15)$y, col="dark grey",
lty=2)
points(mid.age.choices[[3]], prop.seropos.10year[1:5], col="black", pch=18, cex=1.2)
lines(5:15, predict(smooth.spline(mid.age.choices[[3]][1:4],
prop.seropos.10year[1:4]), 5:15)$y, col="black", lty=3)
legend("topleft", c("annual age bins","five year age bins", "ten year age bins"),
lty=c(1,2,3), cex=1.1, col=c("black","dark grey", "black"), pch=c(16,17,18))
```
The power of seropositivity data is the insights it can yield into the underlying dynamics. Returning to the highest resolution data (1 year age bins), we can fit models to the age profiles of seropositivity, and use these to infer the pattern of the force of infection over age. Here, we illustrate different parametric methods: fractional polynomials, piecewise constant, semi-parametric penalized regression splines, and parametric local polynomials. This code is based on Hens et al. (2012), framed below in the form of functions that can be applied to data in the form provided.
```{r Estimating Age-Specific Seroprevalence and ASFOI with Simulated Data}
#Figure 2B Code
### Function to Extract ASFOIs for a serosurvey using fractional polynomials
###
##' @param - pos - number of positives for each age
##' @param - Tot - number of samples for each age
##' @param - Age - age in years
###
### outputs estimated ASFOIs and Fitted Seroprevalence
ExtractASFOIEstimates.FP <- function(pos, Tot, Age){
#determing to use first or second degree
best.1 <- search.fracpoly.one(y=pos,tot=Tot,x=Age)
best.2 <- search.fracpoly.two(y=pos,tot=Tot,x=Age)
firstdegree <- (best.1$deviance - best.2$deviance < 4.605)
print(c(firstdegree, "Use First Degree:"))
#chi square with 2 degrees of freedom 90% CI,
if (firstdegree){
if (best.1$power!=0){
fit.best <- glm(cbind(pos,Tot-pos) ~ I(Age^best.1$power),
family=binomial(link="logit")); print("power!=0")
} else {
fit.best <- glm(cbind(pos,Tot-pos) ~ I(log(Age)),
family=binomial(link="logit")); print("power=0")
}
} else {
if (best.2$power[1]==best.2$power[2]){
fit.best <- glm(cbind(pos,Tot-pos) ~ I(Age^(best.2$power[1])) +
I(Age^(best.2$power[2])*log(Age)), family=binomial(link="logit"))
} else {
if (best.2$power[1]==0){
fit.best <- glm(cbind(pos,Tot-pos) ~ I(log(Age)) +
I(Age^(best.2$power[2])), family=binomial(link="logit"))
} else if (best.2$power[2]==0){
fit.best <- glm(cbind(pos,Tot-pos) ~ I(Age^(best.2$power[1])) +
I(log(Age)), family=binomial(link="logit"))
} else {
fit.best <- glm(cbind(pos,Tot-pos) ~ I(Age^(best.2$power[1])) +
I(Age^(best.2$power[2])), family=binomial(link="logit"))
}
}
}
age.grid=seq(1,max(Age),by=1)
p=predict(fit.best, newdata=data.frame(Age=age.grid),type="response", se.fit=T)
fitted <- p$fit
asfoi <-foi.num(x=age.grid,p=fitted)
return=list(asfoi=asfoi$foi,
ages=asfoi$grid,
fitted=fitted)
}
### Function to search for the best 1st degree polynomial
###
##' @param - y - the number of seropos per age, x
##' @param - tot - the number of sample per age, x
##' @param - x - age
##' @param - mc - monotonicity constraint, assumes x is ordered
###
### outputs the "best" power (lowest deviance), its associated deviance
### and the number of non-converged models for the first degree polynomial
search.fracpoly.one<-function(y,tot,x,mc=TRUE){
pow1<-seq(-2,3,0.01) #searching through these powers
deviance<-deviance(glm(cbind(y,tot-y)~x, family="binomial"(link=logit)))
#we compare all models to the baseline deviance of the null model
power<-1 #only looking at 1st degree polynomials
mistake<-NULL
for (i in 1: (length(pow1))){
if(pow1[i]==0){term1<-log(x)} else{term1<-(x)^(pow1[i])}
glm.try<- suppressWarnings(glm(cbind(y,tot-y)~term1, family="binomial"(link=logit)))
if(glm.try$converged==FALSE){mistake<-rbind(mistake, c(1,pow1[i]))}
else{#replace deviance if this one is better -> ordered approach to finding best model
if(deviance(glm.try)<deviance){
if (((mc)&&(sum(diff(predict(glm.try))<0)==0))|(!mc)){
deviance<-deviance(glm.try)
power<-pow1[i]
}
}
}
}
return(list(power=power, deviance=deviance, mistake=mistake))
}
### Function to search for the best 2nd degree polynomial
###
##' @param - y - the number of seropos per age, x
##' @param - tot - the number of sample per age, x
##' @param - x - age
##' @param - mc - monotonicity constraint, assumes x is ordered
###
### outputs the "best" powers (lowest deviance), its associated deviance
### and the number of non-converged models for the second degree polynomial
search.fracpoly.two<-function(y,tot,x,mc=TRUE){
pow<-seq(-2,3,0.1)
deviance<-deviance(glm(cbind(y,tot-y)~x+I(x^2), family="binomial"(link=logit)))
mistake<-NULL
for (i in 1: (length(pow))){
for (j in i: (length(pow))){
if(pow[i]==0){term1<-log(x)} else{term1<-(x)^(pow[i])}
if(pow[j]==pow[i]){term2<-term1*log(x)}
else if(pow[j]==0){term2<-log(x)}
else{term2<-(x)^(pow[j])}
glm.try<-glm(cbind(y,tot-y)~term1+term2, family="binomial"(link=logit))
if(glm.try$converged==FALSE){mistake<-rbind(mistake, c(1,pow[i],pow[j]))}
else{
if(deviance(glm.try)<deviance){
if (((mc)&&(sum(diff(predict(glm.try))<0)==0))|(!mc)){
deviance<-deviance(glm.try)
power<-c(pow[i],pow[j])
}
}
}
}
}
return(list(power=power, deviance=deviance, mistake=mistake))
}
### Function to Extract ASFOIs for all serosurveys using Semi-parametric
### penalized smoothing splines
###
##' @param - pos - number of positives for each age
##' @param - Tot - number of samples for each age
##' @param - Age - age in years
###
### outputs estimated ASFOIs and Fitted Seroprevalence
ExtractASFOIEstimates.GAM <- function(pos, Tot, Age){
#Smooth then constrain approach
#ungroup the data
df <- rbind(data.frame(a=rep(Age, c(pos)), y=1),
data.frame(a=rep(Age, c(Tot-pos)), y=0))
y<-df$y[order(df$a)]
a<-df$a[order(df$a)]
fit.best <-gam(y~s(a))
#gam.check(fit.best)
age.grid=seq(1,max(Age),by=1)
p=predict(fit.best,newdata=data.frame(a=age.grid),type="response", se.fit=T)
fitted <- p$fit[1:49]
asfoi.orig <-foi.num(x=age.grid,p=fitted)
#issue with monotonicity dealt with up front using the 'Pool Adjacent Violator Algorithm"
#data is pooled if not always increasing, otherwise data remains the same
Monotonicity.pos <- pavit(pos=round(fitted*Tot),tot=Tot)$pai2
asfoi <-foi.num(x=age.grid,p=Monotonicity.pos)
return=list(asfoi=asfoi$foi,
ages=asfoi$grid,
fitted=fitted,
Monotonicity.pos=Monotonicity.pos,
asfoi.orig=asfoi.orig$foi)
}
### Function pavit - Extracted directly from Hens et al. 2012 textbook
### The pool adjacent violator algorithm in R
### data represents the ordered fitted values
pavit<- function(pos=pos,tot=rep(1,length(pos)))
{
gi<- pos/tot
pai1 <- pai2 <- gi
N <- length(pai1)
ni<-tot
for(i in 1:(N - 1)) {
if(pai2[i] > pai2[i + 1]) {
pool <- (ni[i]*pai1[i] + ni[i+1]*pai1[i + 1])/(ni[i]+ni[i+1])
pai2[i:(i + 1)] <- pool
k <- i + 1
for(j in (k - 1):1) {
if(pai2[j] > pai2[k]) {
pool.2 <- sum(ni[j:k]*pai1[j:k])/(sum(ni[j:k]))
pai2[j:k] <- pool.2
}
}
}
}
return(list(pai1=pai1,pai2=pai2))
}
### Function to get ASFOI from fitted values of seroprevalence by age
### Taken directly from Hens et al. 2012 textbook
###
##' @param - x - age
##' @param - p - fitted seroprevalence associated with age, x in order
###
### outputs ASFOI
foi.num<-function(x,p)
{
grid<-sort(unique(x))
pgrid<-(p[order(x)])[duplicated(sort(x))==F]
dp<-diff(pgrid)/diff(grid) #slope
numerator <-approx((grid[-1]+grid[-length(grid)])/2,dp,grid[c(-1,-length(grid))])$y
denominator <- (1-pgrid[c(-1,-length(grid))])
foi <- numerator/denominator
return(list(grid=grid[c(-1,-length(grid))],foi=foi,
derivative=numerator,
prop.seroneg=denominator))
}
### Function to Extract ASFOIs for all serosurveys
###
##' @param - pos - number of positives for each age
##' @param - Tot - number of samples for each age
##' @param - Age - age in years
##' @param - age.break - age breaks for the piecewise function
###
### outputs estimated ASFOIs and Fitted Seroprevalence
ExtractASFOIEstimates.piecewise.agebreak <- function(pos, Tot,
Age, age.break=c(0,seq(5,50,5))){
df <- data.frame(cbind(age=Age,Tot=Tot, Neg=Tot-pos, Pos=pos))
#number of time optimizing starting at random values,
## and then starting from mean dist from random values
noptims <- 20
asfoi.bytest <- array(NA, dim=c((length(age.break)-1), noptims))
for (test in 1:noptims){
set.seed(test)
pars <- abs(rnorm((length(age.break)-1), 0.11, 0.05))
out <- optim(log(pars), fn=ll.data.as.foi, data=df, age.break=age.break)
asfoi.bytest[,test] <- exp(out$par)
}
asfoi.mean <- apply(asfoi.bytest, 1, FUN=mean, na.rm=T)
out <- optim(log(asfoi.mean), fn=ll.data.as.foi, data=df, age.break=age.break)
asfoi.piecewise <- exp(out$par)
#turn piecewise ASFOI into ASFOI over all ages
ages.asfoi <- 1:49
index <- findInterval(ages.asfoi-1, age.break)
asfoi <- asfoi.piecewise[index]
#get predicted seroprevalence predicted values
fitted <- 1-(exp(-cumsum(asfoi)))
return=list(asfoi=asfoi,
ages=ages.asfoi,
fitted=fitted)
}
### -LogLikelihood by which to estimate the age specific force of infection (ASFOI)
### using piecewise constant hazard without numerical integration
### must start after maternal immunity - assume everyone is susceptible by age 1yo
###
##' @param - log.lambdas - starting parameter for age specific FOI
##' @param - data - dataframe with age, positives, and number samples per age
##' @param - age.break - upper ages used in piecewise fitting
###
### returns negative log likelihood
ll.data.as.foi <- function(log.lambdas, data, age.break, mort=c()){
par <- log.lambdas
dur <- c(diff(age.break))
cate <- age.break
cate <- cate[-length(cate)]
ll <- 0
for(a in 1:length(data$age)){
dummy2 <- data$age[a]>cate & !c(data$age[a]>cate[-1], FALSE)
dummy1 <- c(data$age[a]>cate, FALSE)[-1]
inte <- sum(dur*exp(par)*dummy1)+(exp(par[dummy2])*(data$age[a]-cate[dummy2]))
p <- pmax(pmin(1-exp(-inte),1-1e-10),1e-10)
# likelihood
ll <- ll+dbinom(data$Pos[a],data$Tot[a],p,log=TRUE)
}
return(-ll)
}
```
Using the functions developed above, we can estimate the pattern of age-seropositivity, and from this the force of infection over age. We organize the data and proceed accordingly:
```{r Running Figure 2B}
#Simulated data
npos <- sim.data$Pos
Nsamp <- sim.data$Tot
nneg <- sim.data$Neg
ages <- sim.data$age
###local polynomial fit - non-parametric
require(locfit)
lpfit <- locfit(npos/Nsamp~lp(ages,nn=0.7,deg=2),family="binomial", subset=ages>=1)
lpfitd1 <- locfit(npos/Nsamp~lp(ages,nn=0.7,deg=2),deriv=1,family="binomial",
alpha=0.7,deg=2, subset=ages>=1)
lpfoi <- (predict(lpfitd1,newdata=data.frame(ages=1:49)))*
(predict(lpfit,newdata=data.frame(ages=1:49)))
lp.fitted <- (predict(lpfit,newdata=data.frame(ages=1:49)))
###fractional polynomials - parametric
fp <- ExtractASFOIEstimates.FP(pos=npos, Tot=Nsamp, Age=ages)
###penalized regression splines - semi-parametric
library(mgcv)
gam <- ExtractASFOIEstimates.GAM(pos=npos, Tot=Nsamp, Age=ages)
###piecewise constant - flexible parametric
#the following line of code can take up to 3 minutes to optimize
pw <- ExtractASFOIEstimates.piecewise.agebreak(pos=npos, Tot=Nsamp,
Age=ages, age.break=c(0,seq(5,50,5)))
```
To compare the inference from the different approaches, we plot them all on the same figure:
```{r Plotting Figure 2B, fig.width=6, fig.height=6}
par(mar=c(5,4,4,5)+.1, bty="l",pty="s")
plot(ages, npos/Nsamp, pch=19, xlab="Age (yrs)", ylab="Proportion seropositive",
type="n", xlim=c(0,30), ylim=c(0,1), cex.lab=1.5, cex.axis=1.5)
points(ages, npos/Nsamp, type="p", cex=1.7, col="grey", pch=19)
axis(4, c(0,0.1,0.2,0.3), c(0,0.1,0.2,0.3), cex.axis=1)
mtext("Force of Infection", side=4, line=3, at=0.15, cex=1.25)
lines(1:49,lp.fitted, lwd=2, lty=1)
lines(1:49, pmax(lpfoi,0), lwd=2, col=2)
lines(1:49, fp$fitted, lwd=2, lty=2)
lines(fp$ages, fp$asfoi, lwd=2, lty=2, col=2)
lines(1:49, gam$Monotonicity.pos, lwd=2, lty=3)
lines(gam$ages, gam$asfoi, lwd=2, lty=3, col=2)
lines(1:49, pw$fitted, lwd=2, lty=4)
lines(1:49, pw$asfoi, lwd=2, lty=4, col=2)
legend("right", c("fractional polynomials","penalized regression splines",
"local polynomials", "piecewise constant"),
lty=c(2,3,1,4), cex=1)
```
## Rubella case study
We illustrate the use of serological surveys to address public health issues around rubella, with three different examples from the literature (Ouattara et al. 1986; Modarres et al. 1996 and Nessa et al. 2008); each paired with data from the United Nations Population Division (2015) yielding the age specific fertility rate, and the age distribution of the population for that geographic region at that time. Both of these are necessary to estimate the burden of Congenital Rubella Syndrome. We start by bringing in this data:
```{r Rubella Case Study}
data <- read.csv("./data/serology.data.csv")
#WCB = women of childbearing age
WCB.pop <- read.csv("./data/female.pop.data.csv", header=TRUE,
row.names=1)
#ASFR = age specific fertility rate per 1 person
ASFR <- read.csv("./data/asfr.data.csv", header=TRUE, row.names=1)
country <- colnames(WCB.pop)
```
Using the functions defined above, we can loop through each of these countries, fit the age profile of seropositivity using local polynomial estimators based on the function defined above (chosen as an example - other functional forms are possible (Hens et al. 2012)), and from this, infer the force of infection over age. We can plot each of these and additionally estimate the burden of CRS using methods defined in the Chapter 15.
```{r Plotting Figure 3, fig.width=4, fig.height=4}
library("locfit")
for (j in 1:3) {
study <- data$study
Nsamp <- data[data$study==j,"N"]
N <- sum(Nsamp)
npos <- round(Nsamp*data[data$study==j,"seropositive.immune"])
nneg <- Nsamp-npos
mid.ages <- apply(data[study==j,7:8], 1, mean) #mid-age of age range
max.mid.ages <- mid.ages[length(mid.ages)]
#plot out data
plot(1, 1, pch=19, xlab="Age (yrs)", ylab="Proportion seropositive",
type="n", xlim=c(0,55), ylim=c(0,1))
points(mid.ages, data[study==j,5], type="p", cex=log(data[study==j,4]/10), col="grey",
pch=19)
jj <- which(study==j)[1]
title(paste(as.character(data[jj,6]), ", ",as.character(data[jj,3]), sep=""))
mtext(paste("(N=",N,")",sep=""), 3, cex=0.7)
axis(4, c(0,0.05,0.1,0.15,0.2, 0.25), c(0,0.05,0.1,0.15,0.2, 0.25))
mtext("Force of Infection", side=4, line=3, at=0.1, cex=0.7)
## estimate age-specific force of infection (FOI) and CRS incidence for each country
#local polynomial fit
lpfit <- locfit(npos/Nsamp~lp(mid.ages,nn=0.7,deg=2),
family="binomial", subset=mid.ages>=1)
lpfitd1 <- locfit(npos/Nsamp~lp(mid.ages,nn=0.7,deg=2),deriv=1,
family="binomial",alpha=0.7,deg=2, subset=mid.ages>=1)
lpfoi <- (predict(lpfitd1,newdata=data.frame(mid.ages=1:max.mid.ages)))*
(predict(lpfit,newdata=data.frame(mid.ages=1:max.mid.ages)))
## add to plot
points(1:max.mid.ages,(predict(lpfit,newdata=data.frame(mid.ages=1:max.mid.ages))),
type="l",lwd=1, lty=1)
points(1:max.mid.ages,pmax(lpfoi,0), type="l", col="darkgrey", lty=2)
## estimate CRS incidence rate
prop.seropos.WCB <- predict(lpfit,newdata=data.frame(mid.ages=15:49))
foi.WCB <- c(lpfoi[15:length(lpfoi)], rep(lpfoi[length(lpfoi)],
max(0,(49-length(lpfoi)))))[1:35]
#CRS incidence by WCB age
I.crs.byage <- (1-prop.seropos.WCB)*(1-exp(-foi.WCB*16/52))*0.65*100000
#births per maternal age group (15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49)
births.WCB.byage <- rep(WCB.pop[,j]*ASFR[,j],each=5)/5
#CRS incidence
I.crs <- sum(I.crs.byage*births.WCB.byage)/sum(births.WCB.byage)
print(paste(country[j], ", CRS incidence: ", round(I.crs, 2), " per 100,000", sep=""))
## estimate average age of infection, excluding ages >30
age.test.1 <- seq(1,30,length=5000)
freq <- 1-predict(lpfit,newdata=data.frame(mid.ages=age.test.1))
freq <- freq/sum(freq) #rescale frequency
avg.age <- sum(freq*(age.test.1+diff(age.test.1)[1]))
#get R0 using simple (G/A)
## Crude birth rate per 1000 from the World Bank for each country and year:
## Ivory Coast 1987, Iran 1996, Bangladesh 2008
birth.rate <- c(44.483,22.549,22.262)
R0 <- (1000/birth.rate[j])/avg.age
print(paste("R0", round(R0), "; Average age", round(avg.age,1),sep=" "))
}
```
## Conclusion
In this supplement to Chapter 14, we provide some basic methods for analysis of serological data, focusing on directly transmitted immunizing infections such as measles and rubella. For extensive details on these methods, and a number of important extensions, see Hens et al. 2012.
## References
Hens, N., Z. Shkedy, M. Aerts, C. Faes, P.V. Damme, and P. Beutels, Modeling infectious disease parameters based on serological and social contact data : a modern statistical perspective. . 2012, New York: Springer.
Metcalf CJE, Lessler J, Klepac P, Morice A, Grenfell BT, Bjornstad ON. Structured models of infectious disease: Inference with discrete data. Theor Popul Biol 2012. 82: p. 275-282.
Modarres, S., and N.N. Oskoii, Immunity of children and adult females to rubella virus infection in Tehran. Iran. j. méd. sci, 1996. p. 69-73.
Mossong, J., N. Hens, M. Jit, P. Beutels, K. Aranen, R. Mikolajczyk, M. Massari, S. Salmaso, G. Scalia Tomba, J. Wallinga, J. Heijne, M. Sadkowska-Todys, M. Rosinska, and W.J. Edmunds, Social Contacts and Mixing Patterns Relevant to the Spread of Infectious Diseases. PloS Medicine, 2008. 5: p. e74.
Nessa, A., M. Islam, S. Tabassum, S. Munshi, M. Ahmed, and R. Karim, Seroprevalence of rubella among urban and rural Bangladeshi women emphasises the need for rubella vaccination of pre-pubertal girls. Indian journal of medical microbiology, 2008. 26(1): p. 94.
Ouattara, S., J. Brettes, R. Kodjo, K. Penali, G. Gershy-Damet, A. Sangare, Y. Aron, and V. Akran, Seroepidemiology of rubella in the Ivory coast. Geographic distribution. Bulletin de la Societe de pathologie exotique et de ses filiales, 1986. 80(4): p. 655-664.
Population Division, Department of Economic, Social Affairs and United Nations. (2015). wpp2015: World Population Prospects 2015. R package version 1.0-1. http://CRAN.R-project.org/package=wpp2015
Waaijenborg, S., S.J. Hahné, L. Mollema, G.P. Smits, G.A. Berbers, F.R. van der Klis, H.E. de Melker, and J. Wallinga, Waning of maternal antibodies against measles, mumps, rubella, and varicella in communities with contrasting vaccination coverage. Journal of Infectious Diseases, 2013. 208(1): p. 10-16