Jenior_Metatransciptomics_mSphere_2018.Rmd

---
csl: msphere.csl
output:
 word_document:
  keep_md: true
  reference_docx: manuscript_format.docx
bibliography: references.bib
---

#### Title:
*Clostridium difficile* alters the structure and metabolism of distinct cecal microbiomes during initial infection to promote sustained colonization

##### Authors:
Matthew L. Jenior^1^, Jhansi L. Leslie^1^, Vincent B. Young^1,2^, and Patrick D. Schloss^1\*^


1. Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan
2. Department of Internal Medicine/Infectious Diseases Division, University of Michigan Medical Center, Ann Arbor, Michigan

<br>
<br>

\* Correspondence to Patrick D. Schloss at pschloss@umich.edu

### Abstract

Susceptibility to *Clostridium difficile* infection is primarily associated with previous exposure to antibiotics, which compromise the structure and function of the gut bacterial community. Specific antibiotic classes correlate more strongly with recurrent or persistent *C. difficile* infection. As such, we utilized a mouse model of infection to explore the effect of distinct antibiotic classes on the impact that infection has on community-level transcription and metabolic signatures shortly following pathogen colonization and how those changes may associate with persistence of *C. difficile*. Untargeted metabolomic analysis revealed that *C. difficile* infection had significantly larger impacts on the metabolic environment across cefoperazone and streptomycin-pretreated mice, which become persistently colonized compared to clindamycin-pretreated mice where infection quickly became undetectable. Through metagenome-enabled metatranscriptomics we observed that transcripts for genes associated with carbon and energy acquisition were greatly reduced in infected animals, suggesting those niches were instead occupied by *C. difficile*. Furthermore, the largest changes in transcription were seen in the least abundant species indicating that *C. difficile* may "attack the loser" in gut environments where sustained infection occurs more readily. Overall, our results suggest that *C. difficile* is able to restructure the nutrient-niche landscape in the gut to promote persistent infection.

#### Importance

*Clostridium difficile* has become the most common single cause of hospital-acquired infection over the last decade in the United States. Colonization resistance to the nosocomial pathogen is primarily provided by the gut microbiota, which is also involved in clearing the infection as the community recovers from perturbation. As distinct antibiotics are associated with different risk levels for CDI, we utilized a mouse model of infection with 3 separate antibiotic pretreatment regimes to generate alternative gut microbiomes that each allowed for *C. difficile* colonization but varied in clearance rate. To assess community-level dynamics, we implemented an integrative multi-omic approach that revealed that infection significantly changed many aspects of the gut community. The degree to which the community changed was inversely correlated with clearance during the first six days of infection, suggesting that *C. difficile* differentially modifies the gut environment to promote persistence. This is the first time metagenome-enabled metatranscriptomics have been employed to study the behavior of a host-associated microbiota in response to an infection. Our results allow for a previously unseen understanding of the ecology associated with *C. difficile* infection and provides groundwork for identification of context-specific probiotic therapies.


### Introduction

One of the many beneficial functions provided by the indigenous gut bacterial community is its ability to protect the host from infection by pathogens [@Vollaard1994]. This attribute, termed colonization resistance, is one of the main mechanisms that protect healthy individuals from the gastrointestinal pathogen *Clostridium difficile* [@Freter1955;@Fekety1979;@Britton2012]. *C. difficile* infection (CDI) is responsible for most cases of antibiotic-associated colitis, a toxin-mediated diarrheal disease that has dramatically increased in prevalence over the last 10 years. There are an estimated 453,000 cases of CDI resulting in 29,000 deaths in the United States annually [@Lessa2015]. Antibiotics are a major risk factor for CDI and are thought to increase susceptibility by disrupting the gut bacterial community structure; however, it is still unclear what specific changes to the microbiota contribute to this susceptibility [@Antonopoulos2009;@Buffie2012]. Although most classes of antibiotics have been associated with initial susceptibility to CDI, fluoroquinolones, clindamycin, and cephalosporins are linked to increased risk of recurrent or persistent infection [@Thomas2003;@Brown2013;@Bignardi1998]. This raises questions about the groups of bacteria that are differentially impacted by certain therapies and how these changes effect duration or severity of the infection.

Associations between the membership and functional capacity of the microbiota as measured by the metabolic output suggest that antibiotics increase susceptibility by altering the nutrient milieu in the gut to one that favors *C. difficile* metabolism [@Antunes2011;@Jump2014;@Theriot2014]. One hypothesis is that *C. difficile* colonization resistance is driven by competition for growth substrates by an intact community of metabolic specialists. This has been supported by animal model experiments over the past several decades [@Wilson1988;@Sambol2002;@Perez-Cobas2014]. This line of reasoning has been carried through to the downstream restoration of colonization resistance with the application of fecal microbiota transplant (FMT). Although an individual's microbiota may not return to its precise original state following FMT, it is hypothesized that the functional capacity of the new microbiota is able to outcompete *C. difficile* for resources and clear the infection [@Zaura2015;@Theriot2014].

Leveraging distinct antibiotic treatment regimens in a murine model of CDI [@Schubert2015], we and others have shown that *C. difficile* adapts its physiology to the distinct cecal microbiomes that resulted from exposure to antibiotics [@Schubert2015;@Jenior2017]. We went on to show that *C. difficile* appears to adapt portions of its metabolism to fit alternative nutrient niche landscapes. As the diet of the mice remained unchanged, changes in the cecal metabolome are likely driven by the intestinal microbiota. Although it has been established that *C. difficile* colonizes these communities effectively, it is unknown whether the differences in the metabolic activity of communities following antibiotic treatment are impacted by *C. difficile* colonization or if they correlate with prolonged infection. Historically, it has been difficult to ascribe specific metabolic contributions to individual taxa within the microbiota during perturbations, especially within the context of a host. To address this limited understanding, we employed an integrative untargeted metabolomic and metagenome-enabled metatransciptomic approach to investigate specific responses to infection of the gut microbiota in a murine model of CDI. This high-dimensional analysis allowed us to not only characterize the metabolic output of the community, but to also identify which subgroups of bacteria were differentially active during mock infection and CDI. Our results supported the hypothesis that CDI was indeed associated with altered community-level gene transcription and metabolomic profile of susceptible environments. This effect was significantly more pronounced in communities where *C. difficile* was able to maintain colonization. This work highlights the need for increased appreciation of the differential, combined effects of antibiotics and CDI on the gut microbiota to develop more successful targeted therapies that eliminate *C. difficile* colonization.

```{r startup, echo=F, message=F, warning=F, cache=F}

# Load dependencies
deps <- c('vegan', 'shape', 'plotrix', 'reshape2', 'GMD', 'randomForest', 'RColorBrewer', 'gplots','viridis', 'scales','Hmisc','VennDiagram','AUCRF')
for (dep in deps){
  if (dep %in% installed.packages()[,"Package"] == FALSE){
    install.packages(as.character(dep), quiet=TRUE);
  }
  library(dep, verbose=FALSE, character.only=TRUE)
}
rm(dep, deps)

# Filter out columns that have values in at least 3 samples (ignores first column if needed)
filter_table <- function(data) {

  drop <- c()
  if (class(data[,1]) != 'character') {
    if (sum(data[,1] != 0) < 3) {
      drop <- c(drop, colnames(data)[1])
    }
  }

  for (index in 2:ncol(data)) {
    if (sum(data[,index] != 0) < 3) {
      drop <- c(drop, colnames(data)[index])
    }
  }

  filtered_data <- data[,!(colnames(data) %in% drop)]
  return(filtered_data)
}

# Neatly merge 2 matices with shared row names
clean_merge <- function(data_1, data_2){

  clean_merged <- merge(data_1, data_2, by='row.names', all.y=TRUE)
  rownames(clean_merged) <- clean_merged$Row.names
  clean_merged$Row.names <- NULL

  return(clean_merged)
}


# Significant feature filter with random forest (2 classes only)
susceptibilityRF <- function(feature_data){

  # Breiman (2001). Random Forests. Machine Learning.
  lvls <- as.vector(unique(feature_data$susceptibility))
  x <- round(length(rownames(feature_data[which(feature_data$susceptibility==lvls[1]),])) * 0.623)
  y <- round(length(rownames(feature_data[which(feature_data$susceptibility==lvls[2]),])) * 0.623)
  z <- max(c(round(x / y), round(y / x))) * 3
  nTrees <- round(ncol(feature_data) - 1) * z
  mTries <- round(sqrt(ncol(feature_data) - 1))
  rm(lvls, x, y, z)

  # Finish reformatting data
  training_data <- feature_data
  susceptibility <- training_data$susceptibility
  susceptibility <- as.factor(susceptibility)
  training_data$susceptibility <- NULL
  names(training_data) <- make.names(names(training_data))
  training_data <- data.matrix(training_data)
  training_data[is.na(training_data)] <- 0

  # Run random forest and get MDA values
  set.seed(906801)
  modelRF <- randomForest(susceptibility ~ ., data=training_data,
                          importance=TRUE, replace=FALSE, err.rate=TRUE,
                          ntree=nTrees, mtry=mTries)
  featRF <- importance(modelRF, type=1)
  featRF <- as.data.frame(featRF)
  featRF$features <- rownames(featRF)
  rm(modelRF)

  # Filter to significant features (Strobl 2002) and sort
  sigFeatRF <- as.data.frame(subset(featRF, featRF$MeanDecreaseAccuracy > (abs(min(featRF$MeanDecreaseAccuracy)))))

  # Subset trainng data to significant features
  finalFeat <- training_data[,which(rownames(sigFeatRF) %in% colnames(training_data))]
  rm(training_data, sigFeatRF)
  return(finalFeat)
}

# Significant feature filter with random forest (2 classes only)
infectionRF <- function(feature_data){

  # Breiman (2001). Random Forests. Machine Learning.
  lvls <- as.vector(unique(feature_data$infection))
  x <- round(length(rownames(feature_data[which(feature_data$infection==lvls[1]),])) * 0.623)
  y <- round(length(rownames(feature_data[which(feature_data$infection==lvls[2]),])) * 0.623)
  z <- max(c(round(x / y), round(y / x))) * 3
  nTrees <- round(ncol(feature_data) - 1) * z
  mTries <- round(sqrt(ncol(feature_data) - 1))
  rm(lvls, x, y, z)

  # Finish reformatting data
  training_data <- feature_data
  infection <- training_data$infection
  infection <- as.factor(infection)
  training_data$infection <- NULL
  names(training_data) <- make.names(names(training_data))
  training_data <- data.matrix(training_data)
  training_data[is.na(training_data)] <- 0

  # Run random forest and get MDA values
  set.seed(906801)
  modelRF <- randomForest(infection ~ ., data=training_data,
                          importance=TRUE, replace=FALSE, err.rate=TRUE,
                          ntree=nTrees, mtry=mTries)
  featRF <- importance(modelRF, type=1)
  featRF <- as.data.frame(featRF)
  featRF$features <- rownames(featRF)
  rm(modelRF)

  # Filter to significant features (Strobl 2002) and sort
  sigFeatRF <- as.data.frame(subset(featRF, featRF$MeanDecreaseAccuracy > (abs(min(featRF$MeanDecreaseAccuracy)))))
  sigFeatRF <- sigFeatRF[order(-sigFeatRF$MeanDecreaseAccuracy),]
  return(sigFeatRF)
}

# Significant feature filter with random forest (2 or more classes)
abxRF <- function(feature_data){

  mTries <- round(sqrt(ncol(feature_data) - 1))

  # Finish reformatting data
  training_data <- feature_data
  abx <- training_data$abx
  abx <- as.factor(abx)
  training_data$abx <- NULL
  names(training_data) <- make.names(names(training_data))
  training_data <- data.matrix(training_data)
  training_data[is.na(training_data)] <- 0

  # Run random forest and get MDA values
  set.seed(906801)
  modelRF <- randomForest(abx ~ ., data=training_data,
                          importance=TRUE, replace=FALSE, err.rate=TRUE, mtry=mTries)
  featRF <- importance(modelRF, type=1)
  featRF <- as.data.frame(featRF)
  featRF$features <- rownames(featRF)
  rm(modelRF)

  # Filter to significant features (Strobl 2002) and sort
  sigFeatRF <- as.data.frame(subset(featRF, featRF$MeanDecreaseAccuracy > (abs(min(featRF$MeanDecreaseAccuracy)))))

  # Subset trainng data to significant features
  finalFeat <- training_data[,which(rownames(sigFeatRF) %in% colnames(training_data))]
  rm(training_data, sigFeatRF)
  return(finalFeat)
}

# Calculates distance of a point from x=y in 2-d space
dist_xy <- function(x) {
  v1 <- c(20,20) - x
  v2 <- c(0,0) - c(20,20)
  m <- cbind(v1, v2)
  distance <- abs(det(m)) / sqrt(sum(v1 * v1))
  return(distance)
}

```

### Results

```{r figure_1, echo=F, warning=F, message=F, results='hide', cache=F}

# Load in data
metadata <- read.delim('data/metadata.tsv', sep='\t', header=T, row.names=1)
metadata <- subset(metadata, type == 'conventional') # remove germfree
metadata$type <- NULL
metadata$cage <- NULL
metadata$mouse <- NULL
metadata$gender <- NULL
wetlab <- read.delim('data/wetlab_assays.tsv', sep='\t', header=T, row.names=1)
cfu <- read.delim('data/cfu_time.tsv', sep='\t', header=TRUE)
taxonomy_family <- read.delim('data/16S_analysis/all_treatments.family.cons.format.taxonomy', sep='\t', header=T)
shared_family <- read.delim('data/16S_analysis/all_treatments.family.subsample.shared', sep='\t', header=T, row.names=2)
shared_family <- shared_family[!rownames(shared_family) %in% c('CefC5M2'), ]
shared_family <- shared_family[ ,!(names(shared_family) == 'Otu008')]
shared_family$numOtus <- NULL
shared_family$label <- NULL

# Format wetlab assay data
wetlab <- subset(wetlab, infection == '630') # Remove uninfected controls
wetlab <- subset(wetlab, treatment != 'germfree') # Remove germfree
wetlab$infection <- NULL
wetlab$cfu_spore <- NULL
wetlab$toxin_titer <- NULL
wetlab$cfu_vegetative <- as.numeric(wetlab$cfu_vegetative)
wetlab$cfu_vegetative[wetlab$cfu_vegetative == 0] <- 100
wetlab$cfu_vegetative <- log10(wetlab$cfu_vegetative)
wetlab$treatment <- droplevels(wetlab$treatment)
wetlab$treatment <- factor(wetlab$treatment, levels=c('conventional', 'streptomycin', 'cefoperazone', 'clindamycin'))
wetlab$cfu_vegetative[wetlab$cfu_vegetative <= 2.0] <- 1.7 # Undetectable points below LOD
noabx_veg <- as.numeric(subset(wetlab, treatment == 'conventional')$cfu_vegetative)
strep_veg <- as.numeric(subset(wetlab, treatment == 'streptomycin')$cfu_vegetative)
cef_veg <- as.numeric(subset(wetlab, treatment == 'cefoperazone')$cfu_vegetative)
clinda_veg <- as.numeric(subset(wetlab, treatment == 'clindamycin')$cfu_vegetative)
rm(wetlab)
# Significance testing
p_val <- p.adjust(c(wilcox.test(noabx_veg, strep_veg, exact=FALSE)$p.value,
                    wilcox.test(noabx_veg, cef_veg, exact=FALSE)$p.value,
                    wilcox.test(noabx_veg, clinda_veg, exact=FALSE)$p.value),
                  method='BH')
veg_pval <- as.character(max(round(p_val, 3)))
if (veg_pval == '0') {
  veg_pval = '0.001'
}
rm(noabx_veg, strep_veg, cef_veg, clinda_veg, p_val)

# Format CFU over time
cfu <- cfu[,c(1,11)]
strep_stoolCFU <- subset(cfu, abx == 'streptomycin')[,2]
cef_stoolCFU <- subset(cfu, abx == 'cefoperazone')[,2]
clinda_stoolCFU <- subset(cfu, abx == 'clindamycin')[,2]
rm(cfu)
# Significance testing
p_val <- p.adjust(c(wilcox.test(strep_stoolCFU, clinda_stoolCFU, exact=FALSE)$p.value,
                    wilcox.test(cef_stoolCFU, clinda_stoolCFU, exact=FALSE)$p.value),
                  method='BH')
stoolCFU_pval <- as.character(max(round(p_val, 3)))
rm(strep_stoolCFU, cef_stoolCFU, clinda_stoolCFU, p_val)

# Convert to relative abundance
relabund_family <- (shared_family / rowSums(shared_family)) * 100
rm(shared_family)

# Rename OTUs to families
relabund_family <- t(relabund_family)
relabund_family <- merge(taxonomy_family, relabund_family, by.x='otu', by.y='row.names')
rownames(relabund_family) <- relabund_family$family
relabund_family$otu <- NULL
relabund_family$phylum <- NULL
relabund_family$family <- NULL
relabund_family <- t(relabund_family)

# Combine with metadata and subset to pretreatments
relabund_family <- clean_merge(metadata, relabund_family)
relabund_family$susceptibility <- NULL
relabund_family <- subset(relabund_family, infection == '630')
relabund_family$infection <- NULL
relabund_family_cef <- subset(relabund_family, abx == 'cefoperazone')
relabund_family_cef <- subset(relabund_family_cef, rownames(relabund_family_cef) != 'CefC3M2')
relabund_family_cef$abx <- NULL
relabund_family_strep <- subset(relabund_family, abx == 'streptomycin')
relabund_family_strep$abx <- NULL
relabund_family_clinda <- subset(relabund_family, abx == 'clindamycin')
relabund_family_clinda$abx <- NULL
rm(relabund_family)

# Get % of community for specific groups
cef_bacter <- median(relabund_family_cef$Bacteroidaceae)
cef_bacter <- as.character(round(cef_bacter, 2))
rm(relabund_family_cef)
strep_lacto <- median(relabund_family_strep$Lactobacillaceae)
strep_lacto <- as.character(round(strep_lacto, 2))
rm(relabund_family_strep)
clinda_lacto <- median(relabund_family_clinda$Lactobacillaceae)
clinda_lacto <- as.character(round(clinda_lacto, 2))
rm(relabund_family_clinda)

```

#### Distinct antibiotic pretreatments are associated with alternative community structures that are equally susceptible to initial *C. difficile* colonization, but differ in patterns of clearance.
We have previously shown that when conventionally-reared SPF mice were pretreated with one of three different antibiotics (streptomycin, cefoperazone, and clindamycin; Table S1), each pretreatment was associated with altered patterns of *C. difficile* virulence factor expression [@Jenior2017]. These antibiotics were chosen for not only the ability to reduce *C. difficile* colonization resistance in a mouse model [@Schubert2015], but also for distinct and significant impacts on the structure and diversity of the cecal microbiota (Fig. 1A) [@Jenior2017]. In each antibiotic pretreatment, we observed equally high levels of *C. difficile* colonization on the day after infection, however, over the subsequent 9 days  the amount of *C. difficile* in the feces of clindamycin-pretreated mice were the only mice to fall below the limit of detection, while mice receiving the other pretreatments remained highly colonized (*p* = `r stoolCFU_pval`; Fig. 1A). We hypothesized that this occurred in the clindamycin-pretreated mice because the perturbed intestinal community occupied niche space that overlapped with that of *C. difficile*.

We chose to focus our remaining experiments on cecal samples collected 18 hours after infection to the assess behavior of *C. difficile* directly prior to the reduction in detectable *C. difficile*. This end point corresponded with a previous study where *C. difficile* reached maximum cecal vegetative cell load with few detectable spores [@Koenigsknecht2015]. We also elected to examine cecal content because it was more likely to be a site of active bacterial metabolism compared to stool and would allow for an assessment of functional differences in the microbiota. At 18 hours after infection, we found that the communities remained highly differentiated from untreated controls as measured by 16S rRNA gene sequencing of the V4 region (Fig. 1B). The composition of streptomycin-pretreated communities was more variable between cages, but was generally enriched for members of the *Bacteroidetes* phylum. Cefoperazone and clindamycin-pretreated cecal communities were consistently dominated by members of the *Lactobacillaceae* and *Enterobacteriaceae* families, respectively. Despite variation in the community structures, there were no significant differences in the number of vegetative cells between any antibiotic-pretreatment group (Fig. 1C). All susceptible mice were colonized with ~1x10^8^ vegetative colony forming units (CFU) per gram of cecal content and untreated mice maintained *C. difficile* colonization resistance. 

```{r figure_2, echo=F, warning=F, message=F, results='hide', cache=F}

aucrfClearance <- function(training_data){
  # Format levels of susceptibility for AUCRF
  colnames(training_data) <- make.names(colnames(training_data))
  levels <- as.vector(unique(training_data$clearance))
  training_data$clearance <- as.character(training_data$clearance)
  training_data$clearance[which(training_data$clearance==levels[1])] <- 0
  training_data$clearance[which(training_data$clearance==levels[2])] <- 1
  training_data$clearance <- as.factor(as.numeric(training_data$clearance))
  rm(levels)
  # Run AUCRF with reproduceable parameters
  set.seed(906801)
  data_RF <- AUCRF(clearance ~ ., data=training_data, pdel=0.05, k0=5, ranking='MDA')
  return(data_RF)
}

#----------------#

# Read in data

# Metabolomes
metabolome <- read.delim('data/metabolome/scaled_intensities.log10.tsv', sep='\t', header=TRUE)

# 16S
shared_otu <- read.delim('data/16S_analysis/all_treatments.0.03.unique_list.conventional.shared', sep='\t', header=T, row.names=2)
shared_otu <- shared_otu[!rownames(shared_otu) %in% c('CefC5M2'), ]  # Remove possible contaminated sample
cdf_otu <- shared_otu[,(names(shared_otu) %in% c('Otu0004','Otu0308'))]
shared_otu <- shared_otu[,!(names(shared_otu) %in% c('Otu0004','Otu0308'))] # Remove residual C. difficile OTUs
shared_noncdf <- subset(shared_otu, rowSums(cdf_otu) != 0)
cdf_otu <- subset(cdf_otu, rowSums(cdf_otu) != 0)
cdf_percent <- as.character(round(mean(rowSums(cdf_otu) / rowSums(shared_noncdf)) * 100, 3))
rm(cdf_otu, shared_noncdf)
shared_otu$numOtus <- NULL
shared_otu$label <- NULL

# Metadata
metadata <- read.delim('data/metadata.tsv', sep='\t', header=T, row.names=1)

# Metadata
metadata$type <- NULL
metadata$cage <- NULL
metadata$mouse <- NULL
metadata$gender <- NULL
metadata$clearance <- c(rep('colonized',18), rep('cleared',18), rep('colonized',18),
                        rep('mock',12), rep('colonized',18))

# Metabolomes
metabolome$BIOCHEMICAL <- gsub('_', ' ', metabolome$BIOCHEMICAL)
rownames(metabolome) <- metabolome$BIOCHEMICAL
metabolome$BIOCHEMICAL <- NULL
metabolome$PUBCHEM <- NULL
metabolome$KEGG <- NULL
metabolome$SUB_PATHWAY <- NULL
metabolome$SUPER_PATHWAY <- NULL
metabolome <- as.data.frame(t(metabolome))
metabolome <- clean_merge(metadata, metabolome)
metabolome <- subset(metabolome, abx != 'germfree')
metabolome <- subset(metabolome, abx != 'none')
metabolome <- subset(metabolome, infection == 'mock')
metabolome$abx <- NULL
metabolome$susceptibility <- NULL
metabolome$infection <- NULL

# 16S
shared_otu <- clean_merge(metadata, shared_otu)
shared_otu <- subset(shared_otu, abx != 'germfree')
shared_otu <- subset(shared_otu, abx != 'none')
shared_otu <- subset(shared_otu, infection == 'mock')
shared_otu$abx <- NULL
shared_otu$susceptibility <- NULL
shared_otu$infection <- NULL

# Ordination analysis
# Metabolome - 728 metabolites
metabolome_dist <- vegdist(metabolome[,2:ncol(metabolome)], method='bray') # Bray-Curtis
# permANOVA
metabolome_permANOVA_pval <- adonis(metabolome_dist ~ metabolome$clearance, metabolome, perm=999)$aov.tab[[6]][1]
metabolome_permANOVA_pval <- as.character(round(metabolome_permANOVA_pval, 3))
rm(metabolome_dist)
# 16S - 810 OTUs
otu_dist <- vegdist(shared_otu[,2:ncol(shared_otu)], method='bray') # Bray-Curtis
# permANOVA
otu_permANOVA_pval <- adonis(otu_dist ~ shared_otu$clearance, shared_otu, perm=999)$aov.tab[[6]][1]
otu_permANOVA_pval <- as.character(round(otu_permANOVA_pval, 3))
rm(otu_dist)
rm(metadata)

# Feature selection
# Metabolome
colnames(metabolome) <- make.names(colnames(metabolome))
metabolome_aucrf <- aucrfClearance(metabolome)
# Get OOB
metabolome_aucrf_oob <- metabolome_aucrf$RFopt
metabolome_aucrf_oob <- metabolome_aucrf_oob$err.rate
metabolome_aucrf_oob <- as.character(round(median(metabolome_aucrf_oob[,1]) * 100, 3))
rm(metabolome_aucrf, metabolome)
# 16S
# AUCRF feature selection/reduction (to 0% OOB)
colnames(shared_otu) <- make.names(colnames(shared_otu))
shared_otu_aucrf <- aucrfClearance(shared_otu)
# Get OOB
shared_otu_aucrf_oob <- shared_otu_aucrf$RFopt
shared_otu_aucrf_oob <- shared_otu_aucrf_oob$err.rate
shared_otu_aucrf_oob <- as.character(round(median(shared_otu_aucrf_oob[,1]) * 100, 3))
rm(shared_otu, shared_otu_aucrf)

```

#### Multiple biological signatures in the bacterial community and metabolome differentiated cecal microbiomes that remained colonized by *C. difficile* from those that did not.
Pretreatment with antibiotics not only alters the structure of the resident microbiota, but also has a dramatic impact on the intestinal metabolome [@Antunes2011; @Jump2014; @Theriot2014]. To understand the ramifications each antibiotic had on the cecal metabolomic environment, we performed untargeted metabolomic analysis on the cecal contents that were also utilized in the 16S rRNA gene sequencing. We identified a total of 727 distinct metabolites. In combination with our 16S rRNA gene sequencing results, we first characterized the differences between the microbiomes (i.e. the microbiota, plus the associated metabolome) of the mock-infected animals to quantify possible drivers of communities that cleared the infection. To focus our analysis on ascertaining changes in discrete populations within the microbiota, we generated operational taxonomic units (OTUs) clustered at 97% similarity. We also removed all *C. difficile* 16S rRNA gene sequences, which represented an average of `r cdf_percent`% sequencing reads across infection groups to eliminate its direct impact in downstream calculation. Using these methods we discovered that the Bray-Curtis dissimilarity of both the community structure (*p* < `r otu_permANOVA_pval`) and metabolome (*p* < `r metabolome_permANOVA_pval`) were significantly different between cleared and colonized groups during the early stages of infection (Fig. 2A & 2C). These results supported the hypothesis that the cecal environment created by clindamycin pretreatment was highly divergent from the other groups, and likely contributed to the clearance seen in the subsequent days.

To identify the populations and metabolites that were associated with sustained colonization, we utilized Random Forest machine learning with cross validation to identify the smallest optimal subset of features that could successfully differentiate microbiomes associated with infection clearance and those that remain colonized [@Calle2011]. We identified a model with 5 OTUs that correctly classified all samples to their corresponding groups (Fig. 2B; Out-of-bag error=`r shared_otu_aucrf_oob`%). Interestingly, these OTUs were not consistently abundant in antibiotic-pretreated communities. Similarly, when we used the same approach with the metabolomic data, we identified a model that used 5 metabolites that correctly differentiated the groups (Fig. 2D; Out-of-bag error=`r metabolome_aucrf_oob`%). Together these results further supported the hypothesis that the environment of the cecum, even early during infection, is distinct between groups that clear the infection and those that maintain *C. difficile* at high levels. Furthermore, results from machine learning analysis suggest that rare members of the communities had a disproportionate influence on the clearance patterns observed between pretreatment regimes and that changes in community structure may be less consistent than changes in the metatransciptome or metabolome.

```{r figure_s1, echo=F, warning=F, message=F, results='hide', cache=F}

aucrfSusceptibility <- function(training_data){

  # Format levels of susceptibility for AUCRF
  colnames(training_data) <- make.names(colnames(training_data))
  levels <- as.vector(unique(training_data$susceptibility))
  training_data$susceptibility <- as.character(training_data$susceptibility)
  training_data$susceptibility[which(training_data$susceptibility==levels[1])] <- 0
  training_data$susceptibility[which(training_data$susceptibility==levels[2])] <- 1
  training_data$susceptibility <- as.factor(as.numeric(training_data$susceptibility))
  rm(levels)

  # Run AUCRF with reproduceable parameters
  set.seed(906801)
  data_RF <- AUCRF(susceptibility ~ ., data=training_data, pdel=0.05, k0=5, ranking='MDA')
  return(data_RF)
}

# Metadata
metadata <- read.delim('data/metadata.tsv', sep='\t', header=T, row.names=1)
metadata$type <- NULL
metadata$cage <- NULL
metadata$mouse <- NULL
metadata$gender <- NULL

# Metabolomes
metabolome <- read.delim('data/metabolome/scaled_intensities.log10.tsv', sep='\t', header=TRUE)
metabolome$BIOCHEMICAL <- gsub('_', ' ', metabolome$BIOCHEMICAL)
rownames(metabolome) <- metabolome$BIOCHEMICAL
metabolome$BIOCHEMICAL <- NULL
metabolome$PUBCHEM <- NULL
metabolome$KEGG <- NULL
metabolome$SUB_PATHWAY <- NULL
metabolome$SUPER_PATHWAY <- NULL
metabolome <- as.data.frame(t(metabolome))
metabolome <- clean_merge(metadata, metabolome)
metabolome <- subset(metabolome, abx != 'germfree')
metabolome <- subset(metabolome, infection == 'mock')
metabolome$abx <- NULL
metabolome$infection <- NULL

# 16S
shared_otu <- read.delim('data/16S_analysis/all_treatments.0.03.unique_list.conventional.shared', sep='\t', header=T, row.names=2)
shared_otu <- shared_otu[!rownames(shared_otu) %in% c('CefC5M2'), ]  # Remove possible contaminated sample
shared_otu <- shared_otu[,!(names(shared_otu) %in% c('Otu0004','Otu0308'))]
shared_otu$numOtus <- NULL
shared_otu$label <- NULL
shared_otu <- clean_merge(metadata, shared_otu)
shared_otu <- subset(shared_otu, abx != 'germfree')
shared_otu <- subset(shared_otu, infection == 'mock')
shared_otu$abx <- NULL
shared_otu$infection <- NULL
rm(metadata)

# Metabolome
# AUCRF feature selection/reduction (to 0% OOB)
colnames(metabolome) <- make.names(colnames(metabolome))
metabolome_aucrf <- aucrfSusceptibility(metabolome)
# Get OOB
metabolome_aucrf_oob <- metabolome_aucrf$RFopt
metabolome_aucrf_oob <- metabolome_aucrf_oob$err.rate
metabolome_aucrf_oob <- as.character(round(median(metabolome_aucrf_oob[,1]) * 100, 3))
# Get features
metabolome_aucrf <- as.character(OptimalSet(metabolome_aucrf)$Name)
res_metabolome <- subset(metabolome, susceptibility == 'resistant')[, metabolome_aucrf]
res_metabolome$susceptibility <- NULL
sus_metabolome <- subset(metabolome, susceptibility == 'susceptible')[, metabolome_aucrf]
sus_metabolome$susceptibility <- NULL
rm(metabolome, metabolome_aucrf)
# Find significant differences
metabolome_pval <- c()
for (i in 1:ncol(res_metabolome)){metabolome_pval[i] <- wilcox.test(res_metabolome[,i], sus_metabolome[,i], exact=FALSE)$p.value}
metabolome_pval <- round(p.adjust(metabolome_pval, method='BH'), 3)
metabolome_sus_pval <- as.character(max(metabolome_pval))
if (metabolome_sus_pval == '0') {
  metabolome_sus_pval = '0.001'
}
rm(res_metabolome, sus_metabolome, metabolome_pval)

# 16S
# AUCRF feature selection/reduction (to 0% OOB)
colnames(shared_otu) <- make.names(colnames(shared_otu))
shared_otu_aucrf <- aucrfSusceptibility(shared_otu)
# Get OOB
shared_otu_aucrf_oob <- shared_otu_aucrf$RFopt
shared_otu_aucrf_oob <- shared_otu_aucrf_oob$err.rate
shared_otu_aucrf_oob <- as.character(round(median(shared_otu_aucrf_oob[,1]) * 100, 3))
# Get features
shared_otu_aucrf <- as.character(OptimalSet(shared_otu_aucrf)$Name)
res_shared_otu <- subset(shared_otu, susceptibility == 'resistant')[, shared_otu_aucrf]
res_shared_otu$susceptibility <- NULL
sus_shared_otu <- subset(shared_otu, susceptibility == 'susceptible')[, shared_otu_aucrf]
sus_shared_otu$susceptibility <- NULL
rm(shared_otu, shared_otu_aucrf)
# Find significant differences
otu_pval <- c()
for (i in 1:ncol(res_shared_otu)){otu_pval[i] <- wilcox.test(res_shared_otu[,i], sus_shared_otu[,i], exact=FALSE)$p.value}
otu_pval <- round(p.adjust(otu_pval, method='BH'), 3)
otu_sus_pval <- as.character(max(otu_pval))
if (otu_sus_pval == '0') {
  otu_sus_pval = '0.001'
}
rm(res_shared_otu, sus_shared_otu, otu_pval)

```


```{r figure_s2, echo=F, warning=F, message=F, results='hide', cache=F}

# Metadata
metadata <- read.delim('data/metadata.tsv', sep='\t', header=T, row.names=1)
metadata$type <- NULL
metadata$cage <- NULL
metadata$mouse <- NULL
metadata$gender <- NULL

# Metabolomes
metabolome <- read.delim('data/metabolome/scaled_intensities.log10.tsv', sep='\t', header=TRUE)
metabolome$BIOCHEMICAL <- gsub('_', ' ', metabolome$BIOCHEMICAL)
rownames(metabolome) <- metabolome$BIOCHEMICAL
metabolome$BIOCHEMICAL <- NULL
metabolome$PUBCHEM <- NULL
metabolome$KEGG <- NULL
metabolome$SUB_PATHWAY <- NULL
metabolome$SUPER_PATHWAY <- NULL
metabolome <- as.data.frame(t(metabolome))
metabolome <- clean_merge(metadata, metabolome)
metabolome <- subset(metabolome, abx != 'germfree')
metabolome <- subset(metabolome, susceptibility == 'susceptible')
metabolome$abx <- NULL
metabolome$susceptibility <- NULL

# 16S
shared_otu <- read.delim('data/16S_analysis/all_treatments.0.03.unique_list.conventional.shared', sep='\t', header=T, row.names=2)
shared_otu <- shared_otu[!rownames(shared_otu) %in% c('CefC5M2'), ]  # Remove possible contaminated sample
shared_otu <- shared_otu[,!(names(shared_otu) %in% c('Otu0004','Otu0308'))] # Remove residual C. difficile OTUs
shared_otu$numOtus <- NULL
shared_otu$label <- NULL
shared_otu <- clean_merge(metadata, shared_otu)
shared_otu <- subset(shared_otu, abx != 'germfree')
shared_otu <- subset(shared_otu, susceptibility == 'susceptible')
shared_otu$abx <- NULL
shared_otu$susceptibility <- NULL
rm(metadata)

# Metabolome
metabolome_dist <- vegdist(metabolome[,2:ncol(metabolome)], method='bray') # Bray-Curtis
metabolome_permANOVA_pval <- adonis(metabolome_dist ~ metabolome$infection, metabolome, perm=999)$aov.tab[[6]][1]
metabolome_permANOVA_pval <- as.character(round(metabolome_permANOVA_pval, 3))
rm(metabolome_dist, metabolome)

# 16S
otu_dist <- vegdist(shared_otu[,2:ncol(shared_otu)], method='bray') # Bray-Curtis
otu_permANOVA_pval <- adonis(otu_dist ~ shared_otu$infection, shared_otu, perm=999)$aov.tab[[6]][1]
otu_permANOVA_pval <- as.character(round(otu_permANOVA_pval, 3))
rm(otu_dist, shared_otu)

```

#### Amino-acid metabolism by *C. difficile* appears important for sustained colonization across susceptible environments.
*C. difficile*'s ability to metabolize amino acids via Stickland fermentation may be a critical nutrient niche that enables it to colonize some perturbed communities [@Fletcher2018]. We were curious whether this behavior was conserved across multiple distinct gut environments where *C. difficile* was able to colonize. We assessed the changes between the antibiotic-pretreated, mock-infected microbiomes and those of untreated, *C. difficile*-resistant animals. Not only were the relative abundances of Stickland fermentation substrates increased across susceptible environments, but several secondary bile acids, which have been shown to be negatively correlated with *C. difficile* susceptibility were significantly decreased (Fig. S1D; *p* < `r metabolome_sus_pval`). Additionally, when we constructed a Random Forest classification model to differentiate the groups, we identified multiple members of the *Clostridia* which are capable of metabolizing amino acids for growth [@Dai2015]. The relative abundances of these populations were significantly lower in susceptible animals (Fig. S1B; *p* < `r otu_sus_pval`). We also performed a similar analysis to investigate changes induced by *C. difficile* colonization itself in these susceptible conditions. Although CDI alone did not induce significant shifts in the global community structure or metabolome (Fig. S2 A & C; *p* = `r otu_permANOVA_pval`, `r metabolome_permANOVA_pval`), several features were able to discriminate infected and uninfected microbiomes with high accuracy. This analysis highlighted numerous growth substrates that are known for *C. difficile* in all pretreated mice including 6 Stickland substrates, 4 of which were proline conjugates, along with arabonate/xlyonate (Fig. S2D). Furthermore 5-aminovalerate, the most common end product of Stickland fermentation, was significantly increased during infection in almost all of the metabolomes. Inspection of these specific metabolites revealed that clindamycin pretreatment was only condition where both the inputs and outputs of Stickland fermentation were less abundant relative to the untreated mice (Fig. S3). 

```{r figure_3, echo=F, warning=F, message=F, results='hide', cache=F}

# Read in data
# Normalized Metatranscriptomes
cef_normalized_reads <- read.delim('data/read_mapping/cef_normalized_metaT.tsv', sep='\t', header=TRUE, row.names=7)
clinda_normalized_reads <- read.delim('data/read_mapping/clinda_normalized_metaT.tsv', sep='\t', header=TRUE, row.names=7)
strep_normalized_reads <- read.delim('data/read_mapping/strep_normalized_metaT.tsv', sep='\t', header=TRUE, row.names=7)
# KEGG organism file
kegg_tax <- read.delim('data/kegg/kegg_taxonomy.tsv', sep='\t', header=TRUE)
kegg_tax[] <- lapply(kegg_tax, as.character)
# 16S data
genus_shared <- read.delim('data/16S_analysis/all_treatments.0.03.unique_list.0.03.filter.0.03.subsample.shared', sep='\t', header=TRUE, row.names=2)
genus_shared$label <- NULL
genus_shared$numOtus <- NULL
genus_tax <- read.delim('data/16S_analysis/all_treatments.genus.final.taxonomy', sep='\t', header=TRUE)
rownames(genus_tax) <- genus_tax$OTU
genus_tax$OTU <- NULL
# Metadata
metadata <- read.delim('data/metadata.tsv', sep='\t', header=TRUE, row.names=1)

# Format transcription
cef_normalized_reads$gene <- NULL
cef_normalized_reads[,c(1:2)] <- log2(cef_normalized_reads[,c(1:2)] + 1)
clinda_normalized_reads$gene <- NULL
clinda_normalized_reads[,c(1:2)] <- log2(clinda_normalized_reads[,c(1:2)] + 1)
strep_normalized_reads$gene <- NULL
strep_normalized_reads[,c(1:2)] <- log2(strep_normalized_reads[,c(1:2)] + 1)
# Remove those genes without an organism annotation
cef_annotated <- cef_normalized_reads[!is.na(cef_normalized_reads$organism),]
clinda_annotated <- clinda_normalized_reads[!is.na(clinda_normalized_reads$organism),]
strep_annotated <- strep_normalized_reads[!is.na(strep_normalized_reads$organism),]
rm(cef_normalized_reads, clinda_normalized_reads, strep_normalized_reads)
# Remove any C. difficile genes with transcription only in infected
cdiff_omit <- c('cdf','pdc','cdc','cdl','pdf')
cef_annotated <- subset(cef_annotated, !(cef_annotated$organism %in% cdiff_omit))
clinda_annotated <- subset(clinda_annotated, !(clinda_annotated$organism %in% cdiff_omit))
strep_annotated <- subset(strep_annotated, !(strep_annotated$organism %in% cdiff_omit))
rm(cdiff_omit)
# Remove all possible mammalian genes
mamm_omit <- c('fab','cfa','ggo','hgl','hsa','mcc','mdo','pon','aml',
               'ptr','rno','shr','ssc','aml','bta','cge','ecb',
               'pps','fca','mmu','oaa','gga','ola','acs','aga')
strep_annotated <- subset(strep_annotated, !(strep_annotated$organism %in% mamm_omit))
cef_annotated <- subset(cef_annotated, !(cef_annotated$organism %in% mamm_omit))
clinda_annotated <- subset(clinda_annotated, !(clinda_annotated$organism %in% mamm_omit))
rm(mamm_omit)
# Calculate correlation coefficients
strep_corr <- as.character(round(cor.test(strep_annotated[,2], strep_annotated[,1], method='spearman', exact=FALSE)$estimate, digits=3))
cef_corr <- as.character(round(cor.test(cef_annotated[,2], cef_annotated[,1], method='spearman', exact=FALSE)$estimate, digits=3))
clinda_corr <- as.character(round(cor.test(clinda_annotated[,2], clinda_annotated[,1], method='spearman', exact=FALSE)$estimate, digits=3))

# Collate transcript abundances at genus-level
genus_shared <- t(genus_shared)
genus_tax_shared <- merge(genus_tax, genus_shared, by='row.names')
genus_tax_shared$Row.names <- NULL
genus_tax_agg_shared <- aggregate(. ~ Genus, data=genus_tax_shared, FUN=sum)
rownames(genus_tax_agg_shared) <- genus_tax_agg_shared$Genus
genus_tax_agg_shared$Genus <- NULL
genus_tax_agg_shared <- t(genus_tax_agg_shared)
rm(genus_tax, genus_shared, genus_tax_shared)

# Combine with metadata and collate within treatment
genus_tax_agg_metadata_shared <- merge(metadata, genus_tax_agg_shared, by='row.names')
genus_tax_agg_metadata_shared$Row.names <- NULL
genus_tax_agg_metadata_shared$cage <- NULL
genus_tax_agg_metadata_shared$mouse <- NULL
genus_tax_agg_metadata_shared$gender <- NULL
genus_tax_agg_metadata_shared$type <- NULL
genus_tax_agg_metadata_shared$susceptibility <- NULL
genus_tax_agg_metadata_shared$infection <- NULL
genus_tax_agg_metadata_shared <- subset(genus_tax_agg_metadata_shared, genus_tax_agg_metadata_shared$abx %in% c('cefoperazone','clindamycin','streptomycin'))
genus_tax_agg_metadata_shared <- aggregate(. ~ abx, data=genus_tax_agg_metadata_shared, FUN=median)
rownames(genus_tax_agg_metadata_shared) <- genus_tax_agg_metadata_shared$abx
genus_tax_agg_metadata_shared$abx <- NULL
genus_tax_agg_metadata_shared <- (genus_tax_agg_metadata_shared / rowSums(genus_tax_agg_metadata_shared)) * 100
genus_tax_agg_metadata_shared <- as.data.frame(t(genus_tax_agg_metadata_shared))
cef_genus <- genus_tax_agg_metadata_shared
cef_genus$clindamycin <- NULL
cef_genus$streptomycin <- NULL
clinda_genus <- genus_tax_agg_metadata_shared
clinda_genus$cefoperazone <- NULL
clinda_genus$streptomycin <- NULL
strep_genus <- genus_tax_agg_metadata_shared
strep_genus$clindamycin <- NULL
strep_genus$cefoperazone <- NULL
rm(genus_tax_agg_shared, genus_tax_agg_metadata_shared, metadata)

# Using previously defined lines, find outliers to y = x
strep_630_outliers <- subset(strep_annotated, strep_annotated$strep_630_metaT_reads > strep_annotated$strep_mock_metaT_reads + 2)
strep_mock_outliers <- subset(strep_annotated, strep_annotated$strep_mock_metaT_reads > strep_annotated$strep_630_metaT_reads + 2)
cef_630_outliers <- subset(cef_annotated, cef_annotated$cef_630_metaT_reads > cef_annotated$cef_mock_metaT_reads + 2)
cef_mock_outliers <- subset(cef_annotated, cef_annotated$cef_mock_metaT_reads > cef_annotated$cef_630_metaT_reads + 2)
clinda_630_outliers <- clinda_annotated[(clinda_annotated$clinda_630_metaT_reads > clinda_annotated$clinda_mock_metaT_reads + 2),]
clinda_mock_outliers <- clinda_annotated[(clinda_annotated$clinda_mock_metaT_reads > clinda_annotated$clinda_630_metaT_reads + 2),]
rm(strep_annotated, cef_annotated, clinda_annotated)

# Calculate mean distance of outliers to x=y
dists_630 <- c()
for (i in 1:nrow(strep_630_outliers)){
  dists_630[i] <- dist_xy(c(strep_630_outliers$strep_630_metaT_reads[i], strep_630_outliers$strep_mock_metaT_reads[i]))
}
dists_mock <- c()
for (i in 1:nrow(strep_mock_outliers)){
  dists_mock[i] <- dist_xy(c(strep_mock_outliers$strep_630_metaT_reads[i], strep_mock_outliers$strep_mock_metaT_reads[i]))
}
strep_mean_dist <- as.character(round(mean(c(dists_630, dists_mock)), 3))
strep_outs <- as.character(as.numeric(length(c(dists_630, dists_mock))))
dists_630 <- c()
for (i in 1:nrow(cef_630_outliers)){
  dists_630[i] <- dist_xy(c(cef_630_outliers$cef_630_metaT_reads[i], cef_630_outliers$cef_mock_metaT_reads[i]))
}
dists_mock <- c()
for (i in 1:nrow(cef_mock_outliers)){
  dists_mock[i] <- dist_xy(c(cef_mock_outliers$cef_630_metaT_reads[i], cef_mock_outliers$cef_mock_metaT_reads[i]))
}
cef_mean_dist <- as.character(round(mean(c(dists_630, dists_mock)), 3))
cef_outs <- as.character(as.numeric(length(c(dists_630, dists_mock))))
dists_630 <- c()
for (i in 1:nrow(clinda_630_outliers)){
  dists_630[i] <- dist_xy(c(clinda_630_outliers$clinda_630_metaT_reads[i], clinda_630_outliers$clinda_mock_metaT_reads[i]))
}
dists_mock <- c()
for (i in 1:nrow(clinda_mock_outliers)){
  dists_mock[i] <- dist_xy(c(clinda_mock_outliers$clinda_630_metaT_reads[i], clinda_mock_outliers$clinda_mock_metaT_reads[i]))
}
clinda_mean_dist <- as.character(round(mean(c(dists_630, dists_mock)), 3))
clinda_outs <- as.character(as.numeric(length(c(dists_630, dists_mock))))
rm(dists_630, dists_mock, i)

# Drop levels
strep_630_outliers[] <- lapply(strep_630_outliers, as.character)
strep_mock_outliers[] <- lapply(strep_mock_outliers, as.character)
cef_630_outliers[] <- lapply(cef_630_outliers, as.character)
cef_mock_outliers[] <- lapply(cef_mock_outliers, as.character)
clinda_630_outliers[] <- lapply(clinda_630_outliers, as.character)
clinda_mock_outliers[] <- lapply(clinda_mock_outliers, as.character)

# Save KEGG ID names
strep_630_outliers$kegg_id <- rownames(strep_630_outliers)
strep_mock_outliers$kegg_id <- rownames(strep_mock_outliers)
cef_630_outliers$kegg_id <- rownames(cef_630_outliers)
cef_mock_outliers$kegg_id <- rownames(cef_mock_outliers)
clinda_630_outliers$kegg_id <- rownames(clinda_630_outliers)
clinda_mock_outliers$kegg_id <- rownames(clinda_mock_outliers)

# Merge with KEGG taxonomy
strep_630_outliers <- merge(x=strep_630_outliers, y=kegg_tax, by.x='organism', by.y='org_code', all.x=TRUE)
strep_630_outliers$org_code <- NULL
strep_630_outliers$strep_630_metaT_reads <- as.numeric(strep_630_outliers$strep_630_metaT_reads)
strep_630_outliers$strep_mock_metaT_reads <- as.numeric(strep_630_outliers$strep_mock_metaT_reads)
strep_mock_outliers <- merge(x=strep_mock_outliers, y=kegg_tax, by.x='organism', by.y='org_code', all.x=TRUE)
strep_mock_outliers$org_code <- NULL
strep_mock_outliers$strep_630_metaT_reads <- as.numeric(strep_mock_outliers$strep_630_metaT_reads)
strep_mock_outliers$strep_mock_metaT_reads <- as.numeric(strep_mock_outliers$strep_mock_metaT_reads)
cef_630_outliers <- merge(x=cef_630_outliers, y=kegg_tax, by.x='organism', by.y='org_code', all.x=TRUE)
cef_630_outliers$org_code <- NULL
cef_630_outliers$cef_630_metaT_reads <- as.numeric(cef_630_outliers$cef_630_metaT_reads)
cef_630_outliers$cef_mock_metaT_reads <- as.numeric(cef_630_outliers$cef_mock_metaT_reads)
cef_mock_outliers <- merge(x=cef_mock_outliers, y=kegg_tax, by.x='organism', by.y='org_code', all.x=TRUE)
cef_mock_outliers$org_code <- NULL
cef_mock_outliers$cef_630_metaT_reads <- as.numeric(cef_mock_outliers$cef_630_metaT_reads)
cef_mock_outliers$cef_mock_metaT_reads <- as.numeric(cef_mock_outliers$cef_mock_metaT_reads)
clinda_630_outliers <- merge(x=clinda_630_outliers, y=kegg_tax, by.x='organism', by.y='org_code', all.x=TRUE)
clinda_630_outliers$org_code <- NULL
clinda_630_outliers$clinda_630_metaT_reads <- as.numeric(clinda_630_outliers$clinda_630_metaT_reads)
clinda_630_outliers$clinda_mock_metaT_reads <- as.numeric(clinda_630_outliers$clinda_mock_metaT_reads)
clinda_mock_outliers <- merge(x=clinda_mock_outliers, y=kegg_tax, by.x='organism', by.y='org_code', all.x=TRUE)
clinda_mock_outliers$org_code <- NULL
clinda_mock_outliers$clinda_630_metaT_reads <- as.numeric(clinda_mock_outliers$clinda_630_metaT_reads)
clinda_mock_outliers$clinda_mock_metaT_reads <- as.numeric(clinda_mock_outliers$clinda_mock_metaT_reads)
rm(kegg_tax)

# Find difference in expression for outliers
colnames(strep_630_outliers) <- colnames(strep_mock_outliers)
strep_outliers <- rbind(strep_630_outliers, strep_mock_outliers)
strep_difference <- abs(((2^strep_outliers$strep_mock_metaT_reads) - 1) - ((2^strep_outliers$strep_630_metaT_reads) - 1))
strep_difference <- as.data.frame(cbind(strep_outliers$genus, strep_difference))
colnames(strep_difference) <- c('genus', 'difference')
strep_gene_count <- as.data.frame(table(strep_difference$genus))
colnames(strep_gene_count) <- c('genus', 'count')
strep_difference <- aggregate(. ~ genus, data=strep_difference, FUN=sum)
strep_difference <- merge(strep_difference, strep_gene_count, by='genus')
strep_difference$difference <- strep_difference$difference / strep_difference$count
strep_difference$count <- NULL
rm(strep_gene_count, strep_630_outliers, strep_mock_outliers)
strep_difference$difference <- log2(as.numeric(as.character(strep_difference$difference)) + 1)
strep_genus_diff <- merge(strep_genus, strep_difference, by.x='row.names', by.y='genus')
rownames(strep_genus_diff) <- strep_genus_diff$Row.names
strep_genus_diff$Row.names <- NULL
colnames(strep_genus_diff) <- c('relAbund','transcriptChange')
rm(strep_difference, strep_genus)
colnames(cef_630_outliers) <- colnames(cef_mock_outliers)
cef_outliers <- rbind(cef_630_outliers, cef_mock_outliers)
cef_difference <- abs(((2^cef_outliers$cef_mock_metaT_reads) - 1) - ((2^cef_outliers$cef_630_metaT_reads) - 1))
cef_difference <- as.data.frame(cbind(cef_outliers$genus, cef_difference))
colnames(cef_difference) <- c('genus', 'difference')
cef_gene_count <- as.data.frame(table(cef_difference$genus))
colnames(cef_gene_count) <- c('genus', 'count')
cef_difference <- aggregate(. ~ genus, data=cef_difference, FUN=sum)
cef_difference <- merge(cef_difference, cef_gene_count, by='genus')
cef_difference$difference <- cef_difference$difference / cef_difference$count
cef_difference$count <- NULL
rm(cef_gene_count, cef_630_outliers, cef_mock_outliers)
cef_difference$difference <- log2(as.numeric(as.character(cef_difference$difference)) + 1)
cef_genus_diff <- merge(cef_genus, cef_difference, by.x='row.names', by.y='genus')
rownames(cef_genus_diff) <- cef_genus_diff$Row.names
cef_genus_diff$Row.names <- NULL
colnames(cef_genus_diff) <- c('relAbund','transcriptChange')
rm(cef_difference, cef_genus)
colnames(clinda_630_outliers) <- colnames(clinda_mock_outliers)
clinda_outliers <- rbind(clinda_630_outliers, clinda_mock_outliers)
clinda_difference <- abs(((2^clinda_outliers$clinda_mock_metaT_reads) - 1) - ((2^clinda_outliers$clinda_630_metaT_reads) - 1))
clinda_difference <- as.data.frame(cbind(clinda_outliers$genus, clinda_difference))
colnames(clinda_difference) <- c('genus', 'difference')
clinda_gene_count <- as.data.frame(table(clinda_difference$genus))
colnames(clinda_gene_count) <- c('genus', 'count')
clinda_difference <- aggregate(. ~ genus, data=clinda_difference, FUN=sum)
clinda_difference <- merge(clinda_difference, clinda_gene_count, by='genus')
clinda_difference$difference <- clinda_difference$difference / clinda_difference$count
clinda_difference$count <- NULL
rm(clinda_gene_count, clinda_630_outliers, clinda_mock_outliers)
clinda_difference$difference <- log2(as.numeric(as.character(clinda_difference$difference)) + 1)
clinda_genus_diff <- merge(clinda_genus, clinda_difference, by.x='row.names', by.y='genus')
rownames(clinda_genus_diff) <- clinda_genus_diff$Row.names
clinda_genus_diff$Row.names <- NULL
colnames(clinda_genus_diff) <- c('relAbund','transcriptChange')
rm(clinda_difference, clinda_genus)
rm(strep_outliers, cef_outliers, clinda_outliers)

# Spearman correlation of 16S and metatransctiptomic change
strep_16S_r <- as.character(round(cor.test(strep_genus_diff$transcriptChange, strep_genus_diff$relAbund, method='spearman', exact=FALSE)$estimate, 3))
strep_16S_p <- as.character(round(cor.test(strep_genus_diff$transcriptChange, strep_genus_diff$relAbund, method='spearman', exact=FALSE)$p.value, 3))
cef_16S_r <- as.character(round(cor.test(cef_genus_diff$transcriptChange, cef_genus_diff$relAbund, method='spearman', exact=FALSE)$estimate, 3))
cef_16S_p <- as.character(round(cor.test(cef_genus_diff$transcriptChange, cef_genus_diff$relAbund, method='spearman', exact=FALSE)$p.value, 3))
clinda_16S_r <- as.character(round(cor.test(clinda_genus_diff$transcriptChange, clinda_genus_diff$relAbund, method='spearman', exact=FALSE)$estimate, 3))
clinda_16S_p <- as.character(round(cor.test(clinda_genus_diff$transcriptChange, clinda_genus_diff$relAbund, method='spearman', exact=FALSE)$p.value, 3))
rm(strep_genus_diff, cef_genus_diff, clinda_genus_diff)

```

#### Infection corresponded with larger shifts in the metatranscriptomes of communities that allowed sustained *C. difficile* colonization.
Despite the strong associations between bacterial community structure and the metabolome with colonization resistance, it was difficult to associate specific populations with changes in those metabolites that were associated with the duration of infection. To gain a more specific understanding of how the microbiota or *C. difficile* shaped the metabolic environment, we employed parallel metagenomic and metatranscriptomic shotgun sequencing of the samples collected from the cecal content of the mice used in the previous analyses. To achieve usable concentrations of bacterial mRNA after rRNA depletion, we had to pool the samples within each treatment and infection group. To establish confidence in the results of a pooled analysis, we calculated within-group sample variance among replicates using CFU, OTU relative abundance, and metabolomic relative abundance data (Table S3). These analyses revealed low levels of variance within control and experimental groups. Following sequencing, metagenomic reads from mock-infected cecal communities were assembled *de novo* into contigs and putative genes were identified resulting in 234,868 (streptomycin), 83,534 (cefoperazone), and 35,681 (clindamycin) open reading frames in each metagenome. Of these putative genes, 28.5% could be annotated to a known function based on the KEGG database, and many of these annotations were homologs to genes in species that were found in our dataset. Streptomycin pretreatment resulted in a significantly more diverse community than other groups based on 16S rRNA gene sequence data, so a more diverse metagenome was expected (Table S1). Supporting this prediction, 2408 unique functionally annotated genes were detected in the streptomycin pretreatment metagenome, at least 1163 more genes than were found in either the cefoperazone or clindamycin metagenomes (Fig. S4A-D). Metagenome-enabled mapping of the metatranscriptomic reads revealed that we were able to obtain informative depths of sequencing from across the metagenomic libraries (Fig. S4E-F). As expected, genes with any detectable transcript in any metatranscriptome were a subset of their corresponding metagenome. Metatranscriptomic read abundances were normalized to corresponding metagenomic coverage per gene to normalize for the abundance of the contributing bacterial taxa. This step was followed by a final subsampling of reads from each conditions to control for uneven sequencing effort and to identify genes with the largest changes in transcription relative to uninfected animals.

We hypothesized that the degree of change in the metatranscriptome corresponding with *C. difficile* colonization would reflect the shifts seen at in the metabolome. As disparate bacterial taxa possess vastly different metabolic capabilities and the antibiotic pretreatments induced distinct species profiles in each community, we tested our hypothesis by delineating the transcriptomic contributions of separate bacterial taxa within each metatranscriptome. Since many genes lack a specific functional annotation in KEGG but retain general taxonomic information, we continued the analysis at the genus level of classification for all genes contributed to each metagenome. Using this approach, we directly compared the normalized transcript abundances for each gene between infected and uninfected states for each antibiotic pretreatment and calculated the Spearman correlation to identify distinct patterns of transcription (Fig. 3). This resulted in `r strep_outs` genes that had an average distance from the center of `r strep_mean_dist` associated with streptomycin-pretreatment, `r cef_outs` genes at an average distance of `r cef_mean_dist` in cefoperazone-pretreatment, and only `r clinda_outs` genes at an average distance of `r clinda_mean_dist` with clindamycin-pretreatment. Overall, the clindamycin pretreatment was associated with the fewest transcription outliers between uninfected and infection conditions compared with those of the other antibiotic groups. This suggested that the degree to which the metatranscriptome was altered by infection corresponded to prolonged colonization.

This analysis also revealed that outlier genes originated in underrepresented genera. In streptomycin-pretreated mice, 937 genes belonging to *Lactobacillus* were increased in transcription during *C. difficile* infection; where *Lactobacillus* accounted for `r strep_lacto`% of the 16S rRNA gene sequences (Fig. 3A). In cefoperazone-pretreated mice, 2290 genes belonging to *Bacteroides* had lower transcription during *C. difficile* infection; *Bacteroides* accounted for `r cef_bacter`% of the 16S rRNA gene sequences (Fig. 3B). A consistent trend in streptomycin and cefoperazone-pretreated mice was an overrepresentation of highly transcribed genes from genera belonging to *Bacteroidetes* during mock infection. The metatransciptomes among mice from both of these pretreatment conditions poorly correlated between mock and infected conditions, indicating a high degree of change induced by *C. difficile* colonization (*R* = `r strep_corr` & *R* = `r cef_corr`). In clindamycin-pretreated mice the largest difference in transcription was for 510 *Lactobacillus* genes with increased transcription during CDI; *Lactobacillus* accounted for `r clinda_lacto`% of the 16S rRNA gene sequences (Fig. 3C). Infected and uninfected metatranscriptomes from mice pretreated with clindamycin were more strongly correlated with each other than either of the other pretreatments (*R* = `r clinda_corr`). This suggests that although *C. difficile* altered the streptomycin and cefoperazone-pretreated communities in which it was able to remain stably colonized, it had minimal impact on the clindamycin-pretreated community in which it was not able to remain colonized.


```{r figure_4, echo=F, warning=F, message=F, results='hide', cache=F}

# Metadata
metadata <- read.delim('data/metadata.tsv', sep='\t', header=TRUE, row.names=1)

# Normalized Metatranscriptomes
cef_normalized_reads <- read.delim('data/read_mapping/cef_normalized_metaT.tsv', sep='\t', header=TRUE, row.names=7)
clinda_normalized_reads <- read.delim('data/read_mapping/clinda_normalized_metaT.tsv', sep='\t', header=TRUE, row.names=7)
strep_normalized_reads <- read.delim('data/read_mapping/strep_normalized_metaT.tsv', sep='\t', header=TRUE, row.names=7)

# KEGG organism file
kegg_tax <- read.delim('data/kegg/kegg_taxonomy.tsv', sep='\t', header=TRUE)
kegg_tax[] <- lapply(kegg_tax, as.character)

# 16S data
genus_shared <- read.delim('data/16S_analysis/all_treatments.0.03.unique_list.0.03.filter.0.03.subsample.shared', sep='\t', header=TRUE, row.names=2)
genus_shared$label <- NULL
genus_shared$numOtus <- NULL
genus_tax <- read.delim('data/16S_analysis/all_treatments.genus.final.taxonomy', sep='\t', header=TRUE)
rownames(genus_tax) <- genus_tax$OTU
genus_tax$OTU <- NULL

# Taxonomy colors
tax_colors <- read.delim('data/taxonomy_color.tsv', sep='\t', header=TRUE)
tax_colors[] <- lapply(tax_colors, as.character)

# Format transcription
cef_normalized_reads$gene <- NULL
cef_normalized_reads[,c(1:2)] <- log2(cef_normalized_reads[,c(1:2)] + 1)
clinda_normalized_reads$gene <- NULL
clinda_normalized_reads[,c(1:2)] <- log2(clinda_normalized_reads[,c(1:2)] + 1)
strep_normalized_reads$gene <- NULL
strep_normalized_reads[,c(1:2)] <- log2(strep_normalized_reads[,c(1:2)] + 1)

# Remove those genes without an organism annotation
cef_annotated <- cef_normalized_reads[!is.na(cef_normalized_reads$organism),]
clinda_annotated <- clinda_normalized_reads[!is.na(clinda_normalized_reads$organism),]
strep_annotated <- strep_normalized_reads[!is.na(strep_normalized_reads$organism),]

# Remove any C. difficile genes with transcription only in infected
cdiff_omit <- c('cdf','pdc','cdc','cdl','pdf')
cef_annotated <- subset(cef_annotated, !(cef_annotated$organism %in% cdiff_omit))
clinda_annotated <- subset(clinda_annotated, !(clinda_annotated$organism %in% cdiff_omit))
strep_annotated <- subset(strep_annotated, !(strep_annotated$organism %in% cdiff_omit))
rm(cdiff_omit)

# Remove all possible mammalian genes
mamm_omit <- c('fab','cfa','ggo','hgl','hsa','mcc','mdo','pon','aml',
               'ptr','rno','shr','ssc','aml','bta','cge','ecb',
               'pps','fca','mmu','oaa','gga','ola','acs','aga')
strep_annotated <- subset(strep_annotated, !(strep_annotated$organism %in% mamm_omit))
cef_annotated <- subset(cef_annotated, !(cef_annotated$organism %in% mamm_omit))
clinda_annotated <- subset(clinda_annotated, !(clinda_annotated$organism %in% mamm_omit))
rm(mamm_omit)

# Calculate correlation coefficients
strep_corr <- as.character(round(cor.test(strep_annotated[,2], strep_annotated[,1], method='spearman', exact=FALSE)$estimate, digits=3))
cef_corr <- as.character(round(cor.test(cef_annotated[,2], cef_annotated[,1], method='spearman', exact=FALSE)$estimate, digits=3))
clinda_corr <- as.character(round(cor.test(clinda_annotated[,2], clinda_annotated[,1], method='spearman', exact=FALSE)$estimate, digits=3))

# Collate transcript abundances at genus-level
genus_shared <- t(genus_shared)
genus_tax_shared <- merge(genus_tax, genus_shared, by='row.names')
genus_tax_shared$Row.names <- NULL
genus_tax_agg_shared <- aggregate(. ~ Genus, data=genus_tax_shared, FUN=sum)
rownames(genus_tax_agg_shared) <- genus_tax_agg_shared$Genus
genus_tax_agg_shared$Genus <- NULL
genus_tax_agg_shared <- t(genus_tax_agg_shared)
rm(genus_tax, genus_shared, genus_tax_shared)

# Combine with metadata and collate within treatment
genus_tax_agg_metadata_shared <- merge(metadata, genus_tax_agg_shared, by='row.names')
genus_tax_agg_metadata_shared$Row.names <- NULL
genus_tax_agg_metadata_shared$cage <- NULL
genus_tax_agg_metadata_shared$mouse <- NULL
genus_tax_agg_metadata_shared$gender <- NULL
genus_tax_agg_metadata_shared$type <- NULL
genus_tax_agg_metadata_shared$susceptibility <- NULL
genus_tax_agg_metadata_shared$infection <- NULL
genus_tax_agg_metadata_shared <- subset(genus_tax_agg_metadata_shared, genus_tax_agg_metadata_shared$abx %in% c('cefoperazone','clindamycin','streptomycin'))
genus_tax_agg_metadata_shared <- aggregate(. ~ abx, data=genus_tax_agg_metadata_shared, FUN=median)
rownames(genus_tax_agg_metadata_shared) <- genus_tax_agg_metadata_shared$abx
genus_tax_agg_metadata_shared$abx <- NULL
genus_tax_agg_metadata_shared <- (genus_tax_agg_metadata_shared / rowSums(genus_tax_agg_metadata_shared)) * 100
genus_tax_agg_metadata_shared <- as.data.frame(t(genus_tax_agg_metadata_shared))
cef_genus <- genus_tax_agg_metadata_shared
cef_genus$clindamycin <- NULL
cef_genus$streptomycin <- NULL
clinda_genus <- genus_tax_agg_metadata_shared
clinda_genus$cefoperazone <- NULL
clinda_genus$streptomycin <- NULL
strep_genus <- genus_tax_agg_metadata_shared
strep_genus$clindamycin <- NULL
strep_genus$cefoperazone <- NULL
rm(genus_tax_agg_shared, genus_tax_agg_metadata_shared)

# Using previously defined lines, find outliers to y = x
strep_630_outliers <- subset(strep_annotated, strep_annotated$strep_630_metaT_reads > strep_annotated$strep_mock_metaT_reads + 2)
strep_mock_outliers <- subset(strep_annotated, strep_annotated$strep_mock_metaT_reads > strep_annotated$strep_630_metaT_reads + 2)
cef_630_outliers <- subset(cef_annotated, cef_annotated$cef_630_metaT_reads > cef_annotated$cef_mock_metaT_reads + 2)
cef_mock_outliers <- subset(cef_annotated, cef_annotated$cef_mock_metaT_reads > cef_annotated$cef_630_metaT_reads + 2)
clinda_630_outliers <- clinda_annotated[(clinda_annotated$clinda_630_metaT_reads > clinda_annotated$clinda_mock_metaT_reads + 2),]
clinda_mock_outliers <- clinda_annotated[(clinda_annotated$clinda_mock_metaT_reads > clinda_annotated$clinda_630_metaT_reads + 2),]

# Remove outliers from the rest of the genes
strep_annotated <- strep_annotated[!row.names(strep_annotated) %in% row.names(strep_630_outliers), ]
strep_annotated <- strep_annotated[!row.names(strep_annotated) %in% row.names(strep_mock_outliers), ]
cef_annotated <- cef_annotated[!row.names(cef_annotated) %in% row.names(cef_630_outliers), ]
cef_annotated <- cef_annotated[!row.names(cef_annotated) %in% row.names(cef_mock_outliers), ]
clinda_annotated <- clinda_annotated[!row.names(clinda_annotated) %in% row.names(clinda_630_outliers), ]
clinda_annotated <- clinda_annotated[!row.names(clinda_annotated) %in% row.names(clinda_mock_outliers), ]

# Calculate mean distance of outliers to x=y
dists_630 <- c()
for (i in 1:nrow(strep_630_outliers)){
  dists_630[i] <- dist_xy(c(strep_630_outliers$strep_630_metaT_reads[i], strep_630_outliers$strep_mock_metaT_reads[i]))
}
dists_mock <- c()
for (i in 1:nrow(strep_mock_outliers)){
  dists_mock[i] <- dist_xy(c(strep_mock_outliers$strep_630_metaT_reads[i], strep_mock_outliers$strep_mock_metaT_reads[i]))
}
round(mean(c(dists_630, dists_mock)), 3)
as.numeric(length(c(dists_630, dists_mock)))
dists_630 <- c()
for (i in 1:nrow(cef_630_outliers)){
  dists_630[i] <- dist_xy(c(cef_630_outliers$cef_630_metaT_reads[i], cef_630_outliers$cef_mock_metaT_reads[i]))
}
dists_mock <- c()
for (i in 1:nrow(cef_mock_outliers)){
  dists_mock[i] <- dist_xy(c(cef_mock_outliers$cef_630_metaT_reads[i], cef_mock_outliers$cef_mock_metaT_reads[i]))
}
round(mean(c(dists_630, dists_mock)), 3)
as.numeric(length(c(dists_630, dists_mock)))
dists_630 <- c()
for (i in 1:nrow(clinda_630_outliers)){
  dists_630[i] <- dist_xy(c(clinda_630_outliers$clinda_630_metaT_reads[i], clinda_630_outliers$clinda_mock_metaT_reads[i]))
}
dists_mock <- c()
for (i in 1:nrow(clinda_mock_outliers)){
  dists_mock[i] <- dist_xy(c(clinda_mock_outliers$clinda_630_metaT_reads[i], clinda_mock_outliers$clinda_mock_metaT_reads[i]))
}
round(mean(c(dists_630, dists_mock)), 3)
as.numeric(length(c(dists_630, dists_mock)))
rm(dists_630, dists_mock)

# Break down outliers into taxonomic groups
# Drop levels
strep_630_outliers[] <- lapply(strep_630_outliers, as.character)
strep_mock_outliers[] <- lapply(strep_mock_outliers, as.character)
cef_630_outliers[] <- lapply(cef_630_outliers, as.character)
cef_mock_outliers[] <- lapply(cef_mock_outliers, as.character)
clinda_630_outliers[] <- lapply(clinda_630_outliers, as.character)
clinda_mock_outliers[] <- lapply(clinda_mock_outliers, as.character)
strep_annotated[] <- lapply(strep_annotated, as.character)
cef_annotated[] <- lapply(cef_annotated, as.character)
clinda_annotated[] <- lapply(clinda_annotated, as.character)

# Save KEGG ID names
strep_630_outliers$kegg_id <- rownames(strep_630_outliers)
strep_mock_outliers$kegg_id <- rownames(strep_mock_outliers)
cef_630_outliers$kegg_id <- rownames(cef_630_outliers)
cef_mock_outliers$kegg_id <- rownames(cef_mock_outliers)
clinda_630_outliers$kegg_id <- rownames(clinda_630_outliers)
clinda_mock_outliers$kegg_id <- rownames(clinda_mock_outliers)
strep_annotated$kegg_id <- rownames(strep_annotated)
cef_annotated$kegg_id <- rownames(cef_annotated)
clinda_annotated$kegg_id <- rownames(clinda_annotated)

# Merge with KEGG taxonomy
strep_630_outliers <- merge(x=strep_630_outliers, y=kegg_tax, by.x='organism', by.y='org_code', all.x=TRUE)
strep_630_outliers$org_code <- NULL
strep_630_outliers$strep_630_metaT_reads <- as.numeric(strep_630_outliers$strep_630_metaT_reads)
strep_630_outliers$strep_mock_metaT_reads <- as.numeric(strep_630_outliers$strep_mock_metaT_reads)
strep_mock_outliers <- merge(x=strep_mock_outliers, y=kegg_tax, by.x='organism', by.y='org_code', all.x=TRUE)
strep_mock_outliers$org_code <- NULL
strep_mock_outliers$strep_630_metaT_reads <- as.numeric(strep_mock_outliers$strep_630_metaT_reads)
strep_mock_outliers$strep_mock_metaT_reads <- as.numeric(strep_mock_outliers$strep_mock_metaT_reads)
cef_630_outliers <- merge(x=cef_630_outliers, y=kegg_tax, by.x='organism', by.y='org_code', all.x=TRUE)
cef_630_outliers$org_code <- NULL
cef_630_outliers$cef_630_metaT_reads <- as.numeric(cef_630_outliers$cef_630_metaT_reads)
cef_630_outliers$cef_mock_metaT_reads <- as.numeric(cef_630_outliers$cef_mock_metaT_reads)
cef_mock_outliers <- merge(x=cef_mock_outliers, y=kegg_tax, by.x='organism', by.y='org_code', all.x=TRUE)
cef_mock_outliers$org_code <- NULL
cef_mock_outliers$cef_630_metaT_reads <- as.numeric(cef_mock_outliers$cef_630_metaT_reads)
cef_mock_outliers$cef_mock_metaT_reads <- as.numeric(cef_mock_outliers$cef_mock_metaT_reads)
clinda_630_outliers <- merge(x=clinda_630_outliers, y=kegg_tax, by.x='organism', by.y='org_code', all.x=TRUE)
clinda_630_outliers$org_code <- NULL
clinda_630_outliers$clinda_630_metaT_reads <- as.numeric(clinda_630_outliers$clinda_630_metaT_reads)
clinda_630_outliers$clinda_mock_metaT_reads <- as.numeric(clinda_630_outliers$clinda_mock_metaT_reads)
clinda_mock_outliers <- merge(x=clinda_mock_outliers, y=kegg_tax, by.x='organism', by.y='org_code', all.x=TRUE)
clinda_mock_outliers$org_code <- NULL
clinda_mock_outliers$clinda_630_metaT_reads <- as.numeric(clinda_mock_outliers$clinda_630_metaT_reads)
clinda_mock_outliers$clinda_mock_metaT_reads <- as.numeric(clinda_mock_outliers$clinda_mock_metaT_reads)
strep_annotated <- merge(x=strep_annotated, y=kegg_tax, by.x='organism', by.y='org_code', all.x=TRUE)
strep_annotated$org_code <- NULL
strep_annotated$strep_630_metaT_reads <- as.numeric(strep_annotated$strep_630_metaT_reads)
strep_annotated$strep_mock_metaT_reads <- as.numeric(strep_annotated$strep_mock_metaT_reads)
cef_annotated <- merge(x=cef_annotated, y=kegg_tax, by.x='organism', by.y='org_code', all.x=TRUE)
cef_annotated$org_code <- NULL
cef_annotated$cef_630_metaT_reads <- as.numeric(cef_annotated$cef_630_metaT_reads)
cef_annotated$cef_mock_metaT_reads <- as.numeric(cef_annotated$cef_mock_metaT_reads)
clinda_annotated <- merge(x=clinda_annotated, y=kegg_tax, by.x='organism', by.y='org_code', all.x=TRUE)
clinda_annotated$org_code <- NULL
clinda_annotated$clinda_630_metaT_reads <- as.numeric(clinda_annotated$clinda_630_metaT_reads)
clinda_annotated$clinda_mock_metaT_reads <- as.numeric(clinda_annotated$clinda_mock_metaT_reads)
rm(kegg_tax)

# Define colors based on genus
strep_630_outliers <- merge(x=strep_630_outliers, y=tax_colors, by.x='genus', by.y='taxonomy', all.x=TRUE)
strep_630_outliers$color[is.na(strep_630_outliers$color)] <- 'white'
strep_mock_outliers <- merge(x=strep_mock_outliers, y=tax_colors, by.x='genus', by.y='taxonomy', all.x=TRUE)
strep_mock_outliers$color[is.na(strep_mock_outliers$color)] <- 'white'
cef_630_outliers <- merge(x=cef_630_outliers, y=tax_colors, by.x='genus', by.y='taxonomy', all.x=TRUE)
cef_630_outliers$color[is.na(cef_630_outliers$color)] <- 'white'
cef_mock_outliers <- merge(x=cef_mock_outliers, y=tax_colors, by.x='genus', by.y='taxonomy', all.x=TRUE)
cef_mock_outliers$color[is.na(cef_mock_outliers$color)] <- 'white'
clinda_630_outliers <- merge(x=clinda_630_outliers, y=tax_colors, by.x='genus', by.y='taxonomy', all.x=TRUE)
clinda_630_outliers$color[is.na(clinda_630_outliers$color)] <- 'white'
clinda_mock_outliers <- merge(x=clinda_mock_outliers, y=tax_colors, by.x='genus', by.y='taxonomy', all.x=TRUE)
clinda_mock_outliers$color[is.na(clinda_mock_outliers$color)] <- 'white'

# Find difference in expression for outliers
colnames(strep_630_outliers) <- colnames(strep_mock_outliers)
strep_outliers <- rbind(strep_630_outliers, strep_mock_outliers)
strep_difference <- abs(((2^strep_outliers$strep_mock_metaT_reads) - 1) - ((2^strep_outliers$strep_630_metaT_reads) - 1))
strep_difference <- as.data.frame(cbind(strep_outliers$genus, strep_difference))
colnames(strep_difference) <- c('genus', 'difference')
strep_gene_count <- as.data.frame(table(strep_difference$genus))
colnames(strep_gene_count) <- c('genus', 'count')
strep_difference <- aggregate(. ~ genus, data=strep_difference, FUN=sum)
strep_difference <- merge(strep_difference, strep_gene_count, by='genus')
strep_difference$difference <- strep_difference$difference / strep_difference$count
strep_difference$count <- NULL
rm(strep_gene_count)
strep_difference$difference <- log2(as.numeric(as.character(strep_difference$difference)) + 1)
strep_genus_diff <- merge(strep_genus, strep_difference, by.x='row.names', by.y='genus')
strep_extra <- subset(strep_difference, genus %in% c("Cellulosilyticum",'Butyrivibrio','Dorea',"Lactococcus","Alistipes","Odoribacter","Staphylococcus","Streptococcus","Roseburia","Ruminococcus","Porphyromonas","Faecalibacterium"))
strep_extra$streptomycin <- rep(0.001, nrow(strep_extra))
strep_extra <- strep_extra[,c(1,3,2)]
colnames(strep_extra) <- c('Row.names','streptomycin','difference')
colnames(strep_extra) <- colnames(strep_genus_diff)
strep_genus_diff <- as.data.frame(rbind(strep_genus_diff, strep_extra))
strep_genus_diff <- strep_genus_diff[match(unique(strep_genus_diff$Row.names), strep_genus_diff$Row.names),]
rownames(strep_genus_diff) <- strep_genus_diff$Row.names
strep_genus_diff$Row.names <- NULL
colnames(strep_genus_diff) <- c('relAbund','transcriptChange')
rm(strep_outliers, strep_difference, strep_genus, strep_extra)
colnames(cef_630_outliers) <- colnames(cef_mock_outliers)
cef_outliers <- rbind(cef_630_outliers, cef_mock_outliers)
cef_difference <- abs(((2^cef_outliers$cef_mock_metaT_reads) - 1) - ((2^cef_outliers$cef_630_metaT_reads) - 1))
cef_difference <- as.data.frame(cbind(cef_outliers$genus, cef_difference))
colnames(cef_difference) <- c('genus', 'difference')
cef_gene_count <- as.data.frame(table(cef_difference$genus))
colnames(cef_gene_count) <- c('genus', 'count')
cef_difference <- aggregate(. ~ genus, data=cef_difference, FUN=sum)
cef_difference <- merge(cef_difference, cef_gene_count, by='genus')
cef_difference$difference <- cef_difference$difference / cef_difference$count
cef_difference$count <- NULL
rm(cef_gene_count)
cef_difference$difference <- log2(as.numeric(as.character(cef_difference$difference)) + 1)
cef_genus_diff <- merge(cef_genus, cef_difference, by.x='row.names', by.y='genus')
cef_extra <- subset(cef_difference, genus %in% c('Escherichia', "Porphyromonas","Ruminococcus","Clostridium","Lactococcus","Odoribacter","Prevotella"))
cef_extra$ceftomycin <- rep(0.001, nrow(cef_extra))
cef_extra <- cef_extra[,c(1,3,2)]
colnames(cef_extra) <- c('Row.names','ceftomycin','difference')
colnames(cef_extra) <- colnames(cef_genus_diff)
cef_genus_diff <- as.data.frame(rbind(cef_genus_diff, cef_extra))
cef_genus_diff <- cef_genus_diff[match(unique(cef_genus_diff$Row.names), cef_genus_diff$Row.names),]
rownames(cef_genus_diff) <- cef_genus_diff$Row.names
cef_genus_diff$Row.names <- NULL
colnames(cef_genus_diff) <- c('relAbund','transcriptChange')
rm(cef_outliers, cef_difference, cef_genus)
colnames(clinda_630_outliers) <- colnames(clinda_mock_outliers)
clinda_outliers <- rbind(clinda_630_outliers, clinda_mock_outliers)
clinda_difference <- abs(((2^clinda_outliers$clinda_mock_metaT_reads) - 1) - ((2^clinda_outliers$clinda_630_metaT_reads) - 1))
clinda_difference <- as.data.frame(cbind(clinda_outliers$genus, clinda_difference))
colnames(clinda_difference) <- c('genus', 'difference')
clinda_gene_count <- as.data.frame(table(clinda_difference$genus))
colnames(clinda_gene_count) <- c('genus', 'count')
clinda_difference <- aggregate(. ~ genus, data=clinda_difference, FUN=sum)
clinda_difference <- merge(clinda_difference, clinda_gene_count, by='genus')
clinda_difference$difference <- clinda_difference$difference / clinda_difference$count
clinda_difference$count <- NULL
rm(clinda_gene_count)
clinda_difference$difference <- log2(as.numeric(as.character(clinda_difference$difference)) + 1)
clinda_genus_diff <- merge(clinda_genus, clinda_difference, by.x='row.names', by.y='genus')
clinda_extra <- subset(clinda_difference, genus %in% c("Cellulosilyticum","Unclassified_Clostridiales","Clostridium","Lactococcus",'Butyrivibrio','Dorea','Roseburia','Bacteroides',"Prevotella","Ruminococcus","Porphyromonas","Faecalibacterium"))
clinda_extra$clindatomycin <- rep(0.001, nrow(clinda_extra))
clinda_extra <- clinda_extra[,c(1,3,2)]
colnames(clinda_extra) <- c('Row.names','clindatomycin','difference')
colnames(clinda_extra) <- colnames(clinda_genus_diff)
clinda_genus_diff <- as.data.frame(rbind(clinda_genus_diff, clinda_extra))
clinda_genus_diff <- clinda_genus_diff[match(unique(clinda_genus_diff$Row.names), clinda_genus_diff$Row.names),]
rownames(clinda_genus_diff) <- clinda_genus_diff$Row.names
clinda_genus_diff$Row.names <- NULL
colnames(clinda_genus_diff) <- c('relAbund','transcriptChange')
rm(clinda_outliers, clinda_difference, clinda_genus)

# Convert relative abundance to categorical variable of ranges
cef_genus_diff_01 <- subset(cef_genus_diff, cef_genus_diff$relAbund < 0.1)
clinda_genus_diff_01 <- subset(clinda_genus_diff, clinda_genus_diff$relAbund < 0.1)
strep_genus_diff_01 <- subset(strep_genus_diff, strep_genus_diff$relAbund < 0.1)

# Richness testing
strep_genus_01_richness <- (as.numeric(length(strep_genus_diff_01$transcriptChange)) /
  as.numeric(length(strep_genus_diff$transcriptChange)))
strep_genus_01_richness <- strep_genus_01_richness * 100

cef_genus_01_richness <- as.numeric(length(cef_genus_diff_01$transcriptChange)) /
  as.numeric(length(cef_genus_diff$transcriptChange))
cef_genus_01_richness <- cef_genus_01_richness * 100

clinda_genus_01_richness <- as.numeric(length(clinda_genus_diff_01$transcriptChange)) /
  as.numeric(length(clinda_genus_diff$transcriptChange))
clinda_genus_01_richness <- clinda_genus_01_richness * 100

genus_01_richness <- median(c(strep_genus_01_richness, cef_genus_01_richness, clinda_genus_01_richness))
genus_01_richness <- as.character(round(genus_01_richness, 1))

rm(cef_genus_diff, clinda_genus_diff, strep_genus_diff,
   strep_genus_01_richness, cef_genus_01_richness, clinda_genus_01_richness)

# Statistical testing
p_vals <- p.adjust(c(t.test(x=cef_genus_diff_01$transcriptChange,
                                 y=strep_genus_diff_01$transcriptChange)$p.value,
                     t.test(x=cef_genus_diff_01$transcriptChange,
                                 y=clinda_genus_diff_01$transcriptChange)$p.value,
                     t.test(x=strep_genus_diff_01$transcriptChange,
                                 y=clinda_genus_diff_01$transcriptChange)$p.value),
                   method='BH')
cef_strep_pval <- as.character(round(p_vals[1], 3))
cef_clinda_pval <- as.character(round(p_vals[2], 3))
strep_clinda_pval <- as.character(round(p_vals[3], 3))
rm(p_vals)

if (cef_clinda_pval == '0') {
  cef_clinda_pval = '0.001'
}

```

#### Largest changes in metatranscriptomes in response to infection were concentrated in low abundance taxa of each pretreatment group.
To explore the observation that rare taxa were responsible for the largest differences in transcription in response to infection, we tabulated the absolute difference between mock and *C. difficile* infected transcriptomes for each genus in each antibiotic pretreatment. We further normalized these values for the number of genes detected in each genus to adjust for genera that were more successfully assembled or annotated and we eliminated genera where less than 50 genes were detected in the metatranscriptome. Taxa were then stratified into categories based on their relative abundance in each community from 16S rRNA gene sequencing (Fig. 4). This revealed that most change occurred among the rare genera and that the degree of change was inversely correlated with sustained colonization (same colors denoting genera as in Fig. 3). To this point, minority metatranscriptomic absolute differences were significantly reduced in clindamycin pretreatment (*p* < `r cef_clinda_pval`). Additionally, the proportion of taxa in the lowest relative abundance bracket was similar across pretreatment groups (~`r genus_01_richness`%). As a corollary, we predicted that the majority of unique genes or metabolic potential was held within this minority, and when following quantification this proved to be the case (Table S4). As a consequence, the downstream impacts on functionality may affect a disproportionately large effect on the overall environment of the intestine as a function of its collective metabolism.


```{r figure_5, echo=F, warning=F, message=F, results='hide', cache=F}

# Metadata
metadata <- read.delim('data/metadata.tsv', sep='\t', header=TRUE, row.names=1)

# Normalized Metatranscriptomes
noabx_normalized_reads <- read.delim('data/read_mapping/noabx_normalized_metaT.tsv', sep='\t', header=TRUE, row.names=6)
cef_normalized_reads <- read.delim('data/read_mapping/cef_normalized_metaT.tsv', sep='\t', header=TRUE, row.names=7)
clinda_normalized_reads <- read.delim('data/read_mapping/clinda_normalized_metaT.tsv', sep='\t', header=TRUE, row.names=7)
strep_normalized_reads <- read.delim('data/read_mapping/strep_normalized_metaT.tsv', sep='\t', header=TRUE, row.names=7)

# KEGG organism file
kegg_tax <- read.delim('data/kegg/kegg_taxonomy.tsv', sep='\t', header=TRUE)
kegg_tax[] <- lapply(kegg_tax, as.character)
kegg_tax$domain <- NULL
kegg_tax$group <- NULL
kegg_tax$species <- NULL

# 16S data
genus_shared <- read.delim('data/16S_analysis/all_treatments.0.03.unique_list.0.03.filter.0.03.subsample.shared', sep='\t', header=TRUE, row.names=2)
genus_shared$label <- NULL
genus_shared$numOtus <- NULL
genus_tax <- read.delim('data/16S_analysis/all_treatments.genus.final.taxonomy', sep='\t', header=TRUE)
rownames(genus_tax) <- genus_tax$OTU
genus_tax$OTU <- NULL

# Format 16S
# Collate 16S abundances within genera in each sample
genus_shared <- t(genus_shared)
genus_tax_shared <- clean_merge(genus_tax, genus_shared)
genus_tax_agg_shared <- aggregate(. ~ Genus, data=genus_tax_shared, FUN=sum)
rownames(genus_tax_agg_shared) <- genus_tax_agg_shared$Genus
genus_tax_agg_shared$Genus <- NULL
genus_tax_agg_shared <- as.data.frame(t(genus_tax_agg_shared))
rm(genus_tax, genus_shared, genus_tax_shared)
# Combine with metadata and collate within antibiotic treatment
genus_tax_agg_shared <- clean_merge(metadata, genus_tax_agg_shared)
genus_tax_agg_shared <- subset(genus_tax_agg_shared, abx %in% c('cefoperazone','clindamycin','streptomycin'))
genus_tax_agg_shared$cage <- NULL
genus_tax_agg_shared$mouse <- NULL
genus_tax_agg_shared$gender <- NULL
genus_tax_agg_shared$type <- NULL
genus_tax_agg_shared$susceptibility <- NULL
genus_tax_agg_shared$infection <- NULL
abx_genus_tax_agg_shared <- aggregate(. ~ abx, data=genus_tax_agg_shared, FUN=max)
rownames(abx_genus_tax_agg_shared) <- abx_genus_tax_agg_shared$abx
abx_genus_tax_agg_shared$abx <- NULL
rm(genus_tax_agg_shared, metadata)
# Filter to minority genera in each group (<1%)
abx_genus_tax_agg_shared <- abx_genus_tax_agg_shared[, which(colSums(abx_genus_tax_agg_shared) > 0)]
abx_genus_tax_agg_shared <- (abx_genus_tax_agg_shared / rowSums(abx_genus_tax_agg_shared)) * 100 # Convert to relative abundance
abx_genus_tax_agg_shared[abx_genus_tax_agg_shared > 1] <- 0 # 1 percent cutoff
minority_genera <- abx_genus_tax_agg_shared[, which(colSums(abx_genus_tax_agg_shared) != 0)]
rm(abx_genus_tax_agg_shared)
# Subset to each antibiotic treatment
minority_genera <- as.data.frame(t(minority_genera))
strep_minority_genera <- rownames(minority_genera[which(minority_genera$streptomycin > 0),])
cef_minority_genera <- rownames(minority_genera[which(minority_genera$cefoperazone > 0),])
clinda_minority_genera <- rownames(minority_genera[which(minority_genera$clindamycin > 0),])
rm(minority_genera)

# Format transcription
# Rarefy to the same levels
sub_level <- min(sum(round(noabx_normalized_reads$noabx_mock_metaT_reads)),
                 sum(round(strep_normalized_reads$strep_630_metaT_reads)),
                 sum(round(strep_normalized_reads$strep_mock_metaT_reads)),
                 sum(round(cef_normalized_reads$cef_630_metaT_reads)),
                 sum(round(cef_normalized_reads$cef_mock_metaT_reads)),
                 sum(round(clinda_normalized_reads$clinda_630_metaT_reads)),
                 sum(round(clinda_normalized_reads$clinda_mock_metaT_reads)))
noabx_normalized_reads$noabx_mock_metaT_reads <- as.vector(rrarefy(round(noabx_normalized_reads$noabx_mock_metaT_reads), sample=sub_level))
strep_normalized_reads$strep_630_metaT_reads <- as.vector(rrarefy(round(strep_normalized_reads$strep_630_metaT_reads), sample=sub_level))
strep_normalized_reads$strep_mock_metaT_reads <- as.vector(rrarefy(round(strep_normalized_reads$strep_mock_metaT_reads), sample=sub_level))
cef_normalized_reads$cef_630_metaT_reads <- as.vector(rrarefy(round(cef_normalized_reads$cef_630_metaT_reads), sample=sub_level))
cef_normalized_reads$cef_mock_metaT_reads <- as.vector(rrarefy(round(cef_normalized_reads$cef_mock_metaT_reads), sample=sub_level))
clinda_normalized_reads$clinda_630_metaT_reads <- as.vector(rrarefy(round(clinda_normalized_reads$clinda_630_metaT_reads), sample=sub_level))
clinda_normalized_reads$clinda_mock_metaT_reads <- as.vector(rrarefy(round(clinda_normalized_reads$clinda_mock_metaT_reads), sample=sub_level))
rm(sub_level)
# Remove those genes without an organism annotation
noabx_annotated <- noabx_normalized_reads[!is.na(noabx_normalized_reads$organism),]
cef_annotated <- cef_normalized_reads[!is.na(cef_normalized_reads$organism),]
clinda_annotated <- clinda_normalized_reads[!is.na(clinda_normalized_reads$organism),]
strep_annotated <- strep_normalized_reads[!is.na(strep_normalized_reads$organism),]
rm(cef_normalized_reads, clinda_normalized_reads, strep_normalized_reads, noabx_normalized_reads)
# Remove any C. difficile genes with transcription only in infected
cdiff_omit <- c('cdf','pdc','cdc','cdl','pdf')
noabx_annotated <- subset(noabx_annotated, !(noabx_annotated$organism %in% cdiff_omit))
cef_annotated <- subset(cef_annotated, !(cef_annotated$organism %in% cdiff_omit))
clinda_annotated <- subset(clinda_annotated, !(clinda_annotated$organism %in% cdiff_omit))
strep_annotated <- subset(strep_annotated, !(strep_annotated$organism %in% cdiff_omit))
rm(cdiff_omit)
# Remove non-microbial genes
mamm_omit <- c('fab','cfa','ggo','hgl','hsa','mcc','mdo','pon','aml',
               'ptr','rno','shr','ssc','aml','bta','cge','ecb',
               'pps','fca','mmu','oaa','gga','ola','acs','aga')
noabx_annotated <- subset(noabx_annotated, !(noabx_annotated$organism %in% mamm_omit))
noabx_annotated <- subset(noabx_annotated, noabx_annotated$pathways != 'Naphthalene_degradation:Xenobiotics_biodegradation_and_metabolism')
strep_annotated <- subset(strep_annotated, !(strep_annotated$organism %in% mamm_omit))
strep_annotated <- subset(strep_annotated, strep_annotated$pathways != 'Naphthalene_degradation:Xenobiotics_biodegradation_and_metabolism')
cef_annotated <- subset(cef_annotated, !(cef_annotated$organism %in% mamm_omit))
cef_annotated <- subset(cef_annotated, cef_annotated$pathways != 'Naphthalene_degradation:Xenobiotics_biodegradation_and_metabolism')
clinda_annotated <- subset(clinda_annotated, !(clinda_annotated$organism %in% mamm_omit))
clinda_annotated <- subset(clinda_annotated, clinda_annotated$pathways != 'Naphthalene_degradation:Xenobiotics_biodegradation_and_metabolism')
rm(mamm_omit)
# Merge with genus level taxonomic classifications
noabx_annotated$gene <- rownames(noabx_annotated)
noabx_annotated <- merge(kegg_tax, noabx_annotated, by.x='org_code', by.y='organism', all.y=TRUE)
strep_annotated$gene <- rownames(strep_annotated)
strep_annotated <- merge(kegg_tax, strep_annotated, by.x='org_code', by.y='organism', all.y=TRUE)
cef_annotated$gene <- rownames(cef_annotated)
cef_annotated <- merge(kegg_tax, cef_annotated, by.x='org_code', by.y='organism', all.y=TRUE)
clinda_annotated$gene <- rownames(clinda_annotated)
clinda_annotated <- merge(kegg_tax, clinda_annotated, by.x='org_code', by.y='organism', all.y=TRUE)
rm(kegg_tax)
# Format untreated control
noabx_annotated$pathways <- vapply(strsplit(as.character(noabx_annotated$pathways),':'), `[`, 1, FUN.VALUE=character(1))
noabx_annotated$pathways <- gsub('_', ' ', noabx_annotated$pathways)
noabx_annotated$org_code <- NULL
noabx_annotated$gene <- NULL
noabx_annotated$ko <- NULL

# Find transcription levels of minority taxa
# Curate minority taxa that are at higher taxonomic level than genus
strep_minority_genera <- c(strep_minority_genera,"Cellulosilyticum",'Butyrivibrio','Dorea',"Lactococcus","Alistipes","Odoribacter","Staphylococcus","Streptococcus","Roseburia","Ruminococcus","Porphyromonas","Faecalibacterium")
strep_minority_genera <- unique(strep_minority_genera)
cef_minority_genera <- gsub('Shigella', 'Escherichia', cef_minority_genera)
cef_minority_genera <- c(cef_minority_genera, "Porphyromonas","Ruminococcus","Clostridium","Lactococcus","Odoribacter","Prevotella")
cef_minority_genera <- unique(cef_minority_genera)
clinda_minority_genera <- c(clinda_minority_genera,"Cellulosilyticum","Unclassified_Clostridiales","Clostridium","Lactococcus",'Butyrivibrio','Dorea','Roseburia','Bacteroides',"Prevotella","Ruminococcus","Porphyromonas","Faecalibacterium")
clinda_minority_genera <- unique(clinda_minority_genera)
# Subset untreated metatranscriptome to each treatment minority by 16S
noabx_strep_minority <- subset(noabx_annotated, genus %in% strep_minority_genera)
noabx_strep_minority$genus <- NULL
noabx_strep_minority <- noabx_strep_minority[complete.cases(noabx_strep_minority), ]
noabx_cef_minority <- subset(noabx_annotated, genus %in% cef_minority_genera)
noabx_cef_minority$genus <- NULL
noabx_cef_minority <- noabx_cef_minority[complete.cases(noabx_cef_minority), ]
noabx_clinda_minority <- subset(noabx_annotated, genus %in% clinda_minority_genera)
noabx_clinda_minority$genus <- NULL
noabx_clinda_minority <- noabx_clinda_minority[complete.cases(noabx_clinda_minority), ]
rm(noabx_annotated)
# Subset by minority genera from 16S analysis
strep_minority <- subset(strep_annotated, genus %in% strep_minority_genera)
strep_minority <- strep_minority[complete.cases(strep_minority), ]
cef_minority <- subset(cef_annotated, genus %in% cef_minority_genera)
cef_minority <- cef_minority[complete.cases(cef_minority), ]
clinda_minority <- subset(clinda_annotated, genus %in% clinda_minority_genera)
clinda_minority <- clinda_minority[complete.cases(clinda_minority), ]
rm(strep_annotated, cef_annotated, clinda_annotated,
   strep_minority_genera, cef_minority_genera, clinda_minority_genera)
# Reformat columns
strep_minority$pathways <- vapply(strsplit(as.character(strep_minority$pathways),':'), `[`, 1, FUN.VALUE=character(1))
strep_minority$pathways <- gsub('_', ' ', strep_minority$pathways)
strep_minority$org_code <- NULL
strep_minority$ko <- NULL
strep_minority$genus <- NULL
strep_minority$gene <- NULL
cef_minority$pathways <- vapply(strsplit(as.character(cef_minority$pathways),':'), `[`, 1, FUN.VALUE=character(1))
cef_minority$pathways <- gsub('_', ' ', cef_minority$pathways)
cef_minority$org_code <- NULL
cef_minority$ko <- NULL
cef_minority$genus <- NULL
cef_minority$gene <- NULL
clinda_minority$pathways <- vapply(strsplit(as.character(clinda_minority$pathways),':'), `[`, 1, FUN.VALUE=character(1))
clinda_minority$pathways <- gsub('_', ' ', clinda_minority$pathways)
clinda_minority$org_code <- NULL
clinda_minority$ko <- NULL
clinda_minority$genus <- NULL
clinda_minority$gene <- NULL
# Remove uninformative pathways and genes
noabx_strep_minority <- subset(noabx_strep_minority, !noabx_strep_minority$pathways %in% c('Ribosome','Metabolic pathways'))
noabx_cef_minority <- subset(noabx_cef_minority, !noabx_cef_minority$pathways %in% c('Ribosome','Metabolic pathways'))
noabx_clinda_minority <- subset(noabx_clinda_minority, !noabx_clinda_minority$pathways %in% c('Ribosome','Metabolic pathways'))
strep_minority <- subset(strep_minority, !strep_minority$pathways %in% c('Ribosome','Metabolic pathways'))
cef_minority <- subset(cef_minority, !cef_minority$pathways %in% c('Ribosome','Metabolic pathways'))
clinda_minority <- subset(clinda_minority, !clinda_minority$pathways %in% c('Ribosome','Metabolic pathways'))
noabx_strep_minority <- subset(noabx_strep_minority, !noabx_strep_minority$description %in% c('hypothetical_protein','putative'))
noabx_cef_minority <- subset(noabx_cef_minority, !noabx_cef_minority$description %in% c('hypothetical_protein','putative'))
noabx_clinda_minority <- subset(noabx_clinda_minority, !noabx_clinda_minority$description %in% c('hypothetical_protein','putative'))
strep_minority <- subset(strep_minority, !strep_minority$description %in% c('hypothetical_protein','putative'))
cef_minority <- subset(cef_minority, !cef_minority$description %in% c('hypothetical_protein','putative'))
clinda_minority <- subset(clinda_minority, !clinda_minority$description %in% c('hypothetical_protein','putative'))
# Aggregate each treatment by pathways or genes
# Pathways
noabx_strep_minority_pathways <- aggregate(. ~ pathways, data=noabx_strep_minority, FUN=sum)
noabx_strep_minority_pathways$description <- NULL
noabx_cef_minority_pathways <- aggregate(. ~ pathways, data=noabx_cef_minority, FUN=sum)
noabx_cef_minority_pathways$description <- NULL
noabx_clinda_minority_pathways <- aggregate(. ~ pathways, data=noabx_clinda_minority, FUN=sum)
noabx_clinda_minority_pathways$description <- NULL
strep_minority_pathways <- aggregate(. ~ pathways, data=strep_minority, FUN=sum)
strep_minority_pathways$description <- NULL
cef_minority_pathways <- aggregate(. ~ pathways, data=cef_minority, FUN=sum)
cef_minority_pathways$description <- NULL
clinda_minority_pathways <- aggregate(. ~ pathways, data=clinda_minority, FUN=sum)
clinda_minority_pathways$description <- NULL
# Genes
noabx_strep_minority$pathways <- NULL
noabx_strep_minority_genes <- aggregate(. ~ description, data=noabx_strep_minority, FUN=sum)
noabx_cef_minority$pathways <- NULL
noabx_cef_minority_genes <- aggregate(. ~ description, data=noabx_cef_minority, FUN=sum)
noabx_clinda_minority$pathways <- NULL
noabx_clinda_minority_genes <- aggregate(. ~ description, data=noabx_clinda_minority, FUN=sum)
strep_minority$pathways <- NULL
strep_minority_genes <- aggregate(. ~ description, data=strep_minority, FUN=sum)
cef_minority$pathways <- NULL
cef_minority_genes <- aggregate(. ~ description, data=cef_minority, FUN=sum)
clinda_minority$pathways <- NULL
clinda_minority_genes <- aggregate(. ~ description, data=clinda_minority, FUN=sum)
rm(clinda_minority, cef_minority, strep_minority,
   noabx_clinda_minority, noabx_cef_minority, noabx_strep_minority)
# Merge treatment and control minority groups
strep_minority_pathways <- merge(strep_minority_pathways, noabx_strep_minority_pathways, by='pathways')
cef_minority_pathways <- merge(cef_minority_pathways, noabx_cef_minority_pathways, by='pathways')
clinda_minority_pathways <- merge(clinda_minority_pathways, noabx_clinda_minority_pathways, by='pathways')
strep_minority_genes <- merge(strep_minority_genes, noabx_strep_minority_genes, by='description')
cef_minority_genes <- merge(cef_minority_genes, noabx_cef_minority_genes, by='description')
clinda_minority_genes <- merge(clinda_minority_genes, noabx_clinda_minority_genes, by='description')
rm(noabx_strep_minority_pathways, noabx_cef_minority_pathways, noabx_clinda_minority_pathways,
   noabx_strep_minority_genes, noabx_cef_minority_genes, noabx_clinda_minority_genes)
# Set row names
rownames(strep_minority_pathways) <- strep_minority_pathways$pathways
strep_minority_pathways$pathways <- NULL
colnames(strep_minority_pathways) <- c('infected','mock','resistant')
rownames(cef_minority_pathways) <- cef_minority_pathways$pathways
cef_minority_pathways$pathways <- NULL
colnames(cef_minority_pathways) <- c('infected','mock','resistant')
rownames(clinda_minority_pathways) <- clinda_minority_pathways$pathways
clinda_minority_pathways$pathways <- NULL
colnames(clinda_minority_pathways) <- c('infected','mock','resistant')
rownames(strep_minority_genes) <- strep_minority_genes$description
strep_minority_genes$description <- NULL
colnames(strep_minority_genes) <- c('infected','mock','resistant')
rownames(cef_minority_genes) <- cef_minority_genes$description
cef_minority_genes$description <- NULL
colnames(cef_minority_genes) <- c('infected','mock','resistant')
rownames(clinda_minority_genes) <- clinda_minority_genes$description
clinda_minority_genes$description <- NULL
colnames(clinda_minority_genes) <- c('infected','mock','resistant')

# Tanscriptomic analysis of minority taxa
# Find correct overlaps by clearance
colonized_pathways <- intersect(rownames(strep_minority_pathways), rownames(cef_minority_pathways))
colonized_genes <- intersect(rownames(strep_minority_genes), rownames(cef_minority_genes))
cleared_pathways <- rownames(clinda_minority_pathways)
cleared_genes <- rownames(clinda_minority_genes)
diff_pathways <- setdiff(colonized_pathways, cleared_pathways)
diff_pathways <- intersect(diff_pathways, colonized_pathways)
diff_genes <- setdiff(colonized_genes, cleared_genes)
diff_genes <- intersect(diff_genes, colonized_genes)
rm(colonized_pathways, cleared_pathways, colonized_genes, cleared_genes)
# Subset treatments
strep_minority_pathways <- subset(strep_minority_pathways, rownames(strep_minority_pathways) %in% diff_pathways)
cef_minority_pathways <- subset(cef_minority_pathways, rownames(cef_minority_pathways) %in% diff_pathways)
clinda_minority_pathways <- subset(clinda_minority_pathways, !rownames(clinda_minority_pathways) %in% diff_pathways)
strep_minority_genes <- subset(strep_minority_genes, rownames(strep_minority_genes) %in% diff_genes)
cef_minority_genes <- subset(cef_minority_genes, rownames(cef_minority_genes) %in% diff_genes)
clinda_minority_genes <- subset(clinda_minority_genes, !rownames(clinda_minority_genes) %in% diff_genes)
rm(diff_pathways, diff_genes)
rm(clinda_minority_pathways, cef_minority_pathways, strep_minority_pathways)

different_genes <- nrow(strep_minority_genes) + nrow(cef_minority_genes) + nrow(clinda_minority_genes)
different_genes <- as.character(different_genes)

# Find degree of change caused by infection
strep_minority_genes$abs_diff <- abs(strep_minority_genes$infected - strep_minority_genes$mock)
strep_minority_genes <- subset(strep_minority_genes, abs_diff != 0)
cef_minority_genes$abs_diff <- abs(cef_minority_genes$infected - cef_minority_genes$mock)
cef_minority_genes <- subset(cef_minority_genes, abs_diff != 0)
clinda_minority_genes$abs_diff <- abs(clinda_minority_genes$infected - clinda_minority_genes$mock)
clinda_minority_genes <- subset(clinda_minority_genes, abs_diff != 0)
# Remove genes with very little transcriptional shift
sub_level <- round(sum(strep_minority_genes$abs_diff) * 0.02)
strep_minority_genes <- subset(strep_minority_genes, strep_minority_genes$abs_diff >= sub_level)
sub_level <- round(sum(cef_minority_genes$abs_diff) * 0.02)
cef_minority_genes <- subset(cef_minority_genes, cef_minority_genes$abs_diff >= sub_level)
sub_level <- round(sum(clinda_minority_genes$abs_diff) * 0.02)
clinda_minority_genes <- subset(clinda_minority_genes, clinda_minority_genes$abs_diff >= sub_level)
rm(sub_level)

subset_different_genes <- nrow(strep_minority_genes) + nrow(cef_minority_genes) + nrow(clinda_minority_genes)
subset_different_genes <- as.character(subset_different_genes)

# Sort tables by overall expression across genes/pathways
strep_minority_genes <- strep_minority_genes[order(strep_minority_genes$abs_diff),]
strep_minority_genes$abs_diff <- NULL
cef_minority_genes <- cef_minority_genes[order(cef_minority_genes$abs_diff),]
cef_minority_genes$abs_diff <- NULL
clinda_minority_genes <- clinda_minority_genes[order(clinda_minority_genes$abs_diff),]
clinda_minority_genes$abs_diff <- NULL

# Transform data
strep_minority_genes <- log2(strep_minority_genes + 1)
cef_minority_genes <- log2(cef_minority_genes + 1)
clinda_minority_genes <- log2(clinda_minority_genes + 1)

```

#### Altered transcription within low abundance taxa favors reduced nutrient competition with *C. difficile* in communities that permitted sustained colonization.
Based on our metabolomic and metatranscriptomic results, we hypothesized that pathways with the greatest differences between mock and *C. difficile*-infected mice would be related to catabolism of metabolites that *C. difficile* could use for for growth. To assess these changes, we identified those annotated transcripts that were associated with genera that represented less than 0.1% of the community as measured with our 16S rRNA gene sequence data (Fig. 5). This resulted in the identification of `r different_genes` genes that were differentially transcribed between clindamycin-pretreated mice and the streptomycin and cefoperazone-pretreated mice. From this group of genes we filtered the collection to identify those genes that were unique to either the clindamycin-pretreated mice or the streptomycin and cefoperazone-pretreated mice. Finally, we limited our analysis to those genes that were meaningfully different between the mock and *C. difficile*-infected groups in each antibiotic pretreatment group. This resulted in `r subset_different_genes` genes from 11 pathways. These genes and pathways were primarily involved in simple carbohydrate-containing molecule acquisition/utilization (Fig. 5). Interestingly, many of these genes had decreased transcription during infection compared to mock-infected controls. At the pathway-level, many genes associated with galactose and amino sugar acquisition (both *C. difficile* growth substrates) were reduced during infection in both streptomycin and cefoperazone-pretreated mice. Conversely, pathways uniquely associated with clindamycin-pretreated communities were related to the metabolism of a diverse array of carbon sources, which may indicate ineffective competition by *C. difficile* with this community for any particular growth substrate. Our results indeed suggest that *C. difficile* colonization induces a shift transcriptional activity for a minority subset of species, possibly in an effort to segregate a desired nutrient niche, prior to the introduction of the hallmark disease phenotypes associated with CDI.


### Discussion

Our results demonstrate that distinct intestinal ecosystems are differentially impacted by *C. difficile* colonization and that these changes to community metabolism could have implications for the ability of the pathogen to persist in those environments. We had previously demonstrated that *C. difficile* spore production and toxin activity differ between these pretreatment regimes [@Jenior2017]. As both processes have been linked to environmental concentrations of specific growth nutrients [@Bouillaut2015], these results suggested that despite high initial *C. difficile* colonization the microbiomes across pretreatments may vary in available nutrients or profiles of competitors for those niches. In the current study, our mutli-omics approach demonstrated that *C. difficile* manipulated the niche landscape of the intestinal tract. Instances of active nutrient niche restructuring in the gut have been documented previously for prominent symbiotic bacterial species in gnotobiotic mice [@Mahowald2009], but not in a conventionally-reared animal model of infection following antibiotic pretreatment. Interestingly, the taxonomic groups that produced the transcripts that were most altered by *C. difficile* colonization were rare in their cecal community. Previous studies have found that rare taxonomic groups, even those at a low abundance as a result of a spontaneous perturbation, may have disproportionate effects on the metabolome of the rest of the community [@Jousset2017]. For example in temperate lakes, conditionally rare microbes were found to be far more metabolically active than highly abundant taxa [@Shade2014]. These examples of response to perturbations are interesting models for thinking about the dynamics of bacterial populations recovering from an antibiotic perturbation. As such, *C. difficile* may compete with these organisms to ultimately affect greater change to the entire ecosystem and open a long-lasting nutrient niche. While this hypothesis requires further exploration, it provides an ecological framework to study the interactions between *C. difficile* and members of susceptible communities.

This study is one of the first *in vivo* observations that a medically relevant bacterial pathogen may alter the metabolic activity of a host-associated community to promote its own colonization. This is also the first application of metatranscriptomic analysis of the gut microbiota *in vivo* and in response to a pathogen. Other groups have identified potential metabolite markers of *C. difficile* infection in patient feces [@Rojo2015], but they were not able to identify associations with changes in community metabolism that were afforded to us by our paired metabolomic and metatranscriptomic analyses. In a recent study, a tick-vectored bacterial pathogen altered the ability of the resident microbiota of the tick by interrupting proper biofilm formation and allowing lasting colonization [@Abraham2017]. It was also recently found that bacterial metabolic generalists may be more likely to actively antagonize the growth of other species in an environment that they are colonizing [@Russel2017]. We previously showed that *C. difficile* has a wide nutrient niche-space *in vivo* where it most likely utilizes its role as a metabolic generalist to colonize diverse gut microbiomes [@Jenior2017]. The ability to simultaneously antagonize the metabolism of surrounding populations in cecal environments that support persistence would explain the more significant shifts in those metatranscriptome. While we acknowledge that this study may not elucidate the specific mechanism by which this interaction occurs, the combined systems analysis strengthens each individual level of observation. When the results from these approaches are combined reveals a clearer understanding of *C. difficile*-related microbial ecology. This research lays the groundwork for a more rationale consideration of the metabolic functionalities of bacterial taxa to consider when attempting to rebuild *C. difficile* colonization resistance across differentially perturbed gut environments.

In spite of consistent results across the different methods we used in this study, several limitations should be noted. First, as with all transcriptomic studies, the relative level of mRNA detected for a given gene does not necessarily reflect the amount of functional protein made by a cell or the post-translational modifications that are required to activate the enzymes. Additionally, due to the low relative abundance of *C. difficile* in these communities, it was necessary for us to pool samples to generate a large number of reads from each group rather than sampling multiple replicates within each group. Greater transcript read abundance per gene allowed for improved survey for the activity of lowly abundant species as well as greater confidence in genes found to be highly transcribed. Although the lack of animal-based replication for the metatranscriptomic data does potentially limit the ability to generalize our results, this approach has been successfully utilized by numerous groups in the past to accurately characterize transcriptionally activity across communities of bacteria [@Sheik2014; @Franzosa2014; @Jorth2014; @Jenior2017]. Furthermore, the metatranscriptomic data were supported by the 16S rRNA gene sequence and metabolomic data which were collected from individual animals. With respect to the metabolomic data, alternative interpretations of the data also exist. For example, we assumed that metabolites, which did not change in concentration between uninfected and infected conditions were not impacted by *C. difficile* colonization. However, it is possible that the metabolism of *C. difficile* itself simply substituted for a function that was already present in the uninfected community. The insights gathered from the metatranscriptomic data suggests that this was unlikely. By leveraging multiple methods to test our hypotheses we were able to mediate the weaknesses of any individual method and present a more unified description of the system than any of the methods on their own.

Our study supports the hypothesis that the gut microbiota of healthy individuals maintains colonization resistance to *C. difficile* by outcompeting the pathogen for preferred nutrient niche space. Moreover, these data suggest that the degree to which the environment of the intestine is altered by infection may be linked to the ability of the pathogen to remain colonized. Ultimately, our results suggest that each susceptible and subsequently infected microbiome may be unique and require specific microbes or functionalities to restore colonization resistance against *C. difficile* in that specific context. Conversely, colonization resistance against *C. difficile* may be the result of contributions by distinct sub-communities of bacteria across each unique resistant gut community. Several studies have attempted to identify single bacterial species or consortia that are able to achieve colonization resistance; however, these efforts have only resulted in partially resistance [@Reeves2012; @Lawley2012; @Petrof2013; @Buffie2014]. Considering the structure and function of the microbiome is intimately connected to colonization resistance against the *C. difficile*, it has become imperative to understand the ecological factors that allow some gut environments to be persistently colonized while others are not. This research lays the groundwork for future studies to assess context dependent restoration of *C. difficile* colonization resistance and what factors are able to interfere with the ability of *C. difficile* to modify gut ecology to promote clearance.


### Materials and Methods

#### Animal care and antibiotic administration.
Briefly, approximately equal numbers of male and female conventionally-reared six-to-eight week-old C57BL/6 mice were randomly assigned to each experimental group (genders were housed separately). Nine mice were used per experimental/control condition, and littermates were used as much as possible within each group to account for possible genetic variablility. They were administered one of three antibiotics; cefoperazone, streptomycin, or clindamycin before oral *C. difficile* infection (Table S1). A detailed description of these animal models was outlined previously [@Jenior2017]. A similar experimental design was implemented for gnotobiotic mice and was performed with the University of Michigan Germfree Mouse Center as described previously [@Jenior2017]. All animal protocols were approved by the University Committee on Use and Care of Animals at the University of Michigan and carried out in accordance with the approved guidelines from the Office of Laboratory Animal Welfare (OLAW), United States Department of Agriculture (USDA) registration, and the Association for Assessment and Accreditation of Laboratory Animal Care (AAALAC). The protocol license Institutional Animal Care and Use Committee (IACUC) number for all described experiments is PRO00006983.

#### *C. difficile* infection and necropsy
On the day of challenge, 1x10^3^ *C. difficile* spores were administered to mice via oral gavage in phosphate-buffered saline (PBS) vehicle. Mock-infected animals were given an oral gavage of 100 ul PBS at the same time as those mice administered *C. difficile* spores. 18 hours following infection, mice were euthanized by CO~2~ asphyxiation and necropsied to obtain the cecal contents. Aliquots were immediately flash frozen for later DNA extraction. A third aliquot was transferred to an anaerobic chamber for quantification of *C. difficile* abundance. The remaining content in the ceca was mixed in a stainless steel mortar housed in a dry ice and ethanol bath. Cecal contents from all mice within each pretreatment group were pooled into the mortar prior to grinding to a fine powder. The ground content was then stored at -80&deg;C for subsequent RNA extraction. For 10-day colonization studies, fresh stool was collected from infected mice each day beginning on the day of infection. Mice were monitored for overt signs of disease and were euthanized after the final stool collection.

#### *C. difficile* cultivation and quantification
Cecal samples were weighed and serially diluted under anaerobic conditions with anaerobic PBS. Differential plating was performed to quantify *C. difficile* vegetative cells by plating diluted samples on CCFAE plates (fructose agar plus cycloserine, cefoxitin, and erythromycin) at 37&deg;C for 24 hours under anaerobic conditions [@Wilson1982]. Quantification of total *C. difficile* CFU for the 10-day colonization experiments was performed from stool using TCCFAE to measure total *C. difficile* load in these animals over time.

#### DNA/RNA extraction and sequencing library preparation
DNA for shotgun metagenomic and 16S rRNA gene sequencing was extracted from approximately 50 mg of cecal content from each mouse using the PowerSoil-htp 96 Well Soil DNA isolation kit (MO BIO Laboratories) and an epMotion 5075 automated pipetting system (Eppendorf). The V4 region of the bacterial 16S rRNA gene was amplified using custom barcoded primers [@Kozich2013]. Equal molar ratios of raw isolated DNA within each treatment group were then pooled and ~2.5 ng of material was used to generate shotgun libraries with a modified 10-cycle Nextera XT genomic library construction protocol (Illumina). This was done to mimic the pooling strategy necessary for metatranscriptomic library preparation. Final libraries were pooled at equal molar ratios and stored at -20&deg;C. For RNA extraction, a more detailed description of the procedure can be found in [@Jenior2017]. Briefly, immediately before RNA extraction, 3 ml of lysis buffer (2% SDS, 16 mM EDTA and 200 mM NaCl) contained in a 50 ml polypropylene conical tube was heated for 5 minutes in a boiling water bath [@Lopez-Medina2011]. The hot lysis buffer was added to the frozen and ground cecal content. The mixture was boiled with periodic vortexing for another 5 minutes. After boiling, an equal volume of 37&deg;C acid phenol/chloroform was added to the cecal content lysate and incubated at 37&deg;C for 10 minutes with periodic vortexing. The mixture was the centrifuged at 2,500 x g at 4&deg;C for 15 minutes. The aqueous phase was then transferred to a sterile tube and an equal volume of acid phenol/chloroform was added. This mixture was vortexed and centrifuged at 2,500 x g at 4&deg;C for 5 minutes. The process was repeated until aqueous phase was clear. The last extraction was performed with chloroform/isoamyl alcohol to remove acid phenol. An equal volume of isopropanol was added and the extracted nucleic acid was incubated overnight at -20&deg;C. The following day the sample was centrifuged at 12000 x g at 4&deg;C for 45 minutes. The pellet was washed with 0&deg;C 100% ethanol and resuspended in 200 ul of RNase-free water. Following the manufacturer's protocol, samples were then treated with 2 ul of Turbo DNase for 30 minutes at 37&deg;C. RNA samples were retrieved using the Zymo Quick-RNA MiniPrep according the manufacturer's protocol. The Ribo-Zero Gold rRNA Removal Kit (Epidemiology) was then used to deplete prokaryotic and eukaryotic rRNA from the samples according the manufacturer's protocol (Illumina). Unstranded RNA-Seq libraries were made constructed with the TruSeq Total RNA Library Preparation Kit v2, both using the manufacturer's protocol. Completed libraries were stored at -20&deg;C until time of sequencing.

#### High-throughput sequencing and raw read curation
Sequencing of 16S rRNA gene amplicon libraries was performed using an Illumina MiSeq sequencer as described previously [@Kozich2013]. The 16S rRNA gene sequences were curated using the mothur software package (v1.36) and OTU-based analysis was performed as described in [@Jenior2017]. Genus-level classification-based analysis of 16S rRNA gene sequence data was accomplished using the phylotype workflow in mothur and the full SILVA bacterial taxonomy (release 132). Shotgun metagenomic sequencing was performed in 2 phases. Libraries from mock-infected communities, which were also to be utilized for *de novo* contig assembly, were sequenced using an Illumina HiSeq 2500 on 2x250 paired-end settings and was repeated across 2 lanes to normalize for inter-run variation. *C. difficile*-infected metagenomic libraries were sequenced with an Illumina NextSeq 300 with 2x150 settings across 2 runs to also normalize for inter-run variation. These efforts resulted in an average of 280,000,000 paired raw reads per sample. Metatranscriptomic sequencing was performed on an Illumina HiSeq 2500 with 2x50 settings and was repeated across 4 lanes for normalization and to normalize for technical variation between lans and to obtain necessary coverage [@Franzosa2014]. This gave an average of 380 million raw cDNA reads per library. Both metagenomic and metatranscriptomic sequencing was performed at the University of Michigan Sequencing Core. Raw sequence read curation for both metagenomic and metatranscriptomic datasets was performed in a two step process. Residual 5' and 3' Illumina adapter sequences were trimmed using CutAdapt [@Martin2011] on a per library basis. Reads were quality trimmed using Sickle [@Joshi2011] with a quality cutoff of Q30. This resulted in approximately 270 million reads per library (both paired and orphaned) for both metagenomic and metatranscriptomic sequencing. Actual read abundances for individual metagenomic and metatranscriptomic sequencing efforts can be found in Table S4.

#### Metagenomic contig assembly and gene annotation
Metagenomic contigs were assembled using Megahit [@Li2014] with the following settings: minimum kmer size of 87, maximum kmer size of 127, and a kmer step size of 10. Prodigal was utilized to to identify putative gene sequences, and were screened for a minimum length of 250 nucleotides. These sequences were translated to amino acids and the predicted peptides were annotated based on the KEGG protein database [@Ogata1999] using Diamond implementation of BLASTp [@Buchfink2015]. Peptide-level gene annotations were assigned to the corresponding nucleotide sequence, and genes failing to find a match in KEGG were preserved as unannotated genes. Final nucleotide FASTA files with KEGG annotations were then utilized in the construction of Bowtie2 mapping databases from downstream analyses [@Langmead2012].

#### DNA/cDNA read mapping and normalization
Mapping of DNA and cDNA reads to the assemblies was accomplished using Bowtie2 and the default stringent settings [@Langmead2012]. Optical and PCR duplicates were then removed using Picard MarkDuplicates (http://broadinstitute.github.io/picard/). The remaining mappings were converted to idxstats format using Samtools [@Li2009] and the read counts per gene were tabulated. Discordant pair mappings were discarded and counts were then normalized to read length and gene length to give a per base report of gene coverage. Transcript abundance was then normalized to gene abundance to yield overall level of transcription for each gene. Reads contributed by *C. difficile* were removed from analysis using Bowtie2 against the *C. difficile* str. 630 genome with settings allowing for up to 2 mismatches.

#### Quantification of *in vivo* metabolite relative concentrations
Metabolomic analysis was performed by Metabolon (Durham, NC), for a detailed description of the procedure refer to [@Jenior2017]. Briefly, all methods utilized a Waters ACQUITY ultra-performance liquid chromatography (UPLC) and a Thermo Scientific Q-Exactive high resolution/accurate mass spectrometer interfaced with a heated electrospray ionization (HESI-II) source and Orbitrap mass analyzer at 35,000 mass resolution. Samples were dried then reconstituted in solvents compatible to each of the four methods. The first, in acidic positive conditions using a C18 column (Waters UPLC BEH C18-2.1x100 mm, 1.7 um) using water and methanol, containing 0.05% perfluoropentanoic acid (PFPA) and 0.1% formic acid (FA). The second method was identical to the first but was chromatographically optimized for more hydrophobic compounds. The third approach utilized a basic negative ion optimized conditions using a separate dedicated C18 column. Basic extracts were gradient eluted from the column using methanol and water, however with 6.5mM Ammonium Bicarbonate at pH 8. Samples were then analyzed via negative ionization following elution from a hydrophilic interaction chromatography column (Waters UPLC BEH Amide 2.1x150 mm, 1.7 um) using a gradient consisting of water and acetonitrile with 10 mM Ammonium Formate, pH 10.8. The MS analysis alternated between MS and data-dependent MS n scans using dynamic exclusion. The scan range varied slighted between methods but covered 70-1000 m/z. Library matches for each compound were checked for each sample and corrected if necessary.

#### Statistical methods
All statistical analyses were performed using R (v.3.2.0) and the vegan package [@vegan2018]. Significant differences of inverse Simpson diversity, CFU, and metabolite concentrations were determined by Wilcoxon signed-rank test with Benjamini-Hochberg correction using a study-wide Type I error rate of 0.05. Undetectable points used the limit of detection for CFU statistical calculations. Dynamic time warping was performed with the dtw package in R [@Giorgino2009]. Random forest was performed using the AUCRF implementation [@Calle2011] as well as the standard package [@Breiman2001] in R. Distances of outlier points from center line during metatranscriptomic comparisons was accomplished using 2-dimensional linear geometry.

#### Data Availability
Pooled and quality trimmed *C. difficile*-infected metatranscriptomes (SRA; PRJNA354635) and 16S rRNA gene amplicon read data (SRA; PRJNA383577) from infection experiments are available through the NCBI Sequence Read Archive. Metagenomic reads and mock-infected metatranscriptomic reads can be found also on the SRA (PRJNA415307). Data processing steps beginning with raw sequence data to the final manuscript are hosted at https://github.com/SchlossLab/Jenior_Metatranscriptomics_mSphere_2018.

#### Acknowledgments
The authors would like to acknowledge Charles Koumpouras for assistance with DNA extractions and metabolomic sample preparation. We would also like to acknowledge members of the University of Michigan Germfree Mouse Center, University of Michigan Sequencing Core, and Metabolon for their assistance in experimental design, execution, and data collection.

#### Author Contributions
M.L.J. conceived, designed and performed experiments, analyzed data, and drafted the manuscript. J.L.L. performed experiments, analyzed data, and contributed to the manuscript. V.B.Y. contributed to the manuscript. P.D.S. interpreted data and contributed the manuscript. The authors declare no conflicts of interest.

### References

<div id="refs"></div>


### Figure and Table Legends

#### Figure 1 | Distinct antibiotic pretreatments have differential impacts on *C. difficile* colonization and cecal microbiota community structure.
**(A)** *C. difficile* 630 CFU in stool of infected mice following each antibiotic-pretreated group over 10 days of infection. Median and interquartile range are shown for each time point. Both cefoperzone and streptomycin pretreatments had more significantly detectable CFU on the final day of detectable CFU associated with clindamycin-pretreatment (*p* < 0.001). **(B)** Relative abundance of family-level OTU taxonomic classification in each pretreatment group from 16S rRNA gene sequencing. **(C)** Quantification of terminal vegetative *C. difficile* CFU in cecal content across 18 hour colonization models. Black lines indicate median values and each pretreatment group had significantly greater detectable CFU than no antibiotic controls. Significant differences in A were determined through permANOVA with dynamic time warping and in C were found by Wilcoxon rank-sum test with Benjamni-Hochberg correction when necessary. The limit of detection was used in place of undetectable values for statistical testing.

#### Figure 2 | Significant differences in cecal community structure and metabolomes track with downstream *C. difficile* clearance across antibiotic pretreatment regimes.
**(A)** NMDS ordination of Bray-Curtis distances of OTU relative abundances between mouse cecal communities that remained colonized by *C. difficile* and those that eventually cleared the infection. **(B)** Relative abundance of OTUs included the optimal model generated by AUCRF classifying the same groups as in panel A. Species-level identification was obtained using centroid representative sequences for each OTU. **(C)** NMDS ordination of Bray-Curtis distances using metabolite intensities between the aforementioned groups of animals. **(D)** Scaled intensity of metabolites included the optimal model generated by AUCRF classifying colonized and clearing mouse cecal microbiomes. Differences for ordinations in A & C were calculated using permANOVA. Optimal AUCRF models demonstrated 0% out of bag error, and significant differences in B & D were determined by Wilcoxon rank-sum test with Benjamni-Hochberg correction.

#### Figure 3 | *C. difficile* colonization alters gene transcription of taxonomic groups differentially between antibiotic pretreatments.
Each point represents a unique gene from the respective metagenomic assembly. Coordinates were determined by the log~2~-transformed difference in transcription level between *C. difficile*-infected and mock-infected conditions for each gene. Outliers were defined using linear correlation and a squared residual cutoff of 2. Euclidean distance of outliers to the x=y line was also calculated. The coloring of each point indicates the genus that the transcript originated from and the and the gray points denote those genes with consistent transcription levels between conditions as defined by outlier analysis. Antibiotic pretreatments; **(A)** Streptomycin-pretreated, **(B)** Cefoperazone-pretreated, and **(C)** Clindamycin-pretreated.

#### Figure 4 | A majority of metatranscriptomic changes are focused within minority members of each microbiota.
Absolute difference in metatranscriptomic reads contributed by each genus in pretreatments between mock and *C. difficile*-infected conditions. Colored lines denoted antibiotic pretreatment. Each point represents all transcript contributed by that genus in each pretreatment group, and point colors correspond to those used in Figure 3. Numbers at the base of pretreatment lines in the first panel represent the quantity of genera in each group as some points are obscured.

#### Figure 5 | Metatranscriptomic changes due to infection in certain metabolic pathways are overrepresented in the minority taxa.
Log~2~ metagenome-normalized cDNA abundances for genes with differential transcription during infection belonging to genera that had a relative abundance greater than 0.1%. Double asterisks denotes genes shared between pretreatment groups.


#### Supplementary Figure 1 | Conserved markers of *C. difficile* colonization susceptibility in mouse cecal microbiomes.
**(A)** NMDS ordination of Bray-Curtis distances of OTU relative abundances between mouse cecal communities that are susceptible to colonization by *C. difficile* and those that are resistant. **(B)** Relative abundance of OTUs included in the optimal model generated by AUCRF classifying the same groups as in panel A, labeled with the finest resolution provided by classifying to the RDP reference database. **(C)** NMDS ordination of Bray-Curtis distances between metabolite intensities. **(D)** Scaled metabolites relative abundnaces included the optimal model generated by AUCRF classifying resistant and susceptible cecal microbiomes. Significant differences in A & C were also calculated using permANOVA. The AUCRF models generated in this analysis also had 0% out of bag error and significant differences in B & D were calculated as in Figure 2.

#### Supplementary Figure 2 | Signatures of infection effect on the cecal microbiomes conserved across pretreatment groups.
**(A)** NMDS ordination of Bray-Curtis distances of OTU relative abundances between antibiotic-pretreated mouse cecal communities that are either *C. difficile*-colonized or mock-infected. **(B)** Relative abundance of OTUs included the optimal model generated by AUCRF classifying the same groups as in panel A. **(C)** NMDS ordination of Bray-Curtis distances using metabolite intensities between the same classes. **(D)** Scaled intensity of metabolites included the optimal model classifying infected and uninfected cecal microbiomes. Statistical differences were performed as in Figure 2.

#### Supplementary Figure 3 | Relative concentrations of select *C. difficile* Stickland fermentation metabolites across infection models.
Metabolites included in this analysis were chosen based on their previously published interaction with *C. difficile* Stickland fermentation: **(A)** Proline, **(B)** 4-Hydroxyproline, **(C)** Glycine, **(D)** 5-Aminovalerate. Significant differences were determined by Wilcoxon rank-sum test with Benjamini-Hochberg correction when necessary. Black asterisks in the plotting area represent within group differences, while green asterisks along the top border denote significant differences compared to untreated control.

#### Supplementary Figure 4 | Unique genes with functional annotation detectable within each metagenome and metatranscriptome.
Genes in each datasets were derived from respective metagenomic assemblies, with only those genes that mapped to a KEGG pathway-level annotation: **(A)** Untreated, **(B)** Streptomycin-pretreated, **(C)** Cefoperazone-pretreated, and **(D)** Clindamycin-pretreated mice. Each panel includes that treatments' unique genes from metagenomic assembly and genes that recruited at least one cDNA read from the corresponding metatranscriptomes (1X coverage). Collector's curves from rarefaction analysis of reads mapped to genes from **(E)** metagenomes and **(F)** metatranscriptomes.


#### Supplementary Table 1 | Antibiotic pretreatment regime summaries.
Antibiotic classes, mechanisms, and dosage information for each pretreatment. Quantified effect on alpha- and beta-diversities of the cecal microbiota are also included.

#### Supplementary Table 2 | Summary of impact of infection on cecal community structure and metabolome.
Global effect as well as changes to specific metabolites are included.

#### Supplementary Table 3 | Summary statistics for datasets containing replicates generated during this study.

#### Supplementary Table 4 | High-throughput sequencing read counts and metagenomic assembly.
Raw and curated read abundances for both metagenomic and metatranscriptomic sequencing efforts. Raw read curation steps are outlined in Materials & Methods. Metagenomic contig summary statistics reflect the quality of assembly for each group.