MMGS

MMGS (multiple-environment, multiple-method genomic selection) is a comprehensive R package developed by Mingjia Zhu. It integrates the polygenic environmental interaction (PEI) and reaction norm (RE) methods with 15 prediction models that cover parametric, semi-parametric, and non-parametric estimation methods.

Description

The RE model consists of four steps:
(1) Use the CERIS algorithm (Guo, 2021) to identify the environmental index that explains the largest proportion of phenotypic variation.
(2) Regress the observed phenotypes on the identified environmental index to obtain an intercept and a slope estimate for each tested genotype.
(3) Treat the intercept and slope as new "traits" and perform genomic prediction with ridge regression to predict the intercept and slope of each untested genotype.
(4) Predict the phenotypes of the untested genotypes from their predicted intercepts and slopes and the environmental index value of each environment.
Like the RE model, the PEI model starts by identifying the key environmental index that best captures the phenotypic variation.
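Steps (2) to (4) can be illustrated with a small base-R sketch. It assumes that step (1) has already produced one environmental index value per environment. All data are simulated, and the ridge shrinkage value (lambda = 50) is chosen arbitrarily. This shows the idea only and is not MMGS's internal implementation.

```r
# Sketch of RE-model steps (2)-(4) on simulated data; all data are hypothetical
set.seed(1)
n_lines <- 60; n_env <- 6; n_snp <- 200
X <- matrix(rbinom(n_lines * n_snp, 2, 0.4), n_lines, n_snp)  # markers coded 0/1/2
env_index <- sort(runif(n_env, 10, 30))                       # assumed output of step (1)

# Simulate line-specific intercepts and slopes driven by the markers
a_true <- drop(100 + X %*% rnorm(n_snp, 0, 0.5))
b_true <- drop(1 + X %*% rnorm(n_snp, 0, 0.02))
Y <- a_true + outer(b_true, env_index) + matrix(rnorm(n_lines * n_env, 0, 2), n_lines)

tested <- 1:45; untested <- 46:60

# Step (2): regress each tested line's phenotypes on the environmental index
coefs <- t(apply(Y[tested, ], 1, function(y) coef(lm(y ~ env_index))))
colnames(coefs) <- c("intercept", "slope")

# Step (3): ridge regression of intercept and slope on the markers
ridge_fit <- function(X, y, lambda) {
  ctr <- colMeans(X)
  Xc <- sweep(X, 2, ctr)                       # center each marker column
  beta <- solve(crossprod(Xc) + lambda * diag(ncol(X)), crossprod(Xc, y - mean(y)))
  list(mu = mean(y), ctr = ctr, beta = beta)
}
ridge_predict <- function(fit, Xnew) drop(fit$mu + sweep(Xnew, 2, fit$ctr) %*% fit$beta)

lambda <- 50  # hypothetical shrinkage parameter
a_hat <- ridge_predict(ridge_fit(X[tested, ], coefs[, "intercept"], lambda), X[untested, ])
b_hat <- ridge_predict(ridge_fit(X[tested, ], coefs[, "slope"], lambda), X[untested, ])

# Step (4): predicted phenotype = intercept + slope * environmental index
Y_hat <- a_hat + outer(b_hat, env_index)
round(Y_hat[1:3, ], 1)
```

Each row of Y_hat holds one untested line's predicted phenotypes across the environments.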

These prediction models fall into three major categories: parametric, semi-parametric, and non-parametric (Admas et al., 2024). The parametric models include:
- the mixed linear model genomic best linear unbiased prediction (G-BLUP) (VanRaden, 2008);
- BayesA (BA) and BayesB (BB) (Meuwissen et al., 2001);
- BayesC (BC) (George and McCulloch, 1993);
- Bayesian ridge regression (BRR) (Erbe et al., 2012);
- Bayesian LASSO (BL) (Park and Casella, 2008);
- the least absolute shrinkage and selection operator (LASSO) (Usai et al., 2009);
- ridge regression (RR) (Whittaker et al., 2000);
- ridge regression best linear unbiased prediction (RR-BLUP) (Meuwissen et al., 2001);
- and elastic net (EN) (Zou and Hastie, 2005).

The semi-parametric methods are the reproducing kernel Hilbert space (RKHS) model and multiple-kernel RKHS (MKRKHS) (Gianola et al., 2006). The non-parametric methods are support vector machine (SVM) (Maenhout et al., 2007), random forest (RF) (Chen and Ishwaran, 2012), and gradient boosting machine (GBM) (Li et al., 2018).
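As an example of the parametric group, G-BLUP predicts genomic values from a genomic relationship matrix. This base-R sketch builds that matrix with VanRaden's (2008) first method and computes BLUPs for untested lines. It makes the following assumptions:
- Genotypes and phenotypes are simulated.
- The variance ratio lambda (residual to genomic variance) is fixed rather than estimated, for example by REML.
- The intercept is taken as the simple training mean.

```r
# G-BLUP sketch with simulated genotypes; data and variance ratio are hypothetical
set.seed(2)
n <- 100; m <- 500
freq <- runif(m, 0.1, 0.9)
M <- matrix(rbinom(n * m, 2, rep(freq, each = n)), n, m)  # genotypes coded 0/1/2

p <- colMeans(M) / 2                          # observed allele frequencies
Z <- sweep(M, 2, 2 * p)                       # center each marker by 2p
G <- tcrossprod(Z) / (2 * sum(p * (1 - p)))   # VanRaden (2008) genomic relationship matrix

u <- drop(Z %*% rnorm(m, 0, 0.05))            # simulated true genomic values
y <- 10 + u + rnorm(n, 0, 1)                  # simulated phenotypes

train <- 1:80; test <- 81:100
lambda <- 1  # assumed sigma_e^2 / sigma_g^2
alpha <- solve(G[train, train] + lambda * diag(length(train)), y[train] - mean(y[train]))
u_hat <- drop(G[test, train] %*% alpha)       # BLUP of genomic values for test lines
cor(u_hat, u[test])                           # accuracy against the simulated truth
```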

Installation

Install the development version from GitHub with devtools:

# install.packages("devtools")
devtools::install_github("Ryougi-yukiro/MMGS")

Usage


Step.1 Load packages and data

The package ships with built-in example data from a hybrid population. The data comprise environmental data from multiple locations, filtered genotype data, and flowering-related phenotypic data. The dataset is small, so beginners can easily learn how the package is used.

#Load the required packages
library("MMGS")
library("dplyr")# used for data reshape and melt

data(trait)
data(geno)
data(env_info)
data("PTT_PTR")

Step.2 Data analysis

This step summarizes the basic properties of the data and pre-processes them for the subsequent analyses.

env_trait<-env_trait_calculate(data=trait,trait="FTgdd",env="env_code")
LbyE<-LbyE_calculate(data=trait,trait="FTgdd",env="env_code",line="line_code")
LbyE_corrplot(LbyE=LbyE)
etl<-LbyE_Reshape(data=env_trait,env="env_code",LbyE=LbyE)
etl_plotter(data=etl,trait=env_trait)
Regression<-Reg(LbyE = LbyE, env_trait = env_trait)
#Reg_plotter(Reg = Regression)
result<-line_trait_mean(data=trait,trait="FTgdd",mean=env_trait,LbyE=LbyE,row=2)
MSE<-result[[1]]
ltm<-result[[2]]
#mse_plotter(MSE)
#Mean_trait_plot(Regression,MSE)
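Two summaries underpin this step: the environmental mean of the trait and a line-by-environment (L×E) table. This base-R sketch computes both from a small hypothetical long-format data frame that uses the column names from the calls above (line_code, env_code, FTgdd). It also prints the between-environment correlations that a plot such as LbyE_corrplot presumably visualizes. The package's own functions may compute or format these differently.

```r
# Hypothetical long-format data with the column names used above
set.seed(3)
trait_demo <- expand.grid(line_code = paste0("L", 1:5), env_code = c("E1", "E2", "E3"),
                          stringsAsFactors = FALSE)
trait_demo$FTgdd <- round(rnorm(nrow(trait_demo), 1500, 50), 1)

# Environmental mean of the trait (one value per environment)
env_mean <- aggregate(FTgdd ~ env_code, data = trait_demo, FUN = mean)

# Line-by-environment table: rows = lines, columns = environments
lbye <- tapply(trait_demo$FTgdd, list(trait_demo$line_code, trait_demo$env_code), mean)

# Pairwise correlations between environments across lines
round(cor(lbye), 2)
```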

Step.3 Search env index

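This step searches for the environmental parameter, and the window of days after planting (DAP), that best explains variation in the environmental mean of the trait. This provides the basis for the subsequent predictions.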

Paras <- colnames(PTT_PTR)[-c(1:4)]
#windows-search
pop_cor<-Exhaustive_search(data=env_trait, env_paras=PTT_PTR, searching_daps=122,
                           p=1, dap_x=122,dap_y=122,LOO=0,Paras=Paras)
#plot
#Exhaustive_plotter(Correlation=pop_cor,dap_x=122, dap_y=122,p=1,Paras=Paras)

#correlation
envMeanPara<-envMeanPara(data=env_trait, env_paras=PTT_PTR, maxR_dap1=18,
                         maxR_dap2=43, Paras=Paras)
#plot
#envMeanPara_plotter(data=envMeanPara,Paras=Paras)
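The idea behind the window search can be sketched in base R:
- For every candidate window [d1, d2] of days, average a daily environmental parameter within each environment.
- Correlate that average with the environmental mean of the trait.
- Keep the window with the strongest correlation.

All data here are simulated, and min_len is a hypothetical minimum window length. The exact semantics of Exhaustive_search's arguments (searching_daps, dap_x, dap_y, p, LOO) are not reproduced.

```r
# CERIS-style exhaustive window search on simulated data (all values hypothetical)
set.seed(4)
n_env <- 8; n_days <- 60
daily <- matrix(rnorm(n_env * n_days, 20, 3), n_env, n_days)  # environments x days
# Make the trait's environmental mean depend on the day 20-35 window
env_trait_mean <- 1500 + 10 * rowMeans(daily[, 20:35]) + rnorm(n_env, 0, 5)

min_len <- 5
best <- c(start = NA, end = NA, r = 0)
for (d1 in 1:(n_days - min_len + 1)) {
  for (d2 in (d1 + min_len - 1):n_days) {
    w <- rowMeans(daily[, d1:d2, drop = FALSE])  # window mean per environment
    r <- cor(w, env_trait_mean)
    if (abs(r) > abs(best["r"])) best <- c(start = d1, end = d2, r = r)
  }
}
best
```

With few environments, many windows reach high correlations by chance. This is one reason why validating the chosen window (for example, leaving environments out) matters.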

Step.4 CV

Users can choose the prediction model they need with the model argument. The environmental parameters passed to para come from the previous step (envMeanPara). The fold argument sets the number of cross-validation folds, and reshuffle sets the number of repetitions.
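The roles of fold and reshuffle can be illustrated with a generic repeated k-fold cross-validation loop. This base-R sketch uses simulated markers, with a simple ridge regression standing in for whichever model is selected. How MMGP actually partitions lines and environments into folds is not described here, so this is an illustration of the idea only.

```r
set.seed(5)
n <- 80; m <- 300
X <- matrix(rbinom(n * m, 2, 0.5), n, m)      # hypothetical markers
y <- drop(X %*% rnorm(m, 0, 0.1)) + rnorm(n)  # hypothetical phenotypes

# Ridge regression as a stand-in prediction model
ridge_pred <- function(Xtr, ytr, Xte, lambda = 100) {
  ctr <- colMeans(Xtr)
  Xc <- sweep(Xtr, 2, ctr)
  beta <- solve(crossprod(Xc) + lambda * diag(ncol(Xtr)), crossprod(Xc, ytr - mean(ytr)))
  drop(mean(ytr) + sweep(Xte, 2, ctr) %*% beta)
}

fold <- 2; reshuffle <- 5
acc <- matrix(NA_real_, nrow = reshuffle, ncol = fold)
for (r in seq_len(reshuffle)) {
  ids <- sample(rep(seq_len(fold), length.out = n))  # new random partition each repetition
  for (k in seq_len(fold)) {
    test <- ids == k
    pred <- ridge_pred(X[!test, ], y[!test], X[test, ])
    acc[r, k] <- cor(pred, y[test])
  }
}
mean(acc)  # average accuracy over all folds and repetitions
```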

# Keep phenotype records only for lines that also have genotype data
pheno<-LbyE[which(as.character(LbyE$line_code)%in%c("line_code",as.character(geno$line_code))),]
# Cross-validation
out<-MMGP(pheno=pheno, geno=geno, env=env_info, para=envMeanPara, Para_Name=Paras[1],
          depend="PEI", model="BB", kernel="linear", fold=2, reshuffle=5, methods="RM.G")
#result
#> mean(out[[3]])
#[1] 0.8728506
#> apply(out[[2]],2,mean)
#     PR12      IA14      PR11      IA13     PR14S      KS11      KS12 
#0.5418663 0.3868576 0.5381628 0.4759335 0.4871427 0.6219213 0.6380658
#> head(out[[1]])
#       obs      pre     col para
#1 1595.988 1588.782 #FF0000 PR12
#2 1512.918 1576.437 #FF0000 PR12

The reported correlation is between predicted and observed phenotypes within each environment, not between predicted and true breeding values. If you need the latter, compute it yourself for now (until the R package is updated).
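To recompute accuracies yourself, you can start from a data frame shaped like out[[1]] above, with columns obs, pre, and para. The sketch below builds a small hypothetical stand-in for that data frame and computes the correlation between predicted and observed values within each environment. Substitute your real output, or other reference values, as needed.

```r
# Hypothetical stand-in for out[[1]] with the columns printed above
set.seed(6)
res <- data.frame(obs = rnorm(40, 1550, 40),
                  para = rep(c("PR12", "IA14"), each = 20))
res$pre <- res$obs + rnorm(40, 0, 30)

# Correlation between predicted and observed phenotypes within each environment
acc_by_env <- sapply(split(res, res$para), function(d) cor(d$obs, d$pre))
round(acc_by_env, 3)
```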

Other function examples

# Leave-one-environment-out prediction with SVM
# kernel options: linear, radial, polynomial, sigmoid
for (i in envMeanPara$env_code) {
  pheno <- LbyE
  pheno[[i]] <- NA  # mask the target environment
  out <- MMPrdM(pheno = pheno, geno = geno, env = env_info, para = envMeanPara,
                Para_Name = c("PTS"), depend = "PEI",
                SVM_cost = 1, gamma = 10, kernel = "linear", fixed = TRUE,
                model = "SVM", reshuffle = 1, methods = "RM.G")
  r <- cor(out[, 2], LbyE[[i]])  # predicted vs. observed in environment i
  print(paste(i, ":", r))
}

Methods notes

The options below can be changed for some of the non-parametric algorithms.

# SVM: four kernels are available
#   linear:       u'v
#   polynomial:   (γ u'v + coef0)^degree
#   radial basis: exp(-γ |u - v|^2)
#   sigmoid:      tanh(γ u'v + coef0)
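The four kernel formulas can be written directly in base R to see how gamma, coef0, and degree enter each one. This is a hypothetical helper (kernel_value) for illustration only; it is not an MMGS or e1071 function, and its default parameter values are arbitrary.

```r
# Hypothetical helper evaluating the four SVM kernel formulas for two vectors
kernel_value <- function(u, v, type = c("linear", "polynomial", "radial", "sigmoid"),
                         gamma = 1, coef0 = 0, degree = 3) {
  type <- match.arg(type)
  uv <- sum(u * v)                       # inner product u'v
  switch(type,
    linear     = uv,
    polynomial = (gamma * uv + coef0)^degree,
    radial     = exp(-gamma * sum((u - v)^2)),
    sigmoid    = tanh(gamma * uv + coef0))
}

u <- c(1, 2, 0); v <- c(0, 1, 1)
sapply(c("linear", "polynomial", "radial", "sigmoid"),
       function(k) kernel_value(u, v, k, gamma = 0.5, coef0 = 1, degree = 2))
# linear = 2, polynomial = 4, radial = exp(-1.5), sigmoid = tanh(2)
```

According to these formulas, gamma has no effect on the linear kernel. The gamma = 10 setting in the kernel = "linear" example above therefore only matters for the other kernels.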

# GBM: default parameters applied when GBM_params is NULL (names follow LightGBM)
if (is.null(GBM_params)) {
  params <- list(boosting = "gbdt", objective = "regression", metric = "RMSE", min_data = 1L,
                 learning_rate = 0.01, num_iterations = 1000, num_leaves = 3, max_depth = -1,
                 early_stopping_round = 50L, cat_l2 = 10, skip_drop = 0.5, drop_rate = 0.5,
                 cat_smooth = 5)
}
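The core of gradient boosting for regression works as follows:
- Start from the mean.
- Repeatedly fit a small tree to the current residuals (the negative gradient of squared-error loss).
- Add it, scaled by learning_rate, for num_iterations rounds.

This base-R sketch uses two-leaf trees (stumps) on simulated markers. It is a simplified illustration of the technique, not LightGBM's implementation, which uses histogram-based, leaf-wise tree growth, and it omits options such as num_leaves = 3 and early stopping.

```r
# Hypothetical data: y depends on two of twenty markers
set.seed(8)
n <- 100; n_snp <- 20
X <- matrix(rbinom(n * n_snp, 2, 0.5), n, n_snp)
y <- X[, 1] - X[, 2] + rnorm(n, 0, 0.5)

# Weak learner: best single split (two-leaf tree) over all markers and thresholds
fit_stump <- function(X, r) {
  best <- list(sse = Inf)
  for (j in seq_len(ncol(X))) {
    for (thr in sort(unique(X[, j]))) {
      left <- X[, j] <= thr
      if (all(left)) next                      # skip splits with an empty right leaf
      mu_l <- mean(r[left]); mu_r <- mean(r[!left])
      sse <- sum((r - ifelse(left, mu_l, mu_r))^2)
      if (sse < best$sse) best <- list(sse = sse, j = j, thr = thr, left = mu_l, right = mu_r)
    }
  }
  best
}
predict_stump <- function(s, X) ifelse(X[, s$j] <= s$thr, s$left, s$right)

# L2 gradient boosting: each stump is fitted to the current residuals
boost_l2 <- function(X, y, learning_rate = 0.1, num_iterations = 50) {
  pred <- rep(mean(y), length(y))
  stumps <- vector("list", num_iterations)
  for (it in seq_len(num_iterations)) {
    resid <- y - pred                          # negative gradient of squared-error loss
    stumps[[it]] <- fit_stump(X, resid)
    pred <- pred + learning_rate * predict_stump(stumps[[it]], X)
  }
  list(init = mean(y), stumps = stumps, learning_rate = learning_rate, fitted = pred)
}

fit <- boost_l2(X, y)
c(baseline_mse = mean((y - mean(y))^2), boosted_mse = mean((y - fit$fitted)^2))
```

A smaller learning_rate, such as the 0.01 default above, takes smaller steps and usually needs more iterations. This is why it is paired with a large num_iterations and early stopping.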

Documentation

See the function help pages and the tutorial (MMGS-tutorial.pdf) in the original GitHub repository for full documentation.

Functions

env_trait_calculate
envMeanPara
envMeanPara_plotter
etl_calculate
etl_plotter
Exhaustive_plotter
Exhaustive_search
MMGS
h2_rrBLUP
LbyE_calculate
LbyE_corrplot
line_trait_mean
ltm_plotter
Mean_trait_plot
mse_plotter
prdM_plotter
Reg
Reg_plotter
Slope_Intercept

Implementation notes

MMGS is a toolkit for cross-environment genomic selection that integrates widely used genomic prediction models, both parametric and non-parametric. Users can prepare their own data in the same format as the example data and obtain results directly through the package's built-in functions. Because it saves users from searching for separate tools and adapting each one to cross-environment prediction, it requires relatively little additional statistical or programming expertise.

License

GPL-3

Reference

Li X, Guo T, Wang J, et al. An integrated framework reinstating the environmental dimension for GWAS and genomic selection in crops[J]. Molecular Plant.

Jarquín D, Crossa J, Lacaze X, et al. A reaction norm model for genomic selection using high-dimensional genomic and environmental data[J]. Theoretical and Applied Genetics, 2014, 127: 595-607.
