Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convenience function to support a PCA based Niche overview as graphical plot #87

Open
2 tasks done
Martin-Jung opened this issue Jan 24, 2024 · 9 comments
Open
2 tasks done
Labels
Quality of life Quality of life improvement

Comments

@Martin-Jung
Copy link
Collaborator

Martin-Jung commented Jan 24, 2024

The idea (by Piero) is as follows:

  • Take the environmental covariates from a BiodiversityDistribution object and do a PCA on them
  • Plot the first and second PC axis in environmental space as points (sample like 10000 from them)
  • Then overlay in a different colour all datasets, such as points or ranges to assess where data falls in.

This would allow a visual assessment of the extent to which the data falls within the whole environmental space. It should not replace the existing functionalities of partial_density() here but instead looks at all covariates.


  • Implement the function, either within the class definition or as external helper function (in plot.r)
  • Make sure it works and add a unit test
@Martin-Jung Martin-Jung added the Quality of life Quality of life improvement label Jan 24, 2024
@mhesselbarth
Copy link
Collaborator

mhesselbarth commented Jan 24, 2024

I have worked on this using the ecospat packages. Here is my approach. x and y are the point and range maps. envir is the stack of predictors.

Source:

library(ecospat)
library(ade4)
library(terra)

compare_niches <- function(x, y, envir) {
  
  # extract values for first object
  values_x <- terra::extract(x = envir, y = x, ID = FALSE)
  values_x <- values_x[complete.cases(values_x), ]
  
  # extract values for second object
  values_y <- terra::extract(x = envir, y = y, ID = FALSE)
  values_y <- values_y[complete.cases(values_y), ]
  
  # extract environmental values
  values_envir <- terra::as.data.frame(x = envir)
  
  # Calibrating the PCA in the whole study area, including both x and 
  # y ranges (same as PCAenv in Broenniman et al. 2012)
  pca_env <- ade4::dudi.pca(values_envir, center  = TRUE, scale = TRUE, scannf = FALSE, 
                            nf = 2)
  
  # predict the global envir scores on the PCA axes
  scores_bkg<- pca_env$li	
  
  # scores for the two species
  scores_x <- ade4::suprow(pca_env, values_x)$lisup
  scores_y <- ade4::suprow(pca_env, values_y)$lisup
  
  # calculation of occurence density
  suppressMessages(density_x <- ecospat::ecospat.grid.clim.dyn(glob = scores_bkg, glob1 = scores_bkg, 
                                                               sp = scores_x, R = 100))
  
  suppressMessages(density_y <- ecospat::ecospat.grid.clim.dyn(glob = scores_bkg, glob1 = scores_bkg, 
                                                               sp = scores_y, R = 100))
  
  # calculate overlap
  ecospat::ecospat.niche.overlap(density_x, density_y, cor = TRUE)
  
}

@Martin-Jung
Copy link
Collaborator Author

Sounds good. I think it would be neat if there is a wrapper function for ibis.iSDM objects. So specifically by providing a BiodiversityDistribution-class object to the function and it then extracts the covariates and any datasets (both ranges and points) to construct the plot.
If at all possible without extra dependencies, e.g. ecospat, the better (PCA can be created with base R)

@mhesselbarth
Copy link
Collaborator

Sounds all reasonable 👍

The function above is not exactly Piero's idea, but similar, and based on the first paper listed under the Sources. So in terms of a BiodiversityDistribution-class would this extract the biodiversity data and the range? Or what would should it be possible to provide two BiodiversityDistribution-class objects? Or both ?

@Martin-Jung
Copy link
Collaborator Author

Sounds all reasonable 👍

The function above is not exactly Piero's idea, but similar, and based on the first paper listed under the Sources. So in terms of a BiodiversityDistribution-class would this extract the biodiversity data and the range? Or what would should it be possible to provide two BiodiversityDistribution-class objects? Or both ?

We made a sketch in the coffee room today :D But yeah, extract all specified biodiversity data in the object. Can talk tmr or so about it.

@mhesselbarth
Copy link
Collaborator

I am on leave until next week, but happy to chat during lunch next week (or anytime)

@Martin-Jung
Copy link
Collaborator Author

Needed a distraction/procastrination from proposal writing. Now implemented in c033af2 . I did not add any of the summary overlap metrics (Schoener's D etc), which I think need a dedicated issue and function elsewhere. This plotting function simply makes point clouds and colours them.

Works for two selected variables or - if not variables are specified - simply makes a PCA on all variables in envvars.
Reference: https://iiasa.github.io/ibis.iSDM/reference/nicheplot.html

@jeffreyhanson
Copy link
Contributor

jeffreyhanson commented Nov 24, 2024

Just an idea, I wonder if you have considered alternative dimension reduction techniques to visualize a species' niche? This is because principle components analyses are unable to account for non-linear releationships/interactions between environmental variables and this often manifests as PCA plots that contain "arcs" (points appear as a "n", "v", "<", ">"
shape when plotted) that can make it difficult to interpret. To address this, one approach would be to use a detrended correspondence analysis to compute the 2d ordination scores for visualization (see vegan::decorana()). Additionally/alternatively, I wonder if you had considered using t-SNE (see https://www.datacamp.com/tutorial/introduction-t-sne)? This might be better suited for characterizing environmental variation where clustering is present?

@jeffreyhanson
Copy link
Contributor

jeffreyhanson commented Nov 24, 2024

In case it's of interest, here's a quick demo comparing the methods:

# load packages
library(tidyverse)
library(ibis.iSDM)
library(vegan)
library(terra)
library(Rtsne)

# import example environmental data
env_data <- 
  system.file("extdata/predictors/", package = "ibis.iSDM", mustWork = TRUE) %>%
  dir(pattern = "*.tif",full.names = TRUE) %>%
  # exclude categorical layer
  grep(pattern = "koppen", fixed = TRUE, invert = TRUE, value = TRUE) %>%
  terra::rast()

# run PCA and extract first two scores
pca_data <- 
  env_data %>%
  terra::as.data.frame(na.rm = TRUE) %>%
  prcomp(center = TRUE, scale. = TRUE) %>%
  `[[`("x") %>%
  as.data.frame() %>%
  select(1, 2) %>%
  setNames(c("d1", "d2")) %>%
  mutate(name = "pca")

# run DCA and extract first two scores
dca_data <-
  env_data %>%
  terra::as.data.frame(na.rm = TRUE) %>%
  apply(2, scale) %>%
  as.matrix() %>%
  # adjust for negative values
  {. + abs(min(.)) + 0.01} %>%
  vegan::decorana() %>%
  vegan::scores(display = "sites", choices = 1:2) %>%
  as.data.frame() %>%
  setNames(c("d1", "d2")) %>%
  mutate(name = "decorana")

# run t-SNE
tsne_data <-
  env_data %>%
  terra::as.data.frame(na.rm = TRUE) %>%
  as.matrix() %>%
  Rtsne::normalize_input() %>%
  Rtsne::Rtsne(dims = 2) %>%
  `[[`("Y") %>%
  as.data.frame() %>%
  select(1, 2) %>%
  setNames(c("d1", "d2")) %>%
  mutate(name = "t-SNE")

# combine data for visualization
all_data <-
  pca_data %>%
  bind_rows(dca_data) %>%
  bind_rows(tsne_data)

# plot results
ggplot(data = all_data, aes(x = d1, y = d2)) +
  geom_point(alpha = 0.1) +
  facet_wrap(~ name, scales = "free")

dim-red

@jeffreyhanson
Copy link
Contributor

jeffreyhanson commented Nov 24, 2024

Also, just in case it's hard to tell exactly what I mean by an "arc" in the PCA plot. Here's the same plot above, but with red lines to show where the "arc" is. Granted, the DCA/decorana still has a bit of an arc, but I would aruge it's not as "bad" (subjectively).

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Quality of life Quality of life improvement
Projects
None yet
Development

No branches or pull requests

3 participants