Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NaN values in extraction #21

Open
victofs opened this issue Feb 13, 2020 · 7 comments
Open

NaN values in extraction #21

victofs opened this issue Feb 13, 2020 · 7 comments
Labels
awaiting feedback Further information is requested

Comments

@victofs
Copy link

victofs commented Feb 13, 2020

I am trying to use exact_extract function where the shapefile is the one used in the Basic Usage section of the readme file

# Pull municipal boundaries for Brazil
brazil <- st_as_sf(getData('GADM', country='BRA', level=2))

I have the following RasterLayer:

my_raster   <- raster::raster(ncfname)
str(my_raster)
Formal class 'RasterLayer' [package "raster"] with 12 slots
  ..@ file    :Formal class '.RasterFile' [package "raster"] with 13 slots
  .. .. ..@ name        : chr "xxxx"| __truncated__
  .. .. ..@ datanotation: chr "FLT4S"
  .. .. ..@ byteorder   : chr "little"
  .. .. ..@ nodatavalue : num -32767
  .. .. ..@ NAchanged   : logi FALSE
  .. .. ..@ nbands      : int 6552
  .. .. ..@ bandorder   : chr "BIL"
  .. .. ..@ offset      : int 0
  .. .. ..@ toptobottom : logi FALSE
  .. .. ..@ blockrows   : int 0
  .. .. ..@ blockcols   : int 0
  .. .. ..@ driver      : chr "netcdf"
  .. .. ..@ open        : logi FALSE
  ..@ data    :Formal class '.SingleLayerData' [package "raster"] with 13 slots
  .. .. ..@ values    : logi(0) 
  .. .. ..@ offset    : num 0
  .. .. ..@ gain      : num 1
  .. .. ..@ inmemory  : logi FALSE
  .. .. ..@ fromdisk  : logi TRUE
  .. .. ..@ isfactor  : logi FALSE
  .. .. ..@ attributes: list()
  .. .. ..@ haveminmax: logi FALSE
  .. .. ..@ min       : num Inf
  .. .. ..@ max       : num -Inf
  .. .. ..@ band      : int 112
  .. .. ..@ unit      : chr "m**3 m**-3"
  .. .. ..@ names     : chr "xxxx"
  ..@ legend  :Formal class '.RasterLegend' [package "raster"] with 5 slots
  .. .. ..@ type      : chr(0) 
  .. .. ..@ values    : logi(0) 
  .. .. ..@ color     : logi(0) 
  .. .. ..@ names     : logi(0) 
  .. .. ..@ colortable: logi(0) 
  ..@ title   : chr(0) 
  ..@ extent  :Formal class 'Extent' [package "raster"] with 4 slots
  .. .. ..@ xmin: num -55
  .. .. ..@ xmax: num -45
  .. .. ..@ ymin: num -17.1
  .. .. ..@ ymax: num -12.9
  ..@ rotated : logi FALSE
  ..@ rotation:Formal class '.Rotation' [package "raster"] with 2 slots
  .. .. ..@ geotrans: num(0) 
  .. .. ..@ transfun:function ()  
  ..@ ncols   : int 101
  ..@ nrows   : int 41
  ..@ crs     :Formal class 'CRS' [package "sp"] with 1 slot
  .. .. ..@ projargs: chr "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
  ..@ history : list()
  ..@ z       :List of 1
  .. ..$ : chr "2019-01-05 16:06:28"

Then i run the exact_extract function

brazil$mean <- (brazil, my_raster, 'mean')

which returns me NaN values, as shown below

image

Am i doing something wrong importing the raster or this is a known issue in the package?

@dbaston
Copy link
Member

dbaston commented Feb 13, 2020

Are all of the results NaN, or just some of them? If a polygon doesn't intersect any defined cell in your raster, the mean will be 0/0 = NaN.

@dbaston dbaston added the awaiting feedback Further information is requested label Feb 19, 2020
@dbaston dbaston closed this as completed Apr 12, 2020
@aldotapia
Copy link

I can provide some feedback (for this closed and old issue) since I experimented the same today with exactextractr_0.9.0 (same with exactextractr_0.7.2 and exactextractr_0.8.2) installed from conda (r-exactextract). Trying to extract some values from a GLDAS product, I got the following values:

> v <- read_sf('./some/shape.shp')
> r <- rast('./some/raster.tif')
> exact_extract(x = r,
               y = v[8,],
               include_cols = "gauge_id",
               progress = F)
[[1]]
  gauge_id value coverage_fraction
1  1041002     0        0.13134970
2  1041002   NaN        0.45708704
3  1041002   NaN        0.11274476
4  1041002   NaN        0.03988117
5  1041002   NaN        0.13633704
6  1041002   NaN        0.02293241

While in other R installation (outside conda environment with exactextractr_0.9.0) for the same task I got:

[[1]]
  gauge_id   value coverage_fraction
1  1041002 0.08750        0.13134970
2  1041002 0.13000        0.45708704
3  1041002 0.03750        0.11274476
4  1041002 0.04250        0.03988117
5  1041002 0.07000        0.13633704
6  1041002 0.00875        0.02293241

Session info where the task fails:

R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Big Sur/Monterey 10.16

Matrix products: default
BLAS/LAPACK: /Users/aldotapia/miniforge3/envs/hidrocl/lib/libopenblasp-r0.3.20.dylib

locale:
[1] C/UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] terra_1.5-21        sf_1.0-7            exactextractr_0.9.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9         raster_3.5-21      magrittr_2.0.3     units_0.8-0       
 [5] lattice_0.20-45    rlang_1.0.4        fansi_1.0.3        tools_4.1.3       
 [9] grid_4.1.3         KernSmooth_2.23-20 utf8_1.2.2         cli_3.3.0         
[13] e1071_1.7-11       DBI_1.1.3          class_7.3-20       tibble_3.1.8      
[17] lifecycle_1.0.1    vctrs_0.4.1        codetools_0.2-18   glue_1.6.2        
[21] sp_1.5-0           proxy_0.4-27       compiler_4.1.3     pillar_1.8.1      
[25] classInt_0.4-7     pkgconfig_2.0.3   

Session info where the task runs ok:

R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.5

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] exactextractr_0.9.0 terra_1.5-42        sf_1.0-7           

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9         rstudioapi_0.13    raster_3.5-21      magrittr_2.0.3     units_0.8-0       
 [6] tidyselect_1.1.2   lattice_0.20-45    R6_2.5.1           rlang_1.0.4        fansi_1.0.3       
[11] dplyr_1.0.9        tools_4.2.0        grid_4.2.0         utf8_1.2.2         KernSmooth_2.23-20
[16] cli_3.3.0          e1071_1.7-11       DBI_1.1.3          ellipsis_0.3.2     class_7.3-20      
[21] assertthat_0.2.1   tibble_3.1.7       lifecycle_1.0.1    crayon_1.5.1       purrr_0.3.4       
[26] vctrs_0.4.1        codetools_0.2-18   glue_1.6.2         sp_1.5-0           proxy_0.4-27      
[31] compiler_4.2.0     pillar_1.7.0       generics_0.1.3     classInt_0.4-7     pkgconfig_2.0.3

I tried to update R to 4.2.0 or 4.2.1 in my conda env without success, so far I'm using a scale factor as workaround and works alright for my needs:

exact_extract(x = r*100,
               y = v[8,],
               include_cols = "gauge_id",
               progress = F)
[[1]] 
  gauge_id  value coverage_fraction
1  1041002  8.750        0.13134970
2  1041002 13.000        0.45708704
3  1041002  3.750        0.11274476
4  1041002  4.250        0.03988117
5  1041002  7.000        0.13633704
6  1041002  0.875        0.02293241

@aldotapia
Copy link

A bit more on context:

The scale factor inside the R function wasn't enough. I'm processing files in Python and exporting them to R for using the exact_extract computing weighted stats and the percent of valid pixels in the analysis.

So I applied a scale factor while doing the process in Python and saved the file in uint8 data type, everything works great so far. Now the output takes all the valid pixels when using floating point rasters some zones took 0 pixels (NaN output) or 30 - 50%, which is wrong knowing the data.

Any hints or ideas of this behavior?

@dbaston dbaston reopened this Aug 29, 2022
@dbaston
Copy link
Member

dbaston commented Aug 29, 2022

Thanks for documenting this. Is my hunch right that any of the following modifications (applied by itself) would "fix" this?

  • r <- raster('./some/raster.tif')
  • x = r*1
  • y = v[8:9, ]

@aldotapia
Copy link

Yes, you're right. I suppose it's related with raster's statistics, r*1 made the trick.

I tested it with 6 dates and +400 polygons, everything looks fine so far (with r and with r*1):
output

I need to run it on +7000 dates, so if I face another issue related with this function, I'm gonna provide more feedback.

Thank you!

@dbaston
Copy link
Member

dbaston commented Aug 29, 2022

r*1 is just a trick to force terra to load everything into memory. It appears that either there is a problem with the way I'm requesting cell values from an out-of-memory raster with terra, or there's a problem within terra itself. Having a reproducible case would help figure out a solution rather than a workaround.

@aldotapia
Copy link

Sure, you can access to data in this drive folder

I also saved the result and sessionInfo in workspace.RData. Also, this example is the extraction of one of the +400 vectors used. Some of them have the same issue.

The script used is:

setwd('test_path')

v <- 'HidroCL_boundaries.shp'
r <- 'snow_gldas_A20000102.tif'

result <- exactextractr::exact_extract(x = terra::rast(r),
                                       y = sf::read_sf(v)[8,],
                                       include_cols = "gauge_id",
                                       progress = F)

 session_info <- sessionInfo()
 save.image('~/hidrocl_test/workspace.RData')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting feedback Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants