---
author: "Andrew Gillreath-Brown"
date: "`r format(Sys.time(), '%d %B %Y')`"
title: "paleoMAT: A Low-Frequency Summer Temperature Reconstruction for the United States Southwest, 3000 BC – AD 2000"
output:
  html_document:
    toc: yes
    toc_depth: 2
    toc_float: yes
    number_sections: no
    code_folding: hide
    df_print: paged
    keep_md: yes
mainfont: Calibri
editor_options:
  chunk_output_type: console
---
```{r setup, include = FALSE, results = 'hide'}
knitr::opts_chunk$set(echo = TRUE)
# If you have cloned the repository, then you can also use devtools::install() to install the package.
# devtools::install()
# devtools::load_all()
library(paleomat)
```
# Introduction
Pollen data can be used to reconstruct past temperatures. This type of modeling, however, is sensitive to many factors, including paleoecological processes, chronological uncertainty, and topographic effects on plant communities and species. Nevertheless, improvements in these techniques and the increasing breadth of available paleoclimatic proxies have furthered our understanding of the effects of climate-driven variability on past societies.
This program allows you to reconstruct the climate for multiple locations across the southwestern United States. In the code below, you can download fossil and modern pollen data from the [Neotoma Paleoecology Database](https://www.neotomadb.org/), then compile the taxa following Williams and Shuman (2008), so that each record/row carries taxon counts and associated metadata. Some data in Neotoma overlap with the North American Modern Pollen Database used by Whitmore et al. (2005), which can be obtained from one of two sources: [the Laboratory for Paleoclimatology and Climatology](http://www.lpc.uottawa.ca/data/modern/) at the University of Ottawa and [the Williams Paleoecology Lab](http://www.geography.wisc.edu/faculty/williams/lab/Downloads.html) at the University of Wisconsin. However, data from the North American Pollen Database are continually being uploaded to Neotoma, and in some cases corrections are being made to the data as well. Here, we use the neotoma package, version 1.7.4.
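For orientation, here is a minimal sketch of the underlying neotoma (v1.7.4) workflow that the `paleomat` download functions wrap; the exact calls are assumptions, not the vignette's own code, and the full download is slow:
```{r neotoma-sketch, eval = FALSE}
# Assumed sketch: find pollen datasets for a geopolitical unit, download them,
# and compile taxa to the Williams and Shuman (2008) "WS64" list.
library(neotoma)
datasets  <- neotoma::get_dataset(datasettype = "pollen", gpid = "United States")
downloads <- neotoma::get_download(datasets)  # slow for a whole country
compiled  <- neotoma::compile_taxa(downloads, list.name = "WS64")
```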
**Note**: While running the code below, if you have trouble with `paleomat` datasets (e.g., `paleomat::coreTops`), you can load a dataset from file instead: use `here::here` with `data` as the root folder and the variable name plus `.rda`, such as `load(file = here::here("data/coreTops.rda"))`. Additionally, we use `if` statements in a number of places throughout this R Markdown document to save time, as some items can take a while to run. However, if you have cloned the repository, then you can run the interior contents of those `if` statements to overwrite the variables.
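As a minimal sketch of that caching pattern (assuming a cloned repository, so that `here::here()` resolves paths from the project root; `coreTops` stands in for any `paleomat` dataset):
```{r caching-pattern, eval = FALSE}
# Rebuild an object only when its cached file is missing, then load the cache.
if (!file.exists(here::here("data/coreTops.rda"))) {
  coreTops <- paleomat::coreTops  # stand-in for the expensive computation
  save(coreTops, file = here::here("data/coreTops.rda"))
}
load(file = here::here("data/coreTops.rda"))
```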
# Download Fossil and Modern Pollen Data
```{r function_model, cache = TRUE, message = FALSE, warning = FALSE}
# Use gpids to get the United States, Canada, and Mexico in North America. Then, get the datasets for the pollen data from each of the gpids. Here, to save time, we read in the already downloaded data from Neotoma, as the download takes a while to run for the three countries. Data downloaded on October 23, 2021.
# Retrieve the GeoPolitical Units table, which has country, state, and county level names with associated IDs. We skip this step if the fossil pollen and modern pollen datasets are present.
if (!file.exists(here::here("data/MP_metadata_counts.rda")) | !file.exists(here::here("data/NAfossil_metadata_counts.rda"))) {
gpids <-
neotoma::get_table(table.name = 'GeoPoliticalUnits')
NAID <-
gpids %>%
dplyr::filter(
GeoPoliticalName %in% c('United States', 'Canada', 'Mexico'),
GeoPoliticalUnit == 'country'
) %$%
GeoPoliticalID
}
# Get the modern pollen dataset.
if (!file.exists(here::here("data/MP_metadata_counts.rda"))) {
MP_metadata_counts <-
paleomat::get_modern_pollen(gpid = NAID)
save(MP_metadata_counts,
file = here::here("data/MP_metadata_counts.rda"))
}
MP_metadata_counts <- paleomat::MP_metadata_counts
# Get the fossil pollen dataset.
if (!file.exists(here::here("data/NAfossil_metadata_counts.rda"))) {
NAfossil_metadata_counts <-
paleomat::get_fossil_pollen(gpid = NAID)
save(NAfossil_metadata_counts,
file = here::here("data/NAfossil_metadata_counts.rda"))
}
NAfossil_metadata_counts <- paleomat::NAfossil_metadata_counts
```
The Neotoma Modern Pollen Database contains `r nrow(MP_metadata_counts)` samples for the United States, Canada, and Mexico, representing `r ncol(MP_metadata_counts %>% dplyr::select(ABIES:tail(names(.), 1)))` different pollen taxa.
# Data Cleaning for Fossil and Modern Pollen and Climate Data
## Load "Core Tops" Data from the Fossil Pollen Dataset
```{r load_coreTops, message = FALSE, warning = FALSE, results = 'hide'}
# Get core tops from the fossil pollen dataset and add them to the modern pollen dataset. Data were originally downloaded using Tilia with a function created by Eric Grimm. The SSL certificate has now expired, but the core-top sample numbers, dataset IDs, and site IDs can be loaded using paleomat::coreTops.
if (!file.exists(here::here("vignettes/data/pollen_cleaned/CT_metadata_counts.rds"))) {
CT_metadata_counts <-
dplyr::left_join(paleomat::coreTops,
NAfossil_metadata_counts,
by = c("dataset.id", "site.id", "sample.id")) %>%
dplyr::mutate(type = "core top") %>%
dplyr::arrange(dataset.id) %T>%
readr::write_rds(here::here("vignettes/data/pollen_cleaned/CT_metadata_counts.rds"))
}
CT_metadata_counts <-
readr::read_rds(here::here("vignettes/data/pollen_cleaned/CT_metadata_counts.rds"))
```
## Combine Core Top and Modern Pollen Data for Continental United States
```{r Combine_CT_MP, message = FALSE, warning = FALSE, results = 'hide'}
# Combine the modern and core top data, then keep only sites that fall within the continental United States.
# If you do not want to run this code chunk to create MPCT_metadata_counts_final, then you can uncomment and run the following line:
# MPCT_metadata_counts_final <-
# readr::read_rds(here::here("vignettes/data/pollen_cleaned/MPCT_metadata_counts_final.rds"))
# Combine the two dataframes (i.e., the modern data and the core top data), then sort the rows by dataset.id.
MPCT_metadata_counts <-
dplyr::bind_rows(MP_metadata_counts, CT_metadata_counts) %>%
dplyr::arrange(dataset.id) %>%
dplyr::mutate(across(-c(dataset.id:pub_year), ~ tidyr::replace_na(.x, 0))) %>%
dplyr::select(dataset.id:pub_year,
sort(tidyselect::peek_vars())) %>%
dplyr::filter(!is.na(long)) %>%
sf::st_as_sf(coords = c("long", "lat"), crs = 4326)
# Keep only modern sites in the continental United States.
# Transform the states (spatial polygons data frame) to the Coordinate Reference System (CRS) of the PRISM data.
states <-
sp::spTransform(
paleomat::states,
sp::CRS(
"+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"
)
)
# Filter out non-continental states and territories.
continental.states <-
subset(
states,
NAME != "Alaska" &
NAME != "Hawaii" &
NAME != "Puerto Rico" & NAME != "U.S. Virgin Islands"
)
# Convert sp to sf.
continental.states <- sf::st_as_sf(continental.states, crs = 4326)
# Update projection to CRS 4326, which is WGS 84.
continental.states <- sf::st_transform(continental.states, 4326)
# Need to make the geometry valid for the states shapefile.
continental.states <- sf::st_make_valid(continental.states)
continental.states_union <- sf::st_union(continental.states)
# Do an intersection and keep only the sites that are within the continental United States.
buffer <- sf::st_buffer(continental.states_union, 0)
MPCT_metadata_counts_final <-
suppressWarnings(sf::st_intersection(MPCT_metadata_counts, buffer))
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(
# MPCT_metadata_counts_final,
# here::here("vignettes/data/pollen_cleaned/MPCT_metadata_counts_final.rds"))
#------End for MPCT_metadata_counts_final--------#
```
## Remove Core Top Data from the Fossil Pollen Dataset
```{r removeCT, message = FALSE, warning = FALSE, results = 'hide'}
# Remove the core tops from the fossil pollen dataset, and then limit fossil pollen sites to just the southwestern United States.
# If you do not want to run the code for creating fossil_metadata_counts_final2, then you can uncomment and run the following line:
# fossil_metadata_counts_final2 <- readr::read_rds(here::here("vignettes/data/pollen_cleaned/fossil_metadata_counts_final2.rds"))
# Remove the coretop samples from the fossil pollen dataset.
fossil_metadata_counts <-
dplyr::anti_join(NAfossil_metadata_counts,
paleomat::coreTops,
by = c("dataset.id", "site.id", "sample.id")) %>%
sf::st_as_sf(coords = c("long", "lat"), crs = 4326)
# Limit fossil pollen sites to just the southwestern United States.
# First, create a bounding box for the Upper United States Southwest (after Bocinsky and Kohler 2014 - https://doi.org/10.1038/ncomms6618).
bb <-
c(
"xmin" = -113,
"xmax" = -105,
"ymin" = 31.334,
"ymax" = 38.09
) %>%
sf::st_bbox() %>%
sf::st_as_sfc() %>%
sf::st_as_sf(crs = 4326) %>%
sf::st_transform(crs = 4326)
# Do an intersection and keep only the sites that are within the southwestern United States.
fossil_metadata_counts_final2 <-
suppressWarnings(sf::st_intersection(fossil_metadata_counts, bb))
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(
# fossil_metadata_counts_final2,
# here::here("vignettes/data/pollen_cleaned/fossil_metadata_counts_final2.rds"))
#------End for fossil_metadata_counts_final2--------#
```
## Add Bacon Age-depth Models to Fossil Pollen Dataset
```{r age_models, message = FALSE, warning = FALSE, results = 'hide'}
# If you do not want to run the code for creating fossil_metadata_counts_final3, then you can uncomment and run the following line:
# fossil_metadata_counts_final3 <- readr::read_rds(here::here("vignettes/data/pollen_cleaned/fossil_metadata_counts_final3.rds"))
# Keep only sites that have new age models. Most sites that are removed from fossil_metadata_counts_final2 did not have ages, and were primarily archaeological sites with one sample.
fossil_metadata_counts_final3 <- fossil_metadata_counts_final2 %>%
dplyr::filter(dataset.id %in% paleomat::bacon_age_models$dataset.id) %>%
dplyr::left_join(., paleomat::bacon_age_models, by = c("dataset.id", "site.id", "depth")) %>%
dplyr::relocate(min, max, median, mean, .after = age) %>%
dplyr::select(-site.name.y) %>%
dplyr::rename(site.name = site.name.x)
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(
# fossil_metadata_counts_final3,
# here::here("vignettes/data/pollen_cleaned/fossil_metadata_counts_final3.rds"))
```
## Apply Quality Control Filters to the Fossil Pollen Dataset
```{r clean_fossil_dataset, message = FALSE, warning = FALSE, results = 'hide'}
# If you do not want to run the code for creating fossil_metadata_counts_final4, then you can uncomment and run the following line:
# fossil_metadata_counts_final4 <- readr::read_rds(here::here("vignettes/data/pollen_cleaned/fossil_metadata_counts_final4.rds"))
# Apply quality control filters to the fossil pollen dataset.
fossil_metadata_counts_final4 <- fossil_metadata_counts_final3 %>%
# Calculate the total pollen grains identified in each sample.
dplyr::mutate(total_grains = rowSums(dplyr::select(sf::st_drop_geometry(.), ABIES:XANTHIUM))) %>%
# A sample must have a minimum of 200 counted pollen grains to be included in the dataset.
dplyr::filter(total_grains >= 200) %>%
# Convert to proportions.
dplyr::mutate(dplyr::across(ABIES:XANTHIUM, ~.x / total_grains)) %>%
# Replace observations with < 0.5% abundance with 0.
dplyr::mutate(dplyr::across(ABIES:XANTHIUM, ~dplyr::if_else(.x < .005, 0, .x)))
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(
# fossil_metadata_counts_final4,
# here::here("vignettes/data/pollen_cleaned/fossil_metadata_counts_final4.rds"))
# Remove taxa that do not occur in more than 20 samples. Retrieve column names to remove.
rm_taxa <- fossil_metadata_counts_final4 %>%
sf::st_drop_geometry() %>%
dplyr::select(ABIES:XANTHIUM) %>%
dplyr::select_if(~!(is.numeric(.) && (sum(.x > 0) > 20))) %>%
colnames()
fossil_metadata_counts_final5 <- fossil_metadata_counts_final4 %>%
# Remove taxa columns that did not have at least 20 samples.
dplyr::select(-c(dplyr::all_of(rm_taxa))) %>%
# Select sites that have 5 or more unique taxa.
dplyr::mutate(taxa_count = rowSums(!dplyr::near(dplyr::select(sf::st_drop_geometry(.), ABIES:THALICTRUM), 0))) %>%
dplyr::filter(taxa_count >= 5) %>%
dplyr::select(-taxa_count, -total_grains)
# Limit samples to the last 6,000 years (maximum age <= 6000 BP).
fossil_west_post5500 <- fossil_metadata_counts_final5 %>%
dplyr::filter(max <= 6000)
# If you want to re-save fossil_west_post5500.
#save(fossil_west_post5500, file = here::here("data/fossil_west_post5500.rda"))
#------End for fossil_west_post5500--------#
# Cleaning up the fossil dataset.
# If you do not want to run the code for creating fossil_final, then you can uncomment and run the following line:
# fossil_final <-
# readr::read_rds(here::here("vignettes/data/pollen_cleaned/fossil_final.rds"))
fossil_final <-
fossil_west_post5500 %>%
# Remove any rows that do not have age data.
dplyr::filter(!is.na(age)) %>%
dplyr::select(!c(site.id, site.name:pub_year)) %>%
tibble::as_tibble() %>%
# Nest the pollen data into a nested dataframe.
tidyr::nest(pollen.counts = -c(dataset.id, geometry)) %>%
sf::st_as_sf() %>%
dplyr::select(dataset.id, pollen = pollen.counts) %>%
# Add column that shows the total number of samples for each core.
dplyr::mutate(n_samples = purrr::map_dbl(pollen, nrow))
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(fossil_final,
# here::here("vignettes/data/pollen_cleaned/fossil_final.rds"))
#------End for fossil_final--------#
```
## Load PRISM Climate Data
The first step is to get the locations of the modern pollen samples. Next, we use a PRISM climate-extraction script adapted from [Bocinsky et al. (2016)](https://github.com/bocinsky/Bocinsky_et_al_2016/blob/master/R/Bocinsky_ET_AL_2016_PRISM_EXTRACTOR.R).
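For reference, here is a minimal sketch of the general point-extraction approach, not the vignette's `paleomat::extract_prism_normals()` itself; the raster file name and the single site are hypothetical:
```{r prism-sketch, eval = FALSE}
library(terra)
library(sf)
# Hypothetical stack of 12 monthly PRISM mean-temperature normals.
normals <- terra::rast("tavg_normals.tif")
# One example site; the vignette uses the modern pollen sample locations.
sites <- sf::st_as_sf(data.frame(sample.id = 1, long = -108.5, lat = 35.1),
                      coords = c("long", "lat"), crs = 4326)
# Returns one row per site and one column per monthly layer.
terra::extract(normals, terra::vect(sites))
```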
```{r extract_prism_normals, warning = FALSE}
# Extract PRISM climate data for modern pollen sites and limit to the western United States.
# If you do not want to run the code for creating MPCT_climate_final, then you can uncomment and run the following line:
# MPCT_climate_final <-
# readr::read_rds(here::here("vignettes/data/pollen_cleaned/MPCT_climate_final.rds"))
MPCT_climate <-
MPCT_metadata_counts_final %>%
dplyr::select(sample.id) %>%
paleomat::extract_prism_normals() %>%
sf::st_drop_geometry() %>%
dplyr::left_join(MPCT_metadata_counts_final, by = "sample.id") %>%
dplyr::select(sample.id,
elev,
prism.normals,
geometry,
ABIES:XANTHIUM) %>%
sf::st_as_sf()
# Do an intersection and keep only the sites that are within the western continental United States.
MPCT_climate_final <-
suppressWarnings(sf::st_intersection(MPCT_climate, paleomat::west_in_states)) %>%
dplyr::select(-FID)
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(
# MPCT_climate_final,
# here::here("vignettes/data/pollen_cleaned/MPCT_climate_final.rds")
# )
```
## Apply Quality Control Filters to the Modern Pollen Dataset
```{r modern_quality_control, warning = FALSE}
MPCT_climate_final2 <- MPCT_climate_final %>%
dplyr::mutate(total_grains = rowSums(dplyr::select(sf::st_drop_geometry(.), ABIES:XANTHIUM))) %>%
# A sample must have a minimum of 200 counted pollen grains to be included in the dataset.
dplyr::filter(total_grains >= 200) %>%
dplyr::mutate(dplyr::across(ABIES:XANTHIUM, ~.x / total_grains)) %>%
# Replace observations with < 0.5% abundance with 0.
dplyr::mutate(dplyr::across(ABIES:XANTHIUM, ~dplyr::if_else(.x < .005, 0, .x)))
# Remove taxa that do not occur in more than 20 samples. Retrieve column names to remove.
rm_taxa <- MPCT_climate_final2 %>%
sf::st_drop_geometry() %>%
dplyr::select(ABIES:XANTHIUM) %>%
dplyr::select_if(~!(is.numeric(.) && (sum(.x > 0) > 20))) %>%
colnames()
MP_coords <- dplyr::bind_rows(MP_metadata_counts, CT_metadata_counts) %>%
dplyr::select(sample.id, long, lat)
MPCT_climate_final3 <- MPCT_climate_final2 %>%
dplyr::select(-c(dplyr::all_of(rm_taxa))) %>%
# Select sites that have 5 or more unique taxa.
dplyr::mutate(taxa_count = rowSums(!dplyr::near(dplyr::select(sf::st_drop_geometry(.), ABIES:ULMUS), 0))) %>%
dplyr::filter(taxa_count >= 5) %>%
dplyr::select(-taxa_count, -total_grains) %>%
sf::st_drop_geometry() %>%
dplyr::left_join(., MP_coords, by = "sample.id") %>%
tidyr::nest(pollen.counts = -c(sample.id, elev, prism.normals, long, lat)) %>%
dplyr::select(sample.id,
elev,
climate = prism.normals,
pollen = pollen.counts,
long,
lat) %>%
sf::st_as_sf(coords = c("long", "lat"), crs = 4326)
# Check that the elevation recorded in Neotoma is not too different from the elevation of the location obtained with the elevatr package.
df_elev_epqs <- elevatr::get_elev_point(MPCT_climate_final3 %>% dplyr::select(-c(climate, pollen)), prj = 4326, src = "aws") %>%
# Calculate the absolute difference between the elevations.
dplyr::mutate(diff = abs(elevation - elev))
# Identify samples whose recorded elevation differs from the DEM by more than 1,000 m; these are removed below.
mismatch <- df_elev_epqs %>%
dplyr::filter(diff > 1000) %>%
dplyr::pull(sample.id)
MPCT_climate_final3 <- MPCT_climate_final3 %>%
dplyr::filter(!sample.id %in% mismatch) %>%
dplyr::select(-elev)
```
## Finalize Modern and Fossil Pollen Datasets
```{r finalize-datasets, cache = TRUE, results = 'asis', warning = FALSE}
# Modern Dataset
# Clean up modern pollen and climate dataframe to prepare for transfer function and reconstruction diagnostics. We have already changed the data to proportions, so just select the climate variables of interest.
# If you do not want to run the code for creating MPCT_proportions, then you can uncomment and run the following line:
# MPCT_proportions <- paleomat::MPCT_proportions
MPCT_proportions <-
MPCT_climate_final3 %>%
dplyr::mutate(
climate =
purrr::map(climate,
function(x) {
x %>%
dplyr::select(month, tavg)
})
)
# If you need to write the data to file, then you can uncomment and run the following.
# save(MPCT_proportions, file = here::here("data/MPCT_proportions.rda"))
# Fossil Dataset
# If you do not want to run the code for creating fossil_pollen, then you can uncomment and run the following line:
# fossil_pollen <- paleomat::fossil_pollen
# Prepare fossil pollen dataset for transfer function and reconstruction diagnostics.
fossil_final$pollen <-
purrr::map(fossil_final$pollen, function(x)
tibble::column_to_rownames(x, "sample.id"))
fossil_pollen <- fossil_final
# If you need to write the data to file, then you can uncomment and run the following.
# save(fossil_pollen, file = here::here("data/fossil_pollen.rda"))
# Keep only taxa that are the same between the modern and fossil pollen dataset.
keep_cols <- intersect(colnames(dplyr::bind_rows(fossil_pollen$pollen) %>% dplyr::select(-Other)), colnames(dplyr::bind_rows(MPCT_proportions$pollen)))
fossil_pollen <- fossil_pollen %>%
dplyr::mutate(pollen =
purrr::map(pollen,
function(x) {
x %>%
dplyr::select(dplyr::all_of(keep_cols))
}))
MPCT_proportions <- MPCT_proportions %>%
dplyr::mutate(pollen =
purrr::map(pollen,
function(x) {
x %>%
dplyr::select(dplyr::all_of(keep_cols))
}))
# Put the modern pollen data into a dataframe and make the sample.id as the row names.
spp <- dplyr::bind_rows(MPCT_proportions$pollen) %>%
dplyr::bind_cols(sample.id = MPCT_proportions$sample.id, .) %>%
tibble::column_to_rownames("sample.id")
# Put the modern climate data into a dataframe and make the sample.id as the row names.
env <- MPCT_proportions %>%
dplyr::select(sample.id, climate) %>%
tidyr::unnest(climate) %>%
# Pull just the climate (tavg) values as a vector.
dplyr::pull(tavg)
# Turn climate value (tavg) vector into a named vector with the sample ids.
names(env) <- c(MPCT_proportions$sample.id)
```
# Transfer Function and Reconstruction Diagnostics and Reconstruction
Here, we use several transfer function and reconstruction diagnostics, such as squared residual length, the proportion of variance in the fossil data explained by an environmental reconstruction, and the no-analog threshold.
## Squared-residual Length
**Step 1**: We use squared residual length, returning the residual length for each fossil sample along with the 0.95 quantile of the modern training set.
```{r squared_residual_length, cache = TRUE, results = 'asis', warning = FALSE}
# If you do not want to run the code for creating rlen_results, then you can uncomment and run the following line:
# rlen_results <- readr::read_rds(here::here("vignettes/data/diagnostics_reconstruction/rlen_results.rds"))
rlen_results <-
purrr::map(
fossil_pollen$pollen,
.f = function(z) {
rlen <- analogue::residLen(spp, env, z, method = "cca")
list(res_lengths = stack(rlen$passive),
quantile_95 = as.numeric(quantile(rlen$train, probs = 0.95)))
}
)
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(
# rlen_results,
# here::here("vignettes/data/diagnostics_reconstruction/rlen_results.rds"))
# Extract the 0.95 quantile baseline to be used for sample removal.
rlen_ninety5quant <- rlen_results[[1]]$quantile_95
# Put the squared residual length and sample.id into one dataframe.
rlen_results <- purrr::map_dfr(rlen_results, `[[`, 1) %>%
dplyr::rename("Squared_res_lengths" = 1, "sample.id" = 2)
# Extract the sample.ids for samples that will need to be removed since they did not meet the 0.95 threshold.
rlen_removal <-
as.numeric(levels(rlen_results$sample.id))[rlen_results$Squared_res_lengths > rlen_ninety5quant]
# Results from rlen with sample.id for last 6000 years.
rlen_full_results1 <- rlen_results %>%
dplyr::mutate(sample.id = as.integer(levels(sample.id))) %>%
dplyr::filter(sample.id %in% fossil_west_post5500$sample.id)
# If you do not want to run the code for creating rlen_full_results2, then you can uncomment and run the following line:
# rlen_full_results2 <- readr::read_rds(here::here("vignettes/data/diagnostics_reconstruction/rlen_full_results2.rds"))
# Put the rlen results into a list structure.
rlen_full_results2 <-
structure(list(res_lengths = rlen_full_results1,
quantile_95 = rlen_ninety5quant),
class = "list")
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(
# rlen_full_results2,
# here::here("vignettes/data/diagnostics_reconstruction/rlen_full_results2.rds"))
# Create a copy of the fossil pollen data so we can remove samples that did not meet the 0.95 threshold from the fossil pollen dataset.
fos <- fossil_pollen
fos$pollen <-
purrr::map(
fos$pollen,
~ .x %>% tibble::rownames_to_column("sample.id") %>% dplyr::filter(!sample.id %in% rlen_removal) %>% tibble::column_to_rownames("sample.id")
)
# Clean up fos to get the number of samples within each site; any site left with no samples is removed.
# If you do not want to run the code for creating fos2, then you can uncomment and run the following line:
# fos2 <- readr::read_rds(here::here("vignettes/data/diagnostics_reconstruction/fos2.rds"))
# Remove any sites that have 0 samples from the fossil pollen dataset.
fos2 <- fos %>%
dplyr::select(-n_samples) %>%
dplyr::mutate(rows = purrr::map_dbl(pollen, ~ nrow(.x))) %>%
dplyr::filter(!rows %in% c(0))
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(fos2,
# here::here("vignettes/data/diagnostics_reconstruction/fos2.rds"))
```
## randomTF Testing
You can explore the fossil pollen sites with the `palaeoSig::randomTF` function to see how much of the variance in each record can be explained by mean July temperature (tJuly). We do not use this step to remove any sites.
```{r randomTF, cache = TRUE, results = 'asis', warning = FALSE}
# # This chunk takes approximately 40 minutes to run (depending on your system). If you would like to re-run it, then you can uncomment and run the interior contents of the if statement.
# if (!file.exists(here::here(
# "vignettes/data/diagnostics_reconstruction/sig_results2_final.rds"
# ))) {
# sig_results2_full <-
# purrr::map2(fos2$pollen, fos2$dataset.id, function(x, y) {
# output <- palaeoSig::randomTF(
# spp = spp,
# env = data.frame(tjul = env),
# fos = x,
# n = 10000,
# fun = MAT,
# col = 1
# )
# prop <- output$EX
# names(prop) <- y
# sig_out <- output$sig
# names(sig_out) <- y
# return(list(prop, sig_out))
# })
#
# # Extract the proportion explained variance and the dataset ID.
# sig_results2_proportions <- sapply(sig_results2_full, "[[", 1) %>%
# stack(.) %>%
# dplyr::rename(dataset.id = ind, proportion = values)
#
# # Extract the p-values and the dataset ID.
# sig_results2_pvalues <- sapply(sig_results2_full, "[[", 2) %>%
# stack(.) %>%
# dplyr::rename(dataset.id = ind, pvalue = values)
#
# # Put the proportion of variance and p-value into the same dataframe.
# sig_results2_final <-
# dplyr::left_join(sig_results2_proportions, sig_results2_pvalues, by = "dataset.id") %>%
# dplyr::select(dataset.id, proportion, pvalue) %>%
# dplyr::mutate(dataset.id = as.numeric(levels(dataset.id)))
#
# readr::write_rds(
# sig_results2_final,
# here::here(
# "vignettes/data/diagnostics_reconstruction/sig_results2_final.rds"
# )
# )
# }
# sig_results2_final <-
# readr::read_rds(here::here("vignettes/data/diagnostics_reconstruction/sig_results2_final.rds"))
#
# # Pull out the dataset IDs that have p-values less than 0.10.
# only_90b <- sig_results2_final %>%
# dplyr::filter(pvalue <= 0.10) %>%
# dplyr::pull(dataset.id)
#
# # Finally, subset the fossil pollen dataset by keeping only the dataset IDs that passed randomTF.
# fos3_90 <- fos %>%
# dplyr::filter(dataset.id %in% only_90b)
#
```
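Since the chunk above is left commented out for speed, the sketch below shows the same `randomTF` call at a reduced `n` for a single site; it assumes `MAT` comes from the rioja package, as in `palaeoSig` examples, and is not evaluated here:
```{r randomTF-sketch, eval = FALSE}
library(palaeoSig)
library(rioja)  # supplies MAT for the fun argument
tf_one <- palaeoSig::randomTF(
  spp = spp,                      # modern pollen proportions
  env = data.frame(tjul = env),   # modern temperature values built above
  fos = fos2$pollen[[1]],         # a single fossil site
  n   = 99,                       # reduced from 10000 for speed
  fun = MAT,
  col = 1
)
tf_one$EX   # proportion of variance in the fossil data explained
tf_one$sig  # p-value relative to reconstructions of random variables
```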
## MAT Transfer Function and Prediction
### Spatial Autocorrelation
```{r spatial_autocorrelation, cache = TRUE, results = 'asis', warning = FALSE}
# To reduce the effects of spatial autocorrelation, we first compute the geographic (Euclidean) distance between fossil and modern pollen sites, so that we can exclude modern samples within 300 km of each target site.
# Because randomTF is not used to remove any sites, carry fos2 forward as fos3_90.
fos3_90 <- fos2
# Get list of modern pollen samples and geometry.
modern_samples <- MPCT_proportions %>%
dplyr::select(sample.id, geometry)
# Get list of fossil samples and geometry.
fossil_samples <- fos3_90 %>%
dplyr::mutate(pollen = purrr::map(
pollen,
~ .x %>% tibble::rownames_to_column("sample.id") %>% dplyr::select(sample.id)
)) %>%
tidyr::unnest(cols = "pollen") %>%
dplyr::select(sample.id, geometry)
# Calculate distances between all modern and fossil samples.
sample_dist <- sf::st_distance(modern_samples, fossil_samples) %>%
data.frame() %>%
dplyr::mutate(dplyr::across(dplyr::everything(), ~as.numeric(.x)/1000))
colnames(sample_dist) <- fossil_samples$sample.id
rownames(sample_dist) <- modern_samples$sample.id
```
**Step 2**: Fit the MAT (modern analogue technique) transfer function, then generate the prediction/reconstruction.
```{r MAT_transfer_prediction, cache = TRUE, results = 'asis', warning = FALSE}
# Remove any columns in the modern pollen dataset that have 0 for the pollen taxa.
pollen_input <- spp %>%
dplyr::select(which(colSums(.) > 0))
# Remove any columns in the fossil pollen dataset that have 0 for the pollen taxa.
fossil_input <- do.call(rbind, unname(fos3_90$pollen)) %>%
dplyr::select(which(colSums(.) > 0))
# Keep only columns that are the same between the modern and fossil pollen dataset, as they need to be the same to run MAT.
keep_cols <- colnames(pollen_input[colSums(pollen_input) > 0])
keep_cols <- keep_cols[keep_cols %in% colnames(fossil_input)]
pollen_input <- pollen_input[, keep_cols]
fossil_input <- fossil_input[, keep_cols]
# Exclude modern samples within distance h of the target (fossil pollen) sample. Here, we get, for each site, the list of modern samples that are more than 300 km away.
sample_dist2 <- sample_dist %>%
tibble::rownames_to_column("modern.sample.id") %>%
tidyr::pivot_longer(
-modern.sample.id,
names_to = "fossil.sample.id",
values_to = "dist_km",
values_drop_na = TRUE
) %>%
dplyr::filter(dist_km > 300) %>%
dplyr::select(-dist_km) %>%
dplyr::group_by(fossil.sample.id) %>%
dplyr::mutate(modern.sample.id = as.numeric(modern.sample.id)) %>%
dplyr::summarize(modern.sample.id = list(unique(modern.sample.id))) %>%
dplyr::mutate(fossil.sample.id = as.numeric(fossil.sample.id)) %>%
dplyr::left_join(
.,
fossil_west_post5500 %>%
dplyr::select(sample.id, site.id) %>%
sf::st_drop_geometry(),
by = c("fossil.sample.id" = "sample.id")
) %>%
dplyr::select(-fossil.sample.id) %>%
dplyr::distinct()
# Put fossil pollen data into a list where each site is its own dataframe.
fossil_input2 <- fossil_input %>%
tibble::rownames_to_column("sample.id") %>%
dplyr::mutate(sample.id = as.integer(sample.id)) %>%
dplyr::left_join(
.,
fossil_west_post5500 %>%
dplyr::select(sample.id, site.id) %>%
sf::st_drop_geometry(),
by = "sample.id"
) %>%
dplyr::group_split(site.id) %>%
purrr::set_names(purrr::map_chr(., ~ as.character(.x$site.id[1]))) %>%
purrr::map(.,
~ .x %>%
dplyr::select(-site.id) %>%
tibble::column_to_rownames("sample.id"))
# If you want to run the code for creating reconstructions (i.e., the predict.mat and bootstrap predict.mat output for each site), then you can uncomment the section below (the code takes approximately 30 minutes to run, depending on your machine). Below, we instead load output files from the model to save time.
# You can use this line once you have run the reconstruction. The file output was too large to upload to GitHub.
# reconstructions <- readRDS(here::here("vignettes/data/full_reconstruction_output_FINAL.rds"))
# Iterate over the fossil pollen data to create reconstructions for each transfer function.
# reconstructions <-
# purrr::map2(.x = fossil_input2, .y = as.numeric(names(fossil_input2)), function(.x, .y) {
# modern_samps <- sample_dist2 %>%
# dplyr::filter(site.id == .y) %>%
# dplyr::select(modern.sample.id) %>%
# unlist() %>%
# unname()
#
# pollen_input2 <- pollen_input %>%
# tibble::rownames_to_column("sample.id") %>%
# dplyr::mutate(sample.id = as.integer(sample.id)) %>%
# dplyr::filter(sample.id %in% modern_samps) %>%
# tibble::column_to_rownames("sample.id")
#
# env2 <- env[as.numeric(names(env)) %in% modern_samps]
#
# noK <- 4
# # Create the transfer function using mat from the analogue package.
# swap.mat2 <-
# analogue::mat(pollen_input2, env2, method = "SQchord")
#
# # Predict for the fossil samples with predict.mat from the analogue package.
# rlgh.mat2 <-
# analogue:::predict.mat(object = swap.mat2,
# newdata = .x,
# k = noK)
#
# # Bootstrap the prediction with predict.mat from the analogue package. Bootstrapping takes approximately 5-10 minutes (depending on your system).
# rlgh.mat2b4 <-
# analogue:::predict.mat(
# object = swap.mat2,
# newdata = .x,
# k = noK,
# bootstrap = TRUE
# )
#
# mat_prediction_4 <-
# as.data.frame(rlgh.mat2b4$predictions$model$predicted[noK, ]) %>%
# dplyr::mutate(site.id = as.numeric(.y)) %>%
# tibble::rownames_to_column("sample.id") %>%
# dplyr::select(sample.id, site.id, value = 2) %>%
# # Pull out the bootstrap predicted values.
# dplyr::left_join(
# .,
# as.data.frame(rlgh.mat2b4[["predictions"]][["bootstrap"]][["predicted"]][, noK]) %>%
# dplyr::mutate(site.id = as.numeric(.y)) %>%
# tibble::rownames_to_column("sample.id") %>%
# dplyr::select(sample.id, site.id, bootstrap = 2),
# by = c("sample.id", "site.id")
# ) %>%
# # Pull out the bootstrap rmsep.
# dplyr::left_join(
# .,
# as.data.frame(rlgh.mat2b4[["predictions"]][["sample.errors"]][["rmsep"]][, noK]) %>%
# dplyr::mutate(site.id = as.numeric(.y)) %>%
# tibble::rownames_to_column("sample.id") %>%
# dplyr::select(sample.id, site.id, rmsep = 2),
# by = c("sample.id", "site.id")
# ) %>%
# # Pull out the bootstrap s1 errors.
# dplyr::left_join(
# .,
# as.data.frame(rlgh.mat2b4[["predictions"]][["sample.errors"]][["s1"]][, noK]) %>%
# dplyr::mutate(site.id = as.numeric(.y)) %>%
# tibble::rownames_to_column("sample.id") %>%
# dplyr::select(sample.id, site.id, s1 = 2),
# by = c("sample.id", "site.id")
# )
#
# return(
# list(
# fossil.sample.id = .y,
# recon = rlgh.mat2,
# recon_bootstrap = rlgh.mat2b4,
# recon_df = mat_prediction_4
# )
# )
#
# })
# If you need to write the data to file, then you can uncomment and run the following.
# saveRDS(reconstructions, here::here("vignettes/data/full_reconstruction_output_FINAL.rds"))
# Save output of MAT.
# If you do not want to run the code for creating mat_transfer, then you can uncomment and run the following line:
# mat_transfer <- readr::read_rds(here::here("vignettes/data/diagnostics_reconstruction/mat_calibration.rds"))
# Create the transfer function using mat from the analogue package.
mat_transfer <- analogue::mat(pollen_input, env, method = "SQchord")
noK <- 4
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(
# mat_transfer,
# here::here("vignettes/data/diagnostics_reconstruction/mat_calibration.rds"))
# Run the prediction. If you do not want to run the code for creating mat_predict, then you can uncomment and run the following line:
# mat_predict <- readr::read_rds(here::here("vignettes/data/diagnostics_reconstruction/mat_prediction.rds"))
# Predict for the fossil samples with predict.mat from the analogue package.
mat_predict <-
analogue:::predict.mat(object = mat_transfer,
newdata = fossil_input,
k = noK)
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(
# mat_predict,
# here::here("vignettes/data/diagnostics_reconstruction/mat_prediction.rds"))
# Here, we pull out the data from the predict.mat objects. First, we pull out the predicted values for the four best analogues. In the analogue package, the analogues have cumulative means, so we only need to extract the fourth record here.
# Since we do not run the reconstructions script here, we saved the output as a tibble.
mat_prediction_4 <- readr::read_rds(here::here("vignettes/data/diagnostics_reconstruction/mat_prediction_4.rds"))
# mat_prediction_4 <-
# purrr::map_df(reconstructions, "recon_df")
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(
# mat_prediction_4,
# here::here("vignettes/data/diagnostics_reconstruction/mat_prediction_4.rds"))
# Since we do not run the reconstructions script here, we saved the output as a tibble.
predictMat <- readr::read_rds(here::here("vignettes/data/diagnostics_reconstruction/predictMat.rds"))
# predictMat <- purrr::map(reconstructions, "recon")
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(
# predictMat,
# here::here("vignettes/data/diagnostics_reconstruction/predictMat.rds"))
```
## No Analog Threshold (or Minimum Dissimilarity)
**Step 3**: Remove any fossil pollen samples whose minimum dissimilarity to the modern training set is greater than the 5% quantile of training-set dissimilarities.
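The quantities this step uses come straight from the `predict.mat` output created above; a minimal sketch of where they live:
```{r minDC-sketch, eval = FALSE}
# Minimum dissimilarity to the modern training set for each fossil sample,
# and the training-set dissimilarity quantiles that define the cutoff.
head(mat_predict$minDC)
mat_predict[["quantiles"]][["5%"]]  # the 5% quantile used as the threshold
```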
```{r no_analog, cache = TRUE, results = 'asis', warning = FALSE}
# If you do not want to run the code for creating minDC_full_results, then you can uncomment and run the following line:
# minDC_full_results <- readr::read_rds(here::here("vignettes/data/diagnostics_reconstruction/minDC_full_results.rds"))
# Get a list of the samples that remain in the fossil pollen dataset after the rlen removal step (the randomTF filter below is left commented out).
pre_minDC_samples <- rlen_full_results1 %>%
dplyr::filter(!sample.id %in% rlen_removal) %>%
dplyr::left_join(
.,
fossil_west_post5500 %>%
dplyr::select(dataset.id, site.id, sample.id),
by = "sample.id"
) %>%
#dplyr::filter(dataset.id %in% only_90b) %>%
dplyr::pull(sample.id)
rlgh.mdc2 <- predictMat %>%
purrr::map_df(., "minDC") %>%
tidyr::pivot_longer(everything(), names_to = "ind", values_to = "values", values_drop_na = TRUE)
# Join the minDC results with sample metadata and calibrated ages.
minDC_results <- rlgh.mdc2 %>%
dplyr::mutate(ind = as.integer(ind)) %>%
dplyr::filter(ind %in% pre_minDC_samples) %>%
dplyr::left_join(
.,
fossil_west_post5500 %>%
dplyr::select(sample.id, dataset.id, site.id, age) %>%
dplyr::mutate(age = paleomat::BPtoBCAD(age)) %>%
sf::st_drop_geometry(),
by = c("ind" = "sample.id")
)
# Put the results into a list structure.
minDC_full_results <- structure(list(minDC = minDC_results,
quantiles = mat_predict$quantiles),
class = "list")
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(
# minDC_full_results,
# here::here("vignettes/data/diagnostics_reconstruction/minDC_full_results.rds"))
# Put data into a list. Create list of samples that did and did not meet the threshold, and subset the fossil pollen data to the samples that did pass the minDC test.
# If you do not want to run the code for creating minDC_summary, then you can uncomment and run the following line:
# minDC_summary <- readr::read_rds(here::here("vignettes/data/diagnostics_reconstruction/minDC_summary.rds"))
# Get sample IDs and dissimilarity for samples that did not pass the threshold (i.e., samples that will need to be removed).
no_analog_threshold <- mat_predict[["quantiles"]][["5%"]]
remove_samples <- rlgh.mdc2 %>%
dplyr::rename(dissimilarity = 2, sample.id = ind) %>%
dplyr::filter(dissimilarity > no_analog_threshold)
# Remove the samples from the dataset.
mat_prediction_cleaned <- mat_prediction_4 %>%
dplyr::filter(!sample.id %in% remove_samples$sample.id)
# Final Sample set.
final_approved_samples <- minDC_full_results$minDC %>%
dplyr::filter(!ind %in% remove_samples$sample.id) %>%
dplyr::pull(ind)
# Put into a list structure.
minDC_summary <- list(
remove_samples = remove_samples,
mat_prediction_cleaned = mat_prediction_cleaned,
final_approved_samples = final_approved_samples
)
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(
# minDC_summary,
# here::here("vignettes/data/diagnostics_reconstruction/minDC_summary.rds"))
```
## Clean Up Data
Clean up fossil pollen data to create the final dataset of samples.
```{r clean_up_data, cache = TRUE, results = 'asis', warning = FALSE}
# If you do not want to run the code for creating fos_90, then you can uncomment and run the following line:
# fos_90 <- readr::read_rds(here::here("vignettes/data/diagnostics_reconstruction/fos_90.rds"))
fos_90 <- mat_prediction_cleaned %>%
dplyr::mutate(sample.id = as.integer(sample.id)) %>%
dplyr::filter(sample.id %in% final_approved_samples) %>%
dplyr::left_join(.,
fossil_west_post5500 %>%
dplyr::select(dataset.id:pub_year),
by = "sample.id") %>%
dplyr::filter(age <= 6000) %>%
dplyr::mutate(age = round(age)) %>%
sf::st_as_sf()
# If you need to write the data to file, then you can uncomment and run the following.
# readr::write_rds(fos_90,
# here::here("vignettes/data/diagnostics_reconstruction/fos_90.rds"))
```
# Anomaly Calculation
```{r anomaly_calculation, cache = TRUE, results = 'asis', warning = FALSE}
# Extract the modern mean temperature at each fossil pollen site, and then calculate the anomalies.
# If you do not want to run the code for creating site.preds.unlisted.anom, then you can uncomment and run the following line:
# site.preds.unlisted.anom <- readr::read_rds(here::here("vignettes/data/diagnostics_reconstruction/final_paleomat_dataset3.rds"))
# Put lat and long into individual columns.
tavg_new_90 <- fos_90 %>%
dplyr::mutate(long = unlist(purrr::map(.$geometry, 1)),
lat = unlist(purrr::map(.$geometry, 2))) %>%
# Remove/drop geometry.
dplyr::select(-geometry) %>%
sf::st_drop_geometry() %>%
dplyr::mutate(age = mean) %>%
# Create a date column with BC/AD ages.
dplyr::mutate(date = paleomat::BPtoBCAD(mean)) %>%