Causes, consequences, and cross-scale linkages of climate-driven phenological mismatch across three trophic levels
This repository contains scripts to help research and outreach efforts. There are other GitHub repositories for this project maintained by the various research labs.
All data is stored in either a SQLite3 or a PostgreSQL (PostGIS) database. We have gathered and consolidated the data from the sources below. Each source dataset was collected for a different purpose, so the fields they record vary widely. Our schema distills the information common to all of them and needed for our study into a relational model. The fields unique to each source are kept for further analysis in JSON blobs attached to each relational record.
- North American Breeding Bird Survey (BBS).
- Monitoring Avian Productivity and Survivorship (MAPS).
- eBird Basic Dataset. This dataset has been culled to records between 20° and 90° latitude and between -95° and -50° longitude. We only take "complete" and "approved" checklists, and only data for ~120 migratory bird species.
- Pollard butterfly dataset.
- NABA butterfly dataset.
- Caterpillar Counts dataset.
- NestWatch dataset.
- USGS Bird Banding Laboratory dataset.
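The eBird cull described in the list above can be sketched as a simple predicate. This is only an illustration, not the project's actual filter: the record keys (`latitude`, `longitude`, `complete`, `approved`, `sci_name`) and the inclusive bounds are assumptions, and `target_species` stands in for the list of ~120 migratory species.

```python
# Bounding box from the eBird description; treating the edges as inclusive
# is an assumption.
LAT_MIN, LAT_MAX = 20.0, 90.0
LNG_MIN, LNG_MAX = -95.0, -50.0


def keep_ebird_record(record, target_species):
    """Return True if a (hypothetical) eBird record dict passes the cull."""
    return (
        LAT_MIN <= record["latitude"] <= LAT_MAX
        and LNG_MIN <= record["longitude"] <= LNG_MAX
        and bool(record["complete"])   # only "complete" checklists
        and bool(record["approved"])   # only "approved" checklists
        and record["sci_name"] in target_species
    )
```

A record fails the cull as soon as any one condition is false, so a checklist outside the box is dropped even if it is complete and approved.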
There are 5 primary tables in the database:
- The Taxa table holds data specific to each taxon.
- The Dataset table holds information about where and when we extracted the data.
- The Place table holds where observations occur. Most datasets sample data repeatedly at designated locations. Source-specific place data is kept as JSON in the place_json field.
- The Event table holds when, how, and by whom an observation was made. Source-specific event data is kept as JSON in the event_json field.
- The Count table holds what was observed and how many birds or leps (Lepidoptera) were observed during the event. Source-specific count data is kept as JSON in the count_json field.
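The five tables above might be laid out roughly as follows. This is a minimal sketch using an in-memory SQLite database, not the project's real schema: only the `place_json`, `event_json`, and `count_json` field names come from this README, while the table names, the other columns, and the sample row are assumptions made up for illustration.

```python
import json
import sqlite3

# Hypothetical schema: relational columns hold what all sources share,
# and the *_json columns hold source-specific extras as JSON text.
SCHEMA = """
CREATE TABLE taxa (taxon_id INTEGER PRIMARY KEY, sci_name TEXT);
CREATE TABLE datasets (dataset_id TEXT PRIMARY KEY, extracted TEXT);
CREATE TABLE places (
    place_id INTEGER PRIMARY KEY,
    dataset_id TEXT REFERENCES datasets (dataset_id),
    lat REAL, lng REAL,
    place_json TEXT);
CREATE TABLE events (
    event_id INTEGER PRIMARY KEY,
    place_id INTEGER REFERENCES places (place_id),
    day TEXT,
    event_json TEXT);
CREATE TABLE counts (
    count_id INTEGER PRIMARY KEY,
    event_id INTEGER REFERENCES events (event_id),
    taxon_id INTEGER REFERENCES taxa (taxon_id),
    n_observed INTEGER,
    count_json TEXT);
"""


def build_demo_db():
    """Create the sketch schema in memory and add one made-up observation."""
    cxn = sqlite3.connect(":memory:")
    cxn.executescript(SCHEMA)
    cxn.execute("INSERT INTO taxa VALUES (1, 'Setophaga ruticilla')")
    cxn.execute("INSERT INTO datasets VALUES ('demo', '2020-01-01')")
    cxn.execute("INSERT INTO places VALUES (1, 'demo', 35.9, -79.0, ?)",
                (json.dumps({"route": "A1"}),))
    cxn.execute("INSERT INTO events VALUES (1, 1, '2019-05-04', ?)",
                (json.dumps({"observer": "jd"}),))
    cxn.execute("INSERT INTO counts VALUES (1, 1, 1, 3, ?)",
                (json.dumps({"age": "adult"}),))
    return cxn


def observations(cxn):
    """Join counts to events and taxa, decoding the JSON blob in Python."""
    sql = """
        SELECT t.sci_name, e.day, c.n_observed, c.count_json
          FROM counts c
          JOIN events e ON e.event_id = c.event_id
          JOIN taxa t ON t.taxon_id = c.taxon_id
    """
    return [(name, day, n, json.loads(blob))
            for name, day, n, blob in cxn.execute(sql)]
```

Decoding the blob with `json.loads` on the Python side keeps the sketch independent of whether the SQLite build includes its JSON functions.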
Some record counts for the datasets:
Dataset | Place Records | Event Records | Count Records | Notes |
---|---|---|---|---|
BBS | 5,690 | 122,925 | > 6M | |
MAPS | 1,224 | 619,335 | > 2M | |
NestWatch | 65,063 | 503,510 | 647,212 | |
Pollard | 760 | 86,996 | 86,958 | |
NABA | 1,132 | 2,135 | 305,810 | |
eBird | 1,986,208 | 16,820,802 | > 120M | Culled from > 650M records |
See this R script for how to access the SQLite database in R.
Most of the scripts in the lib directory access the database. I have moved common code into this library. Some sample queries that use this library are in this Python script.
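For readers without the library at hand, the kind of shared helper described above could look something like this. It is a hedged sketch built only on Python's standard `sqlite3` module. The function names `connect` and `row_count` are hypothetical and are not the library's actual API.

```python
import sqlite3


def connect(path):
    """Open a SQLite database and return rows that can be indexed by name."""
    cxn = sqlite3.connect(path)
    cxn.row_factory = sqlite3.Row
    return cxn


def row_count(cxn, table):
    """Count the rows in one table of the connected database."""
    # Table names cannot be bound as SQL parameters, so check the name
    # against the database catalog before formatting it into the query.
    known = {
        row["name"]
        for row in cxn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    if table not in known:
        raise ValueError(f"unknown table: {table}")
    return cxn.execute(f'SELECT COUNT(*) AS n FROM "{table}"').fetchone()["n"]
```

Run against the real database file, a helper like `row_count` would be one way to reproduce the record counts in the table above, assuming the table names are known.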