Auto-updating Landsat 8 mosaics from AWS SNS notifications.
AWS stores an open, freely-accessible data set of Landsat 8 imagery. Crucially, this data is stored in Cloud-Optimized GeoTIFF (COG), an extension of the GeoTIFF standard which specifies a smart internal layout for image overviews. By reading the image's header, an application can understand the byte ranges of different parts of the image, and can read them using HTTP range requests without needing to download the entire file.
This file type allows for new cloud-native processing models. For example, you can serve a basemap of Landsat 8 imagery using serverless AWS Lambda functions. This is a huge advance in technology because it allows for serving satellite imagery without needing to pregenerate and store any imagery. This enables huge cost savings, especially for hobby projects, where just storing hundreds of GB or TB of data would be cost-prohibitive.
There exists an AWS Simple Notification Service (SNS) Topic that creates notifications when new Landsat data are added to the AWS open data set. This library defines an AWS Lambda function to run when those notifications are sent, and update two DynamoDB databases with identifiers for the most recent imagery per mercator tile.
This library does not provide on-the-fly image tiling. For that, look at
awspds-mosaic
.
Also note that this library creates a new DynamoDB table but does not populate it with initial values: it only updates the table as new imagery comes in. See instructions below to create an initial Landsat mosaic.
git clone https://github.com/kylebarron/landsat-mosaic-latest
cd landsat-mosaic-latest
pip install .
Unless you have specific requirements, you can skip this section.
Landsat images are produced in a grid of paths and rows. In order to keep things simple, eliminate geospatial dependencies, and create efficient an MosaicJSON, this package relies on a prebuilt index that associates those path-row combinations to the mercator tile quadkeys used by the tiler.
By default, this library ships with a worldwide index at quadkey zoom level 8. If you need a differing quadkey zoom or want to restrict your mosaic to a geographic bounding box, you can build your own index.
The script to create an index is stored in
landsat-cogeo-mosaic
. For full instructions, see its
project docs.
The standard index bundled by default with landsat-mosaic-latest
is created
with:
landsat-cogeo-mosaic index \
`# Path to Shapefile of path-row geometries` \
--wrs-path data/WRS2_descending_0/WRS2_descending.shp \
`# Path to CSV of scene metadata downloaded from AWS S3` \
--scene-path data/scene_list.gz \
`# Worldwide bounds` \
--bounds '-180,-90,180,90' \
`# Quadkey zoom` \
--quadkey-zoom 8 \
| gzip \
> landsat_mosaic_latest/data/index.json.gz
Note it's currently imperative to write the index to that exact location in order to be properly found during runtime.
If you wish to change the quadkey index file, as described above, make sure you do that before building, as that file will be included in the lambda bundle.
Then building is simple: (requires Docker and Make)
make package
This creates a package.zip
file in the current directory with this package's
code and any required dependencies. This will be uploaded to AWS in the next
step.
To simplify deployment, this package uses the Serverless framework. Refer to
their docs to install
the sls
command line library and authorize it with your AWS credentials.
By default deployment creates two DynamoDB tables, one for the absolute latest imagery, another for the latest low-cloud imagery.
Then it's simple to deploy this stack with a single line:
sls deploy \
--table-name landsat-mosaic-latest \
--cloudless-table-name landsat-mosaic-latest-cloudless \
--max-cloud-cover 5
table-name
is the name given to the DynamoDB table without a cloud cover filter. You'll need to provide this information to the tiler when serving imagery. Default:landsat-mosaic-latest-cloudless
.cloudless-table-name
is the name given to the DynamoDB table that uses the cloud cover filter below. Defaultlandsat-mosaic-latest
.max-cloud-cover
is an integer between 0 and 100 that defines the maximum percent cloud cover permitted for new imagery into the cloudless DynamoDB table. If a new Landsat scene has cloud cover greater than the given percent, it will only be added to the non-cloudless DynamoDB table. Default5
.
This library creates a new DynamoDB table but does not populate it with initial
values: it only updates the table as new imagery comes in. To create an inital
Landsat mosaic, we'll use landsat-cogeo-mosaic
.
Note that when creating an initial MosaicJSON, you should use the same path-row
index as in the serverless function. The below commands point to the default,
bundled index.json.gz
.
Install cogeo-mosaic
and landsat-cogeo-mosaic
:
pip install "cogeo-mosaic>=3.0a3" landsat-cogeo-mosaic
For up-to-date instructions, see landsat-cogeo-mosaic
docs. But it's roughly:
git clone https://github.com/kylebarron/landsat-cogeo-mosaic/
cd landsat-cogeo-mosaic
mkdir -p data/
aws s3 cp s3://landsat-pds/c1/L8/scene_list.gz data/
gunzip -c data/scene_list.gz > data/scene_list
cd data/
sqlite3 scene_list.db < ../scripts/csv_import.sql
cd -
Then scene_list.db
is the database to be used with the --sqlite-path
argument below.
Some Landsat 8 path-row combinations have never had a scene with cloud cover
<5%. This command will automatically relax the --max-cloud
restriction until
it finds a result for each path-row.
Assuming you have cloned and are in the landsat-mosaic-latest
repository:
landsat-cogeo-mosaic create-from-db \
`# Path to the sqlite database file` \
--sqlite-path ../landsat-cogeo-mosaic/data/scene_list.db \
`# Path to the path-row geometry file` \
--pathrow-index landsat_mosaic_latest/data/index.json.gz \
`# Min zoom of mosaic, 7 is a good default for Landsat` \
--min-zoom 7 \
`# Max zoom of mosaic, 12 is a good default for Landsat` \
--max-zoom 12 \
`# Maximum cloud cover. This means 5%` \
--max-cloud 5 \
`# Preference for choosing the asset for a tile` \
--sort-preference newest \
> mosaic_cloudless_latest.json
This is almost the same as the latest cloudless command, except that it removes
the --max-cloud
argument.
Assuming you have cloned and are in the landsat-mosaic-latest
repository:
landsat-cogeo-mosaic create-from-db \
`# Path to the sqlite database file` \
--sqlite-path ../landsat-cogeo-mosaic/data/scene_list.db \
`# Path to the path-row geometry file` \
--pathrow-index landsat_mosaic_latest/data/index.json.gz \
`# Min zoom of mosaic, 7 is a good default for Landsat` \
--min-zoom 7 \
`# Max zoom of mosaic, 12 is a good default for Landsat` \
--max-zoom 12 \
`# Preference for choosing the asset for a tile` \
--sort-preference newest \
> mosaic_latest.json
Then upload these two generated MosaicJSON files to DynamoDB. The --url
argument must match the names given to the DynamoDB tables in the sls deploy
step.
Note: This will overwrite any existing data in the DynamoDB table.
cogeo-mosaic upload \
--url 'dynamodb://us-west-2/landsat-mosaic-latest' \
mosaic_latest.json
cogeo-mosaic upload \
--url 'dynamodb://us-west-2/landsat-mosaic-latest-cloudless' \
mosaic_cloudless_latest.json
$2.64 per year is a rough estimate of the cost to keep each DynamoDB table updated.
Note that actually serving imagery using a tiler is not included in this estimate.
Time:
- $ per 100ms: 0.0000016667 (when set to 1024mb memory. From a simple test, it looks like setting to lower memory doesn't reduce cost because it takes proportionally longer.)
- Rough # of 100ms when the scene is not cloudy: 10
- Percentage of time when scene is below max cloud cover: 0.3
- Scenes per day: ~750
- Days per year: 365
Roughly $1.36/year for the time cost with these estimates.
Requests:
- $ 0.20 per 1M requests
- Scenes per day: 750
- Days per year: 365
Roughly $0.05/year.
Reads:
- Scenes per day: 750
- Percentage of time when scene is below max cloud cover: 0.3
- Quadkeys per scene: ~10
- 1 read per quadkey
- Days per year: 365
- $0.25 per million reads
Roughly $0.21/year.
Writes:
- Scenes per day: 750
- Percentage of time when scene is below max cloud cover: 0.3
- Quadkeys per scene: ~10
- 1 write per quadkey
- Days per year: 365
- $1.25 per million reads
Roughly $1.02/year.