Mirror demo landsat-c212 collections to mcp-test and mcp-prod #134

Open
2 tasks
anayeaye opened this issue May 29, 2024 · 5 comments

@anayeaye
Contributor

anayeaye commented May 29, 2024

What

Seven demo Landsat spotlight collections were published to the staging catalog and need to be mirrored in the production account. Because these collections refer to externally hosted data (LPDAAC) and contain custom provider metadata, we need to create (or re-use) a one-off script to mirror this metadata in the production STAC catalog.

Note: this is NOT a transfer or S3 discovery task; all we want to do is mirror the metadata from staging in the production catalog.

Suggested steps

For each collection

  1. Correct or remove the item_assets (there is no cog_default, but there are many band-specific assets)
  2. Keep the summaries object (we could create it manually, but there isn't an automated way to generate it for items we are not ingesting via Airflow)
  3. Publish the collection and add it to veda-data/production/collections
  4. Use a STAC client to select all items for this collection in the staging STAC catalog and, for each item:
  • remove any self-referential links, i.e. self, parent, collection, root
  • publish the item using the ingestions/ endpoint
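The metadata cleanup in steps 1 and 4 can be sketched on plain STAC JSON dicts. This is a minimal sketch, not the actual transformation-scripts notebook: the helper names `clean_collection` and `clean_item` are hypothetical, rebuilding item_assets from one sample item assumes every item in a collection shares the same band-specific asset keys, and fetching from staging and posting to the ingestions/ endpoint are left out.

```python
import copy

# Link relations that point back into the staging catalog; in this sketch they
# are dropped so the production catalog can generate its own on ingestion.
SELF_REFERENTIAL_RELS = {"self", "parent", "collection", "root"}

# Asset fields copied into item_assets; "href" is item-specific, so it is left out.
ITEM_ASSET_FIELDS = ("type", "roles", "title", "description")


def clean_collection(collection, sample_item=None):
    """Return a copy of a collection with item_assets corrected or removed.

    If sample_item is given, item_assets is rebuilt from that item's
    band-specific assets (replacing e.g. a stale cog_default entry);
    otherwise item_assets is dropped. The summaries object is left untouched.
    """
    cleaned = copy.deepcopy(collection)
    if sample_item is None:
        cleaned.pop("item_assets", None)
        return cleaned
    cleaned["item_assets"] = {
        key: {f: copy.deepcopy(asset[f]) for f in ITEM_ASSET_FIELDS if f in asset}
        for key, asset in sample_item.get("assets", {}).items()
    }
    return cleaned


def clean_item(item):
    """Return a copy of a STAC item without self-referential links."""
    cleaned = copy.deepcopy(item)
    cleaned["links"] = [
        link
        for link in cleaned.get("links", [])
        if link.get("rel") not in SELF_REFERENTIAL_RELS
    ]
    return cleaned
```

Both helpers return copies, so the staging dicts can be re-inspected after cleaning; links such as `license` are kept.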

collections

'landsat-c2l2-sr-antarctic-glaciers-pine-island',
'landsat-c2l2-sr-antarctic-glaciers-thwaites',
'landsat-c2l2-sr-lakes-aral-sea',
'landsat-c2l2-sr-lakes-lake-balaton',
'landsat-c2l2-sr-lakes-lake-biwa',
'landsat-c2l2-sr-lakes-tonle-sap',
'landsat-c2l2-sr-lakes-vanern'

AC

  • collection metadata restored/added to ingestion-data/production/collections with existing summaries preserved and with corrected/removed item_assets
  • new or updated notebook for publishing a mirror of these collections to the production STAC catalog added to transformation-scripts/
@anayeaye anayeaye self-assigned this May 31, 2024
@anayeaye
Contributor Author

anayeaye commented May 31, 2024

Blocked by the ingest API role (not assumed properly? It's not clear whether anything else differs between viewing LPDAAC data and the accessibility check), but progress is checked in to #138 (collection and item metadata corrections are complete; currently the ingest API reports that it cannot access the LPDAAC assets).

@j08lue
Contributor

j08lue commented Jun 3, 2024

Just in case someone wonders why we even have them - these collections are not featured in the VEDA Earthdata Dashboard, but in the EO Dashboard, e.g. https://eodashboard.org/story?id=nasa-thwaites.

@anayeaye
Contributor Author

anayeaye commented Jun 3, 2024

EDIT: it is not the role, and the bucket is usgs-landsat, not LPDAAC. We were blocked because the bucket owner requires the requester pays parameter. I confirmed that we can get the head object with requester pays configured, so I am working on a PR to get the ingest API to use the requester pays configuration if it is provided in the environment.

aws s3api head-object --bucket usgs-landsat --key collection02/level-2/standard/oli-tirs/2023/001/113/LC08_L2SR_001113_20230125_20230208_02_T2/LC08_L2SR_001113_20230125_20230208_02_T2_SR_B4.TIF --request-payer requester

out>
{
   ...
    "RequestCharged": "requester"
}
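For scripted checks, the same requester-pays HEAD request can be issued from Python. This is a minimal sketch assuming assets are referenced by `s3://bucket/key` hrefs; the helper name `head_object_kwargs` is hypothetical. boto3's S3 `head_object` accepts `RequestPayer="requester"`, which corresponds to `--request-payer requester` on the CLI.

```python
from urllib.parse import urlparse


def head_object_kwargs(href, requester_pays=False):
    """Build keyword arguments for boto3's S3 head_object from an s3:// href."""
    parsed = urlparse(href)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// href: {href}")
    kwargs = {"Bucket": parsed.netloc, "Key": parsed.path.lstrip("/")}
    if requester_pays:
        # Same effect as `--request-payer requester`: the caller acknowledges
        # that it will be charged for the request.
        kwargs["RequestPayer"] = "requester"
    return kwargs
```

With boto3 installed, `boto3.client("s3").head_object(**head_object_kwargs(href, requester_pays=True))` should mirror the CLI call above, and the response should include `"RequestCharged": "requester"`.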

anayeaye added a commit to NASA-IMPACT/veda-backend that referenced this issue Jun 4, 2024
### Issue

NASA-IMPACT/veda-data#134

### What?

- Add optional requester pays configuration to the ingest API to validate
accessibility of assets in buckets that require requester pays _if_ the
titiler has been configured to use requester pays

### Why?

- Ingest API verifies that all hrefs in new items are accessible to the
titiler but is failing in cases where the bucket requires requester pays
configuration

### Testing?

I deployed this manually to dev and confirmed that the change resolves
the "unable to access objects" errors when configured to use requester
pays.
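The ingest API change can be sketched as: read an optional requester-pays setting from the environment and pass it through on every asset HEAD check. This is not the veda-backend implementation. The environment variable name `AWS_REQUEST_PAYER` is an assumption (borrowed from the GDAL configuration option of the same name, which titiler can use), the helper names are hypothetical, and `head` stands in for a boto3 S3 client's `head_object`.

```python
import os
from urllib.parse import urlparse


def requester_pays_enabled(environ=os.environ):
    """Assumed convention: AWS_REQUEST_PAYER=requester turns requester pays on."""
    return environ.get("AWS_REQUEST_PAYER", "").lower() == "requester"


def inaccessible_hrefs(item, head, requester_pays):
    """Return the s3:// asset hrefs of a STAC item that fail a HEAD request.

    `head` is any callable with the keyword signature of boto3's
    S3 client head_object (Bucket=..., Key=..., optional RequestPayer=...)
    that raises on failure.
    """
    failed = []
    for asset in item.get("assets", {}).values():
        href = asset.get("href", "")
        parsed = urlparse(href)
        if parsed.scheme != "s3":
            continue  # only S3 assets are checked in this sketch
        kwargs = {"Bucket": parsed.netloc, "Key": parsed.path.lstrip("/")}
        if requester_pays:
            kwargs["RequestPayer"] = "requester"
        try:
            head(**kwargs)
        except Exception:  # e.g. botocore.exceptions.ClientError (403 or 404)
            failed.append(href)
    return failed
```

Injecting `head` keeps the check testable without AWS credentials; in practice it would be `boto3.client("s3").head_object`.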
@anayeaye
Contributor Author

anayeaye commented Jun 5, 2024

UPDATES

✔️ Modifications to the ingest API made it possible to test accessibility with the requester pays config
✔️ Additional invalid metadata were surfaced after getting past the accessibility check; these were 'fixed' by removing the classification extension (many of the items declared the classification extension but did not conform to the spec)
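A minimal sketch of that workaround on a STAC item dict: drop the classification extension from `stac_extensions` and, assuming the non-conforming fields use the extension's `classification:` prefix, strip those fields from the item properties and assets as well. Matching the schema URL on `/classification/` (rather than a specific version) and the helper name `drop_classification_extension` are assumptions.

```python
import copy


def drop_classification_extension(item):
    """Return a copy of a STAC item with the classification extension removed."""
    cleaned = copy.deepcopy(item)
    cleaned["stac_extensions"] = [
        url for url in cleaned.get("stac_extensions", [])
        if "/classification/" not in url
    ]
    # Remove classification:* fields from properties and from each asset so
    # the item no longer carries fields for an extension it does not declare.
    containers = [cleaned.get("properties", {})]
    containers += list(cleaned.get("assets", {}).values())
    for container in containers:
        for key in [k for k in container if k.startswith("classification:")]:
            del container[key]
    return cleaned
```

Fields nested deeper (for example inside band objects) are not touched by this sketch.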

The next validation blocker: Many of the items have hrefs to non-existent assets (different from the requester pays issue)

aws s3api head-object --bucket usgs-landsat --key collection02/level-2/standard/oli-tirs/2022/001/113/LC09_L2SR_001113_20221130_20221202_02_T2/LC09_L2SR_001113_20221130_20221202_02_T2_SR_B4.TIF --request-payer requester 

An error occurred (404) when calling the HeadObject operation: Not Found
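When scripting this check, the 404 case can be told apart from the 403 that a missing requester pays parameter produces. A sketch, assuming the caller catches `botocore.exceptions.ClientError` from `head_object` and passes its `response` dict; as the message above shows, HeadObject reports the HTTP status (here `404`) as the error code. `classify_head_error` is a hypothetical helper.

```python
# HeadObject responses have no body, so the error code is the HTTP status;
# "NoSuchKey"/"AccessDenied" cover S3 calls that do return an error body.
MISSING_CODES = {"404", "NoSuchKey"}
FORBIDDEN_CODES = {"403", "AccessDenied"}


def classify_head_error(error_response):
    """Classify a botocore ClientError.response dict from an S3 request."""
    code = str(error_response.get("Error", {}).get("Code", ""))
    if code in MISSING_CODES:
        return "missing"    # asset href points at an object that does not exist
    if code in FORBIDDEN_CODES:
        return "forbidden"  # e.g. requester pays not set, or no permission
    return "other"
```

Typical use would be `except ClientError as err: kind = classify_head_error(err.response)`.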

Here are the currently publishable counts in test. I think we should move forward at this point and not attempt any further corrections (which means some invalid items in staging will NOT be published to production):

landsat-c2l2-sr-antarctic-glaciers-pine-island src_item_count=46 target_item_count=43 OK=False
landsat-c2l2-sr-antarctic-glaciers-thwaites src_item_count=53 target_item_count=49 OK=False
landsat-c2l2-sr-lakes-aral-sea src_item_count=1434 target_item_count=1402 OK=False
landsat-c2l2-sr-lakes-lake-balaton src_item_count=186 target_item_count=174 OK=False
landsat-c2l2-sr-lakes-lake-biwa src_item_count=72 target_item_count=70 OK=False
landsat-c2l2-sr-lakes-tonle-sap src_item_count=330 target_item_count=324 OK=False
landsat-c2l2-sr-lakes-vanern src_item_count=134 target_item_count=131 OK=False
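The skip-and-count approach above can be sketched as splitting staging items into publishable and skipped sets, then reporting source vs. target counts in the same format. This assumes the counts were already collected (for example from each catalog's item search); `partition_publishable` and `count_report` are hypothetical helpers, and `OK` here simply means the two counts match exactly.

```python
def partition_publishable(items, is_publishable):
    """Split items into (publishable, skipped) lists using a predicate."""
    publishable, skipped = [], []
    for item in items:
        (publishable if is_publishable(item) else skipped).append(item)
    return publishable, skipped


def count_report(collection_id, src_item_count, target_item_count):
    """Format a count line; OK is True only when the counts are equal."""
    ok = src_item_count == target_item_count
    return (
        f"{collection_id} src_item_count={src_item_count} "
        f"target_item_count={target_item_count} OK={ok}"
    )
```

For example, `count_report("landsat-c2l2-sr-lakes-lake-biwa", 72, 70)` reproduces the lake-biwa line above.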

@anayeaye
Contributor Author

anayeaye commented Jun 5, 2024

#138
