[BUG] invalid zip files for SEDAC data #864

JackNelsonDS opened this issue Nov 4, 2024 · 25 comments

Comments

@JackNelsonDS

Is this issue already tracked somewhere, or is this a new report?

  • I've reviewed existing issues and couldn't find a duplicate for this problem.

Current Behavior

  • querying to get a data_granule works as expected
earthaccess.login()

results = earthaccess.search_data(
    provider="SEDAC",
    short_name="CIESIN_SEDAC_GPWv4_APCT_WPP_2015_R11",
    version="4.11",
    doi="10.7927/H4PN93PB",
)
  • downloading a targeted data_granule just downloads an empty zip file
filenames = earthaccess.download(
    granules=[data_granule],
)

> QUEUEING TASKS | : 100%|██████████| 1/1 [00:00<00:00, 1271.39it/s]
PROCESSING TASKS | : 100%|██████████| 1/1 [00:01<00:00,  1.03s/it]
COLLECTING RESULTS | : 100%|██████████| 1/1 [00:00<00:00, 4696.87it/s]
import zipfile

with zipfile.ZipFile(filenames[0], "r") as zip_ref:
    zip_ref.extractall(data_folder)

> ---------------------------------------------------------------------------
BadZipFile                                Traceback (most recent call last)
BadZipFile: File is not a zip file
data_granule.__dict__

> {'cloud_hosted': False,
 'uuid': 'e197555b-00db-4b20-9f0e-d29b84d8797e',
 'render_dict': Collection: {'ShortName': 'CIESIN_SEDAC_GPWv4_APCT_WPP_2015_R11', 'Version': '4.11'}
 Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'BoundingRectangles': [{'WestBoundingCoordinate': -180.0, 'EastBoundingCoordinate': 180.0, 'NorthBoundingCoordinate': 90.0, 'SouthBoundingCoordinate': -90.0}]}}}
 Temporal coverage: {'SingleDateTime': '2020-07-01T00:00:00.000Z'}
 Size(MB): 404.984
 Data: ['https://sedac.ciesin.columbia.edu/downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_30_sec_tif.zip']}


  • same error exists after renaming the folder to have a .zip file extension. File size suggests the zip is empty.

Expected Behavior

Zip file of .tif and .txt as you get when downloading manually.

Steps To Reproduce

See above code. Code was executed in a VS Code Dev Container.

Environment

- OS: Linux (python:3.10-slim)
- Python: 3.10.15
- earthaccess: 0.11.0

Additional Context

No response

@JackNelsonDS JackNelsonDS changed the title [BUG] {{ Empty zip for SEDAC data }} [BUG] Empty zip for SEDAC data Nov 4, 2024
@gv2325

gv2325 commented Nov 5, 2024

Hi @JackNelsonDS, I am tracking the same error and am at SEDAC. I have notified our CMR team to review the granule's get-data link. I wonder why it's returning the bad zip file.

@chuckwondo chuckwondo changed the title [BUG] Empty zip for SEDAC data [BUG] invalid zip files for SEDAC data Nov 5, 2024
@chuckwondo
Collaborator

The zip file is not empty, it's just very small, and not actually a zip file. I was able to replicate this. At first, I thought this problem was due to needing to accept the EULA, which I had not yet done for SEDAC. However, even after accepting the EULA, I'm still seeing the issue. I can successfully download a valid zip file from Earthdata Search, but earthaccess fails to download the same URL correctly.

@chuckwondo
Collaborator

Further investigation shows that what is being downloaded is the contents of the Earthdata Login page, which seems very strange since I have already successfully logged in via earthaccess.login(). At this point, something seems off with how earthaccess is communicating with SEDAC, as the credentials don't appear to be making it through, hence we're getting the login page.
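
One way to confirm this locally is to inspect the downloaded file before trying to unzip it. The following is a minimal sketch using only the standard library; the helper name `describe_download` and the HTML sniffing heuristic are assumptions for illustration and are not part of earthaccess.

```python
import zipfile


def describe_download(path: str) -> str:
    """Classify a downloaded file as 'zip', 'html', or 'unknown'.

    Hypothetical helper for illustration; not part of earthaccess.
    """
    # A real archive passes the standard-library zip signature check.
    if zipfile.is_zipfile(path):
        return "zip"

    # Otherwise read only the first bytes; a login page starts with HTML markup.
    with open(path, "rb") as fh:
        head = fh.read(512).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"

    return "unknown"
```

If this returns "html" for a file that was supposed to be an archive, the server most likely answered with a web page (here, the Earthdata Login page) instead of the data.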

@gv2325

gv2325 commented Nov 5, 2024

The "zip file" opens for me (not sure what to call it) but contains an HTML file. Not sure what's going on here. It should contain a '.asc' file.

@chuckwondo
Collaborator

The "zip file" opens for me (not sure what to call it) but contains an HTML file. Not sure what's going on here. It should contain a '.asc' file.

Right. The HTML is the Earthdata Login page that I mentioned. So the attempt to download is redirecting to the login page, but should not do so because we already logged in via earthaccess.login(), so we should simply get the file we're requesting.

@gv2325

gv2325 commented Nov 5, 2024

I wonder if it's our Apache module doing something weird and forcing it back to the EDL page.

@chuckwondo
Collaborator

Yep, nothing we can do about incorrect URLs, but bad URL or not, we should not be getting redirected to the EDL login page. Are you able to investigate what's going on with Apache on your end?

@gv2325

gv2325 commented Nov 5, 2024

Yes, will do some testing and report back if I find anything as soon as I can.

@gv2325

gv2325 commented Nov 6, 2024

http trace - posting temporarily

INFO:earthaccess.store: Getting 9 granules, approx download size: 0.67 GB

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): sedac.ciesin.columbia.edu:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): sedac.ciesin.columbia.edu:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): sedac.ciesin.columbia.edu:443

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): sedac.ciesin.columbia.edu:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): sedac.ciesin.columbia.edu:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): sedac.ciesin.columbia.edu:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): sedac.ciesin.columbia.edu:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): sedac.ciesin.columbia.edu:443
DEBUG:urllib3.connectionpool:https://sedac.ciesin.columbia.edu:443 "GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_30_sec_asc.zip HTTP/11" 302 642
DEBUG:earthaccess.auth:Deleting Auth Headers: sedac.ciesin.columbia.edu -> urs.earthdata.nasa.gov
DEBUG:urllib3.connectionpool:https://sedac.ciesin.columbia.edu:443 "GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_15_min_tif.zip HTTP/11" 302 642
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): urs.earthdata.nasa.gov:443
DEBUG:earthaccess.auth:Deleting Auth Headers: sedac.ciesin.columbia.edu -> urs.earthdata.nasa.gov
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): urs.earthdata.nasa.gov:443
DEBUG:urllib3.connectionpool:https://sedac.ciesin.columbia.edu:443 "GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_15_min_asc.zip HTTP/11" 302 642
DEBUG:earthaccess.auth:Deleting Auth Headers: sedac.ciesin.columbia.edu -> urs.earthdata.nasa.gov
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): urs.earthdata.nasa.gov:443
DEBUG:urllib3.connectionpool:https://sedac.ciesin.columbia.edu:443 "GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_2pt5_min_asc.zip HTTP/11" 302 644
DEBUG:urllib3.connectionpool:https://sedac.ciesin.columbia.edu:443 "GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_30_min_asc.zip HTTP/11" 302 642
DEBUG:urllib3.connectionpool:https://sedac.ciesin.columbia.edu:443 "GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_30_sec_tif.zip HTTP/11" 302 642
DEBUG:urllib3.connectionpool:https://sedac.ciesin.columbia.edu:443 "GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_1_deg_asc.zip HTTP/11" 302 640
DEBUG:urllib3.connectionpool:https://sedac.ciesin.columbia.edu:443 "GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_30_min_tif.zip HTTP/11" 302 642
DEBUG:earthaccess.auth:Deleting Auth Headers: sedac.ciesin.columbia.edu -> urs.earthdata.nasa.gov
DEBUG:earthaccess.auth:Deleting Auth Headers: sedac.ciesin.columbia.edu -> urs.earthdata.nasa.gov
DEBUG:earthaccess.auth:Deleting Auth Headers: sedac.ciesin.columbia.edu -> urs.earthdata.nasa.gov
DEBUG:earthaccess.auth:Deleting Auth Headers: sedac.ciesin.columbia.edu -> urs.earthdata.nasa.gov
DEBUG:earthaccess.auth:Deleting Auth Headers: sedac.ciesin.columbia.edu -> urs.earthdata.nasa.gov
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): urs.earthdata.nasa.gov:443
...
DEBUG:urllib3.connectionpool:https://urs.earthdata.nasa.gov:443 "GET /oauth/authorize?client_id=9l9yCHEF4zcZStCzop00yw&response_type=code&redirect_uri=https%3A%2F%2Fsedac.ciesin.columbia.edu%2Furs&state=aHR0cHM6Ly9zZWRhYy5jaWVzaW4uY29sdW1iaWEuZWR1L2Rvd25sb2Fkcy9kYXRhL2dwdy12NC9ncHctdjQtcG9wdWxhdGlvbi1jb3VudC1hZGp1c3RlZC10by0yMDE1LXVud3BwLWNvdW50cnktdG90YWxzLXJldjExL2dwdy12NC1wb3B1bGF0aW9uLWNvdW50LWFkanVzdGVkLXRvLTIwMTUtdW53cHAtY291bnRyeS10b3RhbHMtcmV2MTFfMjAyMF8zMF9taW5fYXNjLnppcA HTTP/11" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): urs.earthdata.nasa.gov:443
DEBUG:urllib3.connectionpool:https://urs.earthdata.nasa.gov:443 "GET /oauth/authorize?client_id=9l9yCHEF4zcZStCzop00yw&response_type=code&redirect_uri=https%3A%2F%2Fsedac.ciesin.columbia.edu%2Furs&state=aHR0cHM6Ly9zZWRhYy5jaWVzaW4uY29sdW1iaWEuZWR1L2Rvd25sb2Fkcy9kYXRhL2dwdy12NC9ncHctdjQtcG9wdWxhdGlvbi1jb3VudC1hZGp1c3RlZC10by0yMDE1LXVud3BwLWNvdW50cnktdG90YWxzLXJldjExL2dwdy12NC1wb3B1bGF0aW9uLWNvdW50LWFkanVzdGVkLXRvLTIwMTUtdW53cHAtY291bnRyeS10b3RhbHMtcmV2MTFfMjAyMF8xX2RlZ19hc2Muemlw HTTP/11" 200 None
DEBUG:urllib3.connectionpool:https://urs.earthdata.nasa.gov:443 "GET /oauth/authorize?client_id=9l9yCHEF4zcZStCzop00yw&response_type=code&redirect_uri=https%3A%2F%2Fsedac.ciesin.columbia.edu%2Furs&state=aHR0cHM6Ly9zZWRhYy5jaWVzaW4uY29sdW1iaWEuZWR1L2Rvd25sb2Fkcy9kYXRhL2dwdy12NC9ncHctdjQtcG9wdWxhdGlvbi1jb3VudC1hZGp1c3RlZC10by0yMDE1LXVud3BwLWNvdW50cnktdG90YWxzLXJldjExL2dwdy12NC1wb3B1bGF0aW9uLWNvdW50LWFkanVzdGVkLXRvLTIwMTUtdW53cHAtY291bnRyeS10b3RhbHMtcmV2MTFfMjAyMF8zMF9zZWNfdGlmLnppcA HTTP/11" 200 None
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...
send: b'GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_30_sec_asc.zip HTTP/1.1\r\nHost: sedac.ciesin.columbia.edu\r\nUser-Agent: earthaccess v0.11.0\r\nAccept-Encoding: gzip, deflate, br, zstd\r\nAccept: */*\r\nConnection: keep-alive\r\nAuthorization: Bearer eyJ0eXAiOiJKV1QiLCJvcmlnaW4iOiJFYXJ0aGRhdGEgTG9naW4iLCJzaWciOiJlZGxqd3RwdWJrZXlfb3BzIiwiYWxnIjoiUlMyNTYifQ.eyJ0eXBlIjoiVXNlciIsInVpZCI6Imd2ZXJnaGVzIiwiZXhwIjoxNzM0NzE3NDg2LCJpYXQiOjE3Mjk1MzM0ODYsImlzcyI6Imh0dHBzOi8vdXJzLmVhcnRoZGF0YS5uYXNhLmdvdiJ9.VDLwRKBqmQsShhUI-2YM7E57aawptxFZ0X7GciSjLWBbJR2pp8GgqclVOW6ZeRZnQ37pbXNn2XZs-kdBgUclpUYojDUjQWVZL1vNkKmKyJJm2bHOeylGg57j1Ig8yXMbOV_lvzacPackUZA-mx4lE5Gb2DmsU0hOnzj_P3RUIMQyTOTX6wevjMnDYyZFiIcFS6slv45KLyIYX0WhlEQZNu1G1752Tsrt68YSpBKs_CzMQZD0WQt9OBjQ5UvG00HMRgVst5NQLea-fbS_lm_yf0jtEVLhxkeWte1WC3_40ktpxmfvDBza5QQkfx_FJkmHgQ0JETN9KJhnVITjxO2QNQ\r\n\r\n'
send: b'GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_15_min_tif.zip HTTP/1.1\r\nHost: sedac.ciesin.columbia.edu\r\nUser-Agent: earthaccess v0.11.0\r\nAccept-Encoding: gzip, deflate, br, zstd\r\nAccept: */*\r\nConnection: keep-alive\r\nAuthorization: Bearer eyJ0eXAiOiJKV1QiLCJvcmlnaW4iOiJFYXJ0aGRhdGEgTG9naW4iLCJzaWciOiJlZGxqd3RwdWJrZXlfb3BzIiwiYWxnIjoiUlMyNTYifQ.eyJ0eXBlIjoiVXNlciIsInVpZCI6Imd2ZXJnaGVzIiwiZXhwIjoxNzM0NzE3NDg2LCJpYXQiOjE3Mjk1MzM0ODYsImlzcyI6Imh0dHBzOi8vdXJzLmVhcnRoZGF0YS5uYXNhLmdvdiJ9.VDLwRKBqmQsShhUI-2YM7E57aawptxFZ0X7GciSjLWBbJR2pp8GgqclVOW6ZeRZnQ37pbXNn2XZs-kdBgUclpUYojDUjQWVZL1vNkKmKyJJm2bHOeylGg57j1Ig8yXMbOV_lvzacPackUZA-mx4lE5Gb2DmsU0hOnzj_P3RUIMQyTOTX6wevjMnDYyZFiIcFS6slv45KLyIYX0WhlEQZNu1G1752Tsrt68YSpBKs_CzMQZD0WQt9OBjQ5UvG00HMRgVst5NQLea-fbS_lm_yf0jtEVLhxkeWte1WC3_40ktpxmfvDBza5QQkfx_FJkmHgQ0JETN9KJhnVITjxO2QNQ\r\n\r\n'
reply: 'HTTP/1.1 302 Found\r\n'
header: Date: Wed, 06 Nov 2024 15:02:04 GMT
header: Server: Apache
header: Location: https://urs.earthdata.nasa.gov/oauth/authorize?client_id=9l9yCHEF4zcZStCzop00yw&response_type=code&redirect_uri=https%3A%2F%2Fsedac.ciesin.columbia.edu%2Furs&state=aHR0cHM6Ly9zZWRhYy5jaWVzaW4uY29sdW1iaWEuZWR1L2Rvd25sb2Fkcy9kYXRhL2dwdy12NC9ncHctdjQtcG9wdWxhdGlvbi1jb3VudC1hZGp1c3RlZC10by0yMDE1LXVud3BwLWNvdW50cnktdG90YWxzLXJldjExL2dwdy12NC1wb3B1bGF0aW9uLWNvdW50LWFkanVzdGVkLXRvLTIwMTUtdW53cHAtY291bnRyeS10b3RhbHMtcmV2MTFfMjAyMF8zMF9zZWNfYXNjLnppcA
header: Content-Length: 642
header: Keep-Alive: timeout=5, max=100
header: Connection: Keep-Alive
header: Content-Type: text/html; charset=iso-8859-1
send: b'GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_30_min_asc.zip HTTP/1.1\r\nHost: sedac.ciesin.columbia.edu\r\nUser-Agent: earthaccess v0.11.0\r\nAccept-Encoding: gzip, deflate, br, zstd\r\nAccept: */*\r\nConnection: keep-alive\r\nAuthorization: Bearer eyJ0eXAiOiJKV1QiLCJvcmlnaW4iOiJFYXJ0aGRhdGEgTG9naW4iLCJzaWciOiJlZGxqd3RwdWJrZXlfb3BzIiwiYWxnIjoiUlMyNTYifQ.eyJ0eXBlIjoiVXNlciIsInVpZCI6Imd2ZXJnaGVzIiwiZXhwIjoxNzM0NzE3NDg2LCJpYXQiOjE3Mjk1MzM0ODYsImlzcyI6Imh0dHBzOi8vdXJzLmVhcnRoZGF0YS5uYXNhLmdvdiJ9.VDLwRKBqmQsShhUI-2YM7E57aawptxFZ0X7GciSjLWBbJR2pp8GgqclVOW6ZeRZnQ37pbXNn2XZs-kdBgUclpUYojDUjQWVZL1vNkKmKyJJm2bHOeylGg57j1Ig8yXMbOV_lvzacPackUZA-mx4lE5Gb2DmsU0hOnzj_P3RUIMQyTOTX6wevjMnDYyZFiIcFS6slv45KLyIYX0WhlEQZNu1G1752Tsrt68YSpBKs_CzMQZD0WQt9OBjQ5UvG00HMRgVst5NQLea-fbS_lm_yf0jtEVLhxkeWte1WC3_40ktpxmfvDBza5QQkfx_FJkmHgQ0JETN9KJhnVITjxO2QNQ\r\n\r\n'
send: b'GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_2pt5_min_asc.zip HTTP/1.1\r\nHost: sedac.ciesin.columbia.edu\r\nUser-Agent: earthaccess v0.11.0\r\nAccept-Encoding: gzip, deflate, br, zstd\r\nAccept: */*\r\nConnection: keep-alive\r\nAuthorization: Bearer eyJ0eXAiOiJKV1QiLCJvcmlnaW4iOiJFYXJ0aGRhdGEgTG9naW4iLCJzaWciOiJlZGxqd3RwdWJrZXlfb3BzIiwiYWxnIjoiUlMyNTYifQ.eyJ0eXBlIjoiVXNlciIsInVpZCI6Imd2ZXJnaGVzIiwiZXhwIjoxNzM0NzE3NDg2LCJpYXQiOjE3Mjk1MzM0ODYsImlzcyI6Imh0dHBzOi8vdXJzLmVhcnRoZGF0YS5uYXNhLmdvdiJ9.VDLwRKBqmQsShhUI-2YM7E57aawptxFZ0X7GciSjLWBbJR2pp8GgqclVOW6ZeRZnQ37pbXNn2XZs-kdBgUclpUYojDUjQWVZL1vNkKmKyJJm2bHOeylGg57j1Ig8yXMbOV_lvzacPackUZA-mx4lE5Gb2DmsU0hOnzj_P3RUIMQyTOTX6wevjMnDYyZFiIcFS6slv45KLyIYX0WhlEQZNu1G1752Tsrt68YSpBKs_CzMQZD0WQt9OBjQ5UvG00HMRgVst5NQLea-fbS_lm_yf0jtEVLhxkeWte1WC3_40ktpxmfvDBza5QQkfx_FJkmHgQ0JETN9KJhnVITjxO2QNQ\r\n\r\n'
send: b'GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_30_sec_tif.zip HTTP/1.1\r\nHost: sedac.ciesin.columbia.edu\r\nUser-Agent: earthaccess v0.11.0\r\nAccept-Encoding: gzip, deflate, br, zstd\r\nAccept: */*\r\nConnection: keep-alive\r\nAuthorization: Bearer eyJ0eXAiOiJKV1QiLCJvcmlnaW4iOiJFYXJ0aGRhdGEgTG9naW4iLCJzaWciOiJlZGxqd3RwdWJrZXlfb3BzIiwiYWxnIjoiUlMyNTYifQ.eyJ0eXBlIjoiVXNlciIsInVpZCI6Imd2ZXJnaGVzIiwiZXhwIjoxNzM0NzE3NDg2LCJpYXQiOjE3Mjk1MzM0ODYsImlzcyI6Imh0dHBzOi8vdXJzLmVhcnRoZGF0YS5uYXNhLmdvdiJ9.VDLwRKBqmQsShhUI-2YM7E57aawptxFZ0X7GciSjLWBbJR2pp8GgqclVOW6ZeRZnQ37pbXNn2XZs-kdBgUclpUYojDUjQWVZL1vNkKmKyJJm2bHOeylGg57j1Ig8yXMbOV_lvzacPackUZA-mx4lE5Gb2DmsU0hOnzj_P3RUIMQyTOTX6wevjMnDYyZFiIcFS6slv45KLyIYX0WhlEQZNu1G1752Tsrt68YSpBKs_CzMQZD0WQt9OBjQ5UvG00HMRgVst5NQLea-fbS_lm_yf0jtEVLhxkeWte1WC3_40ktpxmfvDBza5QQkfx_FJkmHgQ0JETN9KJhnVITjxO2QNQ\r\n\r\n'
send: b'GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_1_deg_asc.zip HTTP/1.1\r\nHost: sedac.ciesin.columbia.edu\r\nUser-Agent: earthaccess v0.11.0\r\nAccept-Encoding: gzip, deflate, br, zstd\r\nAccept: */*\r\nConnection: keep-alive\r\nAuthorization: Bearer eyJ0eXAiOiJKV1QiLCJvcmlnaW4iOiJFYXJ0aGRhdGEgTG9naW4iLCJzaWciOiJlZGxqd3RwdWJrZXlfb3BzIiwiYWxnIjoiUlMyNTYifQ.eyJ0eXBlIjoiVXNlciIsInVpZCI6Imd2ZXJnaGVzIiwiZXhwIjoxNzM0NzE3NDg2LCJpYXQiOjE3Mjk1MzM0ODYsImlzcyI6Imh0dHBzOi8vdXJzLmVhcnRoZGF0YS5uYXNhLmdvdiJ9.VDLwRKBqmQsShhUI-2YM7E57aawptxFZ0X7GciSjLWBbJR2pp8GgqclVOW6ZeRZnQ37pbXNn2XZs-kdBgUclpUYojDUjQWVZL1vNkKmKyJJm2bHOeylGg57j1Ig8yXMbOV_lvzacPackUZA-mx4lE5Gb2DmsU0hOnzj_P3RUIMQyTOTX6wevjMnDYyZFiIcFS6slv45KLyIYX0WhlEQZNu1G1752Tsrt68YSpBKs_CzMQZD0WQt9OBjQ5UvG00HMRgVst5NQLea-fbS_lm_yf0jtEVLhxkeWte1WC3_40ktpxmfvDBza5QQkfx_FJkmHgQ0JETN9KJhnVITjxO2QNQ\r\n\r\n'
send: b'GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_30_min_tif.zip HTTP/1.1\r\nHost: sedac.ciesin.columbia.edu\r\nUser-Agent: earthaccess v0.11.0\r\nAccept-Encoding: gzip, deflate, br, zstd\r\nAccept: */*\r\nConnection: keep-alive\r\nAuthorization: Bearer eyJ0eXAiOiJKV1QiLCJvcmlnaW4iOiJFYXJ0aGRhdGEgTG9naW4iLCJzaWciOiJlZGxqd3RwdWJrZXlfb3BzIiwiYWxnIjoiUlMyNTYifQ.eyJ0eXBlIjoiVXNlciIsInVpZCI6Imd2ZXJnaGVzIiwiZXhwIjoxNzM0NzE3NDg2LCJpYXQiOjE3Mjk1MzM0ODYsImlzcyI6Imh0dHBzOi8vdXJzLmVhcnRoZGF0YS5uYXNhLmdvdiJ9.VDLwRKBqmQsShhUI-2YM7E57aawptxFZ0X7GciSjLWBbJR2pp8GgqclVOW6ZeRZnQ37pbXNn2XZs-kdBgUclpUYojDUjQWVZL1vNkKmKyJJm2bHOeylGg57j1Ig8yXMbOV_lvzacPackUZA-mx4lE5Gb2DmsU0hOnzj_P3RUIMQyTOTX6wevjMnDYyZFiIcFS6slv45KLyIYX0WhlEQZNu1G1752Tsrt68YSpBKs_CzMQZD0WQt9OBjQ5UvG00HMRgVst5NQLea-fbS_lm_yf0jtEVLhxkeWte1WC3_40ktpxmfvDBza5QQkfx_FJkmHgQ0JETN9KJhnVITjxO2QNQ\r\n\r\n'
send: b'GET /downloads/data/gpw-v4/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11/gpw-v4-population-count-adjusted-to-2015-unwpp-country-totals-rev11_2020_15_min_asc.zip HTTP/1.1\r\nHost: sedac.ciesin.columbia.edu\r\nUser-Agent: earthaccess v0.11.0\r\nAccept-Encoding: gzip, deflate, br, zstd\r\nAccept: */*\r\nConnection: keep-alive\r\nAuthorization: Bearer eyJ0eXAiOiJKV1QiLCJvcmlnaW4iOiJFYXJ0aGRhdGEgTG9naW4iLCJzaWciOiJlZGxqd3RwdWJrZXlfb3BzIiwiYWxnIjoiUlMyNTYifQ.eyJ0eXBlIjoiVXNlciIsInVpZCI6Imd2ZXJnaGVzIiwiZXhwIjoxNzM0NzE3NDg2LCJpYXQiOjE3Mjk1MzM0ODYsImlzcyI6Imh0dHBzOi8vdXJzLmVhcnRoZGF0YS5uYXNhLmdvdiJ9.VDLwRKBqmQsShhUI-2YM7E57aawptxFZ0X7GciSjLWBbJR2pp8GgqclVOW6ZeRZnQ37pbXNn2XZs-kdBgUclpUYojDUjQWVZL1vNkKmKyJJm2bHOeylGg57j1Ig8yXMbOV_lvzacPackUZA-mx4lE5Gb2DmsU0hOnzj_P3RUIMQyTOTX6wevjMnDYyZFiIcFS6slv45KLyIYX0WhlEQZNu1G1752Tsrt68YSpBKs_CzMQZD0WQt9OBjQ5UvG00HMRgVst5NQLea-fbS_lm_yf0jtEVLhxkeWte1WC3_40ktpxmfvDBza5QQkfx_FJkmHgQ0JETN9KJhnVITjxO2QNQ\r\n\r\n'
reply: 'HTTP/1.1 302 Found\r\n'
header: Date: Wed, 06 Nov 2024 15:02:04 GMT
header: Server: Apache
header: Location: https://urs.earthdata.nasa.gov/oauth/authorize?client_id=9l9yCHEF4zcZStCzop00yw&response_type=code&redirect_uri=https%3A%2F%2Fsedac.ciesin.columbia.edu%2Furs&state=aHR0cHM6Ly9zZWRhYy5jaWVzaW4uY29sdW1iaWEuZWR1L2Rvd25sb2Fkcy9kYXRhL2dwdy12NC9ncHctdjQtcG9wdWxhdGlvbi1jb3VudC1hZGp1c3RlZC10by0yMDE1LXVud3BwLWNvdW50cnktdG90YWxzLXJldjExL2dwdy12NC1wb3B1bGF0aW9uLWNvdW50LWFkanVzdGVkLXRvLTIwMTUtdW53cHAtY291bnRyeS10b3RhbHMtcmV2MTFfMjAyMF8xNV9taW5fdGlmLnppcA
header: Content-Length: 642
header: Keep-Alive: timeout=5, max=100
header: Connection: Keep-Alive
header: Content-Type: text/html; charset=iso-8859-1
reply: 'HTTP/1.1 302 Found\r\n'
...
header: X-Runtime: 0.068327
header: Content-Encoding: gzip
header: Strict-Transport-Security: max-age=31536000
header: Transfer-Encoding: chunked
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...
DEBUG:urllib3.connectionpool:https://urs.earthdata.nasa.gov:443 "GET /oauth/authorize?client_id=9l9yCHEF4zcZStCzop00yw&response_type=code&redirect_uri=https%3A%2F%2Fsedac.ciesin.columbia.edu%2Furs&state=aHR0cHM6Ly9zZWRhYy5jaWVzaW4uY29sdW1iaWEuZWR1L2Rvd25sb2Fkcy9kYXRhL2dwdy12NC9ncHctdjQtcG9wdWxhdGlvbi1jb3VudC1hZGp1c3RlZC10by0yMDE1LXVud3BwLWNvdW50cnktdG90YWxzLXJldjExL2dwdy12NC1wb3B1bGF0aW9uLWNvdW50LWFkanVzdGVkLXRvLTIwMTUtdW53cHAtY291bnRyeS10b3RhbHMtcmV2MTFfMjAyMF8ycHQ1X21pbl90aWYuemlw HTTP/11" 200 None
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx/1.22.1
header: Date: Wed, 06 Nov 2024 15:02:04 GMT
header: Content-Type: text/html; charset=utf-8
header: Connection: keep-alive
header: X-Frame-Options: SAMEORIGIN
header: X-XSS-Protection: 1; mode=block
header: X-Content-Type-Options: nosniff
header: X-Download-Options: noopen
header: X-Permitted-Cross-Domain-Policies: none
header: Referrer-Policy: strict-origin-when-cross-origin
header: Cache-Control: no-store
header: Pragma: no-cache
header: Expires: Fri, 01 Jan 1990 00:00:00 GMT
header: Vary: Accept
header: ETag: W/"fdac9622e4c26aa7b14e57b1e0cc5b65"
header: Set-Cookie: _urs-gui_session=4ce70503a6ca6807f413b1b5f8340b28; path=/; expires=Thu, 07 Nov 2024 15:02:04 GMT; HttpOnly
header: X-Request-Id: 53eb1184-910f-47b8-819c-2b7ce4f1f8c7
header: X-Runtime: 0.018861
header: Content-Encoding: gzip
header: Strict-Transport-Security: max-age=31536000
header: Transfer-Encoding: chunked
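
For readers following the trace: the `state` query parameter in the redirect URLs looks like unpadded base64 of the originally requested download URL. This is an inference from the values above, not something the thread states. A minimal sketch for decoding it, where `decode_state` is a hypothetical helper name:

```python
import base64


def decode_state(state: str) -> str:
    """Decode an unpadded base64 `state` value back to text.

    Assumes the value is (URL-safe) base64 with the '=' padding stripped.
    """
    # Restore the padding that was dropped from the query parameter.
    padded = state + "=" * (-len(state) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


# First 44 characters of the `state` values in the trace above.
print(decode_state("aHR0cHM6Ly9zZWRhYy5jaWVzaW4uY29sdW1iaWEuZWR1"))
# prints: https://sedac.ciesin.columbia.edu
```

Decoding the full value from a given log line shows which file that particular redirect belonged to, which helps when the requests are interleaved as they are here.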

@gv2325

gv2325 commented Nov 6, 2024

The auth headers are deleted when a redirect goes to a host outside the original request's domain. Is this the issue? @chuckwondo

@gv2325

gv2325 commented Nov 6, 2024

AUTH_HOSTS: List[str] = [
    "urs.earthdata.nasa.gov",
    "cumulus.asf.alaska.edu",
    "sentinel1.asf.alaska.edu",
    "datapool.asf.alaska.edu",
    "sedac.ciesin.columbia.edu",  # added sedac
]

in earthaccess/auth.py maybe?
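
To illustrate why the list matters, here is a minimal sketch of a trusted-host check for redirects. It assumes the session keeps the `Authorization` header on a cross-host redirect only when both the original and the redirect host are on the list, which is consistent with the `Deleting Auth Headers` lines in the trace above. The function name `keep_auth_on_redirect` is hypothetical, and this is not a copy of the earthaccess implementation.

```python
from typing import Sequence
from urllib.parse import urlparse

# Trusted hosts quoted in this thread; the last entry is the proposed addition.
AUTH_HOSTS = [
    "urs.earthdata.nasa.gov",
    "cumulus.asf.alaska.edu",
    "sentinel1.asf.alaska.edu",
    "datapool.asf.alaska.edu",
    "sedac.ciesin.columbia.edu",
]


def keep_auth_on_redirect(
    original_url: str, redirect_url: str, trusted: Sequence[str] = AUTH_HOSTS
) -> bool:
    """Return True if the Authorization header should survive the redirect."""
    original_host = urlparse(original_url).hostname
    redirect_host = urlparse(redirect_url).hostname

    # Same host: there is no cross-domain hop, so nothing needs stripping.
    if original_host == redirect_host:
        return True

    # Cross-host redirect: keep credentials only when both ends are trusted.
    return original_host in trusted and redirect_host in trusted
```

Under this assumption, a redirect from sedac.ciesin.columbia.edu to urs.earthdata.nasa.gov loses its credentials while SEDAC is missing from the list, and keeps them once it is added.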

@chuckwondo
Collaborator

@gv2325, that indeed could be the problem. However, I've long wanted to remove that hard-coded list and provide a more flexible, yet still secure solution, so this may be the time for me to do that.

@gv2325

gv2325 commented Nov 6, 2024

That's great, I was thinking the same. That said, I wonder if a hotfix is an okay stopgap solution until a better one rolls in?

I have some thoughts on the auth part.

  1. One is whether there could be a way to pass in the EDL username and password via GitHub secrets or AWS Secrets Manager. This could help in cloud-based processing where a user wants to let cron jobs run.
  2. While not all DAACs have enabled token-based access, I wonder if there could be a strategy to allow users to pass their tokens?

Let me know if these use cases are valid or if the solution already exists.

If we are okay on a hotfix that would be great!

@mfisher87
Collaborator

While not all DAACs have enabled token-based access, I wonder if there could be a strategy to allow users to pass their tokens?

See #484 ! 🚀

@chuckwondo
Collaborator

  1. One is whether there could be a way to pass in the EDL username and password via GitHub secrets or AWS Secrets Manager. This could help in cloud-based processing where a user wants to let cron jobs run.

I don't think we should directly support that. What we need to do is simply provide a means for you to pass a username and password directly to the login function. How you obtain the username and password is up to you, be it GitHub secrets, AWS secrets, or otherwise, as long as we allow you to pass them into login without necessarily having to set env vars.

However, as far as GitHub secrets are concerned, you can simply pass them in as env vars in your GitHub workflows, so that's already supported since earthaccess allows you to set EARTHDATA_USERNAME and EARTHDATA_PASSWORD.
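
As a rough sketch of the environment-variable route for unattended jobs: the variable names earthaccess documents are `EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD`. The helper below, `edl_credentials`, is a hypothetical pre-flight check (not part of earthaccess) that fails early when a workflow or cron job forgot to inject them.

```python
import os
from typing import Mapping, Tuple

REQUIRED_VARS = ("EARTHDATA_USERNAME", "EARTHDATA_PASSWORD")


def edl_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (username, password) from the environment, or raise early."""
    # Collect every variable that is unset or empty so the error names them all.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["EARTHDATA_USERNAME"], env["EARTHDATA_PASSWORD"]


# After the check passes, the login itself would look like this
# (left as a comment so the sketch runs without earthaccess installed):
#   import earthaccess
#   earthaccess.login(strategy="environment")
```

How the variables get populated (GitHub secrets, AWS Secrets Manager, or anything else) stays outside the library, which matches the separation described above.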

@gv2325

gv2325 commented Nov 6, 2024

Is a hotfix something we could do?

@jhkennedy
Collaborator

Is a hotfix something we could do?

Yes and no; it's possible but messy right now. I'd rather just do a regular release if we can.

@gv2325

gv2325 commented Nov 6, 2024

What would the timeline on that look like? I would like to give some sort of notice to our TOPS project and users to be aware of this.

@jhkennedy
Collaborator

There's a lot already in main:
v0.11.0...main#diff-06572a96a58dc510037d5efa622f9bec8519bc1beab13c9f251e97e657a9d4edR8-R43

So I think I'd be fine sending a release anytime, tbh. @mfisher87 @chuckwondo @betolink?

@chuckwondo
Collaborator

@jhkennedy, are you suggesting we simply add "sedac.ciesin.columbia.edu" to the list, get that merged to main, and then cut a release? I'd be fine with that.

@jhkennedy
Collaborator

@chuckwondo I wasn't suggesting anything about a fix -- just that as soon as we have one, I'd be fine cutting a release instead of trying to do a hotfix.

But for the short-term fix, adding sedac.ciesin.columbia.edu to the list and getting it out sounds like a good way to go. We should def. circle back to refactor, so we don't need that list, however.

@mfisher87
Collaborator

All of the above sounds good to me!

@chuckwondo
Collaborator

@gv2325, if you don't want to wait for us to cut a new release (it will likely be another day, or several), you can use this workaround:

import urllib.parse
from functools import cache
from typing import Any

import earthaccess
import requests
import requests.cookies


class BasicAuthResponseHook:
    def __init__(self, auth: earthaccess.Auth) -> None:
        self.auth = auth

    def __call__(self, r: requests.Response, **kwargs: Any) -> requests.Response:
        if urllib.parse.urlparse(r.url).hostname != self.auth.system.edl_hostname:
            return r

        # Consume content and release the original connection to allow our new request
        # to reuse the same one.
        r.content
        r.close()

        prepared_request = r.request.copy()
        requests.cookies.extract_cookies_to_jar(
            prepared_request._cookies, r.request, r.raw  # type: ignore
        )
        prepared_request.prepare_cookies(prepared_request._cookies)  # type: ignore
        prepared_request.prepare_auth((self.auth.username, self.auth.password))

        _r = r.connection.send(prepared_request, **kwargs)
        _r.history.append(r)
        _r.request = prepared_request

        return _r


results = earthaccess.search_data(
    provider="SEDAC",
    short_name="CIESIN_SEDAC_GPWv4_APCT_WPP_2015_R11",
    version="4.11",
    doi="10.7927/H4PN93PB",
    count=1,
)

auth = earthaccess.login()
earthaccess.__store__.auth.get_session = cache(earthaccess.__store__.auth.get_session)
session = earthaccess.get_requests_https_session()
session.hooks["response"].append(BasicAuthResponseHook(auth))

filenames = earthaccess.download(granules=results)
print(filenames)

@gv2325

gv2325 commented Nov 6, 2024

@JackNelsonDS looks like we have a temp solution! Thank you @chuckwondo @mfisher87 @jhkennedy for the help and input. I will test the solution in a bit once I am back at a desk.

@JackNelsonDS
Author

Implemented the temp solution yesterday evening. Thank you for the rapid responses and quick turnaround!
