
Accessing opendap datasets #26

Open
jhamman opened this issue Mar 15, 2018 · 5 comments


jhamman commented Mar 15, 2018

I am working on what I think is a fairly common workflow:

  1. log on to ESGF using the LogonManager class
  2. search for some datasets using the SearchConnection class
  3. access some opendap dataset using netcdf4-python or pydap

Here's an example workflow:

In [1]: openid = 'https://esgf-node.llnl.gov/esgf-idp/openid/SECRET'
   ...: password = 'SECRET'
   ...:

In [2]: from pyesgf.logon import LogonManager
   ...: from pyesgf.search import SearchConnection
   ...: import xarray as xr
   ...:

In [3]: # initialize the logon manager
   ...: lm = LogonManager(verify=True)
   ...: if not lm.is_logged_on():
   ...:     lm.logon_with_openid(openid, password, 'pcmdi9.llnl.gov')
   ...: lm.is_logged_on()
   ...:
Out[3]: True

In [4]: def print_context_info(ctx):
   ...:     print('Hits:', ctx.hit_count)
   ...:     print('Experiments:', ctx.facet_counts['experiment'])
   ...:     print('Realms:', ctx.facet_counts['realm'])
   ...:     print('Ensembles:', ctx.facet_counts['ensemble'])
   ...:

In [5]: # search for some data
   ...: conn = SearchConnection('http://pcmdi9.llnl.gov/esg-search', distrib=True)
   ...: ctx = conn.new_context(project='CMIP5', model='CCSM4',
   ...:                        experiment='rcp85', time_frequency='day')
   ...: ctx = ctx.constrain(realm='atmos', ensemble='r1i1p1')
   ...:
   ...: # print a summary of what we found
   ...: print_context_info(ctx)
   ...:
Hits: 4
Experiments: {'rcp85': 4}
Realms: {'atmos': 4}
Ensembles: {'r1i1p1': 4}

In [6]: # aggregate results
   ...: result = ctx.search()[0]
   ...: agg_ctx = result.aggregation_context()
   ...:
   ...: # get a list of opendap urls
   ...: x = list(a.opendap_url for a in agg_ctx.search() if a.opendap_url)
   ...: x
   ...:
Out[6]:
['http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.tasmin.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.tasmax.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.prc.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.psl.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.tas.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.pr.20120705.aggregation.1']

In [7]: # try opening one of the opendap datasets
   ...: xr.open_dataset(x[0], engine='pydap')
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-7-90d39efb83f7> in <module>()
      1 # try opening one of the opendap datasets
----> 2 xr.open_dataset(x[0], engine='pydap')

~/anaconda/envs/aist/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables)
    302                                             autoclose=autoclose)
    303         elif engine == 'pydap':
--> 304             store = backends.PydapDataStore.open(filename_or_obj)
    305         elif engine == 'h5netcdf':
    306             store = backends.H5NetCDFStore(filename_or_obj, group=group,

~/anaconda/envs/aist/lib/python3.6/site-packages/xarray/backends/pydap_.py in open(cls, url, session)
     75     def open(cls, url, session=None):
     76         import pydap.client
---> 77         ds = pydap.client.open_url(url, session=session)
     78         return cls(ds)
     79

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/client.py in open_url(url, application, session, output_grid)
     62     never retrieve coordinate axes.
     63     """
---> 64     dataset = DAPHandler(url, application, session, output_grid).dataset
     65
     66     # attach server-side functions

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/handlers/dap.py in __init__(self, url, application, session, output_grid)
     62
     63         # build the dataset from the DDS and add attributes from the DAS
---> 64         self.dataset = build_dataset(dds)
     65         add_attributes(self.dataset, parse_das(das))
     66

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/dds.py in build_dataset(dds)
    159 def build_dataset(dds):
    160     """Return a dataset object from a DDS representation."""
--> 161     return DDSParser(dds).parse()
    162
    163

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/dds.py in parse(self)
     47         dataset = DatasetType('nameless')
     48
---> 49         self.consume('dataset')
     50         self.consume('{')
     51         while not self.peek('}'):

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/dds.py in consume(self, regexp)
     39     def consume(self, regexp):
     40         """Consume and return a token."""
---> 41         token = super(DDSParser, self).consume(regexp)
     42         self.buffer = self.buffer.lstrip()
     43         return token

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/__init__.py in consume(self, regexp)
    180             self.buffer = self.buffer[len(token):]
    181         else:
--> 182             raise Exception("Unable to parse token: %s" % self.buffer[:10])
    183         return token

Exception: Unable to parse token:

Questions:

  1. Is this actually a workflow that should work?
  2. Does this opendap URL actually exist? What is the best way to test that an OPeNDAP URL from ESGF is a valid one?
  3. Is additional authentication required?
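On question 2, one lightweight way to probe whether an OPeNDAP endpoint is live is to fetch its `.dds` companion document and check that it looks like a textual `Dataset { ... }` description rather than an HTML error page. A minimal sketch (`looks_like_dds` and `is_valid_opendap_url` are hypothetical helper names, not part of pyesgf or pydap):

```python
from urllib.request import urlopen


def looks_like_dds(text):
    """A live OPeNDAP endpoint answers <url>.dds with a textual
    'Dataset { ... }' block; error pages are usually HTML."""
    return text.lstrip().startswith("Dataset")


def is_valid_opendap_url(url, timeout=30):
    """Fetch the DDS companion document; a parseable DDS suggests the
    endpoint is live. Any network or HTTP error counts as invalid."""
    try:
        with urlopen(url + ".dds", timeout=timeout) as resp:
            return resp.status == 200 and looks_like_dds(
                resp.read().decode("utf-8", "replace"))
    except OSError:
        return False
```

Note this only checks reachability; a URL can be live yet still require the authorization discussed below.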
agstephens (Contributor) commented:

Hi @jhamman, right now I don't have time to look into your issue, but please see if this example sheds any light on your questions:
https://github.com/cehbrecht/demo-notebooks/blob/master/esgf-opendap.ipynb


jhamman commented Mar 21, 2018

@agstephens - Indeed, I had seen this notebook. As far as I can tell, the problem seems to lie in the use of aggregation-context URLs for opendap datasets.
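If the aggregation endpoints are the culprit, a possible fallback (a sketch, not a confirmed fix; `collect_opendap_urls` is a hypothetical helper, and the commented lines assume a `result` from the search session above plus working credentials and network access) is to use the per-file OPeNDAP URLs from `result.file_context()` instead:

```python
def collect_opendap_urls(file_results):
    # keep only the results that actually expose an OPeNDAP endpoint
    return [f.opendap_url for f in file_results
            if getattr(f, "opendap_url", None)]

# With `result` from the search above (requires network and a valid login):
#   files = result.file_context().search()
#   urls = collect_opendap_urls(files)
#   ds = xr.open_mfdataset(urls)  # one variable's files as a single dataset
```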

cehbrecht (Collaborator) commented:

@jhamman Late answer ... there might be several issues, but none related to esgf-pyclient. The aggregation might not work, and it also looks like pydap needs to be updated to work with ESGF.

I tried it with a CORDEX aggregation and I can't get pydap working:
https://github.com/cehbrecht/jupyterlab-notebooks/blob/master/esgf-examples/esgf-pydap.ipynb

See also:
https://pydap.readthedocs.io/en/latest/client.html?#earth-system-grid-federation-esgf
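The page linked above documents an ESGF-specific session helper in pydap. Per those docs, something along these lines should authenticate pydap against ESGF (untested here: it needs network access, a valid OpenID and password, and a real dataset URL in place of the placeholder):

```python
import xarray as xr
from pydap.cas.esgf import setup_session

url = "http://host/thredds/dodsC/<some-esgf-dataset>"  # placeholder URL
# openid/password as in the logon step above
session = setup_session(openid, password, check_url=url)
store = xr.backends.PydapDataStore.open(url, session=session)
ds = xr.open_dataset(store)
```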


saeedvzf commented Oct 2, 2020

Hi, I hope you all are doing well,

Can anyone help me to overcome this issue?

My OpenID is working and is connected to my ESGF account.

Please let me know if you need more information.

Thank you,

Saeed

from pyesgf.search import SearchConnection
conn = SearchConnection('https://esgf-index1.ceda.ac.uk/esg-search/', distrib=True)
ctx = conn.new_context(project='CORDEX', institute='KNMI', time_frequency='day',
                       experiment='historical', variable='tas')
ctx.hit_count
result = ctx.search()[14]
result.dataset_id
ds = ctx.search()[14]
files = ds.file_context().search()
len(files)
for f in files:
    print(f.download_url)

from pyesgf.logon import LogonManager
lm = LogonManager()
lm.logoff()
lm.is_logged_on()
OPENID = 'https://ceda.ac.uk/openid/xxx'
lm.logon_with_openid(openid=OPENID, password=None, bootstrap=True)
lm.is_logged_on()
password = 'xxx'
username = 'xxx'
myproxy_host = 'slcs1.ceda.ac.uk'
lm.logon(username, password, hostname=myproxy_host, interactive=True, bootstrap=True)
lm.is_logged_on()

import xarray as xr
ds = xr.open_dataset(f.download_url)
print(ds)
KeyError                                  Traceback (most recent call last)
D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock)
197 try:
--> 198 file = self._cache[self._key]
199 except KeyError:

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\lru_cache.py in __getitem__(self, key)
52 with self._lock:
---> 53 value = self._cache[key]
54 self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/EUR-11/KNMI/ICHEC-EC-EARTH/historical/r3i1p1/KNMI-RACMO22E/v1/day/tas/v20190108/tas_EUR-11_ICHEC-EC-EARTH_historical_r3i1p1_KNMI-RACMO22E_v1_day_20010101-20051231.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last)
in
----> 1 ds = xr.open_dataset(f.download_url)
2 print(ds)

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime, decode_timedelta)
506 engine = _get_default_engine(filename_or_obj, allow_remote=True)
507 if engine == "netcdf4":
--> 508 store = backends.NetCDF4DataStore.open(
509 filename_or_obj, group=group, lock=lock, **backend_kwargs
510 )

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
356 netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
357 )
--> 358 return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
359
360 def _acquire(self, needs_lock=True):

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in __init__(self, manager, group, mode, lock, autoclose)
312 self._group = group
313 self._mode = mode
--> 314 self.format = self.ds.data_model
315 self._filename = self.ds.filepath()
316 self.is_remote = is_remote_uri(self._filename)

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in ds(self)
365 @property
366 def ds(self):
--> 367 return self._acquire()
368
369 def open_store_variable(self, name, var):

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in _acquire(self, needs_lock)
359
360 def _acquire(self, needs_lock=True):
--> 361 with self._manager.acquire_context(needs_lock) as root:
362 ds = _nc4_require_group(root, self._group, self._mode)
363 return ds

D:\Anaconda\envs\gdal\lib\contextlib.py in __enter__(self)
111 del self.args, self.kwds, self.func
112 try:
--> 113 return next(self.gen)
114 except StopIteration:
115 raise RuntimeError("generator didn't yield") from None

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\file_manager.py in acquire_context(self, needs_lock)
184 def acquire_context(self, needs_lock=True):
185 """Context manager for acquiring a file."""
--> 186 file, cached = self._acquire_with_cache_info(needs_lock)
187 try:
188 yield file

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock)
202 kwargs = kwargs.copy()
203 kwargs["mode"] = self._mode
--> 204 file = self._opener(*self._args, **kwargs)
205 if self._mode == "w":
206 # ensure file doesn't get overriden when opened again

netCDF4\_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()

netCDF4\_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -78] NetCDF: Authorization failure: b'http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/EUR-11/KNMI/ICHEC-EC-EARTH/historical/r3i1p1/KNMI-RACMO22E/v1/day/tas/v20190108/tas_EUR-11_ICHEC-EC-EARTH_historical_r3i1p1_KNMI-RACMO22E_v1_day_20010101-20051231.nc'


larsbuntemeyer commented Nov 12, 2020

@saeedvzf I have similar problems. It's probably because you have project='CORDEX'. You need special authorization to access that data via OPeNDAP using the CORDEX project_id. I see that you have logged on, so log on to one of the web portals of the ESGF data nodes and check whether you are part of the CORDEX project in your profile. If not, you can simply click something like "join CORDEX project" at the top.
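A second pitfall worth checking (an observation about the traceback above, not a confirmed diagnosis): `f.download_url` points at the THREDDS `fileServer` path, which is a plain HTTP download, while `xr.open_dataset` treats a remote URL as an OPeNDAP endpoint. The file results also carry `f.opendap_url`; failing that, on most THREDDS servers the OPeNDAP path can be derived by swapping one path segment. A heuristic sketch (`to_opendap_url` is a hypothetical helper name):

```python
def to_opendap_url(download_url):
    # THREDDS convention (heuristic): a file served for download at
    # .../thredds/fileServer/<path> is usually also published via
    # OPeNDAP at .../thredds/dodsC/<path>
    return download_url.replace("/thredds/fileServer/", "/thredds/dodsC/", 1)
```

Even with the dodsC URL, the authorization error will persist until the CORDEX group membership above is in place.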
