
Accessing opendap datasets #26

Open
jhamman opened this issue Mar 15, 2018 · 5 comments


jhamman commented Mar 15, 2018

I am working on what I think is a fairly common workflow:

  1. log on to ESGF using the LogonManager class
  2. search for some datasets using the SearchConnection class
  3. access some opendap dataset using netcdf4-python or pydap

Here's an example workflow:

In [1]: openid = 'https://esgf-node.llnl.gov/esgf-idp/openid/SECRET'
   ...: password = 'SECRET'
   ...:

In [2]: from pyesgf.logon import LogonManager
   ...: from pyesgf.search import SearchConnection
   ...: import xarray as xr
   ...:

In [3]: # initialize the logon manager
   ...: lm = LogonManager(verify=True)
   ...: if not lm.is_logged_on():
   ...:     lm.logon_with_openid(openid, password, 'pcmdi9.llnl.gov')
   ...: lm.is_logged_on()
   ...:
Out[3]: True

In [4]: def print_context_info(ctx):
   ...:     print('Hits:', ctx.hit_count)
   ...:     print('Experiments:', ctx.facet_counts['experiment'])
   ...:     print('Realms:', ctx.facet_counts['realm'])
   ...:     print('Ensembles:', ctx.facet_counts['ensemble'])
   ...:

In [5]: # search for some data
   ...: conn = SearchConnection('http://pcmdi9.llnl.gov/esg-search', distrib=True)
   ...: ctx = conn.new_context(project='CMIP5', model='CCSM4',
   ...:                        experiment='rcp85', time_frequency='day')
   ...: ctx = ctx.constrain(realm='atmos', ensemble='r1i1p1')
   ...:
   ...: # print a summary of what we found
   ...: print_context_info(ctx)
   ...:
Hits: 4
Experiments: {'rcp85': 4}
Realms: {'atmos': 4}
Ensembles: {'r1i1p1': 4}

In [6]: # aggregate results
   ...: result = ctx.search()[0]
   ...: agg_ctx = result.aggregation_context()
   ...:
   ...: # get a list of opendap urls
   ...: x = list(a.opendap_url for a in agg_ctx.search() if a.opendap_url)
   ...: x
   ...:
Out[6]:
['http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.tasmin.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.tasmax.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.prc.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.psl.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.tas.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.pr.20120705.aggregation.1']

In [7]: # try opening one of the opendap datasets
   ...: xr.open_dataset(x[0], engine='pydap')
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-7-90d39efb83f7> in <module>()
      1 # try opening one of the opendap datasets
----> 2 xr.open_dataset(x[0], engine='pydap')

~/anaconda/envs/aist/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables)
    302                                             autoclose=autoclose)
    303         elif engine == 'pydap':
--> 304             store = backends.PydapDataStore.open(filename_or_obj)
    305         elif engine == 'h5netcdf':
    306             store = backends.H5NetCDFStore(filename_or_obj, group=group,

~/anaconda/envs/aist/lib/python3.6/site-packages/xarray/backends/pydap_.py in open(cls, url, session)
     75     def open(cls, url, session=None):
     76         import pydap.client
---> 77         ds = pydap.client.open_url(url, session=session)
     78         return cls(ds)
     79

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/client.py in open_url(url, application, session, output_grid)
     62     never retrieve coordinate axes.
     63     """
---> 64     dataset = DAPHandler(url, application, session, output_grid).dataset
     65
     66     # attach server-side functions

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/handlers/dap.py in __init__(self, url, application, session, output_grid)
     62
     63         # build the dataset from the DDS and add attributes from the DAS
---> 64         self.dataset = build_dataset(dds)
     65         add_attributes(self.dataset, parse_das(das))
     66

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/dds.py in build_dataset(dds)
    159 def build_dataset(dds):
    160     """Return a dataset object from a DDS representation."""
--> 161     return DDSParser(dds).parse()
    162
    163

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/dds.py in parse(self)
     47         dataset = DatasetType('nameless')
     48
---> 49         self.consume('dataset')
     50         self.consume('{')
     51         while not self.peek('}'):

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/dds.py in consume(self, regexp)
     39     def consume(self, regexp):
     40         """Consume and return a token."""
---> 41         token = super(DDSParser, self).consume(regexp)
     42         self.buffer = self.buffer.lstrip()
     43         return token

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/__init__.py in consume(self, regexp)
    180             self.buffer = self.buffer[len(token):]
    181         else:
--> 182             raise Exception("Unable to parse token: %s" % self.buffer[:10])
    183         return token

Exception: Unable to parse token:

Questions:

  1. Is this actually a workflow that should work?
  2. Does this opendap URL actually exist? What is the best way to test that an OPeNDAP URL from ESGF is a valid one?
  3. Is additional authentication required?
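On question 2, one lightweight way to probe whether an OPeNDAP endpoint is live is to fetch its `.dds` companion document and check that it looks like a textual `Dataset { ... }` description rather than an HTML error page. A minimal sketch (`looks_like_dds` and `is_valid_opendap_url` are hypothetical helper names, not part of pyesgf or pydap):

```python
from urllib.request import urlopen


def looks_like_dds(text):
    """A live OPeNDAP endpoint answers <url>.dds with a textual
    'Dataset { ... }' block; error pages are usually HTML."""
    return text.lstrip().startswith("Dataset")


def is_valid_opendap_url(url, timeout=30):
    """Fetch the DDS companion document; a parseable DDS suggests the
    endpoint is live. Any network or HTTP error counts as invalid."""
    try:
        with urlopen(url + ".dds", timeout=timeout) as resp:
            return resp.status == 200 and looks_like_dds(
                resp.read().decode("utf-8", "replace"))
    except OSError:
        return False
```

Note this only checks reachability; a URL can be live yet still require the authorization discussed below.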
agstephens (Contributor) commented:

Hi @jhamman, right now I don't have time to look into your issue, but please see if this example sheds any light on your questions:
https://github.com/cehbrecht/demo-notebooks/blob/master/esgf-opendap.ipynb


jhamman commented Mar 21, 2018

@agstephens - Indeed, I had seen this notebook. As far as I can tell, the problem seems to lie in the use of aggregation-context URLs for opendap datasets.
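If the aggregation endpoints are the culprit, a possible fallback (a sketch, not a confirmed fix; `collect_opendap_urls` is a hypothetical helper, and the commented lines assume a `result` from the search session above plus working credentials and network access) is to use the per-file OPeNDAP URLs from `result.file_context()` instead:

```python
def collect_opendap_urls(file_results):
    # keep only the results that actually expose an OPeNDAP endpoint
    return [f.opendap_url for f in file_results
            if getattr(f, "opendap_url", None)]

# With `result` from the search above (requires network and a valid login):
#   files = result.file_context().search()
#   urls = collect_opendap_urls(files)
#   ds = xr.open_mfdataset(urls)  # one variable's files as a single dataset
```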

cehbrecht (Collaborator) commented:

@jhamman Late answer ... there might be several issues, but none related to esgf-pyclient. The aggregation might not work, and it also looks like pydap needs to be updated to work with ESGF.

I tried it with a CORDEX aggregation and I can't get pydap working:
https://github.com/cehbrecht/jupyterlab-notebooks/blob/master/esgf-examples/esgf-pydap.ipynb

See also:
https://pydap.readthedocs.io/en/latest/client.html?#earth-system-grid-federation-esgf
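The page linked above documents an ESGF-specific session helper in pydap. Per those docs, something along these lines should authenticate pydap against ESGF (untested here: it needs network access, a valid OpenID and password, and a real dataset URL in place of the placeholder):

```python
import xarray as xr
from pydap.cas.esgf import setup_session

url = "http://host/thredds/dodsC/<some-esgf-dataset>"  # placeholder URL
# openid/password as in the logon step above
session = setup_session(openid, password, check_url=url)
store = xr.backends.PydapDataStore.open(url, session=session)
ds = xr.open_dataset(store)
```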


saeedvzf commented Oct 2, 2020

Hi, I hope you all are doing well,

Can anyone help me to overcome this issue?

My OpenID is working and is connected to my ESGF account.

Please let me know if you need more information.

Thank you,

Saeed

from pyesgf.search import SearchConnection
conn = SearchConnection('https://esgf-index1.ceda.ac.uk/esg-search/', distrib=True)
ctx = conn.new_context(project='CORDEX', institute='KNMI', time_frequency='day',
                       experiment='historical', variable='tas')
ctx.hit_count
result = ctx.search()[14]
result.dataset_id
ds = ctx.search()[14]
files = ds.file_context().search()
len(files)
for f in files:
    print(f.download_url)

from pyesgf.logon import LogonManager
lm = LogonManager()
lm.logoff()
lm.is_logged_on()
OPENID = 'https://ceda.ac.uk/openid/xxx'
lm.logon_with_openid(openid=OPENID, password=None, bootstrap=True)
lm.is_logged_on()
password = 'xxx'
username = 'xxx'
myproxy_host = 'slcs1.ceda.ac.uk'
lm.logon(username, password, hostname=myproxy_host, interactive=True, bootstrap=True)
lm.is_logged_on()

import xarray as xr
ds = xr.open_dataset(f.download_url)
print(ds)
KeyError                                  Traceback (most recent call last)
D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock)
197 try:
--> 198 file = self._cache[self._key]
199 except KeyError:

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\lru_cache.py in __getitem__(self, key)
52 with self._lock:
---> 53 value = self._cache[key]
54 self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/EUR-11/KNMI/ICHEC-EC-EARTH/historical/r3i1p1/KNMI-RACMO22E/v1/day/tas/v20190108/tas_EUR-11_ICHEC-EC-EARTH_historical_r3i1p1_KNMI-RACMO22E_v1_day_20010101-20051231.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last)
in
----> 1 ds = xr.open_dataset(f.download_url)
2 print(ds)

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime, decode_timedelta)
506 engine = _get_default_engine(filename_or_obj, allow_remote=True)
507 if engine == "netcdf4":
--> 508 store = backends.NetCDF4DataStore.open(
509 filename_or_obj, group=group, lock=lock, **backend_kwargs
510 )

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
356 netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
357 )
--> 358 return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
359
360 def _acquire(self, needs_lock=True):

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in __init__(self, manager, group, mode, lock, autoclose)
312 self._group = group
313 self._mode = mode
--> 314 self.format = self.ds.data_model
315 self._filename = self.ds.filepath()
316 self.is_remote = is_remote_uri(self._filename)

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in ds(self)
365 @property
366 def ds(self):
--> 367 return self._acquire()
368
369 def open_store_variable(self, name, var):

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in _acquire(self, needs_lock)
359
360 def _acquire(self, needs_lock=True):
--> 361 with self._manager.acquire_context(needs_lock) as root:
362 ds = _nc4_require_group(root, self._group, self._mode)
363 return ds

D:\Anaconda\envs\gdal\lib\contextlib.py in __enter__(self)
111 del self.args, self.kwds, self.func
112 try:
--> 113 return next(self.gen)
114 except StopIteration:
115 raise RuntimeError("generator didn't yield") from None

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\file_manager.py in acquire_context(self, needs_lock)
184 def acquire_context(self, needs_lock=True):
185 """Context manager for acquiring a file."""
--> 186 file, cached = self._acquire_with_cache_info(needs_lock)
187 try:
188 yield file

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock)
202 kwargs = kwargs.copy()
203 kwargs["mode"] = self._mode
--> 204 file = self._opener(*self._args, **kwargs)
205 if self._mode == "w":
206 # ensure file doesn't get overriden when opened again

netCDF4\_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()

netCDF4\_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -78] NetCDF: Authorization failure: b'http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/EUR-11/KNMI/ICHEC-EC-EARTH/historical/r3i1p1/KNMI-RACMO22E/v1/day/tas/v20190108/tas_EUR-11_ICHEC-EC-EARTH_historical_r3i1p1_KNMI-RACMO22E_v1_day_20010101-20051231.nc'


larsbuntemeyer commented Nov 12, 2020

@saeedvzf I have similar problems. It's probably because you have project='CORDEX'. You need special authorization to access that data via OPeNDAP using the CORDEX project_id. I see that you have logged on, so log on to one of the web portals of the ESGF data nodes and check whether you are part of the CORDEX project in your profile. If not, you can simply click something like "join CORDEX project" at the top.
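A second pitfall worth checking (an observation about the traceback above, not a confirmed diagnosis): `f.download_url` points at the THREDDS `fileServer` path, which is a plain HTTP download, while `xr.open_dataset` treats a remote URL as an OPeNDAP endpoint. The file results also carry `f.opendap_url`; failing that, on most THREDDS servers the OPeNDAP path can be derived by swapping one path segment. A heuristic sketch (`to_opendap_url` is a hypothetical helper name):

```python
def to_opendap_url(download_url):
    # THREDDS convention (heuristic): a file served for download at
    # .../thredds/fileServer/<path> is usually also published via
    # OPeNDAP at .../thredds/dodsC/<path>
    return download_url.replace("/thredds/fileServer/", "/thredds/dodsC/", 1)
```

Even with the dodsC URL, the authorization error will persist until the CORDEX group membership above is in place.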
