How to link a jupyterlab instance running with EGI-dev-checkin to object store at CESNET? #14
Hi,

Here are the docs regarding S3 access:

However, at the moment the configuration of
Instead, we have updated our documentation to include Rclone:

Could we try this out with an example notebook running on https://pangeo-foss4g.vm.fedcloud.eu/jupyterhub/ ?
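For the Rclone route, a minimal remote definition for a Swift backend could look like the following. This is only a sketch: the remote name `cesnet` is made up, and it assumes the usual `OS_*` environment variables are already set (e.g. after sourcing an OpenStack RC file or issuing a token).

```ini
[cesnet]
type = swift
# Read credentials from OS_AUTH_URL / OS_USERNAME / OS_AUTH_TOKEN /
# OS_STORAGE_URL environment variables instead of storing them here
env_auth = true
```

With that in place, something like `rclone ls cesnet:pangeo-test` would list a container (the container name is hypothetical).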
Hi @sebastian-luna-valero @tinaok,

I think there is a more general question than the one from Tina above: how to have read/write access to an object store from JupyterLab, be it from a browser or from a notebook cell? The ultimate goal is to be able to write Zarr/NetCDF/GeoTIFF datasets to this object store using Dask clusters deployed over Kubernetes and the Xarray APIs. Being able to browse them with JupyterLab is maybe a bit less important.

If we can have S3 keys, this is great, even if this is manual for each of us. But I guess this means that only trusted people (and not trainees from a workshop) could have one.

Using Rclone is not an option: it could be useful for browsing and exploring the object store, but it can't be used to write with Xarray.

Finally, we could dig into the possibility of using Swift directly through the corresponding fsspec implementation.
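To make the goal concrete, here is a minimal sketch of what writing Zarr to the store with Xarray over s3fs could look like, assuming we do obtain an S3 key/secret pair. The bucket name `pangeo-test`, the environment variable names, and the placeholder credentials are all assumptions; the endpoint is the CESNET one mentioned later in this thread.

```python
import os

# Hypothetical bucket and credentials; the endpoint is the CESNET one
# discussed in this thread
storage_options = {
    "key": os.environ.get("AWS_ACCESS_KEY_ID", "<access-key>"),
    "secret": os.environ.get("AWS_SECRET_ACCESS_KEY", "<secret-key>"),
    "client_kwargs": {"endpoint_url": "https://object-store.cloud.muni.cz"},
}

try:
    import xarray as xr

    ds = xr.Dataset({"temperature": ("x", [1.0, 2.0, 3.0])})
    # Streams the dataset to the store as Zarr chunks; with a Dask-backed
    # dataset, the chunks are written in parallel by the workers
    ds.to_zarr("s3://pangeo-test/demo.zarr", mode="w",
               storage_options=storage_options)
except Exception:
    # xarray/s3fs may be missing, or the credentials may be placeholders
    pass
```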
Hi,
I had a brief look at https://pypi.org/project/swiftspec/ (is this the fsspec implementation for Swift?) and I wasn't convinced. You need to specify the user:

```python
import fsspec

with fsspec.open("swift://server/account/container/object.txt", "r") as f:
    print(f.read())
```

Instead, I successfully tested https://pypi.org/project/zarr-swiftstore/ with:

```python
auth = {
    "preauthurl": os.environ["OS_STORAGE_URL"],
    "preauthtoken": os.environ["OS_AUTH_TOKEN"],
}
```

The values for these environment variables can be obtained the same way as for

Could you please give it a try and let me know how it goes?
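For reference, the rest of a zarr-swiftstore round trip might look roughly like this. It is a sketch based on the package's README: the container and prefix names are made up, and the fallback values for the environment variables are placeholders, not real endpoints or tokens.

```python
import os

# Values normally come from the OS_* environment (e.g. after
# `openstack token issue`); the fallbacks here are placeholders
auth = {
    "preauthurl": os.environ.get("OS_STORAGE_URL", "<storage-url>"),
    "preauthtoken": os.environ.get("OS_AUTH_TOKEN", "<auth-token>"),
}

try:
    import zarr
    from zarrswift import SwiftStore  # pip install zarr-swiftstore

    # Container and prefix names are hypothetical
    store = SwiftStore(container="pangeo-test", prefix="demo.zarr",
                       storage_options=auth)
    root = zarr.group(store=store, overwrite=True)
    z = root.zeros("data", shape=(100, 100), chunks=(10, 10), dtype="f8")
    z[:] = 42.0
except Exception:
    # zarr-swiftstore may be missing, or the token may be a placeholder
    pass
```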
I thought the goal was to enable read/write access to an object store for everybody. If only trainers need write access, then asking the site admins for S3 credentials is an option.
Yes, I guess this is the one with the source code here: https://github.com/fsspec/swiftspec. The advantage of an fsspec implementation is that you can use it with Zarr, but also with other file formats. We could also consider contributing to this fsspec package to improve it. How is it a problem to have an account in the URL? I see in the README that it also uses the same environment variables (https://github.com/fsspec/swiftspec#authentication).

Anyway, the zarr-swiftstore package also looks really promising and might solve part of the problem! We need to find all the resources for using Pangeo on a standard OpenStack deployment, and this is certainly one of them. I will try the two approaches in the coming days.
My personal point of view is that it is really not crucial for trainees to have write access to the object store. However, we will also develop real use cases at scale on this platform, and application developers such as @pl-marasco, @acocac or @tinaok really need to be able to write to this store. I think we should explore the S3 credentials option too!
Hi,
Maybe I am missing something basic here, but if I write a pipeline to create:

Would you need my credentials to access it? And if I upload my pipeline to GitHub, will somebody be able to re-run it? I did try using

Again, the problem I see here is with reproducibility and self-service. Something that might not be relevant during the testing phase, but it will become a major issue when moving to production. So, in my personal opinion, this would be the last resort.
xref: fsspec/swiftspec#6
Thanks @sebastian-luna-valero for all the work and discussion here!

So yes, after testing a bit, it seems that https://github.com/fsspec/swiftspec is not really compatible with our needs or with the CESNET Swift store. It uses a URL to pass a lot of arguments and just parses it to make the request to the object store, but this doesn't work for our settings. From what I can see on the OpenStack Dashboard, we should access Swift with a URL like https://object-store.cloud.muni.cz/swift/v1/pangeo-test/, so I guess our server is https://object-store.cloud.muni.cz/swift.

This leaves us currently with zarr-swiftstore (which I was not able to test due to my connection problems), which should solve distributed Zarr writing and reading, and also with S3 anonymous access for reads through s3fs.

I think a really good thing for the community would be a proper fsspec implementation for Swift, either by building on top of swiftspec or by starting from scratch. We could probably get a bit of support from the Pangeo community and the fsspec maintainers, but I've no idea how hard this would be.

Finally, about the S3 credentials: I understand your concern, but I'm not sure we want to reach something like "production" here, at least not in the short to medium term. The infrastructure is made for workshops and for scientific research, which rarely have production concerns. Could we at least get some credentials to test? How can we request them?
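A hedged sketch of the anonymous-read path through s3fs, using the endpoint inferred from the Dashboard URL above (the bucket name is hypothetical, and this only works for buckets made publicly readable):

```python
# Endpoint inferred from the OpenStack Dashboard URL discussed above
client_kwargs = {"endpoint_url": "https://object-store.cloud.muni.cz"}

try:
    import s3fs

    # anon=True skips credentials entirely: only public buckets are visible
    fs = s3fs.S3FileSystem(anon=True, client_kwargs=client_kwargs)
    print(fs.ls("pangeo-test"))  # hypothetical public bucket
except Exception:
    # s3fs may be missing, or the bucket may not be public
    pass
```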
Hi,

Revisiting this topic today, I just realised that CESNET provides these instructions (which are not available across all EGI Cloud providers):

Therefore, getting S3 credentials is self-service. Use
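If I remember the OpenStack CLI correctly, the self-service flow boils down to something like the following sketch (the exact EGI Check-in/token setup depends on the CESNET instructions linked above, and assumes the CLI is already authenticated against the site):

```shell
# Create an EC2-style (S3-compatible) access/secret key pair bound to
# your OpenStack project, then list it to read back the keys
openstack ec2 credentials create
openstack ec2 credentials list
```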
I just tested it with the S3 Object Storage browser on https://pangeo-foss4g.vm.fedcloud.eu/jupyterhub/ and it works. Please give it a try and let me know how it goes.
I just tested the JupyterLab object storage browser extension with these instructions, and it works perfectly well. I'm going to close this issue as I merged #15. Instructions on how to obtain an S3 access/secret key pair can be found here: https://github.com/pangeo-data/pangeo-eosc/blob/main/EGI-CLI-Swift-S3.md#retrieve-s3-credentials. Thanks a lot @sebastian-luna-valero!
On our FOSS4G test EOSC instance's JupyterLab interface, we have the S3 Object Storage Browser.

Here I guess the endpoint URL would be 'https://object-store.cloud.muni.cz'.

But what would be the Access Key ID or Secret Access Key? I tried to use a token from https://aai.egi.eu/token/ as the Session Token, but I got the following error.