Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DirFileSystem #745

Merged
merged 41 commits into from
Jan 14, 2022
Merged

Add DirFileSystem #745

merged 41 commits into from
Jan 14, 2022

Conversation

lucmos
Copy link
Contributor

@lucmos lucmos commented Sep 9, 2021

The DirFileSystem is a filesystem-wrapper. It assumes every path it is dealing with
is relative to its path. After performing the necessary paths operations it delegates
everything to the wrapped filesystem.

Resolves #395

@lucmos lucmos force-pushed the feature/prefixfs branch 2 times, most recently from 3e5f154 to 7686547 Compare September 9, 2021 08:15
@martindurant
Copy link
Member

Since this is quite substantial, it will take us some time to review. Please bear with us. @isidentical , do you or any of your team have an interest here?

@lucmos , am I right in thinking that this implementation wouldn't participate in URL unpacking, i.e., there would be no point in doing something like fsspec.open("prefix://mypath::s3://", prefix={"root": "mybucket"})

@isidentical
Copy link
Member

CC: @efiop

@efiop
Copy link
Member

efiop commented Sep 21, 2021

From dvc perspective, we don't have a notion of cwd in our internals, and choose to operate on absolute paths, which has its problems. We might consider trying this out for localfs, but it is not going to be soon 🙁 prefixfs is cool though 🙂

@lucmos
Copy link
Contributor Author

lucmos commented Sep 22, 2021

Hello, thanks for the replies and comments!

@martindurant yes, the idea is that PrefixFileSystem would only wrap another filesystem already instantiated, without any additional URL unpacking, and modify all paths to be relative to its own prefix.
I think in the future it may make sense to consider a built-in wrapping, e.g. fs = fsspec.filesystem('file', prefix='myroot') or fsspec.open("s3://my-path-relative-to-mybucket", prefix="mybucket") would internally wrap the file system instantiated with PrefixFileSystem , but I think it is out of scope for this MR

@efiop Thanks for the review! I will work on it as soon as possible

@lucmos lucmos force-pushed the feature/prefixfs branch 2 times, most recently from 8517ccb to 6afff88 Compare September 22, 2021 11:51
@lucmos
Copy link
Contributor Author

lucmos commented Sep 22, 2021

I would like to increase the number of tests by reusing the tests written for other file system, and wrapping that filesystem in this PrefixFileSystem.
Do you have any pointer on how to do that?

I tried to do this with the LocalFileSystem but it is not straightforward to re-parametrize everything to work with both LocalFileSystem and PrefixFileSystem

@martindurant
Copy link
Member

reusing the tests written for other file system, and wrapping that filesystem in this PrefixFileSystem.
Do you have any pointer on how to do that?

Not really! Again, the idea in #651 might give a framework that makes such a thing easier, but it's still not super-clear.

@martindurant
Copy link
Member

(I don't think #651 is making any progress, though)

@efiop
Copy link
Member

efiop commented Sep 24, 2021

@lucmos Btw, just curious, what's the plan with AbstractFile (maybe also something else that I'm missing?) that has path "public" property, would we need to wrap that as well somehow?

@lucmos
Copy link
Contributor Author

lucmos commented Sep 24, 2021

@efiop not sure I got what you are referring to. Do you have a link to AbstractFile?

@martindurant
Copy link
Member

That would be fsspec.spec.AbstractBufferedFile , used by several backends which don't have their own native file-like object. The instances have an fs attribute holding the parent filesystem, so it makes sense if their .path is in the terms of that filesystem. Except that we are probably making this file-like object via the wrapped filesystem's .open method, which won' know that the filesystem has been wrapped.

However, where there is a native file of file wrapper (like LocalFileOpener), probably that instance contains details of the path in terms of the wrapped filesystem and we can do nothing about it. If it's not in a .path attribute, it might not matter.

@lucmos
Copy link
Contributor Author

lucmos commented Sep 24, 2021

I see, thanks for the explanation!

Would something like this be enough?

https://github.com/intake/filesystem_spec/blob/c7f763f3810abbb84707b602ba37435ca6afa559/fsspec/implementations/prefix.py#L158-L163

@martindurant
Copy link
Member

That would work, in the sense that you would be able to open the right file, but its .path attribute would be the full one, and .fs attribute, if present, would be the wrapped filesystem, not the prefix filesystem. Maybe that's OK (but the docstring for open should say this clearly).

@lucmos
Copy link
Contributor Author

lucmos commented Sep 27, 2021

I see, I think it may be OK to do this and document in the docstrings.
What do you think?

@martindurant
Copy link
Member

do this and document in the docstrings

That's probably the easiest thing to do, I don't have a problem with it.

@lucmos
Copy link
Contributor Author

lucmos commented Oct 1, 2021

Sorry for the delay, I updated the open docstring

fsspec/spec.py Outdated Show resolved Hide resolved
@martindurant
Copy link
Member

Please add your implementation to the API docs.

@efiop efiop changed the title Add DirFileSystem [WIP] Add DirFileSystem Jan 14, 2022
@efiop efiop changed the title [WIP] Add DirFileSystem Add DirFileSystem Jan 14, 2022
@efiop efiop marked this pull request as ready for review January 14, 2022 19:11
@efiop
Copy link
Member

efiop commented Jan 14, 2022

Ok, I've added unit tests for all methods for sync/async_impl/asynchronous filesystems, to keep it lightweight and maintainable. Ideally, some func tests would be really nice too, but we don't yet have nice fixtures in fsspec (we have some in dvc, but didn't get to contributing them yet) to reuse different filesystems in a unified way (e.g. fixtures that will automatically generate temp paths in every filesystem, etc), so leaving it for later.

I'm not happy with some *args, **kwargs in methods, because they try to preserve positional args, but sync/async specs lack * arg guards in them, so we'll likely get back to that later. There are also some questionable transformations or lack thereof in results of some methods (e.g. name is transformed in ls because it is useful and is not transformed in info, as well as destination). But we can add those on top.

@efiop efiop merged commit 00b8123 into fsspec:master Jan 14, 2022
@efiop efiop added the enhancement New feature or request label Jan 14, 2022
efiop added a commit to efiop/filesystem_spec that referenced this pull request Jan 25, 2022
This is used in multiple implementations (s3fs, http, etc) that allow
_external_ async use and is documented in fsspec docs (but the example
is http-specific), so let's add it to the spec, so that we could officially
wrap it in fsspec#745
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow setting the root of a filesystem
6 participants