Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Offer improved customization options for user subclasses of UPath #173

Merged
merged 50 commits into from
Feb 8, 2024

Conversation

ap--
Copy link
Collaborator

@ap-- ap-- commented Jan 30, 2024

WIP PR to fix #172

Please refer to the issue for a detailed description.

TODO

  • (1) provide customizable fsspec filesystem factory classmethod
  • (ALL) detect custom _accessor in py3.12 classes and provide warning with instructions on how to migrate.
  • docs (2) provide instructions for how to migrate _FSSpecAccessor._format_path to .path (with potential customization) (moved to new issue)
  • docs (3) recommend overriding UPath.mkdir, etc... directly. (moved to new issue)
  • (4) provide customizable fsspec storage_option extraction classmethod
  • Tests for all use cases
  • Need to fix (or skip?) the pathlib 3.8-3.11 tests...
  • Port the fsspec glob compatibility check to the py3.12 UPath implementation
  • Fix subclassing tests in pathlib3.12 test suite

NOTES

  • (4) is possible now via either overwriting __init__ or by providing a custom _transform_init_args method (needs more tests)
  • Since we need to refactor again in the near future when introducing pathlib_abc as a dependency, it made more sense to now make the Python3.12 implementation the default, and provide backwards support for older versions.

@ap-- ap-- added this to the v0.2 milestone Jan 30, 2024
@ap-- ap-- self-assigned this Jan 30, 2024
@ap--
Copy link
Collaborator Author

ap-- commented Feb 5, 2024

All current tests pass.
Only things missing are migration docs and some migration tests.

@ap--
Copy link
Collaborator Author

ap-- commented Feb 7, 2024

@potiuk could you please run your tests once more?

This implementation should now work without changes to ObjectStoragePath. It'll throw a ton of deprecation warnings but should in theory be completely backwards compatible. Anyhow, I'll try to add the migration docs asap and get the new release out.

@potiuk
Copy link

potiuk commented Feb 7, 2024

@potiuk
Copy link

potiuk commented Feb 8, 2024

Rerun it again :) ?

@ap--
Copy link
Collaborator Author

ap-- commented Feb 8, 2024

Rerun it again :) ?

Thanks! But I think it's not necessary.

The ObjectStoragePath subclass accesses and overwrites a lot of private functionality.
It won't be possible to support all of this in a backwards compatible manner.

I do now cover a lot of different use cases for maintainable subclassing though.

And thank you so much for your fast response time and help with running your test suite!
It uncovered quite a few edge cases that hadn't come to mind otherwise.

@ap-- ap-- merged commit 5c240c1 into fsspec:main Feb 8, 2024
18 checks passed
@potiuk
Copy link

potiuk commented Feb 10, 2024

Ok. I see it's merged now. If I understand correctly - the universal_pathlib will likely release a 0.2.0 (?) version with both Py3.12 some time soon - and the way we are using 0.1.4 is not going to be compatible with it ? Or maybe it will be 0.1.5 or 1.0 :)?

I am asking because we are just about to release Airflow 2.8.2, and since we know that future releases of universal_pathlib will break Airflow, we would rather put some limits in our dependencies to prevent it. Currently we have >=0.1.4 - as we do for most of our dependencies, as we only upper-bind dependencies when we expect breaking changes (whch is the case).

When do you expect this to happen (very roughly, I know those things are difficult to foresee) ?

Do I understand correctly that we will have to change our code to adapt to the new interfaces? Can you help with that - maybe together with @bolkedebruin - you can agree on implementing the right set of changes that will be less "private" and future compatible.

@ap--
Copy link
Collaborator Author

ap-- commented Feb 10, 2024

I went through all code todos now, and am ready to release a new version.
The new version number will be v0.2.0.
Release will be earliest Monday 12.02 ~2000 UTC, but also not too much later.

Right now I still need to write instructions on how to migrate old patterns to the newer style, to make the transition smooth for downstream libraries that subclassed UPath.

If you release a new version of Airflow without changes to your ObjectStoragePath implementation, then I recommend to guard your dependencies.

I'm happy to help with a PR to port ObjectStoragePath to the new style.
It should be possible to make changes to support both 0.1.4 and 0.2.0+ versions too.

@potiuk
Copy link

potiuk commented Feb 10, 2024

Thanks! Scheduled it for 2.8.2 release with apache/airflow#37311. We already have good mechanism to protect against such cases (i.e. constraints ) but not everyone is using them, so yeah, until we migrate, we will limit it.

potiuk added a commit to potiuk/airflow that referenced this pull request Feb 10, 2024
The Universal Pathlib provides  Pathlib-like interface for FSSPEC
In 0.1. *It was not very well defined for extension, so the way how we use it for 0.1.*
so we used a lot of private methods and attributes that were not defined in the interface
an they are broken with version 0.2.0 which is much better suited for extension and supports
Python 3.12. We should limit it, unti we migrate to 0.2.0
See: fsspec/universal_pathlib#173 (comment)
This is prerequistite to make Airflow compatible with Python 3.12
Tracked in apache#36755
potiuk added a commit to apache/airflow that referenced this pull request Feb 10, 2024
The Universal Pathlib provides  Pathlib-like interface for FSSPEC
In 0.1. *It was not very well defined for extension, so the way how we use it for 0.1.*
so we used a lot of private methods and attributes that were not defined in the interface
an they are broken with version 0.2.0 which is much better suited for extension and supports
Python 3.12. We should limit it, unti we migrate to 0.2.0
See: fsspec/universal_pathlib#173 (comment)
This is prerequistite to make Airflow compatible with Python 3.12
Tracked in #36755
potiuk added a commit to apache/airflow that referenced this pull request Feb 13, 2024
The Universal Pathlib provides  Pathlib-like interface for FSSPEC
In 0.1. *It was not very well defined for extension, so the way how we use it for 0.1.*
so we used a lot of private methods and attributes that were not defined in the interface
an they are broken with version 0.2.0 which is much better suited for extension and supports
Python 3.12. We should limit it, unti we migrate to 0.2.0
See: fsspec/universal_pathlib#173 (comment)
This is prerequistite to make Airflow compatible with Python 3.12
Tracked in #36755

(cherry picked from commit 1301274)
ephraimbuddy pushed a commit to apache/airflow that referenced this pull request Feb 22, 2024
The Universal Pathlib provides  Pathlib-like interface for FSSPEC
In 0.1. *It was not very well defined for extension, so the way how we use it for 0.1.*
so we used a lot of private methods and attributes that were not defined in the interface
an they are broken with version 0.2.0 which is much better suited for extension and supports
Python 3.12. We should limit it, unti we migrate to 0.2.0
See: fsspec/universal_pathlib#173 (comment)
This is prerequistite to make Airflow compatible with Python 3.12
Tracked in #36755

(cherry picked from commit 1301274)
@swails
Copy link

swails commented Jun 4, 2024

I recently ran a git-bisect to identify a regression that impacted my use of S3Path, and it identified 5c240c1db985dff4899c1f84449054f2ad1cfe66 as the "first bad commit". here's the simple test:

from upath.implementations.cloud import S3Path

assert str(S3Path("s3://bucket/prefix/to/key/")).endswith("/")

After this PR, S3Path started stripping trailing slashes. While this doesn't matter for traditional paths, S3 paths are really prefixes in a large, flat namespace, and the impact can be severe. Hive naming schemes are pretty commonly used, and we run into problems here, because s3://bucket/path/to/partition=1/ will return only files from partition=1/*. When S3Path started stripping trailing slashes, the silent respelling to s3://bucket/path/to/partition=1 began to match any partition 10, 11, 12, ..., 19, 100, 101, ..., 199, etc.

Perhaps the problem is that the way we're using S3 paths is too different from the abstract concept of a Path, and we need to stick with strings, but this is at least a regression in behavior that started with v0.2.0 (works fine in v0.1.4).

kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jul 19, 2024
The Universal Pathlib provides  Pathlib-like interface for FSSPEC
In 0.1. *It was not very well defined for extension, so the way how we use it for 0.1.*
so we used a lot of private methods and attributes that were not defined in the interface
an they are broken with version 0.2.0 which is much better suited for extension and supports
Python 3.12. We should limit it, unti we migrate to 0.2.0
See: fsspec/universal_pathlib#173 (comment)
This is prerequistite to make Airflow compatible with Python 3.12
Tracked in apache/airflow#36755
GitOrigin-RevId: 13012744ada457883e57848f6fc45454d9c25a4c
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 20, 2024
The Universal Pathlib provides  Pathlib-like interface for FSSPEC
In 0.1. *It was not very well defined for extension, so the way how we use it for 0.1.*
so we used a lot of private methods and attributes that were not defined in the interface
an they are broken with version 0.2.0 which is much better suited for extension and supports
Python 3.12. We should limit it, unti we migrate to 0.2.0
See: fsspec/universal_pathlib#173 (comment)
This is prerequistite to make Airflow compatible with Python 3.12
Tracked in apache/airflow#36755
GitOrigin-RevId: 13012744ada457883e57848f6fc45454d9c25a4c
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Nov 9, 2024
The Universal Pathlib provides  Pathlib-like interface for FSSPEC
In 0.1. *It was not very well defined for extension, so the way how we use it for 0.1.*
so we used a lot of private methods and attributes that were not defined in the interface
an they are broken with version 0.2.0 which is much better suited for extension and supports
Python 3.12. We should limit it, unti we migrate to 0.2.0
See: fsspec/universal_pathlib#173 (comment)
This is prerequistite to make Airflow compatible with Python 3.12
Tracked in apache/airflow#36755
GitOrigin-RevId: 13012744ada457883e57848f6fc45454d9c25a4c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide py38-py311 and py312 compatible way for subclassing
3 participants