Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Python 3.12 #152

Merged
merged 61 commits into from
Jan 30, 2024
Merged

Support Python 3.12 #152

merged 61 commits into from
Jan 30, 2024

Conversation

ap--
Copy link
Collaborator

@ap-- ap-- commented Oct 1, 2023

This is going to be a bit difficult. But it will close #137.

some background

We avoided refactoring UPath for 3.11 by vendoring 3.10 pathlib code as much as possible.
Now things have changed again (3.12) (for the better!) and will change again (3.13) (to waaay better).

The roadmap is going to be, that at some point in time Python 3.13's pathlib.PathBase is available in an early version as a backport for older pythons in the pathlib2 library pathlib-abc library. At that point, we can just base our implementation of UPath on the latest version of PathBase, and support older pythons via the backport.

Until then I introduce a Python3.12 compatible implementation in upath that will be used when you're on 3.12 and otherwise we fallback to the current implementation.

current state

This s a wip PR for now, and I still need to fix:

  • there's an issue with subclassing
  • implementation not working azure (touch() is incorrect)
  • implementation not working hdf
  • implementation not working http
  • implementation not working webdav
  • normalization tests for memory:// uri's need revisiting
  • update PosixUPath and WindowsUPath compatibility tests
  • fix PosixUPath subclassing tests
  • fix issues with LocalPath on windows

todo

  • move non-python 3.12 related changes to individual PRs
  • fix python 3.12 MemoryPath normalization
  • need to update the vendored pathlib tests commented tests fixed in 3.12.1

@ap--
Copy link
Collaborator Author

ap-- commented Jan 26, 2024

Thank you so much for offering to help!

I am working on getting out a pre-release with 3.12 support soon.
It would be great if you could help testing that release before we roll out a new version.

@potiuk
Copy link

potiuk commented Jan 26, 2024

Absolutely. I am subscribed to the issu here, so just comment it here and I will test it. we have a number of test failing now and Airflow has generally quite comprehensive set of tests, so happy to pass it through our machinery.

@ap-- ap-- marked this pull request as ready for review January 27, 2024 16:45
@potiuk
Copy link

potiuk commented Jan 27, 2024

Seeing the green PR - I took it for a spin in Airflow and installed it using github URL from this branch and I can confirm all failures in the pathlib related tests are gone!

@potiuk
Copy link

potiuk commented Jan 27, 2024

We still have a few other tests failing (but they are mostly internal/test related but the pathlib ones are working fine https://github.com/apache/airflow/actions/runs/7680234838/job/20932304380?pr=36755

@ap--
Copy link
Collaborator Author

ap-- commented Jan 28, 2024

@potiuk thanks for testing this branch. Tbh I am surprised that the tests related to UPath pass.

I looked at the custom ObjectStoragePath subclass in airflow (https://github.com/apache/airflow/blob/6f41010269e4a2c052b564f8f0f977abbaf5dfc7/airflow/io/path.py#L71), and the way __new__ is implemented, instantiating ObjectStoragePath should crash on python 3.12 because the _from_parts classmethod does not exist anymore.

Is it possible that the test suite is not running on python 3.12 ?

@potiuk
Copy link

potiuk commented Jan 29, 2024

Yeah. It was my mistake, I simply missed it in the report because we have some other issue that triggers "max recursion" error (connected to utcnow() raising a deprecation warning in Python 3.12) so it clouded the output enough for me to miss the pathlib ones. Luckily the very new feature (a week or so) added in GitHub Actions that allows to quickly search even in long output of CI jobs is a life-saver and i started using it :).

Do you have some suggestions on how we can address it @ap-- - since you looked at the code ?

It's been @bolkedebruin and @uranusjr who implemented and reviewed it originally and I would have to look deeper on how it is implemented - but since the heavy changes to Pathlib implementation in Python 3.12 (making it ACTUALLY extendible) probably our approach should also be to have a different approach (also knowing that Python 3.13 is also bringing changes there).

Any words of advise here?

For a bit of context: @bolkedebruin and @uranusjr - I am working on Python 3.12 compatibility in apache/airflow#36755 and after current PR in universal_pathlib got green - brining universal_pathlib (hopefully) to Python 3.12 compatibility, I added installation of universal_pathlib from this git branch and it would be great if we figure out how to convert our IO implementation to also be compatbile.

Currently the tests of ours fail with (and the reason is that _from_parts is removed from both Pathlib (I think) and UniversalPathlib (I am sure):

https://github.com/apache/airflow/actions/runs/7688925657/job/20950993493?pr=36755#step:6:9018

 tests/always/test_example_dags.py:87: AssertionError
  ----------------------------- Captured stderr call -----------------------------
  INFO  [airflow.models.dagbag.DagBag] Filling up the DagBag from /opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py
  ERROR [airflow.models.dagbag.DagBag] Failed to import: /opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py
  Traceback (most recent call last):
    File "/opt/airflow/airflow/models/dagbag.py", line 344, in parse
      loader.exec_module(new_module)
    File "<frozen importlib._bootstrap_external>", line 994, in exec_module
    File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
    File "/opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py", line 35, in <module>
      TEMP_FILE_PATH = ObjectStoragePath("file:///tmp")
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/airflow/airflow/io/path.py", line 153, in __new__
      return cls._from_parts(args_list, url=parsed_url, conn_id=conn_id, **kwargs)  # type: ignore
             ^^^^^^^^^^^^^^^
  AttributeError: type object 'ObjectStoragePath' has no attribute '_from_parts'. Did you mean: '_load_parts'?
  ------------------------------ Captured log call -------------------------------
  INFO     airflow.models.dagbag.DagBag:dagbag.py:538 Filling up the DagBag from /opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py
  ERROR    airflow.models.dagbag.DagBag:dagbag.py:348 Failed to import: /opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py
  Traceback (most recent call last):
    File "/opt/airflow/airflow/models/dagbag.py", line 344, in parse
      loader.exec_module(new_module)
    File "<frozen importlib._bootstrap_external>", line 994, in exec_module
    File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
    File "/opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py", line 35, in <module>
      TEMP_FILE_PATH = ObjectStoragePath("file:///tmp")
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/airflow/airflow/io/path.py", line 153, in __new__
      return cls._from_parts(args_list, url=parsed_url, conn_id=conn_id, **kwargs)  # type: ignore
             ^^^^^^^^^^^^^^^
  AttributeError: type object 'ObjectStoragePath' has no attribute '_from_parts'. Did you mean: '_load_parts'?

Any ideas for good approach we can take there?

@ap--
Copy link
Collaborator Author

ap-- commented Jan 29, 2024

It's been @bolkedebruin and @uranusjr who implemented and reviewed it originally and I would have to look deeper on how it is implemented - but since the heavy changes to Pathlib implementation in Python 3.12 (making it ACTUALLY extendible) probably our approach should also be to have a different approach (also knowing that Python 3.13 is also bringing changes there).

Any words of advise here?

I surveyed custom user subclasses on GitHub and collected a list of features in #172 that would need to land in UPath to allow simpler customization of UPath's behavior. I also opened a new issue in the airflow repository apache/airflow#37067 to track the progress of what's needed to make this change as smooth as possible.

It would be best if I could have some more insight into the intention behind the custom UPath implementation and see if I would cover all cases with #172.

@bolkedebruin
Copy link

Just for reference @potiuk : I was aware of the development direction of Pathlib but it was a bit of a moving target at the time. Hence deciding to use the 'current' and wait for something more stable to arrive, assuming that the user facing api of pathlib will mostly stay the same.

@ap-- ap-- merged commit f7af174 into fsspec:main Jan 30, 2024
18 checks passed
@ap--
Copy link
Collaborator Author

ap-- commented Jan 30, 2024

For everyone following this PR, a new release will be out once #172 is fixed.
Should be within the next days.

@potiuk
Copy link

potiuk commented Jan 31, 2024

Cool

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python 3.12 support
3 participants