[Python] Add date32 support to __dataframe__ protocol #39539
It would also be good to add the 64-bit date type, the 32- and 64-bit time types, and the duration type. Contributions are more than welcome, as there is no immediate plan to work on this. I can guide anybody interested! ❤️
The interchange protocol currently doesn't define a date type AFAIK (https://data-apis.org/dataframe-protocol/latest/API.html#interface), so do you expect it to be written as DATETIME?
Yes, that was my idea. Similar to what polars does: https://github.com/pola-rs/polars/blob/2b43fc1ac1af84ed118ff3f8840d328a12c35510/py-polars/polars/interchange/utils.py#L35-L54
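As a rough sketch of that approach, a producer could report each Arrow date/time type as the protocol's DATETIME kind and carry the Arrow C Data Interface format string in the dtype tuple `(kind, bit_width, format_str, endianness)`. The `DtypeKind` values mirror the interchange protocol spec; the type-name keys and the helper `datetime_dtype` are hypothetical and are not polars' or pyarrow's actual code.

```python
from enum import IntEnum


class DtypeKind(IntEnum):
    # Values as defined by the dataframe interchange protocol
    INT = 0
    UINT = 1
    FLOAT = 2
    BOOL = 20
    STRING = 21
    DATETIME = 22
    CATEGORICAL = 23


# Arrow C Data Interface format strings and bit widths for date/time types.
# The keys are hypothetical type names used only for this sketch.
_ARROW_DATE_TIME = {
    "date32": ("tdD", 32),      # days since the UNIX epoch
    "date64": ("tdm", 64),      # milliseconds since the UNIX epoch
    "time32[s]": ("tts", 32),
    "time32[ms]": ("ttm", 32),
    "time64[us]": ("ttu", 64),
    "time64[ns]": ("ttn", 64),
}


def datetime_dtype(type_name: str) -> tuple:
    """Return an interchange dtype tuple for an Arrow date/time type name."""
    try:
        fmt, bit_width = _ARROW_DATE_TIME[type_name]
    except KeyError:
        # Duration and anything else is left unsupported in this sketch
        raise NotImplementedError(f"unsupported type: {type_name}") from None
    # "=" denotes native byte order in the interchange protocol
    return (DtypeKind.DATETIME, bit_width, fmt, "=")
```

For example, `datetime_dtype("date32")` gives `(DtypeKind.DATETIME, 32, "tdD", "=")`; the consumer then has to inspect the format string, not just the kind, to tell a date from a timestamp.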
Date and Duration data type classes have been added to the staging branch of the protocol: https://github.com/data-apis/dataframe-api/blob/c5f08352e0a1d25387fe1737ffe9cccb36f554f7/spec/API_specification/dataframe_api/dtypes.py#L50 which I guess should correspond to the draft docs page? https://data-apis.org/dataframe-api/draft/API_specification/index.html But I am not sure whether this will move forward soon.
That's for the standard API, though, not for the interchange protocol (I was confused as well, and wrote a mistaken comment on the PR that added it, asking for clarification ;) -> data-apis/dataframe-api#197)
Personally, I think it would be better if this were first clarified or added in the interchange protocol. While it does make some sense for date (you could just see it as a different resolution of datetime), duration is really different. For example, the pandas implementation wouldn't support consuming duration either, and pyarrow only supports consuming datetime as timestamp, not even date.
Oooh, sorry for leading you in the wrong direction!
That does make sense 👍
@jorisvandenbossche my expectation was that the buffer would contain 32-bit integers (64-bit for date64). The consumer would be responsible for interpreting them as the appropriate dates, based on the unit defined in the format string.
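A minimal consumer-side sketch of that interpretation, assuming the data buffer holds little-endian integers counting days (date32, format `tdD`) or milliseconds (date64, format `tdm`) since 1970-01-01, as the Arrow C Data Interface defines them. The function name `decode_dates` is hypothetical, and validity (null) masks and offsets are ignored for brevity.

```python
import datetime
import struct

_EPOCH = datetime.date(1970, 1, 1)
_MS_PER_DAY = 86_400_000


def decode_dates(buf: bytes, fmt: str) -> list:
    """Decode a raw date buffer according to its Arrow format string."""
    if fmt == "tdD":
        # date32: one signed 32-bit integer per value, counting days
        values = struct.unpack(f"<{len(buf) // 4}i", buf)
        return [_EPOCH + datetime.timedelta(days=v) for v in values]
    if fmt == "tdm":
        # date64: one signed 64-bit integer per value, counting milliseconds;
        # values are expected to be whole days, any remainder is floored
        values = struct.unpack(f"<{len(buf) // 8}q", buf)
        return [_EPOCH + datetime.timedelta(days=v // _MS_PER_DAY) for v in values]
    raise NotImplementedError(f"not a date format string: {fmt!r}")
```

For instance, 2024-03-22 (the date used in the polars example in this thread) is day 19804 after the epoch, so `decode_dates(struct.pack("<i", 19804), "tdD")` returns `[datetime.date(2024, 3, 22)]`.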
(existing upstream issue about duration/timedelta: data-apis/dataframe-api#329)
Let me know if you think this is a distinct issue, but I ran into a different error message when converting a Date32 column from Polars through the DataFrame interchange protocol:
import datetime
import polars as pl
from pyarrow.interchange import from_dataframe
# Consuming a Polars Date column through the interchange protocol raises an error
from_dataframe(pl.DataFrame({"date": [datetime.date(2024, 3, 22)]}))
What's the current thinking on the best way forward to support this?
Thank you for contributing to the discussion, @jonmmease. I see that libraries are working around this by defining date and time types as the protocol's DATETIME data type with an Apache Arrow C Data Interface format string (example). I do not mind going about it in a similar way in PyArrow until date is added to the dataframe protocol spec, and also adding the option to consume this data type. Ideally, though, this would be clarified and settled in the protocol first. @jorisvandenbossche, what do you think?
I would still prefer someone to first do a PR to the spec to add this. If it is just clarifying that the existing
AFAIK pandas doesn't actually support this for duration, at least not for the default timedelta dtype (from testing with pandas main):
FWIW, my proposal to add support for the Arrow PyCapsule protocol to the interchange standard (data-apis/dataframe-api#342) would also solve this for polars and pyarrow: both are based on Arrow memory and could easily interchange those data types. We could start checking for that protocol in
Thank you for the clarification, Joris! I propose we start with a PR to the dataframe protocol specification clarifying that the existing DATETIME dtype kind can also be used for the other Arrow date and time dtypes (but not duration). I will do this today or tomorrow. In my opinion, the proposal to add support for the Arrow PyCapsule protocol to the interchange standard would be great. I hope it moves forward; otherwise, the libraries involved will start checking for the protocol themselves, as you suggested.
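To illustrate the scope of the proposed spec clarification, here is a sketch of a check a consumer might apply to the format string of a DATETIME column: timestamps (`ts…`), dates (`td…`) and times (`tt…`) are accepted, while durations (`tD…`) stay out of scope. The prefixes come from the Arrow C Data Interface; the function `datetime_category` is hypothetical and does not validate the unit character.

```python
def datetime_category(fmt: str) -> str:
    """Classify an Arrow temporal format string carried by a DATETIME column."""
    if fmt.startswith("ts"):
        # e.g. "tsu:" or "tsn:UTC" (unit, then ':' and an optional timezone)
        return "timestamp"
    if fmt.startswith("td"):
        # "tdD" (date32) or "tdm" (date64)
        return "date"
    if fmt.startswith("tt"):
        # "tts", "ttm" (time32) or "ttu", "ttn" (time64)
        return "time"
    if fmt.startswith("tD"):
        # Duration is deliberately excluded from the proposed clarification
        raise NotImplementedError("duration is not a DATETIME type")
    raise ValueError(f"not an Arrow temporal format string: {fmt!r}")
```

Note that the comparison is case-sensitive, which is what separates dates (`td`) from durations (`tD`).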
Describe the enhancement requested
Component(s)
Python