Question: idiomatic way of elegantly retrieving the underlying DataFrame type #1443
Comments
Hey @elephaint, thanks for your request. This can certainly be a pain point for other libraries trying to adopt narwhals. I would say that the answer is: it depends.
We have a set of functionalities, namely
If that's not enough,
In plotly express, I had to do something similar, by adding a flag
Thanks for the request! I think currently the two documented ways would be:
I can see that it would be convenient to have something more ergonomic... 🤔 will think about this one. Thanks for having highlighted this
wait, this would be highly risky as it involves using private methods which may change at any time 😉 Better to stick with the public API, which we make some stability guarantees about
Thanks for the discussion! I think for now I'll go with
Just to be clear - this is really a 'nice to have' but by no means very important to me, so don't make something crazy complex over this 😛
Just out of curiosity for now, could you point to such an example that requires branching a specific path for pandas?
@FBruzzesi I haven't had to branch for
They both relate to extracting scalar(s):
Although for these, I'd be happy with an API like
thanks for sharing
sorry not sure i follow, could you clarify please?
+1 on Marco's question, I am not sure I get the point.
I share the pain of working with pyarrow objects. For your specific case here, I think we should implement
import narwhals as nw
import pandas as pd
import polars as pl
import pyarrow as pa

def func(value, series):
    return value in nw.from_native(series, series_only=True)

data = [1, 2, 3]
func(1, pd.Series(data))  # True
func(1, pl.Series(data))  # True
func(1, pa.chunked_array([data]))  # False
Sure @MarcoGorelli So for the
# 1
nw.DataFrame.item(0, ...) -> PythonDataType
# 2
nw.Series.__iter__() -> Iterator[PythonDataType]
But with
thanks! For #1: yup, we could return Python scalars for all PyArrow aggregations (like Polars does), it would just require a bit of a refactor.
For #2, I think you'd be better off using
In [20]: s
Out[20]:
shape: (10_000_000,)
Series: '' [i64]
[
4
5
7
9
0
…
8
8
2
9
4
]
In [21]: %timeit _ = set(s)
327 ms ± 40.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [22]: %timeit _ = set(s.to_list())
164 ms ± 10.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Though both solutions are eager so I'm not sure I understand "which is an eager version of what I'm after"
actually, it's not as bad as I was expecting...gonna make a PR 😎
Thanks @MarcoGorelli, good call on the
Seems I forgot about the performance impact 🤦♂️ - as I knew this back in vega/altair#3501 (comment)
Always fun to learn a refactor can be simpler and more performant 😄
done, available in Narwhals 1.15.0 🚀 #1471
Currently I often have the following code:
What is difficult about this is that I need to keep track of is_pandas variables throughout the code, pass them to subfunctions, etc. If I have multiple DataFrames, I have multiple such is_pandas variables. Ideally, I'd be able to do something such as:
i.e., having whether the underlying dataframe is pandas or not available simply as a boolean attribute of the Narwhals DataFrame. That would allow me to use df_nw everywhere without carrying the auxiliary variables around or first converting to native.
Of course, I know I can also do this everywhere:
nw.dependencies.is_pandas_dataframe(df_nw.to_native())
but that feels convoluted.
What is the cleanest way to do this?