Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle cudf.pandas proxy objects properly #11014

Merged
merged 6 commits into from
Nov 25, 2024
Merged

Conversation

galipremsagar
Copy link
Contributor

@galipremsagar galipremsagar commented Nov 22, 2024

cudf.pandas proxy objects will not pass is_cudf.... checks and rather they tend to only pass is_pandas.... But the pandas code-flows have many to_numpy calls that are introducing slow-downs incase of cudf.pandas objects. This PR bypasses this problem by introducing a mechanism to extract proper cudf objects out of cudf.pandas objects. That way there is no slow-down.

@galipremsagar galipremsagar changed the title test pr cudf.pandas specific fixes Nov 22, 2024
@galipremsagar galipremsagar changed the title cudf.pandas specific fixes Handle cudf.pandas proxy objects properly Nov 22, 2024
@galipremsagar galipremsagar marked this pull request as ready for review November 22, 2024 18:58
@hcho3
Copy link
Collaborator

hcho3 commented Nov 22, 2024

@trivialfis Can you review this?

@@ -846,6 +846,22 @@ def _is_cudf_df(data: DataType) -> bool:
return lazy_isinstance(data, "cudf.core.dataframe", "DataFrame")


def _is_cudf_pandas_df(data: DataType) -> bool:
return (
str(type(data)) == "<class 'pandas.core.frame.DataFrame'>"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a lazy isinstance method in the compat module, can we use that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! We can, Verified in local testing this works.

@trivialfis trivialfis mentioned this pull request Nov 23, 2024
4 tasks
@trivialfis
Copy link
Member

Added this to the 2.1.3 release. Will push some changes at Monday for lazy_isinstance along with a test.

@trivialfis
Copy link
Member

Changes from my commit:

  • Added support for DMatrix.
  • Use lazy_isinstance.

@trivialfis
Copy link
Member

Please help test and review the change.

@galipremsagar
Copy link
Contributor Author

Please help test and review the change.

Can confirm this works locally.

@trivialfis trivialfis merged commit 0e48cdc into dmlc:master Nov 25, 2024
30 checks passed
trivialfis added a commit to trivialfis/xgboost that referenced this pull request Nov 25, 2024
---------

Co-authored-by: Hyunsu Cho <phcho@nvidia.com>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
trivialfis added a commit that referenced this pull request Nov 26, 2024
---------

Co-authored-by: GALI PREM SAGAR <sagarprem75@gmail.com>
Co-authored-by: Hyunsu Cho <phcho@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants