less memory usage when writing a dataframe #45

Merged
merged 1 commit into from
Mar 16, 2024


slemouzy
Contributor

@slemouzy slemouzy commented Dec 2, 2022

Instead of converting the whole DataFrame into a list of dicts (which can double memory usage), this PR uses a generator function to iterate over the rows of the DataFrame, converting one row at a time.
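A minimal sketch of the generator idea described above, not the PR's actual code. The helper name `iter_records` is hypothetical, and plain tuples stand in for a DataFrame so the example runs without pandas. With a real DataFrame you would pass `df.columns` and `df.itertuples(index=False, name=None)`.

```python
from typing import Any, Dict, Hashable, Iterable, Iterator, List, Sequence


def iter_records(
    columns: Sequence[Hashable], rows: Iterable[Sequence[Any]]
) -> Iterator[Dict[Hashable, Any]]:
    """Lazily yield one {column: value} dict per row (hypothetical helper).

    The generator builds a single row dict per step, whereas
    df.to_dict('records') builds the dicts for every row up front.
    """
    cols: List[Hashable] = list(columns)
    for row in rows:
        yield dict(zip(cols, row))


# Stand-in data so the sketch runs without pandas. With a real DataFrame
# you would pass df.columns and df.itertuples(index=False, name=None).
columns = ["id", "name"]
rows = [(1, "a"), (2, "b")]

records = iter_records(columns, rows)  # nothing is converted yet
first = next(records)                  # only the first row has become a dict
```

A writer that accepts any iterable of dicts (fastavro's `writer`, for example) can consume `records` directly, so the full list of row dicts never has to exist at once.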

@marctorsoc
Contributor

@slemouzy this seems like a nice addition. Could you fix the conflicts? @ynqa would you agree to add this?

@ynqa
Owner

ynqa commented Mar 7, 2024

would you agree to add this?

@marctorsoc Basically, yes. What do you think about leaving the existing implementation as is? (The idea would be to branch between the two based on a parameter.)

@marctorsoc
Contributor

I don't quite understand your comment. The current implementation does

    records = __preprocess_dicts(df.to_dict('records'))

here, which doubles the memory usage; that is exactly what the OP is trying to fix.
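To make the memory argument concrete, here is a rough, standard-library-only illustration. Stand-in tuples replace a real DataFrame, and `tracemalloc` compares the peak memory of building every row dict up front (as `df.to_dict('records')` does) against producing them one at a time from a generator. It measures only the record dicts, not the DataFrame itself, and exact numbers will vary with the Python version.

```python
import tracemalloc
from typing import Callable

# Stand-in for a DataFrame's contents: 50,000 rows of three ints.
columns = ["a", "b", "c"]
rows = [(i, i * 2, i * 3) for i in range(50_000)]


def peak_bytes(fn: Callable[[], None]) -> int:
    """Return the peak memory traced by tracemalloc while fn runs."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def eager() -> None:
    # Mirrors df.to_dict('records'): every row dict exists at the same time.
    records = [dict(zip(columns, row)) for row in rows]
    for _ in records:
        pass


def lazy() -> None:
    # Mirrors the generator approach: only one row dict is built per step.
    for _ in (dict(zip(columns, row)) for row in rows):
        pass


eager_peak = peak_bytes(eager)
lazy_peak = peak_bytes(lazy)
print(f"eager: {eager_peak:,} bytes, lazy: {lazy_peak:,} bytes")
```

The eager version's peak grows with the number of rows, while the lazy version's peak stays roughly constant, which is the saving this PR is after.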

I did not review the specific implementation in the PR, but I'm sure that if he knows you'll approve, he'll be more motivated to fix the conflicts 😃. Worst case, I'm happy to fix them.

@ynqa
Owner

ynqa commented Mar 9, 2024

@marctorsoc Thank you, I misunderstood. Let's incorporate this change.

@slemouzy
Contributor Author

slemouzy commented Mar 9, 2024


Let's see what I can do; I may have some time for it this weekend. Glad you've been able to look at my proposal. 🙂

Instead of converting the whole DataFrame into a list of dicts
(which can double the memory usage), use a generator to iterate
over the rows of the DataFrame, converting one row at a time.
Owner

@ynqa ynqa left a comment


LGTM!

@ynqa ynqa merged commit 996a159 into ynqa:master Mar 16, 2024
6 checks passed
3 participants