less memory usage when writing a dataframe #45

slemouzy · 2022-12-02T21:19:31Z

Instead of converting the whole DataFrame into a list of dicts (which can double the memory usage), this PR uses a generator function to iterate over the rows of the DataFrame, converting each row one at a time.
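For readers skimming the thread, here is a minimal sketch of the idea described above. It is not the code from this PR: `iter_records` is a hypothetical name, and the sample DataFrame is made up for illustration. It only assumes the standard pandas calls `DataFrame.to_dict("records")` and `DataFrame.itertuples`.

```python
from typing import Any, Dict, Iterator

import pandas as pd


def iter_records(df: pd.DataFrame) -> Iterator[Dict[str, Any]]:
    """Yield each row as a dict, one at a time (hypothetical helper)."""
    columns = list(df.columns)
    # itertuples(index=False, name=None) yields one plain tuple per row,
    # so no list of row dicts is ever built for the whole DataFrame.
    for row in df.itertuples(index=False, name=None):
        yield dict(zip(columns, row))


# Made-up sample data for illustration only.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Eager approach: every row dict exists in memory at the same time.
eager = df.to_dict("records")

# Lazy approach: a row dict is created only when the consumer asks for it.
lazy = iter_records(df)
first = next(lazy)
```

The trade-off is that a generator can be consumed only once, so any code that needs the length of the records or a second pass over them has to be adapted accordingly.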

marctorsoc · 2024-03-07T10:30:35Z

@slemouzy this seems like a nice addition. Could you fix the conflicts? @ynqa would you agree to add this?

ynqa · 2024-03-07T10:37:40Z

> would you agree to add this?

@marctorsoc Basically, yes. What do you think about leaving the existing implementation as is? (The idea is to branch between the two code paths based on a parameter.)

marctorsoc · 2024-03-07T10:46:06Z

I don't quite understand your comment. The current implementation does

    records = __preprocess_dicts(df.to_dict('records'))

here, which doubles the memory usage. That is exactly what the OP is trying to fix.

I did not review the specific implementation in the PR, but I'm sure that if he knows you'll approve, he'll be more motivated to fix the conflicts 😃. Worst case scenario, I'm happy to fix it myself.

ynqa · 2024-03-09T03:09:11Z

@marctorsoc Thank you, I misunderstood. Let's incorporate this change.

slemouzy · 2024-03-09T07:29:28Z

> @slemouzy this seems like a nice addition. Could you fix the conflicts? @ynqa would you agree to add this?

Let me see what I can do; I may have some time for it over the weekend. Glad you were able to look at my proposal. 🙂


ynqa

LGTM!

slemouzy force-pushed the reduce-write-memory-usage branch from bd68b72 to 13ab063 on December 2, 2022 21:49

less memory usage when writing a dataframe.

2a1a4e8

Instead of converting the whole DataFrame into a list of dicts (which can double the memory usage), use a generator to iterate over the rows of the DataFrame, converting each row one at a time.

slemouzy force-pushed the reduce-write-memory-usage branch from 13ab063 to 2a1a4e8 on March 11, 2024 12:00

ynqa approved these changes Mar 16, 2024


ynqa merged commit 996a159 into ynqa:master Mar 16, 2024
6 checks passed
