Lightweight save/load #673

EdoAlvarezR · 2024-11-06T23:53:38Z

Hi! I'm trying to save (pickle) a set of about 10,000 small KRG surrogate models (~100 training points each), but the size on disk is huge (>10 GB).

Is there a way to slim down a surrogate model object (e.g., get rid of the training data) to reduce the size on disk?

Training this batch of models takes about 5 hours, so the surrogates are not very useful unless the models can be saved and loaded in a shareable form.

- Ed Alvarez

relf · 2024-11-07T10:17:04Z

Maintainer

Hi! Indeed, the training data are pickled but not used in prediction, so you can do something like:

import pickle

from smt.surrogate_models import KRG

sm = KRG()
sm.set_training_values(X_train, y_train)
sm.train()
sm.training_points = {}  # hack: remove training data not used in prediction
with open("krg.pickle", "wb") as handle:
    pickle.dump(sm, handle)

Let me know if it works for you and how much it decreases the size on disk.

1 reply

EdoAlvarezR Nov 7, 2024
Author

That indeed got rid of the training points, but it reduced the size on disk by less than 1% 😕.

Is there any other data in the KRG object not used for prediction that I could get rid of?
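
One generic way to investigate this question, sketched with the standard library only, is to pickle each instance attribute separately and sort the results by size. `attribute_pickle_sizes` and `ToyModel` are hypothetical names used for illustration and are not part of SMT. The same helper could be pointed at a trained KRG object, but whether a given attribute is safe to drop before prediction would still have to be checked.

```python
import pickle


def attribute_pickle_sizes(obj):
    """Return (attribute name, pickled size in bytes) pairs, largest first."""
    sizes = []
    for name, value in vars(obj).items():
        try:
            sizes.append((name, len(pickle.dumps(value))))
        except Exception:
            # Some attributes may not be picklable on their own; skip them.
            continue
    return sorted(sizes, key=lambda item: item[1], reverse=True)


class ToyModel:
    """Hypothetical stand-in for a trained surrogate (not an SMT class)."""

    def __init__(self):
        self.training_points = [0.0] * 100                    # ~N values
        self.big_matrix = [[0.0] * 100 for _ in range(100)]   # ~N^2 values
        self.name = "toy"


report = attribute_pickle_sizes(ToyModel())
for name, size in report:
    print(f"{name}: {size} bytes")
```

Objects shared between several attributes are counted once per attribute here, so the per-attribute sizes can add up to more than the size of the whole pickled object.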

relf · 2024-11-08T07:38:21Z

Maintainer

Actually, I should have done the maths: kriging memory cost scales as N^2, where N is the number of training points. That explains the 1% decrease in your case, as you have only removed ~N (=100) values.
100 * 100 * 8 bytes = 80 kB per surrogate, so roughly 800 MB for the overall set (for the correlation matrix only; you found ~10x more).
So I'm not sure there is a solution here. Even if we imagine using float32 (provided it still works, which I'm not sure of at all), you would still end up with a size > 200 MB.
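
The estimate above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the helper names are hypothetical, it counts a single dense N x N matrix per surrogate, and the last figure assumes that only one triangle of a symmetric matrix is stored in float32, which is one possible reading of the "> 200 MB" bound rather than something stated in this thread.

```python
def dense_matrix_bytes(n_points: int, bytes_per_value: int) -> int:
    """Size in bytes of a full n_points x n_points matrix."""
    return n_points * n_points * bytes_per_value


def triangular_matrix_bytes(n_points: int, bytes_per_value: int) -> int:
    """Size in bytes of one triangle (diagonal included) of a symmetric matrix."""
    return n_points * (n_points + 1) // 2 * bytes_per_value


n_points = 100     # training points per surrogate
n_models = 10_000  # number of surrogates in the batch

per_model_f64 = dense_matrix_bytes(n_points, 8)
total_f64_mb = per_model_f64 * n_models / 1e6
total_f32_mb = dense_matrix_bytes(n_points, 4) * n_models / 1e6
total_f32_tri_mb = triangular_matrix_bytes(n_points, 4) * n_models / 1e6

print(f"float64, full matrix, per surrogate: {per_model_f64} bytes")
print(f"float64, full matrix, all surrogates: {total_f64_mb} MB")
print(f"float32, full matrix, all surrogates: {total_f32_mb} MB")
print(f"float32, one triangle, all surrogates: {total_f32_tri_mb} MB")
```

Under these assumptions the totals come out at 800 MB, 400 MB, and 202 MB respectively, so the quadratic term dominates whatever the precision.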

1 reply

EdoAlvarezR Nov 8, 2024
Author

Ah, that makes sense. I found a way to reduce the number of surrogates I needed from 10,000 to ~50, which brought the size of the pickle files down to something more manageable.

Thanks for your help!

relf · 2024-11-13T08:48:20Z

Maintainer

You can also try GPX, which is equivalent to KRG but should offer better performance.
Behind the scenes, GPX is a wrapper around a Rust implementation of KRG: Gpx from the egobox library.
GPX cannot be pickled, but it has native save/load methods.

Once you have installed egobox with pip install egobox, you can run something like:

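from smt.surrogate_models import GPX

# Train as usual, then write the model with GPX's native save method
sm = GPX()
sm.set_training_values(xt, yt)
sm.train()
sm.save("sm.bin")

# Reload the saved model and predict at new points
sm2 = GPX.load("sm.bin")
ynew = sm2.predict_values(xnew)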
