[R] On-demand serialization + standardization of attributes #9924

Merged on Jan 10, 2024 (53 commits)
09694ac
on-demand serialization, refactor of attributes
david-cortes Dec 24, 2023
ad6490b
solve merge conflicts
david-cortes Dec 26, 2023
27bbdbc
export function for getting booster rounds
david-cortes Dec 26, 2023
88dd947
linter
david-cortes Dec 26, 2023
4a3b5e2
fix incorrect qualifiers
david-cortes Dec 26, 2023
e2331c3
Merge branch 'master' into altrep
david-cortes Dec 26, 2023
147e1cd
remove all references to caret package
david-cortes Dec 26, 2023
e012cce
fix example
david-cortes Dec 26, 2023
f444812
misc fixes
david-cortes Dec 26, 2023
2f30031
allow unsetting booster info
david-cortes Dec 26, 2023
6d4ad8b
remove unused argument
david-cortes Dec 26, 2023
b0054be
more fixes
david-cortes Dec 26, 2023
4050b6f
missing import
david-cortes Dec 26, 2023
2e16f73
swap 'static' with 'namespace'
david-cortes Dec 27, 2023
70affd5
improve wording on compatibility note
david-cortes Dec 27, 2023
af6cdbf
fix non-executed tests and potentially incorrect 'niter_init'
david-cortes Dec 28, 2023
22d4dd7
linter
david-cortes Dec 29, 2023
0465f57
solve merge conflicts
david-cortes Dec 30, 2023
74d5d55
more doc specificity about nrounds reset
david-cortes Dec 30, 2023
1bb74d8
correct function name
david-cortes Dec 30, 2023
c5d711f
solve merge conflicts
david-cortes Dec 31, 2023
b5ec14e
corrections after merge conflicts
david-cortes Dec 31, 2023
ae0de6d
more corrections after merge conflict
david-cortes Dec 31, 2023
1a3d9f7
solve merge conflicts
david-cortes Jan 3, 2024
041dd2f
updates for new default serialization format
david-cortes Jan 8, 2024
7f39bb0
update name for nrounds getter
david-cortes Jan 8, 2024
02c312c
remove in-place training continuation
david-cortes Jan 8, 2024
b4d59f7
change unserialize -> load.raw
david-cortes Jan 8, 2024
24e256a
use R lists instead of JSON text for xgb.config
david-cortes Jan 8, 2024
c97dc1a
remove internal function for nrounds getter
david-cortes Jan 8, 2024
5f8dea5
use _R suffix for all C functions specific to R
david-cortes Jan 8, 2024
8e29769
add test for C and R attributes with saveRDS
david-cortes Jan 8, 2024
9f81e20
add variable.names method for booster
david-cortes Jan 8, 2024
65197e1
add comment about supressed warning
david-cortes Jan 8, 2024
1b58e1b
solve merge conflicts
david-cortes Jan 8, 2024
7dc9b96
update comment
david-cortes Jan 8, 2024
11b213e
update serializers in vignette
david-cortes Jan 8, 2024
c72e663
update vignettes
david-cortes Jan 8, 2024
df88ad9
remove xgb.serialize and xgb.unserialize
david-cortes Jan 9, 2024
74f7f0c
Update R-package/R/xgb.save.R
david-cortes Jan 9, 2024
1ede32e
update docs
david-cortes Jan 9, 2024
37d6b1e
remove 'keep_extra_attributes'
david-cortes Jan 9, 2024
43d938b
remove .Rnw file
david-cortes Jan 9, 2024
692e5a5
add note about booster's R parameters
david-cortes Jan 10, 2024
c161999
user SerializeToBuffer for internal serialization
david-cortes Jan 10, 2024
feedce5
add test for serialization of config
david-cortes Jan 10, 2024
a02abfc
check more attributes
david-cortes Jan 10, 2024
3285ed6
rewrite compatibility note for serialization
david-cortes Jan 10, 2024
8082256
improve wording
david-cortes Jan 10, 2024
6fa7937
update note about attributes in xgb.save
david-cortes Jan 10, 2024
e02ed8f
Update R-package/R/utils.R
david-cortes Jan 10, 2024
ff70221
Update R-package/R/utils.R
david-cortes Jan 10, 2024
d133258
rebuild docs
david-cortes Jan 10, 2024
45 changes: 35 additions & 10 deletions R-package/R/utils.R
@@ -349,21 +349,41 @@ xgb.createFolds <- function(y, k) {
#' @name xgboost-deprecated
NULL

#' Do not use \code{\link[base]{saveRDS}} or \code{\link[base]{save}} for long-term archival of
#' models. Instead, use \code{\link{xgb.save}} or \code{\link{xgb.save.raw}}.
#' @title Model Serialization and Compatibility
#' @description
#'
#' It is a common practice to use the built-in \code{\link[base]{saveRDS}} function (or
#' \code{\link[base]{save}}) to persist R objects to the disk. While it is possible to persist
#' \code{xgb.Booster} objects using \code{\link[base]{saveRDS}}, it is not advisable to do so if
#' the model is to be accessed in the future. If you train a model with the current version of
#' XGBoost and persist it with \code{\link[base]{saveRDS}}, the model is not guaranteed to be
#' accessible in later releases of XGBoost. To ensure that your model can be accessed in future
#' releases of XGBoost, use \code{\link{xgb.save}} or \code{\link{xgb.save.raw}} instead.
#' When it comes to serializing XGBoost models, it's possible to use R serializers such as
#' \link{save} or \link{saveRDS} to serialize an XGBoost R model, but XGBoost also provides
#' its own serializers, which offer perhaps better compatibility guarantees and allow loading
#' the saved models in other language bindings of XGBoost.
#'
#' Note however that an `xgb.Booster` object might also keep:\itemize{
#' \item Additional model configuration attributes (accessible through \link{xgb.config}),
#' which might be used during model fitting but are not used in e.g. `predict`, feature importance,
#' or plotting methods.
#' \item Additional R-specific attributes - e.g. results of callbacks, such as evaluation logs,
#' which are kept as a `data.table` object, accessible through `attributes(model)$evaluation_log`
#' if present.
#' }
#'
#' The first ones (configuration attributes) do not have the same compatibility guarantees as
#' attributes that are set and accessed through \link{xgb.attributes} - that is, such attributes
#' might be lost after loading the booster in a different XGBoost version, regardless of the
#' serializer that was used. Note that these are saved when using \link{xgb.save}, but not when
#' using \link{xgb.save.raw}.
#'
#' The second ones (R attributes) are not part of standard XGBoost model structure, and thus are
#' not saved when using XGBoost's own serializers. These attributes are only used for informational
#' purposes, such as keeping track of evaluation metrics as the model was fit, or saving the R
#' call that produced the model, but are otherwise not used for prediction / importance / plotting / etc.
#' These R attributes are only preserved when using R's own serializers.
#'
#' Note that XGBoost models in R from version `2.1.0` onwards and XGBoost models from before
#' version `2.1.0` have a very different R object structure and are incompatible with
#' each other. Hence, models that were saved with R serializers like `saveRDS` or `save` before
#' version `2.1.0` will not work with latter `xgboost` versions and vice versa.
#' version `2.1.0` will not work with later `xgboost` versions and vice versa. Be aware that
#' the structure of R model objects could in theory change again in the future, so XGBoost's serializers
#' should be preferred for very long-term storage.
#'
#' Furthermore, note that using the package `qs` for serialization will require version 0.26 or
#' higher of said package, and will have the same compatibility restrictions as R serializers.
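
The distinction drawn in this hunk between booster attributes and R attributes can be sketched as follows. This is a minimal illustration rather than code from the PR: it assumes an `xgboost` build that includes these changes, uses the `agaricus.train` dataset shipped with the package, and the attribute names `my_attr` and `my_r_attr` are hypothetical.

```r
library(xgboost)

# Small example dataset shipped with the xgboost package
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

model <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 2, nthread = 1),
  data = dtrain,
  nrounds = 2
)

# Booster attribute: stored inside the XGBoost model itself
xgb.attr(model, "my_attr") <- "qwerty"
# Plain R attribute (hypothetical name): stored only on the R object
attributes(model)$my_r_attr <- "asdf"

# Round trip through XGBoost's own serializer
json_file <- tempfile(fileext = ".json")
xgb.save(model, json_file)
from_xgb <- xgb.load(json_file)

# Round trip through R's serializer
rds_file <- tempfile(fileext = ".rds")
saveRDS(model, rds_file)
from_rds <- readRDS(rds_file)

xgb.attr(from_xgb, "my_attr")    # booster attribute, expected to survive xgb.save
attributes(from_xgb)$my_r_attr   # R attribute, expected to be absent after xgb.load
attributes(from_rds)$my_r_attr   # R attribute, expected to survive saveRDS
```

Under these assumptions, only the `saveRDS` copy keeps the R-level attribute, which matches the note above that R attributes are preserved only by R's own serializers.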
@@ -379,6 +399,11 @@ NULL
#' The \code{\link{xgb.save.raw}} function is useful if you'd like to persist the XGBoost model
#' as part of another R object.
#'
#' Use \link{saveRDS} if you require the R-specific attributes that a booster might have, such
#' as evaluation logs, but note that future compatibility of such objects is outside XGBoost's
#' control as it relies on R's serialization format (see e.g. the details section in
#' \link{serialize} and \link{save} from base R).
#'
#' For more details and explanation about model persistence and archival, consult the page
#' \url{https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html}.
#'
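
As a rough sketch of the `xgb.save.raw` use case mentioned in this hunk (persisting a model as part of another R object), the following stores the raw bytes inside a list and saves that list with `saveRDS`. The `bundle` structure and its field names are hypothetical. The return type of `xgb.load.raw` has differed between package versions, so the sketch stops at loading and does not call `predict` on the result.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

model <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 2, nthread = 1),
  data = dtrain,
  nrounds = 2
)

# Serialize the booster to a raw vector and embed it in a larger R object.
# 'bundle' and its fields are illustrative names, not part of the package.
bundle <- list(
  model_raw = xgb.save.raw(model),
  notes = "trained on agaricus.train"
)

rds_file <- tempfile(fileext = ".rds")
saveRDS(bundle, rds_file)

# Later: read the container back and rebuild the model from the raw bytes
restored <- readRDS(rds_file)
model_again <- xgb.load.raw(restored$model_raw)
```

Because the model travels as bytes in XGBoost's own format, the container can be written with any R serializer without tying the model itself to the R object structure of a particular `xgboost` version.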
48 changes: 36 additions & 12 deletions R-package/man/a-compatibility-note-for-saveRDS-save.Rd

Generated file; diff not rendered by default.