Bug in objective = "reg:pseudohubererror" and xgb.plot.tree() #10988

Open
DrJerryTAO opened this issue Nov 7, 2024 · 4 comments
Comments


DrJerryTAO commented Nov 7, 2024

Hi @mattn, I wanted to use XGBoost for quantile regression but found that the pseudo Huber loss does no better than a null model. Currently, objective = 'reg:pseudohubererror' predicts 0.5 for every case; nothing is learned, no matter how the other parameters are specified.

Also, xgb.plot.tree() shows nothing: the Viewer panel is blank.

Further, objective = "reg:quantileerror" raises an error, although the online documentation lists it (https://xgboost.readthedocs.io/en/latest/parameter.html). I am using the latest R package, version 1.7.8.1.

library(xgboost)
library(tidyverse)
data(mtcars)
Data <- mtcars %>%
  {xgb.DMatrix(
    data = (.) %>% select(-mpg) %>% as.matrix(), 
    label = (.) %>% pull(mpg))}
Model <- xgboost(
  data = Data, 
  objective = "reg:pseudohubererror", 
  max.depth = 3, eta = 1, nrounds = 100)
"As the log shows, the mean pseudo Huber error stays at 18.618537, with no 
change over iterations"
Model <- xgboost(
  data = Data, 
  objective = "reg:pseudohubererror", eval_metric = "mae", 
  max.depth = 3, eta = 1, nrounds = 100)
"mae = 19.590625, no change over 100 iterations"
mean(abs(mtcars$mpg - 0.5)) # 19.59062
"objective = 'reg:pseudohubererror' predicts 0.5 for every case; 
nothing is learned."
Model <- xgboost(
  data = Data, 
  objective = "reg:quantileerror", eval_metric = "mae", 
  max.depth = 3, eta = 1, nrounds = 100)
"Error in xgb.iter.update(bst$handle, dtrain, iteration - 1, obj) : 
  [02:01:16] src/objective/objective.cc:26: 
  Unknown objective function: `reg:quantileerror`
Objective candidate: survival:aft
Objective candidate: binary:hinge
Objective candidate: rank:pairwise
Objective candidate: rank:ndcg
Objective candidate: rank:map
Objective candidate: multi:softmax
Objective candidate: multi:softprob
Objective candidate: reg:squarederror
Objective candidate: reg:squaredlogerror
Objective candidate: reg:logistic
Objective candidate: binary:logistic
Objective candidate: binary:logitraw
Objective candidate: reg:linear
Objective candidate: reg:pseudohubererror
Objective candidate: count:poisson
Objective candidate: survival:cox
Objective candidate: reg:gamma
Objective candidate: reg:tweedie
Objective candidate: reg:absoluteerror"
Model <- xgboost(
  data = Data, 
  objective = "reg:tweedie", eval_metric = "mae", 
  max.depth = 3, eta = 1, nrounds = 4)
xgb.plot.tree(model = Model)
"The Viewer panel is blank. This is not because my environment has errors."
xgb.plot.importance(importance_matrix = xgb.importance(model = Model))
"If I plot variable importance, I do see a plot in Plots."
@DrJerryTAO DrJerryTAO changed the title Bug in objective = "reg:pseudohubererror" or "reg:quantileerror" Bug in objective = "reg:pseudohubererror" and "reg:quantileerror" Nov 7, 2024
hcho3 (Collaborator) commented Nov 7, 2024

The "reg:quantileerror" objective was added in XGBoost 2.0, which isn't available on CRAN yet. You would need to install the R package from source to use the feature.

@DrJerryTAO DrJerryTAO changed the title Bug in objective = "reg:pseudohubererror" and "reg:quantileerror" Bug in objective = "reg:pseudohubererror" and xgb.plot.tree() Nov 7, 2024
DrJerryTAO (Author) commented Nov 7, 2024

@hcho3 thanks. @kashif @darxriggs Do you know why objective = 'reg:pseudohubererror' does not update over iterations? Could you address the xgb.plot.tree() and objective = 'reg:pseudohubererror' bugs? And how do we install from source? I did not find sample code for R.

DrJerryTAO (Author) commented

Hi all, I have found the key: the base score. Switching the initial prediction from the default 0.5 to a weakly informative mean() or median() lets the gradient search move away from the starting point. This went against my intuition: a prediction far from the observations should generate large gradients toward the direction that lowers the loss, and the pseudo Huber loss function is not flat. Unlike in other models, base_score appears to have a huge impact here, even in small data sets.

Will the new "intercept" support in version 2.0 (https://xgboost.readthedocs.io/en/latest/tutorials/intercept.html) solve this problem automatically? I think it is also important to document that the default base_score = 0.5 works very poorly for objective = "reg:pseudohubererror".
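For what it's worth, here is my own sketch of why the default base_score = 0.5 might stall (my reasoning, not something I have confirmed in the XGBoost source): with the default delta = 1, the pseudo Huber gradient saturates at ±1 for large residuals while the hessian vanishes, so at a starting prediction of 0.5 every mtcars row contributes almost no hessian, and a hessian sum far below the default min_child_weight = 1 would plausibly reject every split.

```r
# Pseudo Huber loss with delta = 1: L(r) = sqrt(1 + r^2) - 1, r = label - pred.
# Gradient w.r.t. the prediction: -r / sqrt(1 + r^2), bounded by 1 in magnitude.
# Hessian: (1 + r^2)^(-3/2), which shrinks toward 0 for large residuals.
grad <- function(label, pred) -(label - pred) / sqrt(1 + (label - pred)^2)
hess <- function(label, pred) 1 / (1 + (label - pred)^2)^1.5

# At the default base_score = 0.5, a typical mpg label of about 20 gives:
grad(20, 0.5)  # about -0.9987: saturated, carries no magnitude information
hess(20, 0.5)  # about 1.34e-04: nearly zero

# Summed over all 32 rows of mtcars, the hessian is below 0.01, far under
# XGBoost's default min_child_weight = 1, so no split would pass that check.
sum(hess(mtcars$mpg, 0.5))
```

This would also explain why a base_score near the center of the labels fixes it: the residuals shrink, the hessians become non-negligible, and splitting can proceed.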

See the impacts when base_score = median() is specified.

# Solution: set base_score
library(xgboost)
library(tidyverse)
data(mtcars)
Data <- mtcars %>%
  {xgb.DMatrix(
    data = (.) %>% select(-mpg) %>% as.matrix(), 
    label = (.) %>% pull(mpg))}
Model <- xgboost(
  data = Data, 
  objective = "reg:pseudohubererror", 
  base_score = median(mtcars$mpg), 
  max.depth = 3, eta = 1, nrounds = 100)
"[1]	train-mphe:2.019801 
[100]	train-mphe:0.000000 "
Model <- xgboost(
  data = Data, 
  objective = "reg:pseudohubererror", eval_metric = "mae", 
  base_score = median(mtcars$mpg), 
  max.depth = 3, eta = 1, nrounds = 100)
"[1]	train-mae:2.685595
[100]	train-mae:0.000482"

trivialfis (Member) commented

Thank you for sharing. We will have to run some experiments once the new R interface is ready. The latest XGBoost uses the median by default for base_score, so I suspect it should work.
