Boolean variables not supported in combination with other categoricals #10

Open
Dvermetten opened this issue Oct 21, 2020 · 2 comments


@Dvermetten
Collaborator

Describe the bug
When the search space contains a boolean variable alongside a non-boolean categorical variable, the search fails.

To Reproduce
The following slight modification of an existing test reproduces the problem:

```python
import numpy as np
# Import paths assumed for the bayes_optim version current at the time; adjust if needed.
from bayes_optim import ParallelBO, ContinuousSpace, OrdinalSpace, NominalSpace
from bayes_optim.Surrogate import RandomForest

dim_r = 2  # dimension of the real values

def obj_fun(x):
    x_r = np.array([x['continuous_%d' % i] for i in range(dim_r)])
    x_i = x['ordinal']
    x_d = x['nominal']
    penalty = 0 if x_d == 'OK' else 1  # penalize any nominal value other than 'OK'
    return np.sum(x_r ** 2) + abs(x_i - 10) / 123. + penalty * 2

# Parentheses let the sum of spaces span several lines
search_space = (
    ContinuousSpace([-5, 5], var_name='continuous') * dim_r
    + OrdinalSpace([5, 15], var_name='ordinal')
    + NominalSpace(['OK', 'A', None], var_name='nominal')
    + NominalSpace([True, False], var_name='boolvar')  # adding this boolean variable triggers the failure
)

model = RandomForest(levels=search_space.levels)

opt = ParallelBO(
    search_space=search_space,
    obj_fun=obj_fun,
    model=model,
    max_FEs=6,
    DoE_size=3,          # the initial DoE size
    eval_type='dict',
    acquisition_fun='MGFI',
    acquisition_par={'t': 2},
    n_job=3,             # number of processes
    n_point=3,           # number of candidate solutions proposed in each iteration
    verbose=False        # turn this off if you prefer no output
)
xopt, fopt, stop_dict = opt.run()
```

Expected behavior
This should behave exactly as it does when the boolean variable is absent.

Additional context
The failure seems to be related to input checking in the random forest model.

@Dvermetten changed the title from "Boolean variables not directly supported" to "Boolean variables not supported in combination with other categoricals" on Oct 21, 2020
@Dvermetten
Collaborator Author

Somewhere in the random forest these values are converted to strings, which leads to this issue.
I haven't yet found exactly where the conversion happens, but the problem is not limited to boolean variables: any nominal space whose options are not originally strings is affected.
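To illustrate the suspected failure mode, here is a minimal, hypothetical sketch in plain Python (it is not bayes_optim's code, and `build_codes`, `encode`, `build_codes_str` and `encode_str` are made-up names). An encoder built from the original levels (`True`, `False`, `None`, `22`, ...) no longer recognizes those values once they arrive as strings (`'True'`, `'22'`). One possible fix, shown under that assumption, is to key every level by `str(level)` on both sides:

```python
from typing import Any, Dict, List, Sequence


def build_codes(levels: Sequence[Any]) -> Dict[Any, int]:
    """Map each original nominal level (True, None, 22, 'OK', ...) to an integer code."""
    return {level: i for i, level in enumerate(levels)}


def encode(values: Sequence[Any], codes: Dict[Any, int]) -> List[int]:
    """Look values up by their original form; stringified values will not match."""
    unknown = [v for v in values if v not in codes]
    if unknown:
        raise ValueError(f"Found unknown categories {unknown}")
    return [codes[v] for v in values]


def build_codes_str(levels: Sequence[Any]) -> Dict[str, int]:
    """Hypothetical fix: key every level by str(level) so both representations agree."""
    keys = [str(level) for level in levels]
    if len(set(keys)) != len(keys):
        # e.g. None and 'None' become indistinguishable after str()
        raise ValueError("levels collide after str() conversion")
    return {key: i for i, key in enumerate(keys)}


def encode_str(values: Sequence[Any], codes: Dict[str, int]) -> List[int]:
    """Stringify incoming values before the lookup, matching build_codes_str."""
    return encode([str(v) for v in values], codes)


if __name__ == "__main__":
    levels = [True, False]
    # Simulate values that were converted to strings before reaching the model.
    incoming = [str(v) for v in [True, False, True]]
    try:
        encode(incoming, build_codes(levels))
    except ValueError as err:
        print(err)  # Found unknown categories ['True', 'False', 'True']
    print(encode_str(incoming, build_codes_str(levels)))  # [0, 1, 0]
```

The collision check matters because a level set such as `[None, 'None']` cannot be keyed by strings safely.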

@MariosKef

MariosKef commented Jun 21, 2021

@Dvermetten I have a similar (possibly the same) issue. I installed the latest version with pip and I get errors of the form:
`ValueError: Found unknown categories ['22', '18', '26', '96'] in column 0 during transform`
I suspect it is what you described above, because I have a nominal range whose elements are not originally strings:
`max_depth = NominalSpace([None] + np.arange(2,102,2).tolist())`

Is there an update or fix for this? I also suspect the pip release is older than what is currently on master.

Thanks!
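Until the conversion is fixed, one possible user-side workaround is to declare such levels as strings and map them back inside the objective function. The sketch below is hypothetical: the helper `to_string_levels` is made up, and the `NominalSpace` usage in the comments assumes the same constructor form shown in the reproduction above. It has not been verified against bayes_optim.

```python
from typing import Any, Dict, List, Sequence, Tuple


def to_string_levels(levels: Sequence[Any]) -> Tuple[List[str], Dict[str, Any]]:
    """Return str() versions of `levels` plus a dict mapping each string back to its original value."""
    str_levels = [str(level) for level in levels]
    if len(set(str_levels)) != len(str_levels):
        raise ValueError("some levels are indistinguishable after str() conversion")
    return str_levels, dict(zip(str_levels, levels))


# Levels from the comment above: None plus the even depths 2, 4, ..., 100.
str_levels, decode = to_string_levels([None] + list(range(2, 102, 2)))

# Assumed usage with bayes_optim (not executed here):
#   max_depth = NominalSpace(str_levels, var_name='max_depth')
# and inside the objective function:
#   depth = decode[x['max_depth']]   # 'None' -> None, '22' -> 22

print(str_levels[:3])                 # ['None', '2', '4']
print(decode['22'], decode['None'])   # 22 None
```

The same helper would cover the boolean case, e.g. `to_string_levels([True, False])`.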
