
XGBRegressor: changing random_state has no effect


xgboost.XGBRegressor seems to produce the same results even when a new random seed is given.

According to the xgboost documentation for xgboost.XGBRegressor:

seed : int Random number seed. (Deprecated, please use random_state)

random_state : int Random number seed. (replaces seed)

random_state is the one to be used; however, no matter which random_state or seed I pass, the model produces the same results. Is this a bug?

from xgboost import XGBRegressor
from sklearn.datasets import load_boston
import numpy as np
from itertools import product

def xgb_train_predict(random_state=0, seed=None):
    X, y = load_boston(return_X_y=True)
    xgb = XGBRegressor(random_state=random_state, seed=seed)
    xgb.fit(X, y)
    y_ = xgb.predict(X)
    return y_

check = xgb_train_predict()
random_state = [1, 42, 58, 69, 72]
seed = [None, 2, 24, 85, 96]

for r, s in product(random_state, seed):
    y_ = xgb_train_predict(r, s)
    assert np.equal(y_, check).all()
    print('CHECK! \t random_state: {} \t seed: {}'.format(r, s))
[Out]:
CHECK!   random_state: 1    seed: None
CHECK!   random_state: 1    seed: 2
CHECK!   random_state: 1    seed: 24
CHECK!   random_state: 1    seed: 85
CHECK!   random_state: 1    seed: 96
CHECK!   random_state: 42   seed: None
CHECK!   random_state: 42   seed: 2
CHECK!   random_state: 42   seed: 24
CHECK!   random_state: 42   seed: 85
CHECK!   random_state: 42   seed: 96
CHECK!   random_state: 58   seed: None
CHECK!   random_state: 58   seed: 2
CHECK!   random_state: 58   seed: 24
CHECK!   random_state: 58   seed: 85
CHECK!   random_state: 58   seed: 96
CHECK!   random_state: 69   seed: None
CHECK!   random_state: 69   seed: 2
CHECK!   random_state: 69   seed: 24
CHECK!   random_state: 69   seed: 85
CHECK!   random_state: 69   seed: 96
CHECK!   random_state: 72   seed: None
CHECK!   random_state: 72   seed: 2
CHECK!   random_state: 72   seed: 24
CHECK!   random_state: 72   seed: 85
CHECK!   random_state: 72   seed: 96

2 Answers

It seems (I didn't know this myself before digging for an answer :) ) that xgboost uses its random generator only for sub-sampling; see Laurae's comment on a similar GitHub issue. Otherwise the behavior is deterministic.
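To make that concrete, here is a minimal sketch (not from the original question's code): once row sub-sampling is enabled with subsample < 1, different random_state values should give different predictions. make_regression is used instead of the Boston data because load_boston has been removed from recent scikit-learn versions.

from xgboost import XGBRegressor
from sklearn.datasets import make_regression
import numpy as np

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

def predict_with_state(random_state):
    # subsample < 1 makes each tree train on a random subset of rows,
    # so the random generator is actually exercised
    model = XGBRegressor(subsample=0.5, colsample_bytree=0.5,
                         n_estimators=50, random_state=random_state)
    model.fit(X, y)
    return model.predict(X)

p1 = predict_with_state(1)
p2 = predict_with_state(2)
print(np.allclose(p1, p2))  # expected False: with sampling, the seed matters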

If you had used sub-sampling, there is an issue with the seed/random_state handling in xgboost's current sklearn API: seed is indeed documented as deprecated, but if you provide it, it is still used in preference to random_state, as can be seen here in the code. This comment is only relevant when seed is not None.
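A hedged sketch of what that precedence would look like, reusing X, y and the imports from the sketch above and assuming your xgboost version still accepts the deprecated seed keyword: with sampling enabled, fixing seed while varying random_state would give identical predictions if seed indeed takes precedence.

# Assumes X, y, XGBRegressor and np from the sketch above, and that this
# xgboost version still accepts the deprecated `seed` keyword.
a = XGBRegressor(subsample=0.5, n_estimators=50,
                 seed=7, random_state=1).fit(X, y).predict(X)
b = XGBRegressor(subsample=0.5, n_estimators=50,
                 seed=7, random_state=2).fit(X, y).predict(X)
print(np.allclose(a, b))  # True would indicate seed overrides random_state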


Tested with xgboost 1.6.1: setting only random_state worked.

When reading X and y from the data source, make sure they are in the same order and rounded to the same number of decimals.

The following example assumes X is a pd.DataFrame and y is a pd.Series, already loaded from your data source.

from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Assumed placeholder values; choose them for your data
round_value = 4
my_seed = 42

# Round to a fixed precision and fix the column order so that
# repeated runs see identical input data
X = round(X, round_value)   # X is a pd.DataFrame
X = X[sorted(X)]            # reorder columns alphabetically
y = round(y, round_value)   # y is a pd.Series, kept row-aligned with X

# A plain train/test split is assumed here
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=my_seed)

xgb_model = XGBRegressor(random_state=my_seed)
xgb_model.fit(X_train, y_train)
xgb_pred = xgb_model.predict(X_test)
