random_state and shuffle

Date: 2018-11-11 14:16:19

Tags: python scikit-learn shuffle

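I'm a bit confused about using random_state together with shuffle. I want to split my data without shuffling it. It seems that when I set shuffle to False, the number I choose for random_state doesn't matter: I get the same output (the same split for random_state 42, 2, 7, 17, etc.). Why?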

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=False)

However, if shuffle is True, I get different outputs (splits) for different random_state values.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

1 answer:

Answer 0 (score: 1)

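If shuffle is set to False, train_test_split simply splits the data in its original order: the first samples become the training set and the remaining ones (the last test_size fraction) become the test set. No random numbers are involved, so the random_state parameter is ignored completely.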

Example:

from sklearn.model_selection import train_test_split

X = list(range(50))  # list of the integers 0 to 49
y = X  # reuse the same values as labels, just for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=False)

print(X_train)  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]
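To see why the seed has no effect, here is a minimal pure-Python sketch of what the unshuffled path does. The function name ordered_split is hypothetical (it is not part of scikit-learn), and the sketch assumes a float test_size, rounding the test count up the way scikit-learn does, which gives the 37-element training set printed above.

```python
import math


def ordered_split(data, test_size, random_state=None):
    """Hypothetical sketch of train_test_split's shuffle=False behaviour.

    random_state is accepted only to mirror the real signature; it is never used.
    """
    n_test = math.ceil(test_size * len(data))  # round the test count up (13 for 50 samples)
    n_train = len(data) - n_test
    # No random numbers are drawn: the split is just two contiguous slices.
    return data[:n_train], data[n_train:]


X = list(range(50))
splits = [ordered_split(X, 0.25, random_state=seed) for seed in (2, 7, 17, 42)]

# Every seed yields exactly the same (train, test) pair.
print(all(s == splits[0] for s in splits))  # prints True
print(splits[0][0][-1], splits[0][1][0])  # prints 36 37
```

Because the unshuffled branch never consults a random number generator, no value of random_state can change its result.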

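When shuffle is set to True, random_state is used as the seed of the random number generator that permutes the data before splitting. As a result, your dataset is split randomly into training and test sets: the same seed always reproduces the same split, while different seeds generally give different splits.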

Example with random_state=42:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=True)

print(X_train)  # prints [8, 3, 6, 41, 46, 47, 15, 9, 16, 24, 34, 31, 0, 44, 27, 33, 5, 29, 11, 36, 1, 21, 2, 43, 35, 23, 40, 10, 22, 18, 49, 20, 7, 42, 14, 28, 38]

Example with random_state=44:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=44, shuffle=True)

print(X_train)  # prints [13, 11, 2, 12, 34, 41, 30, 16, 39, 28, 24, 8, 18, 9, 4, 10, 0, 19, 21, 29, 14, 1, 48, 38, 7, 43, 25, 22, 23, 42, 46, 49, 32, 3, 45, 35, 20]
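For contrast, the following sketch shows the mechanism behind the shuffled case. shuffled_split is a hypothetical helper written for illustration; it uses Python's random.Random, whereas scikit-learn draws its permutation from NumPy's random generator (through ShuffleSplit), so it will not reproduce the exact lists printed above. The point is only that the seed fixes the permutation, and the permutation fixes the split.

```python
import math
import random


def shuffled_split(data, test_size, seed):
    """Hypothetical sketch of a seeded shuffle-then-split (not scikit-learn's code)."""
    n_test = math.ceil(test_size * len(data))
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)  # the seed fully determines this permutation
    test = [data[i] for i in order[:n_test]]   # first n_test shuffled positions
    train = [data[i] for i in order[n_test:]]  # everything else
    return train, test


X = list(range(50))
train_a, test_a = shuffled_split(X, 0.25, seed=42)
train_b, test_b = shuffled_split(X, 0.25, seed=42)
train_c, test_c = shuffled_split(X, 0.25, seed=44)

print(train_a == train_b)  # prints True: same seed, same split
print(len(train_a), len(test_a))  # prints 37 13
print(sorted(train_a + test_a) == X)  # prints True: train and test partition the data
print(train_a == train_c)  # a different seed gives a different permutation, so almost always False
```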