Question

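I am a beginner in Python. I wrote the function below to partition data read from a CSV file. The indices are generated without errors, but when I split the df using these indices the result is incorrect. What is wrong with my code?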

import numpy as np
import pandas as pd

def partition(k, number_of_fold):
    names = ['Mcg', 'Gvh', 'Alm', 'Mit', 'Erl', 'Pox', 'Vac', 'Nuc', 'class']
    file = 'yeast3.dat'

    df = pd.read_csv(file, header=None, names=names)
    print(df.ix[1:2])
    print(df.ix[1:2, 3:4])
    print('size: ' + str(df.size))
    fold_zize = df.size / k
    for i in range(k):
        start_test = i * fold_zize
        x_test = np.array(df.ix[start_test: (start_test + fold_zize), 0:8])
        y_test = np.array(df.ix[start_test: (start_test + fold_zize), 8:9])
        print("test = " + str(start_test) + " : " + str(start_test + fold_zize))

        x_train = np.concatenate \
            ((np.array(df.ix[: start_test, 0:8]), np.array(df.ix[start_test + fold_zize:, 0:8])))
        y_train = np.concatenate \
            ((np.array(df.ix[: start_test, 8:9]), np.array(df.ix[start_test + fold_zize:, 8:9])))
        print("train1 = 0 : " + str(start_test))
        print("train2 = " + str((start_test + fold_zize)) + " : " + str(df.size))
        if(x_train.size + x_test.size != df.size):
            print('EROOOOOOOOOOOOOOOOOOOOOOOR: ' + str(x_train.size + x_test.size) + ' ' + str(df.size))

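The ranges shown by the print statements for the train and test arrays are correct, but in the if statement the sum of the train and test sizes does not equal the size of the main df.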
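
A likely cause, offered as a sketch rather than a definitive answer: `df.size` is the number of cells (rows times columns), not the number of rows, so `fold_zize` is computed from the wrong quantity, and `x_train.size + x_test.size` counts cells over 8 columns while `df.size` counts cells over 9. On top of that, `.ix` slices by label with both endpoints included, so each boundary row ends up in both the test and the train part (`.ix` has also been removed in pandas 1.0). The version below counts rows with `len(df)` and slices with `.iloc`, which excludes the end position. The `partition(df, k)` signature, the synthetic `demo` frame standing in for `yeast3.dat`, and the choice to let the last fold absorb the leftover rows are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def partition(df, k):
    """Yield (x_train, y_train, x_test, y_test) for k contiguous folds."""
    n_rows = len(df)          # number of rows; df.size would be rows * columns
    fold_size = n_rows // k   # integer division keeps the slice bounds integral
    for i in range(k):
        start = i * fold_size
        # Assumption: the last fold absorbs any remainder rows.
        stop = n_rows if i == k - 1 else start + fold_size
        test = df.iloc[start:stop]  # iloc excludes `stop`, so no overlap
        train = pd.concat([df.iloc[:start], df.iloc[stop:]])
        # Features are all columns but the last; the last column is the class.
        x_test, y_test = test.iloc[:, :-1].to_numpy(), test.iloc[:, -1].to_numpy()
        x_train, y_train = train.iloc[:, :-1].to_numpy(), train.iloc[:, -1].to_numpy()
        # Compare row counts, not cell counts.
        assert len(x_train) + len(x_test) == n_rows
        yield x_train, y_train, x_test, y_test


# Synthetic stand-in for yeast3.dat: 10 rows, 8 features plus a class column.
names = ['Mcg', 'Gvh', 'Alm', 'Mit', 'Erl', 'Pox', 'Vac', 'Nuc', 'class']
demo = pd.DataFrame(np.arange(90).reshape(10, 9), columns=names)
folds = list(partition(demo, 3))
for x_train, y_train, x_test, y_test in folds:
    print(x_train.shape, x_test.shape)
```

If scikit-learn is available, `sklearn.model_selection.KFold` produces the same kind of train/test index split and may be simpler than slicing by hand.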

k-fold cross-validation - python

0 answers: