Split a pandas DataFrame based on a value

Time: 2018-07-09 07:57:34

Tags: python pandas

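I want to split a pandas DataFrame into multiple groups so that I can process each group separately. My "value.csv" file contains the following numbers: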

num tID y x height width
2   0   0   0   1   16
2   1   1   0   1   16 
5   0   1   0   1   16 
5   1   0   0   1   8 
5   2   0   8   1   8 
6   0   0   0   1   16 
6   1   1   0   1   8 
6   2   1   8   1   8
2   0   0   0   1   16
2   1   1   0   1   16 
5   0   1   0   1   16 
5   1   0   0   1   8 
5   2   0   8   1   8 
6   0   0   0   1   16 
6   1   1   0   1   8 
6   2   1   8   1   8

I want to split the data wherever the tID column starts again at 0, like the first four splits shown here.

First:

2   0   0   0   1   16
2   1   1   0   1   16 

Second:

5   0   1   0   1   16 
5   1   0   0   1   8 
5   2   0   8   1   8 

Third:

6   0   0   0   1   16 
6   1   1   0   1   8 
6   2   1   8   1   8

Fourth:

2   0   0   0   1   16
2   1   1   0   1   16 

I tried to do this with an if/else inside a loop, without success. Any ideas on how to split it?

    import pandas as pd
    statQuality = 'value.csv'
    df = pd.read_csv(statQuality, names=['num','tID','y','x','height','width'])


    df2 = df.copy()
    df2.drop(['num'], axis=1, inplace=True)

    x = []

    for index, row in df2.iterrows():
        if row['tID'] == 0:
            x = []
            x.append(row)
            print(x)
        else:
            x.append(row)

1 answer:

Answer 0: (score: 1)

Use:

    # create groups by consecutive values
    s = df['num'].ne(df['num'].shift()).cumsum()
    # create helper count Series for duplicated groups like `2_0`, `2_1`...
    g = s.groupby(df['num']).transform(lambda x: x.factorize()[0])
    # dictionary of DataFrames
    d = {'{}_{}'.format(i, j): v.drop('num', axis=1) for (i, j), v in df.groupby(['num', g])}
    print(d)
    {'2_0':    tID  y  x  height  width
    0    0  0  0       1     16
    1    1  1  0       1     16, '2_1':    tID  y  x  height  width
    8    0  0  0       1     16
    9    1  1  0       1     16, '5_0':    tID  y  x  height  width
    2    0  1  0       1     16
    3    1  0  0       1      8
    4    2  0  8       1      8, '5_1':     tID  y  x  height  width
    10    0  1  0       1     16
    11    1  0  0       1      8
    12    2  0  8       1      8, '6_0':    tID  y  x  height  width
    5    0  0  0       1     16
    6    1  1  0       1      8
    7    2  1  8       1      8, '6_1':     tID  y  x  height  width
    13    0  0  0       1     16
    14    1  1  0       1      8
    15    2  1  8       1      8}
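
The answer above keys each piece by `num` plus a repeat counter. If the goal is only to cut the frame wherever tID restarts at 0, as the question describes, a shorter variant may be enough. The following is a minimal sketch and not part of the original answer: it assumes every block begins with a row where tID equals 0, it builds the sample rows inline instead of reading "value.csv", and the names `block_id` and `blocks` are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question (the first ten are enough to show a repeated num).
rows = [
    (2, 0, 0, 0, 1, 16),
    (2, 1, 1, 0, 1, 16),
    (5, 0, 1, 0, 1, 16),
    (5, 1, 0, 0, 1, 8),
    (5, 2, 0, 8, 1, 8),
    (6, 0, 0, 0, 1, 16),
    (6, 1, 1, 0, 1, 8),
    (6, 2, 1, 8, 1, 8),
    (2, 0, 0, 0, 1, 16),
    (2, 1, 1, 0, 1, 16),
]
df = pd.DataFrame(rows, columns=['num', 'tID', 'y', 'x', 'height', 'width'])

# Assumption: a new block starts on every row where tID is 0.
# cumsum turns those start flags into a running block number (1, 1, 2, 2, 2, 3, ...).
block_id = df['tID'].eq(0).cumsum()

# One DataFrame per block, in file order, with 'num' dropped as in the question.
blocks = [part.drop(columns='num') for _, part in df.groupby(block_id)]

for i, part in enumerate(blocks, start=1):
    print('block', i)
    print(part)
```

Because the block number only ever increases, the list keeps the pieces in the order they appear in the file, so `blocks[0]` is the "First" piece from the question and `blocks[3]` is the "Fourth".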