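How to group variable values into bins

Time: 2019-03-05 02:12:01

Tags: python pandas

I am currently stuck on how to group a dataframe into different bins based on the values of one of its variables.

Here is my data: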

df[['col','val']]

Out[490]: 
    col  val
0    65    0
1     6    0
2    23    0
3     6    0
4    19    0
5    10    0
6    30    0
7    64    0
8     4    0
9     3    0
10    6    0
11    5    0
12    9    0
13   10    0
14   11    0
15    1    0
16    0    0
17    0    1
18    4    0
19    2    0

Using cut gives me this output:

df['bins'] = pd.cut(df['col'], binsize)

                bins  val
0  (-0.065, 13.0]    1
1    (13.0, 26.0]    0
2    (26.0, 39.0]    0
4    (52.0, 65.0]    0

What I would like to get is this output:

col Value
(0, 2]  1
(3, 5]  0
(6, 9]  0
(10, 19]    0
(23, 65]    0

2 answers:

Answer 0: (score: 0)

One solution is to pass the bins you want to pd.cut as an IntervalIndex.

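import pandas as pd

# The default is closed='right', which would exclude col == 0 and so miss
# the first row of your expected output, (0, 2]  1.
# Note: closed='left' in turn excludes each right endpoint
# (2, 5, 9, 19, 65), which is why those rows show NaN below.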
bins = pd.IntervalIndex.from_tuples([(0, 2), 
                                     (3, 5), 
                                     (6, 9), 
                                     (10, 19), 
                                     (23, 65)], 
                                    closed='left')

df['bins'] = pd.cut(df['col'], bins=bins)
df
    col  val          bins
0    65    0           NaN
1     6    0    [6.0, 9.0)
2    23    0  [23.0, 65.0)
3     6    0    [6.0, 9.0)
4    19    0           NaN
5    10    0  [10.0, 19.0)
6    30    0  [23.0, 65.0)
7    64    0  [23.0, 65.0)
8     4    0    [3.0, 5.0)
9     3    0    [3.0, 5.0)
10    6    0    [6.0, 9.0)
11    5    0           NaN
12    9    0           NaN
13   10    0  [10.0, 19.0)
14   11    0  [10.0, 19.0)
15    1    0    [0.0, 2.0)
16    0    0    [0.0, 2.0)
17    0    1    [0.0, 2.0)
18    4    0    [3.0, 5.0)
19    2    0           NaN

# Get something close to expected output: for each
# unique bin, take the maximum value

(df[['bins', 'val']].dropna()
                    .groupby('bins')
                    .max()
                    .reset_index())
       bins  val
0    [0, 2)    1
1    [3, 5)    0
2    [6, 9)    0
3  [10, 19)    0
4  [23, 65)    0
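
If the NaN rows at the right endpoints are a problem, one possible variation (a sketch, not part of the original answer) is to build the same IntervalIndex with closed='both', so that both 0 and 2 land in the first bin. The intervals do not share endpoints, so they stay non-overlapping, which pd.cut requires. The data below is copied from the question; values that fall between two intervals (for example 20 to 22) would still come out as NaN.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    'col': [65, 6, 23, 6, 19, 10, 30, 64, 4, 3,
            6, 5, 9, 10, 11, 1, 0, 0, 4, 2],
    'val': [0] * 17 + [1, 0, 0],
})

# closed='both' keeps the left AND right endpoint of every interval.
bins = pd.IntervalIndex.from_tuples(
    [(0, 2), (3, 5), (6, 9), (10, 19), (23, 65)],
    closed='both')

df['bins'] = pd.cut(df['col'], bins=bins)

# For each bin, take the maximum of val, as in the answer above.
summary = (df.groupby('bins', observed=True)['val']
             .max()
             .reset_index())
print(summary)
```

With this data every row is assigned to a bin, so no dropna() is needed before grouping.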

Answer 1: (score: 0)

Currently I am using the following SAS code to do the binning, but I would like to convert it to Python.

/* &allweights = count of rows in the dataset */
weight = 1;
binsize = 5;
data temp;
        set temp nobs=numobs;
        by dataset;
        retain group nn;
        nn = sum(nn,weight);
        if first.&x then do;
            group = floor(nn*binsize/(&allweights+1));
        end;
    run;
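
A rough pandas sketch of what this SAS step appears to do is shown below. It rests on several assumptions: that &x is the variable being binned (col here), that the data is sorted by it, and that every row has weight 1. Under those assumptions the running sum nn at the first row of each distinct value equals that value's minimum rank, so rank(method='min') stands in for the retain / first.&x logic. The column names group, col_min and col_max are made up for illustration.

```python
import numpy as np
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    'col': [65, 6, 23, 6, 19, 10, 30, 64, 4, 3,
            6, 5, 9, 10, 11, 1, 0, 0, 4, 2],
    'val': [0] * 17 + [1, 0, 0],
})

binsize = 5          # number of groups, as in the SAS code
allweights = len(df)  # assumed: every row has weight 1

# Minimum rank of each value = cumulative count at its first occurrence
# in sorted order; tied values therefore share one group.
rank_min = df['col'].rank(method='min')
df['group'] = np.floor(rank_min * binsize / (allweights + 1)).astype(int)

# Describe each group by its smallest and largest col, plus the max of val.
summary = (df.groupby('group')
             .agg(col_min=('col', 'min'),
                  col_max=('col', 'max'),
                  val=('val', 'max'))
             .reset_index())
print(summary)
```

On the sample data this yields five groups whose col ranges are 0 to 2, 3 to 5, 6 to 9, 10 to 19 and 23 to 65, which lines up with the bins in the expected output of the question.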