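Count the number of consecutive zeros in a DataFrame

Time: 2017-07-24 13:12:03

Tags: python python-2.7 pandas numpy

I want to count the number of consecutive zeros in each row of the DataFrame shown below (the last column is the expected result). Please help.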

  DEC  JAN  FEB  MARCH  APRIL  MAY        consecutive zeros
0    X    X    X      1      0    1              0
1    X    X    X      1      0    1              0
2    0    0    1      0      0    1              2
3    1    0    0      0      1    1              3
4    0    0    0      0      0    1              5
5    X    1    1      0      0    0              3
6    1    0    0      1      0    0              2
7    0    0    0      0      1    0              4

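3 answers:

Answer 0: (score: 1)

Here are my two cents...

Treat every other non-zero element as 1, and you have a binary code. All you need to do now is find the "largest interval" of 0s with no bit flip in between.

We can write a function and `apply` it with a lambda: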
import numpy as np

def len_consec_zeros(a):
    a = np.array(list(a))    # split the joined string into single characters
    rr = np.argwhere(a == '0').ravel()  # find out positions of `0`
    if not rr.size:  # if there are no zeros, return 0
        return 0

    full = np.arange(rr[0], rr[-1]+1)  # get the range of spread of 0s

    # get the indices where `0` was flipped to something else
    diff = np.setdiff1d(full, rr)
    if not diff.size:  # no bit flips: the zeros form one unbroken run
        # a lone zero is not "consecutive", same rule as the final return
        return len(full) if len(full) > 1 else 0

    # break the array into pieces wherever there's a bit flip
    # and the result is the size of the largest chunk
    pos, difs = full[0], []
    for el in diff:
        difs.append(el - pos)
        pos = el + 1

    difs.append(full[-1]+1 - pos)

    # return size of the largest chunk
    res = max(difs) if max(difs) != 1 else 0

    return res

Now that you have this function, call it on every row...

# join all columns to get a string column

# assuming you have your data in `df`
df['concated'] = df.astype(str).apply(lambda x: ''.join(x), axis=1)
df['consecutive_zeros'] = df.concated.apply(lambda x: len_consec_zeros(x))
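
The "longest stretch of zeros without a flip" idea can also be cross-checked with a much shorter sketch in plain Python. This is not the answer's method: it swaps in `itertools.groupby`, and `longest_zero_run` is a hypothetical helper name. The rule that a lone zero counts as 0 is an assumption read off the expected column in the question.

```python
from itertools import groupby

def longest_zero_run(values):
    """Length of the longest run of zeros; a run of length 1 counts as 0."""
    # groupby splits the sequence into maximal runs of equal key (zero / not zero)
    runs = [sum(1 for _ in grp)
            for is_zero, grp in groupby(values, key=lambda v: str(v) == "0")
            if is_zero]
    longest = max(runs) if runs else 0
    return longest if longest > 1 else 0

# Rows 0, 2 and 7 of the sample data, mixing strings and integers
print(longest_zero_run(["X", "X", "X", 1, 0, 1]))
print(longest_zero_run(["0", "0", "1", 0, 0, 1]))
print(longest_zero_run(["0", "0", "0", 0, 1, 0]))
```

Comparing values via `str(v) == "0"` lets the same helper handle both the string columns and the integer columns of the sample frame; a float such as `0.0` would not match and would need its own handling.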

Answer 1: (score: 0)

Here is one approach -

import numpy as np

# Inspired by https://stackoverflow.com/a/44385183/
def pos_neg_counts(mask):
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    if len(idx)==0: # To handle all 0s or all 1s cases
        if mask[0]:
            return np.array([mask.size]), np.array([0])
        else:
            return np.array([0]), np.array([mask.size])
    else:
        count = np.r_[ [idx[0]+1], idx[1:] - idx[:-1], [mask.size-1-idx[-1]] ]
        if mask[0]:
            return count[::2], count[1::2] # True, False counts
        else:
            return count[1::2], count[::2] # True, False counts

def get_consecutive_zeros(df):
    arr = df.values
    mask = (arr==0) | (arr=='0')
    zero_count = np.array([pos_neg_counts(i)[0].max() for i in mask])
    zero_count[zero_count<2] = 0
    return zero_count

Sample run -

In [272]: df
Out[272]: 
  DEC JAN FEB  MARCH  APRIL  MAY
0   X   X   X      1      0    1
1   X   X   X      1      0    1
2   0   0   1      0      0    1
3   1   0   0      0      1    1
4   0   0   0      0      0    1
5   X   1   1      0      0    0
6   1   0   0      1      0    0
7   0   0   0      0      1    0

In [273]: df['consecutive_zeros'] = get_consecutive_zeros(df)

In [274]: df
Out[274]: 
  DEC JAN FEB  MARCH  APRIL  MAY  consecutive_zeros
0   X   X   X      1      0    1                  0
1   X   X   X      1      0    1                  0
2   0   0   1      0      0    1                  2
3   1   0   0      0      1    1                  3
4   0   0   0      0      0    1                  5
5   X   1   1      0      0    0                  3
6   1   0   0      1      0    0                  2
7   0   0   0      0      1    0                  4
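
To see what the boundary detection inside `pos_neg_counts` is doing, here is a small sketch that repeats its steps on a single row. The mask is row 2 of the sample data written out by hand; the variable names `run_lengths` and `zero_runs` are illustrative and not part of the answer.

```python
import numpy as np

# Row 2 of the sample data (0 0 1 0 0 1) as a boolean "is zero" mask
mask = np.array([True, True, False, True, True, False])

# Positions whose value differs from the right-hand neighbour, i.e. run boundaries
idx = np.flatnonzero(mask[1:] != mask[:-1])

# Run lengths: first run, gaps between boundaries, last run
run_lengths = np.r_[idx[0] + 1, np.diff(idx), mask.size - 1 - idx[-1]]

# mask[0] is True, so even positions are zero runs and odd positions are non-zero runs
zero_runs = run_lengths[::2]
print(idx, run_lengths, zero_runs.max())
```

Because runs of True and False always alternate, slicing with a step of 2 is enough to separate the zero runs from the rest once the value of `mask[0]` is known.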

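Answer 2: (score: -1)

For each row, you want cumsum(1-row), with a reset at every point where row == 1. Then you take the row max.

For example

import pandas as pd

ts = pd.Series([0,0,0,0,1,1,0,0,1,1,1,0])
ts2 = 1-ts
tsgroup = ts.cumsum()
consec_0 = ts2.groupby(tsgroup).transform(pd.Series.cumsum)
consec_0.max()

will give you 4 as needed.

Write that into a function and apply it to your dataframe.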