How to find consecutive occurrences of values with a condition in Python

Date: 2020-01-22 08:57:59

Tags: python pandas

I have the following dataframe in pandas:

 code      tank     date         time       no_operation_flag
 123       1        01-01-2019   00:00:00   1
 123       1        01-01-2019   00:30:00   1
 123       1        01-01-2019   01:00:00   0
 123       1        01-01-2019   01:30:00   1
 123       1        01-01-2019   02:00:00   1
 123       1        01-01-2019   02:30:00   1
 123       1        01-01-2019   03:00:00   1
 123       1        01-01-2019   03:30:00   1
 123       1        01-01-2019   04:00:00   1
 123       1        01-01-2019   05:00:00   1                   
 123       1        01-01-2019   14:00:00   1                     
 123       1        01-01-2019   14:30:00   1                  
 123       1        01-01-2019   15:00:00   1                  
 123       1        01-01-2019   15:30:00   1                  
 123       1        01-01-2019   16:00:00   1                    
 123       1        01-01-2019   16:30:00   1                  
 123       2        02-01-2019   00:00:00   1
 123       2        02-01-2019   00:30:00   0
 123       2        02-01-2019   01:00:00   0
 123       2        02-01-2019   01:30:00   0
 123       2        02-01-2019   02:00:00   1
 123       2        02-01-2019   02:30:00   1
 123       2        02-01-2019   03:00:00   1
 123       2        03-01-2019   03:30:00   1
 123       2        03-01-2019   04:00:00   1
 123       1        03-01-2019   14:00:00   1
 123       2        03-01-2019   15:00:00   1
 123       2        03-01-2019   00:30:00   1
 123       2        04-01-2019   11:00:00   1
 123       2        04-01-2019   11:30:00   0
 123       2        04-01-2019   12:00:00   1
 123       2        04-01-2019   13:30:00   1
 123       2        05-01-2019   03:00:00   1
 123       2        05-01-2019   03:30:00   1
 123       2        05-01-2019   04:00:00   1

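What I want to do is flag runs of consecutive 1s in no_operation_flag that are longer than 5 rows, at tank level and day level, but the times must also be consecutive (each row exactly half an hour after the previous one). The dataframe is already sorted by tank, date and time.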
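To pin the rule down before looking at pandas, here is a minimal plain-Python sketch of it as I read the question: a run of 1s is broken when the tank or date changes, when a 0 appears, or when a row is not exactly 30 minutes after the previous one, and runs of more than 5 rows are flagged. The function `flag_runs`, its tuple input format and the `dd-mm-YYYY` date format are assumptions for illustration.

```python
from datetime import datetime, timedelta


def flag_runs(rows, min_len=6, step=timedelta(minutes=30)):
    """Return a 0/1 flag per row (hypothetical helper, not a pandas API).

    rows: list of (tank, date_str, time_str, flag) tuples, already sorted by
    tank, date and time; dates are assumed to be 'dd-mm-YYYY'.
    min_len=6 encodes "more than 5" consecutive rows.
    """
    out = [0] * len(rows)
    run_start = None  # index where the current run of 1s began
    prev = None       # (tank, date, timestamp) of the previous row
    for i, (tank, date, time, flag) in enumerate(rows):
        ts = datetime.strptime(f"{date} {time}", "%d-%m-%Y %H:%M:%S")
        continues = (
            flag == 1
            and run_start is not None
            and prev is not None
            and prev[0] == tank           # same tank
            and prev[1] == date           # same day
            and ts - prev[2] == step      # exactly half an hour later
        )
        if flag != 1:
            run_start = None              # a 0 ends any run
        elif not continues:
            run_start = i                 # a 1 that breaks the chain starts a new run
        if run_start is not None:
            length = i - run_start + 1
            if length == min_len:
                out[run_start:i + 1] = [1] * min_len  # flag the whole run so far
            elif length > min_len:
                out[i] = 1                             # extend an already flagged run
        prev = (tank, date, ts)
    return out
```

Applied to the rows of the sample above, this gives the final_flag column shown in the desired output.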

My desired dataframe is:

 code       tank      date          time        no_operation_flag   final_flag
 123       1        01-01-2019   00:00:00       1                   0                   
 123       1        01-01-2019   00:30:00       1                   0
 123       1        01-01-2019   01:00:00       0                   0  
 123       1        01-01-2019   01:30:00       1                   1
 123       1        01-01-2019   02:00:00       1                   1  
 123       1        01-01-2019   02:30:00       1                   1
 123       1        01-01-2019   03:00:00       1                   1
 123       1        01-01-2019   03:30:00       1                   1
 123       1        01-01-2019   04:00:00       1                   1
 123       1        01-01-2019   05:00:00       1                   0
 123       1        01-01-2019   14:00:00       1                   1  
 123       1        01-01-2019   14:30:00       1                   1
 123       1        01-01-2019   15:00:00       1                   1
 123       1        01-01-2019   15:30:00       1                   1
 123       1        01-01-2019   16:00:00       1                   1  
 123       1        01-01-2019   16:30:00       1                   1
 123       2        02-01-2019   00:00:00       1                   0
 123       2        02-01-2019   00:30:00       0                   0    
 123       2        02-01-2019   01:00:00       0                   0
 123       2        02-01-2019   01:30:00       0                   0
 123       2        02-01-2019   02:00:00       1                   0
 123       2        02-01-2019   02:30:00       1                   0
 123       2        02-01-2019   03:00:00       1                   0
 123       2        03-01-2019   03:30:00       1                   0
 123       2        03-01-2019   04:00:00       1                   0
 123       1        03-01-2019   14:00:00       1                   0
 123       2        03-01-2019   15:00:00       1                   0
 123       2        03-01-2019   00:30:00       1                   0
 123       2        04-01-2019   11:00:00       1                   0
 123       2        04-01-2019   11:30:00       0                   0 
 123       2        04-01-2019   12:00:00       1                   0
 123       2        04-01-2019   13:30:00       1                   0
 123       2        05-01-2019   03:00:00       1                   0
 123       2        05-01-2019   03:30:00       1                   0 
 123       2        05-01-2019   04:00:00       1                   0

How can I do this in pandas?

4 answers:

Answer 0 (score: 2)

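You can use a solution like this one: filter consecutive datetimes per group with a new helper DataFrame, built with `resample` so that all missing datetimes are added, and finally use `merge` to add the new column: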

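import pandas as pd

# combine date and time into one datetime column (dates assumed dd-mm-YYYY)
df['datetimes'] = pd.to_datetime(df['date'].astype(str) + ' ' + df['time'].astype(str),
                                 format='%d-%m-%Y %H:%M:%S')

# helper DataFrame: resample each (code, tank, date) group to 30-minute slots,
# so every missing half-hour becomes a row with NaN
df1 = (df.set_index('datetimes')
          .groupby(['code', 'tank', 'date'])['no_operation_flag']
          .resample('30min')
          .first()
          .reset_index())

# label runs of equal values, restarting at the first row of each group
shifted1 = df1.groupby(['code', 'tank', 'date'])['no_operation_flag'].shift()
g1 = df1['no_operation_flag'].ne(shifted1).cumsum()

# rows that are 1 and belong to a run longer than 5
mask1 = g1.map(g1.value_counts()).gt(5) & df1['no_operation_flag'].eq(1)

df1['final_flag'] = mask1.astype(int)
#print (df1.head(40))

# bring the flag back onto the original rows
df = df.merge(df1[['code', 'tank', 'datetimes', 'final_flag']]).drop('datetimes', axis=1)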
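The two ideas behind this answer can be seen on a small hypothetical series: `resample('30min').first()` inserts a NaN row for every missing half-hour slot, and `ne(shift()).cumsum()` gives every run of equal values its own integer id. Because NaN never compares equal to anything, the inserted NaN row breaks the run. The data below is made up for illustration.

```python
import pandas as pd

# Hypothetical flags for one tank and day: six 1s every half hour,
# then a 1 at 05:00 after a one-hour gap
idx = pd.to_datetime([
    "2019-01-01 01:30", "2019-01-01 02:00", "2019-01-01 02:30",
    "2019-01-01 03:00", "2019-01-01 03:30", "2019-01-01 04:00",
    "2019-01-01 05:00",
])
s = pd.Series([1] * 7, index=idx)

# Every missing half-hour slot (here 04:30) becomes a NaN row
filled = s.resample("30min").first()

# A new run starts wherever the value differs from the previous row
run_id = filled.ne(filled.shift()).cumsum()
run_len = run_id.map(run_id.value_counts())

# 1s in runs longer than 5 -> [1, 1, 1, 1, 1, 1, 0, 0]
final_flag = (run_len.gt(5) & filled.eq(1)).astype(int)
print(final_flag.tolist())
```

Without the resample step, the 05:00 row would simply extend the run to seven rows and be flagged too, which is why the helper DataFrame is needed before labelling runs.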

Answer 1 (score: 2)

Use:

import pandas as pd

df['final_flag'] = ( df.groupby([df['no_operation_flag'].ne(1).cumsum(),   # run of 1s
                                 'tank',
                                 'date',
                                 pd.to_datetime(df['time'].astype(str), format='%H:%M:%S')
                                   .diff()
                                   .ne(pd.Timedelta(minutes=30))
                                   .cumsum(),                               # half-hour segment
                                'no_operation_flag'])['no_operation_flag']
                    .transform('size')     # length of the segment each row belongs to
                    .gt(5)
                    .astype('uint8') )     # True/False -> 1/0
print(df)

Output

    code  tank        date      time  no_operation_flag  final_flag
0    123     1  01-01-2019  00:00:00                  1           0
1    123     1  01-01-2019  00:30:00                  1           0
2    123     1  01-01-2019  01:00:00                  0           0
3    123     1  01-01-2019  01:30:00                  1           1
4    123     1  01-01-2019  02:00:00                  1           1
5    123     1  01-01-2019  02:30:00                  1           1
6    123     1  01-01-2019  03:00:00                  1           1
7    123     1  01-01-2019  03:30:00                  1           1
8    123     1  01-01-2019  04:00:00                  1           1
9    123     1  01-01-2019  05:00:00                  1           0
10   123     1  01-01-2019  14:00:00                  1           1
11   123     1  01-01-2019  14:30:00                  1           1
12   123     1  01-01-2019  15:00:00                  1           1
13   123     1  01-01-2019  15:30:00                  1           1
14   123     1  01-01-2019  16:00:00                  1           1
15   123     1  01-01-2019  16:30:00                  1           1
16   123     2  02-01-2019  00:00:00                  1           0
17   123     2  02-01-2019  00:30:00                  0           0
18   123     2  02-01-2019  01:00:00                  0           0
19   123     2  02-01-2019  01:30:00                  0           0
20   123     2  02-01-2019  02:00:00                  1           0
21   123     2  02-01-2019  02:30:00                  1           0
22   123     2  02-01-2019  03:00:00                  1           0
23   123     2  03-01-2019  03:30:00                  1           0
24   123     2  03-01-2019  04:00:00                  1           0
25   123     1  03-01-2019  14:00:00                  1           0
26   123     2  03-01-2019  15:00:00                  1           0
27   123     2  03-01-2019  00:30:00                  1           0
28   123     2  04-01-2019  11:00:00                  1           0
29   123     2  04-01-2019  11:30:00                  0           0
30   123     2  04-01-2019  12:00:00                  1           0
31   123     2  04-01-2019  13:30:00                  1           0
32   123     2  05-01-2019  03:00:00                  1           0
33   123     2  05-01-2019  03:30:00                  1           0
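The part of this answer that enforces the half-hour rule is the time-gap key: `diff()` on the parsed times is compared with 30 minutes, and `cumsum()` turns every break into a new segment id, which is then used as one of the `groupby` keys. A minimal sketch on hypothetical times:

```python
import pandas as pd

times = pd.Series(["01:30:00", "02:00:00", "02:30:00", "04:00:00", "04:30:00"])
parsed = pd.to_datetime(times, format="%H:%M:%S")

# True wherever the gap to the previous row is not exactly 30 minutes
# (the first row's NaT gap also counts as a break)
breaks = parsed.diff().ne(pd.Timedelta(minutes=30))

# Each break starts a new segment -> [1, 1, 1, 2, 2]
segment = breaks.cumsum()
print(segment.tolist())
```

Because the segment id is only one of several `groupby` keys, a new group also starts whenever the tank, the date, or the run-of-1s id changes.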

Answer 2 (score: 0)

Maybe it can be done in one go, but a two-step approach is simpler: first select the tanks one by one, then look for sequences of five 1s.

Searching for a pattern in a column has already been solved in another question.

If you want to look at it another way (see this other question), you can sum the 1s, or use an "all values are True" condition, to find such runs.
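The "sum the 1s" idea can be sketched with a rolling window: within each tank and date, a window of `n` rows whose 0/1 flags sum to `n` contains only 1s. This sketch checks only the flags, not the half-hour spacing, uses `n = 6` to match "more than 5", and the helper name `flag_full_windows` and the sample data are hypothetical.

```python
import pandas as pd


def flag_full_windows(flags: pd.Series, n: int = 6) -> pd.Series:
    """Mark every row that lies inside some window of n consecutive 1s."""
    # 1 at the last row of each n-row window that is all 1s
    window_end = flags.rolling(n).sum().eq(n).astype(int)
    # Spread each window end back over the n rows it covers
    covered = window_end[::-1].rolling(n, min_periods=1).max()[::-1]
    return covered.astype(int)


df = pd.DataFrame({
    "tank": [1] * 9 + [2] * 4,
    "date": ["01-01-2019"] * 13,
    "no_operation_flag": [1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
})

# Step 1: split by tank and day; step 2: look for full windows in each group
df["final_flag"] = (df.groupby(["tank", "date"])["no_operation_flag"]
                      .transform(flag_full_windows))
print(df["final_flag"].tolist())
```

Tank 2 has only four 1s, so no window of six fits and none of its rows are flagged.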

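You can also apply a rolling mask to a column, but that only gives you the values inside the mask. That solves a different problem: "which tanks are not operating at a given time".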

Answer 3 (score: 0)

I think this is a very old-fashioned and somewhat dirty approach, but it is easy to understand.

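  1. Loop over the rows and check whether the row 4 rows later is exactly 2 hours later.
  2. (If 1 is True) check that the five corresponding values of df['no_operation_flag'] are all 1.
  3. (If 2 is True) put 1 in the five corresponding values of df['final_flag'].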

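import pandas as pd

# make col with zero
df['final_flag'] = 0

# parse every timestamp once (dates assumed to be dd-mm-YYYY)
ts = pd.to_datetime(df['date'].astype(str) + ' ' + df['time'].astype(str),
                    format='%d-%m-%Y %H:%M:%S')
col = df.columns.get_loc('final_flag')

# start at row 0 so the first row is also checked
for i in range(len(df) - 4):
    j = i + 4

    # same tank and day, and timedelta is 2 hours?
    if (df['tank'].iloc[i] == df['tank'].iloc[j]
            and df['date'].iloc[i] == df['date'].iloc[j]
            and ts.iloc[j] - ts.iloc[i] == pd.Timedelta(hours=2)):
        # all of no_operation_flag == 1?
        if (df['no_operation_flag'].iloc[i:j + 1] == 1).all():
            # positional assignment avoids chained assignment on df['final_flag']
            df.iloc[i:j + 1, col] = 1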