
Question

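I am currently trying to analyze network data with pandas. I have read other posts, and the closest one to my problem is "Pandas - Find and index rows that match row sequence pattern".

My dataframe looks like this:

I am trying to check whether some packets were lost and to count how many were lost. To do this, I want to define a 2x2 window or matrix, and then define a pattern, in this case:

Now I want to check whether each window exactly matches the recurring pattern. If possible, the result should be written to an extra column as False or True (or NaN). I have tried this in the code examples below.

In the first example, I tried to check the rows by iterating over them. My third example is closer to what I am looking for: using rolling, I define a window and a pattern, and the code should check the rows, but because the pattern is a string, I get an error. This is what I want it to look like.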

import pandas as pd

df = pd.read_csv('hallo')

Here I filter out the noise:

   Protocol_filtered = df[df['Protocol']== 'ICMP']
   Protocol_filtered1 = Protocol_filtered[['Time','Source','Destination','Info']] 
   Protocol_filtered1 = Protocol_filtered1.reset_index(drop=True)

Here I start checking for lost packets:

    s0 = 0
    s1 = 1

    for row in Protocol_filtered1.iterrows():
        while s1 <= len(Protocol_filtered1):
            source = Protocol_filtered1.loc[s0, 'Source']
            dest = Protocol_filtered1.loc[s1, 'Destination']

            if source == dest:
                Protocol_filtered1['Check'] = True
            else:
                Protocol_filtered1['Check'] = False

            source1 = Protocol_filtered1.loc[s1, 'Source']
            dest1 = Protocol_filtered1.loc[s0, 'Destination']

            if source1 == dest1:
                Protocol_filtered1['Check1'] = True
            else:
                Protocol_filtered1['Check1'] = False

            s0 = s0 + 2
            s1 = s1 + 2

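This code does not give the result I want: for example, it gives me True in row 2, where it should be False.

The logic of the following code is correct, but it checks i for every single row, whereas it should always check two consecutive rows together (0 & 1, 2 & 3, 4 & 5, ...):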
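
One likely reason for the wrong values: an assignment such as `Protocol_filtered1['Check'] = True` writes to the whole column, so every row ends up with the result of the last pair that was checked. Below is a minimal sketch of a pairwise loop that writes only to the two rows of the current pair. The sample frame and the helper name `check_pairs_loop` are hypothetical and only stand in for the filtered ICMP data; the sketch also assumes a default 0..n-1 index, as produced by `reset_index(drop=True)`.

```python
import pandas as pd

# Hypothetical sample data standing in for the filtered ICMP frame.
frame = pd.DataFrame({
    "Source": ["192.168.20.35", "192.168.20.31", "192.168.20.35",
               "192.168.20.35", "192.168.20.35", "192.168.20.31"],
    "Destination": ["192.168.20.31", "192.168.20.35", "192.168.20.31",
                    "192.168.20.31", "192.168.20.31", "192.168.20.35"],
})

def check_pairs_loop(df):
    """Compare rows 0&1, 2&3, ... and flag both rows of each pair."""
    out = df.copy()
    out["Check"] = False  # an unpaired trailing row stays False
    for s0 in range(0, len(out) - 1, 2):
        s1 = s0 + 1
        # The second row must be the mirror image of the first one.
        forward = out.loc[s0, "Source"] == out.loc[s1, "Destination"]
        backward = out.loc[s1, "Source"] == out.loc[s0, "Destination"]
        # Write only to the two rows of this pair, not to the whole column.
        out.loc[[s0, s1], "Check"] = bool(forward and backward)
    return out

checked = check_pairs_loop(frame)
print(checked)
```

On this sample the second pair (rows 2 and 3) is the only one flagged False, because both of its rows go in the same direction.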

pattern = ['192.168.20.35', '192.168.20.31']
i = (Protocol_filtered1['Source'] == '192.168.20.35') & (Protocol_filtered1['Source'].shift(-1) == '192.168.20.31')
i &= (Protocol_filtered1['Destination'] == '192.168.20.31') & (Protocol_filtered1['Destination'].shift(-1) == '192.168.20.35')

Protocol_filtered1.index[i]

Protocol_filtered1['Check1'] = i

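The result I get here does not match the expected one, which is Check: True, True, False, False, True, True.

A very elegant solution that I found in the forum and tried to apply is: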
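
A possible way to get that pairwise behaviour without a Python loop is to evaluate the condition once per pair (rows 0 & 1, 2 & 3, ...) and then repeat each pair's result for both of its rows. This is only a sketch under assumptions: the sample frame and the name `check_pairs_vectorized` are hypothetical, and the two addresses are taken from the pattern used above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the filtered ICMP frame.
frame = pd.DataFrame({
    "Source": ["192.168.20.35", "192.168.20.31", "192.168.20.35",
               "192.168.20.35", "192.168.20.35", "192.168.20.31"],
    "Destination": ["192.168.20.31", "192.168.20.35", "192.168.20.31",
                    "192.168.20.31", "192.168.20.31", "192.168.20.35"],
})

def check_pairs_vectorized(df, first="192.168.20.35", second="192.168.20.31"):
    """Flag both rows of every (even, odd) row pair that matches first->second, second->first."""
    src = df["Source"].to_numpy()
    dst = df["Destination"].to_numpy()
    n_pairs = len(df) // 2
    even = slice(0, 2 * n_pairs, 2)  # first row of each pair
    odd = slice(1, 2 * n_pairs, 2)   # second row of each pair
    pair_ok = (
        (src[even] == first) & (dst[even] == second)
        & (src[odd] == second) & (dst[odd] == first)
    )
    check = np.zeros(len(df), dtype=bool)      # an unpaired trailing row stays False
    check[: 2 * n_pairs] = np.repeat(pair_ok, 2)  # one result for both rows of a pair
    return pd.Series(check, index=df.index, name="Check1")

frame["Check1"] = check_pairs_vectorized(frame)
lost_pairs = int((~frame["Check1"].iloc[::2]).sum())
print(frame, lost_pairs)
```

Fixed pairs drift out of alignment as soon as a single packet is missing, so this probably only fits data that is already arranged in request/reply pairs.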

pattern = ['192.168.20.35', '192.168.20.31']
obs = len(pattern)
Protocol_filtered1['S1'] = (Protocol_filtered1['Source']
                        .rolling(window = obs, min_periods = )
                        .apply(lambda x: (x==pattern).all())
                        .astype(bool)
                        .shift(-1*(obs-1)))

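But this code seems to have a problem as well. I prefer this last solution, in which I can define a specific pattern and the window size and let pandas go through the whole dataframe, and then use isnull() to count the lost packets.

I would really appreciate your help. Thank you very much!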
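
As far as I can tell, `rolling(...).apply` only works on numeric data, which would explain the error on a column of address strings. A minimal sketch of a workaround is to map the addresses to integer codes first and compare each window against the encoded pattern. The helper name `rolling_pattern_match` and the sample data are hypothetical, and `min_periods=obs` is an assumption, since the value is missing in the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of the 'Source' column.
source = pd.Series(["192.168.20.35", "192.168.20.31", "192.168.20.35",
                    "192.168.20.35", "192.168.20.35", "192.168.20.31"])
pattern = ["192.168.20.35", "192.168.20.31"]

def rolling_pattern_match(series, pattern):
    """Return True at every row where `pattern` starts in `series`."""
    # One integer code per distinct pattern value; anything else becomes -1.
    codes = {value: code for code, value in enumerate(dict.fromkeys(pattern))}
    encoded = series.map(codes).fillna(-1).astype(float)
    target = np.array([codes[value] for value in pattern], dtype=float)
    obs = len(pattern)
    hits = (
        encoded.rolling(window=obs, min_periods=obs)  # min_periods=obs is an assumption
        .apply(lambda window: float((window == target).all()), raw=True)
        .shift(-(obs - 1))  # move the flag from the window end back to its start
    )
    return hits.eq(1.0)  # NaN at the tail becomes False

starts = rolling_pattern_match(source, pattern)
# Counting only every second row treats the data as non-overlapping pairs.
lost_pairs = int((~starts.iloc[::2]).sum())
print(starts.tolist(), lost_pairs)
```

The rolling window slides one row at a time, so `starts` marks every position where the pattern begins; taking every second row is one way to reduce that to the 0 & 1, 2 & 3, ... pairs.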

Pandas - check for a recurring pattern in a dataframe

0 Answers: