If string contains an instance, single column

Time: 2015-12-04 01:24:33

Tags: python regex string pandas dataframe

I have the following in a pandas DF, in the columns Message_A and Message_B:

Message_A
"(Live Storage: 20.00   included in Plan for $15.00 - Exceess of 10.0   @ $6.0)" 
"(Live Storage: 5.00   included in Plan for $5.00 - Exceess of 11.0   @ $40.0)" 
"(Live Storage: 10.0   out of 150.00   included in Plan for $10.00)" 
"(Live Storage: 146.0   out of 200.00   included in Plan for $150.00)" 
"(Live Storage: 150.0   - Tier 1501 to 2000   @ $350)" 
"(PY Solution -Flat Fee- of $30.00 applied)" 
"(Live Storage: 17.0   out of 40.00   included in Plan for $20.00)" 
"(Live Storage: 67.0   @ $5.00)" 
"(Live Storage: 5.00   included in Plan for $55.00 - Exceess of 13.0   @ $6.0)" 
"(Live Storage: 741.0   @ $3.00)" 
"(Live Storage: 30.00   included in Plan for $150.00 - Exceess of 39.0   @ $6.0)" 
"(Live Storage: 65.0   - Tier 51 to 75   @ $250)" 
"(Live Storage: 567.0   - Tier 501 to 750   @ $1750)" 

Message_B
"(! Price for Live Storage not found in Pricing Plan !)" 
"(! Price for Live Storage not found in Pricing Plan !) ( ABC Storage: 141.0   @ $2.00) (Discount of 10.0% applied to storage amount)" 
"(! Price for Live Storage not found in Pricing Plan !)" 
"(! Price for Live Storage not found in Pricing Plan !) ( ABC Storage: 1.0   @ $3.00)" 
"( ABC Storage: 137.0   - Tier 1251 to 150   @ $100) (!  ABC Storage Limit of 00   Exceeded !) (Local Allocated Storage: 20.00   @ $0.40) (Live Storage: 16.0   @ $??)" 
"(Discount of 10.0% applied to storage amount) (! Price for Live Storage not found in Pricing Plan !)"
"(! Live Storage not found in Pricing Plan !) (Discount of 10.0% applied to storage amount)" 
"(! Price for Live Storage not found in Pricing Plan !) (Local Allocated Storage: 100.00   @ $0.50)" 
"(! Price for Storage not found in Pricing Plan !) (Live Storage: 18.0   @ $??)" 
"(! Price for Storage not found in Pricing Plan !)(Live Storage: 69.0   @ $??)  ( ABC Storage: 401.0   @ $1.50)" 
"(Live Storage: 6.0   @ $??) (! Price for Storage not found in Pricing Plan !)" 
"(! Price for Live Storage not found in Pricing Plan !) (Discount of 10.0% applied to storage amount)" 
"(! Price for Live Storage not found in Pricing Plan !) ( ABC Storage: 270.0   - Tier 201 to 300   @ $400)" 

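I want to remove the error messages from Message_B. Some of the text in these messages changes, but every error message contains either '!' or '$??'. After that, I want to join the result with Message_A to get a single message column. For clarity, the intermediate step looks like this: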

Message_B
Nan
"( ABC Storage: 141.0   @ $2.00) (Discount of 10.0% applied to storage amount)" 
Nan
"( ABC Storage: 1.0   @ $3.00)" 
"( ABC Storage: 137.0   - Tier 1251 to 150   @ $100)(Local Allocated Storage: 20.00   @ $0.40)" 
"(Discount of 10.0% applied to storage amount)" 
"(Discount of 10.0% applied to storage amount)" 
"(Local Allocated Storage: 100.00   @ $0.50)" 
Nan
"( ABC Storage: 401.0   @ $1.50)" 
Nan
"(Discount of 10.0% applied to storage amount)" 
"( ABC Storage: 270.0   - Tier 201 to 300   @ $400)" 

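The final result is just a single column of strings (with the NaN values dropped). I have been able to split Message_B by removing '(' and replacing ')' with ' |' as a delimiter. I have split Message_B into (new) separate dataframes, but how do I iterate over the full DF and remove the unwanted messages? (I don't want to drop the whole row.) I have tried df[df['Message_B'].str.contains("(Live Storage: 18.0 @ $??)")==False], but I would need to do this for every type of message, and the numbers in the messages change. Also, I now realise that I can't use .str.contains on the full DF. Any help would be greatly appreciated, and apologies for how I have laid out the DF in this message; I found it the easiest to read. Thanks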

Edit: I have been able to take out the standard error message with the following:

error_msg1 = "(! Price for Live Storage not found in Pricing Plan !)"
replace_with = ''
bumi_output['Message_B'] = [i.replace(error_msg1, replace_with) for i in bumi_output['Message_B']]

Is there a way to use this approach to take out error messages where part of the message can change? For example:     (Live Storage: 18.0 @ $??)     (Live Storage: 69.0 @ $??)

Thanks.

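1 Answer:

Answer 0: (score: 1)

The following rather ugly list comprehension gets what you want from Message_B by simply finding all the parenthesised groups, excluding the ones that contain '!' or '$??', and joining the rest back together: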

import re  # needed for re.findall
new_B = [' '.join(subs for subs in re.findall(r'\(.+?\)', val) if '!' not in subs and '$??' not in subs)
         for val in df['Message_B']]
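
As a minimal sketch, the same filter can be wrapped in a small function and tried on a few rows copied from the question, without needing the full dataframe. The name `keep_clean_groups` and the `sample` list are made up for illustration and are not part of the original answer.

```python
import re

def keep_clean_groups(message):
    """Return the parenthesised groups of `message` that contain neither '!' nor '$??'."""
    groups = re.findall(r'\(.+?\)', message)
    return ' '.join(g for g in groups if '!' not in g and '$??' not in g)

# A few Message_B values copied from the question (hypothetical stand-in for df['Message_B'])
sample = [
    "(! Price for Live Storage not found in Pricing Plan !)",
    "(! Price for Live Storage not found in Pricing Plan !) ( ABC Storage: 1.0   @ $3.00)",
    "(Live Storage: 6.0   @ $??) (! Price for Storage not found in Pricing Plan !)",
]
cleaned = [keep_clean_groups(m) for m in sample]
```

Rows that consist only of error messages come back as empty strings rather than NaN, so they add nothing when concatenated onto Message_A.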

Then add it to A:

df['Message_A'] = df['Message_A'] + new_B
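
Note that `+` concatenates the two strings directly, so no separator is placed between the Message_A text and the cleaned Message_B text. If a single space is wanted instead, one possible sketch is shown here, using the plain lists `a_vals` and `b_vals` as hypothetical stand-ins for the two columns:

```python
# Hypothetical stand-ins for df['Message_A'] and the cleaned new_B list
a_vals = ["(Live Storage: 67.0   @ $5.00)", "(Live Storage: 741.0   @ $3.00)"]
b_vals = ["(Discount of 10.0% applied to storage amount)", ""]

# Join the non-empty parts with one space, so empty Message_B values add nothing
combined = [' '.join(part for part in (a, b) if part) for a, b in zip(a_vals, b_vals)]
```

The resulting list can be assigned back to a column as long as it has one entry per row.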

To see that this works:

In [26]: df['Message_A'][1]
Out[26]: '(Live Storage: 5.00   included in Plan for $5.00 - Exceess of 11.0   @ $40.0)( ABC Storage: 141.0   @ $2.00) (Discount of 10.0% applied to storage amount)'
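
The question's edit asked whether the `replace` approach can handle messages whose numbers vary. One way, sketched here as an alternative rather than as part of the original answer, is `re.sub` with a pattern that matches any parenthesised group containing '!' or '$??'. The pattern and the names `ERROR_GROUP` and `drop_error_groups` are assumptions for illustration, and the sketch assumes the groups never contain nested parentheses.

```python
import re

# Matches one "( ... )" group, without nested parentheses, that contains '!' or '$??'
ERROR_GROUP = re.compile(r'\s*\([^()]*(?:!|\$\?\?)[^()]*\)')

def drop_error_groups(message):
    """Delete every error group from `message` and trim leftover whitespace."""
    return ERROR_GROUP.sub('', message).strip()

# One Message_B value copied from the question
example = "(! Price for Storage not found in Pricing Plan !)(Live Storage: 69.0   @ $??)  ( ABC Storage: 401.0   @ $1.50)"
result = drop_error_groups(example)
```

Unlike the `findall` version, this leaves the original spacing between the surviving groups untouched.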