Question

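I have a pandas DataFrame of email addresses and want to replace every .edu email with "Edu". I came up with a very inefficient way of doing it, but there must be a better one. Here is my approach: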

import pandas as pd
import re

inp = [{'c1': 10, 'c2': 'gedua.com'}, {'c1': 11, 'c2': 'wewewe.Edu'}, {'c1': 12, 'c2': 'wewewe.edu.ney'}]
dfn = pd.DataFrame(inp)

for index, row in dfn.iterrows():
    # re.search returns None when nothing matches, so no try/except is needed
    if re.search(r'\.edu', row['c2']):
        # .loc avoids the chained assignment dfn.c2[index] = ...
        dfn.loc[index, 'c2'] = 'Edu'
        print('Education')

Answer 1

Use str.contains for a case-insensitive selection, and assign with loc.

dfn.loc[dfn.c2.str.contains(r'\.Edu', case=False), 'c2'] = 'Edu'    
dfn

   c1         c2
0  10  gedua.com
1  11        Edu
2  12        Edu
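
To make the two steps explicit, here is a minimal sketch that builds the boolean mask first and then assigns through loc. The names mask and result are only illustrative, and na=False is an addition that is not in the answer above: it is there on the assumption that the column might contain missing values, which would otherwise leave NaN in the mask.

```python
import pandas as pd

dfn = pd.DataFrame({
    'c1': [10, 11, 12],
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney'],
})

# Step 1: boolean mask, True wherever '.edu' occurs in any capitalization.
# na=False (an assumption about the data) treats missing values as "no match".
mask = dfn['c2'].str.contains(r'\.edu', case=False, na=False)

# Step 2: assign only to the rows selected by the mask.
# Working on a copy keeps the original frame around for comparison.
result = dfn.copy()
result.loc[mask, 'c2'] = 'Edu'

print(mask.tolist())
print(result)
```

Because the whole column is tested in one vectorized call, there is no Python-level loop over rows as in the question.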

If you only want to replace emails that end in .edu, then

dfn.loc[dfn.c2.str.contains(r'\.Edu$', case=False), 'c2'] = 'Edu'

Or, as piR suggested (note that str.endswith is case-sensitive, so this matches only the exact suffix '.Edu'),

dfn.loc[dfn.c2.str.endswith('.Edu'), 'c2'] = 'Edu'

dfn

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
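
If the suffix can appear as '.edu', '.Edu' or '.EDU', one possible variation (not part of piR's suggestion) is to lower-case the column before calling endswith. This is a sketch under that assumption; ends_with_edu and result are illustrative names.

```python
import pandas as pd

dfn = pd.DataFrame({
    'c1': [10, 11, 12],
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney'],
})

# Lower-case a temporary view of the column so the plain-string
# suffix test ignores capitalization; the stored values are untouched.
ends_with_edu = dfn['c2'].str.lower().str.endswith('.edu')

result = dfn.copy()
result.loc[ends_with_edu, 'c2'] = 'Edu'

print(result)
```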

Answer 2

`replace`

dfn.replace(r'^.*\.Edu$', 'Edu', regex=True)

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney

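The pattern '^.*\.Edu$' means: match everything from the start of the string up to a literal '.Edu' that is followed by the end of the string, then replace the whole match with 'Edu'.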
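
The effect of the leading '^.*' is easiest to see outside pandas. The following sketch uses only the standard re module on the three sample strings; the variable names are illustrative.

```python
import re

samples = ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney']

# '^.*' consumes everything before the suffix, so the match covers
# the whole string and the whole string is replaced.
whole = re.compile(r'^.*\.Edu$')
replaced = [whole.sub('Edu', s) for s in samples]

# Without '^.*' only the suffix itself is matched and replaced,
# leaving the part before the dot in place.
suffix_only = re.sub(r'\.Edu$', 'Edu', 'wewewe.Edu')

print(replaced)
print(suffix_only)
```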

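Column-specific

You may want to limit the scope to one column (or several). You can do that by passing a dictionary to replace, where the outer keys specify the columns and the inner dictionaries map each pattern to its replacement.

dfn.replace({'c2': {r'^.*\.Edu$': 'Edu'}}, regex=True)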

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
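
The sample frame has only one string column, so the scoping is not visible in the output above. As a small sketch, assume a hypothetical second column c3 that also contains a matching value; only c2 should change.

```python
import pandas as pd

df = pd.DataFrame({
    'c2': ['gedua.com', 'wewewe.Edu'],
    'c3': ['other.Edu', 'plain'],  # hypothetical extra column, not in the question
})

# Outer key: the column to touch. Inner dict: pattern -> replacement.
scoped = df.replace({'c2': {r'^.*\.Edu$': 'Edu'}}, regex=True)

print(scoped)
```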

Case-insensitive [thx @coldspeed]

pandas.DataFrame.replace has no case flag, but you can embed one in the pattern with '(?i)':

dfn.replace({'c2': {r'(?i)^.*\.edu$': 'Edu'}}, regex=True)

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
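
For a single column there is also a different route from the one shown above: Series.str.replace, which does accept a case argument. The sketch below uses it next to the inline '(?i)' flag in plain re, so the two can be compared; the variable names are illustrative.

```python
import re

import pandas as pd

dfn = pd.DataFrame({
    'c1': [10, 11, 12],
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney'],
})

# Series.str.replace: case=False makes the regex case-insensitive.
via_str = dfn['c2'].str.replace(r'^.*\.edu$', 'Edu', case=False, regex=True)

# The inline flag '(?i)' has the same effect inside the pattern itself.
via_inline = re.sub(r'(?i)^.*\.edu$', 'Edu', 'someone@school.EDU')

print(via_str.tolist())
print(via_inline)
```

Unlike DataFrame.replace, this returns a new Series, so the result has to be assigned back to the column if the frame should change.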

Replace whole string based on regex match
