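Replace n occurrences of a string in a DataFrame

Time: 2020-06-24 17:01:54

Tags: python pandas dataframe

Compared with string.replace(s, old, new[, maxreplace]), the pandas.DataFrame.replace() function seems to lack a parameter that limits the number of occurrences to replace.

For example: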

df = pd.DataFrame({'col1': ['horse', 'dog', 'snake', 'dog'], 'col2': ['dog', 'snake', 'dog', 'cow']})

$ python run.py
    col1   col2
0  horse    dog
1    dog  snake
2  snake    dog
3    dog    cow

I want to replace n = 3 occurrences of dog in df (across all columns and rows) with BEAR.

Desired output:

$ python run.py
    col1   col2
0  horse   BEAR
1   BEAR  snake
2  snake    dog
3   BEAR    cow

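What is the best way to achieve this? I would like to avoid iterating over every cell of df.

3 answers:

Answer 0: (score: 4)

One way is to unstack, then mask, then unstack again: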

n = 3
s = df.unstack()
c = s.eq("dog").groupby(s).cumsum()
s.mask(c<=n,s.replace("dog","BEAR")).unstack(0)

Another option using numpy:

import numpy as np

arr = np.cumsum(np.ravel(df.eq("dog").to_numpy(),'F')).reshape(df.shape,order='F')
df[:] = np.where(arr<=3,df.replace("dog","BEAR"),df) # overwrites the values of df in place
print(df)
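
Both snippets in this answer rely on a running count of matches taken column by column. The following is a minimal sketch that exposes that intermediate count so the logic is easier to follow; the variable names `is_dog`, `running` and `result` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['horse', 'dog', 'snake', 'dog'],
                   'col2': ['dog', 'snake', 'dog', 'cow']})
n = 3

# Boolean matrix marking the cells equal to 'dog'.
is_dog = df.eq('dog').to_numpy()

# Flatten column by column ('F' order), count the matches seen so far,
# then reshape the counts back to the shape of the frame.
running = np.cumsum(is_dog.ravel(order='F')).reshape(df.shape, order='F')

# Replace a cell only if it matches and is among the first n matches.
result = df.mask(is_dog & (running <= n), 'BEAR')
print(running)
print(result)
```

Here `running` holds 0, 1, 1, 2 for col1 and 3, 3, 4, 4 for col2, so the fourth dog (row 2 of col2) is the only match left unchanged.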

答案 1 :(得分:3)

Use DataFrame.mask and DataFrame.fillna with the parameter limit=3, which replaces only the first three NaN values:

df.mask(df.eq('dog')).unstack().fillna('BEAR', limit=3).fillna('dog').unstack(level=0)

    col1   col2
0  horse   BEAR
1   BEAR  snake
2  snake    dog
3   BEAR    cow

Or, more generally, a function that takes parameters:

def replace_n(data, to_replace, new, n):
    data = data.mask(data.eq(to_replace))
    data = data.unstack().fillna(new, limit=n)
    data = data.fillna(to_replace).unstack(level=0)
    
    return data


replace_n(df, 'dog', 'BEAR', n=3)

    col1   col2
0  horse   BEAR
1   BEAR  snake
2  snake    dog
3   BEAR    cow
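
One caveat: because this approach temporarily turns matches into NaN, any NaN already present in the data is filled as well. Below is a minimal sketch of a variant that counts matches instead of going through NaN, assuming the same column-by-column order; `replace_first_n` is a hypothetical name, not part of the original answer.

```python
import pandas as pd


def replace_first_n(data, to_replace, new, n):
    """Replace the first n matches of to_replace, scanning column by column."""
    s = data.unstack()                   # Series indexed by (column, row)
    hits = s.eq(to_replace)              # NaN compares as False, so it is never a match
    first_n = hits & (hits.cumsum() <= n)  # keep only the first n matches
    return s.mask(first_n, new).unstack(level=0)


df = pd.DataFrame({'col1': ['horse', 'dog', 'snake', 'dog'],
                   'col2': ['dog', 'snake', 'dog', 'cow']})
print(replace_first_n(df, 'dog', 'BEAR', n=3))
```

For the example frame this should give the same table as replace_n above; the difference only shows up when the input already contains missing values.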

Answer 2: (score: 0)

You can use this loop:

import pandas as pd

d = {'col1': ['horse', 'dog', 'snake', 'dog'], 'col2': ['dog', 'snake', 'dog', 'cow']}

n = 3

for k in d.keys():
    for i,s in enumerate(d[k]):
        if s == 'dog' and n > 0:
            d[k][i] = 'BEAR'  # assign in place; same result as pop + insert
            n -= 1

df = pd.DataFrame(d)

print(df)

Output:

    col1   col2
0  horse   BEAR
1   BEAR  snake
2  snake    dog
3   BEAR    cow