Removing one dataset's values wherever they occur in another dataset

Time: 2020-01-27 19:31:09

Tags: r powerbi

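I have a dataset named Messages that contains C# errors, and a second dataset named Usernames that contains a list of usernames. I want to remove every occurrence of any username from the messages. No message should contain more than one username. I thought I could do this with gsubfn, but it outputs all NULLs. Can someone point me toward the best way to do this?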

usrNm <- c(dataset2$username)
stripUsername <- function(x) {gsubfn(usrNm,'',x)}
noUsernames <- within(dataset,{Message=stripUsername(dataset$Message)})
+----------------------------------+----------------------------------+    +--------------+
| Message                          | Expected output                  |    | Username     |
+----------------------------------+----------------------------------+    +--------------+
| User: Mary.Jane sent bad data    | User:  sent bad data             |    | Mary.Jane    |
+----------------------------------+----------------------------------+    +--------------+
| Error occurred in System.Module. | Error occurred in System.Module. |    | Robert.Frost |
+----------------------------------+----------------------------------+    +--------------+
| Hello, world!                    | Hello, world!                    |    | BB.Wolf      |
+----------------------------------+----------------------------------+    +--------------+
| Tracing request by Robert.Frost! | Tracing request by !             |
+----------------------------------+----------------------------------+

1 answer:

Answer 0 (score: 1)

Here is one approach:

library(stringi)

stri_replace_all_fixed(dataset$Message, dataset2$Username, '', vectorize_all = FALSE)

Output

[1] "User:  sent bad data"             "Error occurred in System.Module."
[3] "Hello, world!"                    "Tracing request by !" 
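The `vectorize_all = FALSE` argument is what makes this call work: with it, every username is applied in turn to every message. The sketch below contrasts that with the default (`vectorize_all = TRUE`), which pairs the i-th message with the i-th username and recycles the shorter vector. It assumes the stringi package is installed; the variable names `messages`, `usernames`, `all_names` and `pairwise` are made up for illustration.

```r
library(stringi)

messages <- c("User: Mary.Jane sent bad data",
              "Error occurred in System.Module.",
              "Hello, world!",
              "Tracing request by Robert.Frost!")
usernames <- c("Mary.Jane", "Robert.Frost", "BB.Wolf")

# vectorize_all = FALSE: each username is removed from every message.
all_names <- stri_replace_all_fixed(messages, usernames, "",
                                    vectorize_all = FALSE)

# Default (vectorize_all = TRUE): message i is only checked against
# username i, recycling the shorter vector. Message 4 is therefore paired
# with "Mary.Jane" and keeps "Robert.Frost". The length mismatch (4 vs 3)
# also triggers a recycling warning, suppressed here.
pairwise <- suppressWarnings(
  stri_replace_all_fixed(messages, usernames, "")
)

print(all_names)
print(pairwise)
```

Because the match is fixed (literal) rather than a regular expression, the dot in `Mary.Jane` only matches an actual dot.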

Data

dataset <- data.frame(
  Message = c("User: Mary.Jane sent bad data", "Error occurred in System.Module.", "Hello, world!", "Tracing request by Robert.Frost!"),
  stringsAsFactors = FALSE
)

dataset2 <- data.frame(
  Username = c("Mary.Jane", "Robert.Frost", "BB.Wolf"),
  stringsAsFactors = FALSE
)
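
If adding a package is not an option, roughly the same result can be had in base R. This is a minimal sketch rather than part of the answer above: `strip_usernames` is a hypothetical helper name, and the result is written to a copy called `noUsernames`, as in the question. It loops over the usernames with `Reduce()` and removes each one with `gsub(..., fixed = TRUE)`.

```r
dataset <- data.frame(
  Message = c("User: Mary.Jane sent bad data",
              "Error occurred in System.Module.",
              "Hello, world!",
              "Tracing request by Robert.Frost!"),
  stringsAsFactors = FALSE
)

dataset2 <- data.frame(
  Username = c("Mary.Jane", "Robert.Frost", "BB.Wolf"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove each name from the text, one name at a time.
# Reduce() starts from `text` and passes the running result to gsub() for
# each element of `names`. fixed = TRUE treats "." as a literal dot.
strip_usernames <- function(text, names) {
  Reduce(function(acc, nm) gsub(nm, "", acc, fixed = TRUE), names, text)
}

# Write the cleaned messages into a copy of the original data frame.
noUsernames <- dataset
noUsernames$Message <- strip_usernames(dataset$Message, dataset2$Username)

print(noUsernames$Message)
```

For a long username list the stringi call is likely to be faster, but this version has no dependencies beyond base R.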