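I have a dataset named Messages that contains C# errors, and a second dataset named Usernames that contains a list of usernames. I want to remove every occurrence of a username from the messages. No message should contain more than one username. I thought I could do this with gsubfn, but it outputs all NULL. Can someone point me to the best approach?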
usrNm <- c(dataset2$username)
stripUsername <- function(x) {gsubfn(usrNm,'',x)}
noUsernames <- within(dataset,{Message=stripUsername(dataset$Message)})
+----------------------------------+----------------------------------+ +--------------+
| Message | Expected output | | Username |
+----------------------------------+----------------------------------+ +--------------+
| User: Mary.Jane sent bad data | User: sent bad data | | Mary.Jane |
+----------------------------------+----------------------------------+ +--------------+
| Error occurred in System.Module. | Error occurred in System.Module. | | Robert.Frost |
+----------------------------------+----------------------------------+ +--------------+
| Hello, world! | Hello, world! | | BB.Wolf |
+----------------------------------+----------------------------------+ +--------------+
| Tracing request by Robert.Frost! | Tracing request by ! |
+----------------------------------+----------------------------------+
Answer 0 (score: 1)
Here is one approach:
library(stringi)
stri_replace_all_fixed(dataset$Message, dataset2$Username, '', vectorize_all = FALSE)
Output
[1] "User:  sent bad data"             "Error occurred in System.Module."
[3] "Hello, world!"                    "Tracing request by !"
Data
dataset <- data.frame(
Message = c("User: Mary.Jane sent bad data", "Error occurred in System.Module.", "Hello, world!", "Tracing request by Robert.Frost!"),
stringsAsFactors = FALSE
)
dataset2 <- data.frame(
Username = c("Mary.Jane", "Robert.Frost", "BB.Wolf")
)
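If adding stringi is not an option, the same sequential replacement can be sketched in base R. This is only an illustration, not part of the original answer: `strip_usernames` is a hypothetical helper name, and the data frames are re-created here (with `stringsAsFactors = FALSE` on both, which is an assumption) so the block runs on its own. It stores the result in `noUsernames`, the name the question uses.

```r
# Sample data mirroring the data frames in this answer.
dataset <- data.frame(
  Message = c("User: Mary.Jane sent bad data",
              "Error occurred in System.Module.",
              "Hello, world!",
              "Tracing request by Robert.Frost!"),
  stringsAsFactors = FALSE
)
dataset2 <- data.frame(
  Username = c("Mary.Jane", "Robert.Frost", "BB.Wolf"),
  stringsAsFactors = FALSE
)

# strip_usernames is a hypothetical helper, not from the original answer.
# It removes each username in turn from every message. fixed = TRUE makes
# gsub() treat "." as a literal dot instead of a regex wildcard.
strip_usernames <- function(msgs, users) {
  Reduce(function(acc, user) gsub(user, "", acc, fixed = TRUE),
         users, msgs)
}

# Store the cleaned messages in a copy, leaving `dataset` untouched.
noUsernames <- dataset
noUsernames$Message <- strip_usernames(dataset$Message, dataset2$Username)
print(noUsernames$Message)
```

Removing a name leaves the whitespace around it in place, so the first message ends up with two consecutive spaces after "User:". If that matters, something like `gsub(" +", " ", noUsernames$Message)` could collapse them afterwards.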