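I want to summarize a large dataset. The data are health records in which many organs/tissues were examined for each individual, and the diagnoses were entered in narrative form. I have a set of key diagnostic terms I want to find, and for each term I want to know which organs are associated with that diagnosis.
Example (all entries are converted to strings):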
dataframe1
Organ Diagnosis
lungs interstitial pneumonia
liver hepatic congestion ; diffuse
cerebrum traumatic disruption and hemorrhage
adrenal gland focal hemorrhage
dataframe2
Keywords
congestion
hemorrhage
trauma
pneumonia
I want to search dataframe1$Diagnosis for strings that match dataframe2$Keywords and, for each match, return the organ entered in the corresponding row of dataframe1$Organ.
dataframe1 <- structure(list(Organ = c("lungs", "liver", "cerebrum", "adrenal gland"
), Diagnosis = c("interstitial pneumonia", "hepatic congestion ; diffuse",
"traumatic disruption and hemorrhage", "focal hemorrhage")), .Names = c("Organ",
"Diagnosis"), class = "data.frame", row.names = c(NA, -4L))
dataframe2 <- data.frame(Keywords=c("congestion","hemorrhage","trauma","pneumonia"),stringsAsFactors=FALSE)
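Answer 0 (score: 2)
We can use grep to find the rows whose Diagnosis contains each keyword, then collapse the matching organs into one comma-separated string per keyword: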
sapply(dataframe2$Keywords, function(x)
toString(trimws(dataframe1[,1][grep(x, dataframe1[,2])])))
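Narrative diagnoses are often entered with inconsistent capitalization, which the question does not mention but seems likely in real records. As a minimal sketch under that assumption, the same idea can be written with grepl and ignore.case = TRUE; the name organs_by_keyword is only illustrative, and the toy data are copied from the question so the block runs on its own.

```r
# Toy data copied from the question
dataframe1 <- data.frame(
  Organ = c("lungs", "liver", "cerebrum", "adrenal gland"),
  Diagnosis = c("interstitial pneumonia", "hepatic congestion ; diffuse",
                "traumatic disruption and hemorrhage", "focal hemorrhage"),
  stringsAsFactors = FALSE
)
dataframe2 <- data.frame(
  Keywords = c("congestion", "hemorrhage", "trauma", "pneumonia"),
  stringsAsFactors = FALSE
)

# For each keyword, flag the rows whose Diagnosis contains it
# (case-insensitive; this is an assumption, not part of the original answer),
# then join the matching organs into one comma-separated string.
organs_by_keyword <- sapply(dataframe2$Keywords, function(k) {
  hits <- grepl(k, dataframe1$Diagnosis, ignore.case = TRUE)
  toString(dataframe1$Organ[hits])
})

organs_by_keyword
```

The result is a character vector named by keyword, so organs_by_keyword["hemorrhage"] gives "cerebrum, adrenal gland". Note that each keyword is treated as a regular expression, so "trauma" also matches "traumatic".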
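Answer 1 (score: 2)
I think it may be more useful to return the matches as a stacked, long-format data frame with one row per organ/keyword pair, like this: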
stack(
sapply(dataframe2$Keywords,
function(x) dataframe1$Organ[grepl(x, dataframe1$Diagnosis)])
)
#          values        ind
#1         liver congestion
#2      cerebrum hemorrhage
#3 adrenal gland hemorrhage
#4      cerebrum     trauma
#5         lungs  pneumonia
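One caveat with the stacked approach: sapply may simplify its result to a matrix when every keyword happens to match the same number of rows, and stack would then no longer see a named list. A hedged variant, assuming the keywords should be matched as literal text rather than regular expressions, uses lapply with explicit names; matches and result are illustrative names, and the data are copied from the question.

```r
# Toy data copied from the question
dataframe1 <- data.frame(
  Organ = c("lungs", "liver", "cerebrum", "adrenal gland"),
  Diagnosis = c("interstitial pneumonia", "hepatic congestion ; diffuse",
                "traumatic disruption and hemorrhage", "focal hemorrhage"),
  stringsAsFactors = FALSE
)
dataframe2 <- data.frame(
  Keywords = c("congestion", "hemorrhage", "trauma", "pneumonia"),
  stringsAsFactors = FALSE
)

# lapply always returns a list, so the result never collapses to a matrix.
# fixed = TRUE matches each keyword as a literal substring (an assumption;
# drop it if the keywords are meant to be regular expressions).
matches <- lapply(dataframe2$Keywords, function(k) {
  dataframe1$Organ[grepl(k, dataframe1$Diagnosis, fixed = TRUE)]
})
names(matches) <- dataframe2$Keywords

# One row per organ/keyword pair: 'values' holds the organ, 'ind' the keyword
result <- stack(matches)
result
```

Keywords with no match contribute no rows to result, so the long format also makes it easy to count organs per diagnosis with table(result$ind).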