How do I compare two columns of one data.frame against a specific column of another data.frame in R?

Asked: 2018-04-13 07:12:15

Tags: r

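Background

I have two data.frames: one holds several companies, the other holds an index.

I want to count the rows where both of these conditions hold:

Condition 1: the two companies move in the same direction (only when both are A or both are C).

Condition 2: the index points in the opposite direction, i.e. the companies are both A and the index shows C, or the companies are both C and the index shows A.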

Example: row 1 - Comp1 (C), Comp3 (C) & row 1 - index1 (A) | COUNT = 1
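To make the two conditions concrete, here is a minimal sketch for a single pair (comp1 and comp3), using the vectors from the data below. The variable names `same_dir`, `opposite` and `hits` are made up for illustration and are not part of the question.

```r
# Values copied from the question's data
comp1  <- c("C", "A", "B", "B", "A")
comp3  <- c("C", "B", "B", "A", "A")
index1 <- c("A", "B", "C", "C", "C")

# Condition 1: both companies show the same letter, and it is A or C
same_dir <- comp1 == comp3 & comp1 %in% c("A", "C")

# Condition 2: the index shows the opposite letter
opposite <- (comp1 == "A" & index1 == "C") | (comp1 == "C" & index1 == "A")

hits <- same_dir & opposite
which(hits)  # rows where both conditions hold: 1 and 5
sum(hits)    # count for the pair comp1 & comp3: 2
```

Row 1 is the case described in the example; row 5 (A, A against index C) is the second hit for this pair.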

The 6 pairs would be Comp1 & Comp2, Comp1 & Comp3, Comp1 & Comp4, Comp2 & Comp3, Comp2 & Comp4 and Comp3 & Comp4, each one compared against the index.
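The six pairs do not have to be written out by hand. As a small sketch, base R's `combn()` enumerates them from the column names (`company_names` is just an illustrative variable here):

```r
company_names <- c("comp1", "comp2", "comp3", "comp4")

# Each column of the result is one unordered pair
pairs <- combn(company_names, 2)
ncol(pairs)                               # 6
apply(pairs, 2, paste, collapse = " & ")  # "comp1 & comp2", "comp1 & comp3", ...
```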

I don't know which function could help me with this...

Code for the data.frames:

  # Data.frame 1: COMPANIES

  comp1 <- c("C","A","B","B","A")
  comp2 <- c("A","A","C","C","C")
  comp3 <- c("C","B","B","A","A")
  comp4 <- c("C","C","A","A","A")

  dfcomp <- data.frame(comp1, comp2, comp3, comp4)

  # Data.frame 2: INDEX

  index1 <- c("A","B","C","C","C")

  dfindex <- data.frame(index1)

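Desired output: a single row holding one count per pair, i.e. only the values of interest from what would otherwise be a 4x4 matrix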

         [12i] [13i] [14i] [23i] [24i] [34i]
     [1]   0     2     2     0     0     3 
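For reference, a base R sketch that produces this one-row layout could look like the following. It assumes the columns are plain character vectors, and `count_pair` is a hypothetical helper name, not something from the question.

```r
dfcomp <- data.frame(
  comp1 = c("C", "A", "B", "B", "A"),
  comp2 = c("A", "A", "C", "C", "C"),
  comp3 = c("C", "B", "B", "A", "A"),
  comp4 = c("C", "C", "A", "A", "A"),
  stringsAsFactors = FALSE  # keep characters so == works across columns
)
index1 <- c("A", "B", "C", "C", "C")

# Hypothetical helper: idx holds the two column positions of one pair
count_pair <- function(idx) {
  a <- dfcomp[[idx[1]]]
  b <- dfcomp[[idx[2]]]
  # companies agree, and the index shows the opposite letter
  sum(a == b & ((a == "A" & index1 == "C") | (a == "C" & index1 == "A")))
}

pairs  <- combn(ncol(dfcomp), 2)     # column positions: 1-2, 1-3, ..., 3-4
counts <- apply(pairs, 2, count_pair)
names(counts) <- paste0(pairs[1, ], pairs[2, ], "i")
counts
# 12i 13i 14i 23i 24i 34i
#   0   2   2   0   0   3
```

The check `a == b` combined with the A/C test on `a` already restricts matches to A = A or C = C, so B = B rows are never counted.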

1 Answer:

Answer 0 (score: 1)

One approach could be

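library(dplyr)

# Count the rows where companies x and y agree (both 'A' or both 'C')
# while the index shows the opposite letter.
comp_func <- function(x, y, temp, index){
  # keep only the two company columns and append the index as column 3
  temp <- bind_cols(temp[, names(temp) %in% c(x, y)], index)
  # convert factors to character so columns with different levels can be compared
  temp[,] <- lapply(temp, as.character)
  ret <- sum(temp[,1] == temp[,2] & 
             temp[,1] %in% c('A', 'C') &
             ((temp[,1]=='A' & temp[,3]=='C') | (temp[,1]=='C' & temp[,3]=='A')))
  return(ret)
}

# one row per pair of company columns (V1, V2)
df <- as.data.frame(t(combn(names(dfcomp), 2)), stringsAsFactors = FALSE)
df %>%
  rowwise() %>%
  mutate(val = comp_func(V1, V2, dfcomp, dfindex))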

Output:

  V1    V2      val
1 comp1 comp2     0
2 comp1 comp3     2
3 comp1 comp4     2
4 comp2 comp3     0
5 comp2 comp4     0
6 comp3 comp4     3

Sample data:

dfcomp <- structure(list(comp1 = structure(c(3L, 1L, 2L, 2L, 1L), .Label = c("A", 
"B", "C"), class = "factor"), comp2 = structure(c(1L, 1L, 2L, 
2L, 2L), .Label = c("A", "C"), class = "factor"), comp3 = structure(c(3L, 
2L, 2L, 1L, 1L), .Label = c("A", "B", "C"), class = "factor"), comp4 = structure(c(2L, 2L, 1L, 1L, 1L), .Label = c("A", "C"), class = "factor")), .Names = c("comp1", "comp2", "comp3", 
"comp4"), row.names = c(NA, -5L), class = "data.frame")

dfindex <- structure(list(index1 = structure(c(1L, 2L, 3L, 3L, 3L), .Label = c("A", 
"B", "C"), class = "factor")), .Names = "index1", row.names = c(NA, 
-5L), class = "data.frame")
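Note that this sample data stores every column as a factor, and the factors do not all share the same levels (comp1 has A, B, C while comp2 only has A, C). That is presumably why the function converts everything with `as.character()` before comparing. A small sketch of the difference, with `f1` and `f2` as illustrative stand-ins for comp1 and comp2:

```r
f1 <- factor(c("C", "A", "B", "B", "A"))  # levels: A, B, C
f2 <- factor(c("A", "A", "C", "C", "C"))  # levels: A, C

# Comparing factors with different level sets raises an error;
# tryCatch captures the message instead of stopping the script
res <- tryCatch(f1 == f2, error = function(e) conditionMessage(e))
res

# Comparing the character values works element by element
chr_equal <- as.character(f1) == as.character(f2)
chr_equal  # FALSE TRUE FALSE FALSE FALSE
```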