R - Change the values of a data frame column based on a second data frame

Asked: 2018-01-23 21:10:26

Tags: r

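I need to replace the names in the first column (ID_REF) of one data frame with values from a second data frame (column Gene_Symbol), matching rows on the first column of each data frame (ID_REF and IlmnID).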

df1
ID_REF  Sample1 Sample2 Sample3
cg00000292  0.2841738   1.212398    0.5326877
cg00002426  -4.7278154  -4.217920   -4.1224573
cg00003994  -5.7353341  -5.966922   -6.2235540

df2
IlmnID  NameIlmnStrand  AddressA_ID Gene_Symbol 
cg00002426  cg00002426  TOP SLMAP
cg00005847  cg00005847  BOT HOXD3
cg00000292  cg00000292  TOP ATP2A1
cg00006414  cg00006414  BOT ZNF398
cg00003994  cg00003994  TOP MEOX2

My desired output:

new_df
    Gene_Symbol Sample1 Sample2 Sample3
    ATP2A1  0.2841738   1.212398    0.5326877
    SLMAP   -4.7278154  -4.217920   -4.1224573
    MEOX2   -5.7353341  -5.966922   -6.2235540

3 Answers:

Answer 0: (score: 1)

This is just a simple inner join. You can use inner_join from the dplyr package, or merge from base R. Note that inner_join drops any row of df1 whose ID_REF has no match in df2.

library(dplyr)

new_df <- inner_join(df1, df2, by = c("ID_REF" = "IlmnID")) %>%
               select(Gene_Symbol, Sample1, Sample2, Sample3)
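
Before joining, it may be worth checking which IDs would be dropped. Here is a minimal base R sketch, assuming the question's data plus one made-up ID (cg99999999, hypothetical) that has no match in df2:

```r
# Question's df1 (first sample column only) plus one hypothetical ID absent from df2
df1 <- data.frame(
  ID_REF  = c("cg00000292", "cg00002426", "cg00003994", "cg99999999"),
  Sample1 = c(0.2841738, -4.7278154, -5.7353341, 0.1),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  IlmnID      = c("cg00002426", "cg00005847", "cg00000292",
                  "cg00006414", "cg00003994"),
  Gene_Symbol = c("SLMAP", "HOXD3", "ATP2A1", "ZNF398", "MEOX2"),
  stringsAsFactors = FALSE
)

# IDs present in df1 but missing from df2; an inner join would drop these rows
unmatched <- setdiff(df1$ID_REF, df2$IlmnID)
unmatched

# Number of df1 rows that have a match and would survive an inner join
kept <- sum(df1$ID_REF %in% df2$IlmnID)
kept
```

If unmatched is non-empty and those rows should be kept, a left join (or merge with all.x/all.y) is probably the better fit.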

Answer 1: (score: 1)

Base R:

merge(df2[ , c("NameIlmnStrand", "Gene_Symbol")], df1,
      by.x = "NameIlmnStrand", by.y = 'ID_REF',
      all.y = TRUE)[ ,-1]

Output:

 Gene_Symbol    Sample1   Sample2    Sample3
1      ATP2A1  0.2841738  1.212398  0.5326877
2       SLMAP -4.7278154 -4.217920 -4.1224573
3       MEOX2 -5.7353341 -5.966922 -6.2235540
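
Because of all.y = TRUE, rows of df1 without a match in df2 are kept, with NA in Gene_Symbol. A small sketch of that behaviour, assuming a cut-down version of the data and one made-up ID (cg99999999, hypothetical):

```r
# Cut-down df1 with one hypothetical ID that df2 does not contain
df1 <- data.frame(
  ID_REF  = c("cg00000292", "cg00002426", "cg99999999"),
  Sample1 = c(0.2841738, -4.7278154, 0.1),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  NameIlmnStrand = c("cg00002426", "cg00000292"),
  Gene_Symbol    = c("SLMAP", "ATP2A1"),
  stringsAsFactors = FALSE
)

# all.y = TRUE keeps every row of df1; [, -1] drops the key column afterwards
res <- merge(df2, df1,
             by.x = "NameIlmnStrand", by.y = "ID_REF",
             all.y = TRUE)[, -1]
res

# The unmatched row is still present, but its Gene_Symbol is NA
sum(is.na(res$Gene_Symbol))
```

Dropping all.y = TRUE would turn this into an inner join, so the unmatched row would disappear instead.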

Answer 2: (score: 0)

df1 <- data.frame(
  ID_REF = c("cg00000292", "cg00002426", "cg00003994"),
  Sample1 = rnorm(3),
  Sample2 = rnorm(3),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  IlmnID = c("cg00000292", "cg00002426", "cg00003994"),
  # Gene symbols paired with the IDs as in the question's df2
  Gene_Symbol = c("ATP2A1", "SLMAP", "MEOX2"),
  stringsAsFactors = FALSE
)


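# Works only if df1 and df2 list the same IDs in the same row order,
# because == compares the two columns element by element
df1$ID_REF <- df2$Gene_Symbol[df2$IlmnID == df1$ID_REF]

# Otherwise, instead of the line above, look up each ID with sapply
# (assumes each ID appears at most once in df2)
df1$ID_REF <- sapply(df1$ID_REF, function(x) {
  if (x %in% df2$IlmnID) {
    df2$Gene_Symbol[df2$IlmnID == x]
  } else {
    NA_character_
  }
}, USE.NAMES = FALSE)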
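
A vectorised alternative to the sapply loop is base R's match(), which returns the position of each df1 ID in df2 (or NA when there is none) and does not depend on row order. This is a sketch rather than part of the original answer; it assumes the question's data plus one made-up ID (cg99999999, hypothetical) and that IDs are unique in df2:

```r
# Question's df1 (first sample column only) plus one hypothetical unmatched ID
df1 <- data.frame(
  ID_REF  = c("cg00000292", "cg00002426", "cg00003994", "cg99999999"),
  Sample1 = c(0.2841738, -4.7278154, -5.7353341, 0.1),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  IlmnID      = c("cg00002426", "cg00005847", "cg00000292",
                  "cg00006414", "cg00003994"),
  Gene_Symbol = c("SLMAP", "HOXD3", "ATP2A1", "ZNF398", "MEOX2"),
  stringsAsFactors = FALSE
)

# Position in df2 of each df1 ID; NA where the ID is not found
idx <- match(df1$ID_REF, df2$IlmnID)

# Replace the ID column with the looked-up gene symbols, keeping df1's row order
new_df <- data.frame(
  Gene_Symbol = df2$Gene_Symbol[idx],
  df1[, -1, drop = FALSE],
  stringsAsFactors = FALSE
)
new_df
```

Unlike merge, this keeps the rows in df1's original order, and unmatched IDs end up as NA rather than being dropped.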