Mutating multiple columns elegantly

Time: 2019-02-28 17:51:58

Tags: r dplyr mutate

I have a df as shown below:

df <- read.table(text="name id_final    id1 id2 id3
sample1 10.96311    4.767571    3.692556    2.966773
sample2 10.83782    11.61998    11.402257   10.301068
sample3 13.98669    12.123346   10.299306   8.85533
sample4 13.97313    12.200774   11.874366   11.013115
sample5 13.89532    10.712515   9.102278    9.832699
sample6 13.86255    11.808834   9.180613    8.813621", header = TRUE)  # default sep splits on any whitespace (spaces or tabs)
head(df)
> head(df)
     name id_final       id1       id2       id3
1 sample1 10.96311  4.767571  3.692556  2.966773
2 sample2 10.83782 11.619980 11.402257 10.301068
3 sample3 13.98669 12.123346 10.299306  8.855330
4 sample4 13.97313 12.200774 11.874366 11.013115
5 sample5 13.89532 10.712515  9.102278  9.832699
6 sample6 13.86255 11.808834  9.180613  8.813621

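I need to do some basic math: divide each id column by the id_final column, take log2 of the ratio, and create new columns with the suffix _log. This can be done with a simple mutate, as shown below.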

library(dplyr)

df <- df %>%
  mutate(id1_log = log2(id1/id_final),
         id2_log = log2(id2/id_final),
         id3_log = log2(id3/id_final))
head(df)
> head(df)
     name id_final       id1       id2       id3    id1_log     id2_log     id3_log
1 sample1 10.96311  4.767571  3.692556  2.966773 -1.2013308 -1.56996541 -1.88569067
2 sample2 10.83782 11.619980 11.402257 10.301068  0.1005330  0.07324483 -0.07328067
3 sample3 13.98669 12.123346 10.299306  8.855330 -0.2062667 -0.44150746 -0.65943661
4 sample4 13.97313 12.200774 11.874366 11.013115 -0.1956825 -0.23480474 -0.34343264
5 sample5 13.89532 10.712515  9.102278  9.832699 -0.3753018 -0.61029950 -0.49893967
6 sample6 13.86255 11.808834  9.180613  8.813621 -0.2313261 -0.59453027 -0.65338590

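In the example given this is easy, since there are only 3 columns. How would I automate this if I had more than 3 columns? Typing out this command every time is not very elegant: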

mutate(id1_log = log2(id1/id_final),
          id2_log = log2(id2/id_final),
          id3_log = log2(id3/id_final))

To give the bigger picture, I am trying to write a function that can be used on multiple files, each with multiple id1 ... n columns.

2 answers:

Answer 0: (score: 2)

You can do:

library(dplyr)

df %>% mutate_at(vars(matches("id\\d+$")), list(log = ~ log2(. / id_final)))

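We modify the desired columns in one go with mutate_at: those whose names match the regex id\\d+$, i.e. names that end in id followed by one or more digits. This avoids capturing id_final or any of the _log columns.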
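To check which names the pattern selects, it can be tried on a plain character vector first. This is a minimal base R sketch; the vector `cols` and the name `grid12` are made up for illustration. Note that the pattern is not anchored at the start, so a name such as `grid12` would also match; adding `^` makes it stricter.

```r
# Column names to test against; "grid12" is a hypothetical extra column.
cols <- c("name", "id_final", "id1", "id2", "id3", "id1_log", "grid12")

loose  <- cols[grepl("id\\d+$", cols)]   # pattern from the answer, unanchored at the start
strict <- cols[grepl("^id\\d+$", cols)]  # anchored: the whole name must be id + digits

loose   # "id1" "id2" "id3" "grid12"
strict  # "id1" "id2" "id3"
```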

Then we supply a list containing the desired transformation. You can give the transformation a name, which is then automatically appended to the column names. We named it log, so the new columns automatically end in _log; you could write anything else there.

If you do not supply a name, the existing columns are modified in place; if you do, you get additional columns, as we do here.

Output:

     name id_final       id1       id2       id3    id1_log     id2_log     id3_log
1 sample1 10.96311  4.767571  3.692556  2.966773 -1.2013308 -1.56996541 -1.88569067
2 sample2 10.83782 11.619980 11.402257 10.301068  0.1005330  0.07324483 -0.07328067
3 sample3 13.98669 12.123346 10.299306  8.855330 -0.2062667 -0.44150746 -0.65943661
4 sample4 13.97313 12.200774 11.874366 11.013115 -0.1956825 -0.23480474 -0.34343264
5 sample5 13.89532 10.712515  9.102278  9.832699 -0.3753018 -0.61029950 -0.49893967
6 sample6 13.86255 11.808834  9.180613  8.813621 -0.2313261 -0.59453027 -0.65338590
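Since the question asks for something reusable across files, the same idea can be wrapped in a function. The following is only a sketch, not part of the original answer: the helper name `add_log_ratios` and its `ref` and `pattern` arguments are hypothetical, it uses `across()` (available in dplyr 1.0.0 and later, where `mutate_at` is superseded), and it assumes an ungrouped data frame.

```r
library(dplyr)

# Hypothetical helper: adds a <col>_log column for every column matching `pattern`.
add_log_ratios <- function(data, ref = "id_final", pattern = "^id\\d+$") {
  ref_values <- data[[ref]]  # reference column as a plain vector (assumes ungrouped data)
  mutate(
    data,
    across(matches(pattern),
           ~ log2(.x / ref_values),
           .names = "{.col}_log")  # e.g. id1 -> id1_log
  )
}

# Small example built from the first two rows of the question's data.
df <- data.frame(
  name     = c("sample1", "sample2"),
  id_final = c(10.96311, 10.83782),
  id1      = c(4.767571, 11.61998),
  id2      = c(3.692556, 11.402257)
)

out <- add_log_ratios(df)
names(out)
```

Because the columns are picked by pattern rather than by position, the same call should work unchanged on a file with id1 through id20.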

Answer 1: (score: 1)

Here is a data.table option:

library(data.table)
cols <- names(df)[3:5] # first, select columns you are interested in (or names(df)[grepl("id\\d+$", names(df))])
setDT(df)[, paste(cols, "log", sep = "_") :=  lapply(.SD, function(x) log2(x/id_final)),
          .SDcols = cols][] # apply { function(x) log2(x/id_final) } to selected columns
# output
      name id_final       id1       id2       id3    id1_log     id2_log     id3_log
1: sample1 10.96311  4.767571  3.692556  2.966773 -1.2013308 -1.56996541 -1.88569067
2: sample2 10.83782 11.619980 11.402257 10.301068  0.1005330  0.07324483 -0.07328067
3: sample3 13.98669 12.123346 10.299306  8.855330 -0.2062667 -0.44150746 -0.65943661
4: sample4 13.97313 12.200774 11.874366 11.013115 -0.1956825 -0.23480474 -0.34343264
5: sample5 13.89532 10.712515  9.102278  9.832699 -0.3753018 -0.61029950 -0.49893967
6: sample6 13.86255 11.808834  9.180613  8.813621 -0.2313261 -0.59453027 -0.65338590
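Note that setDT() converts df to a data.table by reference, so the original object is changed. If that is not wanted, the same steps can be wrapped in a function that works on a copy. This is a rough sketch under assumptions: the helper name `add_log_ratios_dt` and its arguments are hypothetical, and the columns are selected with an anchored version of the regex rather than by position.

```r
library(data.table)

# Hypothetical helper: returns a new data.table with <col>_log columns added.
add_log_ratios_dt <- function(data, ref = "id_final", pattern = "^id\\d+$") {
  dt <- copy(as.data.table(data))                 # work on a copy; input stays untouched
  cols <- grep(pattern, names(dt), value = TRUE)  # columns to transform
  if (length(cols) == 0L) return(dt)              # nothing matched: return unchanged copy
  new_cols <- paste0(cols, "_log")
  ref_values <- dt[[ref]]
  dt[, (new_cols) := lapply(.SD, function(x) log2(x / ref_values)), .SDcols = cols]
  dt[]
}

# Small example built from the first two rows of the question's data.
df <- data.frame(
  name     = c("sample1", "sample2"),
  id_final = c(10.96311, 10.83782),
  id1      = c(4.767571, 11.61998),
  id2      = c(3.692556, 11.402257)
)

out <- add_log_ratios_dt(df)
names(out)
```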