Recoding a variable that references multiple vectors/columns

Time: 2018-10-10 23:37:00

Tags: r if-statement

This is a question about writing logical conditions efficiently.

Suppose I want to recode a variable whenever any column in a set of columns equals a specific value.

library(tibble)

test <- tibble(
 CompanyA = rep(0:1, 5),
 CompanyB = rep(0, 10),
 CompanyC = c(1,1,1,1,0,0,1,1,1,1)
)
test

A basic approach is:

test$newvar <- ifelse(test$CompanyA==1 | test$CompanyB == 1 | test$CompanyC == 1,-99,0)

table(test$newvar)

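But what if I have dozens of columns? I don't want to write out CompanyA, CompanyB, and so on. Is there a way to essentially use an %in%-type statement? Here is an obviously wrong approach: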

condition <- columns %in% c("CompanyA", "CompanyB", "CompanyC")  # obviously doesn't work

test$newvar[condition] <- 1

Or is there a simpler way, e.g. "if CompanyA:CompanyC == 1, then do..."?
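One way to get the %in%-style behaviour asked about above can be sketched in base R. This is an illustration rather than part of the original thread. Put the column names in a character vector, compare all of those columns to 1 at once, and count the matches per row with rowSums(). The plain data.frame below stands in for the question's test tibble so that no packages are needed. The names cols and any_one are hypothetical.

```r
# Stand-in for the question's `test` tibble (plain data.frame, no packages).
test <- data.frame(
  CompanyA = rep(0:1, 5),
  CompanyB = rep(0, 10),
  CompanyC = c(1, 1, 1, 1, 0, 0, 1, 1, 1, 1)
)

# Columns to check, listed by name (hypothetical variable `cols`).
# They could also be selected by pattern instead:
#   cols <- grep("^Company", names(test), value = TRUE)
cols <- c("CompanyA", "CompanyB", "CompanyC")

# test[cols] == 1 yields a logical matrix with one row per observation;
# rowSums() counts the TRUEs in each row, so > 0 means "any column equals 1".
any_one <- rowSums(test[cols] == 1) > 0

# Recode as in the question: -99 if any listed column equals 1, else 0.
test$newvar <- ifelse(any_one, -99, 0)

table(test$newvar)
```

Because the column set is just a character vector, it scales to dozens of columns without writing out each comparison. Note that a row containing NA in one of these columns produces NA from rowSums() unless you handle missing values explicitly.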

1 Answer:

Answer 0 (score: 1)

Overview

By reshaping test from wide to long, I was able to create a column that tests whether any value in the CompanyX columns equals 1.

Code

# load necessary packages ----
library(tidyverse)

# load necessary data ----
test <- 
  tibble(CompanyA = rep(c(0:1),5),
         CompanyB = rep(c(0),10),
         CompanyC = c(1,1,1,1,0,0,1,1,1,1)) %>% 
  # create an 'id' column
  mutate(id = 1:n())

# calculations -----
new.var <-
  test  %>%
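  # transform data from wide to long (one row per id/company pair);
  # gather() still works but is superseded by pivot_longer() since tidyr 1.0.0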
  gather(key = "company", value = "value", -id) %>%
  # for each 'id' value
  # test if any 'value' is equal to 1
  # if so, return -99; else return 0
  group_by(id) %>%
  summarize(new_var = if_else(any(value == 1), -99, 0))

# left join new.var onto test ---
test <-
  test %>%
  left_join(new.var, by = "id")

# view results ---
test
# A tibble: 10 x 5
#    CompanyA CompanyB CompanyC    id new_var
#       <int>    <dbl>    <dbl> <int>   <dbl>
#  1        0        0        1     1     -99
#  2        1        0        1     2     -99
#  3        0        0        1     3     -99
#  4        1        0        1     4     -99
#  5        0        0        0     5       0
#  6        1        0        0     6     -99
#  7        0        0        1     7     -99
#  8        1        0        1     8     -99
#  9        0        0        1     9     -99
# 10        1        0        1    10     -99

# end of script #
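The reshape-then-group idea from the overview does not depend on the tidyverse. The sketch below is a base-R illustration under that assumption, not the answerer's code. stack() puts every Company* column into a single long values column, tapply() then asks "is any value equal to 1?" for each id, and the result is matched back to test by id. The names cols, long, and any_one are hypothetical.

```r
# Same example data as the answer, as a plain data.frame plus an 'id' column.
test <- data.frame(
  CompanyA = rep(0:1, 5),
  CompanyB = rep(0, 10),
  CompanyC = c(1, 1, 1, 1, 0, 0, 1, 1, 1, 1)
)
test$id <- seq_len(nrow(test))

# Select the columns to check by name pattern.
cols <- grep("^Company", names(test), value = TRUE)

# Wide -> long: stack() concatenates the selected columns into 'values'
# (column by column) and records the source column name in 'ind'.
long <- stack(test[cols])
# Repeat the ids once per stacked column so each long row keeps its id.
long$id <- rep(test$id, times = length(cols))

# For each id, TRUE if any of its values equals 1 (names are the ids).
any_one <- tapply(long$values == 1, long$id, any)

# "Join" back by id, then recode: -99 if any value is 1, else 0.
# as.vector() drops the names/array attributes that tapply() adds.
test$new_var <- ifelse(as.vector(any_one[as.character(test$id)]), -99, 0)
test
```

The rowwise rowSums() approach is usually simpler when all you need is an "any column" flag. The long format pays off when you also want per-company detail, such as which column triggered the flag (available in long$ind).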