Creating new variables in a list using column name patterns

Time: 2020-01-21 15:02:13

Tags: dplyr

Hi, I am trying to reduce 33 variables to a single indicator using the code below (I know this is extremely inefficient):

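How can I loop through these 11 plots so that the measure is computed for every variable ending in a given number (for example "_1"), and then sum the results for 1:11 to get one indicator?

In other words, how do I loop over all variables ending in a number, add them together to create a new variable ending in that number, and then sum those 11 results into a single indicator?

Example with the current data frame: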

data_indicator <- data %>%
  mutate(plot_1=(farm_sell_1+farm_lease_1+farm_bequeath_1)/3, na.rm=T) %>%
  mutate(plot_sec_1=ifelse(plot_1>.5, 1, 0)) %>%
  mutate(plot_2=(farm_sell_2+farm_lease_2+farm_bequeath_2)/3, na.rm=T) %>%
  mutate(plot_sec_2=ifelse(plot_2>.5, 1, 0)) %>%
  mutate(plot_3=(farm_sell_3+farm_lease_3+farm_bequeath_3)/3, na.rm=T) %>%
  mutate(plot_sec_3=ifelse(plot_3>.5, 1, 0)) %>%
  mutate(plot_4=(farm_sell_4+farm_lease_4+farm_bequeath_4)/3, na.rm=T) %>%
  mutate(plot_sec_4=ifelse(plot_4>.5, 1, 0)) %>%
  mutate(plot_5=(farm_sell_5+farm_lease_5+farm_bequeath_5)/3, na.rm=T) %>%
  mutate(plot_sec_5=ifelse(plot_5>.5, 1, 0)) %>%
  mutate(plot_6=(farm_sell_6+farm_lease_6+farm_bequeath_6)/3, na.rm=T) %>%
  mutate(plot_sec_6=ifelse(plot_6>.5, 1, 0)) %>%
  mutate(plot_7=(farm_sell_7+farm_lease_7+farm_bequeath_7)/3, na.rm=T) %>%
  mutate(plot_sec_7=ifelse(plot_7>.5, 1, 0)) %>%
  mutate(plot_8=(farm_sell_8+farm_lease_8+farm_bequeath_8)/3, na.rm=T) %>%
  mutate(plot_sec_8=ifelse(plot_8>.5, 1, 0)) %>%
  mutate(plot_9=(farm_sell_9+farm_lease_9+farm_bequeath_9)/3, na.rm=T) %>%
  mutate(plot_sec_9=ifelse(plot_9>.5, 1, 0)) %>%
  mutate(plot_10=(farm_sell_10+farm_lease_10+farm_bequeath_10)/3, na.rm=T) %>%
  mutate(plot_sec_10=ifelse(plot_10>.5, 1, 0)) %>%
  mutate(plot_11=(farm_sell_11+farm_lease_11+farm_bequeath_11)/3, na.rm=T) %>%
  mutate(plot_sec_11=ifelse(plot_11>.5, 1, 0)) %>%
  mutate(num_plots_sec = plot_sec_1+plot_sec_2+plot_sec_3+plot_sec_4+plot_sec_5+plot_sec_6+plot_sec_7+plot_sec_8+plot_sec_9+plot_sec_10+plot_sec_11, na.rm=T) 

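1 answer:

Answer 0 (score: 0)

One way to make this more concise is to convert the data to long format. For more on wide versus long format, see: https://uc-r.github.io/tidyr

I will walk through the process step by step so you can see how it works, then include a small chunk of code at the end that does everything in one go.

First, some fake data to work with: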

fake.data <- data.frame(matrix(data = rbinom(1650, 1, 0.5), nrow = 50, ncol = 33))
colnames(fake.data) <- c(paste0("farm_sell_", 1:11), paste0("farm_lease_", 1:11),
                          paste0("farm_bequeath_", 1:11))

The above should look like your data:

'data.frame':   50 obs. of  33 variables:
 $ farm_sell_1     : int  0 0 0 1 0 1 0 0 1 1 ...
 $ farm_sell_2     : int  0 0 1 1 1 1 1 1 0 0 ...
 $ farm_sell_3     : int  1 0 0 0 1 1 0 1 0 0 ...
 $ farm_sell_4     : int  1 1 1 0 1 0 0 0 1 1 ...
 $ farm_sell_5     : int  1 1 0 0 1 0 0 0 1 1 ...
 $ farm_sell_6     : int  0 1 0 0 0 0 0 0 0 0 ...
 $ farm_sell_7     : int  1 0 1 1 0 0 0 1 0 1 ...
 $ farm_sell_8     : int  0 0 1 0 0 1 1 0 1 0 ...
 $ farm_sell_9     : int  1 1 1 0 0 0 1 1 1 1 ...
 $ farm_sell_10    : int  1 1 0 0 1 0 1 1 0 0 ...
 $ farm_sell_11    : int  0 0 0 0 1 1 1 0 0 0 ...
 $ farm_lease_1    : int  0 0 1 1 0 0 1 0 1 0 ...
 $ farm_lease_2    : int  0 0 0 1 1 1 1 1 1 0 ...
 $ farm_lease_3    : int  0 1 1 1 0 1 1 1 0 0 ...
 $ farm_lease_4    : int  1 0 1 1 0 1 1 1 1 1 ...
 $ farm_lease_5    : int  0 0 0 0 1 1 0 1 0 1 ...
 $ farm_lease_6    : int  0 1 1 0 1 1 0 0 1 1 ...
 $ farm_lease_7    : int  0 0 0 1 1 1 0 1 1 1 ...
 $ farm_lease_8    : int  0 1 0 1 0 0 1 0 1 0 ...
 $ farm_lease_9    : int  0 0 1 0 0 1 0 0 1 1 ...
 $ farm_lease_10   : int  1 1 1 1 0 1 1 1 0 1 ...
 $ farm_lease_11   : int  1 0 0 1 1 0 0 0 1 1 ...
 $ farm_bequeath_1 : int  1 1 1 0 0 1 1 1 0 0 ...
 $ farm_bequeath_2 : int  0 1 1 0 0 1 1 1 1 1 ...
 $ farm_bequeath_3 : int  1 0 0 1 1 0 1 0 0 1 ...
 $ farm_bequeath_4 : int  0 0 1 1 1 0 0 0 1 0 ...
 $ farm_bequeath_5 : int  1 1 0 0 0 0 0 1 1 0 ...
 $ farm_bequeath_6 : int  0 1 0 0 0 0 1 0 1 1 ...
 $ farm_bequeath_7 : int  0 0 0 0 1 0 1 0 0 1 ...
 $ farm_bequeath_8 : int  0 1 0 1 1 0 0 1 1 1 ...
 $ farm_bequeath_9 : int  0 0 1 1 0 1 1 0 0 1 ...
 $ farm_bequeath_10: int  0 0 0 1 1 1 0 0 1 1 ...
 $ farm_bequeath_11: int  0 0 0 1 0 0 0 0 0 0 ...

You need the dplyr and tidyr packages to do all of this.

library(dplyr)
library(tidyr)

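Then we use pivot_longer from tidyr to make the data long. I add a key here to track which farm each value belongs to. We will need it for grouping later; it simply matches the row number in the original data.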

data.long <- fake.data %>%
  #add a key to keep track of stuff
  mutate(farm_key = 1:n()) %>%
  pivot_longer(farm_sell_1:farm_bequeath_11, names_to = "variable", values_to = "value")

The result looks like this:

# A tibble: 6 x 3
  farm_key variable    value
     <int> <chr>       <int>
1        1 farm_sell_1     0
2        1 farm_sell_2     0
3        1 farm_sell_3     1
4        1 farm_sell_4     1

Next, we use separate to split your variable names, such as farm_sell_1, into something more machine-readable:

data.long2 <- data.long %>%
  tidyr::separate(col = variable, into = c("farm", "var", "var_num"), sep = "_")

That gives data like this:

# A tibble: 6 x 5
  farm_key farm  var   var_num value
     <int> <chr> <chr> <chr>   <int>
1        1 farm  sell  1           0
2        1 farm  sell  2           0
3        1 farm  sell  3           1
4        1 farm  sell  4           1
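
As a side note, the pivot and the split can probably be combined into one call, because pivot_longer accepts several names_to columns together with a names_sep separator. The sketch below uses a tiny made-up data frame (toy is not part of the question's data) with 2 farms and 2 plots, and it assumes every pivoted column name has exactly three parts separated by "_".

```r
library(dplyr)
library(tidyr)

# Made-up data for illustration only: 2 farms (rows), 2 plots, 3 variables per plot
toy <- data.frame(
  farm_sell_1 = c(1, 0), farm_lease_1 = c(1, 0), farm_bequeath_1 = c(0, 0),
  farm_sell_2 = c(1, 1), farm_lease_2 = c(1, 1), farm_bequeath_2 = c(0, 1)
)

toy.long <- toy %>%
  mutate(farm_key = 1:n()) %>%
  pivot_longer(
    cols = -farm_key,                        # pivot everything except the key
    names_to = c("farm", "var", "var_num"),  # one new column per name part
    names_sep = "_",                         # split the old column names at "_"
    values_to = "value"
  )
```

Selecting the columns with -farm_key rather than starts_with("farm_") is deliberate here: farm_key itself starts with "farm_" and would otherwise be pivoted too.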

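Then we do all the arithmetic you did above. First we group by farm_key and var_num, sum the values within each group, and divide by three. For rows with no missing values, this is the same as adding farm_sell_1 + farm_lease_1 + farm_bequeath_1 and dividing by three, as you did above. Next, we compute plot_sec with an ifelse statement. Finally, we sum the 11 indicators for each farm (one for each of _1, _2, _3, and so on) to get a single indicator value per farm.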

data.long3 <- data.long2 %>%
  group_by(farm_key, var_num) %>%
  summarise(plot_val = sum(value, na.rm = T)/3) %>% #same as plot_1, plot_2, etc.
  ungroup() %>%
  mutate(plot_sec = ifelse(plot_val>0.5,1,0)) %>%
  #sum together to get one value for each farm_key
  group_by(farm_key) %>%
  summarise(num_plots_sec = sum(plot_sec)) %>%
  ungroup() 

The data then looks like this:

# A tibble: 6 x 2
  farm_key num_plots_sec
     <int>         <dbl>
1        1             4
2        2             4
3        3             4
4        4             8
5        5             7
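
To sanity-check the logic, it may help to run the same steps on a data set small enough to verify by hand. The sketch below uses a made-up data frame (toy, with 2 farms and 2 plots) and assumes dplyr 1.0.0 or later for the .groups argument of summarise. Worked by hand: farm 1 has plot averages 2/3 and 2/3, so both plots are flagged; farm 2 has averages 0 and 1, so one plot is flagged.

```r
library(dplyr)
library(tidyr)

# Made-up data for illustration only: 2 farms (rows), 2 plots, 3 variables per plot
toy <- data.frame(
  farm_sell_1 = c(1, 0), farm_lease_1 = c(1, 0), farm_bequeath_1 = c(0, 0),
  farm_sell_2 = c(1, 1), farm_lease_2 = c(1, 1), farm_bequeath_2 = c(0, 1)
)

toy.ind <- toy %>%
  mutate(farm_key = 1:n()) %>%
  pivot_longer(-farm_key, names_to = "variable", values_to = "value") %>%
  separate(col = variable, into = c("farm", "var", "var_num"), sep = "_") %>%
  # average of the three variables for each farm and plot number
  group_by(farm_key, var_num) %>%
  summarise(plot_val = sum(value, na.rm = TRUE) / 3, .groups = "drop") %>%
  # flag plots whose average exceeds 0.5
  mutate(plot_sec = ifelse(plot_val > 0.5, 1, 0)) %>%
  # count the flagged plots per farm
  group_by(farm_key) %>%
  summarise(num_plots_sec = sum(plot_sec), .groups = "drop")

toy.ind$num_plots_sec  # 2 for farm 1, 1 for farm 2
```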

And as promised, one chunk of code that does everything at once:

data.one.ind <- fake.data %>%
  #add a key to keep track of stuff
  mutate(farm_key = 1:n()) %>%
  pivot_longer(farm_sell_1:farm_bequeath_11, names_to = "variable", values_to = "value") %>%
  tidyr::separate(col = variable, into = c("farm", "var", "var_num"), sep = "_") %>%
  group_by(farm_key, var_num) %>%
  summarise(plot_val = sum(value, na.rm = T)/3) %>% #same as plot_1, plot_2, etc.
  ungroup() %>%
  mutate(plot_sec = ifelse(plot_val>0.5,1,0)) %>%
  #sum together to get one value for each farm_key
  group_by(farm_key) %>%
  summarise(num_plots_sec = sum(plot_sec)) %>%
  ungroup()

All in all, it may not actually save you much typing, but it adapts more easily to changes in the variables.
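
If you would rather keep the data wide and literally loop over the plot numbers, as the question asks, something along these lines might also work. This is only a sketch in base R on a made-up data frame (toy and n_plots are illustrative names, not from the question), and it assumes the columns follow the farm_sell_N / farm_lease_N / farm_bequeath_N naming scheme.

```r
# Made-up data for illustration only: 2 farms (rows), 2 plots, 3 variables per plot
toy <- data.frame(
  farm_sell_1 = c(1, 0), farm_lease_1 = c(1, 0), farm_bequeath_1 = c(0, 0),
  farm_sell_2 = c(1, 1), farm_lease_2 = c(1, 1), farm_bequeath_2 = c(0, 1)
)

n_plots <- 2  # would be 11 for the data in the question

# For each plot number, build the three matching column names, average them
# row by row, and flag rows whose average exceeds 0.5.
# The result is a matrix with one row per farm and one column per plot.
plot_sec <- sapply(seq_len(n_plots), function(i) {
  cols <- paste0(c("farm_sell_", "farm_lease_", "farm_bequeath_"), i)
  as.numeric(rowMeans(toy[, cols]) > 0.5)
})

# One indicator per farm: the number of flagged plots
toy$num_plots_sec <- rowSums(plot_sec)
```

Note that rowMeans without na.rm returns NA for any row with a missing value, which mirrors adding the columns with +. The long-format pipeline above instead drops missing values before summing.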