Conditional tidyr::complete() with a variable maximum

Date: 2019-03-15 14:23:08

Tags: r dplyr tidyr

I want to use complete() (or another approach) to fill in a column, but only up to a maximum that differs for each value.

With the following data:

library(tidyverse)

df <- tribble(~Question_Code,   ~RespondentLevel,
"Engagement - Inclusion",   5,
"External engagement - policies",   2,
"External engagement - technology", 5,
"Community data",   5,
"Internal engagement",  5,
"Internal use of technology",   4,
"Familiarity/Alignment",    5,
"Environmental impacts",    5,
"Innovation",   2,
"Use of open-source technology",    2,
"Regulation of hardware & software",    5,
"In-house technical capacity",  5,
"Infrastructure procurement",   5,
"Algorithmic Error & Bias", 2,
"Control: Privacy", 5,
"Accountability in Governance Structures",  3,
"Open procurement", 5,
"Use in decision-making",   1,
"Accountability",   1,
"External Control", 4,
"Internal Control", 2,
"Open Data",    2)

#A tibble: 22 x 2
   Question_Code                    RespondentLevel
   <chr>                                      <dbl>
 1 Engagement - Inclusion                         5
 2 External engagement - policies                 2
 3 External engagement - technology               5
 4 Community data                                 5
 5 Internal engagement                            5
 6 Internal use of technology                     4
 7 Familiarity/Alignment                          5
 8 Environmental impacts                          5
 9 Innovation                                     2
10 Use of open-source technology                  2
# ... with 12 more rows

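For example, "Engagement - Inclusion" is at level 5, so I would like it completed as 1, 2, 3, 4, 5. However, "External engagement - policies" is at level 2, so I want it completed only as 1, 2.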

Using

df_full <- df %>%
  complete(nesting(Question_Code), RespondentLevel) %>%
  mutate(RespondentLevel = as.character(RespondentLevel)) 
# A tibble: 110 x 3
   Question_Code    RespondentLevel  
   <fct>            <chr>             
 1 Open Data        1                   
 2 Open Data        2              
 3 Open Data        3                 
 4 Open Data        4                   
 5 Open Data        5                  
 6 Internal Control 1                
 7 Internal Control 2               
 8 Internal Control 3                    
 9 Internal Control 4                    
10 Internal Control 5                    
# ... with 100 more rows

completes every question from 1 to 5, but how can I cap each question at its own maximum value?

Thanks. I have been struggling to come up with an ifelse() solution.

3 answers:

Answer 0 (score: 1)

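One tidyverse possibility is to group by question, so that max(RespondentLevel) is evaluated separately for each Question_Code: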

df %>%
 group_by(Question_Code) %>%
 complete(RespondentLevel = full_seq(1:max(RespondentLevel), 1))

   Question_Code                           RespondentLevel
   <chr>                                             <dbl>
 1 Accountability                                        1
 2 Accountability in Governance Structures               1
 3 Accountability in Governance Structures               2
 4 Accountability in Governance Structures               3
 5 Algorithmic Error & Bias                              1
 6 Algorithmic Error & Bias                              2
 7 Community data                                        1
 8 Community data                                        2
 9 Community data                                        3
10 Community data                                        4
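
As a cross-check on the grouped complete() above, here is a minimal sketch of a different technique, tidyr::uncount(), which repeats each row RespondentLevel times and numbers the copies. It is not part of the original answer. The three-row tibble `small` and the column name `Level` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Three rows taken from the question's data, to keep the example small.
small <- tribble(
  ~Question_Code,                   ~RespondentLevel,
  "Engagement - Inclusion",         5,
  "External engagement - policies", 2,
  "Accountability",                 1
)

# Repeat each row RespondentLevel times; .id adds a running counter
# (1, 2, ..., RespondentLevel) within each original row.
# "Level" is a hypothetical column name chosen for this sketch.
by_uncount <- small %>%
  uncount(RespondentLevel, .id = "Level")

print(by_uncount)
```

Unlike the grouped complete(), this keeps the questions in their original row order rather than sorting them, and it assumes each Question_Code appears exactly once in the input.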

Answer 1 (score: 1)

For some variety, here is a data.table approach:

library(data.table)
setDT(df)
df[, .(RespondentLevel = seq_len(RespondentLevel)), by = .(Question_Code)]
#                              Question_Code RespondentLevel
# 1:                  Engagement - Inclusion               1
# 2:                  Engagement - Inclusion               2
# 3:                  Engagement - Inclusion               3
# 4:                  Engagement - Inclusion               4
# 5:                  Engagement - Inclusion               5
# 6:          External engagement - policies               1
# 7:          External engagement - policies               2
# 8:        External engagement - technology               1
# 9:        External engagement - technology               2
# 10:        External engagement - technology               3
# 11:        External engagement - technology               4
# 12:        External engagement - technology               5
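
The data.table call above relies on each Question_Code appearing exactly once, because seq_len() expects a single number. If a question could appear on several rows, one possible variant (an assumption, not part of the original answer) is to take the per-group maximum first. The table `dt` below is invented for illustration and contains a deliberately duplicated question.

```r
library(data.table)

# Hypothetical data: "Innovation" appears twice with different levels.
dt <- data.table(
  Question_Code   = c("Innovation", "Innovation", "Accountability"),
  RespondentLevel = c(2, 3, 1)
)

# max() collapses each group to a single number before seq_len() expands it,
# so the duplicated question is completed up to its highest level.
out <- dt[, .(RespondentLevel = seq_len(max(RespondentLevel))), by = Question_Code]

print(out)
```

Building the table with data.table() (or as.data.table()) also avoids setDT(), which converts the original object in place.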

Answer 2 (score: 1)

You can also use expand():

library(tidyverse)

df %>%
  group_by(Question_Code) %>%
  expand(RespondentLevel = 1:max(RespondentLevel))

# Question_Code                           RespondentLevel
#   <chr>                                             <int>
# 1 Accountability                                        1
# 2 Accountability in Governance Structures               1
# 3 Accountability in Governance Structures               2
# 4 Accountability in Governance Structures               3
# 5 Algorithmic Error & Bias                              1
# 6 Algorithmic Error & Bias                              2
# 7 Community data                                        1
# 8 Community data                                        2
# 9 Community data                                        3
#10 Community data                                        4
# … with 70 more rows
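
One practical difference between the expand() and complete() answers only shows up when the data has more columns than the two in the question. The sketch below assumes a hypothetical extra column `Score` (not in the original data): a grouped complete() keeps it and fills the newly created rows with NA, while a grouped expand() returns only the grouping column and the expanded column.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with an extra column that is not in the question.
small <- tribble(
  ~Question_Code,   ~RespondentLevel, ~Score,
  "Innovation",     2,                0.5,
  "Accountability", 1,                0.9
)

# complete(): keeps Score; rows created by the completion get NA.
completed <- small %>%
  group_by(Question_Code) %>%
  complete(RespondentLevel = full_seq(1:max(RespondentLevel), 1)) %>%
  ungroup()

# expand(): returns only Question_Code and RespondentLevel.
expanded <- small %>%
  group_by(Question_Code) %>%
  expand(RespondentLevel = 1:max(RespondentLevel)) %>%
  ungroup()

print(completed)
print(expanded)
```

For the two-column data in the question the two approaches give the same rows, so the choice mostly matters when other columns need to survive.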