Using dplyr to update blank levels of a factor given matching levels of other factors

Time: 2017-06-29 16:55:53

Tags: r

I have a data frame like this:

df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE, 
text = "
plantfam,lepfam,lepsp\n
             Asteraceae,Geometridae,Eois sp\n
             Asteraceae,Erebidae,\n
             Poaceae,Erebidae,\n
             Poaceae,Noctuidae,\n
             Asteraceae,Saturnidae,Polyphemous sp\n
             Melastomaceae,Noctuidae,\n
             Asteraceae,,\n
             Melastomaceae,,\n
             ,Noctuidae,\n
             ,Erebidae,\n
             Poaceae, Erebidae,\n")

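I want to create unique lepsp names, conditional on the unique combinations of plantfam and lepfam. Each lepfam must first be subset. Within that lepfam subset, each unique plantfam/lepfam combination is assigned a morphospecies name. Rows where plantfam or lepfam is blank are not assigned a morphospecies. Duplicate plantfam/lepfam combinations should be given the same morphospecies name. The output should look like this: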

output<- 
 plantfam        lepfam                      lepsp
 Asteraceae      Geometridae                 Eois sp         
 Asteraceae      Erebidae                    Erebidae_morphosp1                 
 Poaceae         Erebidae                    Erebidae_morphosp2
 Poaceae         Noctuidae                   Noctuidae_morphosp1      
 Asteraceae      Saturnidae                  Polyphemous sp        
 Melastomaceae   Noctuidae                   Noctuidae_morphosp2
 Asteraceae             
 Melastomaceae   
                 Noctuidae
                 Erebidae
 Poaceae          Erebidae                    Erebidae_morphosp2

I tried:

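library(dplyr)

# Rows that need a generated name: blank lepsp, non-blank plantfam and lepfam.
condition <- quote(lepsp == "" & plantfam != "" & lepfam != "")
# The quoted condition is unquoted with !! so that filter() evaluates it.
subset1 <- df %>% filter(!!condition) %>% group_by(lepfam) %>%
  mutate(lepsp =
    paste0(lepfam, "_morphosp", match(plantfam, unique(plantfam))))
subset2 <- df %>% filter(!!condition) %>% setdiff(df, .)
union(subset1, subset2) %>% arrange(lepsp)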

However, the two Poaceae/Erebidae rows return different morphosp numbers (Erebidae_morphosp1 and Erebidae_morphosp2) when they should be the same:

Source: local data frame [11 x 3]
Groups: lepfam [6]

                     plantfam      lepfam               lepsp
                        <chr>       <chr>               <chr>
1                   Melastomaceae                                
2                      Asteraceae                                
3                         Poaceae    Erebidae  Erebidae_morphosp1
4                      Asteraceae Geometridae             Eois sp
5                      Asteraceae    Erebidae  Erebidae_morphosp1
6                         Poaceae    Erebidae  Erebidae_morphosp2
7                                    Erebidae  Erebidae_morphosp3
8                         Poaceae   Noctuidae Noctuidae_morphosp1
9                   Melastomaceae   Noctuidae Noctuidae_morphosp2
10                                  Noctuidae Noctuidae_morphosp3
11                     Asteraceae  Saturnidae      Polyphemous sp

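1 answer:

Answer 0 (score: 0)

I think the problem may simply be that, in your df, the last row has a space before Erebidae, which makes R treat it as a different value from the other Erebidae entries.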
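A minimal sketch of why the stray space matters, using a hypothetical vector (not the full df) that mimics the lepfam values of the Erebidae rows. R compares strings exactly, so " Erebidae" and "Erebidae" count as two different values, and group_by(lepfam) would place them in separate groups. That would be consistent with the "Groups: lepfam [6]" line in the question's output.

```r
# Hypothetical values: the third entry carries the leading space from the last data row.
lepfam <- c("Erebidae", "Erebidae", " Erebidae")

lepfam[3] == lepfam[1]    # FALSE: the leading space makes the strings unequal
unique(lepfam)            # "Erebidae" " Erebidae": two distinct values
unique(trimws(lepfam))    # "Erebidae": a single value once whitespace is trimmed
```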

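I figured I should complete the answer. Here's how I would do what you want. I introduce a group number, lepfam_number, in the mutate call before pasting it into the name.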

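library(dplyr)
df %>%
  group_by(lepfam) %>%
  # Number each plantfam within its lepfam group, in order of first appearance.
  mutate(lepfam_number = match(plantfam, unique(plantfam)),
         # Fill only blank lepsp where both lepfam and plantfam are non-blank;
         # trimws() guards against whitespace-only plantfam values.
         lepsp = ifelse(lepsp == "" & lepfam != "" & trimws(plantfam) != "",
                        paste0(lepfam, "_morphosp", lepfam_number),
                        lepsp)
  )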

                     plantfam      lepfam               lepsp lepfam_number
                        <chr>       <chr>               <chr>         <int>
1                  Asteraceae Geometridae             Eois sp             1
2                  Asteraceae    Erebidae  Erebidae_morphosp1             1
3                     Poaceae    Erebidae  Erebidae_morphosp2             2
4                     Poaceae   Noctuidae Noctuidae_morphosp1             1
5                  Asteraceae  Saturnidae      Polyphemous sp             1
6               Melastomaceae   Noctuidae Noctuidae_morphosp2             2
7                  Asteraceae                                             1
8               Melastomaceae                                             2
9                               Noctuidae                                 3
10                               Erebidae                                 3
11                    Poaceae    Erebidae  Erebidae_morphosp2             2
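The lepfam_number column in the output above comes from the match(x, unique(x)) idiom. As a small sketch, the hypothetical vector below stands in for the plantfam values of the Erebidae group (simplified, with the blank written as an empty string): each value gets the position of its first appearance, so repeated values share a number.

```r
# Hypothetical plantfam values for one lepfam group, in row order.
plantfam <- c("Asteraceae", "Poaceae", "", "Poaceae")

# unique() keeps values in order of first appearance;
# match() then returns each value's position in that list.
group_number <- match(plantfam, unique(plantfam))
group_number  # 1 2 3 2
```

Because the blank value also receives a number, the ifelse() condition is what keeps blank rows from being named.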

Data

df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE, 
                 text = "
plantfam,lepfam,lepsp\n
             Asteraceae,Geometridae,Eois sp\n
             Asteraceae,Erebidae,\n
             Poaceae,Erebidae,\n
             Poaceae,Noctuidae,\n
             Asteraceae,Saturnidae,Polyphemous sp\n
             Melastomaceae,Noctuidae,\n
             Asteraceae,,\n
             Melastomaceae,,\n
             ,Noctuidae,\n
             ,Erebidae,\n
             Poaceae,Erebidae,\n")
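
As a possible alternative to calling trimws() inside the pipeline, whitespace can be stripped while the data is read. The sketch below assumes the question's original rows, including the stray space in the last row, and relies on the strip.white argument of read.table. The name result and the final ungroup() are additions for illustration, not part of the answer above.

```r
library(dplyr)

# Same rows as the question; the last row still has " Erebidae" with a leading space.
df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE,
                 strip.white = TRUE,  # trim leading/trailing whitespace in every field
                 text = "plantfam,lepfam,lepsp
Asteraceae,Geometridae,Eois sp
Asteraceae,Erebidae,
Poaceae,Erebidae,
Poaceae,Noctuidae,
Asteraceae,Saturnidae,Polyphemous sp
Melastomaceae,Noctuidae,
Asteraceae,,
Melastomaceae,,
,Noctuidae,
,Erebidae,
Poaceae, Erebidae,")

result <- df %>%
  group_by(lepfam) %>%
  # Name only rows with blank lepsp and non-blank plantfam and lepfam.
  mutate(lepsp = ifelse(lepsp == "" & lepfam != "" & plantfam != "",
                        paste0(lepfam, "_morphosp",
                               match(plantfam, unique(plantfam))),
                        lepsp)) %>%
  ungroup()

result
```

With every field trimmed on import, both Poaceae/Erebidae rows should end up in the same group and receive the same name. One caveat of this numbering: it follows first appearance within each lepfam group, so a blank plantfam that appears first in a group would take number 1 and push the named combinations to start at 2.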