Creating a new variable using selected levels of another variable

Time: 2018-03-14 21:36:57

Tags: r mutate r-factor

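I am having trouble creating a new variable from selected levels of another variable. The data set is gss, and the variable class is a factor with 5 levels ("Lower Class", "Working Class", "Middle Class", "Upper Class", "No Class") plus NA.

If I inspect its structure, it gives me

'data.frame':   57061 obs. of  1 variable:
$ class: Factor w/ 5 levels "Lower Class",..: 3 3 2 3 2 3 3 2 2 2 ...

Since I am only interested in the people who specified an economic class, I want to take out the "No Class" level and the NAs. I don't know a better way, so I did

gss <- gss %>%
  mutate(filteredclass = ifelse(class == "Lower Class", "Lower Class",
                         ifelse(class == "Working Class", "Working Class",
                         ifelse(class == "Middle Class", "Middle Class",
                         ifelse(class == "Upper Class", "Upper Class", NA)))))

Then, to check whether it worked, I ran:

with(gss, table(filteredclass))

which gave me the levels in a mixed-up order:

filteredclass
Lower Class  Middle Class   Upper Class Working Class 
     3147         24289          1741         24458

I want the new variable filteredclass to show its levels in the same order as the variable class, because if I do the same thing with class, it gives me:

with(gss, table(class))
class
Lower Class Working Class  Middle Class   Upper Class 
     3147         24458         24289          1741 
 No Class 
        1 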

Is there any way to fix this? Or is there a way to take out the "No Class" level without going through the mutate command I used above?

Thanks in advance for your help!

3 answers:

Answer 0: (score: 0)

In the future, it would be easier if you provided a reproducible example.

If you want to get rid of "No Class", you can use filter:

gss <- gss %>% 
  filter(class != "No Class") %>%
  droplevels()

To remove the NAs, use

gss <- na.omit(gss)
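
As a small illustration on made-up data (the toy data frame below is an assumption, not the real gss), note that filter() keeps only rows where the condition is TRUE. Rows where class is NA are therefore dropped together with the "No Class" rows, and droplevels() then removes the now-unused level while keeping the original level order. This approach removes whole rows rather than creating a new variable. The sketch assumes dplyr is installed:

```r
library(dplyr)

# Toy stand-in for gss (hypothetical data, for illustration only)
lv <- c("Lower Class", "Working Class", "Middle Class", "Upper Class", "No Class")
toy <- data.frame(
  class = factor(c("Middle Class", "No Class", NA, "Lower Class",
                   "Working Class", "Upper Class"),
                 levels = lv)
)

kept <- toy %>%
  filter(class != "No Class") %>%  # NA != "No Class" is NA, so NA rows are dropped too
  droplevels()                     # remove the now-empty "No Class" level

nrow(kept)          # number of rows left after filtering
levels(kept$class)  # remaining levels, still in the original order
```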

Answer 1: (score: 0)

The simplest way may be to call factor() on class:

gss$filteredclass <- factor(gss$class, c("Lower Class", "Working Class",
                             "Middle Class", "Upper Class"))

This drops "No Class" from the levels, so those values become NA.
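
A minimal sketch of that behavior on a made-up vector (the values in x are hypothetical, not taken from gss): any value that is not in the supplied levels becomes NA, and table() follows the order of the levels rather than alphabetical order.

```r
# Hypothetical input: one "No Class" value and one NA
x <- factor(c("Upper Class", "No Class", "Lower Class", NA,
              "Middle Class", "Working Class"))

keep <- c("Lower Class", "Working Class", "Middle Class", "Upper Class")
f <- factor(x, levels = keep)  # values outside `keep` turn into NA

levels(f)      # the four kept levels, in the order given in `keep`
sum(is.na(f))  # counts the original NA plus the former "No Class"
table(f)       # counts are listed in level order
```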

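Answer 2: (score: 0)

You have to re-level the factor using the same order as gss$class. To do that, you can add another line to the mutate() statement that turns filteredclass into a factor with the same levels as class and then drops the unused level ("No Class").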

library(tidyverse)
# Generate the data you showed
gss <- data.frame(class = factor(sample(c("Lower Class",  "Working Class",  "Middle Class",    "Upper Class", NA, "No Class"), 
                                        45000, replace = TRUE))) %>%
  mutate(class = factor(class, levels = c("Lower Class",  "Working Class",  "Middle Class",    "Upper Class", "No Class", NA)))

# Sampled data
with(gss, table(class, useNA = "always"))

# Mutate gss the way you did it
gss <-  gss %>%
  mutate(filteredclass = ifelse(class == "Lower Class", "Lower Class", 
                                ifelse(class == "Working Class", "Working Class",
                                       ifelse(class == "Middle Class", "Middle Class", 
                                              ifelse(class == "Upper Class", "Upper Class", NA)))),
         # Then make filteredclass into a factor with the same levels as class
         # Use droplevels() to remove unused classes (since we removed the No Class)
         filteredclass = droplevels(factor(filteredclass, levels = levels(class))))

with(gss, table(class))
with(gss, table(filteredclass))

The output is:

> with(gss, table(class, useNA = "always"))
class
  Lower Class Working Class  Middle Class   Upper Class      No Class 
         7362          7469          7626          7450          7457 
         <NA> 
         7636 

> with(gss, table(class))
class
  Lower Class Working Class  Middle Class   Upper Class      No Class 
         7362          7469          7626          7450          7457 

> with(gss, table(filteredclass))
filteredclass
  Lower Class Working Class  Middle Class   Upper Class 
         7362          7469          7626          7450 
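
The reason the original attempt lost the order can be sketched in base R with a made-up factor (cls and out are hypothetical names): ifelse() with character results returns a plain character vector, and table() then sorts the values alphabetically. Converting back to a factor with explicit levels restores the intended order.

```r
ord <- c("Lower Class", "Working Class", "Middle Class", "Upper Class")
cls <- factor(c("Working Class", "Lower Class", "Upper Class", "Middle Class"),
              levels = ord)

# ifelse() returns a character vector here, so the factor levels are lost
out <- ifelse(cls == "Lower Class", "Lower Class", as.character(cls))
class(out)

names(table(out))                                # alphabetical order
names(table(factor(out, levels = levels(cls))))  # order of the original levels
```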

A quicker way is to use droplevels() instead of the chain of ifelse() statements:

# Drop the No Class level; those observations become NA (rows are not removed)
with(gss %>% mutate(filteredclass = droplevels(class, exclude = c(NA, "No Class"))),
     table(filteredclass))


filteredclass
  Lower Class Working Class  Middle Class   Upper Class 
         7362          7469          7626          7450