Question

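After thinking about this for a while, my thoughts have become somewhat scattered. I am trying to subset a data frame wnd, which has a column ORIGIN (class: factor).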

a = sort(table(wnd$ORIGIN), decreasing=T)[1:20]
a

ATL    ORD    DFW    DEN    LAX    IAH    PHX    SFO    CLT..
123915  94422  90184  70970  69298  58850  57316  52702  44234..

# a is a table of the 20 factor levels of interest (highest volume).

b = names(a) 
b
[1] "ATL" "ORD" "DFW" "DEN" "LAX" "IAH" "PHX" "SFO" "CLT" "LAS" "DTW" "EWR" "MSP"
[14] "MCO" "SLC" "JFK" "BOS" "BWI" "LGA" "SEA"
# b pulls out the names of the airports I require in my subset

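I then want to create a new data frame containing only the rows whose ORIGIN is one of these levels in b (i.e., a subset). For one thing, the two are not of the same class: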

> class(b)
[1] "character"

> class(wnd$ORIGIN)
[1] "factor"

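I have tried a few different things (as.factor(b), wnd$ORIGIN==b, etc.), but my confusion is only growing, and I would appreciate someone explaining the right way to think about this problem.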

Answer 1

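By default, data.frame converts character strings to factors (this holds for R versions before 4.0.0; since then the default is stringsAsFactors = FALSE).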
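A minimal sketch of that conversion, using a small made-up vector of airport codes (the names codes, df_factor and df_char are hypothetical). stringsAsFactors is passed explicitly so the behaviour does not depend on the R version. It also shows that a factor column can be matched against a character vector with %in%, so the class mismatch in the question is not an obstacle by itself.

```r
# Hypothetical airport codes, used only for illustration
codes <- c("ATL", "ORD", "DFW")

# Set stringsAsFactors explicitly rather than relying on the default
df_factor <- data.frame(origin = codes, stringsAsFactors = TRUE)
df_char   <- data.frame(origin = codes, stringsAsFactors = FALSE)

class(df_factor$origin)  # "factor"
class(df_char$origin)    # "character"

# %in% compares the factor's labels with the character vector,
# not the factor's underlying integer codes
df_factor$origin %in% c("ORD", "DFW")  # FALSE TRUE TRUE
```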

data.frame(origin=b, count=unname(a))
  origin count
1    DFW     8
2    ATL     6
3    ORD     3

The output looks like this because of unname, which removes the names attribute from the table a.

Data

set.seed(111)
x <- c("ATL", "ORD", "DFW", "DEN", "LAX")
wnd <- data.frame(ORIGIN=sample(x,20,T))
a <- sort(table(wnd$ORIGIN), decreasing=T)[1:3]
b <- names(a)
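With data like this, the subset the question asks for could be sketched as follows. This is one common approach and not necessarily what the answerer had in mind: it assumes %in% for the membership test and droplevels to discard unused levels, and the name sub is hypothetical. The data is rebuilt here so the block runs on its own.

```r
set.seed(111)
x <- c("ATL", "ORD", "DFW", "DEN", "LAX")
# stringsAsFactors = TRUE so ORIGIN is a factor on any R version
wnd <- data.frame(ORIGIN = sample(x, 20, TRUE), stringsAsFactors = TRUE)

# Counts of the three most frequent origins, and their names
a <- sort(table(wnd$ORIGIN), decreasing = TRUE)[1:3]
b <- names(a)

# Keep only rows whose ORIGIN is one of the names in b.
# %in% tests membership element by element; wnd$ORIGIN == b would
# instead recycle b along the column and compare position by position
sub <- wnd[wnd$ORIGIN %in% b, , drop = FALSE]

# Drop the now-unused factor levels so later tables list only kept airports
sub$ORIGIN <- droplevels(sub$ORIGIN)

levels(sub$ORIGIN)
nrow(sub)
```

Without the droplevels step, table(sub$ORIGIN) would still list the excluded airports with a count of zero.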

Subsetting a data frame by selected factors
