What is the difference between the following lines of code?

Time: 2016-01-15 21:33:04

Tags: r levels

Here is the code:

levels(data[,7])           ## outputs the levels of a column as a vector
levels(data[,7])[data[,7]] ## this is the one I am not 100% sure about

I think the second one just gives a vector of the non-repeated values (as far as I can tell). Any clarification would be appreciated.

2 answers:

Answer 0: (score: 0)

The first line shows the levels of the factor variable in data[,7], i.e. the distinct values that factor can take.

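The second line uses the values in data[,7] to index those levels. A factor used as an index is treated as its integer codes, so each observation picks out its own level label. In this case that simply gives you back the values of data[,7], as a character vector.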
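
To see why this works, it helps to look at how a factor is stored: integer codes plus a levels attribute. Here is a minimal sketch, using a made-up factor `f` as a stand-in for data[,7] (the values are assumptions for illustration, not the asker's data):

```r
# Hypothetical stand-in for data[,7]; the values are made up for illustration.
f <- factor(c("a", "b", "b", "c", "a"))

lev   <- levels(f)       # the distinct labels: "a" "b" "c"
codes <- as.integer(f)   # the integer codes stored in the factor: 1 2 2 3 1

# Indexing with a factor uses its integer codes, so these two agree:
by_factor <- levels(f)[f]
by_codes  <- lev[codes]

# The result is one label per observation, the same as as.character(f).
print(by_factor)
print(identical(by_factor, as.character(f)))
```

So for a plain factor, `levels(f)[f]` is effectively a longer way of writing `as.character(f)`.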

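It is a useful construct if, instead of the levels, you have something like a vector of colours that you want to use for the different points in a plot.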

> levels(data[,2])[data[,2]]
 [1] "a" "b" "b" "b" "c" "b" "a" "a" "b" "b" "c" "b" "a" "c" "a" "c" "a" "a" "a" "a"
> c("red", "blue", "green")[data[,2]]
 [1] "red"   "blue"  "blue"  "blue"  "green" "blue"  "red"   "red"   "blue"  "blue" 
[11] "green" "blue"  "red"   "green" "red"   "green" "red"   "red"   "red"   "red"  
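
One thing to watch with this construct: the lookup follows the order of levels(), not the order in which values first appear in the data. A small sketch with made-up data (the names `grp`, `palette_by_level` and `named_palette` are illustrative, not from the question):

```r
# Made-up factor: "c" appears first, but the default levels are still a, b, c.
grp <- factor(c("c", "a", "b", "a"))

# One colour per level, in the order given by levels(grp).
palette_by_level <- c("red", "blue", "green")
cols <- palette_by_level[grp]
print(cols)

# A named lookup makes the pairing explicit. Note the as.character():
# without it, the factor's integer codes would be used as the index.
named_palette <- c(a = "red", b = "blue", c = "green")
cols_named <- unname(named_palette[as.character(grp)])
print(identical(cols, cols_named))
```

Either vector could then be passed as the `col` argument of a plotting call such as `plot()`.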

Answer 1: (score: 0)

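levels is a function that gives access to the levels attribute of a variable. These are basically the unique observations, as the example below shows.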

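# stringsAsFactors = TRUE is needed in R >= 4.0.0 to get a factor column;
# it was the default in older versions.
df <- data.frame(websites = c("git", "git", "python", "R", "python", 
"stackoverflow", "R"), stringsAsFactors = TRUE)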
df
       websites
1           git
2           git
3        python
4             R
5        python
6 stackoverflow
7             R

str(df)

'data.frame':   7 obs. of  1 variable:
$ websites: Factor w/ 4 levels "git","python",..: 1 1 2 3 2 4 3

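levels(df[,1]) # this basically gives you the unique levels (or obs) in the variable.
# If you want to rename levels, the easy way is to assign a new vector of the
# same length; the labels are replaced by position.
# (Not run here, so the output further down still shows the original levels.)
# levels(df[,1]) <- c("git", "veg", "R", "drink")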
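
To make the renaming concrete, here is a minimal sketch that works on a copy, so the original factor is left alone. The names `sites` and `renamed` are made up for illustration, and the levels are set explicitly so the order does not depend on the locale:

```r
# Hypothetical factor mirroring the websites column above.
sites <- factor(c("git", "git", "python", "R", "python", "stackoverflow", "R"),
                levels = c("git", "python", "R", "stackoverflow"))

codes_before <- as.integer(sites)

# Work on a copy so the original stays unchanged.
renamed <- sites
levels(renamed) <- c("git", "veg", "R", "drink")

# The integer codes are untouched; only the labels change, by position.
print(as.character(renamed))
print(identical(codes_before, as.integer(renamed)))
```

Every "python" becomes "veg" and every "stackoverflow" becomes "drink", because those are the second and fourth levels.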

Now,

levels(df[,1])[df[,1]]
# This is something like accessing values by passing an index, as in df[,1][1:6],
# except that here the index is the factor itself, which R treats as its
# integer codes (1 1 2 3 2 4 3), not as the printed names.
# Hence each observation picks out its own level label, and you get back the
# values of the variable as a character vector. For comparison, the factor itself:
df[,1]
[1] git           git           python        R             python        stackoverflow
[7] R            
Levels: git python R stackoverflow