Understanding the behavior of geom_histogram

Date: 2016-08-12 20:02:00

Tags: r ggplot2

I noticed some odd behavior from geom_histogram in ggplot2. It seems to omit a bar, and I can't figure out why. Here is an example:

> # show the data
> head(df)
  other_variable  variable
1              0  3.663562
2              0  3.663562
3              0  3.663562
4              0  3.663562
5              0 -3.663562
6              1 -3.663562
> 
> # select 25 random rows
> set.seed(1)
> var1 <- df[runif(25,0,nrow(df)),]$variable
> 
> # display the data
> var1
 [1] -3.6635616  3.6635616  3.6635616  3.6635616 -3.6635616 -0.8001193
 [7]  3.6635616  3.6635616  3.6635616  3.6635616 -3.6635616  3.6635616
[13]  3.6635616  3.6635616  3.6635616  3.6635616  3.6635616  3.6635616
[19]  3.6635616  3.6635616  3.6635616  3.6635616  3.6635616 -1.2950457
[25] -3.6635616
> 
> # histogram of var1 doesn't include values = 3.6635616
> ggplot(data=NULL, aes(x=var1)) + geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The histogram produced by the last line seems to exclude the values of var1 that fall in the highest bin:

[image: histogram of var1]

Strangely, it is hard to reproduce. If I build the variable by hand (var2 below), the correct bars are displayed, but I suspect it has something to do with significant digits:

> # make a new vector with the same data
> var2 <- c(
+ -3.6635616, 3.6635616, 3.6635616, 3.6635616, -3.6635616, -0.8001193, 
+  3.6635616, 3.6635616, 3.6635616, 3.6635616, -3.6635616, 3.6635616, 
+  3.6635616, 3.6635616, 3.6635616, 3.6635616, 3.6635616, 3.6635616, 
+  3.6635616, 3.6635616, 3.6635616, 3.6635616, 3.6635616, -1.2950457, 
+ -3.6635616
+ )
> 
> # confirm that they're equal
> all.equal(var1, var2)
[1] TRUE
> 
> # something suspicious
> var1[1]==var2[1]
[1] FALSE
> 
> # histogram of var2 does include values = 3.6635616
> ggplot(data=NULL, aes(x=var2)) + geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Here is the corresponding (correct) histogram:

[image: histogram of var2]

It also seems to be related to the number of bins. If I tinker with them, I can get the bar to show up:

> # if I mess with the bin number I can get it to show up
> ggplot(data=NULL, aes(x=var1)) + geom_histogram(bins=40) # no 
> ggplot(data=NULL, aes(x=var1)) + geom_histogram(bins=41) # yes

What is going on?

Edit

Adding more information to try to make this reproducible.

> dput(var1)
c(-3.66356164612965, 3.66356164612965, 3.66356164612965, 3.66356164612965, 
-3.66356164612965, -0.800119300112113, 3.66356164612965, 3.66356164612965, 
3.66356164612965, 3.66356164612965, -3.66356164612965, 3.66356164612965, 
3.66356164612965, 3.66356164612965, 3.66356164612965, 3.66356164612965, 
3.66356164612965, 3.66356164612965, 3.66356164612965, 3.66356164612965, 
3.66356164612965, 3.66356164612965, 3.66356164612965, -1.29504568965475, 
-3.66356164612965)
> sprintf("%a",var1)
 [1] "-0x1.d4ef968880dd4p+1" "0x1.d4ef968880dd4p+1"  "0x1.d4ef968880dd4p+1"
 [4] "0x1.d4ef968880dd4p+1"  "-0x1.d4ef968880dd4p+1" "-0x1.99a93ca5c286dp-1"
 [7] "0x1.d4ef968880dd4p+1"  "0x1.d4ef968880dd4p+1"  "0x1.d4ef968880dd4p+1"
[10] "0x1.d4ef968880dd4p+1"  "-0x1.d4ef968880dd4p+1" "0x1.d4ef968880dd4p+1"
[13] "0x1.d4ef968880dd4p+1"  "0x1.d4ef968880dd4p+1"  "0x1.d4ef968880dd4p+1"
[16] "0x1.d4ef968880dd4p+1"  "0x1.d4ef968880dd4p+1"  "0x1.d4ef968880dd4p+1"
[19] "0x1.d4ef968880dd4p+1"  "0x1.d4ef968880dd4p+1"  "0x1.d4ef968880dd4p+1"
[22] "0x1.d4ef968880dd4p+1"  "0x1.d4ef968880dd4p+1"  "-0x1.4b881d43e494fp+0"
[25] "-0x1.d4ef968880dd4p+1"

Interestingly, even the dput output does not reproduce the problem:

[image]
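The all.equal() versus == discrepancy in the session above can be isolated in base R. The sketch below is not a diagnosis of the missing bar. It only illustrates why a vector retyped from the 8-digit console print can pass all.equal() and still differ from the stored values. The vector names full and typed are made up for illustration; the numbers are copied from the dput(var1) output and from the printed var1 values in the question.

```r
# Full-precision values, copied from the dput(var1) output in the question
full <- c(-3.66356164612965, -0.800119300112113, -1.29504568965475)

# The same values retyped from the default console print (fewer digits)
typed <- c(-3.6635616, -0.8001193, -1.2950457)

# all.equal() compares within a tolerance (about 1.5e-8 by default),
# so small differences are ignored
within_tol <- isTRUE(all.equal(full, typed))

# `==` compares the stored doubles exactly, element by element
exact <- full == typed

print(within_tol)
print(exact)

# Printing more digits exposes what the default display hides
print(full, digits = 15)
print(abs(full - typed))
```

With these inputs the tolerance-based check should pass while the exact comparison fails, which is the same pattern as all.equal(var1, var2) returning TRUE and var1[1]==var2[1] returning FALSE. In other words, var2 is a slightly different set of numbers from var1, which would be consistent with the two histograms being binned differently.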

0 answers:

No answers