geom_density returns values higher than expected for the number of observations

Date: 2018-09-18 14:03:08

Tags: r ggplot2

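I am using geom_density from ggplot2 to plot several density curves in one figure. My data frame has three variables with 100 observations each. When I plot two of the variables everything looks fine, but for the third the density exceeds 400, which I did not expect.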

Here is the code for the data:

ad <- c(-0.0132492114254477, -0.0131566406997403, -0.0124505699056991, -0.0115071942052754, -0.0137753259532595, -0.0123873418067515, -0.013484307776411, -0.0134860926609266, -0.0126213557468908, -0.0125706300396337, -0.0130604154708213, -0.0128227278939455, -0.0115426841601749, -0.0122782889162225, -0.013070774907749, -0.0119269454694547, -0.0116610578105781, -0.0121781467814678, -0.0124721634549679, -0.012449585895859, -0.0119129861965286, -0.0127578461117945, -0.0128044526445264, -0.013716807434741, -0.0112243706437065, -0.0116435861691951, -0.0114757004236708, -0.0127175755090884, -0.0116204482711493, -0.0130377477108104, -0.0137735602022686, -0.0115581604482711, -0.012729930299303, -0.0112369577695777, -0.0109428317616508, -0.0117127921279212, -0.0115321825884927, -0.0119841820418205, -0.0130280606806068, -0.0135132485991527, -0.0115461937952712, -0.0119339866065326, -0.011019811398114, -0.0129747054803881, -0.0121079158124913, -0.0128866529998634, -0.0121608692086921, -0.0114331529315293, -0.0119070302036353, -0.0119004100041, -0.0117581221812217, -0.011107114937816, -0.0131571764384311, -0.0141545086784201, -0.0100181331146644, -0.0119012190788575, -0.0115824982916497, -0.0113907448407818, -0.0133925816591499, -0.0127234057673909, -0.0131873199398661, -0.0132453409867432, -0.010473172065054, -0.0122787289872899, -0.0118153122864562, -0.0110454803881372, -0.0126237939046056, -0.012450955309553, -0.0121033155664889, -0.0115688861555282, -0.0143594615279486, -0.0119171873718737, -0.0123140139401394, -0.0131844881782151, -0.0107496569632364, -0.0126211343446768, -0.0115844608446084, -0.0114007844745114, -0.0128332786661199, -0.0128161158944922, -0.0114647013803472, -0.011756602432691, -0.0128521142544759, -0.0108213858138581, -0.0125040645073117, -0.0124875495421622, -0.0117613284132842, -0.0127021347546809, -0.0118033675003416, -0.0119659368593686, -0.0116807571409046, -0.0125886674866749, -0.0134783763837637, -0.0127761268279349, -0.0131142927429275, 
-0.0119841902419024, -0.0124082930162635, -0.0117776711767118, -0.0103475632089655, -0.0117088369550362)    

jv <- c(-0.0482115384615385,0.0157269230769231,-0.0738038461538462,0.0211679487179487,-0.0435153846153846,-0.123296153846154,-0.0276717948717949,0.0533141025641026,0.0181576923076923,0.0129294871794872,-0.0384320512820513,0.0192589743589744,-0.0173948717948718,-0.0714230769230769,-0.0332628205128205,-0.0706025641025641,0.0366705128205128,0.0291115384615385,-0.0759076923076923,0.00654615384615385,-0.00717435897435898,-0.0177871794871795,0.101819230769231,0.0550935897435897,0.0267064102564103,-0.0546858974358974,-0.0297051282051282,-0.00357179487179487,-0.0270423076923077,-0.0272679487179487,0.0187871794871795,-0.0283602564102564,-0.0277012820512821,-0.105816666666667,0.0205679487179487,-0.0592487179487179,0.0306692307692308,-0.0260294871794872,0.00484615384615385,0.00461666666666667,-0.00527307692307692,-0.0263,-0.0303576923076923,0.0370576923076923,-0.0291346153846154,-0.0259294871794872,-0.0230320512820513,-0.0300089743589744,-0.0328589743589744,0.000247435897435898,-0.0256371794871795,-0.00738333333333333,-0.00796410256410257,0.00740000000000001,0.0251282051282051,-0.0435948717948718,0.0045474358974359,-0.0328589743589744,-0.028224358974359,-0.0188525641025641,-0.0164871794871795,-0.0456153846153846,-0.0882666666666667,0.0340987179487179,-0.0272166666666667,0.0326153846153846,-0.0682730769230769,-0.0203346153846154,-0.0712448717948718,0.0139166666666667,-0.00764487179487179,0.0173282051282051,-0.0299807692307692,0.0117282051282051,0.0266089743589744,-0.0869025641025641,-0.0227051282051282,0.053675641025641,0.0453115384615385,-0.00631794871794872,-0.0243923076923077,0.000192307692307693,-0.0350705128205128,-0.0226307692307692,0.019925641025641,-0.0162,-0.00284615384615385,0.0322615384615385,-0.024424358974359,-0.0704871794871795,-0.00747564102564103,-0.0441782051282051,0.0897589743589744,-0.00944871794871795,0.0320948717948718,-0.00680512820512821,-0.0837705128205128,-0.0299435897435897,-0.0639474358974359,0.0137384615384615)

all <- c(-0.0307303749434931,0.0012851411885914,-0.0431272080297726,0.00483037725633668,-0.0286453552843221,-0.0678417478264527,-0.0205780513241029,0.019914004951588,0.00276816828040077,0.000179428569926724,-0.0257462333764363,0.00321812323251441,-0.0144687779775233,-0.0418506829196497,-0.0231667977102848,-0.0412647547860094,0.0125047275049673,0.00846669584003534,-0.0441899278813301,-0.0029517160248526,-0.00954367258544381,-0.015272512799487,0.0445073890623522,0.0206883911544244,0.00774101980635189,-0.0331647418025463,-0.0205904143143995,-0.00814468519044164,-0.0193313779817285,-0.0201528482143795,0.00250680964245544,-0.0199592084292638,-0.0202156061752925,-0.0585268122181222,0.00481255847814894,-0.0354807550383196,0.00956852409036905,-0.0190068346106538,-0.00409095341722648,-0.00444829096624301,-0.00840963535917408,-0.0191169933032663,-0.0206887518529032,0.0120414934136521,-0.0206212655985533,-0.0194080700896753,-0.0175964602453717,-0.0207210636452518,-0.0223830022813048,-0.00582648705333207,-0.0186976508342006,-0.00924522413557467,-0.0105606395012668,-0.00337725433921004,0.00755503600677036,-0.0277480454368647,-0.0035175311971069,-0.0221248595998781,-0.0208084703167544,-0.0157879849349775,-0.0148372497135228,-0.0294303628010639,-0.0493699193658603,0.010909994480714,-0.0195159894765614,0.0107849521136237,-0.0404484354138413,-0.0163927853470842,-0.0416740936806804,0.00117389025556923,-0.0110021666614102,0.00270550887816572,-0.0211473915854543,-0.000728141525005001,0.007929658697869,-0.0497618492236205,-0.0171447945248683,0.0211374282755648,0.0162391298977093,-0.00956703230622048,-0.0179285045363275,-0.00578214737019165,-0.0239613135374943,-0.0167260775223137,0.00371078825916465,-0.0143437747710811,-0.00730374112971901,0.00977970185342875,-0.0181138632373503,-0.041226558173274,-0.00957819908327283,-0.02838343630744,0.0381402989876053,-0.0111124223883264,0.00949028952597218,-0.00939465922351531,-0.0480894029183882,-0.0208606304601508,-0.0371474995532007,0.00101481229171268)

wang <- c(0.2383,-0.0022,-0.1754,0.0201,-0.2122,-0.2433,-0.0417,-0.087,-0.1733,-0.0926,0.0108,0.1159,0.0116,-0.0188,-0.0521,0.0927,-0.029,-0.1382,-0.1039,-0.1547,0.178,0.1101,0.008,-0.0127,0.0442,0.0036,0.0718,0.0529,-0.0873,-0.4223,-0.016,0.1449,0.1787,0.2187,0.132,0.0556,-0.1027,0.2228,-0.305,-0.1352,0.0763,0.0236,0.2504,-0.046,0.1139,-0.1191,0.0101,0.0876,-0.1283,0.0761,0.1044,-0.0583,0.0929,-0.0966,-0.0196,0.1311,0.0329,-0.2297,0.0595,-0.3032,-0.0741,0.2044,0.0406,0.0533,0.0826,0.0035,-0.0818,-0.0747,-0.218,-9e-04,0.0666,-0.0916,-0.0613,-0.2477,-0.0238,0.1959,-0.3,0.069)

# data frames
df <- data.frame(ad = ad, jv = jv, all = all)

wang <- data.frame(wang = wang)

When I plot df$all with the code below, everything works. The density values in the resulting plot are what I would expect given 100 observations.


ggplot() + 
   geom_density(aes(x = wang, colour = 'observed'), wang, size = 1) +
   geom_density(aes(x = jv, colour = 'expected within'), df, size = 1) +
   geom_density(aes(x = all, colour = 'expected adults'), df, size = 1) + #df$all in this line 
   geom_vline(aes(xintercept = mean(wang$wang), 
                  colour = 'observed mean')) +
   scale_colour_manual("", values = c('observed' = "dodgerblue2", 
                                      'expected within' = "darkgoldenrod2",
                                      'expected adults' = 'darkolivegreen4',
                                      'observed mean' = 'red')) + 
   scale_x_continuous(expand = c(0, 0), limits = c(-0.3, 0.2)) + 
   scale_y_continuous(expand = c(0, 0))

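However, when I use df$ad instead of df$all in the fourth line (the third geom_density call), the plot shows density values far higher than the number of observations: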


ggplot() + 
   geom_density(aes(x = wang, colour = 'observed'), wang, size = 1) +
   geom_density(aes(x = jv, colour = 'expected within'), df, size = 1) +
   geom_density(aes(x = ad, colour = 'expected adults'), df, size = 1) + #df$ad in this line 
   geom_vline(aes(xintercept = mean(wang$wang), 
                  colour = 'observed mean')) +
   scale_colour_manual("", values = c('observed' = "dodgerblue2", 
                                      'expected within' = "darkgoldenrod2",
                                      'expected adults' = 'darkolivegreen4',
                                      'observed mean' = 'red')) + 
   scale_x_continuous(expand = c(0, 0), limits = c(-0.3, 0.2)) + 
   scale_y_continuous(expand = c(0, 0))

I then plotted a histogram of df$ad together with its density (code below), and this is the result:


ggplot() + 
   geom_density(aes(x = ad, colour = 'density'), df) + 
   geom_histogram(aes(x = ad), df)

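Why do I get such high values when plotting the density of df$ad, when I only have 100 observations, as the histogram shows? And why does this not happen when plotting df$all?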

Thanks

1 answer:

Answer 0 (score: 3)

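Because geom_density plots a density estimate, whereas geom_histogram gives the count of observations falling into each bin. The y-axis of geom_density is not in units of counts; the curve is scaled so that the area under it is 1. For more on what a density estimate actually means, see "Can a probability distribution value exceeding 1 be OK?"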
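The difference between counts and densities can be checked with base R alone. The sketch below uses simulated data as a stand-in (the `rnorm` parameters and `breaks = 10` are assumptions chosen for illustration, not the asker's data) and shows that bin counts sum to the number of observations, while densities are counts divided by n times the bin width.

```r
# Simulated stand-in for a tightly clustered variable
# (assumption: normal draws, not the asker's data)
set.seed(1)
x <- rnorm(100, mean = -0.012, sd = 0.00085)

# Bin the data without drawing; breaks = 10 is only a suggestion to hist()
h <- hist(x, breaks = 10, plot = FALSE)
bin_width <- diff(h$breaks)

# Counts add up to the number of observations
total_count <- sum(h$counts)

# Densities times bin widths add up to 1
total_area <- sum(h$density * bin_width)

# Density is count / (n * bin width), so narrow bins give large values
same <- all.equal(h$density, h$counts / (length(x) * bin_width))

print(total_count)
print(total_area)
print(max(h$density))
```

Because the bins here are only a fraction of a thousandth wide, the density values come out in the hundreds even though no bin holds more than a few dozen observations.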

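Your ad variable varies much less than all: its standard deviation (sd) is 0.00085, compared with 0.019 for all. Because the area under a density curve must be 1, a curve squeezed into such a narrow range has to be very tall. For a roughly normal variable the peak is about 1/(sd * sqrt(2 * pi)), which is about 469 for sd = 0.00085 and about 21 for sd = 0.019.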
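To see how the spread drives the height of the curve, here is a minimal base R sketch. It assumes the variable is roughly normal, which is an approximation and not something the question establishes; `peak_height` is a hypothetical helper, and the simulated `narrow` vector stands in for ad.

```r
# Peak height of a normal density with a given standard deviation
# (hypothetical helper, assuming approximate normality)
peak_height <- function(sd) 1 / (sd * sqrt(2 * pi))

peak_narrow <- peak_height(0.00085)  # about 469, for the sd reported for `ad`
peak_wide   <- peak_height(0.019)    # about 21, for the sd reported for `all`

# Compare with a kernel density estimate on simulated data
set.seed(1)
narrow <- rnorm(100, mean = -0.012, sd = 0.00085)
d <- density(narrow)

kde_peak <- max(d$y)                   # same order of magnitude as peak_narrow
kde_area <- sum(d$y) * diff(d$x[1:2])  # Riemann sum of the curve, close to 1

print(c(peak_narrow = peak_narrow, peak_wide = peak_wide))
print(c(kde_peak = kde_peak, kde_area = kde_area))
```

The kernel estimate is smoothed, so its peak will not match the normal formula exactly, but both land in the hundreds for a spread this small.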

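If you want the densities of all variables on the same scale, see ?geom_density: with aes(x = ad, colour = 'expected adults', y = ..scaled..) each curve is rescaled to a maximum of 1. Either way, geom_density is displaying the data correctly, although you may want to explore whether a histogram would be a better way to show the distribution.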
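A minimal sketch of the scaled approach follows. It uses simulated data in a hypothetical `sim` data frame instead of the asker's df, and it writes the computed variable as `after_stat(scaled)`, the notation available in ggplot2 3.3.0 and later for what the answer writes as `..scaled..`.

```r
library(ggplot2)

# Simulated stand-ins with very different spreads
# (assumption: normal draws, not the asker's data)
set.seed(1)
sim <- data.frame(
  narrow = rnorm(100, mean = -0.012, sd = 0.00085),
  wide   = rnorm(100, mean = -0.015, sd = 0.019)
)

# Map y to the scaled density so every curve has a maximum of 1
p <- ggplot(sim) +
  geom_density(aes(x = narrow, y = after_stat(scaled), colour = "narrow")) +
  geom_density(aes(x = wide,   y = after_stat(scaled), colour = "wide")) +
  labs(x = "value", y = "scaled density", colour = NULL)

# Inspect the computed layer data: each curve peaks at 1
peaks <- sapply(ggplot_build(p)$data, function(layer) max(layer$y))
print(peaks)
```

Scaled curves are easier to compare by shape and location, but the y-axis no longer shows a density, so the area under each curve is no longer 1.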