How to create a cumulative distribution table in R?

Asked: 2014-03-14 16:29:13

Tags: r

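I plotted the cumulative distribution of speed with `ecdf`, but I would also like to output the cumulative probabilities as a table, like this: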

Speed  Cumulative Probability
40  0.20
45  0.45
55  0.51
60  0.70
70  0.90
80  1.00
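Because `ecdf()` returns a function rather than a table, a table of this shape can be built by evaluating that function at whichever speeds you want to report. A minimal sketch; `speed` and `at` are hypothetical toy values, not the asker's data:

```r
# Hypothetical toy speeds, for illustration only
speed <- c(40, 45, 45, 55, 60, 60, 70, 70, 80, 80)
Fn <- ecdf(speed)                  # Fn(x) = proportion of speeds <= x

at <- c(40, 45, 55, 60, 70, 80)    # speeds at which to report the probability
cum_table <- data.frame(Speed = at, CumulativeProbability = Fn(at))
print(cum_table)
```

The reporting points in `at` do not have to be observed values; `Fn` is a step function defined for any speed.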

For my data, `ecdf` gives me the following (note that `cc` is my original data frame):

> ccf <- subset(cc, cc$svel>=55 & cc$Headway>=4)  
> cdf<-  ecdf(ccf$svel)
> cdf
Empirical CDF 
Call: ecdf(ccf$svel)
 x[1:356] =     55,  55.01,  55.02,  ...,  76.76,   76.8
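The printed object only summarizes the step function's jump points (`x[1:356]`). The stats function `knots()` returns those distinct values in ascending order, so evaluating the ECDF at its own knots yields one row per distinct speed. A sketch with made-up `svel` values standing in for `ccf$svel`:

```r
# Made-up speeds standing in for ccf$svel
svel <- c(55, 55, 55.01, 56.2, 57.3, 57.3, 57.3, 60.1)
cdf <- ecdf(svel)

v <- knots(cdf)   # sorted distinct values where the ECDF jumps
cdf_table <- data.frame(Speed = v, CumulativeProbability = cdf(v))
print(cdf_table)
```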

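How can I get a table like the one in the example above? I tried `cumsum`, but it only gives the cumulative frequency, whereas I need the cumulative probability.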
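The gap between the two is a single division: cumulative frequency divided by the number of observations is the cumulative probability. A sketch on hypothetical toy data, using `table()` for the per-value counts:

```r
speed <- c(40, 45, 45, 55, 60, 60, 70, 70, 80, 80)  # hypothetical toy data

freq <- table(speed)                  # count of each distinct speed, ascending
cum_freq <- cumsum(freq)              # cumulative frequency
cum_prob <- cum_freq / length(speed)  # cumulative probability, ends at 1

result <- data.frame(Speed = as.numeric(names(freq)),
                     CumulativeProbability = as.vector(cum_prob))
print(result)
```

These are the same numbers `ecdf(speed)` returns at each distinct speed.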

Edit

Here is my data:

  

> dput(ccf$svel)
c(67.9,67.62,67.37,67.19,67.04,66.93,66.83,66.74,66.65,   66.55,66.46,66.36,66.25,66.12,65.97,61.12,61.2,61.29,   61.39,61.49,61.58,61.66,61.73,61.79,57.98,57.73,57.5,   57.29,57.1,56.92,56.75,56.59,56.45,56.32,56.19,58,58.18,   58.36,58.52,58.69,56.28,56.19,56.08,55.96,55.83,55.68,   55.52,55.34,55.15,58.58,58.89,59.17,59.4,59.58,55.01,   55.14,55.23,55.3,55.36,55.41,55.47,55.53,55.59,55.66,   55.74,55.83,55.92,56.03,56.16,56.3,56.44,56.58,56.71,   56.82,56.91,56.98,57.03,57.06,57.07,57.07,57.06,57.04,   57.02,55.05,55.22,55.39,55.56,55.73,55.92,56.11,56.31,   56.53,56.77,57.02,57.28,57.54,57.79,58,58.18,58.32,58.43,   58.5,58.56,58.6,58.64,58.68,58.73,58.8,58.86,58.92,58.97,   59.01,59.03,59.05,59.05,59.04,59.02,58.99,58.97,58.95,   55.1,55.39,55.68,55.97,56.24,56.48,56.68,56.82,56.9,   56.94,56.96,56.97,56.99,57.02,57.07,57.14,57.22,57.3,   57.37,57.41,57.45,57.48,57.51,57.56,57.62,57.69,57.77,   57.86,57.95,58.06,58.17,58.29,58.42,58.53,58.64,58.74,   58.83,58.91,58.98,55.01,55.08,55.15,55.22,55.3,55.37,   55.45,55.53,55.62,55.73,55.85,55.99,56.14,56.31,56.49,   56.67,56.87,57.05,57.22,57.37,57.51,57.65,57.79,57.95,   58.13,58.3,58.47,58.63,58.78,58.91,59.03,59.14,59.24,   59.34,59.43,59.53,59.62,59.72,59.81,59.9,59.98,60.07,   60.15,60.22,60.31,60.39,60.47,60.56,60.65,60.75,60.86,   60.98,61.11,61.24,61.39,61.54,61.71,61.89,62.09,62.31,   62.56,62.84,63.14,63.46,63.78,64.08,64.81,64.84,64.85,   64.87,64.89,64.92,64.94,64.97,65,65.02,65.04,65.07,65.11,   65.15,65.17,65.18,65.17,65.15,65.13,65.1,65.06,65.01,   64.96,64.9,64.84,64.79,64.76,55.04,55.15,55.25,55,55.23,   55.45,55.68,55.9,56.69,56.74,55,55,55,55,55,55.01,   55.26,55.51,55.77,56.02,56.28,56.56,56.84,57.13,57.42,   57.7,57.98,58.25,58.49,58.73,58.94,59.13,59.29,59.4,   59.48,59.5,59.48,59.42,59.31,59.17,59,58.8,58.6,58.38,   58.17,57.96,57.77,57.59,57.44,57.31,57.21,57.13,57.07,   57.04,57.03,57.04,57.07,57.11,57.18,57.26,57.34,57.43,
57.51,57.59,57.68,57.78,57.88,57.99,58.08,58.16,58.22,   58.27,58.3,58.31,58.31,58.3,58.27,58.25,58.22,58.18,   58.14,58.08,58.01,57.93,57.84,57.72,57.59,57.43,57.27,   57.1,56.93,56.77,56.63,56.5,56.38,56.28,56.19,56.12,   56.05,55.99,55.94,55.9,55.88,55.86,55.85,55.86,55.87,   55.89,55.9,55.91,55.91,55.88,55.84,55.78,55.71,55.63,   55.56,55.5,55.45,55.4,55.37,55.34,55.32,55.3,55.29,55.27,   55.26,55.26,55.25,55.25,55.26,55.26,55.27,55.28,55.29,   55.31,55.33,55.36,55.39,55.02,55.07,55.12,55.16,55.21,   55.26,55.31,55.04,55.21,55.38,55.54,55.71,55.88,56.05,   56.21,56.38,56.54,56.71,56.88,57.04,57.2,57.35,55.46,   55.59,55.74,55.92,56.11,56.32,56.54,56.77,57.02,57.28,   55.22,55.28,55.35,55.42,55.5,55.58,55.68,55.78,55.88,   56,55.15,55.45,55.72,55.94,56.11,56.22,56.29,56.33,56.36,   56.4,56.45,56.51,56.59,56.69,56.81,56.95,57.11,57.27,   57.44,57.61,57.78,57.95,58.12,58.29,58.46,58.63,58.79,   58.94,59.08,59.21,59.32,59.41,55.13,55.3,55.47,55.65,   55.83,56.02,56.22,56.43,56.66,56.9,55.17,56.02,56.11,   56.21,56.32,56.42,56.52,57.18,57.29,57.42,76.27,76.28,   76.3,76.33,76.37,76.41,76.47,76.54,76.62,76.7,76.76,   76.8,76.8,55.08,55.16,55.24,55.32,55.4,55.48,55.12,55.39,   55.67,55.94,56.21,56.47,56.72,56.97,57.19,57.4,57.58,   57.73,57.87,57.99,58.11)

1 Answer

Answer 0 (score: 1)

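Here is a function that does this. Note that it compares with `y < x`, so each value is paired with the proportion of observations strictly below it: the smallest value gets 0 and the largest never reaches 1. Use `<=` instead if you want the same probabilities that `ecdf` reports.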

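cumprob <- function(y) {
  # Proportion of observations strictly below x, i.e. P(Y < x)
  fun <- function(y, x) length(y[y < x]) / length(y)
  # Evaluate it at every observation
  prob <- sapply(y, fun, y = y)
  # Distinct values and distinct probabilities, both ascending; they pair up
  # row by row because prob increases strictly with the value
  data.frame(value = unique(y[order(y)]), prob = unique(prob[order(prob)]))
}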
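For the `<=` version, it is also enough to loop over the distinct values rather than over every observation. A sketch of that variant; `cumprob_le` is a hypothetical name, and the toy data stands in for the real speeds:

```r
# Hypothetical variant of cumprob(): P(Y <= x) for each distinct value x
cumprob_le <- function(y) {
  value <- sort(unique(y))
  # Share of observations less than or equal to each distinct value
  prob <- vapply(value, function(v) mean(y <= v), numeric(1))
  data.frame(value = value, prob = prob)
}

speed <- c(55, 55, 55.01, 56.2, 57.3, 57.3, 57.3, 60.1)  # toy data
cp_le <- cumprob_le(speed)
print(cp_le)

# Should agree with the ECDF evaluated at the same points
stopifnot(isTRUE(all.equal(cp_le$prob, ecdf(speed)(cp_le$value))))
```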

Testing it on your data (here called `data`):

cp<-cumprob(data)
head(cp)
  value       prob
1 55.00 0.00000000
2 55.01 0.01156069
3 55.02 0.01734104
4 55.04 0.01926782
5 55.05 0.02312139
6 55.07 0.02504817

Plot it:

plot(cp)


Another very handy shortcut I found is to let `hist` cut the data into bins automatically and give you the bin midpoints.

Using your data as `data`:

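h <- hist(data)  # draws the histogram; add plot = FALSE to skip the plot
# Running share of observations up to and including each bin. value is the
# bin midpoint, but each prob counts everything up to the bin's upper break.
cum.prob <- data.frame(value=h$mids, prob=cumsum(h$counts)/sum(h$counts))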

This gives you:

> cum.prob
   value      prob
1     55 0.2793834
2     57 0.6319846
3     59 0.8285164
4     61 0.8786127
5     63 0.8921002
6     65 0.9479769
7     67 0.9749518
8     69 0.9749518
9     71 0.9749518
10    73 0.9749518
11    75 0.9749518
12    77 1.0000000
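Because `hist()` bins are right-closed by default, each cumulative share counts every observation up to that bin's upper break. Labelling the rows with `h$breaks[-1]` instead of `h$mids` therefore lines them up with what `ecdf()` reports at those speeds. A sketch on toy data, with `plot = FALSE` so no histogram is drawn:

```r
speed <- c(55.2, 55.9, 56.4, 57.1, 57.8, 58.3, 61.0, 64.9, 76.3, 76.8)  # toy data

h <- hist(speed, plot = FALSE)   # default Sturges breaks, right-closed bins
cum_hist <- data.frame(upper = h$breaks[-1],
                       prob  = cumsum(h$counts) / sum(h$counts))
print(cum_hist)

# Each cumulative share equals the ECDF at the bin's upper break
stopifnot(isTRUE(all.equal(cum_hist$prob, ecdf(speed)(cum_hist$upper))))
```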