Cumulative count of a character vector

Date: 2018-01-31 12:20:12

Tags: r dplyr

I want a running count of the distinct country names in a data frame:

df <- data.frame(country = c("Sweden", "Germany", "Sweden", "Sweden", "Germany",
                             "Vietnam"), year= c(1834, 1846, 1847, 1852, 1860, 1865))

I have tried different combinations of count(), cumsum() and tally(), but I can't get it to work.

The output should look like this:

country year n
Sweden  1834 1
Germany 1846 2
Sweden  1847 2
Sweden  1852 2
Germany 1860 2
Vietnam 1865 3
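To pin down what `n` means here: it is the number of distinct countries seen in the rows up to and including the current row. A minimal base R sketch of that definition follows; the explicit loop and the variable names `seen` and `n` are illustrative only and not taken from the question.

```r
# Data from the question; stringsAsFactors = FALSE keeps country as character
df <- data.frame(country = c("Sweden", "Germany", "Sweden", "Sweden", "Germany",
                             "Vietnam"),
                 year = c(1834, 1846, 1847, 1852, 1860, 1865),
                 stringsAsFactors = FALSE)

seen <- character(0)        # hypothetical helper: countries encountered so far
n <- integer(nrow(df))      # running count of distinct countries
for (i in seq_len(nrow(df))) {
  seen <- union(seen, df$country[i])  # union() only adds the country if it is new
  n[i] <- length(seen)
}
df$n <- n
df
```

The loop is easy to read but slow on large data; the answers below compute the same column without an explicit loop.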

2 answers:

Answer 0 (score: 0)

You could try this:

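library(plyr)

# Note: this df has an extra "Germany", 1860 row compared with the question
df <- data.frame(country = c("Sweden", "Germany", "Sweden", "Sweden", "Germany", "Vietnam", "Germany"),
                 year = c(1834, 1846, 1847, 1852, 1860, 1865, 1860))
# Count the rows in each (country, year) combination
counts <- ddply(df, .(df$country, df$year), nrow)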

The output is:

> counts
     df$country df$year V1
1    Germany    1846  1
2    Germany    1860  2
3     Sweden    1834  1
4     Sweden    1847  1
5     Sweden    1852  1
6    Vietnam    1865  1
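Since the question is tagged dplyr, here is a sketch of the same per-group tally with `dplyr::count()`, assuming the same 7-row data frame as this answer (the object name `counts` is just illustrative). It also shows why `count()` alone does not give the requested column: it returns one total per (country, year) group rather than a running count of distinct countries.

```r
library(dplyr)

# Same 7-row data frame as this answer (extra "Germany", 1860 row)
df <- data.frame(country = c("Sweden", "Germany", "Sweden", "Sweden", "Germany", "Vietnam", "Germany"),
                 year = c(1834, 1846, 1847, 1852, 1860, 1865, 1860),
                 stringsAsFactors = FALSE)

# count() tallies rows per (country, year) group, like ddply(..., nrow);
# the result is one row per group with the tally in column n
counts <- count(df, country, year)
counts
```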

Answer 1 (score: 0)

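library(dplyr)

# !duplicated() flags each country's first appearance; cumsum() turns the flags into a running count
df %>% mutate(count = cumsum(!duplicated(country))) %>% as_tibble()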
#> # A tibble: 6 x 3
#>   country  year count
#>    <fctr> <dbl> <int>
#> 1  Sweden  1834     1
#> 2 Germany  1846     2
#> 3  Sweden  1847     2
#> 4  Sweden  1852     2
#> 5 Germany  1860     2
#> 6 Vietnam  1865     3    
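The trick works in two steps, which can be checked on the plain vector from the question (the names `first_seen` and `running` are illustrative): `duplicated()` marks every repeat of a value already seen, so its negation is `TRUE` only at each country's first occurrence, and `cumsum()` treats `TRUE` as 1 and `FALSE` as 0.

```r
countries <- c("Sweden", "Germany", "Sweden", "Sweden", "Germany", "Vietnam")

first_seen <- !duplicated(countries)  # TRUE at each country's first occurrence
first_seen
#> [1]  TRUE  TRUE FALSE FALSE FALSE  TRUE

running <- cumsum(first_seen)         # adds 1 for every newly seen country
running
#> [1] 1 2 2 2 2 3
```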

Or:

dist_cum <- function(var)
  sapply(seq_along(var), function(x) length(unique(head(var, x))))

df %>% mutate(var2 = dist_cum(country))
#>   country year var2
#> 1  Sweden 1834    1
#> 2 Germany 1846    2
#> 3  Sweden 1847    2
#> 4  Sweden 1852    2
#> 5 Germany 1860    2
#> 6 Vietnam 1865    3
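Note that `dist_cum()` re-scans the first `x` elements for every position, so its cost grows roughly quadratically with the length of the vector, whereas `cumsum(!duplicated())` needs a single pass. A quick sketch checking that the two approaches agree, using a made-up random test vector:

```r
dist_cum <- function(var)
  sapply(seq_along(var), function(x) length(unique(head(var, x))))

# Hypothetical test data: 50 draws from five letters
set.seed(1)
x <- sample(letters[1:5], 50, replace = TRUE)

# Both return an integer vector: distinct values seen up to each position
stopifnot(identical(dist_cum(x), cumsum(!duplicated(x))))
```

`dist_cum()` is fine for small data frames; for large ones the `duplicated()` version should scale much better.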