Include a column after grouping with data.table

Time: 2015-06-17 19:00:32

Tags: r include data.table subset

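My goal is to compute, within each zip, the percentage of rows that fall in each group. I have created the percentage column by zip, but I keep losing my group variable ('cgrp'). How can I keep it in the final result?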

My data.table script gives the following result:

     zip       V1  
1: 12007 19.35484  
2: 12007 48.38710  
3: 12007 32.25806  
4: 12008 40.00000  
5: 12008 41.66667  
6: 12008 18.33333 

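But I would also like the cgrp column included. I have been trying different combinations of .SD and .SDcols but cannot get it to work. This is what I want: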

     zip       V1 cgrp  
1: 12007 19.35484 3  
2: 12007 48.38710 4  
3: 12007 32.25806 1  
4: 12008 40.00000 1  
5: 12008 41.66667 4  
6: 12008 18.33333 3 

Script:

zip.grp <- ninefive[, .(zgrp = .N), by = .(cgrp,zip)  
                    ][, 100 *(zgrp/sum(zgrp)), by = zip]

Sample ninefive data:

      zip     lower avg    upper SSN RISK idk diff  avgDiff cgrp
 1: 12007 -170.3723 592 1354.372 127  676   1   84 137.2903    3  
 2: 12007 -170.3723 592 1354.372 064  828   1  236 137.2903    4  
 3: 12007 -170.3723 592 1354.372 080  627   1   35 137.2903    1  
 4: 12007 -170.3723 592 1354.372 057  770   1  178 137.2903    4  
 5: 12007 -170.3723 592 1354.372 014  770   1  178 137.2903    4  
 6: 12007 -170.3723 592 1354.372 084  893   1  301 137.2903    4  
 7: 12007 -170.3723 592 1354.372 105  757   1  165 137.2903    4  
 8: 12007 -170.3723 592 1354.372 093  494   1   98 137.2903    1  
 9: 12007 -170.3723 592 1354.372 080  744   1  152 137.2903    4  
10: 12007 -170.3723 592 1354.372 102  494   1   98 137.2903    1  
11: 12007 -170.3723 592 1354.372 062  748   1  156 137.2903    4  
12: 12007 -170.3723 592 1354.372 729  711   1  119 137.2903    3  
13: 12007 -170.3723 592 1354.372 059  677   1   85 137.2903    3  
14: 12007 -170.3723 592 1354.372 090  718   1  126 137.2903    3  
15: 12007 -170.3723 592 1354.372 053  636   1   44 137.2903    1  
16: 12007 -170.3723 592 1354.372 081  855   1  263 137.2903    4  
17: 12007 -170.3723 592 1354.372 073  811   1  219 137.2903    4  
18: 12007 -170.3723 592 1354.372 092  614   1   22 137.2903    1  
19: 12007 -170.3723 592 1354.372 081  789   1  197 137.2903    4  
20: 12007 -170.3723 592 1354.372 105  831   1  239 137.2903    4  
21: 12007 -170.3723 592 1354.372 108  809   1  217 137.2903    4  
22: 12007 -170.3723 592 1354.372 093  649   1   57 137.2903    1  
23: 12007 -170.3723 592 1354.372 128  685   1   93 137.2903    3  
24: 12007 -170.3723 592 1354.372 093  574   1   18 137.2903    1  
25: 12007 -170.3723 592 1354.372 119  640   1   48 137.2903    1  
26: 12007 -170.3723 592 1354.372 163  813   1  221 137.2903    4  
27: 12007 -170.3723 592 1354.372 062  678   1   86 137.2903    3  
28: 12007 -170.3723 592 1354.372 102  652   1   60 137.2903    1  
29: 12007 -170.3723 592 1354.372 379  532   1   60 137.2903    1  
30: 12007 -170.3723 592 1354.372 107  803   1  211 137.2903    4  
31: 12007 -170.3723 592 1354.372 060  782   1  190 137.2903    4  
32: 12008 -262.0840 729 1720.084 110  547   1  182 104.8667    1  
33: 12008 -262.0840 729 1720.084 023  821   1   92 104.8667    4  
34: 12008 -262.0840 729 1720.084 072  649   1   80 104.8667    1  
35: 12008 -262.0840 729 1720.084 119  602   1  127 104.8667    1  
36: 12008 -262.0840 729 1720.084 076  553   1  176 104.8667    1  
37: 12008 -262.0840 729 1720.084 083  606   1  123 104.8667    1  
38: 12008 -262.0840 729 1720.084 124  645   1   84 104.8667    1  
39: 12008 -262.0840 729 1720.084 086  700   1   29 104.8667    3  
40: 12008 -262.0840 729 1720.084 063  579   1  150 104.8667    1  
41: 12008 -262.0840 729 1720.084 086  746   1   17 104.8667    4  
42: 12008 -262.0840 729 1720.084 075  732   1    3 104.8667    4  
43: 12008 -262.0840 729 1720.084 082  656   1   73 104.8667    1  
44: 12008 -262.0840 729 1720.084 057  515   1  214 104.8667    1  
45: 12008 -262.0840 729 1720.084 068  806   1   77 104.8667    4  
46: 12008 -262.0840 729 1720.084 103  797   1   68 104.8667    4  
47: 12008 -262.0840 729 1720.084 110  578   1  151 104.8667    1  
48: 12008 -262.0840 729 1720.084 102  709   1   20 104.8667    3  
49: 12008 -262.0840 729 1720.084 565  567   1  162 104.8667    1  
50: 12008 -262.0840 729 1720.084 037  886   1  157 104.8667    4  

1 answer:

Answer 0: (score: 5)

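You can use := to create the new column by reference. Because := adds a column instead of returning a new aggregated result, the existing cgrp column is kept: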

# the trailing [] prints the result, because := returns invisibly
ninefive[, .(zgrp = .N), by = .(cgrp, zip)
         ][, V1 := 100 * (zgrp / sum(zgrp)), by = zip
         ][, zgrp := NULL][]
#    cgrp   zip       V1
#1:    3 12007 19.35484
#2:    4 12007 48.38710
#3:    1 12007 32.25806
#4:    1 12008 57.89474
#5:    4 12008 31.57895
#6:    3 12008 10.52632
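
The chain above can be unpacked into separate steps. The following is a minimal sketch, not the original data: `toy` is a hypothetical stand-in for `ninefive` that keeps only `zip` and `cgrp`, built with the same group counts as the 50 sample rows in the question.

```r
library(data.table)

# Hypothetical stand-in for `ninefive`: zip and cgrp only, with the same
# per-group row counts as the 50 sample rows (31 rows for 12007, 19 for 12008).
toy <- data.table(
  zip  = rep(c(12007L, 12008L), times = c(31L, 19L)),
  cgrp = c(rep(c(3L, 4L, 1L), times = c(6L, 15L, 10L)),
           rep(c(1L, 4L, 3L), times = c(11L, 6L, 2L)))
)

# Step 1: count rows for each (cgrp, zip) pair.
counts <- toy[, .(zgrp = .N), by = .(cgrp, zip)]

# Step 2: add V1 by reference within each zip. No rows are collapsed,
# so every existing column, including cgrp, stays in the table.
counts[, V1 := 100 * zgrp / sum(zgrp), by = zip]

# Step 3: drop the helper count column.
counts[, zgrp := NULL]

print(counts)
```

Within each zip the V1 values sum to 100, which is a quick sanity check for this kind of within-group percentage.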

Or, as @Frank commented, you can include cgrp in the list returned by j:

ninefive[, .(zgrp = .N), by = .(cgrp, zip)
         ][, list(cgrp, V1 = 100 * (zgrp / sum(zgrp))), by = zip]
#      zip cgrp       V1
#1: 12007    3 19.35484
#2: 12007    4 48.38710
#3: 12007    1 32.25806
#4: 12008    1 57.89474
#5: 12008    4 31.57895
#6: 12008    3 10.52632
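
To see why the script in the question drops cgrp, it may help to compare the two forms of j side by side. This is a small sketch on a hypothetical `toy` table (zip and cgrp only, with the same group counts as the sample rows), not the original `ninefive` data.

```r
library(data.table)

# Hypothetical stand-in for `ninefive`, reduced to the two columns that matter.
toy <- data.table(
  zip  = rep(c(12007L, 12008L), times = c(31L, 19L)),
  cgrp = c(rep(c(3L, 4L, 1L), times = c(6L, 15L, 10L)),
           rep(c(1L, 4L, 3L), times = c(11L, 6L, 2L)))
)

counts <- toy[, .(zgrp = .N), by = .(cgrp, zip)]

# j returns only the computed vector, so the result holds the `by` column
# plus an auto-named V1; cgrp is not part of the output.
lost <- counts[, 100 * zgrp / sum(zgrp), by = zip]

# Listing cgrp in j carries it through alongside the computed column.
kept <- counts[, .(cgrp, V1 = 100 * zgrp / sum(zgrp)), by = zip]

print(names(lost))  # "zip" "V1"
print(names(kept))  # "zip" "cgrp" "V1"
```

Both results contain the same percentages; the only difference is whether cgrp is named in j. Unlike :=, this form builds a new table, so `counts` itself is left unchanged.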