How to extract and sum specific numbers from a vector in R

Asked: 2014-05-23 15:41:33

Tags: r for-loop

I have a vector of 66 numbers, called a, and each element has a name:

    Age0_i0     Age1_i0     Age1_i1     Age2_i0     Age2_i1     Age2_i2     Age3_i0         Age3_i1     Age3_i2 
1000000.000  680000.000  170000.000  462400.000  115600.000  144500.000  314432.000       78608.000   98260.000 
    Age3_i3     Age4_i0     Age4_i1     Age4_i2     Age4_i3     Age4_i4     Age5_i0       Age5_i1     Age5_i2 
 122825.000  213813.760   53453.440   66816.800   83521.000  104401.250  145393.357      36348.339   45435.424 
    Age5_i3     Age5_i4     Age5_i5     Age6_i0     Age6_i1     Age6_i2     Age6_i3        Age6_i4     Age6_i5 
  56794.280   70992.850   88741.062   98867.483   24716.871   30896.088   38620.110    48275.138   60343.922 
    Age6_i6     Age7_i0     Age7_i1     Age7_i2     Age7_i3     Age7_i4     Age7_i5      Age7_i6     Age7_i7 
  75429.903   67229.888   16807.472   21009.340   26261.675   32827.094   41033.867   51292.334   64115.418 
    Age8_i0     Age8_i1     Age8_i2     Age8_i3     Age8_i4     Age8_i5     Age8_i6     Age8_i7     Age8_i8 
  45716.324   11429.081   14286.351   17857.939   22322.424   27903.030   34878.787   43598.484   54498.105 
    Age9_i0     Age9_i1     Age9_i2     Age9_i3     Age9_i4     Age9_i5     Age9_i6     Age9_i7     Age9_i8 
  31087.100    7771.775    9714.719   12143.399   15179.248   18974.060   23717.575   29646.969   37058.711 
    Age9_i9    Age10_i0    Age10_i1    Age10_i2    Age10_i3    Age10_i4    Age10_i5    Age10_i6    Age10_i7 
  46323.389   21139.228    5284.807    6606.009    8257.511   10321.889   12902.361   16127.951   20159.939 
   Age10_i8    Age10_i9   Age10_i10 
  25199.924   31499.905   39374.881 

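I want to generate a list of sums over some of these elements. Specifically, I want to add up all the Age3 values from Age3_i3, Age3_i4, ... up to Age3_i10. Then all the Age4 values from _i3 to _i10, the Age5 values from _i3 to _i10, and so on up to Age10 from _i3 to _i10. I would like to do this in a loop like this: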

x <- 10

for (i in 3:x) {
  for (j in 3:i) {
    s <- sum(a[paste0("Age", i, "_i", j)])
  }
}


But it only gives me a[66], the last value of a. Ideally, it would give me a list of 8 totals.

Any help is appreciated!

EDIT

Adding the dput output of the data:

structure(c(1e+06, 680000, 170000, 462400, 115600, 144500, 314432, 
78608, 98260, 122825, 213813.76, 53453.44, 66816.8, 83521, 104401.25, 
145393.357, 36348.339, 45435.424, 56794.28, 70992.85, 88741.062, 
98867.483, 24716.871, 30896.088, 38620.11, 48275.138, 60343.922, 
75429.903, 67229.888, 16807.472, 21009.34, 26261.675, 32827.094, 
41033.867, 51292.334, 64115.418, 45716.324, 11429.081, 14286.351, 
17857.939, 22322.424, 27903.03, 34878.787, 43598.484, 54498.105, 
31087.1, 7771.775, 9714.719, 12143.399, 15179.248, 18974.06, 
23717.575, 29646.969, 37058.711, 46323.389, 21139.228, 5284.807, 
6606.009, 8257.511, 10321.889, 12902.361, 16127.951, 20159.939, 
25199.924, 31499.905, 39374.881), .Names = c("Age0_i0", "Age1_i0", 
"Age1_i1", "Age2_i0", "Age2_i1", "Age2_i2", "Age3_i0", "Age3_i1", 
"Age3_i2", "Age3_i3", "Age4_i0", "Age4_i1", "Age4_i2", "Age4_i3", 
"Age4_i4", "Age5_i0", "Age5_i1", "Age5_i2", "Age5_i3", "Age5_i4", 
"Age5_i5", "Age6_i0", "Age6_i1", "Age6_i2", "Age6_i3", "Age6_i4", 
"Age6_i5", "Age6_i6", "Age7_i0", "Age7_i1", "Age7_i2", "Age7_i3", 
"Age7_i4", "Age7_i5", "Age7_i6", "Age7_i7", "Age8_i0", "Age8_i1", 
"Age8_i2", "Age8_i3", "Age8_i4", "Age8_i5", "Age8_i6", "Age8_i7", 
"Age8_i8", "Age9_i0", "Age9_i1", "Age9_i2", "Age9_i3", "Age9_i4", 
"Age9_i5", "Age9_i6", "Age9_i7", "Age9_i8", "Age9_i9", "Age10_i0", 
"Age10_i1", "Age10_i2", "Age10_i3", "Age10_i4", "Age10_i5", "Age10_i6", 
"Age10_i7", "Age10_i8", "Age10_i9", "Age10_i10"))

2 answers:

Answer 0 (score: 1)

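I think you can do this in a single loop with a clever grep:

# y is the named vector from the question (called a there)
nn <- names(y)
sapply(c(3, 4, 5), function(i)
    sum(y[grep(paste0('Age', i, '_i10|Age', i, '_i', '[3-9]'), nn)]))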

[1] 122825.0 187922.2 216528.2
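
To make the pattern above easier to follow, here is a minimal sketch of what it matches for a single age. The vector y below is only a toy subset of the question's data (ages 3 to 5), and the names pat, matched and total are introduced purely for illustration.

```r
# Toy subset of the question's vector: ages 3 to 5 only (an assumption
# made to keep the example short; the real vector has 66 elements).
y <- c(Age3_i0 = 314432, Age3_i1 = 78608, Age3_i2 = 98260, Age3_i3 = 122825,
       Age4_i0 = 213813.76, Age4_i1 = 53453.44, Age4_i2 = 66816.8,
       Age4_i3 = 83521, Age4_i4 = 104401.25,
       Age5_i0 = 145393.357, Age5_i1 = 36348.339, Age5_i2 = 45435.424,
       Age5_i3 = 56794.28, Age5_i4 = 70992.85, Age5_i5 = 88741.062)

nn <- names(y)

# The pattern built for age 4 is "Age4_i10|Age4_i[3-9]":
# either the literal id 10, or a single digit from 3 to 9.
pat <- paste0("Age", 4, "_i10|Age", 4, "_i", "[3-9]")

# value = TRUE returns the matching names instead of their positions
matched <- grep(pat, nn, value = TRUE)
print(matched)

# Subsetting by those names and summing gives the total for age 4
total <- sum(y[matched])
print(total)
```

The pattern is not anchored, so it would also match a hypothetical name such as Age4_i30. No such name exists in the question's data, but an anchored form like "^Age4_i([3-9]|10)$" would be a stricter alternative.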

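Edit

This solution works for any range (min, max). It generates the full sequence of ids and uses the na.rm argument to drop the missing values. It is less efficient (it generates more names than needed), but it always works and does not use a regular expression.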

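# y is the named vector from the question (called a there).
# Names that do not exist in y index to NA, which na.rm = TRUE drops.
sum_filter <- function(min = 3, max = 10)
  sapply(c(3, 4, 5), function(i)
    sum(y[paste0('Age', i, '_i', seq(min, max))], na.rm = TRUE))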
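
For comparison with the loop in the question: that loop assigns to s on every pass, so only the final assignment (i = 10, j = 10, i.e. Age10_i10, the last element) survives. A minimal sketch of a loop that keeps one total per age might look like the following. It again uses a toy subset of the data (ages 3 to 5), and the names ages, totals and wanted are illustrative, not from the original post.

```r
# Toy subset of the question's vector: ages 3 to 5 only (an assumption
# made to keep the example short).
y <- c(Age3_i0 = 314432, Age3_i1 = 78608, Age3_i2 = 98260, Age3_i3 = 122825,
       Age4_i0 = 213813.76, Age4_i1 = 53453.44, Age4_i2 = 66816.8,
       Age4_i3 = 83521, Age4_i4 = 104401.25,
       Age5_i0 = 145393.357, Age5_i1 = 36348.339, Age5_i2 = 45435.424,
       Age5_i3 = 56794.28, Age5_i4 = 70992.85, Age5_i5 = 88741.062)

ages <- 3:5

# Preallocate one slot per age so each pass stores its own result
totals <- setNames(numeric(length(ages)), paste0("Age", ages))

for (k in seq_along(ages)) {
  i <- ages[k]
  # In this data the ids only run up to the age itself, so 3:i
  # names exactly the elements that exist for age i.
  wanted <- paste0("Age", i, "_i", 3:i)
  totals[k] <- sum(y[wanted])
}

print(totals)
```

Building all the names for one age at once and summing them removes the need for the inner loop over j. With the full vector, setting ages to 3:10 should give the eight totals the question asks for.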

Answer 1 (score: 1)

Construct the names you want, then subset by them:

# y is the named vector from the question (called a there)
nm = expand.grid(age = 3:5, id = 3:10)
sum(y[paste0('Age', nm$age, '_i', nm$id)], na.rm = TRUE)
#[1] 527275.4

If you want these sums for each age group, I would instead do

library(data.table)

nm = CJ(age = 3:5, id = 3:10)
nm[, sum(y[paste0('Age', age, '_i', id)], na.rm = TRUE), by = age]
#   age       V1
#1:   3 122825.0
#2:   4 187922.2
#3:   5 216528.2
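
If adding a data.table dependency is not desirable, a roughly equivalent per-age sum can be sketched in base R by parsing the age and id out of the names and grouping with tapply. This is only one possible approach, shown on a toy subset of the data (ages 3 to 5). The names age, id, keep and by_age are illustrative.

```r
# Toy subset of the question's vector: ages 3 to 5 only (an assumption
# made to keep the example short).
y <- c(Age3_i0 = 314432, Age3_i1 = 78608, Age3_i2 = 98260, Age3_i3 = 122825,
       Age4_i0 = 213813.76, Age4_i1 = 53453.44, Age4_i2 = 66816.8,
       Age4_i3 = 83521, Age4_i4 = 104401.25,
       Age5_i0 = 145393.357, Age5_i1 = 36348.339, Age5_i2 = 45435.424,
       Age5_i3 = 56794.28, Age5_i4 = 70992.85, Age5_i5 = 88741.062)

# Split each name "Age<age>_i<id>" into its two integer parts
age <- as.integer(sub("^Age([0-9]+)_i([0-9]+)$", "\\1", names(y)))
id  <- as.integer(sub("^Age([0-9]+)_i([0-9]+)$", "\\2", names(y)))

# Keep only the elements whose id lies in the requested range
keep <- id >= 3 & id <= 10

# Sum the kept values within each age group
by_age <- tapply(y[keep], age[keep], sum)
print(by_age)
```

Because this filters on elements that actually exist rather than constructing candidate names, no NA values arise and na.rm is not needed.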