R:两个条件的列求和

时间:2018-09-09 14:01:28

标签: r apply netcdf sumifs

我有两个数据表。第一个表是具有坐标和降水量的矩阵。它由四列组成,分别是纬度,经度,降水量和监测日期。该表的示例为:

latitude_1 longitude_1 precipitation day_mon 54.17 62.15 5 34 69.61 48.65 3 62 73.48 90.16 7 96 66.92 90.27 7 19 56.19 96.46 9 25 72.23 74.18 5 81 88.00 95.20 7 97 92.44 44.41 6 18 95.83 52.91 9 88 99.68 96.23 8 6 81.91 48.32 8 96 54.66 52.70 0 62 95.31 91.82 2 84 60.32 96.25 9 71 97.39 47.91 7 76 65.21 44.63 9 3 第二个表由5列组成:站号,经度,纬度,开始监视的日期,结束监视的日期。看起来像:

station latitude_2 longitude_2 day_begin day_end 15 50.00 93.22 34 46 11 86.58 85.29 15 47 14 93.17 63.17 31 97 10 88.56 61.28 15 78 13 45.29 77.10 24 79 6 69.73 99.52 13 73 4 45.60 77.36 28 95 13 92.88 62.38 9 51 1 65.10 64.13 7 69 10 60.57 86.77 34 64 3 53.62 60.76 23 96 16 87.82 59.41 38 47 1 47.83 95.89 21 52 11 75.42 46.20 38 87 3 55.71 55.26 2 73 16 71.65 96.15 36 93

我想对1个表的降水量求和。但是我有两个条件:

  1. day_begin 。监视的天数(从1个表开始的day_mon)应小于结束日期且大于开始的天数(从2个表开始)

  2. 从最接近的点开始的总降水量。监视点之间的距离(坐标包括 经度_1和纬度_1)和测站(坐标由 经度_2和纬度_2)应最小。距离由以下公式计算: R = 6400*arccos(sin(latitude_1)*sin(latitude_2)+cos(latitude_1)*cos(latitude_2))*cos(longitude_1-longitude_2))

最后我想得到结果如表: station latitude_2 longitude_2 day_begin day_end Sum 15 50 93.22 34 46 188 11 86.58 85.29 15 47 100 14 93.17 63.17 31 97 116 10 88.56 61.28 15 78 182 13 45.29 77.1 24 79 136 6 69.73 99.52 13 73 126 4 45.6 77.36 28 95 108 13 92.88 62.38 9 51 192 1 65.1 64.13 7 69 125 10 60.57 86.77 34 64 172 3 53.62 60.76 23 96 193 16 87.82 59.41 38 47 183 1 47.83 95.89 21 52 104 11 75.42 46.2 38 87 151 3 55.71 55.26 2 73 111 16 71.65 96.15 36 93 146 我知道如何用C ++计算它。我应该在R中使用什么功能? 谢谢您的帮助!

1 个答案:

答案 0 :(得分:0)

我不确定我是否能正确解决您的问题...但是到了。

我使用了data.table方法。

library( tidyverse )
library( data.table )


#step 1. join days as periods
#create a dummy variables to create a virtual period in dt1
dt1[, point_id := .I]
dt1[, day_begin := day_mon]
dt1[, day_end := day_mon]
setkey(dt2, day_begin, day_end)
#overlap join finding all stations for each point that overlap periods
dt <- foverlaps( dt1, dt2, type = "within" )

#step 2. calculate the distance station for each point based on TS-privided formula
dt[, distance := 6400 * acos( sin( latitude_1 ) * sin( latitude_2 ) + cos( latitude_1 ) * cos( latitude_2 ) ) * cos( longitude_1 - longitude_2 ) ]

#step 3. filter (absolute) minimal distance based on point_id
dt[ , .SD[which.min( abs( distance ) )], by = point_id ]

#    point_id station latitude_2 longitude_2 day_begin day_end latitude_1 longitude_1 precipitation day_mon i.day_begin i.day_end    distance
# 1:        1       1      47.83       95.89        21      52      54.17       62.15             5      34          34        34  -248.72398
# 2:        2       6      69.73       99.52        13      73      69.61       48.65             3      62          62        62   631.89228
# 3:        3      14      93.17       63.17        31      97      73.48       90.16             7      96          96        96 -1519.84886
# 4:        4      11      86.58       85.29        15      47      66.92       90.27             7      19          19        19  1371.54757
# 5:        5      11      86.58       85.29        15      47      56.19       96.46             9      25          25        25  1139.46849
# 6:        6      14      93.17       63.17        31      97      72.23       74.18             5      81          81        81   192.99264
# 7:        7      14      93.17       63.17        31      97      88.00       95.20             7      97          97        97  5822.81529
# 8:        8       3      55.71       55.26         2      73      92.44       44.41             6      18          18        18  -899.71206
# 9:        9       3      53.62       60.76        23      96      95.83       52.91             9      88          88        88    45.16237
# 10:       10       3      55.71       55.26         2      73      99.68       96.23             8       6           6         6   -78.04484
# 11:       11      14      93.17       63.17        31      97      81.91       48.32             8      96          96        96 -5467.77459
# 12:       12       3      53.62       60.76        23      96      54.66       52.70             0      62          62        62 -1361.57863
# 13:       13      11      75.42       46.20        38      87      95.31       91.82             2      84          84        84  -445.18765
# 14:       14      14      93.17       63.17        31      97      60.32       96.25             9      71          71        71  -854.86321
# 15:       15       3      53.62       60.76        23      96      97.39       47.91             7      76          76        76  1304.41634
# 16:       16       3      55.71       55.26         2      73      65.21       44.63             9       3           3         3 -7015.57516

样本数据

dt1 <- read.table( text = "latitude_1     longitude_1 precipitation    day_mon
54.17   62.15   5   34
69.61   48.65   3   62
73.48   90.16   7   96
66.92   90.27   7   19
56.19   96.46   9   25
72.23   74.18   5   81
88.00   95.20   7   97
92.44   44.41   6   18
95.83   52.91   9   88
99.68   96.23   8   6
81.91   48.32   8   96
54.66   52.70   0   62
95.31   91.82   2   84
60.32   96.25   9   71
97.39   47.91   7   76
65.21   44.63   9   3", header = TRUE ) %>% 
  setDT()

dt2 <- read.table( text = "station    latitude_2  longitude_2  day_begin   day_end 
15  50.00   93.22   34  46
                   11  86.58   85.29   15  47
                   14  93.17   63.17   31  97
                   10  88.56   61.28   15  78
                   13  45.29   77.10   24  79
                   6   69.73   99.52   13  73
                   4   45.60   77.36   28  95
                   13  92.88   62.38   9   51
                   1   65.10   64.13   7   69
                   10  60.57   86.77   34  64
                   3   53.62   60.76   23  96
                   16  87.82   59.41   38  47
                   1   47.83   95.89   21  52
                   11  75.42   46.20   38  87
                   3   55.71   55.26   2   73
                   16  71.65   96.15   36  93", header = TRUE ) %>% 
  setDT()