Restructuring data for geographic proximity analysis in R

Time: 2014-11-18 22:30:08

Tags: r geographic-distance

I have a dataset of geographic coordinates for a set of people, like this:

Person  Latitude    Longitude
  1     46.0614     -23.9386
  2     48.1792      63.1136
  3     59.9289      66.3883
  4     42.8167      58.3167
  5     43.1167      63.25

I plan to compute geographic proximity at the dyad level using the geosphere package in R. To do that, I need to create a dataset that looks like this:

Person1 Person2 LatitudeP1  LongitudeP1 LatitudeP2  LongitudeP2
   1       2     46.0614    -23.9386     48.1792     63.1136
   1       3     46.0614    -23.9386     59.9289     66.3883
   1       4     46.0614    -23.9386     42.8167     58.3167
   1       5     46.0614    -23.9386     43.1167     63.25
   2       3     48.1792     63.1136     59.9289     66.3883
   2       4     48.1792     63.1136     42.8167     58.3167
   2       5     48.1792     63.1136     43.1167     63.25
   3       4     59.9289     66.3883     42.8167     58.3167
   3       5     59.9289     66.3883     43.1167     63.25
   4       5     42.8167     58.3167     43.1167     63.25

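So the resulting data has one row for every possible dyad in the dataset and includes the coordinates of both individuals in the dyad. "LatitudeP1" and "LongitudeP1" are the coordinates of "Person1" in the dyad, and "LatitudeP2" and "LongitudeP2" are the coordinates of "Person2". Also, it does not matter which ID is listed as Person1 and which as Person2, because geographic distance is not a directed relationship.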

2 Answers:

Answer 0 (score: 2)

Just use combn to take the possible combinations of 1 to 5 (the row indices, which here equal Person), and subset the lat/long from the original data:

dat <- read.table(header = TRUE, text="Person  Latitude    Longitude
1     46.0614     -23.9386
2     48.1792      63.1136
3     59.9289      66.3883
4     42.8167      58.3167
5     43.1167      63.25")

tmp <- t(combn(nrow(dat),2))

#       [,1] [,2]
#  [1,]    1    2
#  [2,]    1    3
#  [3,]    1    4
#  [4,]    1    5
#  [5,]    2    3
#  [6,]    2    4
#  [7,]    2    5
#  [8,]    3    4
#  [9,]    3    5
# [10,]    4    5

res <- cbind(tmp,
             do.call('cbind', lapply(1:2, function(x) 
               mapply(`[`, dat[, 2:3], MoreArgs = list(i=tmp[, x])))))
colnames(res) <- c('Person1','Person2','LatitudeP1','LongitudeP1',
                   'LatitudeP2','LongitudeP2')

data.frame(res)

#    Person1 Person2 LatitudeP1 LongitudeP1 LatitudeP2 LongitudeP2
# 1        1       2    46.0614    -23.9386    48.1792     63.1136
# 2        1       3    46.0614    -23.9386    59.9289     66.3883
# 3        1       4    46.0614    -23.9386    42.8167     58.3167
# 4        1       5    46.0614    -23.9386    43.1167     63.2500
# 5        2       3    48.1792     63.1136    59.9289     66.3883
# 6        2       4    48.1792     63.1136    42.8167     58.3167
# 7        2       5    48.1792     63.1136    43.1167     63.2500
# 8        3       4    59.9289     66.3883    42.8167     58.3167
# 9        3       5    59.9289     66.3883    43.1167     63.2500
# 10       4       5    42.8167     58.3167    43.1167     63.2500
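
The same table can also be built with plain vector indexing, which may be easier to read than the nested lapply/mapply call. This is only a sketch of an alternative, not the answerer's method; the names pairs, p1, p2 and dyads are made up for illustration. It looks up dat$Person instead of reusing the row numbers, so it should also work when the IDs are not 1..n.

```r
dat <- read.table(header = TRUE, text = "Person Latitude Longitude
1 46.0614 -23.9386
2 48.1792  63.1136
3 59.9289  66.3883
4 42.8167  58.3167
5 43.1167  63.25")

# Each column of combn() holds one unordered pair of row indices.
pairs <- combn(nrow(dat), 2)
p1 <- pairs[1, ]   # row index of the first person in each dyad
p2 <- pairs[2, ]   # row index of the second person in each dyad

# Look up the ID and coordinates for both members of every dyad.
dyads <- data.frame(
  Person1     = dat$Person[p1],
  Person2     = dat$Person[p2],
  LatitudeP1  = dat$Latitude[p1],
  LongitudeP1 = dat$Longitude[p1],
  LatitudeP2  = dat$Latitude[p2],
  LongitudeP2 = dat$Longitude[p2]
)
```

With n people this yields choose(n, 2) rows, i.e. 10 rows for the 5 people here.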

Answer 1 (score: 1)

If you want pairwise distances and you are already using the geosphere package, why not use distm(...) rather than jumping through all these hoops:

# df is the dataset from your question
library(geosphere)
distm(df[,3:2],fun=distHaversine)   # distance in *meters*
#         [,1]      [,2]    [,3]      [,4]      [,5]
# [1,]       0 6224407.2 5743824 6243068.1 6553157.4
# [2,] 6224407       0.0 1324950  704260.1  563654.6
# [3,] 5743824 1324949.8       0 1982326.1 1883584.1
# [4,] 6243068  704260.1 1982326       0.0  403183.0
# [5,] 6553157  563654.6 1883584  403183.0       0.0

You could also use the fossil package.

library(fossil)
earth.dist(df[,3:2],dist=FALSE)     # distance in *kilometers*
#          [,1]      [,2]     [,3]      [,4]      [,5]
# [1,]    0.000 6219.1967 5739.016 6237.8420 6547.6718
# [2,] 6219.197    0.0000 1323.841  703.6706  563.1828
# [3,] 5739.016 1323.8407    0.000 1980.6667 1882.0073
# [4,] 6237.842  703.6706 1980.667    0.0000  402.8455
# [5,] 6547.672  563.1828 1882.007  402.8455    0.0000

Note that these functions expect longitude first, then latitude, so you have to pass columns 3:2, not 2:3.
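
To see why the column order matters, here is a minimal base-R sketch of the haversine formula. haversine_m is a hypothetical helper written for this illustration and is not part of geosphere or fossil; the radius 6378137 m is assumed to match the default used by distHaversine.

```r
# Hypothetical helper (not from geosphere): haversine distance in meters.
# Each point is c(longitude, latitude) in degrees, longitude first.
haversine_m <- function(p1, p2, r = 6378137) {
  to_rad <- pi / 180
  lon1 <- p1[1] * to_rad
  lat1 <- p1[2] * to_rad
  lon2 <- p2[1] * to_rad
  lat2 <- p2[2] * to_rad
  a <- sin((lat2 - lat1) / 2)^2 +
    cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
  2 * r * asin(sqrt(a))
}

# Persons 4 and 5 from the question, given as (lon, lat).
p4 <- c(58.3167, 42.8167)
p5 <- c(63.25, 43.1167)

d_ok      <- haversine_m(p4, p5)            # correct order: lon, lat
d_swapped <- haversine_m(rev(p4), rev(p5))  # lat and lon swapped by mistake
```

d_ok should land close to the 403183.0 m shown for the pair (4, 5) in the distm output, while d_swapped is a noticeably different number and raises no error, which is why the mix-up is easy to miss.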


Edit, in response to the OP's comment.

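"Edge list" sounds like you want to end up with an igraph object. You can use the distance matrix as an adjacency matrix in igraph, and the distances will automatically populate the weights in the edge list.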

library(igraph)
library(geosphere)
g <- graph.adjacency(distm(df[,3:2],fun=distHaversine),
                     mode="undirected",weighted=TRUE)
set.seed(1)   # for reproducible plot
plot(g, layout=layout.fruchterman.reingold(g,weights=E(g)$weight))

get.data.frame(g,"edges")
#    from to    weight
# 1     1  2 6224407.2
# 2     1  3 5743824.5
# 3     1  4 6243068.1
# 4     1  5 6553157.4
# 5     2  3 1324949.8
# 6     2  4  704260.1
# 7     2  5  563654.6
# 8     3  4 1982326.1
# 9     3  5 1883584.1
# 10    4  5  403183.0
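
If only the edge list is needed and not the graph object, a symmetric distance matrix can be flattened in base R. This is a sketch under that assumption, not part of the answer above. The matrix m is rebuilt here from the weights printed above so the snippet runs without geosphere; in practice it would be the result of distm(...). The names w, m, idx and edges are illustrative.

```r
# Pairwise distances in meters, copied from the edge list above,
# in the order 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, 4-5.
w <- c(6224407.2, 5743824.5, 6243068.1, 6553157.4,
       1324949.8, 704260.1, 563654.6,
       1982326.1, 1883584.1,
       403183.0)

# Rebuild the symmetric 5 x 5 matrix; lower.tri() fills column by column,
# which matches the order of w.
n <- 5
m <- matrix(0, n, n)
m[lower.tri(m)] <- w
m <- m + t(m)

# One row per unordered pair: take the positions below the diagonal
# and read off the matching distances.
idx <- which(lower.tri(m), arr.ind = TRUE)
edges <- data.frame(from   = idx[, "col"],
                    to     = idx[, "row"],
                    weight = m[idx])
```

Taking only the lower triangle drops the zero diagonal and the mirrored duplicates, so each dyad appears exactly once.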