How do I calculate the haversine distance between the current row and the previous row?

Date: 2019-02-20 17:26:38

Tags: python pandas pandas-groupby haversine

I have a df that looks like this, which I group by id:

 id     lat          lon
 1       NaN         NaN
 1       40.121      23.749
 1      -56.154     -39.572
 1       21.908      17.537
 1       31.221     -36.186
 1      -56.655      0.016
 2       NaN         NaN
 2      -36.438      14.874
 2      -21.422      81.271
 2       43.961     -95.551
 3       NaN         NaN
 3       79.821     -56.781

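Using the haversine function, I want to calculate the distance from the current row to the previous row. So the first entry of the new column would be computed using: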

lat 1 = 40.121

lon 1 = 23.749

lat 2 = -56.154

lon 2 = -39.572

1 Answer:

Answer 0 (score: 0)

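Adapted from this answer. The linked answer shows how to calculate the distance between each row and one fixed latitude/longitude value; my modification makes it work for your case.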

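First, use shift to get all the values you need onto the same row (shift(-1) pulls the next row's values up, so each row is paired with the row that follows it):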

df['lon2'] = df['lon'].shift(-1)
df['lat2'] = df['lat'].shift(-1)

This gives:

    id     lat     lon    lat2    lon2
0    1     NaN     NaN  40.121  23.749
1    1  40.121  23.749 -56.154 -39.572
2    1 -56.154 -39.572  21.908  17.537
3    1  21.908  17.537  31.221 -36.186
4    1  31.221 -36.186 -56.655   0.016
5    1 -56.655   0.016     NaN     NaN
6    2     NaN     NaN -36.438  14.874
7    2 -36.438  14.874 -21.422  81.271
8    2 -21.422  81.271  43.961 -95.551
9    2  43.961 -95.551     NaN     NaN
10   3     NaN     NaN  79.821 -56.781
11   3  79.821 -56.781     NaN     NaN
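
Note that a plain shift works on the whole frame and ignores id. In the sample data that happens to be harmless because each group starts with a NaN row, but that is not guaranteed in general. A minimal sketch of a grouped variant that also looks backwards (shift(1)) rather than forwards, assuming you want the previous row within the same id; the column names prev_lat and prev_lon are made up for illustration:

```python
import numpy as np
import pandas as pd

# Small subset of the question's data: two ids, each starting with a NaN row
df = pd.DataFrame({
    "id":  [1, 1, 1, 2, 2],
    "lat": [np.nan, 40.121, -56.154, np.nan, -36.438],
    "lon": [np.nan, 23.749, -39.572, np.nan, 14.874],
})

# shift(1) inside each id group brings the previous row's values down,
# and never carries a value across a group boundary
grouped = df.groupby("id")
df["prev_lat"] = grouped["lat"].shift(1)  # hypothetical column name
df["prev_lon"] = grouped["lon"].shift(1)  # hypothetical column name

print(df)
```

With this layout the distance ends up on the later row of each pair instead of the earlier one, so the haversine function would need to read prev_lat/prev_lon instead of lat2/lon2.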

Then define the distance function:

from math import radians
from numpy import cos, sin, arcsin, sqrt

def haversine(row):
    # Coordinates of this row and of the row paired with it by shift
    lon1 = row['lon']
    lat1 = row['lat']
    lon2 = row['lon2']
    lat2 = row['lat2']
    # Convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # Haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * arcsin(sqrt(a))
    # 6367 km is the approximate Earth radius used here
    km = 6367 * c
    return km

And use apply to run it over your data:

df['distance'] = df.apply(haversine, axis=1)

This gives:

    id     lat     lon    lat2    lon2      distance
0    1     NaN     NaN  40.121  23.749           NaN
1    1  40.121  23.749 -56.154 -39.572  12237.017692
2    1 -56.154 -39.572  21.908  17.537  10187.684397
3    1  21.908  17.537  31.221 -36.186   5387.540299
4    1  31.221 -36.186 -56.655   0.016  10343.267833
5    1 -56.655   0.016     NaN     NaN           NaN
6    2     NaN     NaN -36.438  14.874           NaN
7    2 -36.438  14.874 -21.422  81.271   6543.302199
8    2 -21.422  81.271  43.961 -95.551  17480.809345
9    2  43.961 -95.551     NaN     NaN           NaN
10   3     NaN     NaN  79.821 -56.781           NaN
11   3  79.821 -56.781     NaN     NaN           NaN

I believe this shows the result you are looking for (I tested the first one and it seems to be correct).
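
For anyone who wants to repeat that spot check, here is a small standalone sketch using only the standard library. The function name haversine_km and its radius_km parameter are mine, not part of the code above; the radius default just mirrors the 6367 used in the answer.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6367):
    """Great-circle distance in km between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))

# The first pair of points from the question
first = haversine_km(40.121, 23.749, -56.154, -39.572)
print(first)
```

The printed value should land very close to the 12237.02 km shown for index 1 in the table.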

If you like, once the calculation is done you can get rid of the two secondary lat/lon columns:

df.drop(['lat2', 'lon2'], axis=1, inplace=True)

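I should note that this solution will not give you the fastest computation. Take a look at the lower half of the answer I linked for how it could be improved if performance is a priority here, although it would need some adapting.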
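
As a rough idea of what such an adaptation might look like (this is my own sketch, not the code from the linked answer), the same formula can be applied to whole columns with NumPy instead of row by row with apply. The name haversine_np and the radius_km parameter are assumptions for illustration:

```python
import numpy as np
import pandas as pd

def haversine_np(lat1, lon1, lat2, lon2, radius_km=6367):
    """Haversine distance in km; accepts scalars, arrays, or pandas Series."""
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# First three rows of id 1 from the question
df = pd.DataFrame({
    "lat": [np.nan, 40.121, -56.154],
    "lon": [np.nan, 23.749, -39.572],
})

# Same pairing as above: each row against the row that follows it
df["distance"] = haversine_np(
    df["lat"], df["lon"], df["lat"].shift(-1), df["lon"].shift(-1)
)
print(df)
```

This also avoids creating the temporary lat2/lon2 columns, since the shifted Series are passed in directly. Rows with a missing coordinate on either side come out as NaN, as in the apply version.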