Question

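I'm new to Python and am trying to recreate this example. I have longitude and latitude data for NYC taxi pickups and drop-offs, but I need to convert the data to Web Mercator format (which the example above doesn't cover). I found a function that takes a pair of longitude and latitude values and converts it to Web Mercator, taken from here. It looks like this: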

import math
def toWGS84(xLon, yLat):
    # Check if coordinate out of range for Latitude/Longitude
    if (abs(xLon) < 180) and (abs(yLat) > 90):
        return

    # Check if coordinate out of range for Web Mercator
    # 20037508.3427892 is full extent of Web Mercator
    if (abs(xLon) > 20037508.3427892) or (abs(yLat) > 20037508.3427892):
        return

    semimajorAxis = 6378137.0  # WGS84 spheroid semimajor axis

    latitude = (1.5707963267948966 - (2.0 * math.atan(math.exp((-1.0 * yLat) / semimajorAxis)))) * (180/math.pi)
    longitude = ((xLon / semimajorAxis) * 57.295779513082323) - ((math.floor((((xLon / semimajorAxis) * 57.295779513082323) + 180.0) / 360.0)) * 360.0)

    return [longitude, latitude]



def toWebMercator(xLon, yLat):
    # Check if coordinate out of range for Latitude/Longitude
    if (abs(xLon) > 180) or (abs(yLat) > 90):
        return

    semimajorAxis = 6378137.0  # WGS84 spheroid semimajor axis
    east = xLon * 0.017453292519943295
    north = yLat * 0.017453292519943295

    northing = 3189068.5 * math.log((1.0 + math.sin(north)) / (1.0 - math.sin(north)))
    easting = semimajorAxis * east

    return [easting, northing]

def main():
    print(toWebMercator(-105.816001, 40.067633))
    print(toWGS84(-11779383.349100526, 4875775.395628653))

if __name__ == '__main__':
    main()

How do I apply this to every long/lat pair in my pandas DataFrame and save the output in the same DataFrame?

df.tail()
            |    longitude     |    latitude
____________|__________________|______________
11135465    |    -73.986893    |    40.761093  
1113546     |    -73.979645    |    40.747814  
11135467    |    -74.001244    |    40.743172  
11135468    |    -73.997818    |    40.726055  
...

Answer 1

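With a dataset of this size, what will help you most is understanding how to do things the pandas way. Iterating over rows performs terribly compared with the built-in vectorized methods.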
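To make the row-wise versus vectorized distinction concrete, here is a minimal sketch on a made-up three-row frame. The name `demo` and the constant `DEG_TO_RAD` are my own, and the column names only mirror the question's `df`. Both forms give the same numbers; the difference is that the first calls Python code once per row.

```python
import numpy as np
import pandas as pd

# Tiny synthetic frame (hypothetical data, column names mirror the question).
demo = pd.DataFrame({
    "longitude": [-73.986893, -73.979645, -74.001244],
    "latitude": [40.761093, 40.747814, 40.743172],
})

DEG_TO_RAD = np.pi / 180.0

# Row-wise: pandas calls the lambda once per row, in Python.
rowwise = demo.apply(lambda row: row["longitude"] * DEG_TO_RAD, axis=1)

# Vectorized: one multiplication over the whole column, done inside NumPy.
vectorized = demo["longitude"] * DEG_TO_RAD

# Both approaches produce the same values.
same = np.allclose(rowwise, vectorized)
```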

import pandas as pd
import numpy as np

df = pd.read_csv('/yellow_tripdata_2016-06.csv')
df.head(5)

VendorID    tpep_pickup_datetime    tpep_dropoff_datetime   passenger_count trip_distance   pickup_longitude    pickup_latitude RatecodeID  store_and_fwd_flag  dropoff_longitude   dropoff_latitude    payment_type    fare_amount extra   mta_tax tip_amount  tolls_amount    improvement_surcharge   total_amount
0   2   2016-06-09 21:06:36 2016-06-09 21:13:08 2   0.79    -73.983360  40.760937   1   N   -73.977463  40.753979   2   6.0 0.5 0.5 0.00    0.0 0.3 7.30
1   2   2016-06-09 21:06:36 2016-06-09 21:35:11 1   5.22    -73.981720  40.736668   1   N   -73.981636  40.670242   1   22.0    0.5 0.5 4.00    0.0 0.3 27.30
2   2   2016-06-09 21:06:36 2016-06-09 21:13:10 1   1.26    -73.994316  40.751072   1   N   -74.004234  40.742168   1   6.5 0.5 0.5 1.56    0.0 0.3 9.36
3   2   2016-06-09 21:06:36 2016-06-09 21:36:10 1   7.39    -73.982361  40.773891   1   N   -73.929466  40.851540   1   26.0    0.5 0.5 1.00    0.0 0.3 28.30
4   2   2016-06-09 21:06:36 2016-06-09 21:23:23 1   3.10    -73.987106  40.733173   1   N   -73.985909  40.766445   1   13.5    0.5 0.5 2.96    0.0 0.3 17.76

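This dataset has 11,135,470 rows, which isn't "big data" but isn't small either. Instead of writing a function and applying it to each row, you get far better performance by running each part of the function on entire columns. I would change this function: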

def toWebMercator(xLon, yLat):
    # Check if coordinate out of range for Latitude/Longitude
    if (abs(xLon) > 180) or (abs(yLat) > 90):
        return

    semimajorAxis = 6378137.0  # WGS84 spheroid semimajor axis
    east = xLon * 0.017453292519943295
    north = yLat * 0.017453292519943295

    northing = 3189068.5 * math.log((1.0 + math.sin(north)) / (1.0 - math.sin(north)))
    easting = semimajorAxis * east

    return [easting, northing]

into this:

SEMIMAJORAXIS = 6378137.0 # typed in all caps since this is a static value
df['pickup_east'] = df['pickup_longitude'] * 0.017453292519943295 # takes all pickup longitude values, multiplies them, then saves the result as a new column named pickup_east.
df['pickup_north'] = df['pickup_latitude'] * 0.017453292519943295
# numpy functions allow you to calculate an entire column's worth of values by simply passing in the column. 
df['pickup_northing'] = 3189068.5 * np.log((1.0 + np.sin(df['pickup_north'])) / (1.0 - np.sin(df['pickup_north']))) 
df['pickup_easting'] = SEMIMAJORAXIS * df['pickup_east']

You then have pickup_easting and pickup_northing columns containing the computed values.
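Since the taxi data has both pickup and dropoff coordinates, the same column arithmetic can be wrapped in a small helper and reused. This is only a sketch: the function name `to_web_mercator`, the `trips` frame and its two rows are made up for illustration, and it assumes the coordinates have already been range-checked. The math is the same as above (3189068.5 is just the semimajor axis divided by 2).

```python
import numpy as np
import pandas as pd

SEMIMAJOR_AXIS = 6378137.0  # WGS84 semimajor axis, in metres

def to_web_mercator(lon_deg, lat_deg):
    """Convert lon/lat in degrees (scalars, arrays or Series) to easting/northing."""
    lon_rad = np.radians(lon_deg)
    sin_lat = np.sin(np.radians(lat_deg))
    easting = SEMIMAJOR_AXIS * lon_rad
    # Same formula as the scalar function: (a / 2) * ln((1 + sin(lat)) / (1 - sin(lat)))
    northing = (SEMIMAJOR_AXIS / 2.0) * np.log((1.0 + sin_lat) / (1.0 - sin_lat))
    return easting, northing

# Hypothetical two-row sample using the dataset's column names.
trips = pd.DataFrame({
    "pickup_longitude": [-73.983360, -73.981720],
    "pickup_latitude": [40.760937, 40.736668],
    "dropoff_longitude": [-73.977463, -73.981636],
    "dropoff_latitude": [40.753979, 40.670242],
})

# Apply the same conversion to both coordinate pairs.
for prefix in ("pickup", "dropoff"):
    easting, northing = to_web_mercator(
        trips[f"{prefix}_longitude"], trips[f"{prefix}_latitude"]
    )
    trips[f"{prefix}_easting"] = easting
    trips[f"{prefix}_northing"] = northing
```

Because the helper only uses NumPy functions, it accepts a single pair of floats as well as whole columns.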

On my laptop, this takes:

CPU times: user 1.01 s, sys: 286 ms, total: 1.3 s
Wall time: 763 ms

for all 11 million rows. 15 minutes -> seconds.

I dropped the range checks on the values. You can do those like this instead:

df = df[(df['pickup_longitude'].abs() <= 180) & (df['pickup_latitude'].abs() <= 90)]

This uses boolean indexing, which again is orders of magnitude faster than looping.
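If boolean indexing is new to you, here is a small sketch of what that filter does step by step. The `coords` frame and its values are invented for the example; two of the three rows are deliberately out of range.

```python
import pandas as pd

# Hypothetical sample: row 1 has a bad latitude, row 2 a bad longitude.
coords = pd.DataFrame({
    "pickup_longitude": [-73.98, 0.0, -200.0],
    "pickup_latitude": [40.76, 95.0, 40.70],
})

# Each comparison yields a boolean Series; & combines them element-wise.
in_range = (coords["pickup_longitude"].abs() <= 180) & (coords["pickup_latitude"].abs() <= 90)

# Indexing with the boolean Series keeps only the rows marked True.
valid = coords[in_range]
```

Note the parentheses around each comparison: `&` binds more tightly than `<=`, so leaving them out changes the meaning.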

Answer 2

Try:

df[['longitude', 'latitude']].apply(
    lambda x: pd.Series(toWebMercator(*x), ['xLon', 'yLat']),
    axis=1
)
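The `apply` call returns a new DataFrame rather than modifying `df`, so to keep the result in the same frame you still need to attach it. One possible way, sketched below, is `DataFrame.join`, which aligns on the index. The output column names `easting` and `northing` are my choice, the sample rows come from the question, and the function is a shortened copy of the question's `toWebMercator` with the range check left out for brevity.

```python
import math
import pandas as pd

def toWebMercator(xLon, yLat):
    # Shortened copy of the question's function (range check omitted here).
    semimajorAxis = 6378137.0
    east = xLon * 0.017453292519943295
    north = yLat * 0.017453292519943295
    northing = 3189068.5 * math.log((1.0 + math.sin(north)) / (1.0 - math.sin(north)))
    easting = semimajorAxis * east
    return [easting, northing]

df = pd.DataFrame({
    "longitude": [-73.986893, -73.979645],
    "latitude": [40.761093, 40.747814],
})

# Row-wise conversion; each row becomes a two-element Series.
mercator = df[["longitude", "latitude"]].apply(
    lambda row: pd.Series(toWebMercator(*row), index=["easting", "northing"]),
    axis=1,
)

# Attach the new columns to the original frame (aligned on the index).
df = df.join(mercator)
```

This is still a Python-level loop over rows, so expect it to be much slower than the column-based approach on millions of rows.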

Answer 3

If you want to keep the math readable and convert your current function with little effort, use eval:

df.eval("""
northing = 3189068.5 * log((1.0 + sin(latitude * 0.017453292519943295)) / (1.0 - sin(latitude * 0.017453292519943295)))
easting = 6378137.0 * longitude * 0.017453292519943295""", inplace=False)
Out[51]: 
         id  longitude   latitude      northing       easting
0  11135465 -73.986893  40.761093  4.977167e+06 -8.236183e+06
1   1113546 -73.979645  40.747814  4.975215e+06 -8.235376e+06
2  11135467 -74.001244  40.743172  4.974533e+06 -8.237781e+06
3  11135468 -73.997818  40.726055  4.972018e+06 -8.237399e+06

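Since you can't use if statements, you'll have to adapt the syntax a little, but you can easily filter out the out-of-bounds data before calling eval. If you want to assign the new columns directly, you can also use inplace=True.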

If you aren't interested in keeping the math syntax and are after full speed, the numpy answer will probably run faster.
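Putting those two remarks together, a possible sketch is to filter first and then call eval with inplace=True. The third row below is a made-up out-of-range value added only to show the filter working; the other rows are from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "longitude": [-73.986893, -73.979645, -250.0],  # last value is deliberately invalid
    "latitude": [40.761093, 40.747814, 40.743172],
})

# Drop out-of-range rows first, since the eval expression cannot contain `if`.
# .copy() gives an independent frame that is safe to modify in place.
df = df[(df["longitude"].abs() <= 180) & (df["latitude"].abs() <= 90)].copy()

# With inplace=True the new columns are added to df and nothing is returned.
df.eval(
    """
northing = 3189068.5 * log((1.0 + sin(latitude * 0.017453292519943295)) / (1.0 - sin(latitude * 0.017453292519943295)))
easting = 6378137.0 * longitude * 0.017453292519943295
""",
    inplace=True,
)
```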

Apply a function to each row in a Pandas DataFrame
