Parallelizing a reverse geocoding function over a Python pandas DataFrame

Time: 2018-04-29 22:09:58

Tags: python pandas numpy dataframe multiprocessing

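I have a pandas DataFrame. Two of its columns hold the latitude and longitude of certain points. In another column named city, I want to store the name of the city that each row's latitude and longitude belong to.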

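I have sliced the latitude and longitude columns of the DataFrame into a numpy array. I then use the multiprocessing library to build a small parallel map function: it takes the numpy array, splits it, maps the function over each split on the cores of my machine, and finally joins the intermediate results.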

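However, I cannot get this to work properly. Since I am fairly new to Python, I would like to know whether there is a better (or even a standard) way to do this.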

My code is as follows:

import geocoder
import numpy as np
import pandas as pd
from multiprocessing import Pool, cpu_count

def reverse_code( latitude, longitude ):
    g = geocoder.google([latitude, longitude], method="reverse")
    return g.city

def parallelize( data, func):
    data_split = np.array_split(np.array_split(data,2), partitions)
    pool = Pool(cores)
    data = pd.concat(pool.map(func, data_split))
    pool.close()
    pool.join()
    return data

cores = cpu_count()
partitions = cores
distritos = df[["latitud", "longitud"]].as_matrix

parallelize(distritos, reverse_code)

After running the code, I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\ProgramData\Anaconda3\envs\jeptest\lib\site-packages\numpy\lib\shape_base.py in array_split(ary, indices_or_sections, axis)
    457     try:
--> 458         Ntotal = ary.shape[axis]
    459     except AttributeError:

AttributeError: 'function' object has no attribute 'shape'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-26-2af0d007a2f8> in <module>()
----> 1 parallelize(distritos, reverse_code)

<ipython-input-25-d2c492561bd2> in parallelize(data, func)
      1 def parallelize( data, func):
----> 2     data_split = np.array_split(np.array_split(data,2), partitions)
      3     pool = Pool(cores)
      4     data = pd.concat(pool.map(func, data_split))
      5     pool.close()

C:\ProgramData\Anaconda3\envs\jeptest\lib\site-packages\numpy\lib\shape_base.py in array_split(ary, indices_or_sections, axis)
    458         Ntotal = ary.shape[axis]
    459     except AttributeError:
--> 460         Ntotal = len(ary)
    461     try:
    462         # handle scalar case.

TypeError: object of type 'method' has no len()
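
Reading the traceback, the immediate cause appears to be that `as_matrix` is referenced without parentheses, so `distritos` is a bound method rather than an array, and `np.array_split` can take neither its `shape` nor its `len()`. Three other details in the posted code would likely fail next: the nested `np.array_split` call hands a list of arrays to the outer split, `pool.map` passes each worker a single chunk while `reverse_code` expects two arguments, and `pd.concat` expects pandas objects rather than city names. The following is only a minimal sketch of the split, map, and join pattern under some assumptions: `lookup_city` is a hypothetical stand-in for the `geocoder.google(...)` call so the example runs offline, `reverse_code_chunk` is an illustrative helper name, `to_numpy()` replaces `as_matrix()` (which has been removed from recent pandas versions), and `ThreadPool` is swapped in for `Pool` on the assumption that the lookups are network-bound.

```python
from multiprocessing.pool import ThreadPool

import numpy as np
import pandas as pd


def lookup_city(latitude, longitude):
    # Hypothetical stand-in for geocoder.google([lat, lon], method="reverse").city,
    # so the sketch runs without network access or an API key.
    return "north" if latitude >= 0 else "south"


def reverse_code_chunk(chunk):
    # Each worker receives a 2-D array of (latitude, longitude) rows
    # and returns one city name per row.
    return [lookup_city(lat, lon) for lat, lon in chunk]


def parallelize(data, func, partitions=4):
    # Split the rows once into roughly equal chunks.
    data_split = np.array_split(data, partitions)
    # ThreadPool exposes the same map() interface as multiprocessing.Pool.
    with ThreadPool(partitions) as pool:
        results = pool.map(func, data_split)
    # Flatten the per-chunk lists back into one list, preserving row order.
    return [city for chunk_result in results for city in chunk_result]


df = pd.DataFrame({
    "latitud": [40.4, -33.9, 19.4, -12.0, 4.7],
    "longitud": [-3.7, 18.4, -99.1, -77.0, -74.1],
})
# Note the parentheses: to_numpy() returns the array, to_numpy alone is a method.
coords = df[["latitud", "longitud"]].to_numpy()
df["city"] = parallelize(coords, reverse_code_chunk)
```

Because `pool.map` returns results in the order of its inputs, the flattened list lines up with the DataFrame rows and can be assigned directly to the `city` column. Switching back to `multiprocessing.Pool` should mostly be a matter of changing the pool class, provided the worker function is defined at module level so it can be pickled.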

0 answers:

No answers