Looping through a dataframe and populating a URL request for each user group

Time: 2021-04-06 15:48:41

Tags: python pandas loops python-requests

I have a Pandas dataframe with GPS points, one row per point, shown below.

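Using my make_request function, I can feed all of these coordinates from df directly into an (OSRM) request to map-match the GPS points. The dataframe is built like this: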

    import pandas as pd
    # GPS points for three users (A, B, C), one row per point
    d = {'user': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
         'lat': [37.75243634842733, 37.75344580658182, 37.75405656449232, 37.753649393112181, 37.75409897804892, 37.753937806404586, 37.72767062183685, 37.72710631810977, 37.72605407110467, 37.71141865080228, 37.712199505873926, 37.713285899241896, 37.71428740401767, 37.712810604103346],
         'lon': [-122.41924881935118, -122.42006421089171, -122.419216632843, -122.41784334182738, -122.4169099330902, -122.41549372673035, -122.3878937959671, -122.3884356021881, -122.38841414451599, -122.44688630104064, -122.44474053382874, -122.44361400604248, -122.44260549545288, -122.44156479835509]}
    df = pd.DataFrame(data=d)
    

    user    lat         lon
0   A       37.752436   -122.419249
1   A       37.753446   -122.420064
2   A       37.754057   -122.419217
3   A       37.753649   -122.417843
4   A       37.754099   -122.416910
5   A       37.753938   -122.415494
6   B       37.727671   -122.387894
7   B       37.727106   -122.388436
8   B       37.726054   -122.388414
9   C       37.711419   -122.446886
10  C       37.712200   -122.444741
11  C       37.713286   -122.443614
12  C       37.714287   -122.442605
13  C       37.712811   -122.441565

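However, because df contains different GPS trajectories generated by different users, I would like to write a function that loops through this dataframe and feeds the corresponding set of coordinates to the request for each user group, rather than all of them at once. What is the best way to do this?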

1 answer:

Answer 0 (score: 1)

You can group the dataframe by the user column with groupby, then apply make_request to each group and save the results in an output dict keyed by user:

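# Collect one map-matching result per user, keyed by user
output = {}
for user, g in df.groupby('user'):
    # Pass only this user's (lat, lon) rows to make_request
    output[user] = make_request(g[['lat', 'lon']].values)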
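
To try the per-user loop without calling a server, here is a minimal sketch that only builds one request URL per user. The helper build_match_url is a hypothetical stand-in for make_request, whose body is not shown here. The base URL assumes the public OSRM demo server and the "driving" profile, and the sample dataframe is a shortened, rounded copy of df. OSRM expects coordinates in lon,lat order, separated by semicolons.

```python
import pandas as pd

# Assumption: public OSRM demo server, "driving" profile, match service.
OSRM_MATCH_BASE = "http://router.project-osrm.org/match/v1/driving/"


def build_match_url(coords):
    """Hypothetical stand-in for make_request: builds the URL, sends nothing.

    coords is an iterable of (lat, lon) pairs; OSRM wants lon,lat order.
    """
    path = ";".join(f"{lon},{lat}" for lat, lon in coords)
    return OSRM_MATCH_BASE + path


# Shortened sample with the same columns as df (values rounded).
df = pd.DataFrame({
    "user": ["A", "A", "B", "B", "B", "C", "C"],
    "lat": [37.752436, 37.753446, 37.727671, 37.727106,
            37.726054, 37.711419, 37.712200],
    "lon": [-122.419249, -122.420064, -122.387894, -122.388436,
            -122.388414, -122.446886, -122.444741],
})

# One URL per user, keyed by user, mirroring the output dict in the answer.
urls = {
    user: build_match_url(g[["lat", "lon"]].to_numpy())
    for user, g in df.groupby("user")
}

for user, url in urls.items():
    print(user, url)
```

Swapping build_match_url for a function that actually sends the request (for example with requests.get) keeps the loop unchanged; only the value stored per user differs.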