Question

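I'm working on a script that takes an address and spits out two values: the coordinates (as a list) and the result (whether the geocoding succeeded). This works fine, but because the data comes back as a list, I have to assign new columns based on that list's indices. That works too, but it raises a warning: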

A value is trying to be set on a copy of a slice from a DataFrame

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy.

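Edit: To be clear, I think I understand from that page that I should be using .loc to access nested values. My question is more about producing the two columns directly from the function, rather than this workaround of digging the information out afterwards.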

I'd like to know the correct way to handle this kind of problem, since it actually comes up twice in this project.

The actual details of the problem aren't important, so here's a simple example of how I'm approaching it:

def geo(address):
    location = geocode(address)
    result = location.result
    coords = location.coords
    return coords, result

df['output'] = df['address'].apply(geo)

Since this produces a nested list in my df column, I then extract it into new columns:

df['coordinates'] = None
df['gps_status'] = None

for index, row in df.iterrows():
    df['coordinates'][index] = df['output'][index][0]
    df['gps_status'][index] = df['output'][index][1]

And again I get the warning:

A value is trying to be set on a copy of a slice from a DataFrame

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Any suggestions on the correct way to do this would be appreciated.

Answer 1

Your function should return a Series:

def geo(address):
    location = geocode(address)
    result = location.result
    coords = location.coords
    return pd.Series([coords, result], ['coordinates', 'gps_status'])

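# apply returns a DataFrame here (one column per Series label),
# so join it onto df rather than assigning it to a single column
df = df.join(df['address'].apply(geo))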

That said, this would probably be better written as a merge.
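One way to read the merge suggestion, as a minimal sketch: geocode each distinct address once, then merge the results back onto the original frame. `fake_geocode` below is a hypothetical stand-in for the real geocoder, which the question doesn't show, and the sample addresses and the `lookup` name are made up for illustration.

```python
import pandas as pd

def fake_geocode(address):
    # Hypothetical stand-in for the real geocoder: returns (coords, status).
    known = {"1 Main St": [40.0, -75.0], "2 Oak Ave": [41.0, -74.0]}
    coords = known.get(address)
    status = "ok" if coords is not None else "failed"
    return coords, status

def geo(address):
    coords, status = fake_geocode(address)
    # Returning a labelled Series makes apply() expand into two columns.
    return pd.Series([coords, status], index=["coordinates", "gps_status"])

df = pd.DataFrame({"address": ["1 Main St", "2 Oak Ave", "1 Main St", "nowhere"]})

# Geocode each distinct address only once.
unique = df[["address"]].drop_duplicates().reset_index(drop=True)
lookup = pd.concat([unique, unique["address"].apply(geo)], axis=1)

# Merge the lookup table back; how="left" keeps every original row.
result = df.merge(lookup, on="address", how="left")
```

If geocoding is slow or rate-limited, this should save a call for every repeated address, at the cost of one extra intermediate frame.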

Answer 2

As a rule you want to avoid iterrows(), since operating on a whole column at once is faster. You can assign the output directly to new columns.

import pandas as pd

def geo(x):
    return x*2, x*3

df = pd.DataFrame({'address':[1,2,3]})

output = df['address'].apply(geo)

df['a'] = [x[0] for x in output]
df['b'] = [x[1] for x in output]

This gives you

   address  a  b
0        1  2  3
1        2  4  6
2        3  6  9

with no copy warning.
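A slightly more compact variant of the same idea, as a sketch: since `apply` returns a Series of tuples here, the tuples can be unpacked into both columns in one step with `zip`. It reuses the toy `geo` from the example above.

```python
import pandas as pd

def geo(x):
    # Toy stand-in from the example above: returns two values per input.
    return x * 2, x * 3

df = pd.DataFrame({"address": [1, 2, 3]})

# zip(*...) transposes the sequence of (a, b) tuples into two sequences,
# one per new column.
df["a"], df["b"] = zip(*df["address"].apply(geo))
```

On an empty frame the unpacking would fail, because `zip` yields nothing to assign, so the list-comprehension form is the safer choice there.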
