Pandas: update a Series of a Dataframe based on a larger Dataframe

Date: 2017-02-19 17:06:15

Tags: python pandas dataframe

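I'm trying to do something with pandas that seems quite simple, but I'm stuck after several inconclusive attempts.

Here's the situation. I have a Dataframe (let's call it streets) with only two Series: a street name and the gender associated with it: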

     name                             gender
0    Abraham Lincoln Avenue           undefined
1    Donald Trump Dead End            undefined
2    Hillary Clinton Street           undefined
...
1754 Ziggy Marley Boulevard           undefined

On the other hand, I have another Dataframe (let's call it fnames), which is quite massive. It has four Series:

       gender   gender_detail  main_gender      first_name
0      F        Female         Female           Aaf
1      F        Female         Female           Aafke
2      F        Female         Female           Aafkea
3      M        Male           Male             Aafko
...
40211  F        Female         Female           Zyta

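As you have probably guessed, I would like to check, using the 'first_name' Series of fnames, whether these first names appear in the 'name' Series of streets.

If a first name is found, I want to update the 'gender' Series in streets with the corresponding value from the 'gender' Series of fnames. If not, I leave it as 'undefined'.

Obviously, given the size of the Dataframes, I can't use two nested for loops. Is there a quick way to achieve this?

For example, should I build a dictionary with only the first names as keys and the genders as values to make this more efficient?

PS: I don't know whether it simplifies the problem, but both of my Dataframes are sorted alphabetically!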

1 answer:

Answer 0: (score: 2)

Yes, I think you can use a dict with map: split the name column on whitespace, select the first value with str[0], map it through the dict, and finally replace NaN with fillna:

    print(df1)
          name                      gender
    0     Abraham Lincoln Avenue    undefined
    1     Donald Trump Dead End     undefined
    2     Hillary Clinton Street    undefined
    3     Aaf Street                undefined
    1754  Ziggy Marley Boulevard    undefined

    print(df2)
           gender   gender_detail  main_gender  first_name
    0      F        Female         Female       Aaf
    1      F        Female         Female       Aafke
    2      F        Female         Female       Aafkea
    3      F        Female         Female       Aafko
    40211  F        Female         Female       Zyta

    d = df2.set_index('first_name')['gender'].to_dict()
    print(d)
    {'Zyta': 'F', 'Aaf': 'F', 'Aafkea': 'F', 'Aafke': 'F', 'Aafko': 'F'}

    print(df1['name'].str.split().str[0])
    0       Abraham
    1        Donald
    2       Hillary
    3           Aaf
    1754      Ziggy
    Name: name, dtype: object

    df1['gender'] = df1['name'].str.split().str[0].map(d).fillna('undefined')
    print(df1)
          name                      gender
    0     Abraham Lincoln Avenue    undefined
    1     Donald Trump Dead End     undefined
    2     Hillary Clinton Street    undefined
    3     Aaf Street                F
    1754  Ziggy Marley Boulevard    undefined
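For anyone who wants to try the approach end to end, here is a minimal self-contained sketch. It assumes the frames are named streets and fnames as in the question, uses only a handful of sample rows with a default 0..4 index, and keeps just the two fnames columns the lookup needs. The variable names lookup and first_word are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question; the real frames are far larger.
streets = pd.DataFrame({
    "name": [
        "Abraham Lincoln Avenue",
        "Donald Trump Dead End",
        "Hillary Clinton Street",
        "Aaf Street",
        "Ziggy Marley Boulevard",
    ],
    "gender": ["undefined"] * 5,
})
fnames = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "F"],
    "first_name": ["Aaf", "Aafke", "Aafkea", "Aafko", "Zyta"],
})

# Build the lookup once: first_name -> gender.
lookup = dict(zip(fnames["first_name"], fnames["gender"]))

# Take the first word of each street name and map it through the lookup.
# Names with no match come back as NaN and are reset to 'undefined'.
first_word = streets["name"].str.split().str[0]
streets["gender"] = first_word.map(lookup).fillna("undefined")

print(streets)
```

Note that this only matches when the first name is the first word of the street name and the spelling and case are identical. A street such as "Avenue Aaf" would stay 'undefined' with this sketch.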
