How to iterate over a column of a PySpark DataFrame based on unique, non-NA values

Time: 2019-05-13 04:15:07

Tags: pyspark

I have the following code in Python:

for i in (map.area.unique()):

   # Select all the map records from the currently processed area
   f_0 = f_map[(f_map['area'] == i )]
   m_0 = m_map[(m_map['area'] == i) | (m_map['area'] == "Unknown")]
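For reference, here is a self-contained pandas sketch of what the loop above does, run on toy data modelled on the sample frames below. Assumptions, all hypothetical: `map` is renamed `map_df` so it does not shadow Python's built-in `map`; the `area` columns are stored as strings so they can also hold `"Unknown"`; a row with a missing area and an `m_map` row with id `999999` and area `"Unknown"` are added for illustration; and `dropna()` is added to reflect the "non-NA" requirement in the title.

```python
import pandas as pd

# Toy data loosely based on the question's sample frames.
# The None area and the id 999999 / "Unknown" row are hypothetical additions.
map_df = pd.DataFrame({"telephone": ["03235095", "03235113", "03235200"],
                       "area": ["510", "500", None]})
f_map = pd.DataFrame({"id": [227149, 122270], "area": ["510", "100"]})
m_map = pd.DataFrame({"id": [227149, 122270, 999999],
                      "area": ["590", "190", "Unknown"]})

results = {}
# dropna() removes missing areas before the distinct values are taken;
# unique() keeps the order of first appearance.
for area in map_df["area"].dropna().unique():
    # f_map rows for the current area only
    f_0 = f_map[f_map["area"] == area]
    # m_map rows for the current area, plus rows whose area is "Unknown"
    m_0 = m_map[(m_map["area"] == area) | (m_map["area"] == "Unknown")]
    results[area] = (f_0["id"].tolist(), m_0["id"].tolist())

print(results)
```

The missing area is skipped entirely, and the `"Unknown"` row is picked up for every area by the second condition.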

I am rewriting it in PySpark, but the third line throws an exception. Can anyone point out what I am doing wrong?

The map DataFrame is:

             play_id    calendar_period            telephone  area
 1:         286178          201811                03235095  510
 2:         286179          201811                03235113  500

f_map:

       id        value area type
1: 227149 385911000059  510  mob
2: 122270 385911000661  100  fix

m_map:
       id area type
1: 227149 590  mob
2: 122270 190  fix

The output should be:

       id        value    area type
1: 227149 385994266007 Unknown  mob
2: 122270 385989281716 Unknown  mob

1 Answer:

Answer 0 (score: 2)

I think the problem is in the last line. If I understand your question correctly, this should be what you are looking for:

 temp1 = sampdf[(sampdf['area'] == i) | (sampdf['area'] == "Unknown")]