Drop rows by repeated index

Date: 2018-01-05 16:46:09

Tags: python pandas dataframe

I have a DataFrame and I need to drop rows from it based on a counter.

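The DataFrame has a non-unique index named id (values 1 to 5, some of them repeated) and two columns, column1 and column2. The counter for this example is:

Counter({1: 1, 2: 1, 3: 1, 4: 2, 5: 1})

Each key of the counter is an index value, and the corresponding value is the number of rows that need to be dropped for that index value.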

I tried to drop the rows with a loop, but I get an error:

for k,v in count.iteritems():
    del t.ix[k][:v]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-102-33c0a6ba6f58> in <module>()
----> 1 del t.ix[k][:v]
      2 

C:\Anaconda2\lib\site-packages\pandas\core\generic.pyc in __delitem__(self, key)
   1788             # there was no match, this call should raise the appropriate
   1789             # exception:
-> 1790             self._data.delete(key)
   1791 
   1792         # delete from the caches

C:\Anaconda2\lib\site-packages\pandas\core\internals.pyc in delete(self, item)
   3647         Delete selected item (items if non-unique) in-place.
   3648         """
-> 3649         indexer = self.items.get_loc(item)
   3650 
   3651         is_deleted = np.zeros(self.shape[0], dtype=np.bool_)

C:\Anaconda2\lib\site-packages\pandas\core\indexes\base.pyc in get_loc(self, key, method, tolerance)
   2391             key = _values_from_object(key)
   2392             try:
-> 2393                 return self._engine.get_loc(key)
   2394             except KeyError:
   2395                 return self._engine.get_loc(self._maybe_cast_indexer(key))

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:4792)()

TypeError: 'slice(None, 2, None)' is an invalid key

How can I accomplish this and get the final df, which should look like this:

    column1     column2
id      
4   0.274887    0.072359
4   0.331148    0.317341
5   0.717883    0.763131

2 Answers:

Answer 0 (score: 1)

If you want to avoid looping over the DataFrame, you can use merge to find the rows to drop:

df = df.reset_index()
df['grp_counter'] = df.groupby('id').cumcount()+1

   id   column1   column2  grp_counter
0   1  0.974600  0.400304            1
1   2  0.499050  0.546998            1
2   3  0.245399  0.675422            1
3   4  0.109111  0.664372            1
4   4  0.715271  0.169065            2
5   4  0.274887  0.072359            3
6   4  0.331148  0.317341            4
7   5  0.404076  0.347777            1
8   5  0.717883  0.763131            2

selector = pd.Series({1: 1, 2: 1, 3: 1, 4: 2, 5: 1}).rename('count_select').reset_index()
selector['keep'] = False 
df = df[df.merge(selector, left_on=['id','grp_counter'], right_on=['index','count_select'], how='outer')['keep'].fillna(True)]
df = df.drop('grp_counter', axis=1).set_index('id')

     column1   column2
id                    
4   0.109111  0.664372
4   0.274887  0.072359
4   0.331148  0.317341
5   0.717883  0.763131
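
Note that the merge above matches only the row whose grp_counter equals the counter value, so for id 4 it drops just the second row, as the output shows. If the goal is to drop the first v rows for each index value (the result asked for in the question), comparing the within-group position against the counter may be closer. A minimal sketch, assuming the counter is available as count and the index is named id; the names position, to_drop and result are only illustrative:

```python
from collections import Counter

import pandas as pd

# Example data copied from the printed frame above; `count` is the question's counter.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 4, 4, 4, 5, 5],
        "column1": [0.974600, 0.499050, 0.245399, 0.109111, 0.715271,
                    0.274887, 0.331148, 0.404076, 0.717883],
        "column2": [0.400304, 0.546998, 0.675422, 0.664372, 0.169065,
                    0.072359, 0.317341, 0.347777, 0.763131],
    }
).set_index("id")
count = Counter({1: 1, 2: 1, 3: 1, 4: 2, 5: 1})

# 0-based position of each row within its index value.
position = df.groupby(level="id").cumcount().to_numpy()

# Number of leading rows to drop for each row's index value (0 if the key is absent).
to_drop = df.index.map(lambda k: count.get(k, 0)).to_numpy()

# Keep a row only once the first `to_drop` rows of its group have been skipped.
result = df[position >= to_drop]
print(result)
```

This avoids both the loop and the helper column, at the cost of relying on the current row order within each index value.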

Answer 1 (score: 0)

Using del on a DataFrame looks odd to me, so I would avoid it where possible. To solve this, I suggest finding all the rows for a given key, keeping the last rows.shape[0] - v entries, and dropping the rest.
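
For context, del on a DataFrame removes a column, which is why the slice in the question is rejected as an invalid key. The per-key idea described here could be sketched roughly as follows. This is only an illustration: the small frame is made up, t and count mirror the names used in the question, and groupby stands in for looking up each key by hand.

```python
from collections import Counter

import pandas as pd

# Small hypothetical frame with a repeated index; `t` and `count` mirror the question's names.
t = pd.DataFrame(
    {"column1": [10, 20, 30, 40, 50, 60]},
    index=pd.Index([1, 4, 4, 4, 5, 5], name="id"),
)
count = Counter({1: 1, 4: 2, 5: 1})

kept = []
for k, rows in t.groupby(level="id", sort=False):
    v = count.get(k, 0)          # rows to drop for this key (0 if not in the counter)
    n_keep = rows.shape[0] - v   # number of trailing rows that survive
    if n_keep > 0:
        # Skipping the first v rows is the same as keeping the last n_keep rows.
        kept.append(rows.iloc[v:])

# Fall back to an empty frame with the same columns if nothing survives.
result = pd.concat(kept) if kept else t.iloc[0:0]
print(result)
```

Slicing with iloc[v:] rather than iloc[-n_keep:] sidesteps the case where n_keep is 0, since a slice starting at -0 would return every row.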