Remove categories with a count of 0 in a pandas groupby

Time: 2018-06-29 15:27:49

Tags: python pandas dataframe

I want to remove the categories whose count is 0 after calling the pandas value_counts() function.

My data looks like this:

categories:
Index(['Average', 'Good', 'Poor', 'VeryGood', 'VeryPoor'],
  dtype='object')

 Output of value counts:

  score     Frequency
   VG        21
   G         15
   A         63
   P         27
   VP        0

My result should be:

  score     Frequency
   VG        21
   G         15
   A         63
   P         27

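I want to store this in a DataFrame and plot a bar chart. I don't want VP to appear in the chart because its count is 0, so that category should be dropped.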

My code:

          quality_scores=quality.SCORE.value_counts()
          quality_scores=pd.Series.to_frame(quality_scores)
          quality_scores=quality_scores.rename(columns={'SCORE': 'Frequency'})
          quality_scores['Score']=quality_scores.index
          quality_scores=quality_scores.reset_index(drop=True)


          quality_scores = quality_scores[quality_scores.Frequency != 0]
          quality_scores

Edit, based on the comments:

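When I print the DataFrame, I get the correct result. However, when I check the categories with quality_scores['Score'].cat.categories, I still see the VP category, which should not be there.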

Also, I don't want to see the VP category in the plot, but it still shows up on the axis.

Here is the code for the plot:

           plt.figure(figsize=(15,7))

           quality_graph=sns.barplot(y=quality_scores["Frequency"],
           x=quality_scores["Score"])

           quality_graph.set_xlabel('Frequency')

           quality_graph.set_title('Score Distribution of Quality Measure:',fontsize=25)

           plt.savefig('graphs\\Quality_Measure.png')

If you look at the chart, there are many empty categories on the axis. They are not actually present in the quality_scores DataFrame.

2 answers:

Answer 0: (score: 0)

Keep in mind that case matters: 'SCORE' and 'Score' are not the same. You created two columns, one called 'SCORE' and the other called 'Score'.

I ran the following code and it works as expected.

import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
grades = ['VG','G','A','P','VP']
counts = [21,15,63,27,0] 

d = { 'Score' : grades, 'Frequency': counts }
quality_scores = pd.DataFrame(data = d)
quality_scores=quality_scores.reset_index(drop=True)
quality_scores = quality_scores[quality_scores.Frequency != 0]

plt.figure(figsize=(15,7))
quality_graph=sns.barplot(y=quality_scores['Frequency'], x=quality_scores['Score'])
quality_graph.set_xlabel('Frequency')
quality_graph.set_title('Score Distribution of Quality Measure:',fontsize=25)
plt.savefig('Quality_Measure.png')
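
Note that the example above builds 'Score' from plain strings, so the column never has a categorical dtype. The following is a minimal sketch, assuming the original 'Score' column is categorical (the `.cat.categories` check in the question suggests so; the data here is made up to mirror it). It shows that filtering rows leaves the unused category in place, and that `Series.cat.remove_unused_categories()` drops it.

```python
import pandas as pd

# Hypothetical data mirroring the question; the categorical dtype is an assumption.
grades = pd.Categorical(['VG', 'G', 'A', 'P', 'VP'],
                        categories=['VG', 'G', 'A', 'P', 'VP'])
quality_scores = pd.DataFrame({'Score': grades,
                               'Frequency': [21, 15, 63, 27, 0]})

# Filtering removes the VP row, but VP stays in the column's category list.
filtered = quality_scores[quality_scores['Frequency'] != 0].copy()
categories_before = list(filtered['Score'].cat.categories)

# Drop the categories that no longer occur in the data.
filtered['Score'] = filtered['Score'].cat.remove_unused_categories()
categories_after = list(filtered['Score'].cat.categories)

print(categories_before)
print(categories_after)
```

Plotting libraries that read the category list rather than the row values may reserve an axis slot for every declared category, which would explain the empty positions described in the question.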

Answer 1: (score: 0)

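This is because VP is still one of the categories of the Series, even though no row uses it. As of pandas 0.23 you can pass observed=True to groupby to drop unobserved categories from the result: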

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
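
A minimal sketch of that approach, assuming the raw data has one row per rating in a categorical 'SCORE' column. The sample rows and the variable names `all_counts` and `observed_counts` are made up for illustration.

```python
import pandas as pd

# Hypothetical raw data: 'VP' is a declared category that never occurs.
scores = pd.Categorical(['VG', 'G', 'A', 'A', 'P'],
                        categories=['VG', 'G', 'A', 'P', 'VP'])
quality = pd.DataFrame({'SCORE': scores})

# observed=False keeps every declared category, including the empty one.
all_counts = quality.groupby('SCORE', observed=False).size()

# observed=True keeps only the categories that actually appear in the data.
observed_counts = quality.groupby('SCORE', observed=True).size()

# Turn the counts into a two-column frame that can be passed to a plot.
quality_scores = observed_counts.reset_index(name='Frequency')
print(quality_scores)
```

If the resulting column still carries unused categories in its dtype, `cat.remove_unused_categories()` can be applied to it before plotting.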
