Question

I have a DataFrame like this:

   Year  Month ProductCategory  Sales(In ThousandDollars)
0    2009      1   WomenClothing                     1755.0
1    2009      1     MenClothing                      524.0
2    2009      1   OtherClothing                      936.0
3    2009      2   WomenClothing                     1729.0
4    2009      2     MenClothing                      496.0
5    2009      2   OtherClothing                      859.0
6    2009      3   WomenClothing                     2256.0
7    2009      3     MenClothing                      542.0
8    2009      3   OtherClothing                      921.0
9    2009      4   WomenClothing                     2662.0
10   2009      4     MenClothing                      669.0
11   2009      4   OtherClothing                      914.0
12   2009      5   WomenClothing                     2732.0
13   2009      5     MenClothing                      650.0
14   2009      5   OtherClothing                      989.0
15   2009      6   WomenClothing                     2220.0
16   2009      6     MenClothing                      607.0
17   2009      6   OtherClothing                      932.0
18   2009      7   WomenClothing                     2164.0
19   2009      7     MenClothing                      575.0
20   2009      7   OtherClothing                      901.0
21   2009      8   WomenClothing                     2371.0
22   2009      8     MenClothing                      551.0
23   2009      8   OtherClothing                      865.0
24   2009      9   WomenClothing                     2421.0
25   2009      9     MenClothing                      579.0
26   2009      9   OtherClothing                      819.0
27   2009     10   WomenClothing                     2579.0
28   2009     10     MenClothing                      610.0
29   2009     10   OtherClothing                      914.0

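Each month of the year has 3 different product categories (WomenClothing, MenClothing, OtherClothing), so there are 3 rows per month. I want to take the mean of the Sales column for each month, i.e. the mean of every 3 rows, and keep it as a single value per month so that the number of rows shrinks. In other words, in the end I only want one row for each month of the year.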

Like this:

   Year  Month  Average Sale of each month
0  2009      1                     1071.66
3  2009      2                     1028.0
6  2009      3                     1239.66
9  2009      4                     1415.0

Answer 1

You can use:

df.groupby(['Year','Month'])['Sales(In ThousandDollars)'].mean().reset_index()

   Year  Month  Sales(In ThousandDollars)
0  2009      1                1071.666667
1  2009      2                1028.000000
2  2009      3                1239.666667
3  2009      4                1415.000000
4  2009      5                1457.000000
5  2009      6                1253.000000
6  2009      7                1213.333333
7  2009      8                1262.333333
8  2009      9                1273.000000
9  2009     10                1367.666667
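If you also want the column name from the question, the Series returned by mean() can be turned into a DataFrame and renamed in one step with Series.reset_index(name=...). A minimal sketch, assuming the sample data is rebuilt by hand:

```python
import pandas as pd

# Sample data from the question: 3 rows (Women, Men, Other) per month, months 1-10 of 2009
women = [1755, 1729, 2256, 2662, 2732, 2220, 2164, 2371, 2421, 2579]
men = [524, 496, 542, 669, 650, 607, 575, 551, 579, 610]
other = [936, 859, 921, 914, 989, 932, 901, 865, 819, 914]
rows = []
for month, sales in enumerate(zip(women, men, other), start=1):
    for category, value in zip(["WomenClothing", "MenClothing", "OtherClothing"], sales):
        rows.append({"Year": 2009, "Month": month, "ProductCategory": category,
                     "Sales(In ThousandDollars)": float(value)})
df = pd.DataFrame(rows)

# Same groupby as above; reset_index(name=...) moves Year/Month back into columns
# and names the averaged column at the same time
result = (
    df.groupby(["Year", "Month"])["Sales(In ThousandDollars)"]
      .mean()
      .reset_index(name="Average Sale of each month")
)
print(result)
```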

Answer 2

You can use the index to form the groups. It looks like this:

df.groupby(df.index // 3).mean(numeric_only=True)

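This relies on the default 0-based integer index and on every month having exactly 3 consecutive rows. (numeric_only=True skips the text column ProductCategory, which recent pandas versions would otherwise reject in mean().) If your Month column is consistent with always having 3 rows per month, you can also group by Year and Month to get the same result.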

This gives you:

   Year  Month        Sales
0  2009      1  1071.666667
1  2009      2  1028.000000
2  2009      3  1239.666667
3  2009      4  1415.000000
4  2009      5  1457.000000
5  2009      6  1253.000000
6  2009      7  1213.333333
7  2009      8  1262.333333
8  2009      9  1273.000000
9  2009     10  1367.666667
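A sketch of why the index-based version depends on that assumption, using the sample data from the question. The relabelled index (df.index * 10) is a hypothetical stand-in for a frame that was filtered or concatenated earlier. Grouping by row position with numpy.arange removes the dependence on index labels (but still assumes 3 consecutive rows per month), while grouping by the key columns depends on neither.

```python
import numpy as np
import pandas as pd

# Sample data from the question: 3 rows (Women, Men, Other) per month, months 1-10 of 2009
women = [1755, 1729, 2256, 2662, 2732, 2220, 2164, 2371, 2421, 2579]
men = [524, 496, 542, 669, 650, 607, 575, 551, 579, 610]
other = [936, 859, 921, 914, 989, 932, 901, 865, 819, 914]
rows = []
for month, sales in enumerate(zip(women, men, other), start=1):
    for category, value in zip(["WomenClothing", "MenClothing", "OtherClothing"], sales):
        rows.append({"Year": 2009, "Month": month, "ProductCategory": category,
                     "Sales(In ThousandDollars)": float(value)})
df = pd.DataFrame(rows)

# Hypothetical relabelling: the labels are now 0, 10, 20, ... instead of 0, 1, 2, ...
df.index = df.index * 10

# Label-based grouping: 0, 10, 20, ... // 3 gives 0, 3, 6, ..., so every row lands in its own group
by_label = df.groupby(df.index // 3)["Sales(In ThousandDollars)"].mean()

# Position-based grouping: positions 0, 1, 2, ... // 3 still give one group per month
cols = ["Year", "Month", "Sales(In ThousandDollars)"]
by_position = df.groupby(np.arange(len(df)) // 3)[cols].mean()

# Key-based grouping: independent of both the index labels and the row order
by_key = df.groupby(["Year", "Month"])["Sales(In ThousandDollars)"].mean()

print(len(by_label), len(by_position), len(by_key))
```

In practice the key-based groupby from the first answer is the safer choice; the positional trick is only useful when there is no column that identifies the groups.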

How to get the average of every 3 rows in a specific column?
