Python - CSV - Average column values grouped by column ID

Time: 2018-06-05 10:47:32

Tags: python-3.x pandas csv

I have a very large CSV file that I managed to sort by the ID column, but I can't calculate the average column values for each ID.

88741,42.84286022,16.41829224,1
88797,42.78081536,16.40743455,1
88797,42.78081536,16.21153455,1
88823,42.51512511,16.43304948,2
88885,42.88204193,16.12412548,2
87227,42.88204193,16.64223948,3
and so on...

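I need a new CSV without the SchoolCode column, containing the average Lat and Long for each cluster. Also, the numbers should keep the same format. I tried pandas, and it throws this error at me.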

The output should look like this:

Lat,Long,Cluster
<average_lat_forCluster1>,<average_long_forCluster1>,1
<average_lat_forCluster2>,<average_long_forCluster2>,2
<average_lat_forCluster3>,<average_long_forCluster3>,3
and so on...

My code:

import pandas as pd

df = pd.read_csv('SortedCluster.csv', names=[
             'SchoolCode', 'Lat', 'Long', 'Cluster'])
df2 = df.groupby('Cluster')['Lat','Long'].mean()
df2.to_csv('AverageOutput.csv')

Error:

Traceback (most recent call last):
  File "averager.py", line 6, in <module>
    df2 = df.groupby('Cluster')['Lat','Long'].mean()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1306, in mean
    return self._cython_agg_general('mean', **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 3974, in _cython_agg_general
    how, alt=alt, numeric_only=numeric_only, min_count=min_count)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 4046, in _cython_agg_blocks
    raise DataError('No numeric types to aggregate')
pandas.core.base.DataError: No numeric types to aggregate
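"No numeric types to aggregate" means that, at the time of the groupby, neither Lat nor Long had a numeric dtype. One likely cause, and it is only an assumption because the first line of SortedCluster.csv isn't shown, is that the file begins with a header line. Passing names= alone then turns that header into a data row, so every column is read as text. The sketch below reproduces this with an in-memory string (CSV_TEXT is a hypothetical stand-in for the file) and shows that header=0 combined with names= replaces the existing header instead.

```python
import io
import pandas as pd

# Hypothetical file contents: ASSUMES the real SortedCluster.csv starts with a
# header line, which the question does not show.
CSV_TEXT = """\
SchoolCode,Lat,Long,Cluster
88741,42.84286022,16.41829224,1
88797,42.78081536,16.40743455,1
"""

cols = ['SchoolCode', 'Lat', 'Long', 'Cluster']

# names= alone: the header line becomes the first data row, so each column
# mixes the label text with numbers and is parsed as strings, not floats.
bad = pd.read_csv(io.StringIO(CSV_TEXT), names=cols)
print(bad.dtypes)

# header=0 together with names=: line 0 is treated as the header and replaced
# by the given names, so Lat and Long are parsed as float64.
good = pd.read_csv(io.StringIO(CSV_TEXT), header=0, names=cols)
print(good.dtypes)
```

If the real file has no header, `print(df.dtypes)` right after `read_csv` is still the quickest way to see which column failed to parse.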

1 answer:

Answer 0 (score: 0)

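The error means Lat and Long were not read as numeric columns, so I think you first need to convert the values to numbers if necessary: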

df[['Lat','Long']] = df[['Lat','Long']].apply(pd.to_numeric, errors='coerce')

Then aggregate by group with mean.
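An end-to-end sketch of the whole task, run on the six sample rows from the question (an io.StringIO stand-in; pass the real path to read_csv instead). Two points are assumptions rather than part of the answer. First, "keep the same format" is read as "eight decimal places, like the input", hence float_format='%.8f'. Second, the columns are selected with a list (`[['Lat', 'Long']]`), because recent pandas versions reject the tuple form `['Lat','Long']` used in the question. `as_index=False` keeps Cluster as an ordinary column, and `index=False` stops to_csv from writing the row index, which gives the requested Lat,Long,Cluster layout without SchoolCode.

```python
import io
import pandas as pd

# Sample rows from the question, standing in for SortedCluster.csv.
CSV_TEXT = """\
88741,42.84286022,16.41829224,1
88797,42.78081536,16.40743455,1
88797,42.78081536,16.21153455,1
88823,42.51512511,16.43304948,2
88885,42.88204193,16.12412548,2
87227,42.88204193,16.64223948,3
"""

df = pd.read_csv(io.StringIO(CSV_TEXT),
                 names=['SchoolCode', 'Lat', 'Long', 'Cluster'])

# Force numeric dtypes; unparseable values become NaN and mean() skips them.
df[['Lat', 'Long']] = df[['Lat', 'Long']].apply(pd.to_numeric, errors='coerce')

# Average per cluster; the list selection drops SchoolCode from the result.
out = df.groupby('Cluster', as_index=False)[['Lat', 'Long']].mean()

# Reorder to the requested Lat,Long,Cluster column order.
out = out[['Lat', 'Long', 'Cluster']]

# ASSUMPTION: eight decimals matches the input format. With no path given,
# to_csv returns the CSV text; pass 'AverageOutput.csv' to write a file.
csv_text = out.to_csv(index=False, float_format='%.8f')
print(csv_text)
```

For the sample rows this prints:

```
Lat,Long,Cluster
42.80149698,16.34575378,1
42.69858352,16.27858748,2
42.88204193,16.64223948,3
```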