Aggregating daily data to compute monthly averages

Asked: 2012-03-30 20:25:14

Tags: python

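Hello, I am new to Python and I am stuck on what I imagine is a fairly basic task.

I have several (> 50) csv files containing daily snow depth data. I want to iterate over the csv files and compute the mean snow depth for each month. Sample data: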

Date,SD
1/1/2000,36
1/2/2000,36
1/3/2000,38
1/4/2000,40
2/1/2000,48
2/2/2000,48

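In other words, I want to compute the monthly mean snow depth and write the output to a new csv file. I have been able to adapt various code examples to my data, but I get a KeyError when using Date as the key into my dictionary.

Any suggestions?

Here is the code I have so far: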

from __future__ import division
import csv
from collections  import defaultdict

def default_factory():
    return [0, None, None, 0]

reader = csv.DictReader(open(r'C:\SandBox\VALIDATION\TestTable.csv'))

dates = defaultdict(default_factory)
for row in reader:
    sd = int(row["SD"])
    dates[row["Dates"]][0] += sd
    max = dates[row["Dates"]][1]
    dates[row["Dates"]][1] = amount if max is None else amount if amount > max else max
    min = dates[row["Date"]][2]
    dates[row["Dates"]][2] = amount if min is None else amount if amount < min else min
    dates[row["Dates"]][3] += 1

for date in dates:
    dates[date][3] = dates[date][0]/dates[date][3]

writer = csv.writer(open(r'C:\SandBox\VALIDATION\TestAvg.csv', 'w', newline = ''))
writer.writerow(["Date", "SD", "max", "min", "mean"])
writer.writerows([date] + dates[date] for date in dates)

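Edit: just to clarify, I am trying to get overall monthly averages, i.e. a January average, a February average, and so on, not averages for individual dates.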

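3 answers:

Answer 0: (score: 0)

In some places you use Dates as the column name (e.g. max = dates[row["Dates"]][1]) and in others Date (e.g. min = dates[row["Date"]][2]). From your sample data it looks like Date is the column name. If you use the same name everywhere, it should work, e.g.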

s="""Date,Snowdepth
1/1/2000,36
1/2/2000,36
1/3/2000,38
1/4/2000,40
2/1/2000,48
2/2/2000,48"""

import io
import csv
# Python 3: StringIO lives in the io module and print is a function
reader = csv.DictReader(io.StringIO(s))

for row in reader:
    print(row['Date'])

Output:

1/1/2000
1/2/2000
1/3/2000
1/4/2000
2/1/2000
2/2/2000
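
A quick way to confirm which column names the reader actually sees is to look at its fieldnames attribute before indexing any row. The following is only a minimal sketch with the question's sample header inlined; the variable names (sample, columns, missing) are made up for illustration.

```python
import csv
import io

# Sample header and rows from the question, inlined so the sketch is self-contained.
sample = """Date,SD
1/1/2000,36
1/2/2000,36
"""

reader = csv.DictReader(io.StringIO(sample))

# fieldnames is taken from the header row; any other key raises KeyError.
columns = reader.fieldnames
print(columns)

first = next(reader)
try:
    first["Dates"]  # misspelled column name, as in the question's code
except KeyError as exc:
    missing = exc.args[0]
    print("no such column:", missing)
```

If the printed list does not contain the name you are indexing with, that mismatch is the likely source of the KeyError.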

Answer 1: (score: 0)

from __future__ import division
import csv
from collections  import defaultdict

def default_factory():
    return [0, None, None, 0]

reader = csv.DictReader(open(r'snow_data.csv'))

dates = defaultdict(default_factory)

for row in reader:
    amount = int(row["Snowdepth"])
    dates[row["Date"]][0] += amount
    max = dates[row["Date"]][1]
    dates[row["Date"]][1] = amount if max is None else amount if amount > max else max
    min = dates[row["Date"]][2]
    dates[row["Date"]][2] = amount if min is None else amount if amount < min else min
    dates[row["Date"]][3] += 1


for date in dates:
    dates[date][3] = dates[date][0]/dates[date][3]

writer = csv.writer(open(r'TestAvg.csv', 'w'))
writer.writerow(["Date", "Snowdepth", "max", "min", "mean"])
writer.writerows([date] + dates[date] for date in dates)

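I modified the code to use Date and Snowdepth everywhere, since that is what your sample csv provides. Also, you used a variable amount where you meant sd, so amount was otherwise undefined. I used amount everywhere.

Unless you have multiple entries for a single date, this will not give very exciting results.

For example, here is the output for the sample csv: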

Date,Snowdepth,max,min,mean
1/3/2000,38,38,38,38.0
2/2/2000,48,48,48,48.0
2/1/2000,48,48,48,48.0
1/4/2000,40,40,40,40.0
1/1/2000,36,36,36,36.0
1/2/2000,36,36,36,36.0
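
To get one figure per month rather than per date, the dictionary key has to be the month instead of the full date string. A minimal sketch of that idea follows. It assumes the dates are month/day/year (the sample is ambiguous on this point) and keys on (year, month) so that January 2000 and January 2001 stay separate; the names totals and monthly_mean are illustrative, not from the question.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Sample data inlined so the sketch runs on its own.
sample = """Date,Snowdepth
1/1/2000,36
1/2/2000,36
1/3/2000,38
1/4/2000,40
2/1/2000,48
2/2/2000,48
"""

# Each (year, month) key maps to a running [sum, count] pair.
totals = defaultdict(lambda: [0, 0])
for row in csv.DictReader(io.StringIO(sample)):
    # Assumption: dates are month/day/year.
    day = datetime.strptime(row["Date"], "%m/%d/%Y")
    key = (day.year, day.month)
    totals[key][0] += int(row["Snowdepth"])
    totals[key][1] += 1

# Divide each sum by its count to get the monthly mean.
monthly_mean = {key: total / count for key, (total, count) in totals.items()}
print(monthly_mean)
```

With the sample data, the four January readings are averaged together and the two February readings are averaged together.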

Answer 2: (score: 0)

You may want to use a dictionary to make the code more readable.

from __future__ import division
import csv
from collections  import defaultdict

def default_factory():
    return {"sum": 0, "max": None, "min": None, "count": 0}

reader = csv.DictReader(open(r'sd.csv'))

dates = defaultdict(default_factory)
rows = []
for row in reader:
    date = row["Date"]
    sd = int(row["Snowdepth"])
    rows.append([date, sd])
    month = date.split("/")[0]
    r = dates[month]
    r["sum"] += sd
    max = r["max"]
    r["max"] = sd if max is None else sd if sd > max else max
    min = r["min"]
    r["min"] = sd if min is None else sd if sd < min else min
    r["count"] += 1

for date in dates:
    r = dates[date]
    r["avg"] = r["sum"]/r["count"]

writer = csv.writer(open(r'TestAvg.csv', 'w'))
writer.writerow(["Date", "SD", "max", "min", "mean"])
for row in rows:
    r = dates[row[0].split("/")[0]]
    writer.writerow(row + [r["max"], r["min"], r["avg"]])
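
The code above repeats the monthly statistics on every daily row. If you would rather have one output row per month, and want to cover all of your csv files in one pass, something along these lines might work. It is only a sketch under assumptions: dates are month/day/year, months are pooled across years (as in the code above), and the helper names monthly_stats and write_monthly, the file name station1.csv, and the monthly output folder are all made up for the demo.

```python
import csv
import glob
import os
import tempfile
from collections import defaultdict

def monthly_stats(path):
    """Return {month: {"max", "min", "mean"}} for one csv file."""
    values = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Assumption: month/day/year dates, so the first field is the month.
            month = int(row["Date"].split("/")[0])
            values[month].append(int(row["Snowdepth"]))
    return {m: {"max": max(v), "min": min(v), "mean": sum(v) / len(v)}
            for m, v in values.items()}

def write_monthly(in_path, out_path):
    """Write one row per month for the given input file."""
    stats = monthly_stats(in_path)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Month", "max", "min", "mean"])
        for month in sorted(stats):
            s = stats[month]
            writer.writerow([month, s["max"], s["min"], s["mean"]])

# Demo in a throwaway directory; replace with your own input folder.
workdir = tempfile.mkdtemp()
outdir = os.path.join(workdir, "monthly")
os.makedirs(outdir)
with open(os.path.join(workdir, "station1.csv"), "w", newline="") as f:
    f.write("Date,Snowdepth\n1/1/2000,36\n1/2/2000,36\n1/3/2000,38\n"
            "1/4/2000,40\n2/1/2000,48\n2/2/2000,48\n")

# Process every csv in the folder, writing results to a separate folder
# so that output files are never picked up as input.
for in_path in glob.glob(os.path.join(workdir, "*.csv")):
    write_monthly(in_path, os.path.join(outdir, os.path.basename(in_path)))
```

Opening the files with newline="" follows the csv module's recommendation and avoids blank lines between rows on Windows.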