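I want to write a script that computes 30-minute averages of direct and diffuse radiation (i.e. 12:00, 12:30, 1:00, ...). After computing the 30-minute averages, I need to split the data into seasons: (DJF), (MAM), (JJA), (SON). Values equal to -99999 should be omitted.
Here are the first few rows of the data. It is a very large file covering many years.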
DATE month day year EST Direct NIP Diffuse PSP (sband corr)
4/1/2004 4 1 2004 5:55 0.01967 1.5687
4/1/2004 4 1 2004 6:00 0.2295 5.3946
4/1/2004 4 1 2004 6:05 0.59015 13.0295
4/1/2004 4 1 2004 6:10 0.78686 23.0043
4/1/2004 4 1 2004 6:15 0.60982 20.827
4/1/2004 4 1 2004 6:20 0.80655 23.199
4/1/2004 4 1 2004 6:25 0.81309 26.951
4/1/2004 4 1 2004 6:30 0.77375 31.0062
4/1/2004 4 1 2004 6:35 0.55081 35.04
4/1/2004 4 1 2004 6:40 0.24262 41.1042
4/1/2004 4 1 2004 6:45 0.39999 46.6218
4/1/2004 4 1 2004 6:50 0.26229 52.7591
4/1/2004 4 1 2004 6:55 0.26885 67.9498
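Any ideas on how to approach this? Thanks for your help.
EDIT: Here is my code so far. It just plots all of the radiation over the whole record. Note that it is amateur, since I am teaching myself how to code. Thanks.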
import csv
import openpyxl
import matplotlib as mpl
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
from datetime import datetime
x = [datetime(year = 2004, month = 4, day = 1),
datetime(year = 2014, month = 11, day = 18)]
y = []
x2 = []
y2 = []
with open('tenyeardata.csv', 'r') as csvfile:
    data = csv.reader(csvfile)
    firstline = True
    for row in data:
        if firstline: #skip first line
            firstline = False
            continue
        x.append(int(row[1]))
        y.append(float(row[5]))
        x2.append(int(row[3]))
        y2.append(float(row[6]))
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("North Carolina Radiation (Direct and Diffuse)")
ax1.set_xlabel('time (hours)')
ax1.set_ylabel('SW (W m-2)')
print x[:10]
print y[:10]
ax1.plot(y, c='r', label='Direct')
ax1.plot(y2, c='b', label = 'Diffuse')
ax1.axis([-1, 568217, 0, 1100])
leg = ax1.legend()
plt.axis([-1, 568217, 0, 1100])
plt.show()
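Answer 0 (score: 0)
Consider calculating the dimensions you need for plotting: the date/time floored to every half hour, and the season. Then run a groupby() mean aggregation for plotting: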
from io import StringIO
import pandas as pd
import numpy as np
import time, datetime
data = '''DATE,month,day,year,EST,Direct NIP,Diffuse PSP (sband corr)
4/1/2004,4,1,2004,5:55,0.01967,1.5687
4/1/2004,4,1,2004,6:00,0.2295,5.3946
4/1/2004,4,1,2004,6:05,0.59015,13.0295
4/1/2004,4,1,2004,6:10,0.78686,23.0043
4/1/2004,4,1,2004,6:15,0.60982,20.827
4/1/2004,4,1,2004,6:20,0.80655,23.199
4/1/2004,4,1,2004,6:25,0.81309,26.951
4/1/2004,4,1,2004,6:30,0.77375,31.0062
4/1/2004,4,1,2004,6:35,0.55081,35.04
4/1/2004,4,1,2004,6:40,0.24262,41.1042
4/1/2004,4,1,2004,6:45,0.39999,46.6218
4/1/2004,4,1,2004,6:50,0.26229,52.7591
4/1/2004,4,1,2004,6:55,0.26885,67.9498'''
df = pd.read_csv(StringIO(data))
# ADDED DATE/TIME FIELDS
df['DATE'] = pd.to_datetime(df['DATE'] + ' ' + df['EST'], format='%m/%d/%Y %H:%M')
df['MONTH'] = df['DATE'].dt.month
# EVERY HALF HOUR BLOCKS
df['HALF_HOUR_DATE'] = df['DATE'].apply(lambda dt: datetime.datetime(dt.year, dt.month, dt.day, dt.hour, 30*(dt.minute // 30)))
df['HALF_HOUR_TIME'] = df['HALF_HOUR_DATE'].apply(lambda x: x.strftime('%H:%M'))
# SEASON CONDITIONAL CALCULATION
df['SEASON'] = np.where(df['MONTH'].isin([12,1,2]), 'DJF',
np.where(df['MONTH'].isin([3,4,5]), 'MAM',
np.where(df['MONTH'].isin([6,7,8]), 'JJA',
np.where(df['MONTH'].isin([9,10,11]), 'SON', None))))
# AGGREGATE DATA
aggdf = df[['SEASON', 'HALF_HOUR_DATE', 'Direct NIP', 'Diffuse PSP (sband corr)']].\
groupby(['SEASON','HALF_HOUR_DATE']).mean()
Output
Updated dataframe
# DATE month day year EST Direct NIP Diffuse PSP (sband corr) MONTH HALF_HOUR_DATE HALF_HOUR_TIME SEASON
# 0 2004-04-01 05:55:00 4 1 2004 5:55 0.01967 1.5687 4 2004-04-01 05:30:00 05:30 MAM
# 1 2004-04-01 06:00:00 4 1 2004 6:00 0.22950 5.3946 4 2004-04-01 06:00:00 06:00 MAM
# 2 2004-04-01 06:05:00 4 1 2004 6:05 0.59015 13.0295 4 2004-04-01 06:00:00 06:00 MAM
# 3 2004-04-01 06:10:00 4 1 2004 6:10 0.78686 23.0043 4 2004-04-01 06:00:00 06:00 MAM
# 4 2004-04-01 06:15:00 4 1 2004 6:15 0.60982 20.8270 4 2004-04-01 06:00:00 06:00 MAM
# 5 2004-04-01 06:20:00 4 1 2004 6:20 0.80655 23.1990 4 2004-04-01 06:00:00 06:00 MAM
# 6 2004-04-01 06:25:00 4 1 2004 6:25 0.81309 26.9510 4 2004-04-01 06:00:00 06:00 MAM
# 7 2004-04-01 06:30:00 4 1 2004 6:30 0.77375 31.0062 4 2004-04-01 06:30:00 06:30 MAM
# 8 2004-04-01 06:35:00 4 1 2004 6:35 0.55081 35.0400 4 2004-04-01 06:30:00 06:30 MAM
# 9 2004-04-01 06:40:00 4 1 2004 6:40 0.24262 41.1042 4 2004-04-01 06:30:00 06:30 MAM
# 10 2004-04-01 06:45:00 4 1 2004 6:45 0.39999 46.6218 4 2004-04-01 06:30:00 06:30 MAM
# 11 2004-04-01 06:50:00 4 1 2004 6:50 0.26229 52.7591 4 2004-04-01 06:30:00 06:30 MAM
# 12 2004-04-01 06:55:00 4 1 2004 6:55 0.26885 67.9498 4 2004-04-01 06:30:00 06:30 MAM
Aggregated groupby dataframe
#                             Direct NIP  Diffuse PSP (sband corr)
# SEASON HALF_HOUR_DATE
# MAM    2004-04-01 05:30:00    0.019670                  1.568700
#        2004-04-01 06:00:00    0.639328                 18.734233
#        2004-04-01 06:30:00    0.416385                 45.746850
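The code above does not yet drop the -99999 values mentioned in the question, and it keeps every calendar day as its own group. Below is a minimal sketch of one way to handle both, assuming -99999 marks missing readings in the two radiation columns and that the goal is one mean per season and half-hour of the day, pooled over all days and years. The sample rows, including the -99999 entries, are made up for illustration, and the names `sample`, `value_cols`, `season_of_month` and `seasonal` are hypothetical.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Made-up rows in the same layout as the question's file; the -99999
# entries are invented here purely to demonstrate the masking step.
sample = '''DATE,month,day,year,EST,Direct NIP,Diffuse PSP (sband corr)
4/1/2004,4,1,2004,6:00,1.0,10.0
4/1/2004,4,1,2004,6:05,-99999,20.0
4/1/2004,4,1,2004,6:10,3.0,30.0
4/1/2004,4,1,2004,6:30,6.0,-99999
4/2/2004,4,2,2004,6:00,5.0,40.0
1/5/2005,1,5,2005,6:00,7.0,70.0'''

value_cols = ['Direct NIP', 'Diffuse PSP (sband corr)']

df = pd.read_csv(StringIO(sample))

# Replace the missing-data marker with NaN; mean() skips NaN by default.
df[value_cols] = df[value_cols].replace(-99999, np.nan)

# Build a full timestamp, then floor it to the start of its 30-minute block
# and keep only the time of day.
df['DATE'] = pd.to_datetime(df['DATE'] + ' ' + df['EST'], format='%m/%d/%Y %H:%M')
df['HALF_HOUR_TIME'] = df['DATE'].dt.floor('30min').dt.strftime('%H:%M')

# Map each month number to its meteorological season.
season_of_month = {12: 'DJF', 1: 'DJF', 2: 'DJF',
                   3: 'MAM', 4: 'MAM', 5: 'MAM',
                   6: 'JJA', 7: 'JJA', 8: 'JJA',
                   9: 'SON', 10: 'SON', 11: 'SON'}
df['SEASON'] = df['DATE'].dt.month.map(season_of_month)

# One mean per season and half-hour of the day, pooled over all days and years.
seasonal = df.groupby(['SEASON', 'HALF_HOUR_TIME'])[value_cols].mean()
print(seasonal)
```

A block whose readings are all -99999 for a column comes out as NaN for that column rather than being dropped. If per-day half-hour means are wanted instead, grouping on `HALF_HOUR_DATE` as in the answer's code keeps each day separate.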