Calculating 30-minute averages and seasonal averages in Python?

Time: 2016-07-15 16:12:21

Tags: python excel csv

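I want to write a script that calculates 30-minute averages of direct and diffuse radiation (i.e. 12:00, 12:30, 1:00, ...). After computing the 30-minute averages, I need to split the data into seasons: DJF, MAM, JJA and SON. Values equal to -99999 should be omitted.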
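For reference, here is a sketch of how pandas labels such half-hour bins, using made-up 5-minute values on a `DatetimeIndex`. By default `resample('30min')` bins are closed on the left and labelled by their start, so the 12:00 value averages the readings from 12:00 through 12:25, and the 12:30 value averages 12:30 through 12:55.

```python
import pandas as pd

# Made-up 5-minute readings from 12:00 to 12:55 (values 0..11).
idx = pd.date_range("2004-04-01 12:00", periods=12, freq="5min")
s = pd.Series(range(12), index=idx, dtype=float)

# Each 30-minute bin is labelled by its left edge:
# 12:00 covers 12:00-12:25, 12:30 covers 12:30-12:55.
half_hourly = s.resample("30min").mean()
print(half_hourly)
```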

Here are the first few rows of the data. It is a very large file covering several full years.

DATE      month day year EST  Direct NIP  Diffuse PSP (sband corr)
4/1/2004  4     1   2004 5:55 0.01967     1.5687
4/1/2004  4     1   2004 6:00 0.2295      5.3946
4/1/2004  4     1   2004 6:05 0.59015     13.0295
4/1/2004  4     1   2004 6:10 0.78686     23.0043
4/1/2004  4     1   2004 6:15 0.60982     20.827
4/1/2004  4     1   2004 6:20 0.80655     23.199
4/1/2004  4     1   2004 6:25 0.81309     26.951
4/1/2004  4     1   2004 6:30 0.77375     31.0062
4/1/2004  4     1   2004 6:35 0.55081     35.04
4/1/2004  4     1   2004 6:40 0.24262     41.1042
4/1/2004  4     1   2004 6:45 0.39999     46.6218
4/1/2004  4     1   2004 6:50 0.26229     52.7591
4/1/2004  4     1   2004 6:55 0.26885     67.9498

Any ideas on how to approach this? Thanks for your help.

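Edit: Here is my code so far. It plots all of the radiation over the whole record. Please note that it is amateurish, since I am teaching myself to code. Thanks.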

import csv
import openpyxl
import matplotlib as mpl
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
from datetime import datetime

x = [datetime(year = 2004, month = 4, day = 1),
     datetime(year = 2014, month = 11, day = 18)]
y = []
x2 = []
y2 = []

with open('tenyeardata.csv', 'r') as csvfile:
    data = csv.reader(csvfile)
    next(data)  # skip the header row

    for row in data:
        x.append(int(row[1]))     # month
        y.append(float(row[5]))   # Direct NIP
        x2.append(int(row[3]))    # year
        y2.append(float(row[6]))  # Diffuse PSP (sband corr)


fig = plt.figure()

ax1 = fig.add_subplot(111)

ax1.set_title("North Carolina Radiation (Direct and Diffuse)")    
ax1.set_xlabel('time (hours)')
ax1.set_ylabel('SW (W m-2)')
print(x[:10])
print(y[:10])
ax1.plot(y, c='r', label='Direct')
ax1.plot(y2, c='b', label = 'Diffuse')
ax1.axis([-1, 568217, 0, 1100])
leg = ax1.legend()
plt.axis([-1, 568217, 0, 1100])
plt.show()
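Whatever averaging method is used, the -99999 fill values have to be removed first, or they will drag every mean far below zero. Here is a minimal sketch, assuming the data are loaded into pandas with the column names from the header row (the values are made up): convert the sentinel to NaN, which pandas' `mean()` skips by default.

```python
import numpy as np
import pandas as pd

# Made-up readings; -99999 is the file's fill value for a missing measurement.
df = pd.DataFrame({
    "Direct NIP": [0.2, -99999.0, 0.4],
    "Diffuse PSP (sband corr)": [5.0, 7.0, -99999.0],
})

# Replace the sentinel with NaN so mean(), groupby().mean(), etc. ignore it.
clean = df.replace(-99999, np.nan)

print(df["Direct NIP"].mean())     # sentinel included: a large negative number
print(clean["Direct NIP"].mean())  # sentinel excluded: mean of 0.2 and 0.4
```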

1 answer:

Answer 0 (score: 0)

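Consider computing the fields you need for the plot: the half-hour date/time and the season. Then run a groupby() mean aggregation on them for plotting: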

from io import StringIO
import pandas as pd
import numpy as np
import datetime

data = '''DATE,month,day,year,EST,Direct NIP,Diffuse PSP (sband corr)
4/1/2004,4,1,2004,5:55,0.01967,1.5687
4/1/2004,4,1,2004,6:00,0.2295,5.3946
4/1/2004,4,1,2004,6:05,0.59015,13.0295
4/1/2004,4,1,2004,6:10,0.78686,23.0043
4/1/2004,4,1,2004,6:15,0.60982,20.827
4/1/2004,4,1,2004,6:20,0.80655,23.199
4/1/2004,4,1,2004,6:25,0.81309,26.951
4/1/2004,4,1,2004,6:30,0.77375,31.0062
4/1/2004,4,1,2004,6:35,0.55081,35.04
4/1/2004,4,1,2004,6:40,0.24262,41.1042
4/1/2004,4,1,2004,6:45,0.39999,46.6218
4/1/2004,4,1,2004,6:50,0.26229,52.7591
4/1/2004,4,1,2004,6:55,0.26885,67.9498'''

df = pd.read_csv(StringIO(data))

# ADDED DATE/TIME FIELDS
df['DATE'] = pd.to_datetime(df['DATE'] + ' ' + df['EST'], format='%m/%d/%Y %H:%M')
df['MONTH'] = df['DATE'].dt.month

# EVERY HALF HOUR BLOCKS
df['HALF_HOUR_DATE'] = df['DATE'].apply(lambda dt: datetime.datetime(dt.year, dt.month, dt.day, dt.hour, 30*(dt.minute // 30)))
df['HALF_HOUR_TIME'] = df['HALF_HOUR_DATE'].apply(lambda x: x.strftime('%H:%M'))

# SEASON CONDITIONAL CALCULATION
df['SEASON'] = np.where(df['MONTH'].isin([12,1,2]), 'DJF',
                        np.where(df['MONTH'].isin([3,4,5]), 'MAM',
                                 np.where(df['MONTH'].isin([6,7,8]), 'JJA',
                                          np.where(df['MONTH'].isin([9,10,11]), 'SON', None))))

# AGGREGATE DATA           
aggdf = df[['SEASON', 'HALF_HOUR_DATE', 'Direct NIP', 'Diffuse PSP (sband corr)']].\
               groupby(['SEASON','HALF_HOUR_DATE']).mean()

Output

Updated dataframe

#                   DATE  month  day  year   EST  Direct NIP  Diffuse PSP (sband corr)  MONTH      HALF_HOUR_DATE HALF_HOUR_TIME SEASON
# 0  2004-04-01 05:55:00      4    1  2004  5:55     0.01967                    1.5687      4 2004-04-01 05:30:00          05:30    MAM
# 1  2004-04-01 06:00:00      4    1  2004  6:00     0.22950                    5.3946      4 2004-04-01 06:00:00          06:00    MAM
# 2  2004-04-01 06:05:00      4    1  2004  6:05     0.59015                   13.0295      4 2004-04-01 06:00:00          06:00    MAM
# 3  2004-04-01 06:10:00      4    1  2004  6:10     0.78686                   23.0043      4 2004-04-01 06:00:00          06:00    MAM
# 4  2004-04-01 06:15:00      4    1  2004  6:15     0.60982                   20.8270      4 2004-04-01 06:00:00          06:00    MAM
# 5  2004-04-01 06:20:00      4    1  2004  6:20     0.80655                   23.1990      4 2004-04-01 06:00:00          06:00    MAM
# 6  2004-04-01 06:25:00      4    1  2004  6:25     0.81309                   26.9510      4 2004-04-01 06:00:00          06:00    MAM
# 7  2004-04-01 06:30:00      4    1  2004  6:30     0.77375                   31.0062      4 2004-04-01 06:30:00          06:30    MAM
# 8  2004-04-01 06:35:00      4    1  2004  6:35     0.55081                   35.0400      4 2004-04-01 06:30:00          06:30    MAM
# 9  2004-04-01 06:40:00      4    1  2004  6:40     0.24262                   41.1042      4 2004-04-01 06:30:00          06:30    MAM
# 10 2004-04-01 06:45:00      4    1  2004  6:45     0.39999                   46.6218      4 2004-04-01 06:30:00          06:30    MAM
# 11 2004-04-01 06:50:00      4    1  2004  6:50     0.26229                   52.7591      4 2004-04-01 06:30:00          06:30    MAM
# 12 2004-04-01 06:55:00      4    1  2004  6:55     0.26885                   67.9498      4 2004-04-01 06:30:00          06:30    MAM

Aggregated groupby dataframe

#                             Direct NIP  Diffuse PSP (sband corr)
# SEASON HALF_HOUR_DATE                                           
# MAM    2004-04-01 05:30:00    0.019670                  1.568700
#        2004-04-01 06:00:00    0.639328                 18.734233
#        2004-04-01 06:30:00    0.416385                 45.746850
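The grouped result above still has one row per calendar half hour. If "seasonal average" is meant as one average daily cycle per season (an assumption, since the question does not spell this out), a second grouping by season and time of day gives it. The sketch below uses made-up values and the answer's column names. It substitutes `Series.dt.floor('30min')` for the `apply` lambda and a month-to-season dict for the nested `np.where`. It first averages within each 30-minute block and then across days, so a block with more valid readings does not outweigh the others.

```python
import pandas as pd

# Made-up 5-minute readings: two spring mornings and one winter morning.
df = pd.DataFrame({
    "DATE": pd.to_datetime([
        "2004-04-01 06:00", "2004-04-01 06:05", "2004-04-01 06:30",
        "2005-05-02 06:10", "2005-05-02 06:35",
        "2004-12-01 06:00",
    ]),
    "Direct NIP": [0.2, 0.4, 0.5, 0.9, 0.1, 0.3],
})

# Step 1: 30-minute means per calendar half hour (block start via floor).
half = (df.assign(HALF_HOUR_DATE=df["DATE"].dt.floor("30min"))
          .groupby("HALF_HOUR_DATE", as_index=False)["Direct NIP"].mean())

# Meteorological seasons: Dec-Feb, Mar-May, Jun-Aug, Sep-Nov.
season_of_month = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
                   6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}
half["SEASON"] = half["HALF_HOUR_DATE"].dt.month.map(season_of_month)
half["HALF_HOUR_TIME"] = half["HALF_HOUR_DATE"].dt.strftime("%H:%M")

# Step 2: average the half-hour means over all days in each season,
# separately for each time of day.
diurnal = half.groupby(["SEASON", "HALF_HOUR_TIME"])["Direct NIP"].mean()

# Rows: time of day, columns: season (one plot line per season).
table = diurnal.unstack("SEASON")
print(table)
```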