Extracting a mean value from a masked 2-D array

Date: 2019-03-28 04:31:30

Tags: python numpy dataset netcdf

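I want to extract a 12º x 12º region from a latitude/longitude/conductivity grid and compute the mean conductivity of that region. I can successfully apply a mask to the lat/long grids, but the same approach does not work for the conductivity grid.

I have tried masking with for loops, and I am now using the numpy.ma.masked_where function. I can plot the masked result successfully (i.e. when I plot the global map I can see that the region has been extracted), but the computed mean conductivity corresponds to the unmasked data.

I made a simple example of what I want to do: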

import numpy as np
import numpy.ma as ma
import matplotlib.pyplot as plt

x = np.linspace(1, 10, 10)
y = np.linspace(1, 10, 10)

xm = np.median(x)
ym = np.median(y)

x = ma.masked_outside(x, xm-3, xm+3)
y = ma.masked_outside(y, ym-3, ym+3)
x = np.ma.filled(x.astype(float), np.nan)
y = np.ma.filled(y.astype(float), np.nan)

x, y = np.meshgrid(x, y)

z = 2*x + 3*y

z = np.ma.masked_where(np.ma.getmask(x), z)

plt.pcolor(x, y, z)
plt.colorbar()

print('Maximum z:', np.nanmax(z))
print('Minimum z:', np.nanmin(z))
print('Mean z:', np.nanmean(z))

My code is:

# Imports presumably used by this snippet (not shown in the original post)
import glob

import numpy as np
import numpy.ma as ma
from netCDF4 import Dataset
from mpl_toolkits.basemap import shiftgrid


def Observatory_Cond_Plot(filename, ndcfile, obslon, obslat, obsname, date):

    files = np.array(sorted(glob.glob(filename)))  # sorted txt files containing the 2-D conductivity arrays

    filenames = ['January', 'February', 'March', 'April', 'May', 'June',
                 'July', 'August', 'September', 'October', 'November', 'December']  # used for naming output plots and files

    for i, fx in zip(filenames, files):

        ndcdata = Dataset(ndcfile)  # load netcdf file

        lat = ndcdata.variables['latitude'][:]  # import latitude data

        long = ndcdata.variables['longitude'][:]  # import longitude data

        cond = np.genfromtxt(fx)

        cond, long = shiftgrid(180., cond, long, start=False)

        # Mask lat and long arrays and fill masks with nan values

        lat = ma.masked_outside(lat, obslat-12, obslat+12)
        long = ma.masked_outside(long, obslon-12, obslon+12)
        lat = np.ma.filled(lat.astype(float), np.nan)
        long = np.ma.filled(long.astype(float), np.nan)

        longrid, latgrid = np.meshgrid(long, lat)

        cond = np.ma.masked_where(np.ma.getmask(longrid), cond)
        cond = np.ma.filled(cond.astype(float), np.nan)

        condmean = np.nanmean(cond)

        print('Mean Conductivity is:', condmean)
        print('Minimum conductivity is:', np.nanmin(cond))
        print('Maximum conductivity is:', np.nanmax(cond))

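After this, the rest of the code only plots the data.

My results are:

Mean Conductivity is: 3.5241649673154587
Minimum conductivity is: 0.497494528344129
Maximum conductivity is: 5.997825822915771

However, it is clear from my plot that the conductivity in this region should not be below 3.2 S/m. In addition, printing the latitude, longitude and conductivity grids gives:

long: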

[[nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]]

lat:

[[nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]]

cond:

[[       nan        nan        nan ...        nan        nan        nan]
 [       nan        nan        nan ...        nan        nan        nan]
 [2.86749432 2.86743283 2.86746221 ... 2.87797247 2.87265508 2.87239185]
 ...
 [       nan        nan        nan ...        nan        nan        nan]
 [       nan        nan        nan ...        nan        nan        nan]
 [       nan        nan        nan ...        nan        nan        nan]]

It seems the mask is not working correctly.

2 Answers:

Answer 0 (score: 1)

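The problem is that the call to np.ma.filled unmasks the long variable: it returns a plain array with NaN in place of the masked entries, so there is no mask left for np.ma.getmask to find. In addition, np.meshgrid does not preserve masks.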
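To make the first point concrete, here is a minimal sketch on a 1-D array (the array values and the limits 3 and 8 are made up for illustration). It shows that the result of np.ma.filled is a plain ndarray, that np.ma.getmask then returns nomask, and that np.ma.masked_where with that "mask" hides nothing.

```python
import numpy as np

x = np.linspace(1, 10, 10)
xm = np.ma.masked_outside(x, 3, 8)      # MaskedArray: 1, 2, 9 and 10 are masked

# Filling replaces masked entries with NaN and returns a plain ndarray
filled = np.ma.filled(xm.astype(float), np.nan)
print(type(filled).__name__)
print(np.ma.getmask(filled) is np.ma.nomask)   # no mask left to reuse

z = np.arange(10.0)
z_bad = np.ma.masked_where(np.ma.getmask(filled), z)  # uses nomask: nothing is masked
z_ok = np.ma.masked_where(np.ma.getmask(xm), z)       # uses the real mask

# count() is the number of unmasked elements
print(z_bad.count(), z_ok.count())
```

This matches the symptom in the question: the statistics are computed over every grid cell because the conductivity array never actually gets a mask.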

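You can either save the mask right after creating it, or build the grid from the masks. I modified your example accordingly. As you can see, all versions of the numpy mean take the mask into account. I had to adjust the upper limit (changed to 2) because otherwise the means were already equal.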

import numpy as np

x = np.linspace(1, 10, 10)
y = np.linspace(1, 10, 10)

xm = np.median(x)
ym = np.median(y)

# Note: changed limits
x = np.ma.masked_outside(x, xm-3, xm+2)
y = np.ma.masked_outside(y, ym-3, ym+2)
xmask = np.ma.getmask(x)
ymask = np.ma.getmask(y)

x, y = np.meshgrid(x, y)
xmask, ymask = np.meshgrid(xmask, ymask)

z = 2*x + 3*y


z1 = np.ma.masked_where(np.ma.getmask(x), z)
z2 = np.ma.masked_where(xmask | ymask, z)
print(z1)
print(z2)

print('Type z1, z2:', type(z1), type(z2))
print('Maximum z1, z2:', np.nanmax(z1), np.nanmax(z2))
print('Minimum z1, z2:', np.nanmin(z1), np.nanmin(z2))
print('Mean z1, z2:', np.mean(z1), np.mean(z2) )
print('nan Mean z1, z2:', np.nanmean(z1), np.nanmean(z2) )
print('masked Mean z1, z2:', z1.mean(), z2.mean())
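Applied to a lat/lon grid, the same idea might look like the sketch below. It builds the 2-D mask from boolean tests on the 1-D coordinate vectors instead of going through np.ma.filled and np.meshgrid. The function name box_mean, its parameters and the synthetic field are hypothetical, and the sketch assumes the box does not cross the longitude wrap-around.

```python
import numpy as np

def box_mean(field, lat, lon, lat0, lon0, half_width):
    """Unweighted mean of field[lat, lon] inside a box centred on (lat0, lon0)."""
    lat_in = np.abs(lat - lat0) <= half_width     # 1-D boolean, one entry per row
    lon_in = np.abs(lon - lon0) <= half_width     # 1-D boolean, one entry per column
    inside = lat_in[:, None] & lon_in[None, :]    # 2-D boolean with the shape of field
    masked = np.ma.masked_where(~inside, field)   # mask everything outside the box
    return masked.mean()

# Synthetic 1-degree grid, only for illustration
lat = np.arange(-90.0, 91.0)
lon = np.arange(-180.0, 180.0)
lon2d, lat2d = np.meshgrid(lon, lat)
field = 2 * lon2d + 3 * lat2d

# The field is linear, so the mean over a symmetric box equals its value
# at the box centre: 2*20 + 3*10 = 70
result = box_mean(field, lat, lon, lat0=10.0, lon0=20.0, half_width=6.0)
print(result)
```

Keeping the mask as a boolean array until the last step means the mean is taken by the MaskedArray itself, so there is no need to convert to NaN and back.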

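Answer 1 (score: 0)

Note that if you are averaging over a lat/lon grid, any simple mean (sum divided by count), such as np.mean, will not give the correct answer, because the area of a grid cell shrinks as you move toward the poles. You need a weighted mean, with each cell weighted by cos(lat).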
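In NumPy, a cos(lat)-weighted mean could be sketched as follows. The function name area_weighted_mean is hypothetical, and the sketch assumes a regular grid with latitude along the first axis, given in degrees; cells that are masked or NaN are excluded from both the numerator and the weight sum.

```python
import numpy as np

def area_weighted_mean(field, lat):
    """Mean of field[lat, lon], weighted by cos(latitude); NaN/masked cells are skipped."""
    data = np.ma.filled(field.astype(float), np.nan)     # accepts ndarray or MaskedArray
    weights = np.cos(np.deg2rad(lat))[:, None]           # one weight per latitude row
    weights = np.broadcast_to(weights, data.shape)       # expand to the 2-D grid shape
    valid = np.isfinite(data)
    w = np.where(valid, weights, 0.0)                    # zero weight for excluded cells
    return np.sum(np.where(valid, data, 0.0) * w) / np.sum(w)

# Tiny example: two latitude rows (0 and 60 degrees), two longitude columns
lat = np.array([0.0, 60.0])
field = np.array([[0.0, 0.0],
                  [60.0, 60.0]])

# The 60-degree row has weight cos(60 deg) = 0.5, so the weighted mean is
# about 20, compared with an unweighted mean of 30
print(area_weighted_mean(field, lat), field.mean())
```

For a small box near the equator the difference from a plain mean is minor, but it grows with latitude and with the size of the region.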

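Since you say your data is in netCDF format, I hope you will allow me to suggest an alternative solution from the command line, using the Climate Data Operators (cdo) utility (on Ubuntu you can install it with sudo apt install cdo).

Extract the region of interest: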

cdo sellonlatbox,lon1,lon2,lat1,lat2 infile.nc outfile.nc

You can then compute the correctly weighted mean with:

cdo fldmean infile.nc outfile.nc

You can combine the two by piping, like this:

cdo fldmean -sellonlatbox,lon1,lon2,lat1,lat2 infile.nc outfile.nc