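I'm working with weather data, but I'm still learning how to use pandas effectively... I have a pandas DataFrame with one column that packs wind direction and wind speed into a single formatted string. The problem is the direction part: it is a variable-length code (W, SW, VRB, ...) glued directly onto the speed. The combined column, df['WindDirSpeed'], currently looks like this: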
IssueDatetime Regions \
0 2018-01-01 06:00:00 SOUTH COAST
1 2018-01-01 06:00:00 SOUTH COAST
2 2018-01-01 06:00:00 SOUTH COAST
3 2018-01-01 06:00:00 SOUTH COAST
4 2018-01-01 06:00:00 EAST COAST-CAPE ST FRANCIS AND SOUTH
... ... ...
12833 2018-12-30 06:00:00 SOUTHEASTERN GRAND BANKS
12834 2018-12-30 06:00:00 SOUTHEASTERN GRAND BANKS
12835 2018-12-30 06:00:00 SOUTHEASTERN GRAND BANKS
12836 2018-12-30 06:00:00 SOUTHWESTERN GRAND BANKS
12837 2018-12-30 06:00:00 SOUTHWESTERN GRAND BANKS
forecastTime forecastHour WindDirSpeed
0 2018-01-01 06:00:00 0.0 SW35
1 2018-01-01 15:00:00 9.0 SW25
2 2018-01-02 08:00:00 26.0 SW15-20
3 2018-01-02 15:00:00 33.0 VRB10-15
4 2018-01-01 06:00:00 0.0 SW35
... ... ... ...
12833 2018-12-30 06:00:00 0.0 W25
12834 2018-12-30 09:00:00 3.0 W25
12835 2018-12-30 18:00:00 12.0 NW35
12836 2018-12-30 06:00:00 0.0 W25
12837 2018-12-30 12:00:00 6.0 NW30
I tried writing a function that extracts the direction into a new column and keeps the rest as the wind speed:
def find_windDir(row):
    directions = ['VRB', 'N', 'NE', 'E', 'SE', 'S', 'SW', 'W', 'NW']
    for d in directions:
        if d in row['WindDirSpeed']:
            row['dir'] = d
            row['WindSpeed'] = row['WindDirSpeed'].replace(d,'')
    return row
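Unfortunately, this doesn't work: `d in row['WindDirSpeed']` is a substring test, so every direction code contained in the string matches, and each match overwrites the previous one. For 'SW35', 'S', 'SW' and 'W' all match, so the loop ends with dir = 'W' and WindSpeed = 'S35'.
Ideally, I'd end up with the wind direction and wind speed in separate columns: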
Dir WindSpeed
SW 35
SW 25
SW 15-20
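If you want to keep the row-wise approach from the question, one way to repair the loop is to test for a prefix with `str.startswith` and try longer codes before shorter ones, so 'SW' is found before 'S' or 'W'. This is a minimal sketch, assuming the direction is always at the start of the value and is one of the listed codes; `DIRECTIONS` and `find_wind_dir` are illustrative names, and the sample frame reuses only `WindDirSpeed` values shown above.

```python
import pandas as pd

# Longest codes first, so "SW" is tried before "S" and "W".
# Assumption: the direction is always a prefix of the value.
DIRECTIONS = sorted(['VRB', 'N', 'NE', 'E', 'SE', 'S', 'SW', 'W', 'NW'],
                    key=len, reverse=True)


def find_wind_dir(row):
    value = row['WindDirSpeed']
    for d in DIRECTIONS:
        if value.startswith(d):
            # Stop at the first (longest) matching prefix; the rest is the speed.
            return pd.Series({'Dir': d, 'WindSpeed': value[len(d):]})
    # No known direction prefix: leave both new columns empty.
    return pd.Series({'Dir': None, 'WindSpeed': None})


df = pd.DataFrame({'WindDirSpeed': ['SW35', 'SW25', 'SW15-20', 'VRB10-15', 'W25', 'NW30']})
# apply(axis=1) returns a DataFrame with Dir/WindSpeed columns; join adds them to df.
df = df.join(df.apply(find_wind_dir, axis=1))
print(df)
```

Returning a new Series from the function, instead of mutating `row`, keeps the function side-effect free and lets `join` attach both columns in one step.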
Answer (score: 2)
Try this:
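# Leading run of capital letters -> direction (SW, NW, VRB, ...)
df['Dir'] = df['WindDirSpeed'].str.extract(r'([A-Z]*)', expand=False)
# Try the "digits-digits" range first, otherwise a single number
df['WindSpeed'] = df['WindDirSpeed'].str.extract(r'([0-9]+\-[0-9]+|[0-9]+)', expand=False)
print(df)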
Output:
forecastTime forecastHour WindDirSpeed Dir WindSpeed
2018-01-01 06:00:00 0.0 SW35 SW 35
2018-01-01 15:00:00 9.0 SW25 SW 25
2018-01-02 08:00:00 26.0 SW15-20 SW 15-20
2018-01-02 15:00:00 33.0 VRB10-15 VRB 10-15
2018-01-01 06:00:00 0.0 SW35 SW 35
2018-12-30 06:00:00 0.0 W25 W 25
2018-12-30 09:00:00 3.0 W25 W 25
2018-12-30 18:00:00 12.0 NW35 NW 35
2018-12-30 06:00:00 0.0 W25 W 25
2018-12-30 12:00:00 6.0 NW30 NW 30
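A variant of the same regex idea, offered as a sketch: one anchored pattern with named groups lets a single `str.extract` call produce both columns, because pandas uses the group names as the column names of the result. The pattern assumes each value is capital letters followed by a number or a `low-high` range; values that don't match come back as NaN in both columns. The sample frame reuses only `WindDirSpeed` values from the question.

```python
import pandas as pd

df = pd.DataFrame({'WindDirSpeed': ['SW35', 'SW25', 'SW15-20', 'VRB10-15', 'W25', 'NW30']})

# Named groups become the column names of the extracted DataFrame.
# Dir: leading capital letters; WindSpeed: a number, optionally "-number" for a range.
pattern = r'^(?P<Dir>[A-Z]+)(?P<WindSpeed>\d+(?:-\d+)?)$'
parts = df['WindDirSpeed'].str.extract(pattern)

# Values that don't fit the pattern get NaN in both columns.
df = df.join(parts)
print(df)
```

Anchoring with `^...$` means a malformed value shows up as NaN instead of being silently split in the wrong place.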