Question

I have a DataFrame column like this:

index  Train_station

0      Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O
1      Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O
2      Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O

I want to split it into three columns: Train_station, Latitude and Longitude. The DataFrame should look like this:

index  Train_station         Latitude       Longitude

0      Adenauerplatz         52° 29′ 59″ N  13° 18′ 26″ O
1      Afrikanische Straße   52° 33′ 38″ N  13° 20′ 3″ O
2      Alexanderplatz        52° 31′ 17″ N  13° 24′ 48″ O

I tried df[['Latitude', 'Longitude']] = df.Train_station.str.split(',', expand=True), but it only splits between the latitude and longitude coordinates. How can I split one column on multiple defined conditions?
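A minimal illustration of why that attempt stops at two columns, using only the first row: the only comma in each value sits between latitude and longitude, so the station name stays attached to the latitude.

```python
import pandas as pd

df = pd.DataFrame({"Train_station": ["Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O"]})

# Splitting on the single comma yields exactly two pieces per row
two_cols = df["Train_station"].str.split(",", expand=True)
print(two_cols.shape)             # (1, 2)
print(repr(two_cols.loc[0, 0]))   # 'Adenauerplatz 52° 29′ 59″ N'
print(repr(two_cols.loc[0, 1]))   # ' 13° 18′ 26″ O'
```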

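I have considered checking the string from the left and splitting it when an integer or a defined string is encountered, but so far this approach has not led me to an answer.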
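Scanning from the left can work if one assumes that station names never contain digits: everything before the first digit is the name, and the rest splits once on the comma. A sketch under that assumption; split_at_first_digit is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({"Train_station": [
    "Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
    "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
    "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O",
]})


def split_at_first_digit(s):
    """Hypothetical helper: scan s from the left and split it at the first digit.

    Assumes station names contain no digits and the coordinates contain one comma.
    """
    for i, ch in enumerate(s):
        if ch.isdigit():
            name, coords = s[:i].strip(), s[i:]
            lat, lon = coords.split(",", 1)
            return name, lat.strip(), lon.strip()
    # No digit found: treat the whole string as the name
    return s.strip(), None, None


result = pd.DataFrame(
    df["Train_station"].map(split_at_first_digit).tolist(),
    index=df.index,
    columns=["Train_station", "Latitude", "Longitude"],
)
print(result)
```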

Answer 1

You can use the .split() method to separate the values in the string.

Use .apply() to create a new DataFrame column for each desired column name.

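import pandas as pd

data = ["Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
        "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
        "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O"]
df = pd.DataFrame(data, columns=['Train_station'])

def split_name(x):
    # Split before the first digit so multi-word names such as "Afrikanische Straße" stay intact
    i = next(i for i, ch in enumerate(x) if ch.isdigit())
    return x[:i].strip(), x[i:]

def train_station(x):
    return split_name(x)[0]

def latitude(x):
    # The coordinates look like "52° 29′ 59″ N, 13° 18′ 26″ O"; the latitude comes before ", "
    return split_name(x)[1].split(', ', 1)[0]

def longitude(x):
    return split_name(x)[1].split(', ', 1)[1]

df['Latitude'] = df['Train_station'].apply(latitude)
df['Longitude'] = df['Train_station'].apply(longitude)
# Overwrite Train_station last: latitude() and longitude() need the full original string
df['Train_station'] = df['Train_station'].apply(train_station)
print(df)

What you see above is a recreation of the original DataFrame, which is then modified with .split() and .apply().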
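The same split-based idea can also be written with the vectorized .str accessor instead of .apply(). A sketch, assuming station names contain no digits; the regex_split step is a swapped-in variant, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"Train_station": [
    "Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
    "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
    "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O",
]})

# n=1: split only at the first whitespace that is followed by a digit,
# i.e. between the station name and the latitude
name_coords = df["Train_station"].str.split(r"\s(?=\d)", n=1, expand=True, regex=True)
# The remainder "52° 29′ 59″ N, 13° 18′ 26″ O" splits once on the literal ", "
lat_lon = name_coords[1].str.split(", ", n=1, expand=True, regex=False)

result = pd.DataFrame({
    "Train_station": name_coords[0],
    "Latitude": lat_lon[0],
    "Longitude": lat_lon[1],
})
print(result)
```

Limiting each split with n=1 keeps the later spaces inside the coordinates from producing extra columns.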

Answer 2

df = df.Train_station.str.split(r'(.*?)(\d+°[^,]+),(.*)', expand=True)
print(df.loc[:, 1:3].rename(columns={1:'Train_station', 2:'Latitude', 3:'Longitude'}) )

Prints:

          Train_station       Latitude       Longitude
0        Adenauerplatz   52° 29′ 59″ N   13° 18′ 26″ O
1  Afrikanische Straße   52° 33′ 38″ N    13° 20′ 3″ O
2       Alexanderplatz   52° 31′ 17″ N   13° 24′ 48″ O
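Why the answer keeps only columns 1 to 3: when the pattern contains capture groups, re.split (whose behavior str.split follows for regex patterns) returns the text before the match, every captured group, and the text after the match. Because this pattern consumes the whole string, the first and last pieces are empty. A sketch on a single row; it also shows that the captured values keep the surrounding spaces.

```python
import re

pattern = r"(.*?)(\d+°[^,]+),(.*)"
s = "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O"

# Groups are returned between the (empty) text before and after the match
pieces = re.split(pattern, s)
print(pieces)  # ['', 'Afrikanische Straße ', '52° 33′ 38″ N', ' 13° 20′ 3″ O', '']

# Pieces 1..3 hold the three fields; strip the leftover spaces
fields = [p.strip() for p in pieces[1:4]]
print(fields)  # ['Afrikanische Straße', '52° 33′ 38″ N', '13° 20′ 3″ O']
```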

Edit: Thanks to @ALollz, you can also use str.extract():

df = df.Train_station.str.extract(r'(?P<Train_station>.*?)(?P<Latitude>\d+°[^,]+),(?P<Longitude>.*)', expand=True)
print(df)
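With str.extract, each named group becomes a column name. Below is a sketch of a variant that places \s* outside the groups so the extracted values carry no leading or trailing spaces; this tweak is an assumption about the desired output, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"Train_station": [
    "Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
    "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
    "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O",
]})

# Named groups -> column names; the lazy .*? stops right before the whitespace
# that precedes the first "<digits>°", so the name has no trailing space
pattern = r"(?P<Train_station>.*?)\s*(?P<Latitude>\d+°[^,]+),\s*(?P<Longitude>.*)"
result = df["Train_station"].str.extract(pattern, expand=True)
print(result)
```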

Answer 3

You can try the following:

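# A token counts as a coordinate if it contains °, ′ or ″, or a comma ("N,"): lett.replace(',', '') turns a comma into '', and '' in any string is True
df['Latitude']=df['Train_station'].apply(lambda x: ' '.join([i for i in x.split(' ') if any((lett.replace(',','') in '°′″') for lett in i)]).split(',')[0])
# The longitude also needs the trailing "O" (Ost, east)
df['Longitude']=df['Train_station'].apply(lambda x: ' '.join([i for i in x.split(' ') if any((lett.replace(',','') in '°′″O') for lett in i)]).split(',')[1])
# Drop every coordinate token, including "O", and rejoin the remaining words with spaces
df['Train_station']=df['Train_station'].apply(lambda x: ' '.join([i for i in x.split(' ') if not any((lett.replace(',','') in '°′″O') for lett in i)]))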

Output:

               Train_station       Latitude       Longitude
0          Adenauerplatz          52° 29′ 59″ N   13° 18′ 26″ O
1    Afrikanische Straße          52° 33′ 38″ N    13° 20′ 3″ O
2         Alexanderplatz          52° 31′ 17″ N   13° 24′ 48″ O
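The lett.replace(',', '') in '°′″' test above works partly because the empty string is a substring of every string, so a token such as "N," is treated as a coordinate. A more explicit version of the same token filter might look like the sketch below; is_coordinate and the HEMISPHERES tuple are hypothetical names, and the list of hemisphere letters is an assumption.

```python
tokens = "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O".split(" ")

# The empty string is contained in every string
assert "" in "°′″"

# Assumption: single-letter hemisphere tokens; "O" is the German "Ost" (east)
HEMISPHERES = ("N", "S", "E", "W", "O")


def is_coordinate(token):
    """Hypothetical helper: True for tokens that belong to the coordinates."""
    return any(ch in "°′″," for ch in token) or token in HEMISPHERES


name = " ".join(t for t in tokens if not is_coordinate(t))
coords = " ".join(t for t in tokens if is_coordinate(t))
lat, lon = (part.strip() for part in coords.split(",", 1))
print(name, "|", lat, "|", lon)
```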

Answer 4

This is similar to @Andrej Kesely's approach.

import numpy as np
import pandas as pd

df2 = df.Train_station.str.split(r'(?<=[a-z])(\s)(?![A-Z])|(?<=[A-Z]\,)(\s)|(?<=[A-Z])(\s)', expand=True).replace(' ', np.nan).dropna(axis='columns')
df2.columns=['Train_station', 'Latitude', 'Longitude']
print(df2)

     Train_station          Latitude      Longitude
0        Adenauerplatz    52° 29′ 59″ N,  13° 18′ 26″ O
1  Afrikanische Straße    52° 33′ 38″ N,   13° 20′ 3″ O
2       Alexanderplatz    52° 31′ 17″ N,  13° 24′ 48″ O

Explanation

(?<=[a-z])(\s)(?![A-Z]) - split on a space that follows a lowercase letter and is not followed by an uppercase letter.

OR

(?<=[A-Z]\,)(\s) - split on a space that follows an uppercase letter and a comma.

OR

(?<=[A-Z])(\s) - split on a space that follows an uppercase letter.
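Why the answer replaces ' ' with NaN and then drops columns: the pattern has three capture groups, and re.split returns all of them for every match, with None for the groups that did not take part. A sketch on one row with Python's re module directly:

```python
import re

pattern = r'(?<=[a-z])(\s)(?![A-Z])|(?<=[A-Z]\,)(\s)|(?<=[A-Z])(\s)'
s = "Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O"

# Two matches (after "Adenauerplatz" and after "N,"), each contributing three groups
pieces = re.split(pattern, s)
print(pieces)
# ['Adenauerplatz', ' ', None, None, '52° 29′ 59″ N,', None, ' ', None, '13° 18′ 26″ O']

# Dropping the captured spaces and the unmatched groups leaves the three fields
kept = [p for p in pieces if p is not None and p != ' ']
print(kept)  # ['Adenauerplatz', '52° 29′ 59″ N,', '13° 18′ 26″ O']
```

In the DataFrame version, each list position becomes a column, so the columns that hold only spaces or None are the ones dropna(axis='columns') removes.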

Split one column into multiple columns in Python
