Update a column based on matching columns between two DataFrames (Pandas)

Time: 2017-08-08 23:44:35

Tags: python pandas dataframe

I have a file named pinkH1_ppm.txt that looks like this:

2.H8 7.61004 0.3
1.H8 8.13712 0.3
3.H6 7.53261 0.3
4.H8 7.49932 0.3
5.H6 7.72158 0.3
7.H8 8.16859 0.3
6.H6 7.70272 0.3
9.H8 8.1053 0.3
8.H6 7.65014 0.3
10.H6 7.5231 0.3
11.H6 7.58213 0.3
12.H6 7.72805 0.3
13.H6 8.02977 0.3
14.H6 7.69624 0.3
15.H8 7.82994 0.3
17.H8 7.24899 0.3
18.H6 7.6439 0.3
20.H8 7.78512 0.3
19.H8 7.65501 0.3
22.H8 7.47677 0.3
23.H6 7.7306 0.3
24.H6 7.80104 0.3
25.H8 7.67295 0.3
26.H6 7.67463 0.3
27.H6 7.64807 0.3
1.H1' 5.8202 0.3
2.H1' 5.90291 0.3
4.H1' 5.74125 0.3
3.H1' 5.54935 0.3
6.H1' 5.54297 0.3
8.H1' 5.238 0.3
11.H1' 5.50093 0.3
10.H1' 5.426 0.3
14.H1' 5.96177 0.3
15.H1' 5.959 0.3
17.H1' 5.75214 0.3
19.H1' 5.681 0.3
22.H1' 5.523 0.3
24.H1' 5.55313 0.3
25.H1' 5.70819 0.3
27.H1' 5.74236 0.3
26.H1' 5.48061 0.3

I have another file named pinkH2_ppm.txt that looks like this:

5.H8 7.72158 0.3
2.H8 7.70272 0.3
7.H8 8.16859 0.3
8.H6 7.65014 0.3
9.H8 8.1053 0.3
10.H6 7.5231 0.3
12.H6 7.72805 0.3
13.H6 8.02977 0.3
14.H6 7.69624 0.3
17.H8 7.24899 0.3
16.H8 8.27957 0.3
18.H6 7.6439 0.3
19.H8 7.65501 0.3
20.H8 7.78512 0.3
21.H8 8.06057 0.3
22.H8 7.47677 0.3
23.H6 7.7306 0.3
24.H6 7.80104 0.3
5.H2' 4.2621 0.3
7.H2' 4.54158 0.3
9.H2' 4.50708 0.3
12.H2' 3.76928 0.3
13.H2' 4.67514 0.3
16.H1' 4.52918 0.3
18.H2' 4.71109 0.3
20.H2' 4.63392 0.3
21.H2' 4.65975 0.3
23.H2' 4.27267 0.3

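How can I check whether a value in the first column of pinkH1_ppm.txt equals a value in the first column of pinkH2_ppm.txt and, where they match, replace the second-column value in pinkH2_ppm.txt with the second-column value from pinkH1_ppm.txt?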

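For example, the entry in the first column, first row of pinkH1_ppm.txt matches the entry in the first column, second row of pinkH2_ppm.txt. Since both are 2.H8, I want to replace 7.70272 in pinkH2_ppm.txt with 7.61004 from pinkH1_ppm.txt, but I'm not sure how to do this with the ix indexer in pandas.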

Here is my code:

import pandas as pd
import os
import sys
import re

filename = 'pinkH1_ppm.txt'
ppmColor = 'pinkH2_ppm.txt'


df = pd.read_csv(filename, sep = r'\s+', header=None)
df=df.ix[:, [0,1]]
color = pd.read_csv(ppmColor, sep = r'\s+', header=None, names = ('Atom','ppm','x'))

df.set_index(0,inplace=True)
color.set_index('Atom',inplace=True)
color.update(df)

color.to_csv(ppmColor,sep=" ", header = False)

1 answer:

Answer 0 (score: 1)

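import pandas as pd

filename = 'pinkH1_ppm.txt'
ppmColor = 'pinkH2_ppm.txt'

# Read both files with the same column names so they can be joined on 'Atom'
df = pd.read_csv(filename, sep=r'\s+', header=None, names=('Atom', 'ppm', 'x'))
color = pd.read_csv(ppmColor, sep=r'\s+', header=None, names=('Atom', 'ppm', 'x'))

# Left join: keep every row of color and attach the matching ppm from df, if any
color = pd.merge(color, df.loc[:, ['Atom', 'ppm']], how='left', on='Atom')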

After the merge, because both frames have a column named 'ppm', the two columns are renamed to 'ppm_x' (from color) and 'ppm_y' (from df).
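The suffixing can be seen on a tiny example. This is only an illustrative sketch: the frames `left` and `right` are hypothetical stand-ins, with a few values borrowed from the two files in the question.

```python
import pandas as pd

# Hypothetical two-row frames; 'left' plays the role of pinkH2, 'right' of pinkH1.
left = pd.DataFrame({"Atom": ["5.H8", "2.H8"], "ppm": [7.72158, 7.70272]})
right = pd.DataFrame({"Atom": ["2.H8", "1.H8"], "ppm": [7.61004, 8.13712]})

# A left join keeps every row of 'left'; unmatched rows get NaN in 'ppm_y'.
merged = pd.merge(left, right, how="left", on="Atom")

print(list(merged.columns))  # ['Atom', 'ppm_x', 'ppm_y']
print(merged)
```

Here 5.H8 has no partner in `right`, so its 'ppm_y' is NaN, while 2.H8 picks up 7.61004. Those NaN rows are the ones that should keep their original value.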

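# Rows where the left join found a matching Atom in pinkH1_ppm.txt
matched = color['ppm_y'].notnull()
color.loc[matched, 'ppm_x'] = color.loc[matched, 'ppm_y']
color = color.drop('ppm_y', axis=1)
# rename returns a new DataFrame, so assign the result back
color = color.rename(columns={'ppm_x': 'ppm'})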
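As a side note, the `update` approach from the question can also work. In the question's code, `df` is read without column names, so its value column is labeled `1`, while the column in `color` is labeled 'ppm'. `DataFrame.update` aligns on both index and column labels, so nothing lines up and nothing changes. (The `.ix` indexer was also removed in pandas 1.0.) The following is a minimal sketch, assuming the Atom labels are unique within pinkH1_ppm.txt. The strings `h1_text` and `h2_text` are hypothetical in-memory stand-ins for the two files.

```python
import io
import pandas as pd

# Hypothetical stand-ins for a few rows of pinkH1_ppm.txt and pinkH2_ppm.txt.
h1_text = "2.H8 7.61004 0.3\n1.H8 8.13712 0.3\n7.H8 8.16859 0.3\n"
h2_text = "5.H8 7.72158 0.3\n2.H8 7.70272 0.3\n7.H8 8.16859 0.3\n"

cols = ["Atom", "ppm", "x"]
df = pd.read_csv(io.StringIO(h1_text), sep=r"\s+", header=None, names=cols)
color = pd.read_csv(io.StringIO(h2_text), sep=r"\s+", header=None, names=cols)

# Index both frames by Atom and pass only 'ppm' so the 'x' column is left alone.
source = df.set_index("Atom")[["ppm"]]
color = color.set_index("Atom")

# update() works in place and overwrites only where labels match and
# the incoming value is not NaN.
color.update(source)

result = color.reset_index()
print(result)
```

With the real files, `result.to_csv(ppmColor, sep=" ", header=False, index=False)` would write the updated table back in the original three-column layout.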