I have the following code:
import numpy as np
import pandas as pd
df = pd.DataFrame(
{'Index' : ['1', '2', '5','7', '8', '9', '10'],
'Vals' : [1, 2, 3, 4, np.nan, np.nan, 5]})
This gives me:
Index Vals
0 1 1.0
1 2 2.0
2 5 3.0
3 7 4.0
4 8 NaN
5 9 NaN
6 10 5.0
But what I want is this:
Index Vals
0 1 1.000000
1 2 2.000000
2 3 NaN
3 4 NaN
4 5 3.000000
5 6 NaN
6 7 4.000000
7 8 NaN
8 9 NaN
9 10 5.000000
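I tried to do this by creating a new DataFrame with a continuous index. I then want to assign the values I already have to it, but how? So far the only thing I have managed is: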
clean_data = pd.DataFrame({'Index' : range(1,11)})
This gives me:
Index
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
Answer 0 (score: 3)
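You can merge your data onto the frame that holds the full range. The Index column contains strings, so convert it to integers first. For your example, it would look like this: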
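import pandas as pd
import numpy as np
df = pd.DataFrame(
{'Index' : ['1', '2', '5','7', '8', '9', '10'],
'Vals' : [1, 2, 3, 4, np.nan, np.nan, 5]})
# Index holds strings; cast to int so it matches the integer range below
df['Index'] = df['Index'].astype(int)
# Frame with every Index value from 1 to 10
clean_data = pd.DataFrame({'Index' : range(1,11)})
# Index values missing from df get NaN in Vals
result = clean_data.merge(df,on="Index",how='outer')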
The result is:
Index Vals
0 1 1.0
1 2 2.0
2 3 NaN
3 4 NaN
4 5 3.0
5 6 NaN
6 7 4.0
7 8 NaN
8 9 NaN
9 10 5.0
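One thing the merged result does not show is which NaN values were already in df (Index 8 and 9) and which come from rows that had to be added (Index 3, 4 and 6). As a minimal sketch, assuming the same data as above, a left join with pandas' `indicator=True` option adds a `_merge` column that records this. The variable name `added` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Index': ['1', '2', '5', '7', '8', '9', '10'],
     'Vals': [1, 2, 3, 4, np.nan, np.nan, 5]})
df['Index'] = df['Index'].astype(int)

clean_data = pd.DataFrame({'Index': range(1, 11)})

# how='left' keeps every row of clean_data in its original order;
# indicator=True adds a '_merge' column ('both' or 'left_only' here).
result = clean_data.merge(df, on='Index', how='left', indicator=True)

# Rows that exist only in clean_data, i.e. Index values absent from df
added = result.loc[result['_merge'] == 'left_only', 'Index'].tolist()
print(added)
```

Because clean_data already contains every Index from 1 to 10, the left join returns the same rows as the outer join in the answer.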
Answer 1 (score: 1)
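You can move the Index column into the index (after converting it to integers), select labels 1 through 10 (which creates the corresponding NaN rows), and reset the index.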
import numpy as np
import pandas as pd
df = pd.DataFrame(
{'Index' : ['1', '2', '5','7', '8', '9', '10'],
'Vals' : [1, 2, 3, 4, np.nan, np.nan, 5]})
df['Index'] = df['Index'].astype(int)
# reindex adds NaN rows for labels 3, 4 and 6; .loc with missing labels raises KeyError in pandas >= 1.0
df = df.set_index('Index').reindex(range(1, 11)).reset_index()
Output:
Index Vals
0 1 1.0
1 2 2.0
2 3 NaN
3 4 NaN
4 5 3.0
5 6 NaN
6 7 4.0
7 8 NaN
8 9 NaN
9 10 5.0
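If the bounds 1 and 10 are not known in advance, the range can be derived from the data itself. The following is only a sketch under stated assumptions: the helper name `fill_index_gaps` is hypothetical, the column values are assumed to be unique and convertible to integers, and `reindex` is used to create the missing rows.

```python
import numpy as np
import pandas as pd

def fill_index_gaps(frame, col="Index"):
    """Return a copy with one row per integer between min and max of `col`.

    Hypothetical helper; assumes the values in `col` are unique.
    """
    out = frame.copy()
    out[col] = out[col].astype(int)
    # Full integer range covered by the data, inclusive on both ends
    full_range = range(int(out[col].min()), int(out[col].max()) + 1)
    # reindex inserts NaN rows for labels absent from the data;
    # rename_axis makes sure reset_index restores the column name
    return out.set_index(col).reindex(full_range).rename_axis(col).reset_index()

df = pd.DataFrame(
    {'Index': ['1', '2', '5', '7', '8', '9', '10'],
     'Vals': [1, 2, 3, 4, np.nan, np.nan, 5]})

filled = fill_index_gaps(df)
print(filled)
```

The original df is left unchanged because the helper works on a copy.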