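I'm iterating over a DataFrame and trying to append new values from a different DataFrame whenever a row in the first DataFrame holds a particular value.
Consider the following two DataFrames: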
print(full_df)
AccessName PolicyArn
0 arn:aws:glue:sample arn:aws:iam::971340810992:policy/service-role/...
1 arn:aws:glue:sample2 arn:aws:iam::971340810992:policy/service-role/...
2 --- arn:aws:iam::971340810992:policy/service-role/...
3 arn:aws:s3:::sample3 arn:aws:iam::971340810992:policy/service-role/...
print(side_df)
AccessName
0 sample-test
1 query_sample
2 us-east-1-sample
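If AccessName in full_df is a certain value, I want to append side_df to full_df at that point, with the second column (PolicyArn) always set to arn for all of the appended rows.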
arn = 'fixed_value'
for index, row in full_df.iterrows():
    if row['AccessName'] == '---':
        # Here I don't know how I'd define the code to append the side_df values:
        # full_df['AccessName'] = side_df['AccessName']
        # full_df['PolicyArn'] = arn
        pass  # placeholder so the loop body parses
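Would it be best to add another if statement at this point, iterate over the side_df values, and append them row by row?
This for loop is nested inside the actual code, and arn is generated dynamically.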
Desired output:
AccessName PolicyArn
0 arn:aws:glue:sample arn:aws:iam::971340810992:policy/service-role/...
1 arn:aws:glue:sample2 arn:aws:iam::971340810992:policy/service-role/...
2 --- arn:aws:iam::971340810992:policy/service-role/...
3 sample-test fixed_value
4 query_sample fixed_value
5 us-east-1-sample fixed_value
6 arn:aws:s3:::sample3 arn:aws:iam::971340810992:policy/service-role/...
What is the best way to write this?
Answer 0: (score: 1)
You can reindex and append the DataFrames, then call fillna(), as follows:
Initialization:
import io
import numpy as np
import pandas as pd

df = pd.read_csv(io.StringIO(''' AccessName PolicyArn
0 arn:aws:glue:sample arn:aws:iam::971340810992:policy/service-role/...
1 arn:aws:glue:sample2 arn:aws:iam::971340810992:policy/service-role/...
2 --- arn:aws:iam::971340810992:policy/service-role/...
3 arn:aws:s3:::sample3 arn:aws:iam::971340810992:policy/service-role/...
'''), sep=r'\s+')
df2 = pd.read_csv(io.StringIO(''' AccessName
0 sample-test
1 query_sample
2 us-east-1-sample'''), sep=r'\s+')
Solution:
# Index position right after the first row whose AccessName is '---'
pos = df.index[df['AccessName'] == '---'][0] + 1
start, end = pos + df2.shape[0], df2.shape[0] + df.shape[0]
# Shift the rows after the marker down to make room for df2
df.index = np.append(df.index[:pos], np.arange(start, end))  # index: [0, 1, 2, 6]
df2.index = np.arange(pos, pos + df2.shape[0])  # index: [3, 4, 5]
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way
solution_df = pd.concat([df, df2]).sort_index().fillna('fixed_value')
solution_df
AccessName PolicyArn
0 arn:aws:glue:sample arn:aws:iam::971340810992:policy/service-role/...
1 arn:aws:glue:sample2 arn:aws:iam::971340810992:policy/service-role/...
2 --- arn:aws:iam::971340810992:policy/service-role/...
3 sample-test fixed_value
4 query_sample fixed_value
5 us-east-1-sample fixed_value
6 arn:aws:s3:::sample3 arn:aws:iam::971340810992:policy/service-role/...
Important note:
For large datasets, solutions that iterate row by row are generally not recommended. Look for a vectorized solution where you can, and avoid methods like .iterrows(). Good luck!
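One caveat with the reindex-and-fill idea is that fillna() fills every missing value in the combined frame, not just the ones in the newly added rows. The sketch below is a variation under that assumption: it sets PolicyArn on the side frame before concatenating, so gaps that already existed survive. The tiny frames and the ARN strings are made-up placeholders, not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the '---' row already has a missing PolicyArn
df = pd.DataFrame({
    'AccessName': ['a', '---', 'b'],
    'PolicyArn': ['arn-1', np.nan, 'arn-3'],
})
df2 = pd.DataFrame({'AccessName': ['x', 'y']})

# Same reindexing trick as above: open a gap in df's index after the marker row
pos = df.index[df['AccessName'] == '---'][0] + 1
df.index = np.append(df.index[:pos], np.arange(pos + len(df2), len(df) + len(df2)))
df2.index = np.arange(pos, pos + len(df2))

# Fill only the new rows, then interleave by index
solution_df = pd.concat([df, df2.assign(PolicyArn='fixed_value')]).sort_index()
print(solution_df)
```

Here the missing value in the '---' row stays missing, whereas a trailing fillna('fixed_value') would have overwritten it.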
Answer 1: (score: 1)
First, let's build your side_df:
import pandas as pd

side_df = pd.DataFrame([['sample-test'], ['query_sample'], ['us-east-1-sample']],
                       columns=['AccessName'])
fixed_series = pd.Series(['fixed_value'] * len(side_df), name='PolicyArn').to_frame()
side_df_extended = pd.concat([side_df, fixed_series], axis=1)
print(side_df_extended)
AccessName PolicyArn
0 sample-test fixed_value
1 query_sample fixed_value
2 us-east-1-sample fixed_value
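As a shorter alternative for the same step, DataFrame.assign can broadcast a scalar to every row, which avoids building a separate Series. This is a minimal sketch; the arn variable here is a stand-in for the dynamically generated value mentioned in the question.

```python
import pandas as pd

side_df = pd.DataFrame({'AccessName': ['sample-test', 'query_sample', 'us-east-1-sample']})
arn = 'fixed_value'  # stand-in; the question says the real value is generated dynamically

# assign() returns a new frame and repeats the scalar on every row
side_df_extended = side_df.assign(PolicyArn=arn)
print(side_df_extended)
```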
Assume full_df looks like this:
AccessName PolicyArn
0 arn:aws:glue:sample arn:aws:iam::971340810992:policy/service-role/
1 arn:aws:glue:sample2 arn:aws:iam::971340810992:policy/service-role/
2 arn:aws:s3:::sample3 arn:aws:iam::971340810992:policy/service-role/
3 arn:aws:glue:sample4 arn:aws:iam::971340810992:policy/service-role/
4 arn:aws:s3:::sample5 arn:aws:iam::971340810992:policy/service-role/
Now let's get the index of the rows that meet your condition, for example:
indices = full_df['AccessName'] == 'arn:aws:glue:sample2'
rows = full_df[indices].index.tolist()
rows
[1]
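The lookup above returns index labels, which coincide with row positions only when the frame has the default 0..n-1 index. If that is not guaranteed, one option is to take integer positions from the boolean mask with np.flatnonzero and to check for the no-match case. The sketch below uses a made-up frame with a non-default index and placeholder ARN strings.

```python
import numpy as np
import pandas as pd

full_df = pd.DataFrame(
    {'AccessName': ['arn:aws:glue:sample', '---', 'arn:aws:s3:::sample3'],
     'PolicyArn': ['arn-a', 'arn-b', 'arn-c']},  # placeholder values
    index=[10, 20, 30],  # deliberately not 0..n-1
)

mask = (full_df['AccessName'] == '---').to_numpy()
positions = np.flatnonzero(mask)  # integer positions, independent of index labels

if positions.size == 0:
    print('no matching row')
else:
    print(positions[0])  # position of the first match, usable with iloc
```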
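Now you want to insert side_df right after the row where the condition occurs:
# iloc is positional while rows[0] is an index label, so this assumes the default 0..n-1 index
final_df = pd.concat(
    [full_df.iloc[:(rows[0] + 1)], side_df_extended, full_df.iloc[(rows[0] + 1):]],
    ignore_index=True,
)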
final_df
AccessName PolicyArn
0 arn:aws:glue:sample arn:aws:iam::971340810992:policy/service-role/
1 arn:aws:glue:sample2 arn:aws:iam::971340810992:policy/service-role/
2 sample-test fixed_value
3 query_sample fixed_value
4 us-east-1-sample fixed_value
5 arn:aws:s3:::sample3 arn:aws:iam::971340810992:policy/service-role/
6 arn:aws:glue:sample4 arn:aws:iam::971340810992:policy/service-role/
7 arn:aws:s3:::sample5 arn:aws:iam::971340810992:policy/service-role/
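Since the question says this logic sits inside a loop with a dynamically generated arn, the steps above could be wrapped in a small helper. The function name insert_after_first_match and the placeholder PolicyArn strings are hypothetical; this is a sketch of the slice-and-concat technique, not a definitive implementation.

```python
import numpy as np
import pandas as pd


def insert_after_first_match(full_df, side_df, marker, arn):
    """Return a new frame with side_df's rows placed right after the first row
    whose AccessName equals marker; the inserted rows get PolicyArn = arn."""
    positions = np.flatnonzero((full_df['AccessName'] == marker).to_numpy())
    if positions.size == 0:
        return full_df.copy()  # no marker row, nothing to insert
    cut = positions[0] + 1
    new_rows = side_df.assign(PolicyArn=arn)
    return pd.concat(
        [full_df.iloc[:cut], new_rows, full_df.iloc[cut:]],
        ignore_index=True,
    )


full_df = pd.DataFrame({
    'AccessName': ['arn:aws:glue:sample', 'arn:aws:glue:sample2', '---',
                   'arn:aws:s3:::sample3'],
    'PolicyArn': ['policy-placeholder'] * 4,  # placeholder, not the real ARNs
})
side_df = pd.DataFrame({'AccessName': ['sample-test', 'query_sample', 'us-east-1-sample']})

result = insert_after_first_match(full_df, side_df, '---', 'fixed_value')
print(result)
```

The helper finds the marker with a boolean mask rather than iterating over rows, and it leaves both input frames unchanged.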