How can I create a new dataframe for each column by looping over a dataframe's columns?

Time: 2019-06-05 15:41:12

Tags: python python-3.x pandas

I have a dataframe named data with the following columns:

'ContextID', 'strategyname', 'Date', 'Time_ms', 'Time_Elapsed', 
'StepID', 'WfrCntSinceLastClean', 'Ar_Flow_sccm', 'BacksGas_Flow_sccm', 
'BacksGas_Prs_Torr', 'EscAct_Curr_A', 'EscAct_Volt_V',
'EscRF_P2P_Volt_V', 'Mano100mTorr_Prs_Torr'

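The columns from Ar_Flow_sccm onwards are all parameters.

I want to create one dataframe per parameter, and each dataframe must have the columns ContextID, the parameter column, StepID, Time_Elapsed.

I did write a function for this, as follows: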

def param(df, col_name):
    d = df.loc[:, ['ContextID', col_name, 'StepID', 'Time_Elapsed']]
    return d

When I do

BacksGas_Flow_sccm  = param(data, 'BacksGas_Flow_sccm')

I get a dataframe named BacksGas_Flow_sccm whose columns are ContextID, BacksGas_Flow_sccm, StepID, Time_Elapsed.

I could do this by hand for every parameter column, but is there a simpler way? Perhaps by using something like

for col in data.columns[7:]:
    'create the dataframes of the col'

Edit: part of my dataframe:

 ContextID   strategyname   Date   Time_ms    Time_Elapsed   StepID    WfrCntSinceLastClean    Ar_Flow_sccm     BacksGas_Flow_sccm     BacksGas_Prs_Torr    EscAct_Curr_A    EscAct_Volt_V    EscRF_P2P_Volt_V         Mano100mTorr_Prs_Torr
    7289973 Speed2_Gas_Basics   2018-07-09  0 days 09:12:48.502000000   0.0 1   0   49.560546875    1.953125    1.00000001335143e-10    0.122100122272968   1.22100126743317    12.4542121887207    0.00263671879656613
    7289973 Speed2_Gas_Basics   2018-07-09  0 days 09:12:48.603000000   0.101   2   0   49.560546875    2.05078125  0.00244140625   0.0 0.0 12.4542121887207    0.00234375009313226
    7289973 Speed2_Gas_Basics   2018-07-09  0 days 09:12:48.934000000   0.43200000000000005 2   0   99.853515625    2.05078125  0.00244140625   0.0 0.0 12.4542121887207    0.00234375009313226
    7289973 Speed2_Gas_Basics   2018-07-09  0 days 09:12:49.924000000   1.4220000000000002  2   0   351.318359375   2.05078125  0.00244140625   0.122100122272968   2.44200253486633    12.4542121887207    0.00380859384313226
    7289973 Speed2_Gas_Basics   2018-07-09  0 days 09:12:50.924000000   2.422   2   0   382.8125    1.953125    1.00000001335143e-10    0.122100122272968   0.0 12.4542121887207    0.004321289248764511
    7289973 Speed2_Gas_Basics   2018-07-09  0 days 09:12:51.924000000   3.422   2   0   382.8125    1.7578125   1.00000001335143e-10    0.122100122272968   1.8315018415451 13.1868133544922    0.004321289248764511
    7289973 Speed2_Gas_Basics   2018-07-09  0 days 09:12:52.934000000   4.432   2   0   382.8125    1.7578125   1.00000001335143e-10    0.122100122272968   0.0 12.4542121887207    0.004321289248764511
    7289973 Speed2_Gas_Basics   2018-07-09  0 days 09:12:54.440000000   5.938000000000001   2   0   382.8125    1.85546875  1.00000001335143e-10    0.122100122272968   0.610500633716583   12.4542121887207    0.004321289248764511
    7289973 Speed2_Gas_Basics   2018-07-09  0 days 09:12:54.992000000   6.49    2   0   382.8125    1.7578125   1.00000001335143e-10    0.122100122272968   0.0 12.4542121887207    0.004321289248764511
    7289973 Speed2_Gas_Basics   2018-07-09  0 days 09:12:56.430000000   7.928000000000001   5   0   382.8125    9.08203125  0.13671875  0.122100122272968   1.8315018415451 12.4542121887207    0.00437011709436774
    7289973 Speed2_Gas_Basics   2018-07-09  0 days 09:12:57.440000000   8.938   5   0   382.8125    46.19140625 2.109375    0.122100122272968   3.05250310897827    12.4542121887207    0.00437011709436774
    7289973 Speed2_Gas_Basics   2018-07-09  0 days 09:12:58.440000000   9.938   5   0   382.8125    46.19140625 2.109375    0.122100122272968   0.610500633716583   13.1868133544922    0.00437011709436774

1 answer:

Answer 0 (score: 1)

IIUC, you can change your function to:

def param(df, col_name):
    d= (df.loc[:, ['ContextID']+
        [col_name]+['StepID', 'Time_Elapsed']])
    return d

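Then build a dictionary of dataframes, using get_loc() to find where the parameter columns start:

# df is the dataframe from the question (called data there)
d = {'df_{}'.format(col): param(df, col)
     for col in df.columns[df.columns.get_loc('Ar_Flow_sccm'):]}
print(d)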

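This stores the dataframes in a dictionary. The keys are named df_Ar_Flow_sccm and so on, and each value is a dataframe with columns such as ['ContextID', 'Ar_Flow_sccm', 'StepID', 'Time_Elapsed'].

You can look up any key in the dict to see its dataframe, for example: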
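
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The column names are taken from the question, but the two rows of values are placeholders made up for illustration, so only the resulting keys and column layout are meaningful.

```python
import pandas as pd

# Column names from the question; the row values below are placeholders
# invented for this sketch, not real measurements.
columns = ['ContextID', 'strategyname', 'Date', 'Time_ms', 'Time_Elapsed',
           'StepID', 'WfrCntSinceLastClean', 'Ar_Flow_sccm', 'BacksGas_Flow_sccm',
           'BacksGas_Prs_Torr', 'EscAct_Curr_A', 'EscAct_Volt_V',
           'EscRF_P2P_Volt_V', 'Mano100mTorr_Prs_Torr']
df = pd.DataFrame([[0] * len(columns), [1] * len(columns)], columns=columns)


def param(df, col_name):
    # Keep the three fixed columns plus the requested parameter column.
    return df.loc[:, ['ContextID', col_name, 'StepID', 'Time_Elapsed']]


# Every column from 'Ar_Flow_sccm' to the end is treated as a parameter.
start = df.columns.get_loc('Ar_Flow_sccm')
d = {'df_{}'.format(col): param(df, col) for col in df.columns[start:]}

print(sorted(d))                             # one key per parameter column
print(list(d['df_Ar_Flow_sccm'].columns))    # layout of a single sub-dataframe
```

A dictionary is used here instead of one variable per parameter because the keys can be generated in the loop, and the whole collection can be iterated later with d.items().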

print(d['df_Ar_Flow_sccm'])

Note that df.columns.get_loc('Ar_Flow_sccm') returns 7, so this selects the same columns as the data.columns[7:] slice in the question.
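
The position can be checked on the column index alone, without any data. The sketch below assumes the column order listed in the question. One caveat: get_loc only returns a plain integer when the label is unique; with duplicated column names it returns a slice or a boolean mask instead, and the slicing would need adjusting.

```python
import pandas as pd

# Column order as listed in the question.
cols = pd.Index(['ContextID', 'strategyname', 'Date', 'Time_ms', 'Time_Elapsed',
                 'StepID', 'WfrCntSinceLastClean', 'Ar_Flow_sccm',
                 'BacksGas_Flow_sccm', 'BacksGas_Prs_Torr', 'EscAct_Curr_A',
                 'EscAct_Volt_V', 'EscRF_P2P_Volt_V', 'Mano100mTorr_Prs_Torr'])

pos = cols.get_loc('Ar_Flow_sccm')   # integer position of the first parameter
param_cols = list(cols[pos:])        # same selection as cols[7:] for this order

print(pos)
print(param_cols == list(cols[7:]))
```

Looking the position up by name keeps the code working if columns are later added or removed before Ar_Flow_sccm, which a hard-coded 7 would not.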