Adding rows based on a condition in another column

Time: 2017-09-14 23:49:29

Tags: python pandas

I am new to the pandas module and have a simple question about data manipulation.

Suppose I have a table like this:

Tool | WeekNumber | Status | Percentage
-----|------------|--------|------------
  M1 |     1      |   good |     85
  M1 |     4      |   bad  |     75
  M1 |     7      |   good |     90

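Depending on the value in Status, I want to add rows with a percentage for the missing weeks.

For example:

  1. If the status is "good", the rows for the following week numbers should all be 100, i.e. the next rows should be weeks 2 and 3 with 100%.

  2. If the status is "bad", the percentage for the following week numbers should be 0, i.e. weeks 5 and 6 should be 0.

I have some idea of how to handle the condition, but I don't know how to add the rows: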

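    import pandas as pd

    df = pd.read_excel("test.xlsx")

    # Map the status of each existing row to a percentage.
    add_rows = []
    for elem in df.Status:
        if elem == "good":
            add_rows.append(100)
        elif elem == "bad":
            add_rows.append(0)

    # This overwrites the column for the existing rows; no new rows are added.
    df["Percentage"] = pd.Series(add_rows)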
    

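However, this only produces three values based on the condition and overwrites the percentage for the existing week numbers. What I want is the following: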

    Tool | WeekNumber | Status | Percentage
    -----|------------|--------|------------
      M1 |     1      |   good |     85
      M1 |     2      |   good |     100
      M1 |     3      |   good |     100
      M1 |     4      |   bad  |     75
      M1 |     5      |   bad  |      0
      M1 |     6      |   bad  |      0
      M1 |     7      |   good |     90
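
To try the answers below without test.xlsx, the sample table can be built directly in memory. This is only a minimal sketch, assuming the column names and values shown in the tables above; the real spreadsheet may contain more tools or weeks.

```python
import pandas as pd

# Sample data copied from the input table in the question (assumed to
# mirror the contents of test.xlsx).
df = pd.DataFrame({
    "Tool": ["M1", "M1", "M1"],
    "WeekNumber": [1, 4, 7],
    "Status": ["good", "bad", "good"],
    "Percentage": [85, 75, 90],
})
print(df)
```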
    

5 answers:

Answer 0: (score: 2)

Here is another approach:

import numpy as np
import pandas as pd

# Build the full week range and outer-merge it, so missing weeks appear as NaN rows.
val = pd.DataFrame({'WeekNumber': np.arange(df['WeekNumber'].min(), df['WeekNumber'].max() + 1)})
new_df = df.merge(val, on='WeekNumber', how='outer').sort_values(by='WeekNumber').reset_index(drop=True)
new_df[['Tool', 'Status']] = new_df[['Tool', 'Status']].ffill()
# Inserted rows get 100 after "good"; the remaining gaps (after "bad") get 0.
new_df['Percentage'] = np.where((new_df['Status'] == 'good') & new_df['Percentage'].isnull(),
                                100, new_df['Percentage'])
new_df['Percentage'] = new_df['Percentage'].fillna(0)

This gives you:

    Tool    WeekNumber  Status  Percentage
0   M1      1           good    85.0
1   M1      2           good    100.0
2   M1      3           good    100.0
3   M1      4           bad     75.0
4   M1      5           bad     0.0
5   M1      6           bad     0.0
6   M1      7           good    90.0
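
To see what the outer merge contributes on its own, the sketch below rebuilds the sample table in memory (an assumption, standing in for test.xlsx) and lists the weeks that the merge inserts as NaN rows. The names `merged` and `missing_weeks` are illustrative only and do not come from the answer.

```python
import numpy as np
import pandas as pd

# Sample data assumed to match the table in the question.
df = pd.DataFrame({
    "Tool": ["M1", "M1", "M1"],
    "WeekNumber": [1, 4, 7],
    "Status": ["good", "bad", "good"],
    "Percentage": [85, 75, 90],
})

# Every week between the first and last observed week.
val = pd.DataFrame({
    "WeekNumber": np.arange(df["WeekNumber"].min(), df["WeekNumber"].max() + 1)
})

# The outer merge keeps the original rows and adds one NaN row per missing week.
merged = (df.merge(val, on="WeekNumber", how="outer")
            .sort_values(by="WeekNumber")
            .reset_index(drop=True))

# Weeks that exist only because of the merge have no Percentage yet.
missing_weeks = merged.loc[merged["Percentage"].isnull(), "WeekNumber"].tolist()
print(missing_weeks)  # [2, 3, 5, 6]
```

The forward fill and the two Percentage assignments in the answer then operate only on these inserted rows.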

Answer 1: (score: 0)

You can iterate over each row with .iterrows().

for index, row in df.iterrows():
    print(row.Status)

# prints:
# good
# bad
# good

If I had to do it with some rough code, I would use something like this:

rows = []
# assumes the data is sorted by 'WeekNumber'
week_numbers = df.WeekNumber.tolist()
for position, (index, row) in enumerate(df.iterrows()):
    # keep the original row unchanged
    rows.append({
        'Tool': row.Tool,
        'WeekNumber': row.WeekNumber,
        'Status': row.Status,
        'Percentage': row.Percentage,
    })

    # the last row has no following gap to fill
    if position == len(week_numbers) - 1:
        break

    # fill the weeks up to the next existing row: 100 after "good", 0 after "bad"
    fill_value = 100 if row.Status == 'good' else 0
    for week in range(week_numbers[position] + 1, week_numbers[position + 1]):
        rows.append({
            'Tool': row.Tool,
            'WeekNumber': week,
            'Status': row.Status,
            'Percentage': fill_value,
        })

df = pd.DataFrame(rows)

Answer 2: (score: 0)

Your answer would look something like this:

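add_rows = []
# assuming data is sorted by 'WeekNumber'
weeks = df.WeekNumber.tolist()
for position, elem in enumerate(df.Status):
    if position + 1 == len(weeks):
        break  # nothing to fill after the last row
    fill_value = 100 if elem == "good" else 0
    # one new row for every missing week before the next existing row
    for week in range(weeks[position] + 1, weeks[position + 1]):
        add_rows.append({'Tool': 'M1', 'WeekNumber': week,
                         'Status': elem, 'Percentage': fill_value})

more_data = pd.DataFrame(add_rows)
df = pd.concat([df, more_data]).sort_values(by='WeekNumber')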

Answer 3: (score: 0)

Try this?

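import numpy as np

df = df.set_index('WeekNumber').reindex(range(1, 8))
df['Tool'] = df['Tool'].fillna('M1')
df['Status'] = df['Status'].ffill()
# new rows have NaN in Percentage: 100 where the carried-forward status is good, otherwise 0
df['Percentage'] = np.where(df['Percentage'].isnull() & (df['Status'] == 'good'), 100, df['Percentage'])
df['Percentage'] = df['Percentage'].fillna(0)
df.reset_index()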


Out[80]: 
   WeekNumber Tool Status  Percentage
0           1   M1   good        85.0
1           2   M1   good       100.0
2           3   M1   good       100.0
3           4   M1    bad        75.0
4           5   M1    bad         0.0
5           6   M1    bad         0.0
6           7   M1   good        90.0

Answer 4: (score: 0)

You can first use set_index and reindex to expand the DataFrame, and fill the NaN values in Tool and Status:

In [814]: dff = (df.set_index('WeekNumber')
                   .reindex(range(df.WeekNumber.min(), df.WeekNumber.max()+1))
                   .assign(Tool=lambda x: x.Tool.ffill(),
                           Status=lambda x: x.Status.ffill()))

In [815]: dff
Out[815]:
           Tool Status  Percentage
WeekNumber
1            M1   good        85.0
2            M1   good         NaN
3            M1   good         NaN
4            M1    bad        75.0
5            M1    bad         NaN
6            M1    bad         NaN
7            M1   good        90.0

Then, fill Percentage conditionally:

In [816]: dff.loc[(dff.Status == 'good') & dff.Percentage.isnull(), 'Percentage'] = 100

In [817]: dff.loc[(dff.Status == 'bad') & dff.Percentage.isnull(), 'Percentage'] = 0

Finally, use reset_index():

In [818]: dff.reset_index()
Out[818]:
   WeekNumber Tool Status  Percentage
0           1   M1   good        85.0
1           2   M1   good       100.0
2           3   M1   good       100.0
3           4   M1    bad        75.0
4           5   M1    bad         0.0
5           6   M1    bad         0.0
6           7   M1   good        90.0
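
The three steps above can be collected into one helper. This is a sketch only: the function name `fill_missing_weeks` and the `FILL_VALUES` mapping are hypothetical, the sample table is built in memory instead of being read from test.xlsx, and the frame is assumed to contain a single tool with integer week numbers.

```python
import pandas as pd

# Hypothetical mapping from Status to the percentage used for inserted weeks.
FILL_VALUES = {"good": 100, "bad": 0}


def fill_missing_weeks(df):
    """Insert a row for every missing WeekNumber, filled from the previous row."""
    weeks = range(df["WeekNumber"].min(), df["WeekNumber"].max() + 1)
    out = df.set_index("WeekNumber").reindex(weeks)
    # Carry Tool and Status forward into the rows created by reindex.
    out[["Tool", "Status"]] = out[["Tool", "Status"]].ffill()
    # Only the newly created rows have a missing Percentage; fill them by status.
    out["Percentage"] = out["Percentage"].fillna(out["Status"].map(FILL_VALUES))
    return out.reset_index()


# Sample data assumed to match the table in the question.
df = pd.DataFrame({
    "Tool": ["M1", "M1", "M1"],
    "WeekNumber": [1, 4, 7],
    "Status": ["good", "bad", "good"],
    "Percentage": [85, 75, 90],
})
result = fill_missing_weeks(df)
print(result)
```

Using a mapping rather than two separate .loc assignments keeps the rule for each status in one place. With several tools in the same frame, the helper would presumably need to be applied per tool, which this sketch does not attempt.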