Create and populate Pandas DataFrame columns using two existing columns

Time: 2018-08-30 05:05:47

Tags: python pandas

My DataFrame has 4 columns and looks like the following.

What I have:

ID  start_date  end_date    active
1,111   6/30/2015   8/6/1904    1 to 10
1,111   6/28/2016   3/30/1905   1 to 10
1,111   7/31/2017   6/6/1905    1 to 10
1,111   7/31/2018   6/6/1905    1 to 9
1,111   5/31/2019   12/4/1904   1 to 9
3,033   3/31/2015   5/18/1908   3 to 7
3,033   3/31/2016   11/24/1905  3 to 7
3,033   3/31/2017   1/20/1906   3 to 7
3,033   3/31/2018   1/8/1906    2 to 7
3,033   4/4/2019    2200,0  2 to 8

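I want to generate 10 more columns based on the values in the "active" column, as shown below. Is there an efficient way to populate them?

What I want to achieve: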

ID  start_date  end_date    active  Type 1  Type 2  Type 3  Type 4  Type 5  Type 6  Type 7  Type 8  Type 9  Type 10
1,111   6/30/2015   8/6/1904    1 to 10 1   1   1   1   1   1   1   1   1   1
1,111   6/28/2016   3/30/1905   1 to 10 1   1   1   1   1   1   1   1   1   1
1,111   7/31/2017   6/6/1905    1 to 10 1   1   1   1   1   1   1   1   1   1
1,111   7/31/2018   6/6/1905    1 to 9  1   1   1   1   1   1   1   1   1   
1,111   5/31/2019   12/4/1904   1 to 9  1   1   1   1   1   1   1   1   1   
3,033   3/31/2015   5/18/1908   3 to 7          1   1   1   1   1           
3,033   3/31/2016   11/24/1905  3 to 7          1   1   1   1   1           
3,033   3/31/2017   1/20/1906   3 to 7          1   1   1   1   1           
3,033   3/31/2018   1/8/1906    2 to 7      1   1   1   1   1   1           
3,033   4/4/2019    2200,0  2 to 8      1   1   1   1   1   1   1       

2 Answers:

Answer 0 (score: 2)

Use a custom function with np.arange:

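import numpy as np
import pandas as pd

def f(x):
    # '3 to 7' -> [3, 7]
    a = list(map(int, x.split(' to ')))
    # Series of 1s labeled 3..7; apply() expands these labels into columns
    return pd.Series(1, index=np.arange(a[0], a[1] + 1))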

df = df.join(df['active'].apply(f).add_prefix('Type '))
print (df)
      ID start_date    end_date   active  Type 1  Type 2  Type 3  Type 4  \
0  1,111  6/30/2015    8/6/1904  1 to 10     1.0     1.0     1.0     1.0   
1  1,111  6/28/2016   3/30/1905  1 to 10     1.0     1.0     1.0     1.0   
2  1,111  7/31/2017    6/6/1905  1 to 10     1.0     1.0     1.0     1.0   
3  1,111  7/31/2018    6/6/1905   1 to 9     1.0     1.0     1.0     1.0   
4  1,111  5/31/2019   12/4/1904   1 to 9     1.0     1.0     1.0     1.0   
5  3,033  3/31/2015   5/18/1908   3 to 7     NaN     NaN     1.0     1.0   
6  3,033  3/31/2016  11/24/1905   3 to 7     NaN     NaN     1.0     1.0   
7  3,033  3/31/2017   1/20/1906   3 to 7     NaN     NaN     1.0     1.0   
8  3,033  3/31/2018    1/8/1906   2 to 7     NaN     1.0     1.0     1.0   
9  3,033   4/4/2019      2200,0   2 to 8     NaN     1.0     1.0     1.0   

   Type 5  Type 6  Type 7  Type 8  Type 9  Type 10  
0     1.0     1.0     1.0     1.0     1.0      1.0  
1     1.0     1.0     1.0     1.0     1.0      1.0  
2     1.0     1.0     1.0     1.0     1.0      1.0  
3     1.0     1.0     1.0     1.0     1.0      NaN  
4     1.0     1.0     1.0     1.0     1.0      NaN  
5     1.0     1.0     1.0     NaN     NaN      NaN  
6     1.0     1.0     1.0     NaN     NaN      NaN  
7     1.0     1.0     1.0     NaN     NaN      NaN  
8     1.0     1.0     1.0     NaN     NaN      NaN  
9     1.0     1.0     1.0     1.0     NaN      NaN   
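
The 1.0 and NaN values in the output above appear because each row returns a Series with different labels, and the gaps left after alignment are filled with NaN, which forces a float dtype. A minimal sketch of that behavior, assuming a three-row sample frame made up for illustration (the names `wide` and `filled` are not from the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: only the 'active' column matters here.
df = pd.DataFrame({
    'ID': ['1,111', '1,111', '3,033'],
    'active': ['1 to 10', '1 to 9', '3 to 7'],
})

def f(x):
    # '3 to 7' -> [3, 7]
    a = list(map(int, x.split(' to ')))
    # Series of 1s whose labels are the integers in the range
    return pd.Series(1, index=np.arange(a[0], a[1] + 1))

# One Series per row; labels missing from a row become NaN after alignment.
wide = df['active'].apply(f).add_prefix('Type ')
print(wide.dtypes)

# Replacing NaN with 0 makes it possible to cast back to integers.
filled = wide.fillna(0).astype(int)
print(filled)
```

Calling a Python function per row is convenient but can be slow on large frames, which is what motivates the non-loop variant later in this answer.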

Similarly, with NaN replaced by 0 and the new columns cast to int:

def f(x):
    a = list(map(int, x.split(' to ')))
    return pd.Series(1, index= np.arange(a[0], a[1] + 1))

df = df.join(df['active'].apply(f).add_prefix('Type ').fillna(0).astype(int))
print (df)
      ID start_date    end_date   active  Type 1  Type 2  Type 3  Type 4  \
0  1,111  6/30/2015    8/6/1904  1 to 10       1       1       1       1   
1  1,111  6/28/2016   3/30/1905  1 to 10       1       1       1       1   
2  1,111  7/31/2017    6/6/1905  1 to 10       1       1       1       1   
3  1,111  7/31/2018    6/6/1905   1 to 9       1       1       1       1   
4  1,111  5/31/2019   12/4/1904   1 to 9       1       1       1       1   
5  3,033  3/31/2015   5/18/1908   3 to 7       0       0       1       1   
6  3,033  3/31/2016  11/24/1905   3 to 7       0       0       1       1   
7  3,033  3/31/2017   1/20/1906   3 to 7       0       0       1       1   
8  3,033  3/31/2018    1/8/1906   2 to 7       0       1       1       1   
9  3,033   4/4/2019      2200,0   2 to 8       0       1       1       1   

   Type 5  Type 6  Type 7  Type 8  Type 9  Type 10  
0       1       1       1       1       1        1  
1       1       1       1       1       1        1  
2       1       1       1       1       1        1  
3       1       1       1       1       1        0  
4       1       1       1       1       1        0  
5       1       1       1       0       0        0  
6       1       1       1       0       0        0  
7       1       1       1       0       0        0  
8       1       1       1       0       0        0  
9       1       1       1       1       0        0  

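Another non-loop solution: the idea is to drop duplicate ranges, mark each range's two endpoints with get_dummies, reindex to add the missing columns, and finally fill in the 1s between the endpoints by multiplying the forward and reversed cumsum values and clipping the result at 1: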

df1 = (df.set_index('active', drop=False)
        .pop('active')
        .drop_duplicates()
        .str.get_dummies(' to '))

df1.columns = df1.columns.astype(int)
df1 = df1.reindex(columns=np.arange(df1.columns.min(),df1.columns.max() + 1), fill_value=0)
# clip_upper was removed in pandas 1.0; clip(upper=1) is the equivalent call
df1 = (df1.cumsum(axis=1) * df1.iloc[:, ::-1].cumsum(axis=1)).clip(upper=1)
print (df1)
         1   2   3   4   5   6   7   8   9   10
active                                         
1 to 10   1   1   1   1   1   1   1   1   1   1
1 to 9    1   1   1   1   1   1   1   1   1   0
3 to 7    0   0   1   1   1   1   1   0   0   0
2 to 7    0   1   1   1   1   1   1   0   0   0
2 to 8    0   1   1   1   1   1   1   1   0   0

df = df.join(df1.add_prefix('Type '), on='active')
print (df)

      ID start_date    end_date   active  Type 1  Type 2  Type 3  Type 4  \
0  1,111  6/30/2015    8/6/1904  1 to 10       1       1       1       1   
1  1,111  6/28/2016   3/30/1905  1 to 10       1       1       1       1   
2  1,111  7/31/2017    6/6/1905  1 to 10       1       1       1       1   
3  1,111  7/31/2018    6/6/1905   1 to 9       1       1       1       1   
4  1,111  5/31/2019   12/4/1904   1 to 9       1       1       1       1   
5  3,033  3/31/2015   5/18/1908   3 to 7       0       0       1       1   
6  3,033  3/31/2016  11/24/1905   3 to 7       0       0       1       1   
7  3,033  3/31/2017   1/20/1906   3 to 7       0       0       1       1   
8  3,033  3/31/2018    1/8/1906   2 to 7       0       1       1       1   
9  3,033   4/4/2019      2200,0   2 to 8       0       1       1       1   

   Type 5  Type 6  Type 7  Type 8  Type 9  Type 10  
0       1       1       1       1       1        1  
1       1       1       1       1       1        1  
2       1       1       1       1       1        1  
3       1       1       1       1       1        0  
4       1       1       1       1       1        0  
5       1       1       1       0       0        0  
6       1       1       1       0       0        0  
7       1       1       1       0       0        0  
8       1       1       1       0       0        0  
9       1       1       1       1       0        0  
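
The cumsum step above may be easier to follow on a single range. A minimal sketch, assuming one hand-built row of endpoint indicators for '3 to 7' (the names `ends`, `left`, `right` and `filled` are illustrative only):

```python
import numpy as np
import pandas as pd

# Endpoint indicators for the range '3 to 7' over columns 1..10,
# i.e. what get_dummies + reindex produce for that row.
ends = pd.DataFrame([[0, 0, 1, 0, 0, 0, 1, 0, 0, 0]],
                    index=['3 to 7'], columns=np.arange(1, 11))

# Forward running sum: nonzero from the start point rightwards.
# By column label 1..10: 0 0 1 1 1 1 2 2 2 2
left = ends.cumsum(axis=1)

# Running sum over reversed columns: nonzero from the end point leftwards.
# By column label 1..10: 2 2 2 1 1 1 1 0 0 0
right = ends.iloc[:, ::-1].cumsum(axis=1)

# Multiplication aligns on column labels, so only columns inside the
# range stay nonzero; clip caps the endpoint products (2) at 1.
filled = (left * right).clip(upper=1)
print(filled)
```

Because the duplicates are dropped first, this work is done once per distinct range rather than once per row, and the final join maps the result back onto every row.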

Answer 1 (score: 2)

def f(s):
  a, b = map(int, s.split('to'))
  return '|'.join(map(str, range(a, b + 1)))

# the positional axis argument to drop() was removed in pandas 2.0; use columns= instead
df.drop(columns='active').join(df.active.apply(f).str.get_dummies().add_prefix('Type '))

      ID start_date    end_date  Type 1  Type 10  Type 2  Type 3  Type 4  Type 5  Type 6  Type 7  Type 8  Type 9
0  1,111  6/30/2015    8/6/1904       1        1       1       1       1       1       1       1       1       1
1  1,111  6/28/2016   3/30/1905       1        1       1       1       1       1       1       1       1       1
2  1,111  7/31/2017    6/6/1905       1        1       1       1       1       1       1       1       1       1
3  1,111  7/31/2018    6/6/1905       1        0       1       1       1       1       1       1       1       1
4  1,111  5/31/2019   12/4/1904       1        0       1       1       1       1       1       1       1       1
5  3,033  3/31/2015   5/18/1908       0        0       0       1       1       1       1       1       0       0
6  3,033  3/31/2016  11/24/1905       0        0       0       1       1       1       1       1       0       0
7  3,033  3/31/2017   1/20/1906       0        0       0       1       1       1       1       1       0       0
8  3,033  3/31/2018    1/8/1906       0        0       1       1       1       1       1       1       0       0
9  3,033   4/4/2019      2200,0       0        0       1       1       1       1       1       1       1       0
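
In the output above, Type 10 lands right after Type 1 because str.get_dummies returns its columns as strings in lexicographic order. A minimal sketch of one way to restore numeric order, assuming a two-row sample frame made up for illustration (the names `dummies` and `out` are not from the original answer):

```python
import pandas as pd

# Hypothetical sample with the same 'active' format as the question.
df = pd.DataFrame({'ID': ['1,111', '3,033'],
                   'active': ['1 to 10', '2 to 8']})

def f(s):
    # '2 to 8' -> '2|3|4|5|6|7|8'
    a, b = map(int, s.split('to'))
    return '|'.join(map(str, range(a, b + 1)))

# The default separator of str.get_dummies is '|'.
dummies = df['active'].apply(f).str.get_dummies()

# Column labels are strings ('1', '10', '2', ...); sort them as integers
# before adding the prefix.
dummies = dummies[sorted(dummies.columns, key=int)].add_prefix('Type ')

out = df.drop(columns='active').join(dummies)
print(out)
```

Sorting before add_prefix keeps the sort key simple, since the labels are still plain digit strings at that point.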