Question

Consider this simple example:

import pandas as pd

df = pd.DataFrame({'dt_one': ['2015-01-01', '2016-02-02'],
                   'dt_two': ['2015-01-01', '2016-02-02'],
                   'other_col': [1, 2]})

df    
Out[30]: 
       dt_one      dt_two  other_col
0  2015-01-01  2015-01-01          1
1  2016-02-02  2016-02-02          2

I want to apply pd.to_datetime to all columns whose names contain dt_. I can do this easily with filter:

df.filter(regex = 'dt_').apply(lambda x: pd.to_datetime(x))
Out[33]: 
      dt_one     dt_two
0 2015-01-01 2015-01-01
1 2016-02-02 2016-02-02

But how do I assign these values back into the original DataFrame? Doing:

df.filter(regex = 'dt_') = df.filter(regex = 'dt_').apply(lambda x: pd.to_datetime(x))
  File "<ipython-input-34-412d88939494>", line 1
    df.filter(regex = 'dt_') = df.filter(regex = 'dt_').apply(lambda x: pd.to_datetime(x))
SyntaxError: can't assign to function call

does not work.

Thanks!

Answer 1

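The assignment fails because the left-hand side, df.filter(regex='dt_'), is a function call, and Python cannot assign to a call expression. Even if it could, filter returns a new DataFrame rather than a reference into df, so writing to it would not change the original. To assign to several columns at once, select them on df itself by label (df[labels] = ...), or use assign, which returns a new DataFrame with those columns replaced.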

So take the column labels from the filtered result and use them to index df on the left-hand side of the assignment, i.e.

df[df.filter(regex = 'dt_').columns] = df.filter(regex = 'dt_').apply(lambda x: pd.to_datetime(x))

     dt_one     dt_two  other_col
0 2015-01-01 2015-01-01          1
1 2016-02-02 2016-02-02          2
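
As a small check of the point above, the following sketch shows that converting the result of filter leaves df untouched, while assigning through the column labels does update it. The names `filtered` and `unchanged` are mine and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'dt_one': ['2015-01-01', '2016-02-02'],
                   'dt_two': ['2015-01-01', '2016-02-02'],
                   'other_col': [1, 2]})

# filter() returns a new DataFrame; converting it does not touch df.
filtered = df.filter(regex='dt_').apply(pd.to_datetime)
unchanged = df['dt_one'].iloc[0]  # still the plain string '2015-01-01'

# Assigning through the column labels replaces those columns in df itself.
df[filtered.columns] = filtered
```

After the last line, df.dtypes should report a datetime dtype for both dt_ columns and leave other_col as it was.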

Answer 2

You need to assign to the filtered columns:

cols = df.filter(regex='dt_').columns
df[cols] = df[cols].apply(lambda x: pd.to_datetime(x))
print(df)
      dt_one     dt_two  other_col
0 2015-01-01 2015-01-01          1
1 2016-02-02 2016-02-02          2

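Or assign to the columns selected by a boolean mask. Note that from pandas 2.0 onward, df.loc[:, m] = ... tries to write into the existing columns in place, so the columns may stay object dtype; check df.dtypes afterwards: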

m = df.columns.str.contains('dt_')
df.loc[:, m] = df.loc[:, m].apply(lambda x: pd.to_datetime(x))
print(df)
      dt_one     dt_two  other_col
0 2015-01-01 2015-01-01          1
1 2016-02-02 2016-02-02          2
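
The mask can also be turned back into column labels, which lets the assignment go through df[cols] as in the first variant. This is a minimal sketch, assuming the df from the question; regex=False is my addition so that 'dt_' is matched literally rather than as a pattern.

```python
import pandas as pd

df = pd.DataFrame({'dt_one': ['2015-01-01', '2016-02-02'],
                   'dt_two': ['2015-01-01', '2016-02-02'],
                   'other_col': [1, 2]})

# Boolean mask over the column labels; regex=False treats 'dt_' literally.
m = df.columns.str.contains('dt_', regex=False)

# Indexing the column Index with the mask yields the matching labels.
cols = df.columns[m]

# Assigning via df[cols] replaces the selected columns with the converted ones.
df[cols] = df[cols].apply(pd.to_datetime)
```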

Answer 3

You can use keyword unpacking (**) with assign:

df_out = df.assign(**df.filter(regex = 'dt_').apply(lambda x: pd.to_datetime(x)))

      dt_one     dt_two  other_col
0 2015-01-01 2015-01-01          1
1 2016-02-02 2016-02-02          2

Output of df_out.info():

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
dt_one       2 non-null datetime64[ns]
dt_two       2 non-null datetime64[ns]
other_col    2 non-null int64
dtypes: datetime64[ns](2), int64(1)
memory usage: 128.0 bytes
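
One property of this approach worth seeing in isolation is that assign does not modify df; it returns a new DataFrame. The sketch below assumes the df from the question, and the name `converted` is mine.

```python
import pandas as pd

df = pd.DataFrame({'dt_one': ['2015-01-01', '2016-02-02'],
                   'dt_two': ['2015-01-01', '2016-02-02'],
                   'other_col': [1, 2]})

converted = df.filter(regex='dt_').apply(pd.to_datetime)

# ** passes each column of `converted` as a keyword argument (name=Series).
# assign() returns a new DataFrame and leaves df as it was.
df_out = df.assign(**converted)
```

Because the unpacking turns column labels into keyword arguments, this only works when the labels are strings. To keep the result under the original name, rebind it: df = df.assign(**converted).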

Answer 4

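You can also assign the values with tuple unpacking. The array is transposed so that unpacking iterates over columns rather than rows:

df['dt_one'], df['dt_two'] = df.filter(regex='dt_').apply(lambda x: pd.to_datetime(x)).values.T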


df.dtypes
Out[215]: 
dt_one       datetime64[ns]
dt_two       datetime64[ns]
other_col             int64
dtype: object
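
Because the tuple assignment above names each column explicitly, it has to be edited whenever another dt_ column appears. A plain loop over the filtered labels is one way to avoid that. This is a sketch under the same assumptions as the question's df, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'dt_one': ['2015-01-01', '2016-02-02'],
                   'dt_two': ['2015-01-01', '2016-02-02'],
                   'other_col': [1, 2]})

# Iterating over a DataFrame yields its column labels.
for col in df.filter(regex='dt_'):
    # Convert one column at a time and write it back under the same label.
    df[col] = pd.to_datetime(df[col])
```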

How do I modify multiple columns at once?
