Multiplying pandas DataFrames with or without broadcasting

Time: 2017-08-06 22:03:56

Tags: python pandas dataframe numpy-broadcasting

I have 2 dataframes:

>>> type(c)
Out[118]: pandas.core.frame.DataFrame
>>> type(N)
Out[119]: pandas.core.frame.DataFrame

>>> c
Out[114]: 
                       t
2017-06-01 01:06:00 1.00
2017-06-01 01:13:00 1.00
2017-06-01 02:09:00 1.00
2017-06-26 22:47:00 1.00

>>> N
Out[115]: 
                       0    1
2017-06-01 01:06:00 1.00 1.00
2017-06-01 01:13:00 1.00 1.00
2017-06-01 02:09:00 1.00 1.00
2017-06-26 22:47:00 1.00 1.00

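I need to multiply these to get a (4, 2) DataFrame in which each column of N is multiplied element-wise by c. I tried the following four approaches, with no luck: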

>>> N.multiply(c, axis='index')
Out[116]: 
                      0   1   t
2017-06-01 01:06:00 nan nan nan
2017-06-01 01:13:00 nan nan nan
2017-06-01 02:09:00 nan nan nan
2017-06-26 22:47:00 nan nan nan

>>> c[:]*N
Out[98]: 
                      0   1   t
2017-06-01 01:06:00 nan nan nan
2017-06-01 01:13:00 nan nan nan
2017-06-01 02:09:00 nan nan nan
2017-06-26 22:47:00 nan nan nan

>>> c*N
Out[99]: 
                      0   1   t
2017-06-01 01:06:00 nan nan nan
2017-06-01 01:13:00 nan nan nan
2017-06-01 02:09:00 nan nan nan
2017-06-26 22:47:00 nan nan nan

>>> c[:, None]*N
Traceback (most recent call last):

  File "C:\...pandas\core\frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "C:\...core\frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "C:\...core\generic.py", line 1082, in _get_item_cache
    res = cache.get(item)
TypeError: unhashable type

Is there an easy way to do this, with or without broadcasting?

1 Answer:

Answer 0 (score: 3)

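The problem is that you are passing a DataFrame, so pandas also tries to align on column names. N has columns 0 and 1 while c has column t, so no column matches and every cell comes out as NaN. If you select column t, it becomes a Series, which broadcasts across the columns of N as intended: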

N.mul(c['t'], axis=0)
Out: 
                       0    1
2017-06-01 01:06:00  1.0  1.0
2017-06-01 01:13:00  1.0  1.0
2017-06-01 02:09:00  1.0  1.0
2017-06-26 22:47:00  1.0  1.0
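
To see both behaviours side by side, here is a minimal sketch that rebuilds frames like the ones in the question. The construction code is an assumption (the question only shows the printed frames), and the variable names aligned and product are just illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild frames shaped like the ones in the question (construction is assumed).
idx = pd.to_datetime([
    "2017-06-01 01:06:00",
    "2017-06-01 01:13:00",
    "2017-06-01 02:09:00",
    "2017-06-26 22:47:00",
])
N = pd.DataFrame(np.ones((4, 2)), index=idx, columns=[0, 1])
c = pd.DataFrame({"t": np.ones(4)}, index=idx)

# DataFrame * DataFrame aligns on index AND columns. The column labels
# (0, 1) and ("t",) never match, so the result holds their union, all NaN.
aligned = N * c

# A Series combined with axis=0 aligns on the row index only and is
# applied to every column of N.
product = N.mul(c["t"], axis=0)

print(aligned)
print(product)
```

The same applies to N.multiply(c, axis='index') from the question: with a DataFrame as the other operand, the columns are still aligned by label.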

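With numpy arrays you don't need to specify anything. For shapes (4, 2) and (4, 1), numpy matches the axis of equal length (4) and stretches the length-1 axis across both columns, so the arrays broadcast as expected.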
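
As a rough illustration of that broadcasting rule, the sketch below uses made-up values (not the data from the question). It also shows why a flat (4,) array needs an extra axis first, which is what the [:, None] idiom does on a numpy array.

```python
import numpy as np

a = np.arange(8.0).reshape(4, 2)              # shape (4, 2)
b = np.array([[10.0], [20.0], [30.0], [40.0]])  # shape (4, 1)

# Shapes are compared from the trailing axis: 2 vs 1 is compatible
# (the 1 is stretched), and 4 vs 4 matches.
result = a * b

# A flat array of shape (4,) does not broadcast against (4, 2), because
# the trailing axes (2 vs 4) disagree; a * flat raises ValueError.
flat = b.ravel()

# Adding a trailing axis turns (4,) into (4, 1), which broadcasts again.
result_again = a * flat[:, None]

print(result)
```

Note that [:, None] works on numpy arrays but not on a DataFrame, where indexing with that key is treated as a column lookup.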

Consider the following DataFrames:

N
Out: 
                       0    1
2017-06-01 01:06:00  1.0  2.0
2017-06-01 01:13:00  6.0  5.0
2017-06-01 02:09:00  4.0  3.0
2017-06-26 22:47:00  4.0  7.0


c
Out: 
                       t
2017-06-01 01:06:00  6.0
2017-06-01 01:13:00  2.0
2017-06-01 02:09:00  8.0
2017-06-26 22:47:00  2.0

You can access the underlying arrays with the .values attribute:

N.values * c.values
Out: 
array([[  6.,  12.],
       [ 12.,  10.],
       [ 32.,  24.],
       [  8.,  14.]])

This gives the same values as:

N.mul(c['t'], axis=0)
Out: 
                        0     1
2017-06-01 01:06:00   6.0  12.0
2017-06-01 01:13:00  12.0  10.0
2017-06-01 02:09:00  32.0  24.0
2017-06-26 22:47:00   8.0  14.0

However, because the whole operation is done in numpy, you lose the labels (the index and the column names).
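
If you do take the numpy route and want the labels back, one option is to wrap the array in a new DataFrame. This is a sketch rather than part of the original answer: the frame construction is assumed, and it relies on the rows of N and c already being in the same order, since the numpy multiplication is purely positional.

```python
import pandas as pd

idx = pd.to_datetime([
    "2017-06-01 01:06:00",
    "2017-06-01 01:13:00",
    "2017-06-01 02:09:00",
    "2017-06-26 22:47:00",
])
# Same values as the example frames above (construction is assumed).
N = pd.DataFrame([[1.0, 2.0], [6.0, 5.0], [4.0, 3.0], [4.0, 7.0]], index=idx)
c = pd.DataFrame({"t": [6.0, 2.0, 8.0, 2.0]}, index=idx)

# Positional multiply on the raw arrays: (4, 2) * (4, 1), no label alignment.
raw = N.values * c.values

# Reattach the labels from N by hand.
relabeled = pd.DataFrame(raw, index=N.index, columns=N.columns)

# The label-aware version, for comparison.
expected = N.mul(c["t"], axis=0)

print(relabeled.equals(expected))
```

When the two indexes might differ in order or content, N.mul(c['t'], axis=0) is the safer choice because it aligns rows by label.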