Pandas: converting an aggregated DataFrame to a list of tuples

Time: 2016-10-07 19:13:40

Tags: pandas python-2.x

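I am trying to get a list of tuples from a pandas DataFrame. I am more used to other APIs such as apache-spark, where a DataFrame has a method called collect, so I searched a bit and found this approach. The result is not what I expected, though, and I think that is because the DataFrame holds aggregated data. Is there a simple way to do this?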

Let me illustrate my problem:

print(df)

#date       user            Cost       
#2016-10-01 xxxx        0.598111
#           yyyy        0.598150
#           zzzz       13.537223
#2016-10-02 xxxx        0.624247
#           yyyy        0.624302
#           zzzz       14.651441

print(df.values)

#[[  0.59811124]
# [  0.59814985]
# [ 13.53722286]
# [  0.62424731]
# [  0.62430216]
# [ 14.65144134]]

#I was expecting something like this:
[("2016-10-01", "xxxx", 0.598111), 
 ("2016-10-01", "yyyy", 0.598150), 
 ("2016-10-01", "zzzz", 13.537223),
 ("2016-10-02", "xxxx", 0.624247), 
 ("2016-10-02", "yyyy", 0.624302), 
 ("2016-10-02", "zzzz", 14.651441)]

Edit

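I tried @Dervin's suggestion, but the result was not what I wanted: the tuples contain only the Cost values, without the date and user.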

collected = [tuple(x) for x in df.values]

collected

[(0.59811124000000004,), (0.59814985000000032,), (13.53722285999994,),
 (0.62424731000000044,), (0.62430216000000027,), (14.651441339999931,), 
 (0.62414758000000026,), (0.62423407000000042,), (14.655454959999938,)]

1 answer:

Answer 0 (score: 2)

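What you have there is a hierarchical index: date and user are index levels rather than columns, which is why df.values returns only Cost. So first you can do what is described in this SO question to turn the index back into columns, and then do something like [tuple(x) for x in df1.to_records(index=False)]. For example: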

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])

In [12]: df1
Out[12]: 
          a         b         c         d
0  0.076626 -0.761338  0.150755 -0.428466
1  0.956445  0.769947 -1.433933  1.034086
2 -0.211886 -1.324807 -0.736709 -0.767971
...

In [13]: [tuple(x) for x in df1.to_records(index=False)]
Out[13]: 
[(0.076625682946709128,
  -0.76133754774190276,
  0.15075466312259322,
  -0.42846644471544015),
 (0.95644517961731257,
  0.76994677126920497,
  -1.4339326896803839,
  1.0340857719122247),
 (-0.21188555188408928,
  -1.3248066626301633,
  -0.73670886051415208,
  -0.76797061516159393),
...
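
Applied to the aggregated frame from the question, the two steps might look like the sketch below. The sample data and the groupby call are assumptions made to reproduce a (date, user) hierarchical index; the original aggregation is not shown in the question. reset_index is used here as one common way to move index levels back into columns, which may or may not be what the linked question suggests.

```python
import pandas as pd

# Hypothetical raw data mirroring the values printed in the question
raw = pd.DataFrame({
    "date": ["2016-10-01"] * 3 + ["2016-10-02"] * 3,
    "user": ["xxxx", "yyyy", "zzzz"] * 2,
    "Cost": [0.598111, 0.598150, 13.537223, 0.624247, 0.624302, 14.651441],
})

# Assumed aggregation: grouping by two keys produces a frame whose index
# is the (date, user) pair, so df.values would contain only Cost
df = raw.groupby(["date", "user"]).sum()

# Step 1: turn the index levels back into ordinary columns
flat = df.reset_index()

# Step 2: the to_records approach from the answer
collected = [tuple(x) for x in flat.to_records(index=False)]

# Alternative: itertuples yields plain tuples when name=None
collected_alt = list(flat.itertuples(index=False, name=None))

print(collected)
```

Both lists hold one (date, user, Cost) tuple per row. With to_records the numeric entries are NumPy scalars, while itertuples returns regular Python floats, which can matter if the tuples are serialized later.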