Python: merging/joining two DataFrames

Time: 2014-02-22 00:22:43

Tags: python join merge pandas

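I'm trying to merge/join two DataFrames, each with three keys (Age, Gender, and Signed_In). Both DataFrames derive from the same parent and were created by groupby, but each has its own value columns.

It seems like the merge/join should be painless, since the two DataFrames share the same unique key combinations. I must be making some simple mistake in my attempts with merge and join, but I can't for the life of me figure it out.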

import pandas as pd

times = pd.read_csv('nytimes.csv')

# Produces times_mean table consisting of two value columns, avg_impressions and avg_clicks
times_mean = times.groupby(['Age','Gender','Signed_In']).mean()
times_mean.columns = ['avg_impressions', 'avg_clicks']

# Produces times_max table consisting of two value columns, max_impressions and max_clicks
times_max = times.groupby(['Age','Gender','Signed_In']).max()
times_max.columns = ['max_impressions', 'max_clicks']

# Following intended to produce combined table with four value columns
times_join = times_mean.join(times_max, on = ['Age', 'Gender', 'Signed_In'])
times_join2 = pd.merge(times_mean, times_max, on=['Age', 'Gender', 'Signed_In'])

1 answer:

Answer 0: (score: 0)

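You don't need the on kwarg when joining on equivalently structured MultiIndexes. After the groupby, the three keys are levels of each frame's index, and join aligns on the index by default.

Here's an example that demonstrates this: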

import numpy as np
import pandas

a = np.random.normal(size=10)
b = a + 10
index = pandas.MultiIndex.from_product([['A', 'B'], list('abcde')])

df_a = pandas.DataFrame(a, index=index, columns=['colA'])
df_b = pandas.DataFrame(b, index=index, columns=['colB'])

df_a.join(df_b)

This gives me:

         colA       colB
A a -1.525376   8.474624
  b  0.778333  10.778333
  c  1.153172  11.153172
  d  0.966560  10.966560
  e  0.089765  10.089765
B a  0.717717  10.717717
  b  0.305545  10.305545
  c  0.123548  10.123548
  d -1.018660   8.981340
  e -0.635103   9.364897
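
Applied to the question's setup, the same idea might look like the sketch below. Since nytimes.csv isn't available here, a small made-up DataFrame stands in for it; the Impressions and Clicks column names and all the values are assumptions for illustration only. The merge call with left_index/right_index is shown as an index-based alternative to join.

```python
import pandas as pd

# Hypothetical stand-in for nytimes.csv; column names and values are assumed.
times = pd.DataFrame({
    'Age': [25, 25, 40, 40, 40],
    'Gender': [0, 0, 1, 1, 1],
    'Signed_In': [1, 1, 1, 1, 1],
    'Impressions': [3, 5, 2, 4, 6],
    'Clicks': [0, 1, 0, 0, 2],
})

keys = ['Age', 'Gender', 'Signed_In']

# After groupby, the three keys become levels of a MultiIndex, not columns.
times_mean = times.groupby(keys).mean()
times_mean.columns = ['avg_impressions', 'avg_clicks']

times_max = times.groupby(keys).max()
times_max.columns = ['max_impressions', 'max_clicks']

# Both frames share the same MultiIndex, so join aligns on it without `on`.
times_join = times_mean.join(times_max)

# Index-based merge: tell merge to use the index on both sides.
times_join2 = pd.merge(times_mean, times_max,
                       left_index=True, right_index=True)

print(times_join)
```

If you'd rather have the keys as ordinary columns, calling reset_index() on each grouped frame first should let merge work with on=keys as well.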