Question

Suppose I have two dataframes, d1 and d2:

import numpy as np
import pandas as pd
d1 = pd.DataFrame(np.ones((3, 3), dtype=int), list('abc'), [0, 1, 2])
d2 = pd.DataFrame(np.zeros((3, 2), dtype=int), list('abc'), [3, 4])

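What is a simple and general way to interweave the columns of the two dataframes? We can assume that d2 always has exactly one fewer column than d1, and that the two indices are identical.

I want this: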

pd.concat([d1[0], d2[3], d1[1], d2[4], d1[2]], axis=1)

   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1

Answer 1

Use pd.concat to combine the dataframes, and toolz.interleave to reorder the columns:

from toolz import interleave

pd.concat([d1, d2], axis=1)[list(interleave([d1, d2]))]

The resulting output is as expected:

   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1
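
The one-liner above relies on two pandas behaviours that are easy to miss: iterating over a DataFrame yields its column labels, and indexing with a list of labels returns the columns in that order. The following is only a minimal sketch of those two facts, with the interleaved order written out by hand so that it does not depend on toolz; the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

d1 = pd.DataFrame(np.ones((3, 3), dtype=int), index=list('abc'), columns=[0, 1, 2])
d2 = pd.DataFrame(np.zeros((3, 2), dtype=int), index=list('abc'), columns=[3, 4])

# Iterating over a DataFrame yields its column labels, not its rows.
labels_d1 = list(d1)
labels_d2 = list(d2)

# Indexing with a list of labels returns the columns in exactly that order.
# The order is hand-written here; toolz.interleave would generate it.
combined = pd.concat([d1, d2], axis=1)
reordered = combined[[0, 3, 1, 4, 2]]
```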

Answer 2

Here is a NumPy approach -

def numpy_interweave(d1, d2):
    c1 = list(d1.columns)
    c2 = list(d2.columns)
    N = (len(c1)+len(c2))
    cols = [None]*N
    cols[::2] = c1
    cols[1::2] = c2

    out_dtype = np.result_type(d1.values.dtype, d2.values.dtype)
    out = np.empty((d1.shape[0],N),dtype=out_dtype)
    out[:,::2] = d1.values
    out[:,1::2] = d2.values

    df_out = pd.DataFrame(out, columns=cols, index=d1.index)
    return df_out

Sample run -

In [346]: d1
Out[346]: 
   x  y  z
a  6  7  4
b  3  5  6
c  4  6  2

In [347]: d2
Out[347]: 
   p  q
a  4  2
b  7  7
c  7  2

In [348]: numpy_interweave(d1, d2)
Out[348]: 
   x  p  y  q  z
a  6  4  7  2  4
b  3  7  5  7  6
c  4  7  6  2  2
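
The core of this approach is strided slice assignment, which can be sketched on plain arrays. With five output columns, the slice ::2 covers three positions and 1::2 covers two, which is why the second input may have the same number of columns as the first or one fewer, but nothing else. The arrays below are made-up sample data; one of them is float to show the dtype promotion performed by np.result_type.

```python
import numpy as np

a = np.array([[6, 7, 4],
              [3, 5, 6]])          # integer data, 3 columns
b = np.array([[4.5, 2.0],
              [7.0, 7.5]])         # float data, 2 columns

n_cols = a.shape[1] + b.shape[1]

# Pick a dtype that can hold both inputs (int and float promote to float64).
out_dtype = np.result_type(a.dtype, b.dtype)
out = np.empty((a.shape[0], n_cols), dtype=out_dtype)

out[:, ::2] = a    # even positions 0, 2, 4 receive the columns of a
out[:, 1::2] = b   # odd positions 1, 3 receive the columns of b
```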

Answer 3

Interweave the columns:

c = np.empty((d1.columns.size + d2.columns.size,), dtype=object)
c[0::2], c[1::2] = d1.columns, d2.columns

Now join, then reorder by selecting the columns with the interleaved labels:

d1.join(d2)[c]

   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1

When dealing with more than two dataframes, you may prefer pd.concat.
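
As a rough illustration of the multi-dataframe case, here is one way it might look. The helper name interweave_many and the third frame d3 are hypothetical and not part of the original answer, and the sketch assumes that column labels are unique across all frames and that the frames share an index.

```python
from itertools import zip_longest

import numpy as np
import pandas as pd

_MISSING = object()  # sentinel, so that a real column label of None still works


def interweave_many(*frames):
    """Hypothetical helper: interleave the columns of any number of frames."""
    order = [
        label
        for group in zip_longest(*(f.columns for f in frames), fillvalue=_MISSING)
        for label in group
        if label is not _MISSING
    ]
    # Concatenate side by side, then select the columns in interleaved order.
    return pd.concat(list(frames), axis=1)[order]


d1 = pd.DataFrame(np.ones((3, 3), dtype=int), index=list('abc'), columns=[0, 1, 2])
d2 = pd.DataFrame(np.zeros((3, 2), dtype=int), index=list('abc'), columns=[3, 4])
d3 = pd.DataFrame(np.full((3, 1), 2), index=list('abc'), columns=[5])  # made-up third frame

result = interweave_many(d1, d2, d3)
```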

Answer 4

Write a function to abstract away the generic merge-and-reorder:

from itertools import zip_longest
def weave(df1, df2):
  col1 = df1.columns
  col2 = df2.columns
  weaved =  [col for zipped in zip_longest(col1,col2) 
                 for col in zipped
                 if col is not None]
  return pd.concat([df1, df2], axis=1)[weaved]

weave(d1, d2)
# Output:
   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1

Answer 5

We can use itertools.zip_longest:

In [75]: from itertools import zip_longest

In [76]: cols = pd.Series(np.concatenate(list(zip_longest(d1.columns, d2.columns)))).dropna()

In [77]: cols
Out[77]:
0    0
1    3
2    1
3    4
4    2
dtype: object

In [78]: df = pd.concat([d1, d2], axis=1)[cols]

In [79]: df
Out[79]:
   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1

Answer 6

My solution is to use pd.DataFrame.insert, making sure to insert from the back:

df = d1.copy()
for i in range(d2.shape[1], 0, -1):
    df.insert(i, d2.columns[i - 1], d2.iloc[:, i - 1])

df

   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1
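
Inserting from the back matters because every insert shifts the columns to its right by one position. For comparison, a front-to-back variant is sketched below; it has to account for that shift explicitly, so the i-th column of d2 goes to position 2*i + 1. The name forward is only illustrative.

```python
import numpy as np
import pandas as pd

d1 = pd.DataFrame(np.ones((3, 3), dtype=int), index=list('abc'), columns=[0, 1, 2])
d2 = pd.DataFrame(np.zeros((3, 2), dtype=int), index=list('abc'), columns=[3, 4])

forward = d1.copy()
for i, label in enumerate(d2.columns):
    # Earlier inserts have already pushed later columns to the right,
    # so the target position grows by two for each column of d2.
    forward.insert(2 * i + 1, label, d2[label])
```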

Answer 7

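The roundrobin itertools recipe has interleaving behaviour. You can either implement the recipe yourself from the Python docs, or import it from more_itertools, a third-party package that implements the recipe for you: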

from more_itertools import roundrobin

pd.concat([d1, d2], axis=1)[list(roundrobin(d1, d2))]

# Output
   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1

Inspired by @root's answer: the column labels are interleaved and then used to slice the concatenated DataFrame.
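
If adding a dependency is not an option, the recipe can be written with the standard library alone. The version below is adapted from an older form of the roundrobin recipe in the itertools documentation, so treat it as a sketch rather than the exact code shipped by more_itertools.

```python
from itertools import cycle, islice

import numpy as np
import pandas as pd


def roundrobin(*iterables):
    """Yield items from each iterable in turn until all are exhausted."""
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for get_next in nexts:
                yield get_next()
        except StopIteration:
            # Drop the iterator that just ran out and keep cycling the rest.
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))


d1 = pd.DataFrame(np.ones((3, 3), dtype=int), index=list('abc'), columns=[0, 1, 2])
d2 = pd.DataFrame(np.zeros((3, 2), dtype=int), index=list('abc'), columns=[3, 4])

# Iterating a DataFrame yields its column labels, so this interleaves the labels.
order = list(roundrobin(d1, d2))
result = pd.concat([d1, d2], axis=1)[order]
```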
