Equivalent of str.join in pandas

Time: 2016-04-07 02:09:43

Tags: python pandas

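Is there a clean way to concatenate an arbitrary number of string Series, similar to the ' '.join idiom? If I know in advance which columns I want, I can do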

import pandas as pd
df = pd.DataFrame([['word1','word2', 'word3']])
df[0] + ' ' + df[1] + ' ' + df[2]

0    word1 word2 word3

However, I don't know a good way to generalize this to an arbitrary list of columns. The best I've come up with is

cols = [0, 1, 2]
df[cols[0]].str.cat(df[cols[1:]].values.transpose(), sep=' ')

0    word1 word2 word3

But I kind of hate this solution. Maybe there's a way to do it using the overloading of +?

2 answers:

Answer 0: (score: 3)

If you don't mind a trailing space at the end of each row, you could use sum, which is a bit faster than manually typing df[0] + ' ' + df[1] + ' ' + df[2]:

In [25]: (df + ' ').sum(axis=1)
Out[25]:
0    word1 word2 word3
dtype: object

However, if you need to strip the trailing space, it becomes slower:

In [26]: (df + ' ').sum(axis=1).str.strip()
Out[26]:
0    word1 word2 word3
dtype: object

Timing:

In [34]: %timeit (df + ' ').sum(axis=1)
1000 loops, best of 3: 368 us per loop

In [38]: %timeit df[0] + ' ' + df[1] + ' ' + df[2]
1000 loops, best of 3: 482 us per loop

In [40]: %timeit (df + ' ').sum(axis=1).str.strip()
1000 loops, best of 3: 556 us per loop

In [47]: %timeit df[cols[0]].str.cat(df[cols[1:]].values.transpose(), sep = ' ')
1000 loops, best of 3: 870 us per loop

In [49]: %timeit df[[0,1,2]].apply(' '.join, axis=1)
1000 loops, best of 3: 937 us per loop
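
If the trailing space is a problem, the same overloaded + can be folded over an arbitrary list of columns, so the separator only ever lands between values. Here is a minimal sketch of that idea using functools.reduce; join_columns is a hypothetical helper name, not part of pandas, and the second row of the frame is made up for illustration. I have not timed it against the variants above.

```python
from functools import reduce

import pandas as pd


def join_columns(df, cols, sep=" "):
    """Concatenate the string columns `cols` of `df` row-wise, separated by `sep`.

    Hypothetical helper for illustration; not a pandas function.
    """
    # Fold the overloaded `+` over the selected Series:
    # df[c0] + sep + df[c1] + sep + df[c2] + ...
    return reduce(lambda left, right: left + sep + right, (df[c] for c in cols))


df = pd.DataFrame([["word1", "word2", "word3"], ["a", "b", "c"]])
result = join_columns(df, [0, 1, 2])
print(result.tolist())  # ['word1 word2 word3', 'a b c']
```

Because the separator is only added between two Series, no strip step is needed afterwards.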

Answer 1: (score: 1)

After selecting the columns, you can use apply with axis=1 (here I specify them manually, but you could use cols instead):

>>> df = pd.DataFrame([['word1','word2', 'word3']])
>>> df
       0      1      2
0  word1  word2  word3
>>> df[[0,1,2]].apply(' '.join, axis=1)
0    word1 word2 word3
dtype: object
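
One caveat: ' '.join raises a TypeError as soon as a row contains a non-string value. A minimal sketch of the same apply call driven by a cols list, with the values cast to str first. The mixed-type frame below is made up for illustration, and casting with astype(str) is an assumption about what you want (a missing value would become the literal text 'nan' unless you fill it beforehand).

```python
import pandas as pd

# Hypothetical frame with a numeric column mixed in between the string columns
df = pd.DataFrame([["word1", 2, "word3"], ["a", 3, "c"]])
cols = [0, 1, 2]

# Cast every selected value to str so that ' '.join only ever sees strings,
# then join each row (axis=1) with a single space
joined = df[cols].astype(str).apply(" ".join, axis=1)
print(joined.tolist())  # ['word1 2 word3', 'a 3 c']
```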