Question

Following this recipe, I am trying to filter a DataFrame by the column names that contain the string '+'. Here is an example:

import pandas as pd

B = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', '+B', '+C'], index=[1, 2, 3, 4, 5])

So I want a DataFrame C containing only the '+B' and '+C' columns.

C = B.filter(regex='+')

However, I get this error:

File "c:\users\hernan\anaconda\lib\site-packages\pandas\core\generic.py", line 1888, in filter
matcher = re.compile(regex)
File "c:\users\hernan\anaconda\lib\re.py", line 190, in compile
return _compile(pattern, flags)
File "c:\users\hernan\anaconda\lib\re.py", line 244, in _compile
raise error, v # invalid expression
error: nothing to repeat

The recipe says it is for Python 3, and I am using Python 2.7. However, I don't think that is the problem.

Hernan

Answer 1

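+ has a special meaning in regular expressions (see here): it is a quantifier meaning "one or more of the preceding item", so a pattern consisting of + alone has nothing to repeat. You can escape it with \: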

>>> C = B.filter(regex=r'\+')  # raw string, so the backslash reaches the regex engine unchanged
>>> C
   +B  +C
1   5   2
2   4   4
3   3   1
4   2   2
5   1   4
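
If the text to match comes from a variable rather than a literal, the standard library's re.escape can do the escaping for you. The following is only a minimal sketch, assuming the same B as in the question; the name needle is made up for illustration:

```python
import re
import pandas as pd

B = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', '+B', '+C'], index=[1, 2, 3, 4, 5])

# A bare '+' is not a valid pattern: there is nothing for it to repeat.
try:
    re.compile('+')
except re.error as exc:
    print('invalid pattern:', exc)

needle = '+'  # hypothetical variable holding the literal text to look for

# re.escape backslash-escapes regex metacharacters, so '+' is matched literally.
C = B.filter(regex=re.escape(needle))
print(list(C.columns))
```

This should leave C with just the '+B' and '+C' columns, the same as the hand-escaped version.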

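Alternatively, since all you care about is whether + is present, you can use the like parameter instead, which does a plain substring match: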

>>> C = B.filter(like="+")
>>> C
   +B  +C
1   5   2
2   4   4
3   3   1
4   2   2
5   1   4
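
Another option, not part of the original answer, is to build a boolean mask over the column names with the .str accessor and pass it to .loc. A rough sketch, again assuming the B from the question; regex=False tells str.contains to treat '+' as plain text:

```python
import pandas as pd

B = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', '+B', '+C'], index=[1, 2, 3, 4, 5])

# True for every column whose name contains a literal '+'.
mask = B.columns.str.contains('+', regex=False)
C = B.loc[:, mask]

# Stricter variant: only columns whose name starts with '+'.
C_prefix = B.loc[:, B.columns.str.startswith('+')]

print(list(C.columns))
print(list(C_prefix.columns))
```

The mask form is a little more verbose than filter, but it can be combined with other conditions using & and |.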

How to filter a pandas DataFrame by 'str' in column names?
