Question

I have a pandas DataFrame like the one shown below. I want to convert all of the text to lowercase. How can I do this in Python?

Sample DataFrame

[Nah I don't think he goes to usf, he lives around here though]

[Even my brother is not like to speak with me., They treat me like aids patent.]

[I HAVE A DATE ON SUNDAY WITH WILL!, !]

[As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been set as your callertune for all Callers., Press *9 to copy your friends Callertune]

[WINNER!!, As a valued network customer you have been selected to receivea £900 prize reward!, To claim call 09061701461., Claim code KL341., Valid 12 hours only.]

What I tried

def toLowercase(fullCorpus):
   lowerCased = [sentences.lower()for sentences in fullCorpus['sentTokenized']]
   return lowerCased

I get this error

lowerCased = [sentences.lower()for sentences in fullCorpus['sentTokenized']]
AttributeError: 'list' object has no attribute 'lower'

Answer 1

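It's simple when the cells hold plain strings (note that DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map):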

df.applymap(str.lower)

Or, for a single column:

df['col'].apply(str.lower)
df['col'].map(str.lower)

OK, so you have lists in the rows. In that case, lowercase each string inside each list:

df['col'].map(lambda x: list(map(str.lower, x)))
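For anyone who wants to try this end to end, here is a minimal sketch of the list-per-row case. It assumes a column named sentTokenized as in the question; the two rows are shortened from the question's sample, and the lowered column name is only illustrative.

```python
import pandas as pd

# Each cell holds a list of sentences, as in the question.
df = pd.DataFrame({
    "sentTokenized": [
        ["I HAVE A DATE ON SUNDAY WITH WILL!", "!"],
        ["WINNER!!", "Claim code KL341."],
    ]
})

# Lowercase every string inside each list, keeping the list structure.
# "lowered" is a hypothetical column name used for illustration.
df["lowered"] = df["sentTokenized"].map(lambda sents: [s.lower() for s in sents])

print(df["lowered"].tolist())
```

The original error came from calling .lower() on the list itself; here it is called on each string inside the list.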

Answer 2

You can try using apply and map:

def toLowercase(fullCorpus):
   lowerCased = fullCorpus['sentTokenized'].apply(lambda row:list(map(str.lower, row)))
   return lowerCased

Answer 3

You could also convert each list to a string, apply str.lower, and then parse it back into a list.

import ast
df.sentTokenized.astype(str).str.lower().transform(ast.literal_eval)
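A rough sketch of that round trip on toy data, in case it helps to see the intermediate step. It assumes every cell is a list of plain strings; the sample rows and variable names are made up for illustration. Values such as None or True would probably not survive, because lowercasing their text form no longer gives a valid Python literal.

```python
import ast
import pandas as pd

df = pd.DataFrame({
    "sentTokenized": [
        ["WINNER!!", "Claim code KL341."],
        ["Press *9 to copy"],
    ]
})

# str() of each list is a valid Python literal, so the whole cell can be
# lowercased as one string and then parsed back into a list.
as_text = df["sentTokenized"].astype(str)
round_tripped = as_text.str.lower().apply(ast.literal_eval)

print(round_tripped.tolist())
```

This is likely slower than lowercasing the strings directly, since every row is serialized and parsed again.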

Answer 4

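There is also a nice way to do it with numpy (note that each cell becomes a NumPy array rather than a list):

import numpy as np
fullCorpus['sentTokenized'] = [np.char.lower(x) for x in fullCorpus['sentTokenized']]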
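If plain Python lists are needed afterwards, one possible variation is to call .tolist() on each result. This is a sketch on made-up rows, assuming every list is non-empty and contains only strings; the lowered column name is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sentTokenized": [
        ["WINNER!!", "Claim code KL341."],
        ["Hello THERE"],
    ]
})

# np.char.lower returns a NumPy array of strings;
# .tolist() converts it back to a plain Python list.
df["lowered"] = [np.char.lower(x).tolist() for x in df["sentTokenized"]]

print(df["lowered"].tolist())
```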

Lowercase sentences in lists in a pandas DataFrame
