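Lowercasing sentences stored in lists in a pandas dataframe

Time: 2018-08-04 18:33:41

Tags: python pandas nlp

I have a pandas dataframe like the one shown below. I want to convert all of the text to lowercase. How can I do this in Python?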

  

Sample of the dataframe

[Nah I don't think he goes to usf, he lives around here though]

[Even my brother is not like to speak with me., They treat me like aids patent.]

[I HAVE A DATE ON SUNDAY WITH WILL!, !]

[As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been set as your callertune for all Callers., Press *9 to copy your friends Callertune]

[WINNER!!, As a valued network customer you have been selected to receivea £900 prize reward!, To claim call 09061701461., Claim code KL341., Valid 12 hours only.]
  

What I tried

def toLowercase(fullCorpus):
   lowerCased = [sentences.lower()for sentences in fullCorpus['sentTokenized']]
   return lowerCased
  

I get this error

lowerCased = [sentences.lower()for sentences in fullCorpus['sentTokenized']]
AttributeError: 'list' object has no attribute 'lower'

4 answers:

Answer 0: (score: 1)

It's simple when the cells hold plain strings:

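# whole dataframe, when every cell is a string
# (pandas 2.1+ deprecates applymap in favor of DataFrame.map)
df.applymap(str.lower)

# a single column of strings
df['col'].apply(str.lower)
df['col'].map(str.lower)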

OK, you have lists in the rows. In that case, lowercase each string inside each list:

df['col'].map(lambda x: list(map(str.lower, x)))
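
To make the list case concrete, here is a minimal, self-contained sketch. The sample rows are shortened from the question, and the helper name lowercase_sentences is made up for illustration. It assumes every cell of the column is a list of strings.

```python
import pandas as pd

# Hypothetical sample data: each cell holds a list of sentence strings.
df = pd.DataFrame({
    "sentTokenized": [
        ["I HAVE A DATE ON SUNDAY WITH WILL!", "!"],
        ["WINNER!!", "Claim code KL341."],
    ]
})

def lowercase_sentences(sentences):
    """Return a new list with every sentence lower-cased."""
    return [s.lower() for s in sentences]

# Series.map calls the helper once per cell, i.e. once per list.
df["sentTokenized"] = df["sentTokenized"].map(lowercase_sentences)
print(df["sentTokenized"].tolist())
```

The original attempt failed because .lower() was called on the list itself. Here it is called on each string inside the list.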

Answer 1: (score: 1)

You can try using apply on the column:

def toLowercase(fullCorpus):
   lowerCased = fullCorpus['sentTokenized'].apply(lambda row:list(map(str.lower, row)))
   return lowerCased

Answer 2: (score: 1)

You can also convert the column to strings, apply str.lower, and then parse the result back into lists.

import ast
df.sentTokenized.astype(str).str.lower().transform(ast.literal_eval)
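
A small sketch of that round trip, using a made-up one-row frame. It uses Series.map for the final parsing step, which should behave the same as transform here. It assumes every list element is a string. If a list contained values such as None or True, lowercasing their text form would likely make ast.literal_eval fail.

```python
import ast
import pandas as pd

# Hypothetical sample data: one row holding a list of two sentences.
df = pd.DataFrame({
    "sentTokenized": [["Nah I don't think he goes to usf", "Press *9 NOW"]]
})

# Step 1: turn each list into its text representation, e.g. "['A', 'B']".
as_text = df["sentTokenized"].astype(str)

# Step 2: lowercase the whole text with the vectorized string accessor.
lowered_text = as_text.str.lower()

# Step 3: parse the text back into a real Python list.
df["sentTokenized"] = lowered_text.map(ast.literal_eval)
print(df["sentTokenized"].tolist())
```

This route converts every row to text and parses it again, so mapping str.lower over the list directly is usually the simpler choice.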

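Answer 3: (score: 0)

There is also a neat way to do this with numpy. Note that np.char.lower returns a NumPy array for each row rather than a Python list.

import numpy as np
fullCorpus['sentTokenized'] = [np.char.lower(x) for x in fullCorpus['sentTokenized']]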
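
If plain Python lists are wanted in the column afterwards, one possible variation is to call .tolist() on each result. The sketch below uses made-up sample rows and Series.map instead of a list comprehension. It assumes every list is non-empty and contains only strings.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: lists of sentences with different lengths.
df = pd.DataFrame({
    "sentTokenized": [["WINNER!!", "Valid 12 hours only."], ["Hello"]]
})

# np.char.lower lowercases element-wise and returns a NumPy array;
# .tolist() converts that array back into a list of Python strings.
df["sentTokenized"] = df["sentTokenized"].map(
    lambda sentences: np.char.lower(sentences).tolist()
)
print(df["sentTokenized"].tolist())
```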