Make Python go sentence by sentence instead of word by word?

Time: 2014-08-12 05:41:49

Tags: python string tuples text-processing

I have a list of strings, and when building tuples I want Python to treat each string as a whole sentence. For example, I want:

string = [("I am a good boy"), ("I am a good girl")]
tuple = [("I am a good boy", -1), ("I am a good girl", -1)]

But instead it is doing this:

tuple = [("I", -1), ("am", -1), ("a", -1), ("good", -1), ("boy", -1).....]

What is going wrong, and how do I fix it?

import re

def cleanedthings(trainset):
    cleanedtrain = []
    specialch = "!@#$%^&*-=_+:;\".,/?`~][}{|)("
    for line in trainset:
        for word in line.split():
            lowword = word.lower()
            for ch in specialch:
                if ch in lowword:
                    lowword = lowword.replace(ch,"")
            if len(lowword) >= 3:
                cleanedtrain.append(lowword)
    return cleanedtrain

poslinesTrain = [('I just wanted to drop you a note to let you know how happy I am with my cabinet'), ('The end result is a truly amazing transformation!'), ('Who can I thank for this?'), ('For without his artistry and craftmanship this transformation would not have been possible.')]

neglinesTrain = [('I have no family and no friends, very little food, no viable job and very poor future prospects.'), ('I have therefore decided that there is no further point in continuing my life.'), ('It is my intention to drive to a secluded area, near my home, feed the car exhaust into the car, take some sleeping pills and use the remaining gas in the car to end my life.')]

poslinesTest = [('Another excellent resource from Teacher\'s Clubhouse!'), ('This cake tastes awesome! It\'s almost like I\'m in heaven already oh God!'), ('Don\'t worry too much, I\'ll always be here for you when you need me. We will be playing games or watching movies together everytime to get your mind off things!'), ('Hey, this is just a simple note for you to tell you that you\'re such a great friend to be around. You\'re always being the listening ear to us, and giving us good advices. Thanks!')]

neglinesTest = [('Mum, I could write you for days, but I know nothing would actually make a difference to you.'), ('You are much too ignorant and self-concerned to even attempt to listen or understand. Everyone knows that.'), ('If I were, your BITCHY comments that I\'m assuming were your attempt to help, wouldn\'t have.'), ('If I have stayed another minute I would have painted the walls and stained the carpets with my blood, so you could clean it up... I wish I were never born.')]

clpostrain = cleanedthings(poslinesTrain)
clnegtrain = cleanedthings(neglinesTrain)

clpostest = cleanedthings(poslinesTest)
clnegtest = cleanedthings(neglinesTest)


trainset= [(x,1) for x in clpostrain] + [(x,-1) for x in clnegtrain]
testset= [(x,1) for x in clpostest] + [(x,-1) for x in clnegtest]

print(testset)
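A stripped-down reproduction helps locate the problem. It keeps only the loop structure of cleanedthings and, as a simplifying assumption, leaves out the lowercasing, special-character stripping and length filter, which do not change how many items get appended. The names flat, per_sentence and labeled are hypothetical:

```python
# Hypothetical minimal reproduction: only the loop structure of
# cleanedthings is kept; cleaning and length filtering are omitted.
sentences = ["I am a good boy", "I am a good girl"]

# Buggy shape: append() sits inside the inner loop, so it runs once per word.
flat = []
for line in sentences:
    for word in line.split():
        flat.append(word)

# Fixed shape: gather the words of one sentence, then append once per sentence.
per_sentence = []
for line in sentences:
    words = []
    for word in line.split():
        words.append(word)
    per_sentence.append(" ".join(words))

labeled = [(s, -1) for s in per_sentence]
print(flat)     # ten single-word strings
print(labeled)  # [('I am a good boy', -1), ('I am a good girl', -1)]
```

Only the position of the append differs between the two loops, and that alone decides whether the result holds one item per word or one item per sentence.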

1 answer:

Answer 0 (score: 2)

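You are adding the cleaned results word by word instead of sentence by sentence: inside the inner loop, each cleaned word goes straight into cleanedtrain, so the function returns a flat list of words. Collecting the words of each sentence in a separate variable, then joining them back into one string and appending that once per sentence, fixes the bug: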

def cleanedthings(trainset):
    cleanedtrain = []
    specialch = "!@#$%^&*-=_+:;\".,/?`~][}{|)("
    for line in trainset:
        # collects the cleaned words of the current sentence
        sentence = []
        for word in line.split():
            lowword = word.lower()
            for ch in specialch:
                if ch in lowword:
                    lowword = lowword.replace(ch,"")
            if len(lowword) >= 3:
                sentence.append(lowword)
        # once every word is checked, rebuild the sentence by joining with spaces
        # and append it to the list of cleaned sentences
        cleanedtrain.append(' '.join(sentence))
    return cleanedtrain
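For Python 3, the same fix can be written more compactly. The sketch below is an alternative rather than the answer's own code: clean_sentence and label are hypothetical helper names, and str.translate with a str.maketrans deletion table stands in for the character-by-character replace loop.

```python
# Sketch assuming Python 3; clean_sentence and label are hypothetical helpers.
SPECIAL_CHARS = "!@#$%^&*-=_+:;\".,/?`~][}{|)("

# Translation table that deletes every special character in a single pass.
STRIP_TABLE = str.maketrans("", "", SPECIAL_CHARS)


def clean_sentence(line, min_len=3):
    """Lowercase each word, strip special characters, drop short words, rejoin."""
    words = (word.lower().translate(STRIP_TABLE) for word in line.split())
    return " ".join(word for word in words if len(word) >= min_len)


def label(lines, value):
    """Return one (cleaned_sentence, value) tuple per input line."""
    return [(clean_sentence(line), value) for line in lines]


print(label(["I am a good boy", "I am a good girl"], -1))
# [('good boy', -1), ('good girl', -1)]
```

With the question's data, trainset = label(poslinesTrain, 1) + label(neglinesTrain, -1) would build the labeled sentence tuples directly. Deleting characters with str.translate gives the same result as the replace loop, because each removal is independent of the others. As in the original, words shorter than three characters are dropped, so a sentence made only of such words becomes an empty string.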