Reading sentences separated by newlines and parsing them

Time: 2014-04-16 10:42:53

Tags: python

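I have tokenized sentences, one token per line, with the sentences separated by a blank line. Two columns hold the actual and the predicted tag of each token. I want to loop over the tokens and find the wrong predictions, i.e. the cases where the actual tag does not equal the predicted tag, and report each sentence as correct or incorrect.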

#word actual predicted

James PERSON PERSON
Washington PERSON LOCATION     
went O O
home O LOCATION

He O O
took O O
Elsie PERSON PERSON
along O O

>James Washington went home: Incorrect
>He took Elsie along: Correct

2 answers:

Answer 0 (score: 0)

Python strings have powerful parsing methods that you can use here. I did this with Python 3.3, but it should work on other versions as well.

thistext = '''James PERSON PERSON
Washington PERSON LOCATION     
went O O
home O LOCATION

He O O
took O O
Elsie PERSON PERSON
along O O
'''

def check_text(text):
    lines = text.split('\n')
    correct = [True]  # a bool wrapped in a list so the nested function can modify it
    words = []

    def print_result():
        if words:
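            print('{}: {}'.format(' '.join(words), 'Correct' if correct[0] else 'Incorrect'))
        # del words[:] empties the list in place on every Python version
        # (words.clear() does the same on Python 3.3+)
        del words[:]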
        correct[0] = True

    for line in lines:
        if line.strip():  # non-blank line: one token with its two tags
            word, a, b = line.split()
            if a != b:
                correct[0] = False
            words.append(word)
        else:  # blank line: the current sentence has ended
            print_result()

    print_result()

check_text(thistext)
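
If you need the results as data rather than printed text, the same loop can be written as a generator. This is only a sketch and not part of the answer above: the names `sentence_results` and `sample` are made up for illustration, and it assumes every non-blank line has exactly three whitespace-separated fields.

```python
def sentence_results(text):
    """Yield (sentence, is_correct) for each blank-line-separated block."""
    words, correct = [], True
    for line in text.splitlines():
        fields = line.split()
        if fields:
            # Assumption: exactly three fields per token line.
            word, actual, predicted = fields
            words.append(word)
            correct = correct and (actual == predicted)
        elif words:
            # Blank line after at least one token: the sentence is complete.
            yield ' '.join(words), correct
            words, correct = [], True
    if words:  # last sentence when the text has no trailing blank line
        yield ' '.join(words), correct


# Hypothetical sample data, copied from the question.
sample = (
    "James PERSON PERSON\n"
    "Washington PERSON LOCATION\n"
    "went O O\n"
    "home O LOCATION\n"
    "\n"
    "He O O\n"
    "took O O\n"
    "Elsie PERSON PERSON\n"
    "along O O\n"
)

for sentence, ok in sentence_results(sample):
    print('{}: {}'.format(sentence, 'Correct' if ok else 'Incorrect'))
```

Because no state is shared with a nested function, the list-wrapped flag is not needed here, and the caller decides how to format the result.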

Answer 1 (score: 0)

In addition to what I used in my previous answer, here I have used all() and a list comprehension:

from itertools import groupby

d = {True: 'Correct', False: 'Incorrect'}
with open('text1.txt') as f:
    for k, g in groupby(f, key=str.isspace):
        if not k:
            # Split each line in the current group at whitespaces
            data = [line.split() for line in g] 
            # If for each line the second column is equal to third then `all()` will
            # return True.
            predicts_matched = all(line[1] == line[2] for line in data)
            print ('{}: {}'.format(' '.join(x[0] for x in data), d[predicts_matched]))

Output:

James Washington went home: Incorrect
He took Elsie along: Correct
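
Since the question also asks which predictions are wrong, the groupby approach can be extended to collect the mismatched tokens per sentence. The following is a rough sketch under a few assumptions: the function name `mismatched_tokens` and the `sample_lines` list are hypothetical, lines starting with `#` are treated as a header and skipped, and blank lines are detected with `not s.strip()` so that an empty string without a trailing newline also counts as a separator (`str.isspace` returns False for `''`).

```python
from itertools import groupby


def mismatched_tokens(lines):
    """Return a list of (sentence, errors), one entry per sentence.

    errors holds the (word, actual, predicted) rows whose tags differ.
    """
    report = []
    # Assumption: lines starting with '#' are headers/comments and are skipped.
    rows = (line for line in lines if not line.startswith('#'))
    for is_blank, group in groupby(rows, key=lambda s: not s.strip()):
        if is_blank:
            continue  # a run of blank lines only separates sentences
        data = [line.split() for line in group]
        sentence = ' '.join(row[0] for row in data)
        errors = [tuple(row) for row in data if row[1] != row[2]]
        report.append((sentence, errors))
    return report


# Hypothetical in-memory stand-in for the lines of text1.txt.
sample_lines = [
    "#word actual predicted\n",
    "\n",
    "James PERSON PERSON\n",
    "Washington PERSON LOCATION     \n",
    "went O O\n",
    "home O LOCATION\n",
    "\n",
    "He O O\n",
    "took O O\n",
    "Elsie PERSON PERSON\n",
    "along O O\n",
]

for sentence, errors in mismatched_tokens(sample_lines):
    print('{}: {}'.format(sentence, 'Incorrect' if errors else 'Correct'))
    for word, actual, predicted in errors:
        print('  {}: actual {}, predicted {}'.format(word, actual, predicted))
```

An open file object can be passed in place of `sample_lines`, because the function only iterates over the lines once.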