Question

Hi all, I have two files, File1 and File2, which contain the following data.

File1:

 TOPIC:topic_0 30063951.0
 2 19195200.0

 1 7586580.0

 3 2622580.0

TOPIC:topic_1 17201790.0
1 15428200.0

2 917930.0

10 670854.0

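and so on. There are 15 topics, each with its own weight. The first column (2, 1, 3 and so on) holds numbers whose corresponding words are in File2. For example,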

File 2 has:

   1 i

   2 new

   3 percent

   4 people 

   5 year

   6 two

   7 million

   8 president

   9 last

   10 government

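and so on. There are about 10,470 lines of words. So, in short, I should have the corresponding words in the first column of File1 instead of the line numbers. My output should look like: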

TOPIC:topic_0 30063951.0

new 19195200.0

i 7586580.0

percent 2622580.0

TOPIC:topic_1 17201790.0

i 15428200.0

new 917930.0

government 670854.0

My code:

import sys
d1 = {}
n = 1

with open("ap_vocab.txt") as in_file2:
     for line2 in in_file2:
            #print n, line2
            d1[n] = line2[:-1]
            n = n + 1

with open("ap_top_t15.txt") as in_file:
     for line1 in in_file:
            columns = line1.split(' ')
            firstwords = columns[0]
            #print firstwords[:-8]
            if firstwords[:-8] == 'TOPIC':
                    print columns[0], columns[1]
            elif firstwords[:-8] != '\n':
                    num = columns[0]
                    print d1[n], columns[1]

When I write print d1[2], columns[1] instead, the code runs and prints the second word of File2 for every line. But when I run the code above, it gives the error

KeyError: 10472

There are 10472 lines of words in file2. Please help me solve this. Thanks in advance!

Answer 1

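In your first for loop, n is incremented with each line until it reaches a final value of 10472. However, you only set values of d1[n] up to 10471, because you placed the increment after setting d1 for the given n, with these two lines: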

d1[n] = line2[:-1]
n = n + 1

Then, on the line

print d1[n], columns[1]

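in your second for loop (over in_file), you try to access d1[10472], which does not exist. Additionally, you define d1 as an empty dictionary and then try to access it as if it were a list, so even after fixing the increment you will not be able to reach the last entry by position. You must either use a list, with d1 = [], or use an OrderedDict so that you can access the "last" key, since dictionaries are typically unordered in Python.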
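The off-by-one described above can be reproduced with a small sketch. It is written for Python 3 (the question uses Python 2 print statements), and the three-word vocab_lines list is a hypothetical stand-in for ap_vocab.txt, not the real data.

```python
# Hypothetical three-word vocabulary standing in for ap_vocab.txt.
vocab_lines = ["i\n", "new\n", "percent\n"]

d1 = {}
n = 1
for line2 in vocab_lines:
    d1[n] = line2[:-1]  # store the word under the current line number
    n = n + 1           # then advance the counter past that key

last_key = max(d1)      # highest key that was actually stored
print("last stored key:", last_key)
print("counter after the loop:", n)
print("counter is a key in d1:", n in d1)
```

After the loop the counter sits one past the highest stored key, so using it as a key raises KeyError, which matches the traceback in the question.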

You can either change your increment so that the loop does set a value at d1[10472], or simply set the value for the last position after the for loop.

Depending on what you are trying to print, you could replace your last line with
print d1[-1], columns[1]

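to print the value at the final index position you currently have set (negative indexing like this works only if d1 is a list).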
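Judging from the expected output in the question, each numeric row seems meant to be looked up by its own first column rather than by the loop counter n. The following is a minimal Python 3 sketch under that assumption. The function names load_vocab and replace_ids are hypothetical, and the sketch assumes File2 holds one entry per line, with the word as the last whitespace-separated token.

```python
def load_vocab(lines):
    """Map 1-based line numbers to the word found on each line."""
    vocab = {}
    for number, line in enumerate(lines, start=1):
        parts = line.split()
        # Take the last token so both "new" and "2 new" yield "new".
        vocab[number] = parts[-1] if parts else ""
    return vocab


def replace_ids(topic_lines, vocab):
    """Replace the numeric first column of each row with its word."""
    out = []
    for line in topic_lines:
        columns = line.split()  # split() also drops leading spaces and "\n"
        if not columns:
            continue  # skip blank lines
        if columns[0].startswith("TOPIC:"):
            out.append(" ".join(columns))  # topic headers pass through
        else:
            word = vocab[int(columns[0])]  # key is the row's own number
            out.append(word + " " + columns[1])
    return out


# Sample data copied from the question.
vocab = load_vocab(["i", "new", "percent", "people", "year",
                    "two", "million", "president", "last", "government"])
topics = [" TOPIC:topic_0 30063951.0", " 2 19195200.0", "",
          " 1 7586580.0", "", " 3 2622580.0", "",
          "TOPIC:topic_1 17201790.0", "1 15428200.0", "",
          "2 917930.0", "", "10 670854.0"]
result = replace_ids(topics, vocab)
for row in result:
    print(row)
```

With the real files, an open file object can be passed in place of each list, for example load_vocab(open("ap_vocab.txt")), because iterating over a file yields its lines.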
