Question

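I want to compare two lists of strings, textSplitted and column1.

Currently I loop over both lists. Where the elements differ, column2 and column3 should get a hyphen (-) at that position. Where they match, the values of column2 and column3 should stay at that position.

note1: column1, column2, and column3 initially have the same length.

note2: column1 never contains an element that textSplitted does not have.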

textSplitted = ['wow','this','is','some','nice','text']
column1 = ['this','is','some','text']
column2 = ['A','B','C','D']
column3 = ['Q1','Q2','Q3','Q4',]

i = 0
j = 0

for item in textSplitted:
    if textSplitted[i] == column1[j]:
        i+=1
        j+=1
    elif textSplitted[i] != column1[j]:
        column2.insert(j,"-")
        column3.insert(j,"-")
        i+=1

print(textSplitted)
print(column2)
print(column3)

This produces the output:

['wow', 'this', 'is', 'some', 'nice', 'text']
['-', 'A', 'B', '-', 'C', 'D']
['-', 'Q1', 'Q2', '-', 'Q3', 'Q4']

But I want to get:

['wow', 'this', 'is', 'some', 'nice', 'text']
['-', 'A', 'B', 'C', '-', 'D']
['-', 'Q1', 'Q2', 'Q3', '-', 'Q4']

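Note: if I add extra elements to textSplitted, I get a "list index out of range" error. However, once column1 has run out of elements to compare, the remaining elements of textSplitted should each get a hyphen (-) in column2 and column3. E.g.: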

['wow', 'this', 'is', 'some', 'nice', 'text','yes','indeed']
['-', 'A', 'B', 'C', '-', 'D','-','-']
['-', 'Q1', 'Q2', 'Q3', '-', 'Q4','-','-']

Answer 1

This should do it:

textSplitted = ['wow','this','is','some','nice','text','yes','indeed']
column1 = ['this','is','some','text']
column2 = ['A','B','C','D']
column3 = ['Q1','Q2','Q3','Q4',]

i = 0
j = 0

while j < len(column1):
    if textSplitted[i] == column1[j]:
        i+=1
        j+=1
    elif textSplitted[i] != column1[j]:
        column2.insert(i,"-")
        column3.insert(i,"-")
        i+=1

while i< len(textSplitted):
    column2.append("-")
    column3.append("-")
    i+=1

print(textSplitted)
print(column2)
print(column3)

Prints:

['wow', 'this', 'is', 'some', 'nice', 'text', 'yes', 'indeed']
['-', 'A', 'B', 'C', '-', 'D', '-', '-']
['-', 'Q1', 'Q2', 'Q3', '-', 'Q4', '-', '-']
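
The same two-index walk can also be wrapped in a function that builds new lists instead of inserting into the originals. This is only a minimal sketch, assuming (as in note2) that column1 appears in order inside textSplitted; the name align_columns and the filler parameter are made up for illustration and are not part of the answer above:

```python
def align_columns(words, keys, *columns, filler="-"):
    """Pad each column so that it lines up with words.

    Hypothetical helper: keys plays the role of column1, and every
    list in columns is assumed to have the same length as keys.
    """
    aligned = [[] for _ in columns]
    j = 0  # position of the next unmatched entry in keys
    for word in words:
        if j < len(keys) and word == keys[j]:
            # Match: copy the value at position j from every column.
            for out, col in zip(aligned, columns):
                out.append(col[j])
            j += 1
        else:
            # No match, or keys is exhausted: pad with the filler.
            for out in aligned:
                out.append(filler)
    return aligned


textSplitted = ['wow', 'this', 'is', 'some', 'nice', 'text', 'yes', 'indeed']
column1 = ['this', 'is', 'some', 'text']
column2 = ['A', 'B', 'C', 'D']
column3 = ['Q1', 'Q2', 'Q3', 'Q4']

new2, new3 = align_columns(textSplitted, column1, column2, column3)
print(new2)
print(new3)
```

Because nothing is inserted in place, column2 and column3 keep their original contents and the function can be called again with other inputs.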

Answer 2

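This may or may not be a requirement, but the posted solutions (as they stood when I looked; they may have been updated since) fail if an element of column1 appears more than once in textSplitted, for example: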

textSplitted = ['wow','this','is','some','nice','text','yes','indeed','it','is']
column1 = ['this','is','some','text']

output will be:

['wow', 'this', 'is', 'some', 'nice', 'text', 'yes', 'indeed', 'it', 'is']
['-', 'A', 'B', 'C', '-', 'D', '-', '-', '-', '-']
['-', 'Q1', 'Q2', 'Q3', '-', 'Q4', '-', '-', '-', '-']

failing to pick up the repeated 'is'.

The following fixes that potential problem:

textSplitted = ['wow','this','is','some','nice','text','yes','indeed','it','is']
column1 = ['this','is','some','text']
column2 = ['A','B','C','D']
column3 = ['Q1','Q2','Q3','Q4',]

a = list(map(lambda w: w if w in column1 else '-', textSplitted))
column2 = list(map(lambda w: w if w=='-' else column2[column1.index(w)], a))
column3 = list(map(lambda w: w if w=='-' else column3[column1.index(w)], a))

print(textSplitted)
print(column2)
print(column3)

['wow', 'this', 'is', 'some', 'nice', 'text', 'yes', 'indeed', 'it', 'is']
['-', 'A', 'B', 'C', '-', 'D', '-', '-', '-', 'B']
['-', 'Q1', 'Q2', 'Q3', '-', 'Q4', '-', '-', '-', 'Q2']
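
Each call to column1.index(w) rescans column1, so on long lists a dictionary lookup may be preferable. The following is a sketch of the same idea, assuming the words are hashable; if a word occurs more than once in column1, the first occurrence wins, which mirrors what index() returns. The names row_of, new_column2 and new_column3 are hypothetical:

```python
textSplitted = ['wow', 'this', 'is', 'some', 'nice', 'text',
                'yes', 'indeed', 'it', 'is']
column1 = ['this', 'is', 'some', 'text']
column2 = ['A', 'B', 'C', 'D']
column3 = ['Q1', 'Q2', 'Q3', 'Q4']

# Map each word of column1 to its first position, like column1.index(w).
row_of = {}
for pos, word in enumerate(column1):
    row_of.setdefault(word, pos)

# Look every word up once; words missing from column1 become '-'.
new_column2 = [column2[row_of[w]] if w in row_of else '-' for w in textSplitted]
new_column3 = [column3[row_of[w]] if w in row_of else '-' for w in textSplitted]

print(new_column2)
print(new_column3)
```

Unlike the map-based version, this does not need the intermediate list a, so a literal '-' in textSplitted cannot be mistaken for the placeholder.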

Answer 3

You can do it more simply:

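j = 0  # number of hyphens inserted so far
for i, word in enumerate(textSplitted):
    # i - j is the position in column1 that still has to be matched.
    # Pad when column1 is used up or the words differ.
    if i - j >= len(column1) or word != column1[i - j]:
        column2.insert(i, '-')
        column3.insert(i, '-')
        j += 1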

Answer 4

You have to insert at index i, the current position in textSplitted.

textSplitted = ['wow','this','is','some','nice','text']
column1 = ['this','is','some','text']
column2 = ['A','B','C','D']
column3 = ['Q1','Q2','Q3','Q4',]

i = 0
j = 0

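for i in range(len(textSplitted)):
    print(i, textSplitted[i], j)  # debug trace of both positions
    # Pad at index i when column1 is used up or the words differ.
    if j >= len(column1) or textSplitted[i] != column1[j]:
        column2.insert(i, "-")
        column3.insert(i, "-")
    else:
        j = j + 1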


print(textSplitted)
print(column2)
print(column3)

Answer 5

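In a case like this I prefer a mapping approach. So here is a different solution, with the following advantages:

You can easily reuse the mapper for potential new columns
It handles repeated words correctly
Unmatched words are tolerated: a word in textSplitted that is missing from column1 yields '-', and a word in column1 that is missing from textSplitted is skipped

Code: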

textSplitted = ['wow', 'this', 'is', 'some', 'nice', 'text','yes','indeed']
column1 = ['this','is','some','text']
column2 = ['A','B','C','D']
column3 = ['Q1','Q2','Q3','Q4',]

last_i = 0
mapper = []
for w in textSplitted:
    try:
        new_i = column1.index(w, last_i)
    except ValueError:
        mapper.append("-")
    else:
        mapper.append(new_i)
        last_i = new_i+1

# mapper = ["-", 0, 1, 2, "-", 3, "-", "-"]

print(textSplitted)
# Compare with != rather than "is not": identity checks against a string literal are unreliable.
print([column2[i] if i != "-" else "-" for i in mapper])
print([column3[i] if i != "-" else "-" for i in mapper])

>>> 
['wow', 'this', 'is', 'some', 'nice', 'text', 'yes', 'indeed']
['-', 'A', 'B', 'C', '-', 'D', '-', '-']
['-', 'Q1', 'Q2', 'Q3', '-', 'Q4', '-', '-']

You can try it with a repeated word. Here the second 'text' has no partner left in column1, so it maps to '-':

textSplitted = ['wow', 'this', 'is', 'some', 'nice', 'text','yes','indeed', 'text']
column1 = ['this','is','some','text']
column2 = ['A','B','C','D']
column3 = ['Q1','Q2','Q3','Q4',]
...
>>>
['wow', 'this', 'is', 'some', 'nice', 'text', 'yes', 'indeed', 'text']
['-', 'A', 'B', 'C', '-', 'D', '-', '-', '-']
['-', 'Q1', 'Q2', 'Q3', '-', 'Q4', '-', '-', '-']

Or even map the second 'text' to its own values by listing it twice in column1:

textSplitted = ['wow', 'this', 'is', 'some', 'nice', 'text','yes','indeed', 'text']
column1 = ['this','is','some','text', 'text']
column2 = ['A','B','C','D', 'E']
column3 = ['Q1','Q2','Q3','Q4','Q5']
...

>>>
['wow', 'this', 'is', 'some', 'nice', 'text', 'yes', 'indeed', 'text']
['-', 'A', 'B', 'C', '-', 'D', '-', '-', 'E']
['-', 'Q1', 'Q2', 'Q3', '-', 'Q4', '-', '-', 'Q5']
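
To illustrate the first advantage (reusing the mapper for further columns), the loop can be split into two small functions. This is a sketch under assumptions: build_mapper, apply_mapper and column4 are hypothetical names that do not appear in the answer, and None replaces "-" as the "no match" marker so that it cannot be confused with a real index or word:

```python
def build_mapper(words, keys):
    """Return, for each word, its index in keys, or None if unmatched."""
    mapper = []
    start = 0  # search keys only from here on, so order is preserved
    for word in words:
        try:
            found = keys.index(word, start)
        except ValueError:
            mapper.append(None)
        else:
            mapper.append(found)
            start = found + 1
    return mapper


def apply_mapper(mapper, column, filler="-"):
    """Project any column of the same length as keys through the mapper."""
    return [filler if i is None else column[i] for i in mapper]


textSplitted = ['wow', 'this', 'is', 'some', 'nice', 'text',
                'yes', 'indeed', 'text']
column1 = ['this', 'is', 'some', 'text', 'text']
column2 = ['A', 'B', 'C', 'D', 'E']
column4 = [10, 20, 30, 40, 50]  # made-up extra column

mapper = build_mapper(textSplitted, column1)
print(mapper)
print(apply_mapper(mapper, column2))
print(apply_mapper(mapper, column4))
```

The mapper is computed once and then applied to as many columns as needed, whatever their element type.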

Compare + insert into lists of strings
