Question

I am reading a .csv file and saving it into a matrix named csvfile. The matrix contents look like this (abbreviated; there are dozens of records):

[['411-440854-0', '411-440824-0', '411-441232-0', '394-529791', '394-529729', '394-530626'], <...>, ['394-1022430-0', '394-1022431-0', '394-1022432-0', '***Another CN with a switch in between'], ['394-833938-0', '394-833939-0', '394-833940-0'], <...>, ['394-1021830-0', '394-1021831-0', '394-1021832-0', '***Sectionalizer end connection'], ['394-1022736-0', '394-1022737-0', '394-1022738-0'], <...>, ['394-1986420-0', '394-1986419-0', '394-1986416-0', '***Weird BN line check'], ['394-1986411-0', '394-1986415-0', '394-1986413-0'], <...>, ['394-529865-0', '394-529686-0', '394-530875-0', '***Sectionalizer terminal connection'], ['394-830900-0', '394-830904-0', '394-830902-0'], ['394-2350772-0', '394-2350776-0', '394-2350774-0', '***Sectionalizer exists but has no end time'], <...>]

I am reading a text file into a variable named textfile, whose contents look like this:

...
object underground_line {
    name SPU123-394-1021830-0-sectionalizer;
    phases AN;
    from SPU123-391-670003;
    to SPU123-395-899674_sectionalizernode;
    length 26.536;
    configuration SPU123-1/0CN15-AN;
}

object underground_line {
    name SPU123-394-1021831-0-sectionalizer;
    phases BN;
    from SPU123-391-670002;
    to SPU123-395-899675_sectionalizernode;
    length 17.902;
    configuration SPU123-1/0CN15-BN;
}

object underground_line {
    name SPU123-394-1028883-0-sectionalizer;
    phases CN;
    from SPU123-391-542651;
    to SPU123-395-907325_sectionalizernode;
    length 771.777;
    configuration SPU123-1CN15-CN;
}
...

I want to check whether part of each name line in textfile (anything after SPU123- and before -0-sectionalizer) exists in the csvfile matrix. If it does not exist, I want to do something (increment a counter). I have tried several approaches, including the following:

counter = 0
for noline in textfile:
    if 'name SPU123-' in noline:
        if '-' in noline[23]:
            if ((noline[13:23] not in s[0]) and (noline[13:23] not in s[1]) and (noline[13:23] not in s[2]) for s in csvfile):
                counter = counter+1
        else:
            if ((noline[13:24] not in s[0]) and (noline[13:24] not in s[1]) and (noline[13:-24] not in s[2]) for s in csvfile):
                counter = counter+1
print counter

This does not work. I also tried if any((noline......) in the code sample above, but that did not work either.

Answer 1

Checking for a string s in a list of lists l:

>>> l = [['str', 'foo'], ['bar', 'so']]

>>> s = 'foo'
>>> any(s in x for x in l)
True

>>> s = 'nope'
>>> any(s in x for x in l)
False
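
One likely reason the attempt in the question misbehaves is that a parenthesized generator expression used directly as an if condition is just an object, and it is truthy regardless of what it would yield. A small sketch of the difference, using the same made-up sample list and Python 3 syntax:

```python
rows = [['str', 'foo'], ['bar', 'so']]

# A generator expression is an object, so it is truthy
# even though every element it would yield here is False.
gen = ('nope' in row for row in rows)
always_true = bool(gen)

# any() consumes the generator and tests the elements themselves.
found = any('nope' in row for row in rows)

print(always_true, found)
```

Wrapping the condition in any() (or all(), for "true for every row") is what makes the per-row tests actually run.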

To implement this in your code (assuming noline[13:23] is the string you want to search for, and that counter should be incremented if it is not in csvfile):

counter = 0
for noline in textfile:
    if 'name SPU123-' in noline:
        if '-' in noline[23]:
            if not any(noline[13:23] in x for x in csvfile) and not any(noline[13:23] + '-0' in x for x in csvfile):
                counter += 1
        else:
            if not any(noline[13:24] in x for x in csvfile) and not any(noline[13:24] + '-0' in x for x in csvfile):
                counter += 1
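
The fixed slice positions above depend on the exact indentation of each line. As a hedged alternative, the identifier can be cut out with str.partition instead. This is only a sketch in Python 3: the helper names extract_id and count_missing are hypothetical, the sample data is made up from the question, and textfile is assumed to be a list of lines.

```python
def extract_id(line):
    """Return the text between 'name SPU123-' and '-0-sectionalizer', or None."""
    _, found, rest = line.partition('name SPU123-')
    if not found:
        return None
    ident, found, _ = rest.partition('-0-sectionalizer')
    return ident if found else None


def count_missing(lines, rows):
    """Count name lines whose id (with or without a '-0' suffix) is in no row."""
    counter = 0
    for line in lines:
        ident = extract_id(line)
        if ident is None:
            continue  # not a name line
        if not any(ident in row or ident + '-0' in row for row in rows):
            counter += 1
    return counter


# Hypothetical sample data modeled on the question
csvfile = [['394-1021830-0', '394-1021831-0', '394-1021832-0'],
           ['394-529791', '394-529729', '394-530626']]
textfile = ['    name SPU123-394-1021830-0-sectionalizer;',
            '    name SPU123-394-1028883-0-sectionalizer;',
            '    phases CN;']

print(count_missing(textfile, csvfile))
```

With this sample, only 394-1028883 is absent from every row, so it is the only line counted.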

Answer 2

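Since the matrix contains a large number of values, iterating over it for every lookup is slow.

Assemble the values into a mapping instead (a set in this case, since there is no associated data), because hash table lookups are very fast: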

import re

# matrix is the list of lists read from the .csv file (csvfile in the question)
s = {v for r in matrix for v in r if re.match(r'\d[-\d]+\d$', v)}  # or any filter more appropriate for your notion of valid identifiers

if noline[13:23] in s:  # parsing the identifiers instead would be more fault-tolerant
    pass  # do something

Because of the preliminary step, this only beats the brute-force approach beyond a certain data size.
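
A minimal end-to-end sketch of the set approach, in Python 3. The sample rows are made up from the question, and the names ID_PATTERN, known_ids and is_known are hypothetical. The '-0' check is an assumption based on the question's data, where some identifiers carry that suffix and some do not.

```python
import re

# Hypothetical sample rows modeled on the question
csvfile = [['394-1021830-0', '394-1021831-0', '***Another CN with a switch in between'],
           ['394-529791', '394-529729', '394-530626']]

# Keep only values that look like identifiers: digits and dashes,
# starting and ending with a digit. Comment strings are filtered out.
ID_PATTERN = re.compile(r'\d[-\d]+\d$')

# Build the set once; each later membership test is a hash lookup.
known_ids = {value for row in csvfile for value in row if ID_PATTERN.match(value)}


def is_known(ident):
    """True if ident is in the set, with or without a trailing '-0'."""
    return ident in known_ids or ident + '-0' in known_ids


print(is_known('394-1021830'), is_known('394-1028883'))
```

The cost of building known_ids is paid once, so the benefit grows with the number of lines checked against it.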

Answer 3

import re, itertools

Flatten csvfile (data is an iterator):

data = itertools.chain.from_iterable(csvfile)

Extract the relevant items from data and put them in a set for performance (this avoids iterating over data multiple times):

data_rex = re.compile(r'\d{3}-\d+')
data = {match.group() for match in itertools.imap(data_rex.match, data) if match}

Count the names that are not in data:

def predicate(match, data = data):
    '''Return True if match not found in data'''
    return match.group(1) not in data

# after SPU123- and before -0-
name = re.compile(r'name SPU123-(\d{3}-\d+)-')
names = name.finditer(textfile)
# quantify
print sum(itertools.imap(predicate, names))
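
The code above is written for Python 2 (itertools.imap and the print statement). A hedged sketch of the same idea in Python 3, where the built-in map is already lazy. The sample csvfile and textfile are made up from the question, and textfile is assumed to be one string, as finditer requires.

```python
import itertools
import re

# Hypothetical sample data modeled on the question
csvfile = [['394-1021830-0', '394-1021831-0', '394-1021832-0',
            '***Sectionalizer end connection']]
textfile = """
object underground_line {
    name SPU123-394-1021830-0-sectionalizer;
}
object underground_line {
    name SPU123-394-1021831-0-sectionalizer;
}
object underground_line {
    name SPU123-394-1028883-0-sectionalizer;
}
"""

# Flatten the rows, then keep the leading 'ddd-dddd...' part of each value.
data_rex = re.compile(r'\d{3}-\d+')
flat = itertools.chain.from_iterable(csvfile)
data = {m.group() for m in map(data_rex.match, flat) if m}

# Capture the part after SPU123- and before the next dash.
name = re.compile(r'name SPU123-(\d{3}-\d+)-')

# True counts as 1, so sum() gives the number of names not found in data.
missing = sum(m.group(1) not in data for m in name.finditer(textfile))
print(missing)
```

Here the trailing '-0' is dropped on both sides by the regular expressions, so no separate suffix check is needed.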

Check whether a string from a file exists in a list of lists of strings: python

3 answers: