Question

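I am scraping a series of web pages and organizing their content into an in-memory knowledge base. Depending on a string input, which is scraped from the site's headings, I need to execute different code.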

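from selenium.webdriver.common.by import By

# `browser` is an existing Selenium WebDriver instance.
tags = browser.find_elements(By.XPATH, "//div[@class='main-content-entry']/h2")
for tag in tags:
    heading = tag.get_attribute("textContent").lower().strip()
    # WebElement.parent is the WebDriver, not the parent element;
    # ".." selects the enclosing <div> instead.
    content = tag.find_element(By.XPATH, "..")
    if "overview" in heading:
        pass  # do this
    elif "takeaways" in heading:
        pass  # do that
    # ... more elif branches ...
    else:
        pass  # do something else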

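Right now I implement this as an if-elif-else chain. I have seen some answers on this site suggesting a dict, but as far as I can tell that relies on the input matching a key exactly. In my case an exact match isn't always possible, because the site owners are inconsistent.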

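The pages are structured well enough that I know what the headings are, so I can define the "keys" in my code ahead of time. However, across hundreds of pages, some headings contain typos and slight variations. For example: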

Fees & Funding
Fees
Fees &Funding
Certificates
Certificate
Certificat & Exams
Exams & Certificates

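Currently, the best I can do is scan the pages first, determine the full set of headings, and then manually define the substrings my code should look for, so that no heading is handled twice.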
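The inventory step described above can be partly automated. A minimal sketch, assuming the raw heading strings from every page have already been collected into a list (the `heading_inventory` name and the sample list are hypothetical): count each heading after lowercasing and collapsing whitespace, so the variants that the substrings must cover stand out.

```python
from collections import Counter


def heading_inventory(headings):
    """Count each distinct heading after light normalization.

    `headings` is assumed to be a list of raw heading strings already
    scraped from all pages (hypothetical input).
    """
    # Lowercase and collapse runs of whitespace into single spaces.
    normalized = (" ".join(h.lower().split()) for h in headings)
    return Counter(normalized)


# Hypothetical sample of scraped headings.
sample = ["Fees & Funding", "Fees", "fees  & funding", "Certificates", "Certificate"]
for heading, count in heading_inventory(sample).most_common():
    print(f"{count:3d}  {heading}")
```

Sorting by frequency puts the canonical spelling first and the rare typos last, which is usually where the substring choices need the most care.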

Given the above, is there a better way to execute a chained if-elif-else statement like this?

Edit

The answers suggested in Replacements for switch statement in Python? don't work in my case. For example:

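def do_this(heading):
    # Store the handler functions themselves (no parentheses) so that only
    # the selected one runs; calling them here would run every handler.
    return {
        "overview": do_overview,
        "fees": do_fees,
        # ...
    }[heading]()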

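This is how the answers to that question suggest implementing it. But how do I call do_fees() when heading is "fees & funding", "fees", "fees &funding", and so on? I need the correct function to run whenever the key is a substring of heading, not only when it matches exactly.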

Answer 1

Given the above, is there a better way to execute a chained if-elif-else statement like this?

You don't have to look values up in the dict by an exact key. You can simply use the dict to condense your parsing logic:

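A minimal sketch of that idea (all handler names here are hypothetical placeholders): keep the keys as substrings, loop over the dict, and call the first handler whose key appears in the heading. It assumes Python 3.7+, where dicts preserve insertion order, so the first matching key listed wins.

```python
# Hypothetical handlers; in the scraper each would process the section content.
def do_overview(content):
    return "overview"


def do_takeaways(content):
    return "takeaways"


def do_fees(content):
    return "fees"


def do_certificates(content):
    return "certificates"


def do_something_else(content):
    return "unmatched"


# Substring to look for -> handler. Dicts keep insertion order (Python 3.7+),
# so when several keys match, the first one listed wins.
HANDLERS = {
    "overview": do_overview,
    "takeaways": do_takeaways,
    "fees": do_fees,
    "certificat": do_certificates,  # also matches "certificate(s)" and the typo "certificat"
}


def dispatch(heading, content=None):
    heading = heading.lower().strip()
    for key, handler in HANDLERS.items():
        if key in heading:  # substring test instead of an exact key lookup
            return handler(content)
    return do_something_else(content)


for h in ["Fees & Funding", "Fees &Funding", "Certificate",
          "Certificat & Exams", "Key Takeaways", "Contact"]:
    print(h, "->", dispatch(h))
```

Because the test is `key in heading` rather than `HANDLERS[heading]`, one short key such as `"certificat"` covers "Certificates", "Certificate" and "Certificat & Exams". The trade-off is a linear scan over the keys for every heading.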

Answer 2

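If you want to match misspelled strings, you will need some kind of fuzzy matching for part of your input. For the well-structured headings, however, you can keep the main benefit of a switch statement by adapting the dictionary approach: each word of the heading is looked up in constant time, so the cost grows with the number of words in the heading rather than with the number of cases. (This only matters if you have a lot of cases.)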

funcs = {
    "certificates": lambda: "certificates",
    "fees": lambda: "fees",
}

headings =['Fees & Funding', 'Fees', 'Fees &Funding', 'Certificates',
           'Certificate', 'Certificat & Exams', 'Exams & Certificates']

def do_this(heading):
    # Split the heading into lowercase words and keep the handlers whose
    # key appears as a whole word.
    words = heading.lower().split()
    funcs_to_call = [funcs[word] for word in words if word in funcs]
    if len(funcs_to_call) == 1:
        return funcs_to_call[0]()
    elif len(funcs_to_call) == 0:
        return 'needs fuzzy matching'
    else:
        # Alternatively, treat the heading as belonging to multiple categories.
        raise ValueError("This fits more than one category")


for heading in headings:
    print(heading, do_this(heading), sep=': ')
# Output:
# Fees & Funding: fees
# Fees: fees
# Fees &Funding: fees
# Certificates: certificates
# Certificate: needs fuzzy matching
# Certificat & Exams: needs fuzzy matching
# Exams & Certificates: certificates
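The comment in the `else` branch mentions treating a heading as belonging to several categories. A minimal sketch of that variant, which returns every match instead of raising; the extra `"exams"` key and the `categories` name are hypothetical additions to the answer's `funcs` dict:

```python
funcs = {
    "certificates": lambda: "certificates",
    "exams": lambda: "exams",  # hypothetical extra category
    "fees": lambda: "fees",
}


def categories(heading):
    """Return the result of every handler whose key appears as a whole word."""
    words = heading.lower().split()
    return [funcs[word]() for word in words if word in funcs]


print(categories("Exams & Certificates"))  # ['exams', 'certificates']
print(categories("Fees & Funding"))        # ['fees']
```

An empty list then plays the role of the `'needs fuzzy matching'` case.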

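If you can anticipate the kinds of typos you will face, you can clean the strings beforehand so they match more precisely, for example by removing symbols and making words plural.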
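One way to combine that cleanup with the fuzzy matching mentioned at the start of this answer is the standard library's `difflib.get_close_matches`. This is a swapped-in technique, not part of the answer above; the `CATEGORIES` list, the function names, and the `0.8` cutoff are assumptions to tune against real headings. Stripping symbols splits "Fees &Funding" into clean words, and the similarity ratio absorbs plural and truncation typos such as "Certificate" or "Certificat".

```python
import difflib
import re

# Hypothetical list of canonical heading words.
CATEGORIES = ["certificates", "exams", "fees", "funding", "overview", "takeaways"]


def normalize(heading):
    # Lowercase, replace every run of non-letters (e.g. " &") with a space, split into words.
    return re.sub(r"[^a-z]+", " ", heading.lower()).split()


def match_categories(heading, cutoff=0.8):
    """Map each word to its closest canonical category, if it is similar enough."""
    found = []
    for word in normalize(heading):
        close = difflib.get_close_matches(word, CATEGORIES, n=1, cutoff=cutoff)
        if close and close[0] not in found:
            found.append(close[0])
    return found


for h in ["Fees &Funding", "Certificate", "Certificat & Exams", "Contact us"]:
    print(h, "->", match_categories(h))
```

A higher cutoff reduces false matches between unrelated short words; a lower one tolerates heavier typos.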

What is a good alternative to switch for Python string input?
