spaCy named entity recognition does not recognize product entities such as food

Time: 2020-08-02 00:13:25

Tags: python nlp spacy named-entity-recognition ner

I am using spaCy's Named Entity Recognition to find food words in a sentence. Here is my code:

import spacy 
  
nlp = spacy.load('en_core_web_sm') 
  
sentence = "I like to eat pizza."
  
doc = nlp(sentence) 
  
for ent in doc.ents: 
    print(ent.text, ent.label_)

Why doesn't this print "pizza"? According to {{3}}, food falls under the PRODUCT entity type, so shouldn't it print "pizza" for ent.text and PRODUCT for ent.label_?
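One way to check what the PRODUCT label is meant to cover is spaCy's built-in glossary lookup, `spacy.explain`. This is only a minimal sketch: it shows the label's description and does not load or run any model, so it says nothing about whether `en_core_web_sm` will actually tag a given food word.

```python
import spacy

# Look up spaCy's short glossary description for the PRODUCT label.
# No trained pipeline is needed for this lookup.
description = spacy.explain("PRODUCT")
print(description)
```

Keep in mind that a label covering foods in the annotation scheme does not guarantee that a small statistical model predicts it for every food word.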

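1 answer:

Answer 0: (score: 0)

I ran into the same problem and solved it by training spaCy on a few examples.

So, grab a few sentences (even 3-4 will do) and manually extract the products from each into a list. That gives you a dictionary mapping each text to its list of products. Then adapt this code: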

import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span
from spacy.training import Example  # spaCy v3 location of Example
from spacy.util import filter_spans

def getSpans(ner_model=None, products=None, nameForNewLabel='PRODUCTS', doc=None):
    products = products or []  # avoid a mutable default argument
    # create one pattern Doc per product string (make_doc only tokenizes)
    patterns = [ner_model.make_doc(product) for product in products]
    # match the patterns against the doc; overlaps are resolved by filter_spans below
    matcher = PhraseMatcher(ner_model.vocab)
    matcher.add(nameForNewLabel, patterns)  # spaCy v3 signature: patterns passed as a list
    matches = matcher(doc)
    # now create spans
    spans = []
    for match_id, start, end in matches:
        span = doc[start:end]  # the matched span (unlabelled, printed for debugging)
        print(span.text, span.start_char, span.end_char, span.label_, "'" + span.text + "'", span.text in products)
        # create a new Span for each match and use the match_id (PRODUCTS) as the label
        spans.append(Span(doc, start, end, label=match_id))

    # Remove duplicate or overlapping spans, since one token can only be part of one entity.
    # When spans overlap, the (first) longest span is preferred over shorter spans.
    doc.ents = filter_spans(spans)
    # Build the training example: an unannotated copy as the prediction side,
    # the annotated doc as the reference side.
    return Example(ner_model.make_doc(doc.text), doc)

where

ner_model = spacy.blank('en')  # create blank Language class

doc = ner_model.make_doc(text)
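To make the "dictionary of texts and product lists" idea concrete, here is a minimal sketch of how such a dictionary might be turned into training examples. It assumes spaCy v3; the sentences in `TRAIN_DATA` and the helper name `make_example` are made up for illustration, and the helper is a compact stand-in for `getSpans` rather than the exact code from my project.

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span
from spacy.training import Example
from spacy.util import filter_spans

# Hypothetical hand-labelled data: each text maps to the products it mentions.
TRAIN_DATA = {
    "I like to eat pizza.": ["pizza"],
    "She ordered sushi and ramen for lunch.": ["sushi", "ramen"],
    "We had pasta for dinner.": ["pasta"],
}

def make_example(nlp, text, products, label="PRODUCTS"):
    """Annotate `text` with the given products and wrap it in an Example."""
    doc = nlp.make_doc(text)
    matcher = PhraseMatcher(nlp.vocab)
    matcher.add(label, [nlp.make_doc(p) for p in products])
    spans = [Span(doc, start, end, label=match_id)
             for match_id, start, end in matcher(doc)]
    doc.ents = filter_spans(spans)  # drop duplicates and overlaps
    # Prediction side is an unannotated copy; reference side carries the entities.
    return Example(nlp.make_doc(text), doc)

nlp = spacy.blank("en")
examples = [make_example(nlp, text, products)
            for text, products in TRAIN_DATA.items()]

for eg in examples:
    print([(ent.text, ent.label_) for ent in eg.reference.ents])
```

Each resulting `Example` holds the annotated entities on its `reference` doc, which is what the training step consumes.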

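Then train the model. Once it has been trained, for example for 200 epochs with batch_size set to the total number of examples, you will see that it works.

I can't share my full code, because I use it in a product for a private-equity AI company, but with the above I'm sure you can get there.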
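Since the training step itself is not shown, here is a rough sketch of what it could look like under spaCy v3. It is an assumption about the setup, not my production code: the sentences are invented, the helper `to_example` is hypothetical, and the examples are built with `Example.from_dict` and character offsets instead of the PhraseMatcher helper, purely to keep the snippet short and self-contained.

```python
import random
import spacy
from spacy.training import Example

LABEL = "PRODUCTS"
N_EPOCHS = 200  # the epoch count mentioned above

# Hypothetical hand-labelled data: each text maps to the products it mentions.
TRAIN_DATA = {
    "I like to eat pizza.": ["pizza"],
    "She ordered sushi and ramen for lunch.": ["sushi", "ramen"],
    "We had pasta for dinner.": ["pasta"],
}

def to_example(nlp, text, products):
    """Build an Example from character offsets of each product in the text."""
    entities = [(text.index(p), text.index(p) + len(p), LABEL) for p in products]
    return Example.from_dict(nlp.make_doc(text), {"entities": entities})

nlp = spacy.blank("en")          # blank English pipeline
ner = nlp.add_pipe("ner")        # add an untrained NER component
ner.add_label(LABEL)

examples = [to_example(nlp, t, p) for t, p in TRAIN_DATA.items()]
optimizer = nlp.initialize(lambda: examples)

for epoch in range(N_EPOCHS):
    random.shuffle(examples)
    losses = {}
    # One update per epoch over all examples, i.e. batch size = number of examples.
    nlp.update(examples, sgd=optimizer, losses=losses)

doc = nlp("I like to eat pizza.")
print([(ent.text, ent.label_) for ent in doc.ents])
```

With only a handful of sentences the model mostly memorizes its training data, so expect it to generalize poorly until you add more varied examples.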