How do I use chained rules in TokensRegex?

Posted: 2015-07-31 13:40:07

Tags: stanford-nlp



First of all, thanks to Angel Chang for writing such a great tool as TokensRegex!

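My use case is as follows:
I have two extraction rules in my test rule set. Both have the "action" field specified as the result, and both have "Annotate" in the action list. They work fine when the expression matched by the second rule is independent of the first rule's result, but things break down when the second rule's execution depends on the first rule's result.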

A concrete example:
I have the following sentence: "Consensus estimates call for EPS of $3.55 on revenue of $30.51 billion."
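"EPS" and "revenue" are already annotated by a more basic RegexNER annotator. The goal of the TokensRegex annotator is to add NER annotations when certain conditions are met. In this simplified example, if the term "estimates" occurs shortly before the term "EPS", we want to relabel the token "EPS" with the NER annotation "DN-EPS_EST". That is my first rule. The second rule depends on the result of the first: relabel the token "revenue" if it is preceded by a token whose NER annotation is "DN-EPS_EST" (i.e. the result of the first rule).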
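For context, the setup described above (RegexNER labels first, TokensRegex rules on top) might be configured roughly as follows. This is a sketch under assumptions: `annotators`, `regexner.mapping` and `tokensregex.rules` are CoreNLP property keys, but the class `PipelineConfig` and the file names `dn-terms.tab` and `estimate-rules.txt` are hypothetical, and the block only builds a `java.util.Properties` object so it runs without CoreNLP on the classpath.

```java
import java.util.Properties;

// Hypothetical helper: builds pipeline properties for the RegexNER + TokensRegex setup.
// With CoreNLP on the classpath, pass the result to
// new edu.stanford.nlp.pipeline.StanfordCoreNLP(props).
public class PipelineConfig {

    static Properties build() {
        Properties props = new Properties();
        // tokensregex is listed after regexner, so its rules can see DN-EPS / DN-REVENUE labels
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,regexner,tokensregex");
        // Hypothetical mapping file assigning DN-EPS, DN-REVENUE, ... to plain terms
        props.setProperty("regexner.mapping", "dn-terms.tab");
        // Hypothetical rules file holding both TokensRegex rules
        props.setProperty("tokensregex.rules", "estimate-rules.txt");
        return props;
    }

    public static void main(String[] args) {
        build().list(System.out); // print the configuration
    }
}
```

The only design point here is ordering: rules that match on `DN-EPS` or `DN-REVENUE` need the annotator that produces those labels to run first.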

So my TokensRegex rules are as follows:

{
ruleType:   "tokens",
pattern:    ( /[Ee]stimates?/ []{0,3} [{ner:"DN-EPS"}] ),
action:     ( Annotate($0[-1], "ner", "DN-EPS_EST") ) }
{
ruleType:   "tokens",
pattern:    ( [{ner:"DN-EPS_EST"}] /of/ [{ner:"MONEY"}]{1,3} /on/ [{ner:"DN-REVENUE"}] ),
action:     ( Annotate($0[-1], "ner", "DN-REVENUE_EST") ) }

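The first rule works, but the second one doesn't. What's the problem? Are the rules executed in the wrong order? Is the result of the first rule not persisted in time for the second rule's expression to match it? Am I using the wrong field or action type? I deliberately simplified the pattern-matching expressions in this example, but perhaps I still have an error in the "pattern" field of the second rule?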

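Any help would be greatly appreciated! I'm stumped. I've read all the documentation, Javadocs and slides on the website, but couldn't find a concrete answer.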

1 answer:

Answer 0 (score: 4)

OK, after some additional tinkering and research I finally found the answer to my own question:
you have to apply chained rules in stages; just ordering them "correctly" in the rules file is not sufficient.

TokensRegexAnnotator will DO NOTHING for the dependent rule if its pattern refers to a token property that is modified by the upstream rule and both rules are in the same stage (or no stage is specified). The dependent rule matched neither the "before the 1st rule execution" state nor the "after the 1st rule execution" state.
I first tested the 2nd rule by itself, taking the 1st rule out altogether, and it worked. This was necessary to make sure the 2nd rule's pattern expression was not faulty.
Then I re-introduced the 1st rule and tested the 2nd rule with two versions of its pattern: one written against the "before the 1st rule execution" state and one against the "after the 1st rule execution" state. NOTHING HAPPENED IN EITHER CASE. I'm not sure why TokensRegexAnnotator was implemented this way; maybe its creators thought that no behavior is better than some default behavior...

At any rate, only after reading deeper into the "SequenceMatchRules" Javadoc did I find the "stage" field and try it (although the Javadoc does not say explicitly that you HAVE to use it when a rule relies on output annotations from another rule).

Here's what the working example looks like:

{   ruleType:   "tokens",
pattern:    ( /[Ee]stimates?/ []{0,3} [{ner:"DN-EPS"}] ),
action:     ( Annotate($0[-1], "ner", "DN-EPS_EST") ),
stage:      1   }

{   ruleType:   "tokens",
pattern:    ( [{ner:"DN-EPS_EST"}] /of/ [{ner:"MONEY"}]{1,3} /on/ [{ner:"DN-REVENUE"}] ),
action:     ( Annotate($0[-1], "ner", "DN-REVENUE_EST") ),
stage:      2   }

As you can see, the 2nd rule's pattern has a condition on an NER annotation that can be satisfied only after the 1st rule has executed and its results have been committed. In this example the 2nd rule fires, as expected.
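The effect of `stage` can be illustrated with a small, self-contained Java toy model. This is an assumption-laden sketch, not CoreNLP's implementation: the names `Token`, `Rule`, `rule1`, `rule2` and `run` are hypothetical, the two rules are hand-coded Java equivalents of the TokensRegex patterns above, and the tokenization and initial NER/RegexNER labels of the example sentence are assumed. The model matches every rule of a stage against the same snapshot and commits the relabelings only when the stage ends, so a stage-2 rule sees what stage 1 wrote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of staged rule application (NOT CoreNLP's internals): rules in a stage are
// matched against one snapshot, and their relabelings are committed when the stage ends.
public class StagedRules {
    // Minimal stand-in for a token: its word and a mutable NER label.
    static final class Token {
        final String word;
        String ner;
        Token(String word, String ner) { this.word = word; this.ner = ner; }
    }

    // A matcher returns the indices of the tokens its rule wants to relabel.
    interface Matcher { List<Integer> targets(List<Token> tokens); }

    static final class Rule {
        final int stage; final Matcher matcher; final String newNer;
        Rule(int stage, Matcher matcher, String newNer) {
            this.stage = stage; this.matcher = matcher; this.newNer = newNer;
        }
    }

    // Hand-coded /[Ee]stimates?/ []{0,3} [{ner:"DN-EPS"}]; targets the last token.
    static List<Integer> rule1(List<Token> t) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < t.size(); i++) {
            if (!t.get(i).word.matches("[Ee]stimates?")) continue;
            // 0 to 3 arbitrary tokens in between, so the DN-EPS token sits at i+1 .. i+4
            for (int j = i + 1; j <= i + 4 && j < t.size(); j++)
                if (t.get(j).ner.equals("DN-EPS")) out.add(j);
        }
        return out;
    }

    // Hand-coded [{ner:"DN-EPS_EST"}] /of/ [{ner:"MONEY"}]{1,3} /on/ [{ner:"DN-REVENUE"}].
    static List<Integer> rule2(List<Token> t) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i + 1 < t.size(); i++) {
            if (!t.get(i).ner.equals("DN-EPS_EST") || !t.get(i + 1).word.equals("of")) continue;
            for (int n = 1; n <= 3; n++) { // try 1, 2 or 3 MONEY tokens
                int on = i + 2 + n, rev = on + 1;
                if (rev >= t.size()) break;
                boolean money = true;
                for (int k = i + 2; k < on; k++) money &= t.get(k).ner.equals("MONEY");
                if (money && t.get(on).word.equals("on") && t.get(rev).ner.equals("DN-REVENUE")) {
                    out.add(rev);
                    break;
                }
            }
        }
        return out;
    }

    // Run stages in ascending order; within a stage, match everything first, then commit.
    static void run(List<Token> tokens, List<Rule> rules) {
        Map<Integer, List<Rule>> byStage = new TreeMap<>();
        for (Rule r : rules) byStage.computeIfAbsent(r.stage, s -> new ArrayList<>()).add(r);
        for (List<Rule> stageRules : byStage.values()) {
            List<Integer> idx = new ArrayList<>();
            List<String> labels = new ArrayList<>();
            for (Rule r : stageRules)
                for (int i : r.matcher.targets(tokens)) { idx.add(i); labels.add(r.newNer); }
            for (int k = 0; k < idx.size(); k++) tokens.get(idx.get(k)).ner = labels.get(k);
        }
    }

    // Example sentence with assumed labels after NER + RegexNER.
    static List<Token> sentence() {
        String[][] raw = {{"Consensus", "O"}, {"estimates", "O"}, {"call", "O"}, {"for", "O"},
            {"EPS", "DN-EPS"}, {"of", "O"}, {"$", "MONEY"}, {"3.55", "MONEY"}, {"on", "O"},
            {"revenue", "DN-REVENUE"}, {"of", "O"}, {"$", "MONEY"}, {"30.51", "MONEY"},
            {"billion", "MONEY"}, {".", "O"}};
        List<Token> out = new ArrayList<>();
        for (String[] r : raw) out.add(new Token(r[0], r[1]));
        return out;
    }

    static List<Rule> rules(int stage1, int stage2) {
        List<Rule> out = new ArrayList<>();
        out.add(new Rule(stage1, StagedRules::rule1, "DN-EPS_EST"));
        out.add(new Rule(stage2, StagedRules::rule2, "DN-REVENUE_EST"));
        return out;
    }

    public static void main(String[] args) {
        List<Token> staged = sentence();
        run(staged, rules(1, 2));
        System.out.println(staged.get(4).ner + " " + staged.get(9).ner); // DN-EPS_EST DN-REVENUE_EST
        List<Token> sameStage = sentence();
        run(sameStage, rules(1, 1));
        System.out.println(sameStage.get(4).ner + " " + sameStage.get(9).ner); // DN-EPS_EST DN-REVENUE
    }
}
```

In this model, putting both rules in stage 1 leaves "revenue" labeled DN-REVENUE, because rule 2 is matched before "EPS" is relabeled. That mirrors the "after the 1st rule execution" case reported above; the model does not try to reproduce the other observation, that a pattern written against the pre-rule-1 labels also failed.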
