Question

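I trained a custom NER model to detect court abbreviations in legal citations. I gave it 2,000 examples like the ones in the dataset shown below, but when I run the model on sample text its real-world accuracy is very low.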

The output I get is:

1. III. (wrong)
2. D.Me. (right)
3. 702 (wrong)
4. [ed]
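To put a number on "accuracy is very low", the sketch below scores the four predictions above at the entity level with exact string matching. It is plain Python, not a spaCy scorer. `predicted`, `gold` and `precision_recall` are hypothetical names. It assumes `D.Me.` is the only court abbreviation in the sample text and treats the unlabeled `[ed]` as a false positive.

```python
# Predicted entity strings, copied from the output list above.
predicted = ['III.', 'D.Me.', '702', '[ed]']
# Assumption: the only court abbreviation in the sample text is "D.Me.".
gold = ['D.Me.']


def precision_recall(pred, gold):
    """Entity-level precision and recall using exact string matches."""
    true_pos = sum(1 for p in pred if p in gold)  # predictions that are real courts
    found = sum(1 for g in gold if g in pred)     # real courts that were predicted
    precision = true_pos / len(pred) if pred else 0.0
    recall = found / len(gold) if gold else 0.0
    return precision, recall


print(precision_recall(predicted, gold))  # (0.25, 1.0)
```

Under these assumptions, the model finds the one court it should (recall 1.0), but three of its four predictions are false positives (precision 0.25). The problem is over-tagging, not missed entities.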

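Is something wrong with how I trained it? Is the model unable to annotate text that falls outside the pattern in the dataset? The dataset looks like this: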

('Dolby  v. Dole Food Co.  896 F. Supp. 2d 556, 569 (D. Me. 2012)', {'entities': [(51,57, 'Court Abbr')]}),
('Commonwelth  v. Zook, 803 F.3d 694, 695 (D. Me. 2015)', {'entities': [(41,47, 'Court Abbr')]}),
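One quick way to see what the model can actually learn from these rows is to slice each `(start, end)` offset and look at the annotated strings. The sketch below uses plain Python and does not need spaCy. `TRAIN_DATA`, `annotated_spans` and `summarize` are hypothetical names, and only the two rows shown above are included. It counts the distinct entity strings and the character just before each span. Suppose the full 2,000-example set also reports only `D. Me.`, always preceded by `(`. Then the model has effectively seen one court in one position. That would be consistent with it tagging whatever follows an opening parenthesis (`III.`) or sits next to a number (`702`).

```python
from collections import Counter

# The two rows from the question, in spaCy v2-style (text, annotations) tuples.
TRAIN_DATA = [
    ('Dolby  v. Dole Food Co.  896 F. Supp. 2d 556, 569 (D. Me. 2012)',
     {'entities': [(51, 57, 'Court Abbr')]}),
    ('Commonwelth  v. Zook, 803 F.3d 694, 695 (D. Me. 2015)',
     {'entities': [(41, 47, 'Court Abbr')]}),
]


def annotated_spans(data):
    """Yield (entity text, character just before it, label) for each annotation."""
    for text, ann in data:
        for start, end, label in ann['entities']:
            before = text[start - 1] if start > 0 else ''
            yield text[start:end], before, label


def summarize(data):
    """Count distinct entity strings and the characters that precede them."""
    texts = Counter()
    contexts = Counter()
    for span, before, _label in annotated_spans(data):
        texts[span] += 1
        contexts[before] += 1
    return texts, contexts


if __name__ == '__main__':
    texts, contexts = summarize(TRAIN_DATA)
    print('distinct entity strings:', dict(texts))
    print('character before entity:', dict(contexts))
```

On the two rows shown, this prints `{'D. Me.': 2}` and `{'(': 2}`. It also confirms that both offsets land exactly on `D. Me.`.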

Sample text:

"Harley-Davidson and Goodyear filed motions to exclude Woehrle’s and Lee’s opinions, arguing they lacked the reliability required by Federal Rule of Evidence (III. 702) and Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (D. Me. 1993). Following a hearing, the district court agreed. The court concluded that Woehrle’s opinion that manufacturing defects caused the tire to unseat from the rim upon being punctured “appear[ed] to be based on nothing more than his subjective belief and unsupported speculation"

Is the low accuracy of my custom spaCy NER model caused by the dataset?
