如何使用 AllenNLP 和 coref-spanbert-large 在没有 Internet 的情况下解决共指?

时间:2021-05-13 09:09:07

标签: allennlp coreference-resolution

想要使用 AllenNLP 和 coref-spanbert-large 模型在没有 Internet 的情况下解决共引用。 我尝试按照此处描述的方式进行操作https://demo.allennlp.org/coreference-resolution

我的代码:

from allennlp.predictors.predictor import Predictor
import allennlp_models.tagging

predictor = Predictor.from_path(r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz")
example = 'Paul Allen was born on January 21, 1953, in Seattle, Washington, to Kenneth Sam Allen and Edna Faye Allen.Allen attended Lakeside School, a private school in Seattle, where he befriended Bill Gates, two years younger, with whom he shared an enthusiasm for computers.'
pred = predictor.predict(document=example)
coref_res = predictor.coref_resolved(example)
print(pred)
print(coref_res)

当我可以访问互联网时,代码可以正常工作。 但是当我无法访问互联网时,我会收到以下错误:

Traceback (most recent call last):
  File "C:/Users/aap/Desktop/CoreNLP/Coref_AllenNLP.py", line 14, in <module>
    predictor = Predictor.from_path(r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz")
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\predictors\predictor.py", line 361, in from_path
    load_archive(archive_path, cuda_device=cuda_device, overrides=overrides),
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\models\archival.py", line 206, in load_archive
    config.duplicate(), serialization_dir
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\models\archival.py", line 232, in _load_dataset_readers
    dataset_reader_params, serialization_dir=serialization_dir
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 604, in from_params
    **extras,
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 632, in from_params
    kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 200, in create_kwargs
    cls.__name__, param_name, annotation, param.default, params, **extras
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 307, in pop_and_construct_arg
    return construct_arg(class_name, name, popped_params, annotation, default, **extras)
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 391, in construct_arg
    **extras,
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 341, in construct_arg
    return annotation.from_params(params=popped_params, **subextras)
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 604, in from_params
    **extras,
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 634, in from_params
    return constructor_to_call(**kwargs)  # type: ignore
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\token_indexers\pretrained_transformer_mismatched_indexer.py", line 63, in __init__
    **kwargs,
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\token_indexers\pretrained_transformer_indexer.py", line 58, in __init__
    model_name, tokenizer_kwargs=tokenizer_kwargs
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\tokenizers\pretrained_transformer_tokenizer.py", line 71, in __init__
    model_name, add_special_tokens=False, **tokenizer_kwargs
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\cached_transformers.py", line 110, in get_tokenizer
    **kwargs,
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 362, in from_pretrained
    config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\models\auto\configuration_auto.py", line 368, in from_pretrained
    config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\configuration_utils.py", line 424, in get_config_dict
    use_auth_token=use_auth_token,
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\file_utils.py", line 1087, in cached_path
    local_files_only=local_files_only,
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\file_utils.py", line 1268, in get_from_cache
    "Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

Process finished with exit code 1

请说,我需要什么才能在没有互联网的情况下运行我的代码?

1 个答案:

答案 0 :(得分:0)

您将需要转换器模型的配置文件和词汇表的本地副本,以便标记器和标记索引器不需要下载这些:

from transformers import AutoConfig, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(transformer_model_name)
config = AutoConfig.from_pretrained(transformer_model_name)
tokenizer.save_pretrained(local_config_path)
config.to_json_file(local_config_path + "/config.json")

然后您需要将配置文件中的转换器模型名称覆盖到您保存这些内容的本地目录 (local_config_path):

predictor = Predictor.from_path(
    r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz",
    overrides={
        "dataset_reader.token_indexers.tokens.model_name": local_config_path,
        "validation_dataset_reader.token_indexers.tokens.model_name": local_config_path,
        "model.text_field_embedder.tokens.model_name": local_config_path,
    },
)
相关问题