值错误:输入无效。应该是字符串、字符串列表/元组或整数列表/元组

时间:2021-05-27 11:53:16

标签: huggingface-transformers huggingface-datasets

from os import listdir
from os.path import isfile, join
from datasets import load_dataset
from transformers import BertTokenizer

test_files = [join('./test/', f) for f in listdir('./test') if isfile(join('./test', f))]

dataset = load_dataset('json', data_files={"test": test_files}, cache_dir="./.cache_dir")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def encode(batch):
  return tokenizer.encode_plus(batch["abstract"], max_length=32, add_special_tokens=True, pad_to_max_length=True,
                        return_attention_mask=True, return_token_type_ids=False, return_tensors="pt")

dataset.set_transform(encode)

当我运行这段代码时,我有

ValueError: Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.

我有一个字符串列表,而不是一个字符串列表。以下是batch["article"]的内容:

[['eleven politicians from 7 parties made comments in letter to a newspaper .', "said dpp alison saunders had ` damaged public confidence ' in justice .", 'ms saunders ruled lord janner unfit to stand trial over child abuse claims .', 'the cps has pursued at least 19 suspected paedophiles with dementia .'], ['an increasing number of surveys claim to reveal what makes us happiest .', 'but are these generic lists really of any use to us ?', 'janet street-porter makes her own list - of things making her unhappy !'], ["author of ` into the wild ' spoke to five rape victims in missoula , montana .", "` missoula : rape and the justice system in a college town ' was released april 21 .", "three of five victims profiled in the book sat down with abc 's nightline wednesday night .", 'kelsey belnap , allison huguet and hillary mclaughlin said they had been raped by university of montana football players .', "huguet and mclaughlin 's attacker , beau donaldson , pleaded guilty to rape in 2012 and was sentenced to 10 years .", 'belnap claimed four players gang-raped her in 2010 , but prosecutors never charged them citing lack of probable cause .', 'mr krakauer wrote book after realizing close friend was a rape victim .'], ['tesco announced a record annual loss of £ 6.38 billion yesterday .', 'drop in sales , one-off costs and pensions blamed for financial loss .', 'supermarket giant now under pressure to close 200 stores nationwide .', 'here , retail industry veterans , plus mail writers , identify what went wrong .'], ...,  ['snp leader said alex salmond did not field questions over his family .', "said she was not ` moaning ' but also attacked criticism of women 's looks .", 'she made the remarks in latest programme profiling the main party leaders .', 'ms sturgeon also revealed her tv habits and recent image makeover .', 'she said she relaxed by eating steak and chips on a saturday night .']]

我该如何解决这个问题?

0 个答案:

没有答案
相关问题