Question

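I want to process strings of up to 100K characters and write them to a CSV file across different columns. (Basically, I am trying to work around Excel's 32K-character cell limit.)

Here is the sample code: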

soup = BeautifulSoup(r.content, 'html5lib')
html = str(soup.select('div.DocumentText'))
if len(html) > 32000:
   #How to handle here and assign to different variable ex: html1, html2 is the question 
   x.writerow([html_1,......, html_5])

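Example flow I am trying to implement

Scrape the website
If the scraped data is longer than 32,000 characters and shorter than 100K
Split the scraped data into different variables
Write each variable to a different column of the CSV file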
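
The flow above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name split_for_columns and both constants are made up for this example, a plain string stands in for the scraped HTML, and the CSV is written to an in-memory buffer instead of a real file.

```python
import csv
import io

CELL_LIMIT = 32_000   # chunk size from the question (Excel's hard limit is 32,767 characters)
MAX_TOTAL = 100_000   # upper bound mentioned in the question

def split_for_columns(text, cell_limit=CELL_LIMIT, max_total=MAX_TOTAL):
    """Return the list of column values for one CSV row (hypothetical helper)."""
    if len(text) > max_total:
        raise ValueError(f"text has {len(text)} characters, more than {max_total}")
    if len(text) <= cell_limit:
        return [text]  # short enough for a single cell
    # Consecutive slices of at most cell_limit characters each.
    return [text[i:i + cell_limit] for i in range(0, len(text), cell_limit)]

# Stand-in for the scraped HTML (assumption: 70,007 characters in total).
html = "<p>" + "x" * 70_000 + "</p>"

buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
row = split_for_columns(html)
writer.writerow(row)

print([len(col) for col in row])  # [32000, 32000, 6007]
```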

Answer 1

Maybe you want to try this. It splits the string into chunks of 32000 characters (just change the size if needed) and puts them in a list.

if len(html) > 32000:
    # Slice html into consecutive pieces of at most 32000 characters each.
    output = [html[i:i + 32000] for i in range(0, len(html), 32000)]
    # Each piece becomes its own column in the row.
    x.writerow(output)
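
To check that nothing is lost by splitting, one option is to read the row back and join the columns again. The sketch below assumes a made-up HTML string containing commas, quotes and newlines, and uses an in-memory buffer in place of a file; the names html, row and restored are only for illustration.

```python
import csv
import io

CHUNK = 32_000  # same chunk size as in the snippet above

# Hypothetical scraped text with characters that the csv module has to quote.
html = '<div class="DocumentText">a, "b"\n</div>' * 2000

# Write: one row, one column per chunk.
row = [html[i:i + CHUNK] for i in range(0, len(html), CHUNK)]
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(row)

# Read back: join the columns of the first row to rebuild the original string.
reader = csv.reader(io.StringIO(buffer.getvalue(), newline=""))
restored = "".join(next(reader))

print(restored == html)
```

When working with a real file, open it with newline="" as the csv documentation recommends. The csv reader also has its own per-field size limit (131072 by default, adjustable through csv.field_size_limit), which 32000-character chunks stay well below.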

Answer 2

Hope this helps someone... If there is a better way, I am happy to hear it... Limitation: this only handles a case_html (string) of up to 98K characters.

def strhandler(case_html, length):
    # Lazily yield consecutive slices of case_html, each at most `length` characters.
    return (case_html[i:i + length] for i in range(0, len(case_html), length))

case_html = str(soup.find('div', class_='DocumentText').find_all(['p','center','small']))
char_count = len(case_html)
# Chunk size: a quarter of the text, so the text normally splits into 4 or 5 pieces.
split_no = max(1, char_count // 4)
print('Chunk size (characters per column):', split_no)
chunks = list(strhandler(case_html, split_no))
# Pad with empty strings so the row has at least 5 columns; unpacking into exactly
# 5 variables would fail whenever the split yields only 4 pieces.
chunks += [''] * (5 - len(chunks))
csv_writer.writerow(chunks)
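
If several pages are scraped, each one may need a different number of columns. A possible variation, sketched here with hypothetical names (chunk, padded_rows, documents) and made-up input strings, is to chunk every document at a fixed size and pad the shorter rows so that all rows in the CSV have the same width.

```python
import csv
import io

CELL_LIMIT = 32_000  # assumed chunk size, below Excel's 32,767-character cell limit

def chunk(text, size=CELL_LIMIT):
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)] or [""]

def padded_rows(documents, size=CELL_LIMIT):
    """Chunk every document and pad the rows to the same number of columns."""
    rows = [chunk(doc, size) for doc in documents]
    width = max((len(row) for row in rows), default=0)
    return [row + [""] * (width - len(row)) for row in rows]

# Hypothetical scraped documents of different lengths.
documents = ["a" * 10_000, "b" * 98_000, "c" * 40_000]
rows = padded_rows(documents)

buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)

# Show the length of every cell, row by row.
print([[len(cell) for cell in row] for row in rows])
```

Unlike dividing the length by a fixed number, this keeps the chunk size constant, so the number of columns grows with the input instead of the cell size.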

Splitting a scraped HTML string
