Why is my string capped at 32,758 characters when I output it to CSV?

Asked: 2018-01-16 22:12:28

Tags: python string python-2.7 ubuntu-16.04 export-to-csv

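I am running a Python 2.7.12 program that processes a large amount of data. One of the strings I build holds a lot of text, but I noticed that when I write it out to a CSV file it is capped at 32,758 characters.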

I run the script on a development server, an Ubuntu 16.04 virtual machine with access to 20 GB of RAM.

Why is one of my strings capped at 32,758 characters? Is there a fix or workaround that would let me store more in the string?

import os
import pdfkit
import re
import requests
import urllib2
#pdfminer
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from cStringIO import StringIO

#Opening my files
#with open("GoodData.csv", 'w') as output: this does the same thing as what I have currently
output = open("GoodData.csv", 'w')
output.write("Company|Classification|ID Number|Incorporation State/Country|Address|Link to Metadata|Link to Data|Data" + '\n')

count = 0
counter = 0

archive = open("archive.txt", 'w')
qwerty = open("ProblemLinks.txt", 'r')


for item in qwerty:
#for item in linkList:
    print(" ")
    print("Number of documents parsed: " + str(count))

    #This loop is for testing, to go to a specific link
    if counter == 0:
        #So I get the links out of this
        meta = metaData(item)

        pdfkit.from_url(meta[0], 'out.pdf')

        file = "/home/project/out.pdf"
        holder = convert_pdf_to_txt(file)

        if holder is None:
            output.write(''.join(['|'.join([str(meta[3]), str(meta[1]), str(meta[2]), str(meta[4]), str(meta[5]), str(item).rstrip(), str(meta[0]), "No risk data found"]), '\n']))
        else:
            output.write(''.join(['|'.join([str(meta[3]), str(meta[1]), str(meta[2]), str(meta[4]), str(meta[5]), str(item).rstrip(), str(meta[0]), holder]), '\n']))
        count = count + 1

    else:
        counter = counter + 1

If I print holder before parsing finishes, the entire document is stored in it.

1 answer:

Answer 0: (score: 0)

Okay, I figured it out.

It has nothing to do with how I write the file or with anything else in my code. It is Excel's fault.

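Apparently, when I load the CSV file into an Excel worksheet, Excel cuts the string off at roughly 32K characters. That is consistent with Excel's limit of 32,767 characters per cell.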
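One way to confirm that the file on disk is intact, and that only Excel's display is truncated, is to read the CSV back in Python and measure the fields. The following is a minimal sketch, not the code from the question: the names oversized_fields and demo, the temporary file, and the "Example Co" row are all hypothetical stand-ins, and the sketch assumes the same pipe-delimited layout the question writes.

```python
import csv
import os
import tempfile

# Excel's per-cell character limit; longer fields get cut off when the file is opened there.
EXCEL_CELL_LIMIT = 32767


def oversized_fields(path, delimiter="|", limit=EXCEL_CELL_LIMIT):
    """Return (row_number, column_number, length) for every field longer than limit."""
    # The csv module has its own default cap on field size, so raise it first.
    csv.field_size_limit(10 ** 9)
    found = []
    with open(path) as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        for row_number, row in enumerate(reader, 1):
            for column_number, field in enumerate(row, 1):
                if len(field) > limit:
                    found.append((row_number, column_number, len(field)))
    return found


def demo():
    # Hypothetical stand-in for GoodData.csv: a header row plus one 50,000-character field.
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w") as output:
            output.write("Company|Data\n")
            output.write("|".join(["Example Co", "x" * 50000]) + "\n")
        return oversized_fields(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

If this reports fields longer than 32,767 characters, the data was written in full and the cut-off happens only when Excel loads the file.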