Convert a CSV file to JSON without quotes around float values

Time: 2019-02-08 00:00:31

Tags: python json python-3.x csv

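I have some CSV files that I need to convert to JSON. Some of the float values in the CSV are stored as numeric strings (to preserve trailing zeros). When converting to JSON, all keys and values are wrapped in double quotes. I need the numeric-string float values to be written without quotes, while keeping their trailing zeros.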

Here is a sample of the input CSV file:

ACCOUNTNAMEDENORM,DELINQUENCYSTATUS,RETIRED,INVOICEDAYOFWEEK,ID,BEANVERSION,ACCOUNTTYPE,ORGANIZATIONTYPEDENORM,HIDDENTACCOUNTCONTAINERID,NEWPOLICYPAYMENTDISTRIBUTABLE,ACCOUNTNUMBER,PAYMENTMETHOD,INVOICEDELIVERYTYPE,DISTRIBUTIONLIMITTYPE,CLOSEDATE,FIRSTTWICEPERMTHINVOICEDOM,HELDFORINVOICESENDING,FEINDENORM,COLLECTING,ACCOUNTNUMBERDENORM,CHARGEHELD,PUBLICID
John Smith,2.0000000000,0.0000000000,5.0000000000,1234567.0000000000,69.0000000000,1.0000000000,,4321987.0000000000,1,000-000-000-00,10012.0000000000,10002.0000000000,3.0000000000,,1.0000000000,0,,0,000-000-000-00,0,bc:1234346

The JSON output I get is:

{"ACCOUNTNAMEDENORM":"John Smith","DELINQUENCYSTATUS":"2.0000000000","RETIRED":"0.0000000000","INVOICEDAYOFWEEK":"5.0000000000","ID":"1234567.0000000000","BEANVERSION":"69.0000000000","ACCOUNTTYPE":"1.0000000000","ORGANIZATIONTYPEDENORM":null,"HIDDENTACCOUNTCONTAINERID":"4321987.0000000000","NEWPOLICYPAYMENTDISTRIBUTABLE":"1","ACCOUNTNUMBER":"000-000-000-00","PAYMENTMETHOD":"12345.0000000000","INVOICEDELIVERYTYPE":"98765.0000000000","DISTRIBUTIONLIMITTYPE":"3.0000000000","CLOSEDATE":null,"FIRSTTWICEPERMTHINVOICEDOM":"1.0000000000","HELDFORINVOICESENDING":"0","FEINDENORM":null,"COLLECTING":"0","ACCOUNTNUMBERDENORM":"000-000-000-00","CHARGEHELD":"0","PUBLICID":"xx:1234346"}

Here is the code I am using:

import csv
import json

csvfile = open('output2.csv', 'r')
jsonfile = open('output2.json', 'w')

readHeaders = csv.reader(csvfile)
fieldnames = next(readHeaders)

reader = csv.DictReader(csvfile, fieldnames)

for row in reader:
    json.dump(row, jsonfile, separators=(',', ':'))
    jsonfile.write('\n')

I want the float values in the output to be unquoted, like this:

{"ACCOUNTNAMEDENORM":"John Smith","DELINQUENCYSTATUS":2.0000000000,"RETIRED":0.0000000000,"INVOICEDAYOFWEEK":5.0000000000,"ID":1234567.0000000000,"BEANVERSION":69.0000000000,"ACCOUNTTYPE":1.0000000000,"ORGANIZATIONTYPEDENORM":null,"HIDDENTACCOUNTCONTAINERID":4321987.0000000000,"NEWPOLICYPAYMENTDISTRIBUTABLE":"1","ACCOUNTNUMBER":"000-000-000-00","PAYMENTMETHOD":12345.0000000000,"INVOICEDELIVERYTYPE":98765.0000000000,"DISTRIBUTIONLIMITTYPE":3.0000000000,"CLOSEDATE":null,"FIRSTTWICEPERMTHINVOICEDOM":1.0000000000,"HELDFORINVOICESENDING":"0","FEINDENORM":null,"COLLECTING":"0","ACCOUNTNUMBERDENORM":"000-000-000-00","CHARGEHELD":"0","PUBLICID":"xx:1234346"}

4 Answers:

Answer 0 (score: 1)

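Now that I understand your question better from your comments, here is a completely different answer. Note that it does not use the json module and simply does the required processing "manually". It could probably be done with the module, but getting it to format the Python data types it recognizes by default differently can be fairly involved compared with the relatively simple logic used below: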

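Another note: like your code, this converts each row of the CSV file into a valid JSON object and writes each one to the file on a separate line. However, the contents of the resulting file as a whole will not be valid JSON, because all of these individual objects would need to be separated by commas and enclosed in [] brackets (i.e. form a valid JSON array).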

import csv


with open('output2.csv', 'r', newline='') as csvfile, \
     open('output2.json', 'w') as jsonfile:

    for row in csv.DictReader(csvfile):
        newfmt = []
        for field, value in row.items():
            field = '"{}"'.format(field)
            try:
                float(value)
            except ValueError:
                value = 'null' if value == '' else '"{}"'.format(value)
            else:
                # Integer-like strings such as "1" stay quoted, as in the desired output.
                try:
                    int(value)
                except ValueError:
                    pass
                else:
                    value = '"{}"'.format(value)

            newfmt.append((field, value))

        my_json = '{' + ','.join(':'.join(pair) for pair in newfmt) + '}'
        jsonfile.write(my_json + '\n')

Here is the JSON written to the file:

{"ACCOUNTNAMEDENORM":"John Smith","DELINQUENCYSTATUS":2.0000000000,"RETIRED":0.0000000000,"INVOICEDAYOFWEEK":5.0000000000,"ID":1234567.0000000000,"BEANVERSION":69.0000000000,"ACCOUNTTYPE":1.0000000000,"ORGANIZATIONTYPEDENORM":null,"HIDDENTACCOUNTCONTAINERID":4321987.0000000000,"NEWPOLICYPAYMENTDISTRIBUTABLE":"1","ACCOUNTNUMBER":"000-000-000-00","PAYMENTMETHOD":12345.0000000000,"INVOICEDELIVERYTYPE":98765.0000000000,"DISTRIBUTIONLIMITTYPE":3.0000000000,"CLOSEDATE":null,"FIRSTTWICEPERMTHINVOICEDOM":1.0000000000,"HELDFORINVOICESENDING":"0","FEINDENORM":null,"COLLECTING":"0","ACCOUNTNUMBERDENORM":"000-000-000-00","CHARGEHELD":"0","PUBLICID":"bc:1234346"}

Here it is again, with whitespace added:

{"ACCOUNTNAMEDENORM": "John Smith",
 "DELINQUENCYSTATUS": 2.0000000000,
 "RETIRED": 0.0000000000,
 "INVOICEDAYOFWEEK": 5.0000000000,
 "ID": 1234567.0000000000,
 "BEANVERSION": 69.0000000000,
 "ACCOUNTTYPE": 1.0000000000,
 "ORGANIZATIONTYPEDENORM": null,
 "HIDDENTACCOUNTCONTAINERID": 4321987.0000000000,
 "NEWPOLICYPAYMENTDISTRIBUTABLE": "1",
 "ACCOUNTNUMBER": "000-000-000-00",
 "PAYMENTMETHOD": 12345.0000000000,
 "INVOICEDELIVERYTYPE": 98765.0000000000,
 "DISTRIBUTIONLIMITTYPE": 3.0000000000,
 "CLOSEDATE": null,
 "FIRSTTWICEPERMTHINVOICEDOM": 1.0000000000,
 "HELDFORINVOICESENDING": "0",
 "FEINDENORM": null,
 "COLLECTING": "0",
 "ACCOUNTNUMBERDENORM": "000-000-000-00",
 "CHARGEHELD": "0",
 "PUBLICID": "bc:1234346"}

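A possible variant of the manual approach above, sketched under a few assumptions: keys and ordinary strings go through json.dumps so that embedded quotes and backslashes are escaped, only values matching a plain decimal pattern (digits, a dot, digits) are written unquoted, and the per-row objects are joined into a single JSON array as the note above describes. The helper names encode_value, row_to_json and rows_to_json_array, and the small in-memory sample, are made up for this illustration.

```python
import csv
import io
import json
import re

# Assumption: a "float" cell is optional minus, digits, a dot, then digits.
DECIMAL_RE = re.compile(r'-?\d+\.\d+')


def encode_value(value):
    """Return the JSON text for one CSV cell (hypothetical helper)."""
    if value is None or value == '':
        return 'null'
    if DECIMAL_RE.fullmatch(value):
        return value              # written verbatim, trailing zeros kept
    return json.dumps(value)      # everything else is quoted and escaped


def row_to_json(row):
    """Format one DictReader row as a compact JSON object string."""
    pairs = (json.dumps(key) + ':' + encode_value(val)
             for key, val in row.items())
    return '{' + ','.join(pairs) + '}'


def rows_to_json_array(rows):
    """Join the per-row objects into one JSON array."""
    return '[' + ',\n'.join(row_to_json(row) for row in rows) + ']'


# In-memory stand-in for the CSV file, so the sketch runs on its own.
sample = io.StringIO('NAME,STATUS,FLAG,CLOSEDATE\n'
                     'John Smith,2.0000000000,1,\n')
array_text = rows_to_json_array(csv.DictReader(sample))
print(array_text)
```

Because strings such as "nan" or "1e5" do not match the pattern, they stay quoted in this sketch, which keeps the output parseable by json.loads.
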
Answer 1 (score: 0)

This may be overkill, but it is simple with pandas:

validate_username(...)

Answer 2 (score: 0)

One solution is to use a regular expression to check whether a string value looks like a float, and then convert it to a float.

import re

null = None
j = {"ACCOUNTNAMEDENORM":"John Smith","DELINQUENCYSTATUS":"2.0000000000",
     "RETIRED":"0.0000000000","INVOICEDAYOFWEEK":"5.0000000000",
     "ID":"1234567.0000000000","BEANVERSION":"69.0000000000",
     "ACCOUNTTYPE":"1.0000000000","ORGANIZATIONTYPEDENORM":null,
     "HIDDENTACCOUNTCONTAINERID":"4321987.0000000000",
     "NEWPOLICYPAYMENTDISTRIBUTABLE":"1","ACCOUNTNUMBER":"000-000-000-00",
     "PAYMENTMETHOD":"12345.0000000000","INVOICEDELIVERYTYPE":"98765.0000000000",
     "DISTRIBUTIONLIMITTYPE":"3.0000000000","CLOSEDATE":null,
     "FIRSTTWICEPERMTHINVOICEDOM":"1.0000000000","HELDFORINVOICESENDING":"0",
     "FEINDENORM":null,"COLLECTING":"0","ACCOUNTNUMBERDENORM":"000-000-000-00",
     "CHARGEHELD":"0","PUBLICID":"xx:1234346"}

for key in j:
    if j[key] is not None:
        if re.match(r"^\d+\.\d+$", j[key]):
            j[key] = float(j[key])

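I use null = None here to handle the "null" that appears in the JSON. You can replace "j" here with each CSV row you read, use this to update the row, and then write it back out with the strings replaced by floats.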
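
The suggestion above to replace "j" with each CSV row might look roughly like the following sketch. It assumes the rows come from csv.DictReader and that empty cells should become null; the function name convert_row and the in-memory sample are illustrative only. Note that once a value is a Python float, json.dumps writes it as 2.0, so the trailing zeros do not survive this route.

```python
import csv
import io
import json
import re

FLOAT_RE = re.compile(r'^\d+\.\d+$')   # same idea as the pattern above


def convert_row(row):
    """Return a copy of row with float-looking strings converted to float."""
    converted = {}
    for key, value in row.items():
        if value is None or value == '':
            converted[key] = None            # serialized as null
        elif FLOAT_RE.match(value):
            converted[key] = float(value)    # "2.0000000000" becomes 2.0
        else:
            converted[key] = value           # left as a string
    return converted


# In-memory stand-in for the CSV file, so the sketch runs on its own.
sample = io.StringIO('NAME,STATUS,FLAG,CLOSEDATE\n'
                     'John Smith,2.0000000000,1,\n')
lines = [json.dumps(convert_row(row), separators=(',', ':'))
         for row in csv.DictReader(sample)]
print(lines[0])
```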

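If any numeric string may be converted to a float, you could skip the regular expression (the re.match() call) and use j[key].isnumeric() instead (if available in your Python version). Be aware, though, that isnumeric() returns False for strings containing a decimal point, such as "2.0000000000", so it only matches integer-like strings such as "1".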
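
Before swapping in isnumeric(), it may help to check how it treats the values in question. The short sketch below compares it with a try/except around float(); the helper name is_float_string is an assumption made for illustration.

```python
def is_float_string(text):
    """Return True if float() accepts the value (hypothetical helper)."""
    try:
        float(text)
    except (TypeError, ValueError):
        return False
    return True


# Print each sample value with the result of both checks.
for text in ["2.0000000000", "1", "000-000-000-00", ""]:
    print(repr(text), text.isnumeric(), is_float_string(text))
```

Keep in mind that float() also accepts strings such as "nan", "inf" and "1e5", so the try/except check is broader than the regular expression.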

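Edit: I don't think float in Python handles "precision" the way you might expect. It may look as if 2.0000000000 is "truncated" to 2.0, but I think that is more a matter of formatting and display than of lost information. Consider the following examples: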

>>> float(2.0000000000)
2.0
>>> float(2.00000000001)
2.00000000001
>>> float(1.00) == float(1.000000000)
True
>>> float(3.141) == float(3.140999999)
False
>>> float(3.141) == float(3.1409999999999999)
True
>>> print('%.10f' % 3.14)
3.1400000000

You can make the JSON contain those zeros, but in that case the numbers can only be handled as strings, i.e. as formatted strings.
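
To illustrate the point about display versus stored value, here is a small sketch using the standard decimal module. It assumes the JSON text already contains unquoted numbers with trailing zeros. json.loads accepts a parse_float hook, and passing decimal.Decimal keeps the digits exactly as written, whereas the default float parsing does not.

```python
import json
from decimal import Decimal

text = '{"DELINQUENCYSTATUS":2.0000000000,"ID":1234567.0000000000}'

as_float = json.loads(text)                          # default: Python float
as_decimal = json.loads(text, parse_float=Decimal)   # keeps the written digits

print(as_float["DELINQUENCYSTATUS"])       # 2.0
print(as_decimal["DELINQUENCYSTATUS"])     # 2.0000000000
print(as_float["ID"] == as_decimal["ID"])  # True: same numeric value
```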

Answer 3 (score: 0)

Ha, that's funny. I was looking for the opposite of what you want, namely a result with the quotes.

It is actually easy to get rid of them automatically: just remove the parameter "separators=(',', ':')".

For me, just adding this parameter was enough.
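
As a quick check of what separators changes, the sketch below serializes the same data with and without it. In this example the parameter only affects the whitespace after ',' and ':', while whether a value is quoted follows its Python type (str versus float). The sample dictionary is made up for illustration.

```python
import json

row = {"AS_STRING": "2.0000000000", "AS_FLOAT": 2.0}

compact = json.dumps(row, separators=(',', ':'))
default = json.dumps(row)

print(compact)  # {"AS_STRING":"2.0000000000","AS_FLOAT":2.0}
print(default)  # {"AS_STRING": "2.0000000000", "AS_FLOAT": 2.0}
```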