Possible floating-point error when reconciling totals?

Asked: 2016-01-14 19:03:11

Tags: python floating-point python-2.6

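I have a preprocessing program that reads data from a CSV and reconciles it against two client-supplied fields (document count and check total). It parses the data, computes its own totals, and then compares the two sets of totals to reconcile them.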

First, here are my imports:

from csv import reader, writer, QUOTE_MINIMAL
import logging
from os import getcwd, mkdir, path
from sys import argv
from datetime import date
from types import IntType, FloatType 

Next, here is the actual reconciliation step:

def _recon_totals(self):
        """
        Reconcile the check total amount and document count and write out the file name,
        check numbers, vendor names, and timestamp to weekly report.
        """

        # Client totals
        client_doc_count = int(self.header_data[0][6])
        client_check_tot = float(self.header_data[0][7])
        # Double check variable typing for reconciliation totals.
        logging.info('Document count is: {0}'.format(client_doc_count))
        # doc_var_type = type(client_doc_count)
        # assert doc_var_type is IntType, 'Doc count is not an integer: {0}'.format(
        #    doc_var_type) 
        logging.info('Check Total is: {0}'.format(client_check_tot))
        # check_var_type = type(client_check_tot)
        # assert check_var_type is FloatType, 'Check tot is not a float: {0}'.format(
        #    check_var_type)

        # RRD totals
        rrd_doc_count = 0
        rrd_check_tot = 0.0

        with open(self.rpt_of, 'a') as rpt_outfile:
            for transact in self.transact_data:
                row_type = transact[0]
                logging.debug('Transaction type is: {0}'.format(row_type))

                if row_type == 'P':
                    # Reconciliation
                    rrd_doc_count += 1
                    trans_chk_amt = float(transact[12])
                    # trans_chk_type = type(trans_chk_amt)
                    # assert trans_chk_type is FloatType, 'Transaction Check Total is '\
                    #                                     'not a float: {0}'.format(
                    #                                         trans_chk_type)
                    rrd_check_tot += trans_chk_amt
                    # Reporting
                    vend_name = transact[2]
                    file_name = self.infile.split('/')[-1]
                    print('File name', file_name)
                    check_num = transact[9]
                    cur_time = date.today()
                    rpt_outfile.write('{0:<50}{1:<50}{2:<30}{3}\n'.format(file_name,
                                                                          vend_name,
                                                                          check_num,
                                                                          cur_time))
        # Reconcile totals and return the lists for writing if they are correct
        # if (client_doc_count, client_check_tot) == (rrd_doc_count, rrd_check_tot):
        #     logging.info('Recon totals match!')
        if client_doc_count == rrd_doc_count and client_check_tot == rrd_check_tot:
        #     logging.info('Recon totals match!')
            return True

        else:
            raise ValueError('Recon totals do not match! Client: {0} {1} {2} {3}\n'
                             'RRD {4} {5} {6} {7}'.format(client_doc_count,
                                                          client_check_tot,
                                                          type(client_doc_count),
                                                          type(client_check_tot),
                                                          rrd_doc_count,
                                                          rrd_check_tot,
                                                          type(rrd_doc_count),
                                                          type(rrd_check_tot)))

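I am running 6 files; 4 of them run fine (they pass reconciliation) and 2 fail. That would be normal if the client had given us bad data, except that I cannot find anything in the data that shows it is wrong. Even my stack trace shows that the client totals and my totals should reconcile: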

ValueError: Recon totals do not match! Client: 2 8739.54 <type 'int'> <type 'float'>
RRD 2 8739.54 <type 'int'> <type 'float'>

I tried two different ways of writing the statement that checks the two, and got the same result (as expected).

Finally, here is a sample of the offending data (redacted except for the relevant fields). This is the header record with the counts:

"H","XXX","XXX","XXX","XXX","XXX","2","8739.54","","","","","","","","","","","","","","","",""

And here are the rows I am reconciling:

"P","XXX","XXX","XXX","","XXX","XXX","XXX","XXX","XXX","XXX","XXX","846.80",...(more fields that aren't pertinent)
"P","XXX","XXX","XXX","","XXX","XXX","XXX","XXX","XXX","XXX","XXX","7892.74",...(more fields that aren't pertinent)

For each "P" record I increment my document count, then add the non-"XXX" field to the running total.

Any help with this would be greatly appreciated; I can't see any logic error on my part.

2 Answers:

Answer 0: (score: 2)

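I disagree with the answers that suggest a margin of error. It is unreliable (the margin changes with the number of floats you are summing) and doesn't really seem like a good solution. It reminds me of the movie Office Space, where they shave fractions of a cent off each transaction and move them to another bank account (your margin of error).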

However, I definitely agree with the suggestion to check, using subtraction, that this really is a floating-point error.
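A minimal sketch of that subtraction check, using the amounts from the question's sample records. The variable names here are illustrative and not part of the original code. In Python 2, `str()` and an empty format spec such as `'{0}'` round a float to 12 significant digits, which would explain why both totals print as 8739.54 in the traceback; `repr()` (or `{0!r}`) shows the digits the comparison actually sees.

```python
# Amounts taken from the question's header record and its two "P" rows.
client_total = float('8739.54')
computed_total = float('846.80') + float('7892.74')

# repr() prints enough digits to tell two different floats apart.
print('client:   {0!r}'.format(client_total))
print('computed: {0!r}'.format(computed_total))

# A floating-point artifact shows up as a difference far below one cent.
diff = abs(client_total - computed_total)
print('difference: {0!r}'.format(diff))
is_rounding_noise = diff < 0.005
print('rounding noise only: {0}'.format(is_rounding_noise))
```

If the printed difference is zero or a tiny value many orders of magnitude below a cent, the data is fine and any mismatch comes from the float comparison rather than from the client's figures.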

I would drop floats and use the decimal library. All you need to do is replace every float constructor with a Decimal constructor:

from decimal import Decimal


def _recon_totals(self):
    """
    Reconcile the check total amount and document count and write out the file name,
    check numbers, vendor names, and timestamp to weekly report.
    """

    # Client totals
    client_doc_count = int(self.header_data[0][6])
    client_check_tot = Decimal(self.header_data[0][7])
    # Double check variable typing for reconciliation totals.
    logging.info('Document count is: {0}'.format(client_doc_count))
    # doc_var_type = type(client_doc_count)
    # assert doc_var_type is IntType, 'Doc count is not an integer: {0}'.format(
    #    doc_var_type) 
    logging.info('Check Total is: {0}'.format(client_check_tot))

    # RRD totals
    rrd_doc_count = 0
    # Build from a string: Decimal(float) raises TypeError on Python 2.6.
    rrd_check_tot = Decimal('0.00')

    with open(self.rpt_of, 'a') as rpt_outfile:
        for transact in self.transact_data:
            row_type = transact[0]
            logging.debug('Transaction type is: {0}'.format(row_type))

            if row_type == 'P':
                # Reconciliation
                rrd_doc_count += 1
                trans_chk_amt = Decimal(transact[12])
                rrd_check_tot += trans_chk_amt
                # Reporting
                vend_name = transact[2]
                file_name = self.infile.split('/')[-1]
                print('File name', file_name)
                check_num = transact[9]
                cur_time = date.today()
                rpt_outfile.write('{0:<50}{1:<50}{2:<30}{3}\n'.format(file_name,
                                                                      vend_name,
                                                                      check_num,
                                                                      cur_time))
    # Reconcile totals and return the lists for writing if they are correct
    # if (client_doc_count, client_check_tot) == (rrd_doc_count, rrd_check_tot):
    #     logging.info('Recon totals match!')
    if client_doc_count == rrd_doc_count and client_check_tot == rrd_check_tot:
    #     logging.info('Recon totals match!')
        return True

    else:
        raise ValueError('Recon totals do not match! Client: {0} {1} {2} {3}\n'
                         'RRD {4} {5} {6} {7}'.format(client_doc_count,
                                                      client_check_tot,
                                                      type(client_doc_count),
                                                      type(client_check_tot),
                                                      rrd_doc_count,
                                                      rrd_check_tot,
                                                      type(rrd_doc_count),
                                                      type(rrd_check_tot)))

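Decimal works by storing numbers in base 10 rather than in base 2 as floats do. Many decimal fractions, such as 0.1, have no exact base-2 representation, which is where floating-point inaccuracy comes from. Since money is generally handled in base 10, it makes sense to manipulate it in a base-10 representation rather than effectively converting to base 2 and back to base 10.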
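As a small illustration of the difference, here is a sketch that sums the question's two sample amounts as Decimals and compares them to the header total. The names `header_total` and `row_amounts` are made up for this example. The values are built from strings, as they would arrive from the CSV reader, so no binary rounding happens on the way in.

```python
from decimal import Decimal

# Values as they appear in the question's sample records (strings from the CSV).
header_total = Decimal('8739.54')
row_amounts = [Decimal('846.80'), Decimal('7892.74')]

running_total = Decimal('0.00')
for amount in row_amounts:
    running_total += amount

print(running_total)                  # 8739.54
print(running_total == header_total)  # True

# The classic binary counterexample: this float comparison is False...
print(0.1 + 0.2 == 0.3)
# ...while the same sum in base 10 is exact.
print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))
```

Passing a float to `Decimal` would carry the binary rounding error into the Decimal value, so the string form is the safer input here.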

Answer 1: (score: 0)

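I would not rely on floating-point equality checks for real-world data, because floating-point math is imprecise in all sorts of strange ways. I suggest first making sure the discrepancy really is caused by floating-point imprecision: print the difference between the two values you are comparing and confirm that it is very, very small relative to the numbers you are working with. Then I suggest defining a margin of error within which the two totals are considered to match; for real-world money, half a cent seems like a natural value for that tolerance.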
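A rough sketch of such a tolerance check follows. `totals_match` is a hypothetical helper, not something from the question's code, and the default of 0.005 is simply the half-cent margin suggested above.

```python
def totals_match(expected, actual, tolerance=0.005):
    """Return True when two money totals differ by less than `tolerance`.

    Hypothetical helper; 0.005 is half a cent when amounts are in dollars.
    """
    return abs(expected - actual) < tolerance


# The question's sample amounts agree to well within half a cent.
print(totals_match(8739.54, 846.80 + 7892.74))  # True
# A genuine one-cent discrepancy is still reported as a mismatch.
print(totals_match(8739.54, 8739.55))           # False
```

As the other answer points out, accumulated rounding error grows with the number of amounts summed, so a fixed tolerance is a trade-off rather than a guarantee.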