Why does it print the wrong result?

Posted: 2016-03-24 02:12:15

Tags: python hadoop

I'm new to Python. Today I wrote a program to get the maximum sale for each store from a dataset, but it doesn't give me the correct answer. The code is:

#!/usr/bin/python

import sys

maxsale = 0
oldKey = None
# Loop around the data
# It will be in the format key\tval
# Where key is the store name, val is the sale amount
#
# All the sales for a particular store will be presented,
# then the key will change and we'll be dealing with the next store

for line in sys.stdin:
    data_mapped = line.strip().split("\t")
    if len(data_mapped) != 2:
        # Something has gone wrong. Skip this line.
        continue

    thisKey, thisSale = data_mapped

    if oldKey and oldKey != thisKey:
        print oldKey, "\t", maxsale
        oldKey = thisKey;
        oldsale = 0

    oldKey = thisKey
    if maxsale < thisSale:
        maxsale = thisSale
if oldKey != None:
    print oldKey, "\t", maxsale

The dataset is:

Anchorage       298.86
Anchorage       6.38
Aurora  34.1
Aurora  10.6
Aurora  55.7
Austin  327.75
Austin  379.6
Austin  469.63
Austin  11.6

The result is:

Anchorage   6.38
Aurora  34.1
Austin  469.63

Can anyone help me with this? Thanks in advance!

2 answers:

Answer 0 (score: 1)

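First, you aren't converting the input to numbers, so the sales are compared as strings, character by character. That means any "number" starting with '6' is greater than any "number" starting with '2', even for values like '6.38' and '298.86'.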

thisKey, thisSale = data_mapped
thisSale = float(thisSale)
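A quick check of why the conversion matters, using two values from the question's dataset. This is a minimal illustration, not part of the original answer; it runs under Python 2 or 3.

```python
# Two sales from the question's dataset, still as strings
a, b = "6.38", "298.86"

# String comparison is lexicographic: '6' > '2' decides it at the first character
print(a > b)                # True

# After float() the comparison is numeric
print(float(a) > float(b))  # False

# The same effect with max()
sales = ["298.86", "6.38"]
print(max(sales))                    # 6.38
print(max(float(s) for s in sales))  # 298.86
```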

Next, you set oldsale to 0 but never use it. I think you meant to write maxsale = 0 there, to reset the maximum for the new store.

Finally, you don't need oldKey = thisKey; inside the if block, because you do exactly that immediately afterward.
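Putting the three fixes together, here is a minimal sketch of the corrected reducer, kept close to the question's code and runnable under Python 2 or 3. It assumes, as the question's comments say, that all lines for one store arrive consecutively, and that sales are non-negative so 0.0 is a safe starting maximum. The function name max_per_store and the sample list are illustrative; in the real streaming job you would pass sys.stdin instead of sample.

```python
def max_per_store(lines):
    """Yield (store, max_sale) from 'store<TAB>sale' lines grouped by store."""
    maxsale = 0.0
    oldKey = None
    for line in lines:
        data_mapped = line.strip().split("\t")
        if len(data_mapped) != 2:
            # Something has gone wrong. Skip this line.
            continue
        thisKey, thisSale = data_mapped
        thisSale = float(thisSale)  # fix 1: compare numbers, not strings

        if oldKey is not None and oldKey != thisKey:
            yield oldKey, maxsale
            maxsale = 0.0  # fix 2: reset for the new store (was oldsale = 0)

        oldKey = thisKey  # fix 3: set once, outside the if block
        if maxsale < thisSale:
            maxsale = thisSale

    if oldKey is not None:
        yield oldKey, maxsale


# The question's dataset, tab-separated
sample = [
    "Anchorage\t298.86",
    "Anchorage\t6.38",
    "Aurora\t34.1",
    "Aurora\t10.6",
    "Aurora\t55.7",
    "Austin\t327.75",
    "Austin\t379.6",
    "Austin\t469.63",
    "Austin\t11.6",
]

for store, sale in max_per_store(sample):
    print("%s\t%s" % (store, sale))
```

With the sample data this prints 298.86 for Anchorage, 55.7 for Aurora and 469.63 for Austin.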

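Note that currency calculations work best when you convert values to the smallest unit of the currency and work with integers, because floating-point arithmetic isn't always exact and you may get rounding errors. Your data doesn't seem to be guaranteed to have trailing zeros, so you'll have to check the string for a decimal point and, if there is one, split on it, and so on.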

thisKey, thisSale = data_mapped
if '.' not in thisSale:
    thisSale = int(thisSale)*100
else:
    dollars, cents = thisSale.split('.')
    if len(cents) < 2:
        cents += '0'
    thisSale = int(dollars)*100 + int(cents)
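The snippet above can be wrapped in a helper and tried on the question's data. A minimal sketch: to_cents is a hypothetical name, and it assumes non-negative amounts with at most two digits after the decimal point (str.ljust pads '34.1' to 3410 cents, the same as the cents += '0' step above). The last line shows the kind of floating-point surprise that integer cents avoid.

```python
def to_cents(amount):
    """Convert a non-negative decimal string like '298.86' to integer cents.

    Assumes at most two digits after the decimal point.
    """
    if '.' not in amount:
        return int(amount) * 100
    dollars, cents = amount.split('.')
    cents = cents.ljust(2, '0')  # '1' -> '10', '' -> '00'
    return int(dollars) * 100 + int(cents)


print(to_cents("298.86"))  # 29886
print(to_cents("34.1"))    # 3410
print(to_cents("55"))      # 5500

# Austin's sales from the question, compared as integer cents
austin = ["327.75", "379.6", "469.63", "11.6"]
print(max(to_cents(s) for s in austin))  # 46963

# The rounding issue integers sidestep
print(0.1 + 0.2 == 0.3)  # False
```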

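Do the financial calculations on integers representing cents, and then, where necessary, format the values as dollars and cents for display: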

>>> '%.2f' % (29886/100.)
'298.86'
>>> '{:.02f}'.format(29886/100.)
'298.86'
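If you would rather not go back through a float just for display, integer division works too. A minimal sketch, assuming non-negative amounts (divmod on a negative value would not give the expected digits); format_cents is a hypothetical helper name.

```python
def format_cents(cents):
    """Format non-negative integer cents as 'dollars.cents' without floats."""
    dollars, remainder = divmod(cents, 100)
    return '%d.%02d' % (dollars, remainder)


print(format_cents(29886))  # 298.86
print(format_cents(3410))   # 34.10
print(format_cents(5))      # 0.05
```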

Answer 1 (score: 0)

#!/usr/bin/python

import sys

maxsale = 0
oldKey = None
# Loop around the data
# It will be in the format key\tval
# Where key is the store name, val is the sale amount
#
# All the sales for a particular store will be presented,
# then the key will change and we'll be dealing with the next store
d = dict() 
for line in sys.stdin:
    data_mapped = line.strip().split("\t")
    if len(data_mapped) != 2:
        # Something has gone wrong. Skip this line.
        continue

    key,value = data_mapped
    if (key in d) and d[key] < float(value):
        d[key] = float(value)
    elif key not in d:
        d[key] = float(value)

for k,v in d.items():
    print k,'\t',v
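Because this version keeps one dictionary entry per store, it does not depend on the input being grouped by key, at the cost of holding every store in memory; in a Hadoop streaming reducer the input is already sorted by key, so the single running maximum of the first answer also works. In Python 2, d.items() returns entries in arbitrary order, so sort the keys if you need stable output. A minimal sketch of the same idea using dict.get and max (max_per_store_dict is a hypothetical name; runs under Python 2 or 3):

```python
def max_per_store_dict(lines):
    """Return {store: max_sale}; the input order does not matter."""
    best = {}
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) != 2:
            continue  # skip malformed lines
        key, value = fields[0], float(fields[1])
        # dict.get returns the current maximum, or this value if the key is new
        best[key] = max(best.get(key, value), value)
    return best


# Deliberately unsorted input
sample = ["Austin\t11.6", "Aurora\t34.1", "Austin\t469.63", "Aurora\t55.7"]
result = max_per_store_dict(sample)
for k in sorted(result):
    print("%s\t%s" % (k, result[k]))
```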