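Converting millions, billions and trillions to numbers in Python

Time: 2021-04-21 04:07:30

Tags: python regex formatting

I have a column containing values such as "5.00 M", "1.00 T" and "1.29 Juta", and I would like a simple way to convert them to numeric values. I tried: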

import re
powers = {'M': 10 ** 9, 'T': 10 ** 12, 'Juta': 10 ** 6}
var1 = ['4', '7149', '6184.09', '0.00', '8', '134944', '5187.33', '5.00 M', '17', '74104', '60773.22', '260.00 M', '7', '347334', '451922.68', '1.00 T', '80', '18469', '483386.83', '2.50 M', '12', '4716', '14946.30', '0.00', '18', '7119', '111617.66', '0.00', '31', '23131', '814413.09', '0.00', '21', '16281', '192020.50', '0.00', '20', '98381', '57850.37', '0.00', '31', '12501', '39384.40', '0.00', '31', '2851', '1.29 Juta', '0.00', '34', '9440', '171364.82', '0.00', '26', '25442', '54394.00', '0.00', '24', '2492', '165295.95', '0.00', '12', '675', '51301.40', '0.00', '7', '5', '8057.77', '0.00', '6', '704', '35579.19', '0.00', '5', '2133', '15683.20', '0.00', '3', '1356', '5021.00', '0.00', '3', '966', '5456.32', '0.00', '5', '2636', '4097.42', '0.00', '8', '1878', '4554.50', '0.00', '6', '3518', '13900.00', '0.00', '2', '1', '61000.00', '0.00', '3', '0', '1688.00', '0.00', '4', '10', '1488.33', '0.00', '0', '0', '0.00', '0.00', '0', '0', '0.00', '0.00', '2', '0', '4054.00', '0.00', '0', '0', '0.00', '0.00']

def f(num_str):
    match = re.search(r"([0-9\.]+)\s?(M|T|Juta)", num_str)
    if match is not None:
        quantity = match.group(0)
        magnitude = match.group(1)
        return float(quantity) * powers[magnitude]

for i in var1:
    x = f(i)
    print(x)

But I get this error:

None
None
None
None
None
None
None
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-8dd2f89076c3> in <module>
      1 for i in var1:
----> 2     x = f(i)
      3     print(x)

<ipython-input-22-cb419bc71fb8> in f(num_str)
      7         quantity = match.group(0)
      8         magnitude = match.group(1)
----> 9         return float(quantity) * powers[magnitude]

ValueError: could not convert string to float: '5.00 M'

2 answers:

Answer 0 (score: 4)

Just use group(1) and group(2), because group(0) is the entire matching string, suffix included, which float() cannot convert.

import re
powers = {'M': 10 ** 9, 'T': 10 ** 12, 'Juta': 10 ** 6}
var1 = ['4', '7149', '6184.09', '0.00', '8', '134944', '5187.33', '5.00 M', '17', '74104', '60773.22', '260.00 M', '7', '347334', '451922.68', '1.00 T', '80', '18469', '483386.83', '2.50 M', '12', '4716', '14946.30', '0.00', '18', '7119', '111617.66', '0.00', '31', '23131', '814413.09', '0.00', '21', '16281', '192020.50', '0.00', '20', '98381', '57850.37', '0.00', '31', '12501', '39384.40', '0.00', '31', '2851', '1.29 Juta', '0.00', '34', '9440', '171364.82', '0.00', '26', '25442', '54394.00', '0.00', '24', '2492', '165295.95', '0.00', '12', '675', '51301.40', '0.00', '7', '5', '8057.77', '0.00', '6', '704', '35579.19', '0.00', '5', '2133', '15683.20', '0.00', '3', '1356', '5021.00', '0.00', '3', '966', '5456.32', '0.00', '5', '2636', '4097.42', '0.00', '8', '1878', '4554.50', '0.00', '6', '3518', '13900.00', '0.00', '2', '1', '61000.00', '0.00', '3', '0', '1688.00', '0.00', '4', '10', '1488.33', '0.00', '0', '0', '0.00', '0.00', '0', '0', '0.00', '0.00', '2', '0', '4054.00', '0.00', '0', '0', '0.00', '0.00']

def f(num_str):
    match = re.search(r"([0-9\.]+)\s?(M|T|Juta)", num_str)
    if match is not None:
        quantity = match.group(1)
        magnitude = match.group(2)
        return float(quantity) * powers[magnitude]
    else:
        return num_str

for i in var1:
    x = f(i)
    print(x)
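
To make the group numbering concrete, here is a small sketch that reuses the regex from the question on one sample value. The variable names whole, quantity and magnitude are chosen only for illustration.

```python
import re

# Same pattern as in the question, applied to a single sample value.
m = re.search(r"([0-9\.]+)\s?(M|T|Juta)", "5.00 M")

whole = m.group(0)      # the entire matched text, suffix included
quantity = m.group(1)   # first parenthesised group: the number
magnitude = m.group(2)  # second parenthesised group: the suffix

print(repr(whole), repr(quantity), repr(magnitude))
# '5.00 M' '5.00' 'M'
```

Passing whole to float() reproduces the ValueError from the question, while quantity converts cleanly.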

Answer 1 (score: 0)

Besides using the wrong group numbers, your regex has a few other problems. You can fix it as follows:

import re  # `powers` is the dict defined in the question

def f(num_str):
    # Number part: digits with an optional decimal part; the dot is escaped
    # so that it matches only a literal ".".
    # The magnitude group is optional, so plain numbers such as '7149' also match.
    match = re.search(r"(\d+(?:\.\d+)?)\s?(M|T|Juta)?", num_str)
    if match is not None:
        quantity = match.group(1)
        if match.group(2):                # a magnitude suffix was found
            magnitude = match.group(2)
            return float(quantity) * powers[magnitude]
        else:                             # no suffix: return the plain number
            return float(quantity)
        
for i in var1:
    x = f(i)
    print(x)

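In fact, the pattern [0-9\.]+ that you use for the numeric part is too permissive: it also matches strings such as "1.2.3" or "...", which float() cannot convert. It is better to use \d+(?:\.\d+)? instead: \d+ matches the integer part and (?:\.\d+)? matches an optional decimal part, wrapped in (?: ) to make it a non-capturing group. The dot must be escaped as \. so that it matches only a literal decimal point.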
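
Putting both answers together, one possible consolidated version might look like the sketch below. The names POWERS, NUMBER_RE and to_number are hypothetical, and two choices are assumptions that neither answer makes: using fullmatch so that stray text is rejected rather than partially matched, and allowing any amount of surrounding whitespace.

```python
import re

# Suffix multipliers, as given in the question.
POWERS = {"M": 10 ** 9, "T": 10 ** 12, "Juta": 10 ** 6}

# Integer part, optional decimal part (escaped dot), optional suffix.
# Assumption: surrounding and separating whitespace of any length is allowed.
NUMBER_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(M|T|Juta)?\s*")

def to_number(num_str):
    """Return num_str as a float, scaled by its suffix if any; None if unparseable."""
    match = NUMBER_RE.fullmatch(num_str)
    if match is None:
        return None
    quantity, magnitude = match.groups()
    # magnitude is None when there is no suffix, so the multiplier defaults to 1.
    return float(quantity) * POWERS.get(magnitude, 1)

print(to_number("5.00 M"))  # 5000000000.0
print(to_number("7149"))    # 7149.0
print(to_number("1.2.3"))   # None
```

Unlike the first answer, this always returns a float (or None), so the results can be summed or compared without a further conversion step.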
