How do I replace a word only when it follows a number in a DataFrame?

Time: 2019-05-12 19:03:33

Tags: python pandas dataframe

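I am trying to search a DataFrame for certain words listed in a dictionary's values and, where one is found, replace it with the key of that value.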

units_dic= {'grams':['g','Grams'],
                'kg'   :['kilogram','kilograms']}

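The problem is that some unit abbreviations are single letters, so the loop also replaces every other occurrence of that letter. I only want to replace the letter when it comes after a number, which ensures that it is a unit.

DataFrame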

    Id | test 
    ---------
    1  |'A small paperclip has a mass of about 111 g'
    2  |'1 kilogram =1000 g'
    3  |'g is the 7th letter in the ISO basic Latin alphabet'

Replacement loop

  x = df.copy()
  for k in units_dic:
      for i in range(len(x['test'])):
          for w in units_dic[k]:
              x['test'][i] = str(x['test'][i]).replace(str(w), str(k))

Output

    Id | test 
    ---------
    1  |'A small paperclip has a mass of about 111 grams'
    2  |'1 kg =1000 grams'
    3  |'grams is the 7th letter in the ISO basic Latin alphabet'
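To make the cause concrete, here is a minimal sketch of the same nested loop applied to single strings with plain `str.replace` (the helper name `naive_replace` is made up for illustration). `str.replace` substitutes every occurrence of the substring and has no notion of word boundaries or preceding digits, so a lone `g` and a `g` inside another word are both rewritten:

```python
units_dic = {'grams': ['g', 'Grams'], 'kg': ['kilogram', 'kilograms']}

def naive_replace(text):
    # Same nested loop as in the question, applied to a single string
    for unit, aliases in units_dic.items():
        for alias in aliases:
            # Plain substring replacement: no context is checked
            text = text.replace(alias, unit)
    return text

# A standalone letter that is not a unit still gets replaced
sentence = naive_replace('g is the 7th letter in the ISO basic Latin alphabet')

# A "g" inside a longer word is replaced as well, before "kilogram" is handled
inside_word = naive_replace('kilogram')
```

This is why the answers switch to regular expressions, which can require a number in front of the unit.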

3 Answers:

Answer 0 (score: 1)

Try:

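for key, val in units_dic.items():
    # Group the alternatives and capture the number so that it is kept in the result
    df['test'] = df['test'].replace(r"(\d+[ ]*)(?:" + "|".join(val) + r")\b", r"\g<1>" + key, regex=True)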
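One thing to watch when a pattern is built by joining alternatives: `|` has the lowest precedence in a regex, so the alternatives need to be grouped, and the digits need to be captured if they are to survive the replacement. The following is a minimal sketch with the standard `re` module on a single string rather than a DataFrame column; `replace_units` is a hypothetical helper name, and `re.escape` and the trailing `\b` are additions for safety rather than part of the answer.

```python
import re

units_dic = {'grams': ['g', 'Grams'], 'kg': ['kilogram', 'kilograms']}
text = '1 kilogram =1000 g'

# Ungrouped: "\d+[ ]*g|Grams" parses as (\d+[ ]*g) OR (Grams),
# and the matched digits are replaced together with the unit.
ungrouped = re.sub(r"\d+[ ]*" + "|".join(units_dic['grams']), 'grams', text)

def replace_units(s):
    # Grouped: capture the number, group the alternatives, end on a word boundary
    for key, val in units_dic.items():
        pattern = r"(\d+[ ]*)(?:" + "|".join(map(re.escape, val)) + r")\b"
        s = re.sub(pattern, lambda m: m.group(1) + key, s)
    return s

grouped = replace_units(text)
```

Here `ungrouped` loses the number (`'1 kilogram =grams'`), while `grouped` keeps it (`'1 kg =1000 grams'`).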

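Answer 1 (score: 1)

You can use a regular expression together with an inverted dictionary, so that each alias maps back to its unit.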

import re

d = {i: k for k, v in units_dic.items() for i in v}
u = r'|'.join(d)
v = fr'(\d+\s?)\b({u})\b'

df.assign(test=[re.sub(v, lambda x: x.group(1) + d[x.group(2)], el) for el in df.test])

   Id                                               test
0   1    A small paperclip has a mass of about 111 grams
1   2                                   1 kg =1000 grams
2   3  g is the 7th letter in the ISO basic Latin alp...
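For readers who want to try the inverted-dictionary idea without pandas, here is a minimal self-contained sketch on a plain list of strings. The names `alias_to_unit` and `normalize_units` are made up for illustration, and escaping the aliases and sorting them longest-first are precautions added here, not part of the answer.

```python
import re

units_dic = {'grams': ['g', 'Grams'], 'kg': ['kilogram', 'kilograms']}

# Invert the mapping: each alias points back to its canonical unit
alias_to_unit = {alias: unit for unit, aliases in units_dic.items() for alias in aliases}

# One alternation over all aliases, escaped and ordered longest-first
alternation = '|'.join(re.escape(a) for a in sorted(alias_to_unit, key=len, reverse=True))
pattern = re.compile(rf'(\d+\s?)\b({alternation})\b')

def normalize_units(text):
    # Keep the captured number (group 1) and look up the unit for the alias (group 2)
    return pattern.sub(lambda m: m.group(1) + alias_to_unit[m.group(2)], text)

rows = [
    'A small paperclip has a mass of about 111 g',
    '1 kilogram =1000 g',
    'g is the 7th letter in the ISO basic Latin alphabet',
]
result = [normalize_units(r) for r in rows]
```

Because all aliases are handled in a single pass, a replacement made for one unit cannot be matched again by a later unit. Note that the `\b` between the number and the alias means a unit written directly against the number, such as `111g`, is left unchanged by this pattern.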

Answer 2 (score: 0)

We can use the lookbehind feature of regex here, which lets us require that the match be preceded by a digit, followed by optional whitespace:

for k, v in units_dic.items():
    df['test'] = df['test'].str.replace(fr"(?<=[0-9])\s*({'|'.join(v)})\b", f' {k}', regex=True)

print(df)

   Id                                               test
0   1  'A small paperclip has a mass of about 111 grams'
1   2                                 '1 kg =1000 grams'
2   3  'g is the 7th letter in the ISO basic Latin al...

Explanation
First, we use a raw f-string: fr'sometext'

The regex:

  • ?<=[0-9] = preceded by a digit
  • \s* is optional whitespace
  • {'|'.join(v)} gives us the values from your dictionary separated by |, which is the "or" operator in regex
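The pieces listed above can be tried out without pandas. This is a minimal sketch using the standard `re` module on single strings; `replace_after_digit` is a hypothetical helper name, and the `re.escape` call is an added precaution.

```python
import re

units_dic = {'grams': ['g', 'Grams'], 'kg': ['kilogram', 'kilograms']}

def replace_after_digit(text):
    for unit, aliases in units_dic.items():
        alternation = '|'.join(map(re.escape, aliases))
        # (?<=[0-9]) asserts that a digit sits just before the match without consuming it,
        # so only the whitespace and the alias are replaced and the number stays in place
        pattern = rf"(?<=[0-9])\s*(?:{alternation})\b"
        text = re.sub(pattern, f' {unit}', text)
    return text

with_space = replace_after_digit('1 kilogram =1000 g')
no_space = replace_after_digit('111g')
untouched = replace_after_digit('g is the 7th letter in the ISO basic Latin alphabet')
```

Since the whitespace is part of the match and the replacement always starts with one space, the spacing between number and unit is normalized: `111g` becomes `111 grams`.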