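Update a column with an integer instead of a tuple

Time: 2019-06-18 16:08:10

Tags: python python-3.x pandas

I have a data frame with 3 columns, and I only want to work on its second column, where each cell holds a list of tuples. From every tuple in that list I want only the last element.

I wrote the following script, which uses fuzzywuzzy for text matching.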

import pandas as pd
from fuzzywuzzy import process, fuzz

pd.set_option('display.width', 1000)
pd.set_option('display.max_columns', 10)

data = pd.read_csv(r"address_details.csv", skiprows=0)
# Renamed from `id` so the built-in id() is not shadowed
cust_ids = data['COD_CUST_ID'].values.tolist()
addresses = data['ADDRESS'].values.tolist()

dict_list = []

# Note: the nested loops pair every customer id with every address
for cust_id in cust_ids:
    for search in addresses:
        # extractBests returns a list of (matched_string, score) tuples
        score = process.extractBests(search, addresses, limit=len(addresses), score_cutoff=40)

        dict_list.append({
            "Cust_Id": cust_id,
            "Match Ratio": score,
            "Search String": search,
        })

df = pd.DataFrame(dict_list)
print(df)

# to_csv returns None when given a path, so the result is not assigned
df.to_csv("match_score.csv", sep=',', index=None)

Original CSV data

Cust_Id Match Ratio Search String
21527575    [('H.NO.407,ROOM NO.310. 3RD FLOOR MAQBOOL APARTMENT APARTMENT OPP, RABIYA MASJID MANGAL BAZAR SLAP KOT THANE MAHARASHTRA 421302', 100)]    H.NO.407,ROOM NO.310. 3RD FLOOR MAQBOOL APARTMENT APARTMENT OPP, RABIYA MASJID MANGAL BAZAR SLAP KOT THANE MAHARASHTRA 421302
21527575    [('H.NO.407, ROOM NO.310, 3RD FLOOR MAQBOOL APARTMENT OPP,RABIYA MASJID MANGAL BAZAR SLAP KOTER GATE THANE MAHARASHTRA 421302', 100)]   H.NO.407, ROOM NO.310, 3RD FLOOR MAQBOOL APARTMENT OPP,RABIYA MASJID MANGAL BAZAR SLAP KOTER GATE THANE MAHARASHTRA 421302
21527575    [('FLAT NO.103, 1ST FLOOR B-WING,CTS NO.388,KAAP TALAVO  ZAITOON PURA BEHIND KOTER GATE MASJID BHIWANDI THANE MAHARASHTRA 421302', 100)]    FLAT NO.103, 1ST FLOOR B-WING,CTS NO.388,KAAP TALAVO  ZAITOON PURA BEHIND KOTER GATE MASJID BHIWANDI THANE MAHARASHTRA 421302
21527575    [('VPO. SAHWA   CHURU RAJASTHAN 331302', 100)]  VPO. SAHWA   CHURU RAJASTHAN 331302
21527575    [('WARD NO.-3 NATT ROAD TALWANDI SABO BATHINDA  BATHINDA PUNJAB 151302', 100)]  WARD NO.-3 NATT ROAD TALWANDI SABO BATHINDA  BATHINDA PUNJAB 151302
21527575    [('H.NO.-137 RAMA ROAD TALWANDI SABO BATHINDA  BATHINDA PUNJAB 151302', 100)]   H.NO.-137 RAMA ROAD TALWANDI SABO BATHINDA  BATHINDA PUNJAB 151302
21527575    [('WARD NO 25 GHADSISAR ROAD BASANT KUNJ KE SAMNE HANUMAN MANDIR KE PASS CHOUDHARY COLONY GANGASHAR BIKANER RAJASTHAN 334001', 100)]    WARD NO 25 GHADSISAR ROAD BASANT KUNJ KE SAMNE HANUMAN MANDIR KE PASS CHOUDHARY COLONY GANGASHAR BIKANER RAJASTHAN 334001
21527575    [('Karchha Kalan   UDAIPUR RAJASTHAN 313803', 100)] Karchha Kalan   UDAIPUR RAJASTHAN 313803
21527575    [('VAGPUR KARCHCHA KALAN   UDAIPUR RAJASTHAN 313803', 100)] VAGPUR KARCHCHA KALAN   UDAIPUR RAJASTHAN 313803
21527575    [('VILLAGE GORIYAN TEHSIL UDAIPURWATI DIST JHUNJHUNU  JHUJHUNU RAJASTHAN 333307', 100)] VILLAGE GORIYAN TEHSIL UDAIPURWATI DIST JHUNJHUNU  JHUJHUNU RAJASTHAN 333307

Desired output:

Cust_Id Match Ratio Search String
21527575    100 H.NO.407,ROOM NO.310. 3RD FLOOR MAQBOOL APARTMENT APARTMENT OPP, RABIYA MASJID MANGAL BAZAR SLAP KOT THANE MAHARASHTRA 421302
21527575    100 H.NO.407, ROOM NO.310, 3RD FLOOR MAQBOOL APARTMENT OPP,RABIYA MASJID MANGAL BAZAR SLAP KOTER GATE THANE MAHARASHTRA 421302
21527575    100 FLAT NO.103, 1ST FLOOR B-WING,CTS NO.388,KAAP TALAVO  ZAITOON PURA BEHIND KOTER GATE MASJID BHIWANDI THANE MAHARASHTRA 421302
21527575    100 VPO. SAHWA   CHURU RAJASTHAN 331302
21527575    100 WARD NO.-3 NATT ROAD TALWANDI SABO BATHINDA  BATHINDA PUNJAB 151302
21527575    100 H.NO.-137 RAMA ROAD TALWANDI SABO BATHINDA  BATHINDA PUNJAB 151302
21527575    100 WARD NO 25 GHADSISAR ROAD BASANT KUNJ KE SAMNE HANUMAN MANDIR KE PASS CHOUDHARY COLONY GANGASHAR BIKANER RAJASTHAN 334001
21527575    100 Karchha Kalan   UDAIPUR RAJASTHAN 313803
21527575    100 VAGPUR KARCHCHA KALAN   UDAIPUR RAJASTHAN 313803
21527575    100 VILLAGE GORIYAN TEHSIL UDAIPURWATI DIST JHUNJHUNU  JHUJHUNU RAJASTHAN 333307

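2 answers:

Answer 0 (score: 2)

The column names are unclear, so I am writing generic code.

Here I update column B with the second element of the first tuple in each list (x[0] is the first tuple, x[0][1] is its second element).

Hope this helps :)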

 df['B'] = df['B'].apply(lambda x: x[0][1])

Sample program:

import pandas as pd
Cars = {'A': [21527575],
        'B': [[('H.NO.407,ROOM NO.310. 3RD FLOOR MAQBOOL APARTMENT APARTMENT OPP, RABIYA MASJID MANGAL BAZAR SLAP KOT THANE MAHARASHTRA 421302', 100)]],
        'C' : [' H.NO.407,ROOM NO.310. 3RD FLOOR MAQBOOL APARTMENT APARTMENT OPP, RABIYA MASJID MANGAL BAZAR SLAP KOT THANE MAHARASHTRA 421302']
        }
data = pd.DataFrame(Cars)

data['B'] = data['B'].apply(lambda x: x[0][1])
print(data)

Output:

     A    B                                                  C
0  21527575  100   H.NO.407,ROOM NO.310. 3RD FLOOR MAQBOOL APARTMENT APARTMENT OPP, RABIYA MASJID MANGAL BAZAR SLAP KOT THANE MAHARASHTRA 421302
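
One caveat, stated as an assumption because the question does not confirm it: if the frame is loaded back from match_score.csv, pandas reads the Match Ratio cells as text such as "[('...', 100)]" rather than as real lists, and x[0][1] would then return a single character of that text. A minimal sketch that parses the cell with ast.literal_eval from the standard library before indexing; the sample row is shortened and the helper name first_score is made up for illustration:

```python
import ast

import pandas as pd

# Hypothetical frame mimicking what pd.read_csv would return:
# the list of tuples is stored as a string, not as a list.
df = pd.DataFrame({
    "Cust_Id": [21527575],
    "Match Ratio": ["[('VPO. SAHWA   CHURU RAJASTHAN 331302', 100)]"],
    "Search String": ["VPO. SAHWA   CHURU RAJASTHAN 331302"],
})


def first_score(cell):
    """Return the score of the first (match, score) tuple in the cell."""
    # Parse the text form back into a real list only when needed.
    matches = ast.literal_eval(cell) if isinstance(cell, str) else cell
    return matches[0][1]


df["Match Ratio"] = df["Match Ratio"].apply(first_score)
print(df["Match Ratio"].tolist())
```

The isinstance check lets the same helper work on a frame that still holds real lists, so it can replace the lambda in either situation.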

Answer 1 (score: 1)

Your data frame is a bit unclear.

See whether this solves your problem.

import pandas as pd

# Sample data frame
data = pd.DataFrame({'a': [1, 2, 3], 'b': [[(1, 2)], [(2, 3)], [(3, 4)]]})
print(data)

# Data
#    a         b
# 0  1  [(1, 2)]
# 1  2  [(2, 3)]
# 2  3  [(3, 4)]

# Fix
# [-1] selects last element in tuple
data['b'] = data['b'].apply(lambda x: x[0][-1])
print(data)

# Result
#    a  b
# 0  1  2
# 1  2  3
# 2  3  4
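
Both answers read only the first tuple of each list. Since extractBests can return several matches above the score cutoff (and, in principle, an empty list), a hedged variation is to guard against empty lists and, when every score is wanted, to collect the last element of each tuple. This is a sketch on made-up data, and the helper names last_of_first and all_last are hypothetical:

```python
import pandas as pd

# Made-up data: one match, two matches, and no match at all.
data = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [[(1, 2)], [(2, 3), (5, 6)], []],
})


def last_of_first(matches):
    """Last element of the first tuple, or None when the list is empty."""
    return matches[0][-1] if matches else None


def all_last(matches):
    """Last element of every tuple in the list."""
    return [t[-1] for t in matches]


first_only = data["b"].apply(last_of_first)
every = data["b"].apply(all_last)
print(first_only)
print(every)
```

Which variant fits depends on whether the Match Ratio column should hold a single score per row or one score per match.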