Partial match between two columns, conditional on another column

Time: 2019-02-22 11:25:34

Tags: python pandas

Problem statement

I want to run str.contains between two columns of a DataFrame, subject to a condition on other columns.

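  1. First, check whether 1_Match or 1_1_Match is Yes or No. If both are No, 2_Match becomes Not Applicable.
  2. If 1_Match is Yes or 1_1_Match is Yes, check whether Country (EU) is contained in Nation (Europe). If it is, 2_Match becomes Yes.
  3. If Country (APAC) is not contained in Nation (India), not even as a partial match, 2_Match becomes No.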

DF1

Country          Nation         1_Match   1_1_Match
EU               Europe         Yes       No
MA               MACOPEC        No        No
APAC             INDIA          Yes       No
COPEC            MACOPEC        No        Yes
COPEC            India          No        Yes 

Expected output:

DF1

Country       Nation           1_Match       1_1_Match   2_Match
EU            Europe             Yes           No        Yes
MA            MACOPEC            No            No        Not Applicable
APAC          INDIA              Yes           No        No
COPEC         MACOPEC            No            Yes       Yes
Copec         India              No            Yes       No

Code (not working): I am writing code for conditions 2 and 3, but it throws an error, and I also want to accommodate condition 1.

df1['2_Match']  = np.where(df1['Country'].str.strip().str.lower().str.contains(df1['Nation'].str.strip().str.lower().astype(str)),'Yes','No')

1 answer:

Answer 0 (score: 1)

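Use numpy.select together with a list comprehension that checks, row by row, whether one column's value is a substring of the other's: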

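import numpy as np

# Rows where 1_Match is 'No'
m1 = df['1_Match'] == 'No'
# Row-wise, case-insensitive check that Country is a substring of Nation
m2 = [c.lower() in n.lower() for c, n in zip(df['Country'], df['Nation'])]
masks = [m1, m2]
vals = ['Not Applicable','Yes']

# The first mask that is True decides the value; rows matching none get 'No'
df['2_Match'] = np.select(masks, vals, default='No')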
print (df)
  Country   Nation 1_Match         2_Match
0      EU   Europe     Yes             Yes
1      MA  MACOPEC      No  Not Applicable
2    APAC    INDIA     Yes              No
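
The list comprehension is needed because Series.str.contains takes one pattern for the whole column rather than a different pattern per row, so the Country-in-Nation test has to be done row by row. A minimal sketch of that check, assuming missing values should count as no match and that surrounding whitespace should be ignored (as the question's .str.strip() suggests); `contains_rowwise` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd


def contains_rowwise(needles: pd.Series, haystacks: pd.Series) -> list:
    """Per row: is the needle a case-insensitive substring of the haystack?"""
    result = []
    for needle, haystack in zip(needles, haystacks):
        # Assumption: a missing value on either side means "no match".
        if pd.isna(needle) or pd.isna(haystack):
            result.append(False)
            continue
        result.append(str(needle).strip().lower() in str(haystack).strip().lower())
    return result


df = pd.DataFrame({
    "Country": ["EU", "MA", "APAC"],
    "Nation": ["Europe", "MACOPEC", "INDIA"],
})
flags = contains_rowwise(df["Country"], df["Nation"])
print(flags)  # [True, True, False]
```

The resulting list of booleans can be used as a mask in numpy.select in the same way as m2 above.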

Edit:

m1 = df['1_Match'] == 'No'
m2 = [c.lower() in n.lower() for c, n in zip(df['Country'], df['Nation'])]

# Rows where 1_1_Match is 'Yes'
m3 = df['1_1_Match'] == 'Yes'

# m3 is listed first, so it takes priority over m1 and m2
masks = [m3, m1, m2]
vals = ['Yes', 'Not Applicable','Yes']

df['2_Match'] = np.select(masks, vals, default='No')
print (df)
  Country   Nation 1_Match 1_1_Match         2_Match
0      EU   Europe     Yes        No             Yes
1      MA  MACOPEC      No        No  Not Applicable
2    APAC    INDIA     Yes        No              No
3   COPEC  MACOPEC      No       Yes             Yes
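
Because numpy.select returns the value belonging to the first condition that is true, the order of the masks matters. A minimal sketch of that behaviour, using made-up single-row arrays that model a row with 1_Match = No, 1_1_Match = Yes and no substring match (such as COPEC / India), labelled under two different mask arrangements:

```python
import numpy as np

# One illustrative row: 1_Match == 'No', 1_1_Match == 'Yes', Country not in Nation.
m1 = np.array([True])    # 1_Match == 'No'
m2 = np.array([False])   # Country is a substring of Nation
m3 = np.array([True])    # 1_1_Match == 'Yes'

# m3 comes first, so it wins and the row is labelled 'Yes'
# even though the substring check failed.
m3_first = np.select([m3, m1, m2], ['Yes', 'Not Applicable', 'Yes'], default='No')

# Here 'Not Applicable' needs 1_Match == 'No' and 1_1_Match != 'Yes';
# every other row is decided by the substring check alone.
combined = np.select([m1 & ~m3, m2], ['Not Applicable', 'Yes'], default='No')

print(m3_first[0], combined[0])  # Yes No
```

Only the second arrangement gives the 'No' that the expected output in the question shows for such a row.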

Edit 2:

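# 'Not Applicable' only when 1_Match is 'No' and 1_1_Match is not 'Yes'
masks = [m1 & ~m3, m2]
vals = ['Not Applicable','Yes']

df['2_Match'] = np.select(masks, vals, default='No')
print (df)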
  Country   Nation 1_Match 1_1_Match         2_Match
0      EU   Europe     Yes        No             Yes
1      MA  MACOPEC      No        No  Not Applicable
2    APAC    INDIA     Yes        No              No
3   COPEC  MACOPEC      No       Yes             Yes
4   COPEC    India      No       Yes              No
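
For reference, the pieces can be put together into one self-contained script. This is a sketch under the assumption that the two flag columns only ever hold 'Yes' or 'No'; `add_second_match` is a hypothetical helper name, and the DataFrame is rebuilt from the five rows shown in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Country":   ["EU", "MA", "APAC", "COPEC", "COPEC"],
    "Nation":    ["Europe", "MACOPEC", "INDIA", "MACOPEC", "India"],
    "1_Match":   ["Yes", "No", "Yes", "No", "No"],
    "1_1_Match": ["No", "No", "No", "Yes", "Yes"],
})


def add_second_match(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `frame` with a '2_Match' column."""
    out = frame.copy()
    m1 = (out["1_Match"] == "No").to_numpy()
    m3 = (out["1_1_Match"] == "Yes").to_numpy()
    # Row-wise, case-insensitive substring check of Country in Nation.
    m2 = np.array([
        c.strip().lower() in n.strip().lower()
        for c, n in zip(out["Country"], out["Nation"])
    ])
    # First true condition wins; rows matching neither get 'No'.
    out["2_Match"] = np.select([m1 & ~m3, m2], ["Not Applicable", "Yes"], default="No")
    return out


result = add_second_match(df)
print(result["2_Match"].tolist())
# ['Yes', 'Not Applicable', 'No', 'Yes', 'No']
```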