Subsetting a data frame by partial customer-name match in R

Date: 2017-12-05 07:36:33

Tags: r dataframe grepl

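I have a data frame named Test1 that contains 230,000 companies. What I need to do is subset Test1 into a new DF called FinalDS.

I created a list called Customers that contains several name variants (about 100k) of the clients I need to put into the FinalDS DF. What I want is for R to go through my Customers list and find those customer names in the Test1 DF. But what I actually need is for R to scan Customers and see whether it can match any part of a customer name in the Test1 DF.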

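For example:

In Customers I have this customer: Centrica PLC

But in the Test1 DF I have Centrica, so by definition there will be no match. I know I could get every customer to match by removing the PLC part from the entries in Customers, but I have a list of about 100k customers.

Here is the code I wrote: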

Customers = c("Adidas","ADIDAS GROUP","ALIBABA GROUP","ALIBABA.COM (EUROPE) LTD"
              ,"Apple Asia Pacific Pte Ltd" ,"APPLE DISTRIBUTION INTERNATIONAL"
              ,"APPLE EUROPE LTD","Apple Sales International"
              ,"AVIVA-PLC","Aviva -Norwich Union"
              ,"Aviva -Norwich Union-MSP","AVIVA PLC"
              ,"AXA TECHNOLOGY SERVICES UK LTD","AXA UK PLC"
              ,"Bank of Baroda","Bank of Baroda"
              ,"BARCLAYS","BARCLAYS BANK PLC"
              ,"BARCLAYS PLC","BRAVURA SOLUTIONS LTD"
              ,"CENTRICA PLC","CISCO"
              ,"Cisco Systems LTD","CSC (NG)-MSP"
              ,"CSC COMPUTER SCIENCES LTD","EMC CORPORATION"
              ,"GE Infrastructure UK Limited","GE MEDICAL SYSTEMS INFORMATION TECHNOLOGIES GMBH")

FinalDS = subset(Test1, grepl(paste(Customers, collapse = "|"), Test1$Customer_Name))

All this does is try to match each entry of my Customers list, word for word, against the contents of the Test1 DF.

Please help!
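To make the problem concrete, here is a minimal sketch on toy data (the values below are made up for illustration and are not the real Test1). The pattern asks whether a full Customers string occurs inside a Test1 name, so the shorter name Centrica can never contain CENTRICA PLC, and matching is case-sensitive unless ignore.case is set:

```r
# Toy stand-ins for the real objects (hypothetical values).
Customers <- c("CENTRICA PLC", "BARCLAYS BANK PLC")
Test1 <- data.frame(
  Customer_Name = c("Centrica", "Barclays Bank PLC London", "Tesco"),
  stringsAsFactors = FALSE
)

pattern <- paste(Customers, collapse = "|")

# Case-sensitive: is any full Customers string found inside each Test1 name?
hits <- grepl(pattern, Test1$Customer_Name)

# Ignoring case helps the Barclays row, but "Centrica" still lacks "PLC".
hits_ic <- grepl(pattern, Test1$Customer_Name, ignore.case = TRUE)
```

With this toy data only the second row matches, and only when case is ignored.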

1 Answer:

Answer 0: (score: 1)

How about this?

FinalDS = subset(
    Test1, 
    grepl(paste0("(", paste(Customers, collapse = "|"), ")"), Customer_Name))
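Wrapping the alternation in parentheses does not change which rows match, so Centrica would probably still be missed. One possible alternative, sketched here under assumptions, is to normalise both sides first and then compare. The helper normalise_name and its suffix list are hypothetical and would need adapting to the real data. This is a different technique from the regex alternation above, not a fix to it:

```r
# Hypothetical helper: upper-case, replace punctuation with spaces, drop a
# few common corporate suffixes (an assumed list), then squeeze whitespace.
normalise_name <- function(x) {
  x <- toupper(x)
  x <- gsub("[[:punct:]]+", " ", x)
  x <- gsub("\\b(PLC|LTD|LIMITED|GROUP|GMBH|CORPORATION)\\b", " ", x, perl = TRUE)
  trimws(gsub("[[:space:]]+", " ", x))
}

# Toy data for illustration only.
Customers <- c("CENTRICA PLC", "AVIVA-PLC", "Cisco Systems LTD")
Test1 <- data.frame(
  Customer_Name = c("Centrica", "Aviva", "Cisco", "Tesco PLC"),
  stringsAsFactors = FALSE
)

cust_keys <- unique(normalise_name(Customers))
cust_keys <- cust_keys[nzchar(cust_keys)]
test_keys <- normalise_name(Test1$Customer_Name)

# Fast path: exact match on the normalised keys.
exact <- test_keys %in% cust_keys

# Slower path: the normalised Test1 name appears inside some customer key.
# fixed = TRUE treats the name as a literal string, not a regex.
partial <- vapply(
  test_keys,
  function(k) nzchar(k) && any(grepl(k, cust_keys, fixed = TRUE)),
  logical(1),
  USE.NAMES = FALSE
)

FinalDS <- Test1[exact | partial, , drop = FALSE]
```

The exact step scales well. The partial step compares every Test1 name against every customer key, so on 230,000 by 100k names it may be slow, and very short names (for example GE) can match far more than intended.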