从另一个数据帧更新数据帧行和列的子集

时间:2017-01-03 09:12:15

标签: r

df1 <- data.frame(w = 1:4, x = c("a", "b", "b", "c"), y = NA)
df1$y[df1$x == "c"] = 1
df2 <- data.frame(x = c("a", "b"), y = 1:2, z = 3:4)

我想用df2中的值更新df1以匹配行,包括所有df2列。

预期结果:

df1
  w x  y  z
1 1 a  1  3
2 2 b  2  4
3 3 b  2  4
4 4 c  1 NA

这是我的尝试:

# add missing columns from df2 to df1
df1[setdiff(colnames(df2), colnames(df1))] <- NA
# update values in df1 from df2 for matching x
df1[df1$x %in% df2$x, colnames(df2[,-1])] <- df2[match(df1$x, df2$x),-1]

这不起作用,因为df2 [match(df1 $ x,df2 $ x), - 1]也包含不匹配。这样做的正确方法是什么? (我的实际数据帧有更多的行和列。)

EDIT1:根据请求,该示例并不代表我实际数据的完整复杂性。在现实生活中,我正在处理df办公室(df1)和结果(df2)。专栏&#34; x&#34;是&#34;地址&#34;。

colnames(offices)
 [1] "org_order"                   "organizations.api_path"      "type"                       
 [4] "uuid"                        "name"                        "street_1"                   
 [7] "street_2"                    "postal_code_o"                 "city"                       
[10] "city_web_path"               "region"                      "region_code2"               
[13] "region_web_path"             "country_o"                     "country_code2"              
[16] "country_code3"               "country_web_path"            "latitude"                   
[19] "longitude"                   "created_at"                  "updated_at"                 
[22] "address"                     "administrative_area_level_1" "administrative_area_level_2"
[25] "country"                   "locality"                    "neighborhood"               
[28] "postal_code"               "route"                       "street_number"              
colnames(results)
 [1] "address"                     "administrative_area_level_1" "administrative_area_level_2"
 [4] "country"                     "locality"                    "neighborhood"               
 [7] "postal_code"                 "route"                       "street_number"              
[10] "subpremise"                  "postal_code_suffix"          "premise"                    
[13] "political" 

请注意,这是此时的图片。我希望df结果可以添加更多列(它通过Google Maps API更新,根据位置返回变量字段)。

EDIT2:以下是来自df办事处和结果的样本数据。

structure(list(org_order = c("1", "3", "4", "8", "9", "10", "11", 
"11", "13", "13", "14", "15", "16", "18", "21", "23", "24", "25", 
"26", "27"), organizations.api_path = c("organizations/care1st-health-plan-arizona", 
"organizations/telecommunications-and-information-administration", 
"organizations/multimatecollection-com", "organizations/ion-holdco", 
"organizations/itris-bv", "organizations/crius-energy", "organizations/uwc-mahindra-college", 
"organizations/uwc-mahindra-college", "organizations/ustech-discovery", 
"organizations/ustech-discovery", "organizations/play-park-structures", 
"organizations/wings-rotors", "organizations/attractive-world", 
"organizations/bigtoys", "organizations/intercontec-produkt-gmbh", 
"organizations/elitesingles", "organizations/rumi-x-limited", 
"organizations/gt-grandstands", "organizations/new-health-capital-partners", 
"organizations/urban-media-2"), type = c("Address", "Address", 
"Address", "Address", "Address", "Address", "Address", "Address", 
"Address", "Address", "Address", "Address", "Address", "Address", 
"Address", "Address", "Address", "Address", "Address", "Address"
), uuid = c("e28deae838eb745ab093bb3d2f66963a", "ae150251316913fd72a1ba9df18bd686", 
"d49e135e7b61d01acf6609e1c5c1a34d", "359399c4f6db344f530897caabce666e", 
"2b4a469b217e5090611a539d952412f3", "3939cfa28f2191535bcdaf4297bfa488", 
"86620bc9b675150a847496bcfbb8f6b9", "ee869cc495e84ebdbbbbd6db0ed5d3c0", 
"f31c6e197494192ec47bcb36d14bc206", "2fb8855d0bcccde661aacbbf69d92a3e", 
"724967633f0be037df09bf63e8bd932d", "8d765e1b0111298f2e295883ac0c1f52", 
"ea0cd535b2a113b9c7d1cbec466963eb", "a7d69835d7abdbee8d28c77e9c54f176", 
"243b70fedc6ff9f6b68bb1ae1cb5f5a1", "7cec7f51b87fff47acf74a0424c88d30", 
"e1c4108f8e4d902ed3524e98418cb157", "666dffa11b2de8aa35307a35d19ec117", 
"d5d56654e892d46c303deebd9b71aaaa", "deb420cf0a669897e0058ab854e928bc"
), name = c("Headquarters", "Headquarters", "Delhi", "Headquarters", 
"Headquarters", "HQ", "Israel Office", "Headquarters", "Israel Office", 
"Headquarters", "Headquarters", "Israel Office", "Headquarters", 
"Headquarters", "HQ", "Headquarters", "Headquarters & Showroom", 
"Headquarters", "Headquarters", "Israel Office"), street_1 = c("2355 E. Camelback Road", 
"1401 Constitution Ave NW", "", "80 State Street 7th Floor", 
"Spray Gaarde 46", "1055 Washington Blvd., 7th Floor", "", "Village Khubavali, PO Paud Taluka Mulshi", 
"", "231 Lagrange Street", "401 Chestnut St", "st. ha Rav Bar Shaul 6", 
"", "7721 New Market Street", "Bernrieder Str. 15", "", "UNIT 1801, 18/F WAGA COMMERCIAL CENTER", 
"2810 Sydney Road", "1350 Avenue of the Americas", ""), street_2 = c("Suite 300", 
"", "", "", "", "", "", "", "", "", "Suite 410", "", "", "", 
"", "", "99 WELLINGTON ST, CENTRAL,", "", "9th Floor", ""), postal_code_cb = c("85016", 
"20230", "", "12207", "03436", "06901", "", "412 108", "", "02132", 
"37402", "7625149", "", "98501", "94559", "", "", "33566-1173", 
"10019", ""), city = c("Phoenix", "Washington", "", "Albany", 
"Nieuwegein", "Stamford", "", "Pune", "", "Boston", "Chattanooga", 
"Rehovot", "Paris", "Olympia", "Niederwinkling", "Dover", "Hong Kong", 
"Plant City", "New York", ""), city_web_path = c("location/phoenix/e55cf011f3c746498d0d801bc399dc52", 
"location/washington/09d220d540bacd296f7a5035ff6e6bef", "location/india/44048bf7db640d7adb20fd3c1ebf47b0", 
"location/albany/b40478e7b38920051a2a0f501b6e8c65", "location/nieuwegein/93f71c3141f8b16e539839308acd911e", 
"location/stamford/01272fa9253161fb0b1e69fe0fa8a53c", "location/israel/ab89e1a9013f82613b5f60487350334d", 
"location/pune/2c69941daa1ea666db3cb74f2029ae62", "location/israel/ab89e1a9013f82613b5f60487350334d", 
"location/boston/9898e533996b05148cc5302f60454c02", "location/chattanooga/9ecb7425cc53be56a353c7a127bf2f56", 
"location/rehovot/fd22cd6db48f2c192068a1bc99853c08", "location/paris/5805284c00a4f0aab8115843398ff2bb", 
"location/olympia/b289b456f00ca6c0536b2032da46985f", "location/niederwinkling/026a83aec4e324d209cd3e1b257b55d5", 
"location/dover/796562878525d2957e1dc10a32a5e13e", "location/hong-kong/ad27df91c866cb08f3b624121637bf69", 
"location/plant-city/646eec50d4fc27131241e0bebdae3673", "location/new-york/d64b7615985cfbf44affaa89d70c4050", 
"location/israel/ab89e1a9013f82613b5f60487350334d"), region = c("Arizona", 
"District of Columbia", "", "New York", "Utrecht", "Connecticut", 
"", "Maharashtra", "", "Massachusetts", "Tennessee", "HaMerkaz", 
"Ile-de-France", "Washington", "Bayern", "Delaware", "Hong Kong Island", 
"Florida", "New York", ""), region_code2 = c("AZ", "DC", "", 
"NY", "9", "CT", "", "16", "", "MA", "TN", "2", "A8", "WA", "2", 
"DE", "", "FL", "NY", ""), region_web_path = c("", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""
), country_cb = c("United States", "United States", "India", 
"United States", "The Netherlands", "United States", "Israel", 
"India", "Israel", "United States", "United States", "Israel", 
"France", "United States", "Germany", "United States", "Hong Kong", 
"United States", "United States", "Israel"), country_code2 = c("US", 
"US", "IN", "US", "NL", "US", "IL", "IN", "IL", "US", "US", "IL", 
"FR", "US", "DE", "US", "HK", "US", "US", "IL"), country_code3 = c("USA", 
"USA", "IND", "USA", "NLD", "USA", "ISR", "IND", "ISR", "USA", 
"USA", "ISR", "FRA", "USA", "DEU", "USA", "HKG", "USA", "USA", 
"ISR"), country_web_path = c("", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", ""), latitude = c("", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", ""), longitude = c("", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", ""), created_at = c("1475743389", 
"1475899280", "1475743519", "1475899256", "1475742451", "1475742715", 
"1475742786", "1475742171", "1475741845", "1475741660", "1475899240", 
"1475741599", "1475741472", "1475899218", "1475741445", "1475741067", 
"1475742388", "1475899207", "1475740101", "1475740590"), updated_at = c("1475743390", 
"1475899280", "1475743519", "1475899257", "1475742451", "1475742715", 
"1475742786", "1475742171", "1475741845", "1475741660", "1475899240", 
"1475741599", "1475741472", "1475899218", "1475741445", "1475741067", 
"1475742388", "1475899207", "1475740101", "1475740590"), address = c("2355 E. Camelback Road Suite 300 85016 Phoenix Arizona", 
"1401 Constitution Ave NW  20230 Washington District of Columbia", 
"", "80 State Street 7th Floor  12207 Albany New York", "Spray Gaarde 46  03436 Nieuwegein Utrecht", 
"1055 Washington Blvd., 7th Floor  06901 Stamford Connecticut", 
"", "Village Khubavali, PO Paud Taluka Mulshi  412 108 Pune Maharashtra", 
"", "231 Lagrange Street  02132 Boston Massachusetts", "401 Chestnut St Suite 410 37402 Chattanooga Tennessee", 
"st. ha Rav Bar Shaul 6  7625149 Rehovot HaMerkaz", "Paris Ile-de-France", 
"7721 New Market Street  98501 Olympia Washington", "Bernrieder Str. 15  94559 Niederwinkling Bayern", 
"Dover Delaware", "UNIT 1801, 18/F WAGA COMMERCIAL CENTER 99 WELLINGTON ST, CENTRAL,  Hong Kong Hong Kong Island", 
"2810 Sydney Road  33566-1173 Plant City Florida", "1350 Avenue of the Americas 9th Floor 10019 New York New York", 
""), administrative_area_level_1 = c("Arizona", "District of Columbia", 
NA, "New York", NA, "Connecticut", NA, NA, NA, "Massachusetts", 
"Tennessee", "Center District", "?e-de-France", "Washington", 
"Bayern", "Delaware", NA, NA, NA, NA), administrative_area_level_2 = c("Maricopa County", 
NA, NA, "Albany County", NA, "Fairfield County", NA, NA, NA, 
"Suffolk County", "Hamilton County", "Rehovot", "Paris", "Thurston County", 
"Niederbayern", "Kent County", NA, NA, NA, NA), country = c("United States", 
"United States", NA, "United States", NA, "United States", NA, 
NA, NA, "United States", "United States", "Israel", "France", 
"United States", "Germany", "United States", NA, NA, NA, NA), 
    locality = c("Phoenix", "Washington", NA, "Albany", NA, "Stamford", 
    NA, NA, NA, "Boston", "Chattanooga", "Rehovot", "Paris", 
    "Tumwater", "Niederwinkling", "Dover", NA, NA, NA, NA), neighborhood = c("Camelback East Village", 
    "Northwest Washington", NA, "Downtown", NA, "Downtown", NA, 
    NA, NA, "West Roxbury", NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA), postal_code = c("85016", "20230", NA, "12207", NA, "06901", 
    NA, NA, NA, "02132", "37402", NA, NA, "98501", "94559", NA, 
    NA, NA, NA, NA), route = c("East Camelback Road", "Constitution Avenue Northwest", 
    NA, "State Street", NA, "Washington Boulevard", NA, NA, NA, 
    "Lagrange Street", "Chestnut Street", "HaRav Bar Shaul Street", 
    NA, "New Market Street Southwest", "Bernrieder Stra?", NA, 
    NA, NA, NA, NA), street_number = c("2355", "1401", NA, "80", 
    NA, "1055", NA, NA, NA, "231", "401", "6", NA, "7721", "15", 
    NA, NA, NA, NA, NA), subpremise = c("300", NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, "410", NA, NA, NA, NA, NA, NA, NA, NA, 
    NA)), .Names = c("org_order", "organizations.api_path", "type", 
"uuid", "name", "street_1", "street_2", "postal_code_cb", "city", 
"city_web_path", "region", "region_code2", "region_web_path", 
"country_cb", "country_code2", "country_code3", "country_web_path", 
"latitude", "longitude", "created_at", "updated_at", "address", 
"administrative_area_level_1", "administrative_area_level_2", 
"country", "locality", "neighborhood", "postal_code", "route", 
"street_number", "subpremise"), class = c("data.table", "data.frame"
), row.names = c(NA, -20L), .internal.selfref = <pointer: 0x00000000001e0788>)

structure(list(address = structure(1:18, .Label = c("2355 E. Camelback Road Suite 300 85016 Phoenix Arizona", 
    "1401 Constitution Ave NW  20230 Washington District of Columbia", 
    "80 State Street 7th Floor  12207 Albany New York", "Spray Gaarde 46  3436 Nieuwegein Utrecht", 
    "1055 Washington Blvd., 7th Floor  6901 Stamford Connecticut", 
    "Village Khubavali, PO Paud Taluka Mulshi  412 108 Pune Maharashtra", 
    "231 Lagrange Street  2132 Boston Massachusetts", "401 Chestnut St Suite 410 37402 Chattanooga Tennessee", 
    "st. ha Rav Bar Shaul 6  7625149 Rehovot HaMerkaz", "Paris Ile-de-France", 
    "7721 New Market Street  98501 Olympia Washington", "Bernrieder Str. 15  94559 Niederwinkling Bayern", 
    "Dover Delaware", "UNIT 1801, 18/F WAGA COMMERCIAL CENTER 99 WELLINGTON ST, CENTRAL,  Hong Kong Hong Kong Island", 
    "2810 Sydney Road  33566-1173 Plant City Florida", "1350 Avenue of the Americas 9th Floor 10019 New York New York", 
    "Askanischer Platz 3  10963 Berlin Berlin", "Level 2, 145 Flinders Lane  03000 Melbourne Victoria"
    ), class = "factor"), administrative_area_level_1 = c("Arizona", 
    "District of Columbia", "New York", NA, "Connecticut", NA, "Massachusetts", 
    "Tennessee", "Center District", "?e-de-France", "Washington", 
    "Bayern", "Delaware", "Hong Kong Island", "Florida", "New York", 
    "Berlin", "Victoria"), administrative_area_level_2 = c("Maricopa County", 
    NA, "Albany County", NA, "Fairfield County", NA, "Suffolk County", 
    "Hamilton County", "Rehovot", "Paris", "Thurston County", "Niederbayern", 
    "Kent County", NA, "Hillsborough County", "New York County", 
    NA, "Melbourne City"), country = c("United States", "United States", 
    "United States", NA, "United States", NA, "United States", "United States", 
    "Israel", "France", "United States", "Germany", "United States", 
    "Hong Kong", "United States", "United States", "Germany", "Australia"
    ), locality = c("Phoenix", "Washington", "Albany", NA, "Stamford", 
    NA, "Boston", "Chattanooga", "Rehovot", "Paris", "Tumwater", 
    "Niederwinkling", "Dover", NA, "Plant City", "New York", "Berlin", 
    "Melbourne"), neighborhood = c("Camelback East Village", "Northwest Washington", 
    "Downtown", NA, "Downtown", NA, "West Roxbury", NA, NA, NA, NA, 
    NA, NA, "Central", NA, NA, NA, NA), postal_code = c("85016", 
    "20230", "12207", NA, "06901", NA, "02132", "37402", NA, NA, 
    "98501", "94559", NA, NA, "33566", "10019", "10963", "3000"), 
        route = c("East Camelback Road", "Constitution Avenue Northwest", 
        "State Street", NA, "Washington Boulevard", NA, "Lagrange Street", 
        "Chestnut Street", "HaRav Bar Shaul Street", NA, "New Market Street Southwest", 
        "Bernrieder Stra?", NA, "Wellington Street", "Sydney Road", 
        "6th Avenue", "Askanischer Platz", "Flinders Lane"), street_number = c("2355", 
        "1401", "80", NA, "1055", NA, "231", "401", "6", NA, "7721", 
        "15", NA, "99", "2810", "1350", "3", "145"), subpremise = c("300", 
        NA, NA, NA, NA, NA, NA, "410", NA, NA, NA, NA, NA, "1801", 
        NA, NA, NA, "2"), postal_code_suffix = c(NA, NA, NA, NA, 
        NA, NA, "3450", NA, NA, NA, NA, NA, NA, NA, "1173", NA, NA, 
        NA), premise = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
        NA, NA, NA, "Waga Commercial Centre", NA, NA, NA, NA), political = c(NA, 
        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Manhattan", 
        "Bezirk Friedrichshain-Kreuzberg", NA)), .Names = c("address", 
    "administrative_area_level_1", "administrative_area_level_2", 
    "country", "locality", "neighborhood", "postal_code", "route", 
    "street_number", "subpremise", "postal_code_suffix", "premise", 
    "political"), row.names = c(NA, -18L), class = c("data.table", 
    "data.frame"), .internal.selfref = <pointer: 0x00000000001e0788>)

3 个答案:

答案 0 :(得分:6)

这是一个完全原子化的data.table版本,用于更新两个数据集中的所有列,并添加df2df1中不存在的df1列。这将更新cols <- setdiff(colnames(df2), "x") setDT(df1)[setDT(df2), (cols) := mget(paste0('i.', cols)), on = "x"] df1 # w x y z # 1: 1 a 1 3 # 2: 2 b 2 4 # 3: 3 b 2 4 # 4: 4 c 1 NA

paste0('i.', cols)

data.table背后的想法是告诉data.table我们想要从位于i位置df2)的import ( "abc.com/package/foo" ) func CallFoo() { foo.DoSomething() } 中获取列,以便它将知道如何处理两个数据集中的列。

免责声明:这个想法是借鉴了这个@eddi's answer

答案 1 :(得分:2)

使用dplyr,您可以尝试:

dplyr::left_join(df1,df2,by="x") %>%
  dplyr::mutate(y = dplyr::coalesce(as.integer(y.x),as.integer(y.y))) %>%
  dplyr::select(w,x,y,z)

给出了

#  w x y  z
#1 1 a 1  3
#2 2 b 2  4
#3 3 b 2  4
#4 4 c 1 NA

P.S。看看您尝试使用setdiff()而不是colnames,可能看起来所提供的示例并不代表您实际问题的完整复杂性。在这种情况下,请更新示例。

答案 2 :(得分:0)

也可以在这里使用merge

dfm = merge(df1, df2, by="x", all=T)
dfm
  x w y.x y.y  z
1 a 1  NA   1  3
2 b 2  NA   2  4
3 b 3  NA   2  4
4 c 4   1  NA NA

必须使用以下方式获得y

dfm$y = ifelse(is.na(dfm$y.y), dfm$y.x, dfm$y.y)
dfm[c("w","x","y","z")]
  w x y  z
1 1 a 1  3
2 2 b 2  4
3 3 b 2  4
4 4 c 1 NA

编辑: 要将上面的内容应用于多个公共列,请获取此类列的名称:

> inames = intersect(names(df1), names(df2))
> inames
 [1] "address"                     "administrative_area_level_1" "administrative_area_level_2" "country"                    
 [5] "locality"                    "neighborhood"                "postal_code"                 "route"                      
 [9] "street_number"               "subpremise"               

自第一栏&#34;地址&#34;已用于合并,使用剩余列来查找非NA值:

> for(i in inames[2:10]) dfm[[i]] = ifelse(is.na(dfm[[paste0(i,".x")]]), dfm[[paste0(i,".y")]], dfm[[paste0(i,".x")]]  ) 

将所需列添加到dfm。

相关问题