Question

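I want to summarise, by group, all the keywords for a given year.

I have a dataset that looks like this:

My main problem is that the Words column holds anywhere from 1 to 52 words! I was thinking of splitting this column into separate columns and then using group_by, but I am not sure how to proceed.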

Answer 1

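We can split 'Words' into a list of vectors with strsplit, unnest it into 'long' format, drop the duplicate rows with distinct, and then, grouped by 'Year' and 'UID', paste the 'Words' together into a single string with toString.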

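library(dplyr)
library(tidyr)  # unnest() comes from tidyr, not dplyr

df1 %>% 
    mutate(Words = strsplit(Words, ",")) %>%  # one character vector per row
    unnest(Words) %>%                         # one row per word ("long" format)
    distinct(Year, UID, Words) %>%            # drop repeated words within a group
    group_by(UID, Year) %>% 
    summarise(Words = toString(Words))        # join the words with ", "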
# A tibble: 4 x 3
# Groups:   UID [?]
#    UID  Year Words                                    
#  <dbl> <dbl> <chr>                                    
#1    10  2009 ABC, CDEFGH, LMX, ABCD, IJKLM, PQRS, EFGH
#2    11  2010 BDFC, CDE, PQRS, ACCA, IJKLM             
#3    12  2010 ABCD, CADDE                              
#4    12  2011 ABC, CDE, EFGH

Data

df1 <- structure(list(ID = c(1, 2, 3, 4, 5, 6, 5), Year = c(2011, 2011, 
2010, 2010, 2009, 2010, 2009), UID = c(12, 12, 11, 12, 10, 11, 
10), Words = c("ABC,CDE", "EFGH,CDE", "BDFC,CDE,PQRS", "ABCD,CADDE", 
"ABC,CDEFGH,LMX,ABCD,IJKLM,PQRS", "BDFC,ACCA,IJKLM", "EFGH")),
 class = "data.frame", row.names = c(NA, -7L))

Answer 2

A base R approach using aggregate:
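The code for this answer is not shown, so the following is only a minimal sketch of how aggregate could be used here, not necessarily what the answerer wrote. It assumes the df1 data from Answer 1; the helper name collapse_unique is hypothetical and introduced purely for illustration.

```r
# Sample data, rebuilt from the "Data" section of Answer 1.
df1 <- data.frame(
  ID    = c(1, 2, 3, 4, 5, 6, 5),
  Year  = c(2011, 2011, 2010, 2010, 2009, 2010, 2009),
  UID   = c(12, 12, 11, 12, 10, 11, 10),
  Words = c("ABC,CDE", "EFGH,CDE", "BDFC,CDE,PQRS", "ABCD,CADDE",
            "ABC,CDEFGH,LMX,ABCD,IJKLM,PQRS", "BDFC,ACCA,IJKLM", "EFGH"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split every string of a group on commas,
# keep each word once (in order of first appearance), rejoin with ", ".
collapse_unique <- function(x) {
  words <- unlist(strsplit(as.character(x), ",", fixed = TRUE))
  toString(unique(words))
}

# One row per (UID, Year) combination, with the de-duplicated words.
res <- aggregate(Words ~ UID + Year, data = df1, FUN = collapse_unique)
res
```

Because the splitting and de-duplication happen inside the function passed as FUN, no intermediate "long" data frame is needed, and the result should contain the same four UID/Year groups as the dplyr output in Answer 1.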

Summarise unique character values per group in R
