In R

Time: 2018-10-30 18:43:12

Tags: r

I am a beginner in Python / R and have started applying it at work. Right now I am trying to solve a small problem.

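Task: I have to download a .csv file (separated by semicolons), import it into Excel and filter out the recently updated sites (there is a column W with the header Update). From the updated sites I need to create two new Excel files: one for European sites and one for non-European sites.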

My initial idea was this:

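  • A list of EU countries (the shorter of the two lists)
  • Separate out the sites that need updating <- Update_sites
  • Split Update_sites into EU and non-EU sites
  • Write the EU and non-EU sites to separate Excel files.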

I managed to get it working for the case where the Update column contains exactly one fixed value (here "Updated"):

install.packages("openxlsx")   # only needed once per R installation
library("openxlsx")
European_countries <- c("Andorra","Austria","Belarus","Belgium","Bosnia and Herzegovina","Bulgaria","Croatia","Czech Republic","Denmark","Estonia","Finland","France","Germany","Greece","Hungary","Iceland","Ireland","Italy","Latvia","Liechtenstein","Lithuania","Luxembourg","Malta","Moldova","Monaco","Montenegro","Netherlands","Norway","Poland","Portugal","Romania","Russia","San Marino","Serbia","Slovakia","Slovenia","Spain","Sweden","Switzerland","Ukraine","United Kingdom")

origin <- choose.files()   # Windows-only file picker

MyData <- read.csv(origin, sep = ";", header = TRUE)

Update_sites <- subset(MyData, Update == "Updated") 

EU_site <- Update_sites[Update_sites$Country %in% European_countries,]

'%ni%' <- Negate('%in%')

Not_EU_site <- Update_sites[Update_sites$Country %ni% European_countries,]

write.xlsx(EU_site, "C:/Users/WalzthE/Downloads/European_sites.xlsx")
write.xlsx(Not_EU_site, "C:/Users/WalzthE/Downloads/Not_european_sites.xlsx")
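The EU/non-EU split in the script can also be driven by a single logical vector, which makes the custom `%ni%` operator unnecessary. This is a minimal sketch. The small `European_countries` and `Update_sites` objects are hypothetical stand-ins for the real data, and `parts` is an illustrative name:

```r
# Hypothetical stand-ins for the question's European_countries and Update_sites.
European_countries <- c("France", "Germany", "Italy")
Update_sites <- data.frame(
  Country = c("France", "Japan", "Germany", "Brazil"),
  Site    = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# One logical vector drives both subsets, so no %ni% helper is needed.
is_eu <- Update_sites$Country %in% European_countries
EU_site     <- Update_sites[is_eu, ]
Not_EU_site <- Update_sites[!is_eu, ]

# split() does the same in one call and returns a named list of data frames.
parts <- split(Update_sites, ifelse(is_eu, "EU", "Not_EU"))
```

Using `!is_eu` guarantees that every row lands in exactly one of the two outputs, because both subsets come from the same test.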

However, my problems come up in the following cases:

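  1. When the Update column holds values other than "New" or "Updated". Sometimes the cells are filled with "Updated/mobile/sitemanager/local", "New/manager" or "Updated/fax". I want to subset the rows simply on whether the cell has any content at all. I browsed various forums and found something like: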

    z <- character(0)
    subset(df, !(rownames(df) %in% z))
    

    But that didn't help me...

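  2. I would like to be able to choose where the files are saved, instead of always saving to a predefined folder. This is less important than point 1. It just gives the user more options.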

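  3. The CSV file contains specific values such as "Study No. XYXY" and "LOL-123", which together make up the file name I need when saving. How can I concatenate the two so that the final file name is "Study No.XYXY_LOL-123"?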
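For point 1, there are two ways to keep the rows: keep every row whose Update cell has any content, or keep only the rows whose value starts with "Updated" or "New". This is a minimal sketch. The `sites` data frame and its values are hypothetical and only mimic the Update strings described in the question:

```r
# Toy data standing in for the real CSV (hypothetical values).
sites <- data.frame(
  Country = c("France", "Japan", "Germany", "Brazil", "Italy"),
  Update  = c("Updated/mobile/sitemanager/local", "New/manager", "", "Updated/fax", NA),
  stringsAsFactors = FALSE
)

# Option A: keep every row whose Update cell has any non-blank content.
# is.na() is checked first so that missing cells are dropped too.
has_content <- !is.na(sites$Update) & nzchar(trimws(sites$Update))
updated_any <- sites[has_content, ]

# Option B: keep rows whose Update value starts with "Updated" or "New".
# grepl() returns FALSE for NA inputs, so missing cells are dropped here as well.
starts_ok <- grepl("^(Updated|New)", sites$Update)
updated_prefix <- sites[starts_ok, ]
```

Option A matches "has content" literally. Option B is stricter and also drops cells that contain unrelated text.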

Thanks in advance for your help!

1 answer:

Answer 0 (score: 0)

I would start by writing the script without any choose.files-style interaction. Something like:

input_file <- "file.txt"
output_eu <- "eu.xlsx"
output_noteu <- "noteu.xlsx"

url <- "http://????"
download.file(url, input_file)   # save to the same path that is read below

eu <- c("Andorra","Austria","Belarus","Belgium","Bosnia and Herzegovina","Bulgaria","Croatia","Czech Republic","Denmark","Estonia","Finland","France","Germany","Greece","Hungary","Iceland","Ireland","Italy","Latvia","Liechtenstein","Lithuania","Luxembourg","Malta","Moldova","Monaco","Montenegro","Netherlands","Norway","Poland","Portugal","Romania","Russia","San Marino","Serbia","Slovakia","Slovenia","Spain","Sweden","Switzerland","Ukraine","United Kingdom")

d <- read.table(input_file, sep = ";", header = TRUE) 
# get all cases where there is some text in the Update field (NA cells are dropped)
updates <- d[!is.na(d$Update) & d$Update != "", ]
# TRUE for European countries, FALSE otherwise
i <- updates$Country %in% eu
eu_up <- updates[i, ]
noteu_up <- updates[!i, ]

library(writexl)
write_xlsx(eu_up, output_eu) 
write_xlsx(noteu_up, output_noteu)
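The script above does not cover points 2 and 3 of the question. Here is a minimal sketch of building the output file name from two values and placing it in a folder the user picks. `make_output_path` and its arguments are hypothetical names. The sketch also assumes the two values have already been pulled out of the data, for example as `d$Study[1]` and `d$Code[1]` if the CSV had columns with those (hypothetical) names:

```r
# Build "Study No.XYXY_LOL-123"-style names and join them to a chosen folder.
# make_output_path() is a hypothetical helper, not part of any package.
make_output_path <- function(out_dir, study, code, suffix = "") {
  file_name <- paste0(study, "_", code, suffix, ".xlsx")
  file.path(out_dir, file_name)
}

# In a real script, out_dir could come from choose.dir() (Windows only)
# or be passed in by the user; tempdir() keeps this example portable.
out_dir <- tempdir()
eu_path <- make_output_path(out_dir, "Study No.XYXY", "LOL-123", "_EU")

# The result could then be used as, e.g., writexl::write_xlsx(eu_up, eu_path)
```

The optional `suffix` keeps the EU and non-EU files from overwriting each other while still starting with the requested "Study No.XYXY_LOL-123".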

(Also, there is no such thing as a ".csv file separated by semicolons"; the c stands for comma.)
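In practice, semicolon-separated files are common in locales that use the comma as the decimal mark, and base R ships `read.csv2()` for exactly that layout. This sketch reads a small made-up sample from a string (the `txt` contents are hypothetical) to show how `read.csv2()` differs from `read.csv(sep = ";")`:

```r
# A tiny semicolon-separated sample, read from a string instead of a file.
txt <- "Country;Update;Score
France;Updated/fax;1,5
Japan;;2,0"

# read.csv2() defaults to sep = ";" and dec = ",", the usual European layout.
a <- read.csv2(text = txt, stringsAsFactors = FALSE)

# read.csv() with only sep = ";" keeps "." as the decimal mark,
# so "1,5" stays text instead of becoming the number 1.5.
b <- read.csv(text = txt, sep = ";", stringsAsFactors = FALSE)
```

If the file has no decimal numbers, both calls give the same columns. The choice only matters when numeric fields use commas.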