Question

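So I am interested in web scraping a particular website at the end of each day and exporting a .csv file with a unique name. I have actually managed to create a very simple script that scrapes the data into a data.frame and exports it. However, I am not sure how best to automate this process. I am using a MacBook Pro 2016, in case that matters.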

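I have determined (after a quick Google search) that I can do one of two things: either create a plist file that executes the command, or use the Automator app on my computer. I saved a YouTube tutorial on creating a plist file, but I am very scared of creating one because I am not very good with computers and I do not want to create a serious bug.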

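As I mentioned earlier, I want my computer to run the R script at a specific time each day (during the work week), and I want that date reflected in the name of the exported CSV file. For example, when the R script runs on Monday, October 4, 2017, I would like the exported CSV file to be named data.04.10.2017. Dumping all the exported .csv files into one folder (the same directory) is fine, as long as the names are standardized this way.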
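The naming scheme described above can be sketched in a few lines of R. This is only an illustration: `make_csv_name()` is a hypothetical helper that is not part of my script, and it assumes the pattern is day.month.year with a `.csv` extension appended.

```r
# Build a date-stamped file name such as "data.04.10.2017.csv".
# make_csv_name() is a hypothetical helper, not part of the original script.
make_csv_name <- function(date = Sys.Date(), prefix = "data") {
  # "%d.%m.%Y" formats the date as zero-padded day.month.year
  paste0(prefix, ".", format(date, "%d.%m.%Y"), ".csv")
}

# With no arguments the helper uses today's date
todays_name <- make_csv_name()

# A fixed date makes the pattern easy to check
example_name <- make_csv_name(as.Date("2017-10-04"))
print(example_name)
```

Because the name is derived from `Sys.Date()` at run time, there would be no need to generate a name for every day of the year in advance.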

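My script does not include a command to create a unique name for each day's csv file. In a separate script, I feebly tried to build a "for" or "repeat" structure to manually create every day of the year, but to no avail. Even if I had managed it, I would not know how to combine this "date script" with my main web scraping script.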

In summary, I would like to ask the following:

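1) What is the safest way for a beginner like me to automate the task [running an R script every day]? Given my lack of experience, should I be worried about creating a plist file?
2) How can I improve my script so that it generates a unique name for each csv exported each day? For example, data.04.10.2017, data.04.11.2017 ...
3) Is this best done in R or in another program?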

For reference, here is my working script:

library(xml2)
library(rvest)

URL = url('https:website')

raw.data = read_html(URL) %>% html_nodes('thenode') %>% html_text()

#this tells R to fetch the data from the website
#you can find the node you want by going to the webpage, right click the page and click "view page source"
#nodes are denoted like this <NODE>INFORMATION</NODE>
#use the selector gadget Google Chrome extension to choose your web page element!

#remove weird lines of data; standardize it. 
#raw.data = raw.data[c(-157:-160, -171, -172, -183, -184, -205, -206)]

#Now it should follow this format: 
    #(space)
    #v1
    #v2
    #v3
    #v4
    #(space)

    #now I want to remove the (space) throughout the whole dataframe. 

deletions = seq(1,280, 5)
raw.data = raw.data[-deletions]
matrix1 = matrix(data = raw.data, nrow = 4)
View(matrix1)

#at this point, the data is sort of in reverse form
#we want the entire first row to be the first column instead

v1 = matrix1[1,]
v2 = matrix1[2,]
v3 = matrix1[3,]
v4 = matrix1[4,]

#here, I am acquiring the data row by row to create the dataframe
#we have each row separated. We will now reconstruct the data frame correctly

final.data.frame = data.frame(v1,v2,v3,v4)
View(final.data.frame)

#this is the final data frame.


write.csv(final.data.frame, "C:/directory/file.csv")

#to export the data in excel (or csv form)
#how do I best incorporate unique names at this point of the script?
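One possible way to handle the export step is sketched below. It is a rough sketch under assumptions, not a tested solution: `export_if_weekday()` and `out_dir` are hypothetical names, the file name assumes a day.month.year pattern, and a temporary folder and a stand-in data frame replace the real directory and `final.data.frame`.

```r
# Sketch: write a data frame to a dated CSV, but only Monday to Friday.
# export_if_weekday() and out_dir are assumed names for illustration.
export_if_weekday <- function(df, out_dir, date = Sys.Date()) {
  # "%u" is the ISO weekday number: 1 = Monday ... 7 = Sunday
  if (as.integer(format(date, "%u")) > 5) {
    return(invisible(NULL))  # weekend: skip the export
  }
  # Create the target folder if it does not exist yet
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  out_file <- file.path(out_dir,
                        paste0("data.", format(date, "%d.%m.%Y"), ".csv"))
  write.csv(df, out_file, row.names = FALSE)
  invisible(out_file)
}

# Example with a stand-in data frame and a temporary folder
demo <- data.frame(v1 = 1:2, v2 = c("a", "b"))
path <- export_if_weekday(demo, tempdir(), as.Date("2017-10-04"))
```

With something like this in place of the fixed `write.csv()` call, whatever scheduler is used (a plist, Automator, or cron) would only have to run the script with `Rscript` at the chosen time; the weekday check and the file name would be handled inside R.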

Naming conventions

0 Answers: