Awkward problems with exporting tweets

Date: 2018-02-18 01:36:14

Tags: r twitter web-scraping

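Hi: I have a twitteR script that scrapes tweets just fine. However, when I convert the results to a data.frame and write them out with write.table(), some tweets get awkwardly split across rows. Will this cause problems when I try to analyze them?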

I have attached some images of the CSV file to try to illustrate the problem.

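I see odd symbols in many rows, which I believe are related to character encoding, but the splits don't necessarily happen at those symbols. So I don't know what is going on.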
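One plausible cause, which is an assumption and not something confirmed in this post: tweet text often contains embedded line breaks. write.table() keeps such a field intact inside double quotes (quote = TRUE is the default), so the file is still valid CSV, but any viewer that splits on raw line breaks will show one tweet spread over several rows. The sketch below uses a made-up two-row data frame (not real twitteR output) to show that the physical line count and the record count differ, while a quote-aware reader such as read.csv() recovers the original rows.

```r
# Hypothetical two-row data frame standing in for twListToDF() output
tweets <- data.frame(
  id   = c("1", "2"),
  text = c("single-line tweet", "first line\nsecond line\nthird line"),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")
write.table(tweets, path, sep = ",", row.names = FALSE)

# Physical lines in the file: 1 header + 1 + 3, although there are only 2 records
raw_lines <- readLines(path)
print(length(raw_lines))

# A quote-aware reader reassembles the multi-line field into a single record
back <- read.csv(path, stringsAsFactors = FALSE)
print(nrow(back))
print(identical(back$text, tweets$text))
```

So if the analysis reads the file back with read.csv() or a similar quote-aware parser, the split rows in a spreadsheet view may be cosmetic rather than data loss.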

Link to output file

[Image: splits in the text field]
[Image: splits in the text field]
[Image: this one is really weird; it splits in the text field and then again, but over three rows]

Here is the code:

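library(twitteR)

# Cache the OAuth token between sessions and work in Eastern time
options(httr_oauth_cache = TRUE)
Sys.setenv(TZ = 'EST')

# consumer_key, consumer_secret, access_token and access_token_secret
# hold the app's credentials (defined elsewhere, not shown)
setup_twitter_oauth(consumer_key, consumer_secret, access_token,
                    access_token_secret)

# Get #onpoli tweets
onpoli <- searchTwitter('#onpoli+#pcpoldr+#pcpo', resultType = 'recent', n = 1500)

# Turn to data.frames
onpolidf <- twListToDF(onpoli)

# Write out to .csv files
write.table(onpolidf, paste('Tweets/', format(Sys.time(), "%m-%d-%H-%M"),
                            '.csv', sep = ''), append = TRUE, sep = ',', col.names = TRUE)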
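If one physical line per tweet is needed (for example, to open the file in a spreadsheet or process it line by line), a common workaround is to collapse line breaks in the text column before writing. The sketch below assumes the data frame has a `text` column, as twListToDF() output does; the helper name `write_tweets_csv` is hypothetical. It also passes row.names = FALSE: with write.table()'s defaults (row.names = TRUE together with col.names = TRUE) the header row has one fewer field than the data rows, which shifts the column headings when the file is opened in a spreadsheet. fileEncoding = "UTF-8" makes the output encoding explicit.

```r
# Hypothetical helper: flatten line breaks in tweet text, then write a UTF-8 CSV
write_tweets_csv <- function(df, path) {
  # Replace each run of CR/LF characters with a single space so that
  # every tweet occupies exactly one physical line in the output file
  df$text <- gsub("[\r\n]+", " ", df$text)
  # row.names = FALSE keeps the header and the data rows the same width
  write.csv(df, path, row.names = FALSE, fileEncoding = "UTF-8")
  invisible(path)
}

# Usage with a made-up data frame standing in for twListToDF() output
tweets <- data.frame(
  id   = c("1", "2"),
  text = c("single-line tweet", "first line\r\nsecond line\nthird line"),
  stringsAsFactors = FALSE
)
path <- write_tweets_csv(tweets, tempfile(fileext = ".csv"))
print(length(readLines(path)))  # header plus one line per tweet
print(read.csv(path, stringsAsFactors = FALSE)$text)
```

In the script above this would replace the write.table() call, e.g. write_tweets_csv(onpolidf, paste0('Tweets/', format(Sys.time(), "%m-%d-%H-%M"), '.csv')). Note that write.csv() ignores append, so each run writes its own timestamped file, which matches how the file name is already built. Flattening line breaks does change the tweet text slightly, so keep an unmodified copy if the original formatting matters for the analysis.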

The output of sessionInfo() is:

R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] twitteR_1.1.9

loaded via a namespace (and not attached):
[1] bit_1.1-12     httr_1.2.1     compiler_3.4.1 rjson_0.2.15   R6_2.2.2       DBI_0.7        tools_3.4.1    curl_2.7
[9] yaml_2.1.16    bit64_0.9-7    openssl_0.9.6
