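I'm trying to compare two text files and write any differences to a log file. I used the diffr library with the commands below, but the comparison result is displayed in RStudio's Viewer tab. Can anyone help me with better code to compare the text files and list the differences?
Also, how would I compare files in a loop, since I have saved multiple files for the same query in different environments?
Code: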
library(diffr)
setwd("C:/Users/squraishi/Desktop/OnDemand/R_ExtractDataSnapshot/Results")
prod_file <- read.csv2(file = "F_Query_Prod_7 .txt", header = TRUE, sep = "")
beta_file <- read.csv2(file = "F_Query_Beta_7 .txt", header = TRUE, sep = "")
diffr("F_Query_Prod_7 .txt", "F_Query_Beta_7 .txt", contextSize = 0, minJumpSize = 500)
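Answer 0 (score: 2)
diffr is an htmlwidget package, so it isn't going to give you the output back, but it is based on a JavaScript library which is in turn based on a Python module.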
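We'll use the Python version, but we won't use the reticulate package, because I'm not going to show how to iterate over Python structures in R. Instead, we'll follow the pointer on the Python page to a script located at Tools/scripts/diff.py and fetch that script from GitHub, to avoid trying to find it on your system. This does mean Python needs to be installed. Python 3, to be precise (since it's a fragile, fragmented ecosystem).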
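# download the diff script to a temporary file
tf <- tempfile(fileext = ".py")
# note: on.exit() is meant for use inside a function; at top level it will
# not defer the cleanup, so call unlink(tf) yourself when you are done
on.exit(unlink(tf), add = TRUE)
writeLines(
readLines("https://raw.githubusercontent.com/python/cpython/master/Tools/scripts/diff.py"),
tf
)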
Now we'll find the python3 binary and the pip3 binary on your system:
python <- Sys.which("python3")
pip <- Sys.which("pip3")
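Sys.which() returns an empty string for any program it cannot find, so it can be worth failing early rather than passing "" to system2(). A minimal sketch of such a guard; the helper name find_program is hypothetical and not part of the original answer:

```r
# Hypothetical helper: look a program up on the PATH and stop with a
# clear message if it is missing. Sys.which() returns "" in that case.
find_program <- function(name) {
  path <- unname(Sys.which(name))
  if (!nzchar(path)) {
    stop("could not find '", name, "' on the PATH")
  }
  path
}
```

It could then be used as python <- find_program("python3") and pip <- find_program("pip3").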
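And make sure the datetime module the script relies on is available. It ships with the Python 3 standard library, so this step is normally unnecessary: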
# just in case you don't have it
system2(command = pip, args = c("install", "datetime"))
Now run the diff on my two made-up files:
system2(
command = python,
args = c(
tf,
path.expand("~/Data/so.txt"),
path.expand("~/Data/so1.txt")
),
stdout = TRUE
) -> res
And look at the output, which you now need to parse:
res
## [1] "*** /Users/bob/Data/so.txt\t2018-10-15T06:38:07.169832-04:00"
## [2] "--- /Users/bob/Data/so1.txt\t2018-10-18T08:50:51.745551-04:00"
## [3] "***************"
## [4] "*** 6,29 ****"
## [5] " QX = X-ray|NRW"
## [6] " UI = Q000000981"
## [7] " "
## [8] "- *NEWRECORD"
## [9] "- RECTYPE = Q"
## [10] "- SH = analogs & derivatives"
## [11] "- QE = ANALOGS"
## [12] "- QA = AA"
## [13] "- QT = 1"
## [14] "- "
## [15] "- *NEWRECORD"
## [16] "- RECTYPE = Q"
## [17] "- SH = abnormalities"
## [18] "- QE = ABNORM"
## [19] "- QX = agenesis|NRW"
## [20] "- QX = anomalies|EQV"
## [21] "- QX = aplasia|NRW"
## [22] "- QX = atresia|NRW"
## [23] "- QX = birth defects|NRW"
## [24] "- QX = congenital defects|NRW"
## [25] "- QX = defects|NRW"
## [26] "- QX = deformities|NRW"
## [27] "- QX = hypoplasia|NRW"
## [28] "- UI = Q000002"
## [29] "--- 6,8 ----"
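As a rough idea of what that parsing could look like: in this context-diff format, body lines carry a two-character prefix ("- " for lines only in the first file, "+ " for lines only in the second, "! " for changed lines), while file headers and hunk markers start with "*** " or "--- ". The sketch below relies only on those prefixes; the function name parse_context_diff and the small demo vector are made up for illustration.

```r
# Hypothetical helper: split context-diff output into removed, added and
# changed lines, based on the two-character line prefixes.
parse_context_diff <- function(res) {
  pick <- function(prefix) {
    hits <- res[startsWith(res, prefix)]
    substring(hits, nchar(prefix) + 1L)  # drop the prefix itself
  }
  list(
    removed = pick("- "),  # "--- " header lines do not match "- "
    added   = pick("+ "),
    changed = pick("! ")
  )
}

# Made-up demo input in the same shape as the output above
res_demo <- c(
  "*** a.txt\t2018-10-15",
  "--- b.txt\t2018-10-18",
  "***************",
  "*** 1,4 ****",
  "  UI = Q000000981",
  "- *NEWRECORD",
  "- RECTYPE = Q",
  "--- 1,2 ----",
  "  UI = Q000000981"
)
parsed <- parse_context_diff(res_demo)
parsed$removed
# [1] "*NEWRECORD" "RECTYPE = Q"
```

The resulting vectors can be written to a log file with writeLines().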
Having done all of that ^^, you can also just use tools::Rdiff():
(res <- tools::Rdiff("~/Data/so.txt", "~/Data/so1.txt", Log=TRUE))
## $status
## [1] 1
##
## $out
## [1] "files differ in number of lines" "9,29d8"
## [3] "< *NEWRECORD" "< RECTYPE = Q"
## [5] "< SH = analogs & derivatives" "< QE = ANALOGS"
## [7] "< QA = AA" "< QT = 1"
## [9] "< " "< *NEWRECORD"
## [11] "< RECTYPE = Q" "< SH = abnormalities"
## [13] "< QE = ABNORM" "< QX = agenesis|NRW"
## [15] "< QX = anomalies|EQV" "< QX = aplasia|NRW"
## [17] "< QX = atresia|NRW" "< QX = birth defects|NRW"
## [19] "< QX = congenital defects|NRW" "< QX = defects|NRW"
## [21] "< QX = deformities|NRW" "< QX = hypoplasia|NRW"
## [23] "< UI = Q000002"
But I wanted to show the convoluted path first :-)
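For the looping part of the question, one possible sketch is to list the files for one environment, derive the matching file name for the other environment, and call tools::Rdiff() on each pair. The function name compare_pairs, the "_Prod_" / "_Beta_" naming convention, and the demo files are assumptions for illustration. Note that Rdiff() was written for comparing R output files, so it may normalize some content (whitespace, for instance) before comparing, and it may call a system diff utility when the files differ in length.

```r
# Hypothetical helper: compare every "<from_tag>" file in `dir` with its
# "<to_tag>" counterpart and return the Rdiff() results by file name.
compare_pairs <- function(dir, from_tag = "_Prod_", to_tag = "_Beta_") {
  # `pattern` is a regular expression; these tags contain no special characters
  from_files <- list.files(dir, pattern = from_tag, full.names = TRUE)
  results <- lapply(from_files, function(from) {
    to <- file.path(dir, sub(from_tag, to_tag, basename(from), fixed = TRUE))
    if (!file.exists(to)) return(NULL)  # no counterpart: skip this file
    tools::Rdiff(from, to, Log = TRUE)  # list with $status and $out
  })
  names(results) <- basename(from_files)
  Filter(Negate(is.null), results)
}

# Made-up demo: two pairs, one differing and one identical
demo_dir <- tempfile("rdiff_demo")
dir.create(demo_dir)
writeLines(c("a", "b"), file.path(demo_dir, "F_Query_Prod_1.txt"))
writeLines(c("a", "c"), file.path(demo_dir, "F_Query_Beta_1.txt"))
writeLines(c("x", "y"), file.path(demo_dir, "F_Query_Prod_2.txt"))
writeLines(c("x", "y"), file.path(demo_dir, "F_Query_Beta_2.txt"))

results <- compare_pairs(demo_dir)
# status is 0 when a pair matches and non-zero when it differs
statuses <- vapply(results, function(r) as.integer(r$status), integer(1))
```

Each element's out component holds the difference descriptions for that pair and can be written to a log file with writeLines().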