Question

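I am using data frames to combine several Excel files into one. Some columns are duplicated across the files. Is it possible to merge only the unique columns?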

Here is my code:

library(rJava)
library(XLConnect)

# list.files() takes a regular expression, not a glob: match .xls and .xlsx
data.files = list.files(pattern = "\\.xlsx?$")

# Read the first file
df = readWorksheetFromFile(file=data.files[1], sheet=1, check.names=FALSE)

# Loop through the remaining files and merge them into the existing data frame
for (file in data.files[-1]) {
    newFile = readWorksheetFromFile(file=file, sheet=1, check.names=FALSE)
    # merge() has no check.names argument, so it is not passed here
    df = merge(df, newFile, all = TRUE)
}

Answer 1

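First, if you apply merge() correctly there should not be any duplicate columns, as long as the duplicated columns have exactly the same name in every Excel file. By default merge() joins on all columns whose names occur in both data frames (by = intersect(names(x), names(y))), so same-named columns become join keys and appear only once. This also means that the files must share at least one column with exactly the same name, holding the values used to match the rows.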
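A small illustration of that default, using two made-up data frames (the names a, b, x, y and v are purely illustrative): a column name present in both frames becomes a join key and shows up once; a shared name is only kept twice, with the default ".x"/".y" suffixes, when it is left out of `by`.

```r
# Two hypothetical data frames that share the column "id"
a <- data.frame(id = 1:3, x = c(10, 20, 30))
b <- data.frame(id = 2:4, y = c("p", "q", "r"))

# Default by = intersect(names(a), names(b)), so "id" is the join key
m1 <- merge(a, b, all = TRUE)
names(m1)   # "id" "x" "y"   ("id" appears once)

# A shared column that is NOT in `by` is kept from both sides
# and disambiguated with the default suffixes c(".x", ".y")
a$v <- 1:3
b$v <- 4:6
m2 <- merge(a, b, by = "id", all = TRUE)
names(m2)   # "id" "x" "v.x" "y" "v.y"
```

So if duplicated columns show up after merging, either their names differ between files or `by` was restricted.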

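So I assume you want to find columns in the resulting data frame that are duplicates of each other based on their values, even though their names differ. For that you can use the following: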

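keepUnique <- function(x){
  # Nothing to compare if there are fewer than two columns
  if (ncol(x) < 2) return(x)

  # All pairs of column names, one pair per column of a 2-row matrix
  combs <- combn(names(x),2)

  # TRUE where the two columns of a pair hold identical values
  dups <- mapply(identical,
                 x[combs[1,]],
                 x[combs[2,]])

  # Drop the second column of every identical pair; the first one is kept
  drop <- combs[2,][dups]
  x[ !names(x) %in% drop ]
}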

This gives:

> mydf <- cbind(iris,iris[,3])[1:5,]
> mydf
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species iris[, 3]
1          5.1         3.5          1.4         0.2  setosa       1.4
2          4.9         3.0          1.4         0.2  setosa       1.4
3          4.7         3.2          1.3         0.2  setosa       1.3
4          4.6         3.1          1.5         0.2  setosa       1.5
5          5.0         3.6          1.4         0.2  setosa       1.4
> keepUnique(mydf)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
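A different technique that gives the same result (not part of the original answer, and keepUnique2 is a hypothetical name): duplicated() also works on lists, so turning the data frame into a list of columns flags every column equal to an earlier one in a single pass instead of comparing all choose(n, 2) pairs. Like keepUnique(), it keeps the first occurrence.

```r
# Demo frame from above: the last column duplicates Petal.Length
mydf <- cbind(iris, iris[, 3])[1:5, ]

# as.list() gives one list element per column; duplicated() then marks
# each column whose values are identical to an earlier column
keepUnique2 <- function(x) x[!duplicated(as.list(x))]

res <- keepUnique2(mydf)
names(res)
```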

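You can apply this function right after reading each file, i.e. add the line

newFile <- keepUnique(newFile)

to your own code. Because keepUnique() only compares columns within the data frame it is given, columns that duplicate a column from an earlier file are only removed if you also call df <- keepUnique(df) once the loop has finished.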
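Putting it together, a sketch of the loop with both calls. Since real Excel files are not available here, the list `frames` is a hypothetical stand-in for the readWorksheetFromFile() results; in the real script each element would come from one file. The function is a compact copy of keepUnique() with a guard for frames with fewer than two columns.

```r
# Compact copy of keepUnique(): drops later columns whose values are
# identical to an earlier column
keepUnique <- function(x) {
  if (ncol(x) < 2) return(x)
  combs <- combn(names(x), 2)
  dups <- mapply(identical, x[combs[1, ]], x[combs[2, ]])
  x[!names(x) %in% combs[2, ][dups]]
}

# Hypothetical stand-ins for the sheets read with readWorksheetFromFile()
frames <- list(
  data.frame(id = 1:3, a = c(1, 2, 3)),
  data.frame(id = 1:3, b = c(1, 2, 3)),   # 'b' repeats the values of 'a'
  data.frame(id = 2:4, c = c(5, 6, 7))
)

df <- frames[[1]]
for (newFile in frames[-1]) {
  newFile <- keepUnique(newFile)          # duplicates within one file
  df <- merge(df, newFile, all = TRUE)    # joins on the shared "id" column
}
df <- keepUnique(df)                      # duplicates across files

names(df)   # "id" "a" "c"
```

Note that identical() is strict: an integer column and a double column with the same numbers, or the same values with NAs in different rows, are treated as different.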
