Compare one CSV file against multiple CSV files and write new CSV files in R

Time: 2018-01-27 15:24:42

Tags: r loops

I am quite new to loops in R, so I apologize if this has been asked elsewhere.

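Read all 30 CSV files -> compare the species in File A against the species in each of the 30 other CSV files -> write a new CSV file for each of the 30 files, containing only the matching species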

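File A has a single column ($name) listing 190 species. Each of the other 30 CSV files has a column ($SBSname) with a varying number of species, anywhere from 100 to 500, including duplicates (so a CSV file can have more than 190 rows). But I don't know how to write the code...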

This is what I have so far...

I have read in all the CSV files:

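# R names cannot start with a digit, so the vector is called csv_files rather than 30files
# pattern is a regular expression, so "\\.csv$" matches names ending in .csv
csv_files = list.files(pattern="\\.csv$")
# creates one data frame per file, named after the file (e.g. branching.csv)
for (i in seq_along(csv_files)) assign(csv_files[i], read.csv(csv_files[i]))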

I only have code for comparing a single CSV file (branching.csv) with File A:

> str(FileA)
'data.frame':   190 obs. of  1 variable:
 $ name: Factor w/ 190 levels "Acaena novae zelandiae",..: 1 2 3 4 5 6 7 8 9 10 ...

> str(branching.csv)
'data.frame':   4055 obs. of  7 variables:
 $ SBSname              : Factor w/ 2877 levels "Abies alba","Abies nordmanniana",..: 794 2075 1049 162 132 333 541 1840 272 1553 ...
 $ SBS.number        : int  16443 26711 40171 40398 40867 41151 37871 42412 35847 36245 ...
 $ general.method    : Factor w/ 5 levels "derivation from morphologies or other plant traits",..: 3 1 2 2 2 2 2 2 2 2 ...
 $ branching         : Factor w/ 2 levels "no","yes": 2 2 1 1 1 1 1 1 1 1 ...
 $ valid             : int  1 1 1 1 1 1 1 1 1 1 ...
 $ reference         : Factor w/ 6 levels "Barkman, J.J.(1988): New systems of plant growth forms and phenological plant types",..: 1 1 3 3 3 3 3 3 3 3 ...
 $ original.reference: Factor w/ 97 levels "Aarssen, L.W. (1981): The biology of Canadian weeds. 50. Hypochoeris radicata L.",..: 9 9 20 3 3 3 3 3 33 33 ...

Species<-branching.csv[(branching.csv$SBSname %in% FileA$name),]
write.csv(Species, file = "Branching.csv")

> str(Species)
'data.frame':   298 obs. of  7 variables:
 $ name              : Factor w/ 2877 levels "Abies alba","Abies nordmanniana",..: 1049 162 1548 47 57 1647 1060 2788 2094 1976 ...
 $ SBS.number        : int  40171 40398 36280 40532 41629 42495 40103 32792 32892 30583 ...
 $ general.method    : Factor w/ 5 levels "derivation from morphologies or other plant traits",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ branching         : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 2 1 2 ...
 $ valid             : int  1 1 1 1 1 1 1 1 1 1 ...
 $ reference         : Factor w/ 6 levels "Barkman, J.J.(1988): New systems of plant growth forms and phenological plant types",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ original.reference: Factor w/ 97 levels "Aarssen, L.W. (1981): The biology of Canadian weeds. 50. Hypochoeris radicata L.",..: 20 3 33 33 33 33 33 44 44 44 ...
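
The 298 rows above exceed the 190 species in File A most likely because `%in%` is evaluated once per row of the trait file, so a species that appears several times in `$SBSname` is kept several times. A minimal sketch with made-up data (the `trait` data frame and all values in it are assumptions for illustration, not the real files):

```r
# Toy data standing in for FileA and one trait file (values are made up)
FileA <- data.frame(name = c("Abies alba", "Acaena novae zelandiae"),
                    stringsAsFactors = FALSE)
trait <- data.frame(SBSname = c("Abies alba", "Abies alba", "Abies nordmanniana"),
                    branching = c("yes", "no", "yes"),
                    stringsAsFactors = FALSE)

# %in% returns one TRUE/FALSE per row of trait, so duplicated species are all kept
keep <- trait$SBSname %in% FileA$name
Species <- trait[keep, ]

nrow(Species)                     # number of matching rows (duplicates included)
length(unique(Species$SBSname))   # number of distinct matching species
```

Comparing the two counts on the real data would show how many of the 190 species were actually found in a given file.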

Any help or suggestions would be great. It doesn't have to be a loop!

1 Answer:

Answer 0 (score: 0)

How about this simple loop?

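library(dplyr)
# names of the 30 trait files (R names cannot start with a digit, so not 30files);
# assumes the working directory holds only those files
csv_files = list.files(pattern="\\.csv$")
for(i in seq_along(csv_files))
{
   # keep only the rows whose SBSname also appears in FileA$name
   csv.matching = read.csv(csv_files[i]) %>% inner_join(FileA, by=c("SBSname"="name"))
   # e.g. branching.csv -> branching_matching.csv
   write.csv(csv.matching, file=sub("\\.csv$", "_matching.csv", csv_files[i]), na="")
}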
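
Since the question says it does not have to be a loop, the same idea can be sketched in base R with a small function applied to every file. This is only an illustration under assumptions: `write_matching`, `trait_files`, `demo_dir` and the toy data are hypothetical names and values, and the demo writes into a temporary directory rather than the real folder.

```r
# Hypothetical helper: filter one trait file against the reference species
# and write "<name>_matching.csv" next to it. Returns the output path.
write_matching <- function(path, reference_names, key = "SBSname") {
  trait <- read.csv(path, stringsAsFactors = FALSE)
  matched <- trait[trait[[key]] %in% reference_names, , drop = FALSE]
  out <- sub("\\.csv$", "_matching.csv", path)
  write.csv(matched, file = out, row.names = FALSE, na = "")
  out
}

# Self-contained demo in a temporary directory with made-up data
demo_dir <- tempfile("traits_")
dir.create(demo_dir)
write.csv(data.frame(SBSname = c("Abies alba", "Abies alba", "Abies nordmanniana"),
                     branching = c("yes", "no", "yes")),
          file.path(demo_dir, "branching.csv"), row.names = FALSE)
write.csv(data.frame(SBSname = c("Acaena novae zelandiae", "Abies nordmanniana"),
                     height = c(0.1, 40)),
          file.path(demo_dir, "height.csv"), row.names = FALSE)

FileA <- data.frame(name = c("Abies alba", "Acaena novae zelandiae"),
                    stringsAsFactors = FALSE)

# List the inputs once, before writing, so outputs are not picked up as inputs
trait_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
out_files <- vapply(trait_files, write_matching, character(1),
                    reference_names = FileA$name)
```

Unlike `inner_join`, the `%in%` filter never adds columns or multiplies rows if File A itself contains repeated names, which is why it is used in this sketch.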