I have the following CSV file:
Evaluator 5 9 2 8
Parser 10 5 16 2
Tokenizer 19 3 7 10
I would like to read it in as the following two columns:
Evaluator 5
Parser 10
Tokenizer 19
Evaluator 9
Parser 5
Tokenizer 3
Evaluator 2
Parser 16
Tokenizer 7
Evaluator 8
Parser 2
Tokenizer 10
How can I do this in R?
Answer 0 (score: 2)
We can take advantage of R's recycling here. Read the CSV into R as is, and then reshape it with
data.frame(V1 = df$V1, V2 = unlist(df[-1]))
# V1 V2
# Evaluator 5
# Parser 10
# Tokenizer 19
# Evaluator 9
# Parser 5
# Tokenizer 3
# Evaluator 2
# Parser 16
# Tokenizer 7
# Evaluator 8
# Parser 2
# Tokenizer 10
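where V1 is the first column of the data frame. unlist(df[-1]) stacks the remaining columns one after another into a single vector of 12 values, and R recycles the 3-element V1 to match that length.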
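The answer assumes df has already been read in. A minimal sketch of that step, assuming the file is whitespace-separated with no header row (as the sample suggests, despite the .csv name); text = txt stands in for the real file path, and the name long is only illustrative:

```r
# Assumption: whitespace-separated file, no header row.
# For a real file, replace text = txt with the file path.
txt <- "Evaluator 5 9 2 8
Parser 10 5 16 2
Tokenizer 19 3 7 10"
df <- read.table(text = txt, header = FALSE, stringsAsFactors = FALSE)

# unlist() stacks V2..V5 column by column into 12 values;
# the 3-element V1 is recycled 4 times to match that length.
long <- data.frame(V1 = df$V1,
                   V2 = unlist(df[-1], use.names = FALSE),
                   stringsAsFactors = FALSE)
long
```

If the file really is comma-separated, read.csv(path, header = FALSE) would be the counterpart of read.table here.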
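If you also need to sort each group in descending order, you can create a grouping variable and arrange by it. Each group has as many rows as there are entries in V1 (3 in this case), and we sort in descending order within each group.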
library(dplyr)
data.frame(V1 = df$V1 , V2 = unlist(df[-1])) %>%
arrange(rep(1:(n()/length(df$V1)), each = length(df$V1)), -V2)
# V1 V2
#1 Tokenizer 19
#2 Parser 10
#3 Evaluator 5
#4 Evaluator 9
#5 Parser 5
#6 Tokenizer 3
#7 Parser 16
#8 Tokenizer 7
#9 Evaluator 2
#10 Tokenizer 10
#11 Evaluator 8
#12 Parser 2
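For readers without dplyr, the same grouped descending sort can be sketched in base R with order(). The group key uses the same rep(..., each = 3) idea; k, grp and sorted are illustrative names, and the long data is rebuilt inline so the snippet runs on its own:

```r
long <- data.frame(
  V1 = rep(c("Evaluator", "Parser", "Tokenizer"), 4),
  V2 = c(5L, 10L, 19L, 9L, 5L, 3L, 2L, 16L, 7L, 8L, 2L, 10L),
  stringsAsFactors = FALSE
)
k <- 3                                         # rows in the original wide data
grp <- rep(seq_len(nrow(long) / k), each = k)  # 1 1 1 2 2 2 3 3 3 4 4 4
# Sort by group first, then by V2 descending within each group
sorted <- long[order(grp, -long$V2), ]
rownames(sorted) <- NULL
sorted
```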
Or, using gather (from tidyr):
library(dplyr)
library(tidyr)
df %>%
gather(Type, Value, -V1) %>%
arrange(Type, -Value) %>%
select(-Type)
# V1 Value
#1 Tokenizer 19
#2 Parser 10
#3 Evaluator 5
#4 Evaluator 9
#5 Parser 5
#6 Tokenizer 3
#7 Parser 16
#8 Tokenizer 7
#9 Evaluator 2
#10 Tokenizer 10
#11 Evaluator 8
#12 Parser 2
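gather() has since been superseded in tidyr by pivot_longer(). If you would rather avoid packages altogether, a rough base R counterpart uses stats::reshape(); this is a swapped-in technique, not part of the answer, and value_cols, long and res are illustrative names:

```r
df <- data.frame(V1 = c("Evaluator", "Parser", "Tokenizer"),
                 V2 = c(5L, 10L, 19L), V3 = c(9L, 5L, 3L),
                 V4 = c(2L, 16L, 7L), V5 = c(8L, 2L, 10L),
                 stringsAsFactors = FALSE)
value_cols <- c("V2", "V3", "V4", "V5")
# Wide -> long: one row per (V1, source column), source name kept in Type
long <- reshape(df, direction = "long",
                varying = value_cols, v.names = "Value",
                timevar = "Type", times = value_cols, idvar = "V1")
# Same ordering as arrange(Type, -Value), then drop Type
res <- long[order(long$Type, -long$Value), c("V1", "Value")]
rownames(res) <- NULL
res
```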
Data
df <- structure(list(V1 = structure(1:3, .Label = c("Evaluator", "Parser",
"Tokenizer"), class = "factor"), V2 = c(5L, 10L, 19L), V3 = c(9L,
5L, 3L), V4 = c(2L, 16L, 7L), V5 = c(8L, 2L, 10L)), .Names = c("V1",
"V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA,
-3L))
Answer 1 (score: 1)
We can read the CSV file and then stack the columns with rbind:
# Wide data as read from the file
df1 <- data.frame(type=c("Evaluator", "Parser", "Tokenizer"),
v1=c(5, 10, 19),
v2=c(9, 5, 3),
v3=c(2, 16, 7),
v4=c(8, 2, 10), stringsAsFactors=FALSE)
# Empty long-format data frame to append to
df2 <- data.frame(type=character(), value=numeric(), stringsAsFactors=FALSE)
# Use a name that does not mask base::names()
col_names <- c("type", "value")
# Append the type column paired with each value column in turn
df2 <- rbind(df2, setNames(df1[, c(1,2)], col_names))
df2 <- rbind(df2, setNames(df1[, c(1,3)], col_names))
df2 <- rbind(df2, setNames(df1[, c(1,4)], col_names))
df2 <- rbind(df2, setNames(df1[, c(1,5)], col_names))
df2
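The four rbind() calls differ only in the column index, so they can be generated instead of written out. A sketch using lapply() and do.call(rbind, ...), assuming the value columns are everything after the first; pieces is an illustrative name:

```r
df1 <- data.frame(type = c("Evaluator", "Parser", "Tokenizer"),
                  v1 = c(5, 10, 19), v2 = c(9, 5, 3),
                  v3 = c(2, 16, 7), v4 = c(8, 2, 10),
                  stringsAsFactors = FALSE)
# One two-column piece (type, value) per value column 2..ncol(df1)
pieces <- lapply(2:ncol(df1), function(j) {
  setNames(df1[, c(1, j)], c("type", "value"))
})
# Bind all pieces in column order
df2 <- do.call(rbind, pieces)
rownames(df2) <- NULL
df2
```

This scales to any number of value columns without editing the code.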
Answer 2 (score: 1)
This is not the smartest approach, but you can do it this way:
df <- structure(list(Data = c("Evaluator", "Parser", "Tokenizer"),
A = c(5L, 10L, 19L), B = c(9L, 5L, 3L), C = c(2L, 16L, 7L
), D = c(8L, 2L, 10L)), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"), spec = structure(list(cols = list(Data =
structure(list(), class = c("collector_character",
"collector"))), class = "col_spec"))
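library(reshape2)
# Stack columns A..D under the Data labels (Data is the id column)
df <- melt(df, id.vars = "Data")
# Drop the 'variable' column, keeping Data and value
df[-2]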
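Base R's stack() does a similar column-by-column stacking without reshape2. A sketch, assuming a plain data.frame version of the same data; s and out are illustrative names:

```r
df <- data.frame(Data = c("Evaluator", "Parser", "Tokenizer"),
                 A = c(5L, 10L, 19L), B = c(9L, 5L, 3L),
                 C = c(2L, 16L, 7L), D = c(8L, 2L, 10L),
                 stringsAsFactors = FALSE)
# stack() returns 'values' (A, then B, C, D) and an 'ind' factor naming the source column
s <- stack(df[-1])
# Repeat the labels once per stacked column, mirroring melt(df)[-2]
out <- data.frame(Data = rep(df$Data, times = ncol(df) - 1),
                  value = s$values,
                  stringsAsFactors = FALSE)
out
```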