Question

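I'm trying to learn R, but I'm stuck on something that seems simple. I know SQL, so the easiest way for me to express my question is in that language. Could someone help me translate from SQL to R?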

I've worked out that this:

    SELECT col1, sum(col2) FROM table1 GROUP BY col1

translates to:

    aggregate(x=table1$col2, by=list(table1$col1), FUN=sum)

I've also worked out that this:

    SELECT col1, col2 FROM table1 GROUP BY col1, col2

translates to:

    unique(table1[,c("col1","col2")])

But what is the translation of this?

    SELECT col1 FROM table1 GROUP BY col1

For some reason, the "unique" function seems to switch to a different return type when it is given a single column, so it doesn't work the way I expect.

-TC

Answer 1

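I'm guessing you're referring to the fact that calling unique on a vector returns a vector rather than a data frame. Here are some examples that may be helpful: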

    #Some example data
    dat <- data.frame(x = rep(letters[1:2],times = 5),
                      y = rep(letters[3:4],each = 5))
    > dat
       x y
    1  a c
    2  b c
    3  a c
    4  b c
    5  a c
    6  b d
    7  a d
    8  b d
    9  a d
    10 b d
    > unique(dat)
      x y
    1 a c
    2 b c
    6 b d
    7 a d
    #Unique => vector
    > unique(dat$x)
    [1] "a" "b"
    #Same thing
    > unique(dat[,'x'])
    [1] "a" "b"
    #drop = FALSE preserves the data frame structure
    > unique(dat[,'x',drop = FALSE])
      x
    1 a
    2 b
    #Or you can just convert it back (although the default column name is ugly)
    > data.frame(unique(dat$x))
      unique.dat.x.
    1             a
    2             b
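
Applied to the query in the question, a minimal sketch might look like the following. The contents of table1 here are made up for illustration, since the question doesn't show the actual data; only the column names col1 and col2 come from the question.

```r
# Hypothetical stand-in for table1 (the real data is not shown in the question)
table1 <- data.frame(col1 = c("a", "b", "a", "b"),
                     col2 = c(1, 2, 3, 4))

# SELECT col1 FROM table1 GROUP BY col1
# drop = FALSE keeps the one-column selection as a data frame
res <- unique(table1[, "col1", drop = FALSE])

# List-style indexing (no comma) also keeps the data frame structure
res2 <- unique(table1["col1"])

# SELECT col1, sum(col2) FROM table1 GROUP BY col1
# The formula interface keeps the original column names
sums <- aggregate(col2 ~ col1, data = table1, FUN = sum)
```

Both res and res2 should come back as one-column data frames named col1, which is probably the closest match to what the SQL returns.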

Answer 2

If you know SQL, try the packages sqldf and data.table.
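
As a rough sketch of what that could look like, assuming both packages are installed from CRAN (the table1 contents below are invented for illustration):

```r
# Hypothetical stand-in for table1 from the question
table1 <- data.frame(col1 = c("a", "b", "a", "b"),
                     col2 = c(1, 2, 3, 4))

# sqldf: run the SQL from the question directly against the data frame
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  res_sql <- sqldf("SELECT col1 FROM table1 GROUP BY col1")
}

# data.table: grouping is expressed with the by argument
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  dt <- as.data.table(table1)
  # SELECT col1 FROM table1 GROUP BY col1
  res_dt <- unique(dt[, list(col1)])
  # SELECT col1, sum(col2) FROM table1 GROUP BY col1
  sums_dt <- dt[, list(col2 = sum(col2)), by = col1]
}
```

With sqldf the result is an ordinary data frame, so the single-column case needs no special handling. The requireNamespace checks are only there so the snippet is skipped quietly when a package is missing.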
