Difference between apply and sapply for data frame columns?

Time: 2016-08-23 15:32:57

Tags: r dataframe apply sapply

Could someone please explain the differences between how apply() and sapply() operate on the columns of a data frame?

For example, when attempting to find the class of each column in a data frame, my first inclination is to use apply on the columns:

> apply(iris, 2, class)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
 "character"  "character"  "character"  "character"  "character" 

This is not correct, however, as some of the columns are numeric:

> class(iris$Petal.Length)
[1] "numeric"

A quick search on Google turned up this solution for the problem which uses sapply instead of apply:

> sapply(iris, class)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

In this case, sapply is implicitly converting iris to a list, and then applying the function to each entry in the list, e.g.:

> class(as.list(iris)$Petal.Length)
[1] "numeric"
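The list behaviour described above can be checked directly. This is a minimal sketch, assuming only base R and the built-in iris data set; the name column_classes is illustrative, and vapply is used as a stricter alternative to sapply rather than anything the original solution relied on.

```r
# A data frame is stored as a list of column vectors, so list-oriented
# functions visit each column with its original type intact.
is.list(iris)

# vapply() works like sapply() but requires the result type to be declared;
# here every column is expected to yield exactly one class name.
column_classes <- vapply(iris, function(col) class(col)[1], character(1))
column_classes

# lapply() returns the same information as a list instead of a vector.
lapply(iris, class)
```

Because each column is handed to the function as its own vector, no cross-column coercion takes place.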

What I'm still unclear about is why my original attempt using apply didn't work.

1 answer:

Answer 0 (score: 3)

As often seems to be the case, I figured out the answer to my question in the process of writing it up. I'm posting the answer here in case anyone else has the same question.

The documentation in ?apply states:

If ‘X’ is not an array but an object of a class with a non-null ‘dim’ value (such as a data frame), ‘apply’ attempts to coerce it to an array via ‘as.matrix’ if it is two-dimensional (e.g., a data frame) or via ‘as.array’.

So just as sapply treats the data frame as a list before operating on it, apply coerces the data frame to a matrix. A matrix can hold only one type, and because at least one column (Species) is non-numeric, as.matrix converts every column to character data:

> class(as.matrix(iris)[,'Petal.Length'])
[1] "character"
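The coercion can be seen in isolation, along with the case where apply does report numeric columns. This is a minimal sketch, assuming only base R and the built-in iris data set; the names m and numeric_only are illustrative.

```r
# as.matrix() must pick a single type for the whole matrix; the factor
# column Species forces every value to character.
m <- as.matrix(iris)
typeof(m)

# With only the four numeric columns, the matrix stays numeric, so apply()
# sees numeric vectors when it walks the columns.
numeric_only <- iris[, 1:4]
typeof(as.matrix(numeric_only))
apply(numeric_only, 2, class)
```

So apply over columns is only safe for type inspection when every column already shares one type; for mixed-type data frames, the list-based functions (sapply, lapply) preserve each column's class.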