Question

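I have a data matrix (900 columns and 5000 rows) that I want to run a PCA on.

The matrix looks fine in Excel (meaning all values are quantitative), but after I read the file into R and try to run the PCA code, I get an error saying "The following variables are not quantitative", followed by a list of non-quantitative variables.

So in general, some variables are quantitative and some are not. See the following example. When I check variable 1, it is correct and quantitative (some variables in the file, seemingly at random, are quantitative like this one). When I check variable 2, it is incorrect and non-quantitative (some variables in the file, seemingly at random, are non-quantitative like this one).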

> data$variable1[1:5]
[1] -0.7617504 -0.9740939 -0.5089303 -0.1032487 -0.1245882

> data$variable2[1:5]
[1] -0.183546332959017 -0.179283451229594 -0.191165669598284 -0.187060515423038
[5] -0.184409474669824
731 Levels: -0.001841783473108 -0.001855956210119 ... -1,97E+05

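So my question is: how can I convert all the non-quantitative variables to quantitative?

Shortening the file does not help, because the values themselves are quantitative. I don't know what is going on. Here is the link to my original file: https://docs.google.com/file/d/0BzP-YLnUNCdwakc4dnhYdEpudjQ/edit

I have also tried the answers given below, but they still did not help.

So let me show exactly what I did: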

> data <- read.delim("file.txt", header=T)
> res.pca = PCA(data, quali.sup=1, graph=T)
Error in PCA(data, quali.sup = 1, graph = T) :
The following variables are not quantitative:  batch
The following variables are not quantitative:  target79
The following variables are not quantitative:  target148
The following variables are not quantitative:  target151
The following variables are not quantitative:  target217
The following variables are not quantitative:  target266
The following variables are not quantitative:  target515
The following variables are not quantitative:  target530
The following variables are not quantitative:  target587
The following variables are not quantitative:  target620
The following variables are not quantitative:  target730
The following variables are not quantitative:  target739
The following variables are not quantitative:  target801
The following variables are not quantitative:  target803
The following variables are not quantitative:  target809
The following variables are not quantitative:  target819
The following variables are not quantitative:  target868
The following variables a
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Answer 1

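R is treating your variables as factors, as Arun mentioned, so you end up with a data.frame (which is really a list) whose columns are factors. There are many ways to solve this; one is to convert it to a numeric matrix as follows: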

matrix <- as.numeric(as.matrix(data))
dim(matrix) <- dim(data)

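Now you can run your PCA on the matrix.

Edit:

Expanding the example a bit: the second part of Charlie's suggestion will not work. Reproduce the following session to see how it behaves: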

d <- data.frame(
 a = factor(runif(2000)),
 b = factor(runif(2000)),
 c = factor(runif(2000)))

as.numeric(d) # does not work on a list (a data frame is a list); throws an error

as.numeric(d$a) # does work, because d$a is a vector, but this is not what you are
# after. R converts the factor's level codes to numeric instead of the actual values.

(m <- as.numeric(as.matrix(d))) # this does the right thing
dim(m)                        # but m loses the dimensions and is now a vector

dim(m) <- dim(d)              # assign the dimensions of d to m

svd(m)                        # you can run the PCA function of your liking on m
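Two things to keep in mind with the matrix route: as.numeric() drops the column names, and a genuinely qualitative column (such as batch, which the question passes as quali.sup=1) would turn into NA. As a variation, here is a minimal sketch that converts only the factor columns that are meant to be numeric and leaves the data frame otherwise intact. The toy data frame and the column names target79 and target80 are stand-ins for the real file, and it assumes batch is the only column that should stay qualitative.

```r
# Toy stand-in for the data in the question (assumed shape, not the real file):
# one qualitative column plus measurement columns, one of them read as a factor.
d <- data.frame(
  batch    = factor(c("b1", "b1", "b2")),
  target79 = factor(c("0.5", "-1.25", "3")),
  target80 = c(0.1, 0.2, 0.3)
)

# Find the factor columns, but keep the qualitative one out of the conversion.
is_fac <- vapply(d, is.factor, logical(1))
to_fix <- setdiff(names(d)[is_fac], "batch")

# Go through as.character() so the labels are parsed, not the level codes.
d[to_fix] <- lapply(d[to_fix], function(x) as.numeric(as.character(x)))

str(d)  # batch is still a factor, the target columns are numeric
```

Because the result is still a data frame with its column names, it can be passed on to PCA(..., quali.sup=1) the same way as in the question.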

Answer 2

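By default (in R versions before 4.0.0), R coerces strings to factors when reading a file. This can lead to unexpected behavior. Turn this default off with: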

      read.csv(x, stringsAsFactors=F)

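You can also coerce a variable to numeric using

      newVar<-as.numeric(oldVar)  # if oldVar is still a factor this returns level codes; use as.numeric(as.character(oldVar))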
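To make the difference concrete, here is a small sketch that reads the same tab-separated text both ways. The inline text is made up for illustration; only the value -1,97E+05 is taken from the factor levels shown in the question.

```r
# Made-up tab-separated input; the comma in "-1,97E+05" stops R from
# recognising the column as numeric when the decimal mark is ".".
txt <- "batch\ttarget79\nb1\t0.5\nb2\t-1,97E+05\nb1\t3\n"

as_fac <- read.delim(text = txt, stringsAsFactors = TRUE)
as_chr <- read.delim(text = txt, stringsAsFactors = FALSE)

class(as_fac$target79)   # "factor"
class(as_chr$target79)   # "character"

# On the factor, as.numeric() returns integer level codes, not the values.
codes <- as.numeric(as_fac$target79)

# On the character column it parses the text itself; entries that are not
# valid numbers under a "." decimal mark become NA (normally with a warning).
vals <- suppressWarnings(as.numeric(as_chr$target79))
```

So stringsAsFactors = FALSE avoids the level-code trap, but the column still needs an explicit conversion, and any value R cannot parse shows up as NA.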

Answer 3

as.numeric(as.character(data$variable2[1:5])): first use as.character to get the string representation of the factor's labels, then convert with as.numeric.
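One caveat, judging from the factor levels printed in the question: at least one value is written as -1,97E+05, with a comma where R expects a dot. as.numeric(as.character(...)) turns such an entry into NA. A hedged sketch of one way to deal with it, assuming the comma really is a decimal mark and not a thousands separator:

```r
# Three labels copied from the output in the question, wrapped in a factor.
x <- factor(c("-0.183546332959017", "-1,97E+05", "-0.001841783473108"))

labels <- as.character(x)

# Plain conversion: the comma entry cannot be parsed and becomes NA.
plain <- suppressWarnings(as.numeric(labels))
which(is.na(plain))   # positions of the values that failed to parse

# Replace the comma with a dot before converting (assumption: decimal comma).
fixed <- as.numeric(sub(",", ".", labels, fixed = TRUE))
```

Checking which(is.na(...)) after a conversion is a quick way to find the offending cells in the original file.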
