Question

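I have two variables with many levels; V1 has 400 levels and V2 has 250 levels. How can I spread the levels of V2 across several different variables, using V1 as the unique identifier?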

V1             V2
Garza, Mike    a
Garza, Mike    b
Smith, James   a 
Smith, James   f 
Smith, James   z 
Moore, Jen     b
Klein, April   f

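The data frame should look like the one below. Note that each variable holds several different factor levels, rather than there being one variable per level. Mike has two levels associated with him, so a and b go into V2 and V3; for Jen, the level b also goes into V2, not into V3.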

V1             V2 V3 V4 V5
Garza, Mike    a  b
Smith, James   a  f  z
Moore, Jen     b
Klein, April   f

Any help would be greatly appreciated!

Thanks.

Answer 1

This is a reshaping problem. Assuming df is your data.frame, you could try this:

> library(reshape2)
> print(dcast(melt(df), ...~V2), na.print="")
Using V1, V2 as id variables
Using V2 as value column: use value.var to override.
           V1 a b f z
1  Garza,Mike a b    
2 Klein,April     f  
3   Moore,Jen   b    
4 Smith,James a   f z

Answer 2

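It looks like you want, for each level of V1 (each person), a vector of that person's V2 levels. That is not really how columns in a data.frame are designed to work, even if you can do it in Excel. Instead, I would suggest simply keeping the result as one vector per person, like this:

split(df$V2, df$V1)

Returns:

$`Garza, Mike`
[1] a b
Levels: a b f z

$`Klein, April`
[1] f
Levels: a b f z

$`Moore, Jen`
[1] b
Levels: a b f z

$`Smith, James`
[1] a f z
Levels: a b f z

Without knowing your use case, I can't say whether this is really better. In my general experience, though, it tends to be easier to work with. If you only need to print the values, you can always collapse them. For example, if you save the result above to out, you can do this:

out <- split(df$V2, df$V1)
sapply(out, paste, collapse = ", ")

which gives the following, and which you can then add as a column to another output table:

 Garza, Mike Klein, April   Moore, Jen Smith, James 
      "a, b"          "f"          "b"    "a, f, z"

Or, if you want to know who has a certain group, you can do this:

sapply(out, function(x){"f" %in% x})

which gives:

 Garza, Mike Klein, April   Moore, Jen Smith, James 
       FALSE         TRUE        FALSE         TRUE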
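If the wide layout from the question is still needed, the per-person vectors from split can be padded to a common length and bound together. The following is only a sketch in base R, not part of the original answer: the names df, out, width, padded and wide are illustrative, and df is rebuilt here from the sample data in the question.

```r
# Sample data from the question (rebuilt here so the sketch is self-contained)
df <- data.frame(
  V1 = c("Garza, Mike", "Garza, Mike",
         "Smith, James", "Smith, James", "Smith, James",
         "Moore, Jen", "Klein, April"),
  V2 = c("a", "b", "a", "f", "z", "b", "f"),
  stringsAsFactors = TRUE
)

# One character vector of V2 values per person
out <- split(as.character(df$V2), df$V1)

# Widest row needed: the largest number of values any person has
width <- max(lengths(out))

# Pad every vector with "" up to that width and stack them as matrix rows
padded <- do.call(rbind, lapply(out, function(x) {
  c(x, rep("", width - length(x)))
}))

# Assemble the data frame: the id first, then the padded values
wide <- data.frame(V1 = names(out), unname(padded), stringsAsFactors = FALSE)
names(wide)[-1] <- paste0("V", seq_len(width) + 1)

print(wide)
```

With this sample data the widest person has three values, so the result has columns V1 to V4; with real data the number of value columns follows the person with the most levels.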


Answer 3

You can use dcast from the reshape2 package to do the first part, and then use apply to shift the values further into the desired output.

dat <- data.frame(V1 = factor(c("Garza", "Garza",
                          "Smith", "Smith", "Smith",
                          "Moore", "Klein")),
                  V2 = c("a","b","a","f","z","b","f"))

# recast your data (dcast comes from the reshape2 package)
library(reshape2)
dd <- dcast(dat, V1 ~ V2, value.var = "V2")

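# make a function to use with apply
shift_values <- function(x) {
  # x is one row of dd: the id in position 1, then one slot per level of V2
  notna <- which(!is.na(x[-1]))
  val <- x[notna + 1]
  # move the non-missing values to the left and pad the rest with ""
  x[-1] <- c(as.character(val), rep("", length(x) - 1 - length(val)))
  return(x)
}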

# use it in an apply loop, transpose the data, and turn it into a data.frame
result <- data.frame(t(apply(dd, 1, shift_values)))

# change the column names
colnames(result)[-1] <- paste0("V", 2:(ncol(result)))

The data then looks like this:

     V1 V2 V3 V4 V5
1 Garza  a  b      
2 Klein  f         
3 Moore  b         
4 Smith  a  f  z

R: converting a factor into new variables

3 answers: