How do I split a vector into a table with columns?

Asked: 2015-02-15 17:14:14

Tags: r dataframe

I have a vector with some missing data, and I want to convert it into a data frame with 4 columns.

I have two questions: 1. How do I split one column into multiple columns? 2. How do I account for the missing data?

Data:

# Create similar data
a <- c("building #1 Addr 01 Zip 99999","20 sq ft","23","-33 rev",
       "building #2 Addr 02 Zip 99999","30 sq ft","23",
       "building #3 Addr 03 Zip 99999","40 sq ft",
       "building #4 Addr 04 Zip 99999","50 sq ft","23","-33 rev",
       "building #5 Addr 05 Zip 99999","-33 rev",
       "building #6 Addr 06 Zip 99999","70 sq ft","23","-33 rev",
       "building #7 Addr 07 Zip 99999","80 sq ft",
       "building #8 Addr 08 Zip 99999","90 sq ft","23","-33 rev",
       "building #9 Addr 09 Zip 99999","00 sq ft")

I would like to create a table that looks like this:

# Desired output

building_id <- c("building #1 Addr 01 Zip 99999",
                 "building #2 Addr 02 Zip 99999",
                 "building #3 Addr 03 Zip 99999",
                 "building #4 Addr 04 Zip 99999",
                 "building #5 Addr 05 Zip 99999",
                 "building #6 Addr 06 Zip 99999",
                 "building #7 Addr 07 Zip 99999",
                 "building #8 Addr 08 Zip 99999",
                 "building #9 Addr 09 Zip 99999")
sqft <- c("20 sq ft","30 sq ft","40 sq ft","50 sq ft","","70 sq ft",
          "80 sq ft","90 sq ft","00 sq ft")
employees <- c("23","23","","23","","23","","23","")
# Building #5 has a "-33 rev" entry in the input vector
revenue <- c("-33 rev","","","-33 rev","-33 rev","-33 rev","","-33 rev","")

df <- data.frame(building_id,sqft,employees,revenue)


building_id                   sqft        employees revenue
building #1 Addr 01 Zip 99999 20 sq ft    23        -33 rev
building #2 Addr 02 Zip 99999 30 sq ft    23
building #3 Addr 03 Zip 99999 40 sq ft
building #4 Addr 04 Zip 99999 50 sq ft    23        -33 rev
building #5 Addr 05 Zip 99999                       -33 rev
building #6 Addr 06 Zip 99999 70 sq ft    23        -33 rev
building #7 Addr 07 Zip 99999 80 sq ft
building #8 Addr 08 Zip 99999 90 sq ft    23        -33 rev
building #9 Addr 09 Zip 99999 00 sq ft

1 Answer:

Answer 0 (score: 2)

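We can split the vector ("a") into a list ("lst") using a grouping variable that marks where each "building" element occurs in the vector (cumsum(grepl('^building', ..))). Then, for each list element, we loop with sapply() over the patterns ('building', 'sq ft', etc.) and grep each one: if the result has length 0 (no match), we assign NA, otherwise we keep the value returned by grep. Finally, we unlist the result and rbind the rows to create the dataset d1.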
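To make the grouping step concrete, here is a minimal sketch on a made-up vector (`v` is a hypothetical toy example, not the question's data). `grepl()` flags the elements that start a record, and `cumsum()` turns those flags into a running group id that `split()` can use.

```r
# Hypothetical toy vector: each record starts with an element beginning with "building"
v <- c("building A", "20 sq ft", "23",
       "building B", "-33 rev",
       "building C")

# TRUE wherever a new record starts
starts <- grepl("^building", v)

# Running count of record starts gives one group id per element
grp <- cumsum(starts)
grp
# [1] 1 1 1 2 2 3

# split() collects the elements of each record into one list entry
pieces <- split(v, grp)
lengths(pieces)
# 1 2 3
# 3 2 1
```

Records with missing fields simply end up as shorter list entries, which is why the later pattern lookup has to fill in NA itself.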

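# Start a new group at every element that begins with "building"
lst <- split(a, cumsum(grepl('^building', a)))

# For each group, look up every field by pattern; use NA when a field is missing
d1 <- do.call(rbind.data.frame, lapply(lst, function(x)
  unlist(sapply(c('building', 'sq ft', '^\\d+$', 'rev'), function(y) {
    x1 <- grep(y, x, value=TRUE)
    if(!length(x1)) NA else x1
  }))))
colnames(d1) <- c("building_id","sqft","employees","revenue")
d1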
#                    building_id     sqft employees revenue
#1 building #1 Addr 01 Zip 99999 20 sq ft        23 -33 rev
#2 building #2 Addr 02 Zip 99999 30 sq ft        23    <NA>
#3 building #3 Addr 03 Zip 99999 40 sq ft      <NA>    <NA>
#4 building #4 Addr 04 Zip 99999 50 sq ft        23 -33 rev
#5 building #5 Addr 05 Zip 99999     <NA>      <NA> -33 rev
#6 building #6 Addr 06 Zip 99999 70 sq ft        23 -33 rev
#7 building #7 Addr 07 Zip 99999 80 sq ft      <NA>    <NA>
#8 building #8 Addr 08 Zip 99999 90 sq ft        23 -33 rev
#9 building #9 Addr 09 Zip 99999 00 sq ft      <NA>    <NA>
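
The question's desired output uses empty strings rather than NA. One possible variant is sketched here, assuming each field appears at most once per building. It collects the fields into a character matrix with `vapply()` and converts it with `stringsAsFactors = FALSE` so the columns stay character. The helper `pick_field`, the `patterns` vector, and the result name `d2` are introduced for illustration only, and a shortened copy of the data is used.

```r
# Shortened copy of the question's data (buildings #1, #2, #3 and #5)
a <- c("building #1 Addr 01 Zip 99999", "20 sq ft", "23", "-33 rev",
       "building #2 Addr 02 Zip 99999", "30 sq ft", "23",
       "building #3 Addr 03 Zip 99999", "40 sq ft",
       "building #5 Addr 05 Zip 99999", "-33 rev")

# One regular expression per output column (the names become column names)
patterns <- c(building_id = "^building", sqft = "sq ft$",
              employees = "^\\d+$", revenue = "rev$")

# Hypothetical helper: first element of x matching pattern, or NA if none
pick_field <- function(pattern, x) {
  hit <- grep(pattern, x, value = TRUE)
  if (length(hit)) hit[1] else NA_character_
}

# Same grouping idea as in the answer
lst <- split(a, cumsum(grepl("^building", a)))

# vapply() yields a 4 x n character matrix; transpose to one row per building
m <- t(vapply(lst,
              function(x) vapply(patterns, pick_field, character(1), x = x),
              character(length(patterns))))
colnames(m) <- names(patterns)

d2 <- as.data.frame(m, stringsAsFactors = FALSE)

# Optional: replace NA with empty strings, as in the desired output
d2[is.na(d2)] <- ""
d2
```

Keeping NA is usually more convenient for further analysis in R; the final replacement step is only needed if the empty-string layout is required.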