Row-wise split

Time: 2019-01-04 14:11:31

Tags: r string strsplit

I have the following code:

library(tibble)
data <- tibble(job_id = c("114124", "114188", "114206"), project_skills = c("WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce", "HTML,SEO,WordPress,SEO Texte", "Illustrator,Graphic Design,Photoshop"))

This creates the following data frame:

job_id    project_skills
114124    WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce
114188    HTML,SEO,WordPress,SEO Texte
114206    Illustrator,Graphic Design,Photoshop

I need to split the comma-separated strings in the project_skills column, as shown below:

job_id    project_skills
114124    [WordPress] [XTCommerce] [Magento] [Prestashop] [VirtueMart] [osCommerce]
114188    [HTML] [SEO] [WordPress] [SEO Texte]
114206    [Illustrator] [Graphic Design] [Photoshop]

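So I would like a data frame whose rows hold the split phrases as vectors, so that I can iterate over them. Does anyone know how I can build this? Thanks in advance!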

1 answer:

Answer 0 (score: 1)

Something like this?

l <- strsplit( data$project_skills, ",")
names(l) <- data$job_id
l
# $`114124`
# [1] "WordPress"  "XTCommerce" "Magento"    "Prestashop" "VirtueMart" "osCommerce"
# 
# $`114188`
# [1] "HTML"      "SEO"       "WordPress" "SEO Texte"
# 
# $`114206`
# [1] "Illustrator"    "Graphic Design" "Photoshop"  
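
Since the question asks for the split values inside the data frame itself, here is a minimal sketch of one way to get there with base R: storing the result of strsplit() as a list-column. This is not part of the original answer. The column name skills_list is hypothetical, and the data is rebuilt with data.frame() only to keep the example self-contained.

```r
# Rebuild the example data with base R so the sketch runs on its own.
data <- data.frame(
  job_id = c("114124", "114188", "114206"),
  project_skills = c(
    "WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce",
    "HTML,SEO,WordPress,SEO Texte",
    "Illustrator,Graphic Design,Photoshop"
  ),
  stringsAsFactors = FALSE
)

# Store each row's skills as a character vector in a list-column.
# skills_list is a hypothetical name chosen for this sketch.
data$skills_list <- strsplit(data$project_skills, ",", fixed = TRUE)

# Iterate row by row; each element of skills_list is a character vector.
for (i in seq_len(nrow(data))) {
  cat(data$job_id[i], ":", length(data$skills_list[[i]]), "skills\n")
}
```

Each element is then available as data$skills_list[[i]], so the vectors can be looped over or passed to lapply() without leaving the data frame.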

A different angle, using data.table:
library( data.table )
dt <- as.data.table( data )
# determine the maximum number of skills in any row
skillmax <- max( lengths( strsplit( dt$project_skills, "," ) ) )
# add one column per skill (by reference), padding shorter rows with NA
dt[, paste0( "skill", seq_len( skillmax ) ) := tstrsplit( project_skills, ",", fill = NA )][]

#    job_id                                                project_skills      skill1         skill2    skill3
# 1: 114124 WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce   WordPress     XTCommerce   Magento
# 2: 114188                                  HTML,SEO,WordPress,SEO Texte        HTML            SEO WordPress
# 3: 114206                          Illustrator,Graphic Design,Photoshop Illustrator Graphic Design Photoshop

# skill4     skill5     skill6
# 1: Prestashop VirtueMart osCommerce
# 2:  SEO Texte       <NA>       <NA>
# 3:       <NA>       <NA>       <NA>
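
If one row per skill is easier to iterate over than one column per skill, a long layout is another option. The following is a sketch under that assumption, not something the original answer proposes. The names long and skill are hypothetical, and the table is rebuilt so the block runs on its own.

```r
library(data.table)

# Rebuild the example data so the sketch is self-contained.
dt <- data.table(
  job_id = c("114124", "114188", "114206"),
  project_skills = c(
    "WordPress,XTCommerce,Magento,Prestashop,VirtueMart,osCommerce",
    "HTML,SEO,WordPress,SEO Texte",
    "Illustrator,Graphic Design,Photoshop"
  )
)

# Split within each job_id group and flatten, giving one row per
# (job_id, skill) pair. `long` and `skill` are hypothetical names.
long <- dt[, .(skill = unlist(strsplit(project_skills, ",", fixed = TRUE))),
           by = job_id]

print(long)
```

This avoids the NA padding of the wide layout, and the skills for a single job can be pulled out with long[job_id == "114188", skill].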