Split a street address into street number and street name in R

Asked: 2014-04-10 12:14:10

Tags: r street-address stringr

I want to split street addresses into a street name and a street number in R.

My input data has a column such as:

    Street.Addresses

    205 Cape Road
    32 Albany Street 
    cnr Kempston/Durban Roads

I want to split the street number and street name into two separate columns, so that it reads:

    Street Number    Street Name
    205              Cape Road
    32               Albany Street
                     cnr Kempston/Durban Roads

Is it possible to separate the numeric values from the non-numeric entries in a factor/string in R?

Thanks

3 answers:

Answer 0 (score: 3)

You could try the following, where x is the character vector of addresses:

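    x <- c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads")
    # Split at each space that directly follows a digit; pad entries that did not split
    y <- lapply(strsplit(x, "(?<=\\d)\\b ", perl = TRUE), function(p) if (length(p) < 2) c("", p) else p)
    y <- do.call(rbind, y)
    colnames(y) <- c("Street Number", "Street Name")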

HTH
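
The question mentions that the column may be a factor, and strsplit() only accepts character vectors. The following is a minimal sketch of how the same regex could be wrapped to cope with that. The helper name split_address is hypothetical and not part of the answer, and rejoining any extra pieces into the name is an assumption about how addresses with several numbers should be handled.

```r
# Hypothetical helper (the name is an assumption): same regex as the answer,
# but it accepts factor input and always returns exactly two columns.
split_address <- function(x) {
  x <- as.character(x)  # strsplit() errors on factors, so convert first
  parts <- strsplit(x, "(?<=\\d)\\b ", perl = TRUE)
  out <- t(vapply(parts, function(p) {
    if (length(p) < 2) {
      # No split happened: leave the number empty, keep the whole string as the name
      c("", paste(p, collapse = ""))
    } else {
      # Keep the first piece as the number and rejoin the rest as the name
      c(p[1], paste(p[-1], collapse = " "))
    }
  }, character(2)))
  colnames(out) <- c("Street Number", "Street Name")
  out
}

addresses <- factor(c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads"))
split_address(addresses)
```

The result is a character matrix, as in the answer; as.data.frame() would turn it into a data frame if one is needed.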

Answer 1 (score: 3)

I'm sure someone will come along with a cool regex solution using lookaheads and so on, but this might work for you:

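    X <- c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads")
    # Flag addresses that do not start with a digit
    nonum <- grepl("^[^0-9]", X)
    # Give those a blank first field so every entry has two tab-separated fields
    X[nonum] <- paste0(" \t", X[nonum])
    # Insert a tab between the leading number and the rest of the address
    X[!nonum] <- gsub("(^[0-9]+ )(.*)", "\\1\t\\2", X[!nonum])
    # Parse the tab-separated strings as if they were lines of a file
    read.delim(text = X, header = FALSE)
    #    V1                        V2
    # 1 205                 Cape Road
    # 2  32             Albany Street
    # 3  NA cnr Kempston/Durban Roads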
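
The same idea can also be written without the round trip through read.delim(). This is only a sketch of an alternative using grepl() and sub() from base R, not the answerer's method. The column names and the rule that a number must be followed by a space are assumptions made for illustration.

```r
X <- c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads")

# Assumption: an address counts as numbered only if it starts with digits plus a space
has_num <- grepl("^[0-9]+ ", X)

result <- data.frame(
  # Everything before the first space, or NA when there is no leading number
  Street.Number = as.integer(ifelse(has_num, sub(" .*$", "", X), NA)),
  # The address with the leading number removed, or the untouched address
  Street.Name   = ifelse(has_num, sub("^[0-9]+ +", "", X), X),
  stringsAsFactors = FALSE
)
result
```

Requiring a space after the digits keeps an entry such as "5th Avenue" whole instead of cutting it after the 5.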

Answer 2 (score: 1)

Here is another way:

    df <- data.frame(Street.Addresses = c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads"),
                     stringsAsFactors = FALSE)

    new_df <- data.frame("Street.Number" = character(),
                         "Street.Name" = character(),
                         stringsAsFactors = FALSE)
    # Split every address on spaces once, outside the loop
    parts <- strsplit(df[["Street.Addresses"]], " ")
    for (i in seq_len(nrow(df))) {
      # The first token is always taken as the number, even when it is not numeric (see row 3)
      new_df[i, "Street.Number"] <- parts[[i]][1]
      new_df[i, "Street.Name"] <- paste(parts[[i]][-1], collapse = " ")
    }

    > new_df
      Street.Number           Street.Name
    1           205             Cape Road
    2            32         Albany Street
    3           cnr Kempston/Durban Roads
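
In the output above, row 3 puts "cnr" in the number column, whereas the question wants that whole entry kept in the name column. One possible adjustment, sketched here under the assumption that a street number consists of digits only, is to check the first token before moving it. The variable names first and is_num are introduced purely for illustration.

```r
df <- data.frame(Street.Addresses = c("205 Cape Road", "32 Albany Street", "cnr Kempston/Durban Roads"),
                 stringsAsFactors = FALSE)

parts <- strsplit(df$Street.Addresses, " ")

# First space-separated token of every address
first <- vapply(parts, `[`, character(1), 1)

# Assumption: only an all-digit first token is treated as a street number
is_num <- grepl("^[0-9]+$", first)

df$Street.Number <- ifelse(is_num, first, "")
df$Street.Name <- ifelse(
  is_num,
  vapply(parts, function(p) paste(p[-1], collapse = " "), character(1)),
  df$Street.Addresses  # no number found: keep the address unchanged
)
df[, c("Street.Number", "Street.Name")]
```

This version is vectorised, so it needs no explicit loop and adds the new columns directly to df.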