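Extracting information from strings using regular expressions in R

Time: 2018-08-19 10:56:59

Tags: r regex

I have data like this, and I want to extract some information from x and y: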

x = "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}"
y = '{"percent_incoming_nighttime": 0.88, "percent_outgoing_daytime": 9.29}'

Desired result

device_codename   brand     percent_incoming_nighttime percent_outgoing_daytime
nikel             Xiaomi    0.88                       9.29

I have tried using grep, but I am getting an error. Any suggestions?

grep("device_codename", x, perl=TRUE, value=TRUE)

3 answers:

Answer 0 (score: 3)

This looks like JSON. There are tools for handling that format.

library(jsonlite)

x = "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}" 
y = '{"percent_incoming_nighttime": 0.88, "percent_outgoing_daytime": 9.29}'

> unlist(fromJSON(x))
device_codename           brand 
        "nikel"        "Xiaomi" 
> unlist(fromJSON(y))
percent_incoming_nighttime   percent_outgoing_daytime 
                      0.88                       9.29
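
If the goal is the single-row table shown in the question, the two parsed results can be combined directly. The following is a minimal sketch, assuming both strings are flat JSON objects with scalar values; the name `res` is only illustrative.

```r
library(jsonlite)

x <- "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}"
y <- "{\"percent_incoming_nighttime\": 0.88, \"percent_outgoing_daytime\": 9.29}"

# fromJSON() turns each flat JSON object into a named list.
# c() joins the two lists, and as.data.frame() makes each name a column.
res <- as.data.frame(c(fromJSON(x), fromJSON(y)), stringsAsFactors = FALSE)
res
```

Because the values are not passed through unlist(), the numeric fields stay numeric instead of being coerced to character.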

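Answer 1 (score: 0)

After removing the braces ({}) and double quotes with gsub, read the substrings that follow each : into a data.frame with read.csv, then set the column names from the substrings that precede each :, i.e.: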

v1 <- gsub('"|[{}]', "", c(x, y))
out <- read.csv(text=paste(gsub("\\w+:\\s+", "", v1), collapse=", "),
       header=FALSE, stringsAsFactors = FALSE)
colnames(out) <- unlist(regmatches(v1, gregexpr("\\w+(?=:)", v1, perl = TRUE)))


out
#  device_codename   brand percent_incoming_nighttime percent_outgoing_daytime
#1           nikel  Xiaomi                       0.88                     9.29

Note: no external packages are used.
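
As for the grep attempt in the question: grep() with value = TRUE returns the whole element that matches, not the part of interest. To pull out one value by its key with base R, regexpr() and regmatches() can be used instead. This is a rough sketch, assuming the value is a quoted string containing no escaped quotes; `extract_value` is a hypothetical helper, not part of any package.

```r
x <- "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}"

# Hypothetical helper: return the quoted string value stored under `key`.
extract_value <- function(json, key) {
  # Match "key": " and then use \K (PCRE) to discard that prefix,
  # so only the characters up to the next double quote are returned.
  pattern <- paste0("\"", key, "\":\\s*\"\\K[^\"]*")
  regmatches(json, regexpr(pattern, json, perl = TRUE))
}

extract_value(x, "device_codename")
extract_value(x, "brand")
```

For anything beyond flat, well-behaved strings, a JSON parser is the safer choice.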


Or use RJSONIO with tidyverse:

library(tidyverse)
library(RJSONIO)
list(x, y) %>%
    map(~ fromJSON(.x) %>% 
            as.list %>%
            as_tibble) %>%
       bind_cols
# A tibble: 1 x 4
#  device_codename brand  percent_incoming_nighttime percent_outgoing_daytime
#  <chr>           <chr>                       <dbl>                    <dbl>
#1 nikel           Xiaomi                       0.88                     9.29

Data

x <- "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}"
y <- "{\"percent_incoming_nighttime\": 0.88, \"percent_outgoing_daytime\": 9.29}"

Answer 2 (score: 0)

A complete jsonlite solution (Roman Luštrik)

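library(jsonlite)
library(dplyr)
library(tidyr)  # spread() comes from tidyr, not dplyr

xx_x <- "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}"
xx_y <- "{\"percent_incoming_nighttime\": 0.88, \"percent_outgoing_daytime\": 9.29}"

# Parse both strings, combine them into one list, melt to long form
# (columns value and L1), then spread back into a single wide row.
c(jsonlite::fromJSON(xx_x), jsonlite::fromJSON(xx_y)) %>%
  reshape2::melt() %>%
  mutate(myrow = 1) %>%
  spread(L1, value)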

Result

  myrow  brand device_codename percent_incoming_nighttime percent_outgoing_daytime
1     1 Xiaomi           nikel                       0.88                     9.29