R gsub special characters

Time: 2018-02-21 08:46:35

Tags: r gsub

I have a data frame. One of its columns contains strings like this:

"\t\tStatus: {\\id\\:\\d6b084be-9429-4b4b-8141-1cb5f5a84d2d\\,\\device\\:\\lge LG-H955 (z2_global_com)\\,\\result\\:\\1\\,\\script\\:[{\\timestamp\\:\\1519033801850\\,\\step\\:\\step1\\,\\answer\\:\\1\\},{\\timestamp\\:\\1519033879798\\,\\step\\:\\step2\\,\\answer\\:\\1\\}]},"

I want to remove some special characters. The desired output is

Status: {"id":"d6b084be-9429-4b4b-8141-1cb5f5a84d2d","device":"lge LG-H955 (z2_global_com)","result":"1","script":[{"timestamp":"1519033801850","step":"step1","answer":"1"},{"timestamp":"1519033879798","step":"step2","answer":"1"}]}

I want to change every \ to ", remove the \t\t from the beginning, and remove the first " and the trailing , symbol.

I tried using gsub, but it does not work properly.

Update: Thanks, it works! But I have one more question. It is the same problem, only more complicated :( There are a lot of \t\t

"Script": "{\t\"id\": \"hh-d6b084be-9429-4b4b-8141-1cb5f5a84d2d\",\t\t\t\t   \"version\": \"1.0.0\",\t\t\t\t\t\"start_step\": \"step0\",\t\t\t\t\t\"script\": [\t\t\t{\t\t\t\t\"id\": \"step0\",\t\t\t\t\"text\": \"hh?\",\t\t\t\t\"interaction\": null,\t\t\t\t\t\"options\": [\t\t\t\t{ \"id\": \"1\", \"text\": \"cc \", \"action\": { \"goto\": \"step1\" } },\t\t\t\t{ \"id\": \"2\", \"text\": \"hh \", \"action\": {\"goto\": \"step2\"} }\t\t\t\t]\t\t\t},\t\t\t{\t\t\t\t\"id\": \"step1\",\t\t\t\t\"text\": \"Chh?\",\t\t\t\t\"interaction\": null,\t\t\t\t\t\"options\": [\t\t\t\t{ \"id\": \"1\", \"text\": \"hhgo hh\", \"action\": { \"goto\": \"step3\" } },\t\t\t\t{ \"id\": \"2\", \"text\": \"jj\", \"action\": { \"goto\": \"step4\" } },\t\t\t\t{ \"id\": \"3\", \"text\": \"jj aa z jj jj \", \"action\": { \"goto\": \"step5\" } }\t\t\t\t]\t\t\t},\t\t\t{\t\t\t\t\"id\": \"step2\",\t\t\t\t\"text\": \"jjj\",\t\t\t\t\"interaction\": null,\t\t\t\t\t\"options\": [\t\t\t\t{ \"id\": \"1\", \"text\": \"jjjj\", \"action\": { \"deeplink\": \"pl://app/nn\" } },\t\t\t\t{ \"id\": \"2\", \"text\": \"jj\", \"action\": { \"deeplink\":\"pl://app/xx/nn/nn\" } },\t\t\t\t{ \"id\": \"3\", \"text\": \"nnn\", \"action\": { \"finish\": \"1\" } },\t\t\t\t{ \"id\": \"4\", \"text\": \"nn\", \"action\": { \"finish\": \"1\" } }\t\t\t\t]\t\t\t},\t\t\t{\t\t\t\t\"id\": \"step3\",\t\t\t\t\t\"text\": \"nnnel. 
<a href='https://www.dd.pl/dd/dd.pdf'>  </a>\",\t\t\t\"interaction\": null,\t\t\t\t\t\"options\": [\t\t\t\t{ \"id\": \"1\", \"text\": \"fff\", \"action\": { \"deeplink\": \"pl://app/nn/apply?nn=KG&nn=*\" } },\t\t\t\t{ \"id\": \"2\", \"text\": \"hhh\", \"action\": { \"goto\": \"step6\" } },\t\t\t\t{ \"id\": \"3\", \"text\": \"hh \", \"action\": { \"goto\": \"step7\" } }\t\t\t\t]\t\t\t},\t\t   {\t\t\t\t\"id\": \"step4\",\t\t\t\t\"text\": \"hh\",\t\t\t\t\"interaction\": null,\t\t\t\t\t\"options\": [\t\t\t\t{ \"id\": \"1\", \"text\": \"ff\", \"action\": { \"deeplink\": \"https://www.k.uk/hh/hh.html\" } },\t\t\t\t{ \"id\": \"2\", \"text\": \"ss\", \"action\": { \"deeplink\": \"pl://app/ddd\" } },\t\t\t\t{ \"id\": \"3\", \"text\": \"ss\", \"action\": { \"finish\": \"1\" } },\t\t\t\t{ \"id\": \"4\", \"text\": \"ss\", \"action\": { \"finish\": \"1\" } }\t\t\t\t]\t\t\t},\t\t\t{\t\t\t\t\"id\": \"step5\",\t\t\t\t\t\"text\": \"sss?\",\t\t\t\t\t\"interaction\":  {\t\t\t\t\"type\": \"poll\",\t\t\t\t\t\t\"data\": {\t\t\t\t\t\"minimum_checked\": \"1\",\t\t\t\t\t\t\t\"maximum_checked\": \"1\",\t\t\t\t\t\t\t\"fields\": [\t\t\t\t\t{ \"id\": \"1\", \"text\": \"fff\" },\t\t\t\t\t{ \"id\": \"2\", \"text\": \"ff ff\" },\t\t\t\t\t{ \"id\": \"3\", \"text\": \"ff\" }\t\t\t\t\t\t]\t\t\t\t}\t\t\t},\t\t\t\t\"options\": [\t\t\t\t{ \"id\": \"1\", \"text\": \"dd\", \"action\": { \"goto\": \"step8\" } }\t\t\t\t]\t\t\t},\t\t\t{\t\t\t\t\"id\": \"step6\",\t\t\t\t\"text\": \"fff dd ddd dd i dd ff.\",\t\t\t\t\"interaction\": null,\t\t\t\t\t\"options\": [\t\t\t\t{ \"id\": \"1\", \"text\": \"Ok, aa\", \"action\": { \"deeplink\": \"l://app/ff\"} },\t\t\t\t{ \"id\": \"2\", \"text\": \"dddd\", \"action\": { \"deeplink\": \"ff://app/contact/ff/ff\" } },\t\t\t\t{ \"id\": \"3\", \"text\": \"ddd\", \"action\": { \"finish\": \"1\" } },\t\t\t\t{ \"id\": \"4\", \"text\": \"ddd\", \"action\": { \"finish\": \"1\" } }\t\t\t\t]\t\t\t},\t\t\t{\t\t\t\t\"id\": \"step7\",\t\t\t\t\t\"text\": 
\"ddd\",\t\t\t\t\t\"interaction\": {\t\t\t\t\"type\": \"poll\",\t\t\t\t\t\t\"data\": {\t\t\t\t\t\"minimum_checked\": \"1\",\t\t\t\t\t\t\t\"maximum_checked\": \"3\",\t\t\t\t\t\t\t\"fields\": [\t\t\t\t\t{ \"id\": \"1\", \"text\": \"dddd\" },\t\t\t\t\t{ \"id\": \"2\", \"text\": \"Kssss\" },\t\t\t\t\t{ \"id\": \"3\", \"text\": \"ss ss\" }\t\t\t\t\t\t]\t\t\t\t}\t\t\t},\t\t\t\t\"options\": [\t\t\t\t{ \"id\": \"1\", \"text\": \"ss\", \"action\": { \"goto\": \"step9\" } }\t\t\t\t]\t\t\t},\t\t\t{\t\t\t\t\"id\": \"step8\",\t\t\t\t\"text\": \"sss.\",\t\t\t\t\"interaction\": null,\t\t\t\t\t\"options\": [\t\t\t\t{ \"id\": \"1\", \"text\": \"Ok, aaa\", \"action\": { \"deeplink\": \"ss://app/call\" } },\t\t\t\t{ \"id\": \"2\", \"text\": \"sss\", \"action\": { \"deeplink\": \"ss://app/ss/ss/chat\" } },\t\t\t\t{ \"id\": \"3\", \"text\": \"ssss\", \"action\": { \"finish\": \"1\" } },\t\t\t\t{ \"id\": \"4\", \"text\": \"ss\", \"action\": { \"finish\": \"1\" } }\t\t\t\t]\t\t\t},\t\t\t{\t\t\t\t\"id\": \"step9\",\t\t\t\t\"text\": \"ss.\",\t\t\t\t\"interaction\": null,\t\t\t\t\t\"options\": [\t\t\t\t{ \"id\": \"1\", \"text\": \"Ok, aa\", \"action\": { \"deeplink\": \"ss://app/ss\" } },\t\t\t\t{ \"id\": \"2\", \"text\": \"ss\", \"action\": { \"deeplink\": \"ss://app/ss/cvc/ss\" } },\t\t\t\t{ \"id\": \"3\", \"text\": \"ss\", \"action\": { \"finish\": \"1\" } },\t\t\t\t{ \"id\": \"4\", \"text\": \"aaa\", \"action\": {\"finish\": \"1\" } }\t\t\t\t]\t\t\t}\t   ]\t}"

When I try to do this with the same code from DJack's answer

text <- gsub("\\\\",'"', gsub("\t|,$","", text))

it looks like this

"Script": "{    \"id\": \"d6b084be-9429-4b4b-8141-1cb5f5a84d2d\",    \"start_step\": \"step1\",    \"script\": [\t{            \"id\": \"step1\",            \"text\": \"ggg\",            \"interaction\": null,            \"options\": [{ \t\t\t\t\t\"id\": \"1\",\t\t\t\t\t\"text\": \"gg\",\t\t\t\t\t\"action\": {\t\t\t\t\t\t\"goto\": \"step2\"\t\t\t\t\t}\t\t\t\t}, { \t\t\t\t\t\"id\": \"2\",\t\t\t\t\t\"text\": \"gg\",\t\t\t\t\t\"action\": {\t\t\t\t\t\t\"goto\": \"step3\"\t\t\t\t\t}\t\t\t\t}, { \t\t\t\t\t\"id\": \"3\",\t\t\t\t\t\"text\": \"gg\",\t\t\t\t\t\"action\": {\t\t\t\t\t\"goto\": \"step4\"\t\t\t\t\t}\t\t\t\t}            ]       },\t\t{            \"id\": \"step2\",            \"text\": \"gg?\",            \"interaction\": null,            \"options\": [{ \t\t\t\t\t\"id\": \"1\",\t\t\t\t\t\"text\": \"gg\",\t\t\t\t\t\"action\": {\t\t\t\t\t\t\"deeplink\": \"gg://app/gg/apply?type=KG&gg=*\",\t\t\t\t\t\t\"finish\": \"1\",\t\t\t\t\t\t\"goto\": \"step10\"\t\t\t\t\t}\t\t\t\t\t},{ \t\t\t\t\t\"id\": \"2\",\t\t\t\t\t\"text\": \"gg z gg\",\t\t\t\t\t\"action\": {\t\t\t\t\t\t\"deeplink\": \"ww://app/ww\",\t\t\t\t\t\t\"finish\": \"1\",\t\t\t\t\t\t\"goto\": \"step10\"\t\t\t\t\t}\t\t\t\t\t},{ \t\t\t\t\t\"id\": \"3\",\t\t\t\t\t\"text\": \"ww gg\",\t\t\t\t\t\"action\": {\t\t\t\t\t\t\"deeplink\": \"dd://app/aa/cvc/aaa\",\t\t\t\t\t\t\"finish\": \"1\",\t\t\t\t\t\t\"goto\": \"step10\"\t\t\t\t\t}\t\t\t\t}            ]       },{            \"id\": \"step3\",            \"text\": \"ggg\",            \"interaction\": null,            \"options\": [{ \t\t\t\t\t\"id\": \"1\",\t\t\t\t\t\"text\": \"ww\",\t\t\t\t\t\"action\": {\t\t\t\t\t\t\"deeplink\": \"ww://app/ww/apply?type=KG&dd=*\",\t\t\t\t\t\t\"finish\": \"1\",\t\t\t\t\t\t\"goto\": \"step10\"\t\t\t\t\t}\t\t\t\t\t},{ \t\t\t\t\t\"id\": \"2\",\t\t\t\t\t\"text\": \"aaa\",\t\t\t\t\t\"action\": {\t\t\t\t\t\t\"deeplink\": \"ww://app/c2c\",\t\t\t\t\t\t\"finish\": \"1\",\t\t\t\t\t\t\"goto\": \"step10\"\t\t\t\t\t}\t\t\t\t\t},{ \t\t\t\t\t\"id\": 
\"3\",\t\t\t\t\t\"text\": \"aaa\",\t\t\t\t\t\"action\": {\t\t\t\t\t\t\"deeplink\": \"dd://app/aa/ss/ss\",\t\t\t\t\t\t\"finish\": \"1\",\t\t\t\t\t\t\"goto\": \"step10\"\t\t\t\t\t}\t\t\t\t}            ]       },{            \"id\": \"step4\",            \"text\": \"ddd\",            \"interaction\": null,            \"options\": [{ \t\t\t\t\t\"id\": \"1\",\t\t\t\t\t\"text\": \"ss\",\t\t\t\t\t\"action\": {\t\t\t\t\t\t\"deeplink\": \"dd://app/oneclick/apply?type=KG&profile=*\",\t\t\t\t\t\t\"finish\": \"1\",\t\t\t\t\t\t\"goto\": \"step10\"\t\t\t\t\t}\t\t\t\t\t},{ \t\t\t\t\t\"id\": \"2\",\t\t\t\t\t\"text\": \"sss\",\t\t\t\t\t\"action\": {\t\t\t\t\t\t\"deeplink\": \"dd://app/dd\",\t\t\t\t\t\t\"finish\": \"1\",\t\t\t\t\t\t\"goto\": \"step10\"\t\t\t\t\t}\t\t\t\t\t},{ \t\t\t\t\t\"id\": \"3\",\t\t\t\t\t\"text\": \"aaa\",\t\t\t\t\t\"action\": {\t\t\t\t\t\t\"deeplink\": \"dd://app/aa/cvc/aa\",\t\t\t\t\t\t\"finish\": \"1\",\t\t\t\t\t\t\"goto\": \"step10\"\t\t\t\t\t}\t\t\t\t}            ]       },\t\t{\t\t\t\"id\": \"step10\",\t\t\t\"text\": \"aaa\",\t\t\t\"interaction\": null,\t\t\t\"options\": null\t\t}]}"

When I try this

fromJSON(substr(text, 9, nchar(text)))

I get an error

Error: lexical error: invalid char in json text.
            "script": ["t{            "id": "step1",            "text"
                     (right here) ------^

2 Answers:

Answer 0: (score: 2)

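As mentioned in the comments, I am not sure what you mean by "remove the first and last "". Those quotes only mark the data type (character). Here is a solution (it uses ' instead of ", but in R the two have the same meaning):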

text <- "\t\tStatus: {\\id\\:\\d6b084be-9429-4b4b-8141-1cb5f5a84d2d\\,\\device\\:\\lge LG-H955 (z2_global_com)\\,\\result\\:\\1\\,\\script\\:[{\\timestamp\\:\\1519033801850\\,\\step\\:\\step1\\,\\answer\\:\\1\\},{\\timestamp\\:\\1519033879798\\,\\step\\:\\step2\\,\\answer\\:\\1\\}]},"

text <- gsub("\\\\","'", gsub("\t|,$","", text))

text

"Status: {'id':'d6b084be-9429-4b4b-8141-1cb5f5a84d2d','device':'lge LG-H955 (z2_global_com)','result':'1','script':[{'timestamp':'1519033801850','step':'step1','answer':'1'},{'timestamp':'1519033879798','step':'step2','answer':'1'}]}"
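Since the question mentions a data frame column, the same call can be applied to the whole column at once, because gsub is vectorized over its input. A minimal sketch, assuming a hypothetical data frame `df` with a column named `status` (both names and the shortened strings are made up for illustration):

```r
# Hypothetical data frame; the column name `status` and the shortened
# strings are assumptions, not taken from the question's real data.
df <- data.frame(
  status = c(
    "\t\tStatus: {\\id\\:\\1\\,\\result\\:\\ok\\},",
    "\t\tStatus: {\\id\\:\\2\\,\\result\\:\\fail\\},"
  ),
  stringsAsFactors = FALSE
)

# gsub() works element-wise on a character vector, so one call cleans
# every row: the inner call drops tabs and a trailing comma, the outer
# call turns each backslash into a double quote.
df$status <- gsub("\\\\", '"', gsub("\t|,$", "", df$status))

print(df$status)
```

Here '"' is used as the replacement so that the result can later be passed to a JSON parser.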

Edit based on dienow's answer

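If you are looking for valid JSON, as Dienow's answer suggests (I am not familiar with this format), the fromJSON function requires ". You can therefore adapt the code to: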

text <- gsub("\\\\",'"', gsub("\t|,$","", text))

This gives the same result as Dienow's answer:

library(jsonlite)
fromJSON(substr(text, 9, nchar(text)))

$id
[1] "d6b084be-9429-4b4b-8141-1cb5f5a84d2d"

$device
[1] "lge LG-H955 (z2_global_com)"

$result
[1] "1"

$script
      timestamp  step answer
1 1519033801850 step1      1
2 1519033879798 step2      1
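Regarding the follow-up in the question: the `"t{` in the error message suggests one plausible reading, namely that the second string already contains real double quotes and that some of its `\t` sequences are a literal backslash followed by `t` rather than a tab character. In that case the backslash-to-quote replacement is not needed and actually produces the stray `"t`. The sketch below illustrates this under that assumption; the shortened input and the names `script`, `broken` and `clean` are hypothetical.

```r
library(jsonlite)

# Hypothetical shortened input: real double quotes, real tab characters,
# and one literal backslash + "t" (written "\\t" in R source).
script <- "{\t\"id\": \"step1\",\t\"script\": [\\t{\"text\": \"ggg\"}\t]}"

# Reusing the code for the first string removes the real tabs but turns
# the literal backslash into a quote, leaving `["t{` in the text.
broken <- gsub("\\\\", '"', gsub("\t|,$", "", script))

# Instead, remove real tabs AND literal backslash-t pairs, and leave the
# double quotes untouched.
clean <- gsub("\t|\\\\t", "", script)

parsed <- fromJSON(clean)
print(parsed$id)
```

Whether this applies to the real data depends on how the column was read in; checking a few values with `cat()` or `nchar()` would show which characters are actually present.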

Answer 1: (score: 1)

s <- "\t\tStatus: {\\id\\:\\d6b084be-9429-4b4b-8141-1cb5f5a84d2d\\,\\device\\:\\lge LG-H955 (z2_global_com)\\,\\result\\:\\1\\,\\script\\:[{\\timestamp\\:\\1519033801850\\,\\step\\:\\step1\\,\\answer\\:\\1\\},{\\timestamp\\:\\1519033879798\\,\\step\\:\\step2\\,\\answer\\:\\1\\}]},"
r <- gsub("\t", "", gsub("\\\\", "\"",s))

Here is proof that the result is valid JSON:

library(jsonlite)
fromJSON(substr(r, 9, nchar(r) - 1))

This outputs

$id
[1] "d6b084be-9429-4b4b-8141-1cb5f5a84d2d"

$device
[1] "lge LG-H955 (z2_global_com)"

$result
[1] "1"

$script
      timestamp  step answer
1 1519033801850 step1      1
2 1519033879798 step2      1
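
The `substr` call above relies on the JSON starting at character 9, which holds only while the prefix is exactly "Status: ". A minimal sketch of a variant that does not hard-code the offset, assuming the JSON object begins at the first `{` and the string ends with a single trailing comma (the shortened input and the name `json` are hypothetical):

```r
library(jsonlite)

# Hypothetical shortened input in the same shape as the question's string.
s <- "\t\tStatus: {\\id\\:\\1\\,\\result\\:\\ok\\},"
r <- gsub("\t", "", gsub("\\\\", "\"", s))

# Drop everything before the first "{" and the trailing comma instead of
# counting characters by hand.
json <- sub(",$", "", sub("^[^{]*", "", r))

# validate() from jsonlite reports whether the text parses as JSON.
print(validate(json))

parsed <- fromJSON(json)
print(parsed$id)
```

This keeps working if the label in front of the object changes, for example from "Status: " to another prefix of different length.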