How can I split a string and ignore the delimiters if they are "quoted"?

Time: 2018-04-11 12:50:38

Tags: r regex strsplit

Say I have the following string:

params <- "var1 /* first, variable */, var2, var3 /* third, variable */"

I would like to split it using , as the separator and then extract the "quoted substrings", so that I get two vectors as follows:

params_clean <- c("var1","var2","var3")
params_def   <- c("first, variable","","third, variable") # note the empty string as a second element.

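I use the term "quoted" in a broad sense: an arbitrary pair of strings, here /* and */, protects the substring between them from being split.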

I found a workaround based on read.table and the fact that it allows quoted elements:

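library(magrittr)
params %>%
  gsub("/\\*", "_temp_sep_ '", .) %>%   # open a quoted field at each /*
  gsub("\\*/", "'", .) %>%              # close it at each */
  read.table(text = ., stringsAsFactors = FALSE, sep = ",") %>%
  unlist %>%
  unname %>%
  strsplit("_temp_sep_") %>%
  lapply(trimws) %>%
  lapply(`length<-`, 2) %>%             # pad each piece to length 2 with NA
  do.call(rbind, .) %>%
  inset(is.na(.), value = "")           # replace NA with ""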

But it is quite ugly and hackish. Is there a simpler way? I assume there must be a regex that can be passed to strsplit for this case.

Related to this question.

3 answers:

Answer 0 (score: 2)

You can use

library(stringr)
cmnt_rx <- "(\\w+)\\s*(/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/)?"
res <- str_match_all(params, cmnt_rx)
params_clean <- res[[1]][,2]
params_clean
## => [1] "var1" "var2" "var3"
params_def <- gsub("^/[*]\\s*|\\s*[*]/$", "", res[[1]][,3])
params_def[is.na(params_def)] <- ""
params_def
## => [1] "first, variable" ""                "third, variable"

Details of the main regex (it is actually (\w+)\s*(COMMENTS_REGEX)?):

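  • (\w+) - capturing group 1: one or more word characters
  • \s* - zero or more whitespace characters
  • ( - start of capturing group 2
  • /\* - matches the comment start, /*
  • [^*]*\*+ - matches zero or more characters other than *, followed by one or more literal *
  • (?:[^/*][^*]*\*+)* - zero or more sequences of:
    • [^/*][^*]*\*+ - a character that is neither / nor * (matched with [^/*]), followed by zero or more non-asterisk characters ([^*]*), followed by one or more asterisks (\*+)
  • / - the closing /
  • )? - end of capturing group 2, repeated one or zero times (which makes it optional).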
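
To see what the comment part of the pattern does on its own, here is a minimal sketch in base R. The names comment_rx and x are made up for illustration, and perl = TRUE is assumed so that the (?:...) group is handled by PCRE:

```r
# Comment sub-pattern from cmnt_rx above, tested on its own with base R (PCRE).
comment_rx <- "/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/"

# Hypothetical test string: the first comment contains stray asterisks.
x <- "a /* one * two ** three */ b /* four */"

# Collect every substring that the comment pattern matches.
found <- regmatches(x, gregexpr(comment_rx, x, perl = TRUE))[[1]]
print(found)
```

Each comment should come back as one whole match, including the one with asterisks in its body, because the pattern only stops at a * that is directly followed by /.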

See the regex demo.

The "^/[*]\\s*|\\s*[*]/$" pattern in gsub removes the /* and */ together with the adjacent whitespace.

The params_def[is.na(params_def)] <- "" part replaces NA with empty strings.
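
The two cleanup steps can be tried in isolation. This is only a sketch: raw_def is a hand-written stand-in for res[[1]][,3], and clean_def is a hypothetical name:

```r
# Hypothetical input mimicking res[[1]][, 3]: NA marks a parameter without a comment.
raw_def <- c("/* first, variable */", NA, "/*third*/")

# Strip the leading /* and the trailing */ along with adjacent whitespace.
clean_def <- gsub("^/[*]\\s*|\\s*[*]/$", "", raw_def)

# gsub() leaves NA untouched, so replace it explicitly.
clean_def[is.na(clean_def)] <- ""
print(clean_def)
```

The explicit NA replacement is needed because gsub() passes missing values through unchanged.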

Answer 1 (score: 1)

Answer 2 (score: 1)

You can wrap it in a function and use the (not very well documented) (*SKIP)(*FAIL) mechanism in plain R:

getparams <- function(params) {
  # split on commas that are not inside a /* ... */ comment
  tmp <- unlist(strsplit(params, "/\\*.*?\\*/(*SKIP)(*FAIL)|,", perl = TRUE))

  params_clean <- character(length(tmp))
  params_def <- character(length(tmp))

  for (i in seq_along(tmp)) {
    # get params_def if available, otherwise keep the empty string
    match <- regmatches(tmp[i], regexec("/\\*(.*?)\\*/", tmp[i]))[[1]]
    if (length(match) > 0) params_def[i] <- trimws(match[2])

    # params_clean: drop the comment, then trim whitespace
    params_clean[i] <- trimws(gsub("/\\*.*?\\*/", "", tmp[i]))
  }

  list(params_clean = params_clean, params_def = params_def)
}

params <- "var1 /* first, variable */, var2, var3 /* third, variable */"
getparams(params)

This splits the initial string using (*SKIP)(*FAIL) (see a demo on regex101.com) and then analyzes the parts.

This yields a list:

$params_clean
[1] "var1" "var2" "var3"

$params_def
[1] "first, variable" ""                "third, variable"
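
The splitting step is the core of this approach, so here it is on its own as a small sketch. The input x and the name pieces are made up for illustration; the pattern is the one used in getparams:

```r
# Hypothetical input with one commented and one plain parameter.
x <- "a /* x, y */, b"

# Left branch: match a whole comment, then (*SKIP)(*FAIL) discards that match
# and keeps the engine from retrying at any position inside it.
# Right branch: any comma that is still reached must lie outside a comment.
pieces <- strsplit(x, "/\\*.*?\\*/(*SKIP)(*FAIL)|,", perl = TRUE)[[1]]
pieces <- trimws(pieces)
print(pieces)
```

The comma inside the comment is never used as a split point, so the commented parameter stays in one piece. Note that perl = TRUE is required, since these verbs are a PCRE feature.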

Alternatively, a shorter version with sapply:

getparams <- function(params) {
  tmp <- unlist(strsplit(params, "/\\*.*?\\*/(*SKIP)(*FAIL)|,", perl = TRUE))
  # one column per parameter: row 1 is the name, row 2 the definition
  sapply(tmp, function(x) {
    match <- regmatches(x, regexec("/\\*(.*?)\\*/", x))[[1]]
    def <- if (length(match) > 0) trimws(match[2]) else ""
    clean <- trimws(gsub("/\\*.*?\\*/", "", x))
    c(clean, def)
  }, USE.NAMES = FALSE)
}

This yields a matrix:

     [,1]              [,2]   [,3]             
[1,] "var1"            "var2" "var3"           
[2,] "first, variable" ""     "third, variable"

With the latter, you can get the variable names with, e.g., result[1,], assuming the returned matrix is stored in result.
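
As a sketch of how the matrix can be taken apart, the snippet below rebuilds it by hand so that it runs on its own. In practice result would hold the return value of the sapply version of getparams, and defs is a hypothetical helper name:

```r
# Matrix copied from the output above (built by hand so the snippet runs alone).
result <- rbind(c("var1", "var2", "var3"),
                c("first, variable", "", "third, variable"))

params_clean <- result[1, ]   # variable names
params_def   <- result[2, ]   # definitions, "" where none was given

# Optional: a named vector for looking up a definition by variable name.
defs <- setNames(params_def, params_clean)
print(defs[["var3"]])
```

Row indexing gives back the two vectors asked for in the question, and the named vector is just one convenient way to pair them up.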