Problem matching a regular expression in Tcl

Asked: 2014-06-06 12:02:30

Tags: regex tcl

I am trying to split the following string:

Change 709131 on 2014/06/05 by person1

    - some description

Change 709081 on 2014/06/05 by person2

    more description

Change 708930 on 2014/06/04 by person3

    description xyz


Change 708906 on 2014/06/04 by person4

    description of change

I want to split it at each Change \d+ (that is, at Change 709081 and so on).

I am trying to use:

set abc [regexp -inline -all {Change \d+\son.*Change \d+\son} $oIfs]

but I do not get the output I want.

Edit: one way I found is

set abc [regexp -inline -all {Change.*?(?=Change)} $oIfs]

but it does not return the last part of the string.

4 answers:

Answer 0 (score: 1)

You can try this construct:

Change \d+(?:(?!\mChange\M).)+

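(?:(?!\mChange\M).)+ matches one or more characters, one at a time, stopping just before the next standalone word Change (\m and \M are Tcl's start-of-word and end-of-word anchors). Each match therefore runs from one Change \d+ header up to the next header, or to the end of the string.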
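
As a rough illustration of how that pattern could be applied, here is a minimal sketch using regexp -all -inline on the sample text from the question. The variable names s and chunks are made up for this example, and string trim is only there to tidy the printed output.

```tcl
# Sample text copied from the question.
set s {Change 709131 on 2014/06/05 by person1

    - some description

Change 709081 on 2014/06/05 by person2

    more description

Change 708930 on 2014/06/04 by person3

    description xyz


Change 708906 on 2014/06/04 by person4

    description of change}

# The group is non-capturing, so -inline returns only the whole matches.
set chunks [regexp -all -inline {Change \d+(?:(?!\mChange\M).)+} $s]

# Print each chunk with surrounding whitespace removed.
foreach chunk $chunks {
    puts "--- [string trim $chunk]"
}
```

Each chunk keeps the blank lines that follow its description, which is why the loop trims it before printing. Note that the word Change inside a description would also end a chunk early.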

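Answer 1 (score: 1)

Tcllib to the rescue: http://tcllib.sourceforge.net/doc/textutil_split.html

Because the pattern below is wrapped in capturing parentheses, splitx also returns the matched separators, so the result alternates between a Change \d+ header and the text that follows it. The lrange call drops the empty element that precedes the first header.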

package require textutil::split

set s {Change 709131 on 2014/06/05 by person1

    - some description

Change 709081 on 2014/06/05 by person2

    more description

Change 708930 on 2014/06/04 by person3

    description xyz


Change 708906 on 2014/06/04 by person4

    description of change}

foreach {chg desc} [lrange [textutil::split::splitx $s {(Change \d+)}] 1 end] {lappend changes "$chg$desc"}

set i 0
foreach chg $changes {puts "[incr i]> $chg"}
1> Change 709131 on 2014/06/05 by person1

    - some description


2> Change 709081 on 2014/06/05 by person2

    more description

3> Change 708930 on 2014/06/04 by person3

    description xyz



4> Change 708906 on 2014/06/04 by person4

    description of change

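Answer 2 (score: 1)

One way to solve the problem is to process the data line by line and build up "records". When you reach the start of a record, do something with the previous record, then reset (that is, start building a new one). Here is some suggested code: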

set data {Change 709131 on 2014/06/05 by person1

    - some description

Change 708906 on 2014/06/04 by person4

    description of change
}

proc do_something {record} {
    # Process a record, in this case, just print it out with separators
    if {[llength $record] == 0} { return }

    puts "----------------"
    foreach line $record {
        puts $line
    }
}

set record [list]
foreach line [split $data \n] {
    if {[regexp {^Change \d+} $line]} {
        # Encounter the start of a record, process the previous record
        # and start a new record
        do_something $record
        set record [list]
    }
    lappend record "$line"
}

# Process the last record
if {[llength $record] != 0} { do_something $record }
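
If the records are needed as values rather than printed, the same line-by-line idea can be wrapped in a proc that returns them. This is only a sketch: split_records is a hypothetical helper name, not part of the answer's code, and it assumes that any text before the first Change line should be kept as a record of its own.

```tcl
# Hypothetical helper: return each record as one string instead of printing it.
proc split_records {data} {
    set records [list]
    set current [list]
    foreach line [split $data \n] {
        # A header line closes the record collected so far, if there is one.
        if {[regexp {^Change \d+} $line] && [llength $current] > 0} {
            lappend records [join $current \n]
            set current [list]
        }
        lappend current $line
    }
    # Flush the last record.
    if {[llength $current] > 0} {
        lappend records [join $current \n]
    }
    return $records
}

# Made-up two-record input in the same shape as the question's data.
set demo "Change 1 on 2014/06/05 by a\n\n    first\n\nChange 2 on 2014/06/04 by b\n\n    second"
foreach rec [split_records $demo] {
    puts "--- [string trim $rec]"
}
```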

Answer 3 (score: 1)

This is a tricky regular expression, but it works on your sample data:

regexp -all -inline {(?w)^Change.*?(?:\Z|\n(?=Change))} $sampleData

Looking at the pieces of the RE itself:

(?w)             # "Weird" mode; ^ and $ are line anchored but . matches newlines
^Change          # "Change" at the start of a line...
.*?              # and as few extra characters as possible, until...
(?:              #   (start non-capturing group)
  \Z             # ... the end of the whole string...
|                # or...
  \n             # ... newline, followed by...
  (?=Change)     # ... "Change" (as zero-width lookahead)
)                #   (end non-capturing group)

With your sample data:

% regexp -all -inline {(?w)^Change.*?(?:\Z|\n(?=Change))} $sampleData
{Change 709131 on 2014/06/05 by person1

    - some description

} {Change 709081 on 2014/06/05 by person2

    more description

} {Change 708930 on 2014/06/04 by person3

    description xyz


} {Change 708906 on 2014/06/04 by person4

    description of change}

Looks good to me, assuming nobody puts "Change" at the very start of a line inside a description.
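
That caveat can be narrowed a little. The sketch below is an untested variation, not part of the answer: it assumes every real header has a digit right after "Change " (as in the sample data) and requires that digit in the lookahead. The variable names tricky, loose and strict are made up, and only the lookahead is changed so that .*? stays the first quantified atom in the pattern.

```tcl
# Made-up data: one description line of the first record starts with "Change".
set tricky "Change 1 on 2014/06/05 by a\n\n    first line\nChange the config\n\nChange 2 on 2014/06/04 by b\n\n    other"

# Original pattern: any line starting with "Change" begins a new record.
set loose [regexp -all -inline {(?w)^Change.*?(?:\Z|\n(?=Change))} $tricky]

# Assumption: real headers look like "Change <digit>...", so the lookahead demands a digit.
set strict [regexp -all -inline {(?w)^Change.*?(?:\Z|\n(?=Change \d))} $tricky]

puts "loose:  [llength $loose] records"
puts "strict: [llength $strict] records"
```

The intent is that the stricter pattern keeps the "Change the config" line inside the first record. A description line that starts with "Change" followed by a digit would still be treated as a header.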