Apostrophe in words not being recognized for string replace

Time: 2017-11-17 17:45:37

Tags: regex go regex-group

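I am having a problem replacing the word "you're" using a regular expression.

All the other words are replaced correctly; only the word "you're" is not. I think the match is not getting past the apostrophe.

I have to replace "you" with "I" and "you're" with "I'm". "you" is changed to "I" without any problem, but "you're" becomes "I're", because the match does not go past the apostrophe and for some reason treats it as the end of the word. I have to escape the apostrophe somehow.

Please see the relevant code below.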

package main

import (
    "fmt"
    "math/rand"
    "regexp"
    "strings"
    "time"
)

//Function ElizaResponse to take in and return a string
func ElizaResponse(str string) string {

    //  replace := "How do you know you are"

    /* Use MatchString to detect the isolated word "father",
     * using word boundaries and the case-insensitive flag.
     */
    // Return a fixed response if the input contains the word "father".
    if matched, _ := regexp.MatchString(`(?i)\bfather\b`, str); matched {
        return "Why don’t you tell me more about your father?"
    }
    r1 := regexp.MustCompile(`(?i)\bI'?\s*a?m\b`)

    //Match the words "I am" and capture for replacement
    matched := r1.MatchString(str)

    //condition if "I am" is matched
    if matched {

        capturedString := r1.ReplaceAllString(str, "$1")
        boundaries := regexp.MustCompile(`\b`)
        tokens := boundaries.Split(capturedString, -1)

        // List the reflections.
        reflections := [][]string{
            {`I`, `you`},
            {`you're`, `I'm`},
            {`your`, `my`},
            {`me`, `you`},
            {`you`, `I`},
            {`my`, `your`},
        }

        // Loop through each token, reflecting it if there's a match.
        for i, token := range tokens {
            for _, reflection := range reflections {
                if matched, _ := regexp.MatchString(reflection[0], token); matched {
                    tokens[i] = reflection[1]
                    break
                }
            }
        }

        // Put the tokens back together.
        return strings.Join(tokens, ``)

    }

    //Get random number from the length of the array of random struct
    //an array of strings for the random response
    response := []string{"I’m not sure what you’re trying to say. Could you explain it to me?",
        "How does that make you feel?",
        "Why do you say that?"}
    //Return a random index of the array
    return response[rand.Intn(len(response))]

}

func main() {
    rand.Seed(time.Now().UTC().UnixNano())

    fmt.Println("Im supposed to just take what you're saying at face value?")
    fmt.Println(ElizaResponse("Im supposed to just take what you're saying at face value?"))


}

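3 Answers:

Answer 0 (score: 4)

Note that the apostrophe character creates a word boundary, so using \b in your regular expression is probably what is tripping you up. That is, the string "I'm" has four word boundaries: one at the start, one on each side of the apostrophe, and one at the end.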

┏━┳━┳━┓
┃I┃'┃m┃
┗━┻━┻━┛
│ │ │ └─ end of line creates a word boundary
│ │ └─── after punctuation character creates a word boundary
│ └───── before punctuation character creates a word boundary
└─────── start of line creates a word boundary

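There is no way to change the behavior of the word boundary metacharacter, so you are probably better off mapping regular expressions that contain the complete word, punctuation included, to the desired replacement, for example: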

type Replacement struct {
  rgx *regexp.Regexp
  rpl string
}

replacements := []Replacement{
  {regexp.MustCompile("\\bI\\b"), "you"},
  {regexp.MustCompile("\\byou're\\b"), "I'm"},
  // etc...
}
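
To make the fragment above runnable, here is one possible sketch that applies such a list in order. The helper `applyReplacements` and the particular entries are assumptions chosen for illustration, not the full table from the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// Replacement pairs a compiled pattern with its replacement text.
type Replacement struct {
	rgx *regexp.Regexp
	rpl string
}

// replacements is an illustrative subset. Its entries are chosen so that
// no replacement text is matched again by a later pattern.
var replacements = []Replacement{
	{regexp.MustCompile(`\byou're\b`), "I'm"},
	{regexp.MustCompile(`\byour\b`), "my"},
}

// applyReplacements (hypothetical helper) runs each replacement over s in order.
func applyReplacements(s string) string {
	for _, r := range replacements {
		s = r.rgx.ReplaceAllString(s, r.rpl)
	}
	return s
}

func main() {
	fmt.Println(applyReplacements("what you're saying about your father"))
}
```

Because the passes run one after another, the order and choice of entries matter: a later `\bI\b` entry would also match the "I" in a freshly produced "I'm", since the apostrophe creates a boundary there. A single-pass replacement avoids that kind of chaining.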

Also note that one of your examples contains a "right single quotation mark" (U+2019, UTF-8 bytes 0xe28099), not to be confused with the ASCII apostrophe (U+0027, 0x27)!

fmt.Sprintf("% x", []byte("'’")) // => "27 e2 80 99"

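Answer 1 (score: 1)

What you want to achieve here is to replace specific strings with specific replacements. That is easier to do with a map of string keys and values, where each unique key is a literal phrase to search for and the value is the text to replace it with.

This is how you could define the reflections: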

reflections := map[string]string{
    `you're`: `I'm`,
    `your`: `my`,
    `me`: `you`,
    `you`: `I`,
    `my`: `your`,
    `I` : `you`,
}

Next, you need to get the keys sorted by length in descending order (here is some sample code):

type ByLenDesc []string
func (a ByLenDesc) Len() int {
   return len(a)
}
func (a ByLenDesc) Less(i, j int) bool {
   return len(a[i]) > len(a[j])
}
func (a ByLenDesc) Swap(i, j int) {
   a[i], a[j] = a[j], a[i]
}

Then, inside the function:

var keys []string
for key := range reflections {
    keys = append(keys, key)
}
sort.Sort(ByLenDesc(keys))

Then build the pattern:

pat := "\\b(" + strings.Join(keys, `|`) + ")\\b"
// fmt.Println(pat) // => \b(you're|your|you|me|my|I)\b

The pattern matches you're, your, you, me, my, or I as whole words.

res := regexp.MustCompile(pat).ReplaceAllStringFunc(capturedString, func(m string) string {
    return reflections[m]
})

The code above creates a regex object and replaces every match with the corresponding value from reflections.

See the Go demo.
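
For reference, the pieces of this answer can be assembled into one program roughly as follows. This is a sketch with two substitutions of my own: it uses `sort.Slice` instead of the `ByLenDesc` type (with an alphabetical tie-break so the pattern is deterministic) and passes each key through `regexp.QuoteMeta`. The names `buildPattern` and `reflectText` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// reflections maps each literal phrase to its replacement.
var reflections = map[string]string{
	`you're`: `I'm`,
	`your`:   `my`,
	`me`:     `you`,
	`you`:    `I`,
	`my`:     `your`,
	`I`:      `you`,
}

// buildPattern (hypothetical helper) joins the keys, longest first,
// into a single alternation of the form \b(k1|k2|...)\b.
func buildPattern(m map[string]string) *regexp.Regexp {
	keys := make([]string, 0, len(m))
	for key := range m {
		// QuoteMeta is an extra precaution in case a key ever contains
		// a regex metacharacter; the keys above are unchanged by it.
		keys = append(keys, regexp.QuoteMeta(key))
	}
	sort.Slice(keys, func(i, j int) bool {
		if len(keys[i]) != len(keys[j]) {
			return len(keys[i]) > len(keys[j]) // longer keys first
		}
		return keys[i] < keys[j] // alphabetical tie-break
	})
	return regexp.MustCompile(`\b(` + strings.Join(keys, `|`) + `)\b`)
}

var reflectionPattern = buildPattern(reflections)

// reflectText (hypothetical helper) replaces every match in a single pass.
func reflectText(s string) string {
	return reflectionPattern.ReplaceAllStringFunc(s, func(m string) string {
		return reflections[m]
	})
}

func main() {
	fmt.Println(reflectionPattern)
	fmt.Println(reflectText("what you're saying about your dog and me"))
}
```

The longest-first ordering is what matters here: Go tries the alternatives from left to right, so if "you" came before "you're", the shorter alternative would match first (the apostrophe supplies the closing `\b`) and "you're" would again become "I're". Since everything happens in one pass, replacement text is never matched a second time.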

Answer 2 (score: 0)

I found that I only needed to change these two lines of code.

boundaries := regexp.MustCompile(`(\b[^\w']|$)`)
return strings.Join(tokens, ` `)

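This stops the Split function from splitting at the ' character. The tokens then have to be joined with a space when they are returned; otherwise the output would be one continuous string.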