Question

I have this function, which is supposed to take a string as input and replace anything that is not a letter, a digit, an underscore or a dash:

import re


def clean_label_value(label_value):
    """
    GCP Label values have to follow strict guidelines
        Keys and values can only contain lowercase letters, numeric characters, underscores,
        and dashes. International characters are allowed.
    https://cloud.google.com/compute/docs/labeling-resources#restrictions
    :param label_value: label value that needs to be cleaned up
    :return: cleaned label value
    """
    full_pattern = re.compile('[^a-zA-Z0-9]')
    return re.sub(full_pattern, '_', label_value).lower()

I have this unit test, which passes:

def test_clean_label_value(self):
    self.assertEqual(clean_label_value('XYZ_@:.;\\/,'), 'xyz________')

But it also replaces dashes, which I don't want. To demonstrate:

def clean_label_value(label_value):
    full_pattern = re.compile('[^a-zA-Z0-9]|-')
    return re.sub(full_pattern, '_', label_value).lower()

But this test:

def test_clean_label_value(self):
    self.assertEqual(clean_label_value('XYZ-'), 'xyz-')

then fails with:

xyz- != xyz_

Expected: xyz_
Actual: xyz-

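In other words, the - is being replaced by _. I don't want that to happen. I have fiddled with the regex and tried all sorts of combinations, but I can't figure it out. Any ideas?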

Answer 1

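Put a single - at the very start or the very end of the set (character class). In that position it does not create a character range but stands for the literal - character itself.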

re.compile('[^-a-zA-Z0-9]')
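
As a rough sketch of how this could fit into the function from the question: the function name and the test strings come from the question, while the `LABEL_PATTERN` constant is only an illustrative name. It also hints at why the `'[^a-zA-Z0-9]|-'` attempt could not work: the negated class already matches `-`, and the extra `|-` alternative only adds the dash as one more thing to replace.

```python
import re

# A "-" placed first in the class (right after "^") is a literal dash,
# not a range operator. "_" is listed as well to make the intent explicit;
# replacing "_" with "_" would be a no-op anyway.
LABEL_PATTERN = re.compile(r'[^-_a-zA-Z0-9]')


def clean_label_value(label_value):
    """Replace anything that is not a letter, digit, underscore or dash with "_"."""
    return LABEL_PATTERN.sub('_', label_value).lower()


print(clean_label_value('XYZ-'))  # xyz-

# The original test from the question still holds: "_" plus seven other
# non-alphanumeric characters end up as eight underscores.
assert clean_label_value('XYZ_@:.;\\/,') == 'xyz' + '_' * 8
```

Compiling the pattern once at module level is a design choice, not a requirement; `re.sub` with the pattern string would behave the same.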

It is also possible to escape the - with a \ to indicate that it is a literal dash and not the range operator inside a set:

re.compile(r'[^\-\w]')

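The special sequence \w is equivalent to the set [a-zA-Z0-9_] ("w" stands for "word character"). Note that in Python 3 this equivalence holds for str patterns only when the re.ASCII flag is set; by default \w also matches word characters from other scripts.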
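
A small sketch of the escaped form, assuming Python 3 and `str` patterns. The sample string and variable names are made up for illustration. Without `re.ASCII`, `\w` keeps non-ASCII letters such as `é`; with the flag it is restricted to `[a-zA-Z0-9_]`.

```python
import re

# "\-" is a literal dash wherever it appears inside the character class.
unicode_pattern = re.compile(r'[^\-\w]')          # \w is Unicode-aware for str patterns
ascii_pattern = re.compile(r'[^\-\w]', re.ASCII)  # \w limited to [a-zA-Z0-9_]

sample = 'Caf\u00e9-Bar_1!'  # hypothetical input containing "é"

unicode_result = unicode_pattern.sub('_', sample).lower()
ascii_result = ascii_pattern.sub('_', sample).lower()

print(unicode_result)  # café-bar_1_
print(ascii_result)    # caf_-bar_1_
```

Which variant is appropriate depends on whether international characters should survive the cleanup, as the docstring in the question suggests they may.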
