Question

I'm trying to write a regex that searches for the word test and matches the following strings:

 test
test
//test
// tests
         // test
  // 1    2 34 test

But it should fail on:

// 1234567890 test
12345678901test

So far I have (?=\S|\s)test([^\n]*), which works for everything except the 0-10 non-whitespace characters requirement. Here's a regex101 link.
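To pin down the requirement, here is a minimal non-regex sketch in Python. It assumes that "whitespace" means anything `str.isspace()` accepts and that only the first occurrence of test on a line matters, because any later occurrence has at least as many characters in front of it. The function name `meets_requirement` is hypothetical. Note that in the current attempt the lookahead `(?=\S|\s)` can never fail where test follows (its t is non-whitespace), so the pattern behaves like plain `test([^\n]*)` and puts no limit on what precedes test.

```python
def meets_requirement(line: str, word: str = "test", limit: int = 10) -> bool:
    """Hypothetical reference check: True if `word` occurs in `line` with at
    most `limit` non-whitespace characters before it (whitespace not counted)."""
    idx = line.find(word)
    if idx == -1:
        return False  # the word does not occur at all
    # Only the first occurrence needs checking: any later one has at least
    # as many non-whitespace characters in front of it.
    prefix = line[:idx]
    return sum(1 for ch in prefix if not ch.isspace()) <= limit


should_match = [
    " test",
    "test",
    "//test",
    "// tests",
    "         // test",
    "  // 1    2 34 test",
]
should_fail = [
    "// 1234567890 test",  # 12 non-whitespace characters before test
    "12345678901test",     # 11 non-whitespace characters before test
]

for line in should_match + should_fail:
    print(repr(line), meets_requirement(line))
```

A check like this is handy for validating any candidate regex against the example lines.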

Answer 1

You can use:

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

categories = ['alt.atheism', 'soc.religion.christian',
              'comp.graphics', 'sci.med']

dataset = fetch_20newsgroups(subset='train',
                             categories=categories, shuffle=True, random_state=42)

# Hold out 20% of the documents for testing
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.2, random_state=42)

# Produce a bag of words from x_train
count_vect = CountVectorizer()
x_train_counts = count_vect.fit_transform(x_train)
train_vocab = count_vect.get_feature_names_out()

# Train the Naive Bayes classifier
clf = MultinomialNB().fit(x_train_counts, y_train)

It uses a positive lookahead assertion so that the captured result does not spill over onto other lines.

https://regex101.com/r/x4EA4g/3
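The regex itself did not survive in this copy of Answer 1 (only the regex101 link remains), so the following is a hypothetical Python sketch of the idea it describes, not the answer's actual pattern. A positive lookahead at the start of each line checks that test is preceded by at most 10 non-whitespace characters, and only then does `[^\n]*` capture the line. The class `[^\S\n]` (whitespace other than a newline) is used instead of `\s` so that neither the check nor the capture can run onto the next line, and `re.MULTILINE` makes `^` and `$` anchor at every line.

```python
import re

# Hypothetical pattern (not Answer 1's original, which is lost):
#   ^                       start of a line (re.MULTILINE)
#   (?=                     positive lookahead: check without consuming
#     [^\S\n]*              leading whitespace, newline excluded
#     (?:\S[^\S\n]*){0,10}  at most 10 non-whitespace chars, spaces between ignored
#     test                  then the word itself
#   )
#   [^\n]*$                 capture the whole line, never crossing a newline
LINE_WITH_TEST = re.compile(
    r"^(?=[^\S\n]*(?:\S[^\S\n]*){0,10}test)[^\n]*$", re.MULTILINE
)

text = "\n".join([
    " test",
    "test",
    "//test",
    "// tests",
    "         // test",
    "  // 1    2 34 test",
    "// 1234567890 test",
    "12345678901test",
])

# findall returns each full line whose prefix passes the lookahead check.
matched_lines = LINE_WITH_TEST.findall(text)
for line in matched_lines:
    print(repr(line))
```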

Answer 2

Try this: (?:\s|^)\S{0,10}test

https://regex101.com/r/M2oLJe/1
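A quick way to check this pattern against the examples from the question is to run it over each line with Python's `re` module, assuming it behaves like the regex101 flavor for these simple constructs. Because `\S{0,10}` only limits the unbroken run of non-whitespace directly before test, the line `// 1234567890 test` is also matched: the space before test satisfies `\s` and the run is empty.

```python
import re

# Answer 2's pattern, unchanged.
ANSWER_2 = re.compile(r"(?:\s|^)\S{0,10}test")

should_match = [
    " test",
    "test",
    "//test",
    "// tests",
    "         // test",
    "  // 1    2 34 test",
]
should_fail = [
    "// 1234567890 test",
    "12345678901test",
]

# re.search looks for a match anywhere in each line.
for line in should_match + should_fail:
    print(repr(line), bool(ANSWER_2.search(line)))
```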

Answer 3

The following regex matches strings in which test is preceded by 10 or fewer non-whitespace characters, ignoring whitespace characters:

test

Demo: https://regex101.com/r/x4EA4g/2
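The full pattern is missing from this copy of Answer 3, so here is a hypothetical Python sketch of a regex that does what the answer describes, not the answer's own. Each `\S\s*` unit consumes one non-whitespace character plus any whitespace after it, so `{0,10}` caps the non-whitespace count while spaces are ignored. It assumes each line is tested on its own (so `\s` never meets a newline) and relies on `re.match` anchoring at the start of the string.

```python
import re

# Hypothetical pattern: optional leading whitespace, then at most 10
# non-whitespace characters (each followed by any whitespace), then "test".
AT_MOST_10_BEFORE = re.compile(r"\s*(?:\S\s*){0,10}test")


def matches(line: str) -> bool:
    # re.match anchors at the start of the string, so everything before
    # "test" must be accounted for by the pattern.
    return AT_MOST_10_BEFORE.match(line) is not None


for line in ["  // 1    2 34 test", "// 1234567890 test", "12345678901test"]:
    print(repr(line), matches(line))
```

Since the count covers the whole prefix of the line, `// 1234567890 test` (12 non-whitespace characters) and `12345678901test` (11) are rejected.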
