Splitting a sentence into words using a regular expression

Time: 2015-03-16 12:11:47

Tags: ruby regex string

I'm trying to do the above. For example:

"This is a sentence I'm currently writing, potentially with punctuation dotted in: item1, item2, item3. That is all."

should be split into individual words:

This
is 
a
sentence
I'm

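and so on.

I'm just struggling to write the regular expression. I know it would be easy with one or two delimiters, but I'm trying to learn more about regexes.

2 answers:

Answer 0: (score: 3)

Split your input on one or more whitespace characters.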

> "This is a sentence I'm currently writing, potentially with punctuation dotted in: item1, item2, item3. That is all.".split(/\s+/)
=> ["This", "is", "a", "sentence", "I'm", "currently", "writing,", "potentially", "with", "punctuation", "dotted", "in:", "item1,", "item2,", "item3.", "That", "is", "all."]

> "This is a sentence I'm currently writing, potentially with punctuation dotted in: item1, item2, item3. That is all.".split()
=> ["This", "is", "a", "sentence", "I'm", "currently", "writing,", "potentially", "with", "punctuation", "dotted", "in:", "item1,", "item2,", "item3.", "That", "is", "all."]
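The two split calls return the same result for this sentence, but they differ when the input starts with whitespace. A minimal sketch of that difference, assuming a padded input string and helper names that are hypothetical and not part of the question:

```ruby
# Hypothetical helpers for illustration only.
def split_on_regex(text)
  text.split(/\s+/)   # leading whitespace yields an empty first element
end

def split_default(text)
  text.split          # no-argument split skips leading whitespace
end

# Illustrative input with two leading spaces (not from the question).
padded = "  This is a sentence"
p split_on_regex(padded)  # => ["", "This", "is", "a", "sentence"]
p split_default(padded)   # => ["This", "is", "a", "sentence"]
```

If the input may carry leading whitespace, the no-argument form is probably the safer default.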

Alternatively, match one or more non-whitespace characters.

> "This is a sentence I'm currently writing, potentially with punctuation dotted in: item1, item2, item3. That is all.".scan(/\S+/)
=> ["This", "is", "a", "sentence", "I'm", "currently", "writing,", "potentially", "with", "punctuation", "dotted", "in:", "item1,", "item2,", "item3.", "That", "is", "all."]
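Both approaches leave punctuation attached to the words ("writing,", "in:", "all."). If the punctuation should be dropped, one possible sketch is to scan for runs of word characters and apostrophes instead. The helper name and the choice of character class are assumptions, not something the question specifies; for instance, this pattern would split a hyphenated word in two.

```ruby
# Hypothetical helper: treats a "word" as a run of word characters
# (letters, digits, underscore) or apostrophes. This definition is an assumption.
def words_without_punctuation(text)
  text.scan(/[\w']+/)
end

sentence = "This is a sentence I'm currently writing, potentially with punctuation dotted in: item1, item2, item3. That is all."
p words_without_punctuation(sentence)
# => ["This", "is", "a", "sentence", "I'm", "currently", "writing", "potentially", "with", "punctuation", "dotted", "in", "item1", "item2", "item3", "That", "is", "all"]
```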

Answer 1: (score: 0)

Use split:

2.0.0-p481 :001 > a="This is a sentence I'm currently writing, potentially with punctuation dotted in: item1, item2, item3. That is all."
 => "This is a sentence I'm currently writing, potentially with punctuation dotted in: item1, item2, item3. That is all." 
2.0.0-p481 :002 > a.split
 => ["This", "is", "a", "sentence", "I'm", "currently", "writing,", "potentially", "with", "punctuation", "dotted", "in:", "item1,", "item2,", "item3.", "That", "is", "all."] 
2.0.0-p481 :003 > 

Use a loop to print each word on its own line:

2.0.0-p481 :036 > a="This is a sentence I'm currently writing, potentially with punctuation dotted in: item1, item2, item3. That is all."
 => "This is a sentence I'm currently writing, potentially with punctuation dotted in: item1, item2, item3. That is all." 
2.0.0-p481 :037 > a.split.each{ |i|  puts "#{i}"}
This
is
a
sentence
I'm
currently
writing,
potentially
with
punctuation
dotted
in:
item1,
item2,
item3.
That
is
all.
 => ["This", "is", "a", "sentence", "I'm", "currently", "writing,", "potentially", "with", "punctuation", "dotted", "in:", "item1,", "item2,", "item3.", "That", "is", "all."] 
2.0.0-p481 :038 >
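
As a shorter variation on the loop above, puts prints each element of an array on its own line, so the explicit block is optional. A minimal sketch, with a hypothetical helper name that is not from the answer:

```ruby
# Hypothetical helper: returns the words joined by newlines,
# which is the text that `puts text.split` would print.
def one_word_per_line(text)
  text.split.join("\n")
end

a = "This is a sentence I'm currently writing, potentially with punctuation dotted in: item1, item2, item3. That is all."
puts one_word_per_line(a)  # same lines as the each loop prints
puts a.split               # equivalent: puts writes each array element on its own line
```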