Reconstruct the original sentence from smaller phrases?

Time: 2012-07-02 20:47:15

Tags: ruby string nlp

I have an original sentence:

sent = "For 15 years photographer Kari Greer has been documenting wildfires and the men and women who battle them."

and a list of phrases:

phrases = [
  "For 15 years",
  "wildfires and the men and women who battle them",
  "has been documenting wildfires",
  "been documenting wildfires and the men and women who battle them",
  "documenting wildfires and the men and women who battle them",
  "them",
  "and the men and women who battle them",
  "battle them",
  "wildfires",
  "the men and women",
  "the men and women who battle them",
  "15 years",
  "photographer Kari Greer"
]

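I want to reconstruct the original sentence from the phrases (without losing any words) and store the selected phrases in a new array, in sentence order, so that I get: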
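One way to state this requirement precisely: every element of the result must come from the phrase list, and joining the elements in order must give back exactly the sentence's words. A minimal checker, assuming the trailing period is ignored and words are compared after splitting on whitespace (`reconstructs?` is a hypothetical name, and the phrase list below is only a subset of the question's):

```ruby
# Hypothetical helper: true if +result+ uses only phrases from +phrases+ and,
# joined in order, yields exactly the words of +sentence+ (period ignored).
def reconstructs?(sentence, phrases, result)
  words = sentence.delete(".").split
  result.all? { |p| phrases.include?(p) } && result.join(" ").split == words
end

sent = "For 15 years photographer Kari Greer has been documenting wildfires and the men and women who battle them."
# A subset of the question's phrase list, for illustration
phrases = [
  "For 15 years",
  "photographer Kari Greer",
  "has been documenting wildfires",
  "and the men and women who battle them",
  "wildfires"
]
result = ["For 15 years", "photographer Kari Greer", "has been documenting wildfires", "and the men and women who battle them"]

puts reconstructs?(sent, phrases, result)          # prints true
puts reconstructs?(sent, phrases, result.first(3)) # prints false (words are lost)
```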

result = [
  "For 15 years",
  "photographer Kari Greer",
  "has been documenting wildfires",
  "and the men and women who battle them"
]

Edit: It is important that `result` contains the fewest possible elements.
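One way to guarantee the minimum is dynamic programming over word positions: for each suffix of the sentence, remember the shortest list of phrases that covers it exactly. This is a hypothetical sketch (not taken from the question or the answers), assuming phrases are matched as whole words separated by single spaces and the trailing period is dropped; `min_phrases` is an illustrative name:

```ruby
require "set"

# Hypothetical sketch: fewest phrases that rebuild +sentence+ word by word.
# best[i] holds the shortest phrase list covering words[i..]; nil = impossible.
def min_phrases(sentence, phrases)
  words = sentence.delete(".").split
  known = Set.new(phrases)
  n = words.size
  best = Array.new(n + 1)
  best[n] = [] # the empty suffix needs no phrases
  (n - 1).downto(0) do |i|
    (i + 1..n).each do |j|
      rest = best[j]
      next if rest.nil? # words[j..] cannot be covered
      phrase = words[i...j].join(" ")
      next unless known.include?(phrase)
      candidate = [phrase] + rest
      best[i] = candidate if best[i].nil? || candidate.size < best[i].size
    end
  end
  best[0] || [] # [] when no exact cover exists
end

p min_phrases("a b c d", ["a b", "c", "d", "a", "b c d"]) # prints ["a", "b c d"]
```

The table has one entry per word position and each entry tries every end position, so the work is roughly quadratic in the number of words rather than exponential.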

Edit: Here is a version of the answer's code that I use for a more complex case:

sent = "Shes got six teeth Pink says of her 13-month-old daughter but shes not a biter"
phrases = ["her 13-month-old daughter", "she", "says of her 13-month-old daughter", "a biter", "got six teeth", "Pink", "of her 13-month-old daughter", "s not a biter", "She", "six teeth", "s got six teeth", "Shes got six"]

def shortest(string, phrases)
  # Remove punctuation and newlines before matching
  string = string.gsub(/[.\n',?!:;"`]/, '')
  best_result = nil
  phrases.each do |phrase|
    # Phrases are matched literally, anywhere in the string (not only at the start)
    next unless string.include?(phrase)
    result = [phrase] + shortest(string.sub(phrase, "").strip, phrases)
    # Full-coverage check left disabled, so uncovered words (such as "but") are tolerated
    best_result = result if best_result.nil? || result.size < best_result.size # && string == result.join(" ")
  end
  best_result || []
end

2 answers:

Answer 0 (score: 1)

# Returns phrases that, joined with spaces, rebuild +words+ in order,
# trying the longest matching prefix first; [] if no split exists.
def solve(words, phrases)
  (words.size - 1).downto(0) do |i|
    phrase = words[0..i].join(" ")
    next unless phrases.include?(phrase)
    return [phrase] if i == words.size - 1
    rest = solve(words[(i + 1)..-1], phrases)
    return [phrase] + rest unless rest.empty?
  end
  []
end

words = sent.delete(".").split(" ")
res = solve(words, phrases)
puts res.inspect

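I think this should work. It looks for the longest phrase that matches the beginning of the sentence and checks whether the rest of the sentence can be broken down into phrases; if it cannot, it backtracks and tries the next shorter prefix.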
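Because the search commits to the longest prefix that still leads to a complete split, it returns a valid split but not necessarily the one with the fewest phrases. The check below uses answer 0's `solve`, with the phrase list passed as a parameter, on a made-up phrase list where the greedy choice costs an extra element:

```ruby
# Answer 0's longest-prefix-first search, with the phrase list passed in
def solve(words, phrases)
  (words.size - 1).downto(0) do |i|
    phrase = words[0..i].join(" ")
    next unless phrases.include?(phrase)
    return [phrase] if i == words.size - 1
    rest = solve(words[(i + 1)..-1], phrases)
    return [phrase] + rest unless rest.empty?
  end
  []
end

# Made-up phrases: taking the longest prefix "a b" forces three pieces,
# although ["a", "b c d"] needs only two.
phrases = ["a b", "c", "d", "a", "b c d"]
p solve(%w[a b c d], phrases) # prints ["a b", "c", "d"]
```

For the sentence in the question the greedy split happens to be minimal, but that is a property of that particular phrase list.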

There is probably a better way, but it's 4 a.m. ...

Answer 1 (score: 1)

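def shortest(string, phrases)
  best_result = nil
  phrases.each do |phrase|
    # Only consider phrases that match literally at the start of the remaining string
    next unless string.start_with?(phrase)
    result = [phrase] + shortest(string[phrase.length..-1].strip, phrases)
    # Accept the candidate only if it is shorter than the best so far and its
    # phrases, joined by optional whitespace, cover the entire string
    full = Regexp.new("\\A" + result.map { |p| Regexp.escape(p) }.join("\\s?") + "\\z")
    best_result = result if (best_result.nil? || result.size < best_result.size) && full.match?(string)
  end
  best_result || []
end
result = shortest(sent.gsub(/\./, ""), phrases)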

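Edit: Updated the algorithm to allow some phrases to have no space between them (as with "She" followed by "s got six teeth" in the edited question).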
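Answer 1 tries every phrase at every position without remembering earlier work, so its running time can grow exponentially on longer sentences. A possible refinement, not part of the answer: memoize on the remaining string. This sketch keeps the same rules (phrases must match at the start, and an optional single space may separate them) but returns nil when no full cover exists; `shortest_memo` is a hypothetical name:

```ruby
# Hypothetical memoized variant of answer 1's search (not from the thread).
# Returns the fewest phrases that rebuild +string+ from the start, allowing an
# optional single space between consecutive phrases; nil if no full cover exists.
def shortest_memo(string, phrases, memo = {})
  return [] if string.empty?
  return memo[string] if memo.key?(string)

  best = nil
  phrases.each do |phrase|
    next if phrase.empty? || !string.start_with?(phrase)
    rest = string[phrase.length..-1]
    rest = rest[1..-1] if rest.start_with?(" ") # the space between phrases is optional
    tail = shortest_memo(rest, phrases, memo)
    next if tail.nil?
    best = [phrase] + tail if best.nil? || tail.size + 1 < best.size
  end
  memo[string] = best # cache failures (nil) as well as successes
end

# "She" + "s got six teeth" has no space at the join
p shortest_memo("Shes got six teeth", ["She", "s got six teeth", "Shes got six"])
# prints ["She", "s got six teeth"]
```

Each distinct suffix is solved once, so the cost is bounded by the number of suffixes times the number of phrases.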