Split a string by spaces, group by quotes, functionally!

Time: 2010-12-02 12:20:05

Tags: recursion functional-programming clojure

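Writing idiomatic functional code in Clojure [1], how would one write a function that splits a string on whitespace but keeps quoted phrases intact? A quick solution is of course to use regular expressions, but it should be possible without them. At a quick glance it seems pretty hard! I have written something similar in imperative languages, but I would like to see how a functional, recursive approach works.

A quick look at what our function should do: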

"Hello there!"  -> ["Hello", "there!"]
"'A quoted phrase'" -> ["A quoted phrase"]
"'a' 'b' c d" -> ["a", "b", "c", "d"]
"'a b' 'c d'" -> ["a b", "c d"]
"Mid'dle 'quotes do not concern me'" -> ["Mid'dle", "quotes do not concern me"]

I don't mind if the spacing inside the quotes changes (so a simple split on whitespace can be done first).

"'lots    of   spacing' there" -> ["lots of spacing", "there"] ;is ok to me

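[1] This question could be answered at a general level, but I suppose a functional approach in Clojure can be translated to Haskell, ML, etc. with ease.

7 answers:

Answer 0 (score: 7)

Here is a version that returns a lazy seq of words / quoted strings: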

(defn splitter [s]
  (lazy-seq
   (when-let [c (first s)]
     (cond
      (Character/isSpace c)
      (splitter (rest s))
      (= \' c)
      (let [[w* r*] (split-with #(not= \' %) (rest s))]
        (if (= \' (first r*))
          (cons (apply str w*) (splitter (rest r*)))
          (cons (apply str w*) nil)))
      :else
      (let [[w r] (split-with #(not (Character/isSpace %)) s)]
        (cons (apply str w) (splitter r)))))))

Test run:

user> (doseq [x ["Hello there!"
                 "'A quoted phrase'"
                 "'a' 'b' c d"
                 "'a b' 'c d'"
                 "Mid'dle 'quotes do not concern me'"
                 "'lots    of   spacing' there"]]
        (prn (splitter x)))
("Hello" "there!")
("A quoted phrase")
("a" "b" "c" "d")
("a b" "c d")
("Mid'dle" "quotes do not concern me")
("lots    of   spacing" "there")
nil

If the single quotes in the input are not properly matched, everything after the last opening single quote is taken to be one "word":

user> (splitter "'asdf")
("asdf")

Update: another version, in response to edbond's comment, with better handling of quote characters inside words:

(defn splitter [s]
  ((fn step [xys]
     (lazy-seq
      (when-let [c (ffirst xys)]
        (cond
         (Character/isSpace c)
         (step (rest xys))
         (= \' c)
         (let [[w* r*]
               (split-with (fn [[x y]]
                             (or (not= \' x)
                                 (not (or (nil? y)
                                          (Character/isSpace y)))))
                           (rest xys))]
           (if (= \' (ffirst r*))
             (cons (apply str (map first w*)) (step (rest r*)))
             (cons (apply str (map first w*)) nil)))
         :else
         (let [[w r] (split-with (fn [[x y]] (not (Character/isSpace x))) xys)]
           (cons (apply str (map first w)) (step r)))))))
   (partition 2 1 (lazy-cat s [nil]))))

Test run:

user> (doseq [x ["Hello there!"
                 "'A quoted phrase'"
                 "'a' 'b' c d"
                 "'a b' 'c d'"
                 "Mid'dle 'quotes do not concern me'"
                 "'lots    of   spacing' there"
                 "Mid'dle 'quotes do no't concern me'"
                 "'asdf"]]
        (prn (splitter x)))
("Hello" "there!")
("A quoted phrase")
("a" "b" "c" "d")
("a b" "c d")
("Mid'dle" "quotes do not concern me")
("lots    of   spacing" "there")
("Mid'dle" "quotes do no't concern me")
("asdf")
nil
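
The updated version works on pairs of (current character, next character), which is how it can tell a closing quote (a quote followed by a space or by the end of the input) from a quote in the middle of a word. A minimal sketch of just that pairing step follows; the name with-next is hypothetical, since the answer inlines the expression:

```clojure
;; Hypothetical helper name; the answer above inlines this expression.
(defn with-next
  "Pair every character of s with the character after it (nil after the last)."
  [s]
  (partition 2 1 (lazy-cat s [nil])))

;; For the input "a' b" this yields four pairs:
;; (\a \'), (\' \space), (\space \b), (\b nil)
(with-next "a' b")
```

The trailing nil is what lets the last character of the input appear as the first element of a pair, so a quote at the very end is still seen as a closing quote.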

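Answer 1 (score: 5)

This solution is in Haskell, but the main idea should carry over to Clojure as well. The two states of the parser (inside or outside quotes) are represented by two mutually recursive functions.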

splitq = outside [] . (' ':)

add c res = if null res then [[c]] else map (++[c]) res

outside res xs = case xs of
    ' '  : ' '  : ys -> outside res $ ' ' : ys
    ' '  : '\'' : ys -> res ++ inside [] ys
    ' '  : ys        -> res ++ outside [] ys
    c    : ys        -> outside (add c res) ys
    _                -> res

inside res xs = case xs of
    ' '  : ' ' : ys -> inside res $ ' ' : ys
    '\'' : ' ' : ys -> res ++ outside [] (' ' : ys)
    '\'' : []       -> res
    c    : ys       -> inside (add c res) ys
    _               -> res
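
As a rough illustration of how the same two-state idea might look in Clojure, here is a sketch with two mutually recursive local functions. It is not a line-by-line port: the names split-quoted and flush-word are made up for this example, any quote closes a phrase (the Haskell version also requires a following space or the end of input), and spacing inside quotes is kept as-is. The states return thunks and are driven by trampoline so that long inputs do not deepen the stack.

```clojure
(defn split-quoted
  "Split s on spaces, keeping 'single-quoted' phrases together.
  Simplified sketch: a quote opens a phrase only at the start of a word,
  and any later quote closes it."
  [s]
  (letfn [(flush-word [words word]
            ;; Append the word under construction, skipping empty ones.
            (if (empty? word) words (conj words (apply str word))))
          (outside [words word cs]
            (let [c (first cs)]
              (cond
                (nil? c)      (flush-word words word)
                (= \space c)  #(outside (flush-word words word) [] (rest cs))
                (and (= \' c) (empty? word)) #(inside words [] (rest cs))
                :else         #(outside words (conj word c) (rest cs)))))
          (inside [words word cs]
            (let [c (first cs)]
              (cond
                (nil? c) (flush-word words word)
                (= \' c) #(outside (flush-word words word) [] (rest cs))
                :else    #(inside words (conj word c) (rest cs)))))]
    ;; trampoline keeps calling the returned thunks until a vector comes back.
    (trampoline outside [] [] (seq s))))

(split-quoted "Mid'dle 'quotes do not concern me'")
```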

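Answer 2 (score: 3)

Here is a Clojure version. It will probably blow the stack on very large inputs. A regex or a real parser generator would be much more concise.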

(declare parse*)
(defn slurp-word [words xs terminator]
  (loop [res "" xs xs]
    (condp = (first xs)
      nil  ;; end of string after this word
      (conj words res)

      terminator ;; end of word
      (parse* (conj words res) (rest xs))

      ;; else
      (recur (str res (first xs)) (rest xs)))))

(defn parse* [words xs]
  (condp = (first xs)
    nil ;; end of string
    words

    \space  ;; skip leading spaces
    (parse* words (rest xs))

    \' ;; start quoted part
    (slurp-word words (rest xs) \')

    ;; else slurp until space
    (slurp-word words xs \space)))

(defn parse [s]
  (parse* [] s))

Your inputs:

user> (doseq [x ["Hello there!"
                 "'A quoted phrase'"
                 "'a' 'b' c d"
                 "'a b' 'c d'"
                 "Mid'dle 'quotes do not concern me'"
                 "'lots    of   spacing' there"]]
        (prn (parse x)))

["Hello" "there!"]
["A quoted phrase"]
["a" "b" "c" "d"]
["a b" "c d"]
["Mid'dle" "quotes do not concern me"]
["lots    of   spacing" "there"]
nil

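Answer 3 (score: 3)

I was able to modify Brian's version to use trampoline so that it does not run out of stack space. Basically, make slurp-word and parse* return functions instead of calling them, then change parse to use trampoline.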

(declare parse*) ;; slurp-word refers to parse* before it is defined
(defn slurp-word [words xs terminator]
  (loop [res "" xs xs]
    (condp = (first xs)
        nil  ;; end of string after this word
      (conj words res)

      terminator ;; end of word
      #(parse* (conj words res) (rest xs))

      ;; else
      (recur (str res (first xs)) (rest xs)))))

(defn parse* [words xs]
  (condp = (first xs)
      nil ;; end of string
    words

    \space  ;; skip leading spaces
    #(parse* words (rest xs)) ;; thunk, so long runs of spaces do not grow the stack

    \' ;; start quoted part
    #(slurp-word words (rest xs) \')

    ;; else slurp until space
    #(slurp-word words xs \space)))

(defn parse [s]
  (trampoline #(parse* [] s)))


(defn test-parse []
  (doseq [x ["Hello there!"
             "'A quoted phrase'"
             "'a' 'b' c d"
             "'a b' 'c d'"
             "Mid'dle 'quotes do not concern me'"
             "'lots    of   spacing' there"
             (apply str (repeat 30000 "'lots    of   spacing' there"))]]
    (prn (parse x))))
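
To see in isolation why returning functions helps, here is a minimal sketch that is unrelated to parsing; my-even? and my-odd? are illustrative names only. Each function returns a thunk instead of calling the other directly, and trampoline keeps invoking the returned thunks until a non-function value comes back, so the call depth stays constant.

```clojure
(declare my-odd?)

(defn my-even? [n]
  (if (zero? n)
    true
    #(my-odd? (dec n))))   ; return a thunk instead of calling directly

(defn my-odd? [n]
  (if (zero? n)
    false
    #(my-even? (dec n))))

;; A million mutual "calls" without growing the stack.
(trampoline my-even? 1000000)
```

Calling the two functions directly (without the thunks) at this depth would be expected to overflow the stack on a default JVM configuration.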

Answer 4 (score: 2)

For example, there is fnparse, which lets you write parsers in a functional way.

Answer 5 (score: 1)

Using a regex:

(defn my-split [string]
  (let [criterion " +(?=([^']*'[^']*')*[^']*$)"]
    (for [s (into [] (.split string criterion))]
      (.replace s "'" ""))))

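The first character in the regex is the character you want to split the string on; here it is one or more spaces.

If you want to change the quoting character, just change every ' to something else, such as ".

Edit: I just saw that you explicitly said you did not want to use a regex. Sorry!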
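
The rest of the pattern is a lookahead: a run of spaces only counts as a separator if an even number of single quotes follows it up to the end of the string, that is, if it lies outside any quoted phrase. A small sketch using clojure.string (the var name outside-quotes is made up for this example) shows the effect, along with one caveat of removing every quote afterwards:

```clojure
(require '[clojure.string :as str])

;; Same pattern as in my-split: one or more spaces, followed (lookahead)
;; by an even number of single quotes before the end of the string.
(def outside-quotes #" +(?=([^']*'[^']*')*[^']*$)")

;; The quoted phrases stay whole, with their quotes still attached.
(str/split "'a b' 'c d'" outside-quotes)

;; Stripping every quote afterwards also removes the apostrophe
;; inside Mid'dle, which differs from the question's expected output.
(map #(str/replace % "'" "")
     (str/split "Mid'dle 'quotes do not concern me'" outside-quotes))
```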

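Answer 6 (score: 1)

Oh my, the answers given here seem to beat my own attempt, even though mine passed my tests. Anyway, I am posting it here to ask for comments on making the code more idiomatic.

I sketched some Haskell-ish pseudocode: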

pl p w:ws = | if w:ws empty
               => p
            | if w begins with a quote
               => pli p w:ws
            | otherwise
               => pl (p ++ w) ws

pli p w:ws = | if w:ws empty
                => p
             | if w begins with a quote
                => pli (p ++ w) ws
             | if w ends with a quote
                => pl (init p ++ (last p ++ w)) ws
             | otherwise
                => pli (init p ++ (last p ++ w)) ws

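OK, the names are terrible. There are:

  • function pl, which processes the words that are not quoted
  • function pli (i as in inner), which processes the quoted phrases
  • parameter (list) p, the information already processed (done)
  • parameter (list) w:ws, the information still to be processed

I translated the pseudocode this way: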

(def quote-chars '(\" \')) ;'

; rewrite .startsWith and .endsWith to support multiple choices
(defn- starts-with?
  "See if given string begins with selected characters."
  [word choices]
  (some #(.startsWith word (str %)) choices))

(defn- ends-with?
  "See if given string ends with selected characters."
  [word choices]
  (some #(.endsWith word (str %)) choices))

(declare pli)
(defn- pl [p w:ws]
    (let [w (first w:ws)
          ws (rest w:ws)]
     (cond
        (nil? w)
            p
        (starts-with? w quote-chars)
            #(pli p w:ws)
        true
            #(pl (concat p [w]) ws))))

(defn- pli [p w:ws]
    (let [w (first w:ws)
          ws (rest w:ws)]
     (cond
        (nil? w)
            p
        (starts-with? w quote-chars)
            #(pli (concat p [w]) ws)
        (ends-with? w quote-chars)
            #(pl (concat 
                  (drop-last p)
                  [(str (last p) " " w)])
                ws)
        true
            #(pli (concat 
                  (drop-last p)
                  [(str (last p) " " w)])
                ws))))

(defn split-line
    "Split a line by spaces, leave quoted groups intact."
    [input]
    (let [splt (.split input " +")]
        (map strip-input 
            (trampoline pl [] splt))))

Not very Clojuresque in its details. Also, I rely on regexes for splitting and for stripping the quotes, so I probably deserve a few downvotes.
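
The code above calls strip-input but never shows it. Going only by the remark that quotes are stripped with a regex, one possible definition might look like the sketch below; this is a guess at the missing helper, not the author's actual code, and it would have to be defined before split-line.

```clojure
(require '[clojure.string :as str])

;; Hypothetical reconstruction: the answer uses strip-input without defining it.
(defn strip-input
  "Remove one leading and one trailing quote character (' or \") from word."
  [word]
  (str/replace word #"^['\"]|['\"]$" ""))

(strip-input "'a b'")
```

Because the pattern is anchored to the start and end of the string, a quote in the middle of a word such as Mid'dle is left alone.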