Parsing a tab-delimited string

Date: 2012-05-10 00:18:19

Tags: parsing lisp common-lisp tab-delimited

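I'm having some trouble finding an example of how to split a tab-delimited string into chunks of data. Say I have a text file that I'm reading, such as: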

a1     b1     c1     d1     e1
a2     b2     c2     d2     e2

I read the first line of my file and get the string
"a1     b1     c1     d1      e2"

I'd like to split it into five variables a, b, c, d and e, or create a list (a b c d e). Any ideas?

Thanks.

3 Answers:

Answer 0 (score: 2)

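Try concatenating parentheses onto the front and back of the input string, then use read-from-string (I'm assuming you're using Common Lisp, since you tagged your question clisp).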

;; LET instead of a top-level SETF on an undeclared variable
(let ((str "a1   b1      c1      d1      e2"))
  (print (read-from-string (concatenate 'string "(" str ")"))))
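Two things are worth knowing about this trick. The reader returns symbols (upcased by the default readtable and interned in the current package), not strings, and a `#.` in the input would be evaluated while reading unless `*read-eval*` is bound to `nil`. Below is a minimal sketch that guards against the latter and turns the symbols back into lowercase strings. It assumes the fields are plain tokens (no colons, quotes or parentheses), and the original letter case is not recoverable this way; `read-fields` and `read-fields-as-strings` are hypothetical helper names.

```lisp
(defun read-fields (line)
  "Read LINE as a list of Lisp objects by wrapping it in parentheses.
*READ-EVAL* is bound to NIL so a #. in the input signals a READER-ERROR
instead of running code."
  (let ((*read-eval* nil))
    ;; READ-FROM-STRING returns two values; only the object is used here.
    (values (read-from-string (concatenate 'string "(" line ")")))))

(defun read-fields-as-strings (line)
  "Like READ-FIELDS, but return each object's printed form, downcased."
  (mapcar (lambda (object) (string-downcase (princ-to-string object)))
          (read-fields line)))

;; Tabs count as whitespace for the standard reader, so either separator works:
;; (read-fields-as-strings (format nil "a1~Cb1~Cc1" #\Tab #\Tab))
```

Because the reader parses each field as Lisp syntax, a field such as `1e2` comes back as a float rather than the text you had in the file, which is one reason the string-based answers below are usually safer for arbitrary data.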

Answer 1 (score: 2)

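Another approach (perhaps a bit more robust). You could also easily modify it so that characters in the string can be "set" after the callback is called, but I didn't do that because you don't seem to need that capability. Also, for that case I'd rather use a macro.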

(defun mapc-words (function vector
                  &aux (whites '(#\Space #\Tab #\Newline #\Rubout)))
  "Iterates over string `vector' and calls `function' with each word,
i.e. each maximal run of non-white characters.
The white characters are #\Space, #\Tab, #\Newline and #\Rubout.
`mapc-words' will short-circuit when `function' returns false."
  (do ((i 0 (1+ i))
       (start 0)   ; index where the current word begins
       (len 0))    ; number of characters in the current word so far
      ((= i (1+ (length vector))))
    ;; The position just past the end acts as a final separator.
    (if (or (= i (length vector)) (find (aref vector i) whites))
        (if (> len 0)
            (if (not (funcall function (subseq vector start i)))
                (return-from mapc-words)   ; stop early, returning NIL
                (setf len 0 start (1+ i)))
            (incf start))                  ; skip a run of white characters
        (incf len)))
  vector)

(mapc-words
 #'(lambda (word)
     (not
      (format t "word collected: ~s~&" word)))
 "a1     b1     c1     d1     e1
a2     b2     c2     d2     e2")

;; word collected: "a1"
;; word collected: "b1"
;; word collected: "c1"
;; word collected: "d1"
;; word collected: "e1"
;; word collected: "a2"
;; word collected: "b2"
;; word collected: "c2"
;; word collected: "d2"
;; word collected: "e2"
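If you want the words collected into a list, as the question asks, rather than passed to a callback, the same idea (any run of white characters counts as a single separator) can be sketched with `position-if` and `position-if-not`. The names `split-words`, `whitespace-p` and `*whitespace*` are made up for this sketch, and the separator set is an assumption you may want to adjust.

```lisp
(defparameter *whitespace* '(#\Space #\Tab #\Newline #\Return)
  "Characters treated as separators (an assumption; adjust as needed).")

(defun whitespace-p (char)
  (member char *whitespace*))

(defun split-words (string)
  "Return a fresh list of the maximal runs of non-whitespace characters in
STRING. Runs of separators collapse, so no empty strings are produced."
  (let ((words '())
        (end 0)
        (len (length string)))
    (loop
      ;; Skip separators; if nothing but separators remain, we are done.
      (let ((start (position-if-not #'whitespace-p string :start end)))
        (unless start
          (return (nreverse words)))
        ;; A word runs up to the next separator or the end of STRING.
        (setf end (or (position-if #'whitespace-p string :start start) len))
        (push (subseq string start end) words)))))
```

Collapsing separators is convenient for whitespace-aligned text, but it means an empty field in a real tab-separated file would silently disappear.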

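Here is an example macro you can use if you want to modify the string while reading it, but I'm not entirely happy with it, so perhaps someone will come up with a better variant.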

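(defmacro with-words-in-string
    ((word start end
           &aux (whites '(#\Space #\Tab #\Newline #\Rubout)))
     s
     &body body)
  ;; LEN and STR are gensyms so they cannot clash with variables used in
  ;; BODY, and the string form S is evaluated only once.
  (let ((len (gensym "LEN"))
        (str (gensym "STR")))
    `(let ((,str ,s))
       (do ((,end 0 (1+ ,end))
            (,start 0)
            (,word nil)
            (,len 0))
           ((= ,end (1+ (length ,str))))
         (if (or (= ,end (length ,str)) (find (aref ,str ,end) ',whites))
             (if (> ,len 0)
                 (progn
                   (setf ,word (subseq ,str ,start ,end))
                   ,@body
                   (setf ,len 0 ,start (1+ ,end)))
                 (incf ,start))
             (incf ,len))))))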

(with-words-in-string (word start end)
    "a1     b1     c1     d1     e1
a2     b2     c2     d2     e2"
(format t "word: ~s, start: ~s, end: ~s~&" word start end))

Answer 2 (score: 0)

Assuming the separators are tabs (not spaces), this will create a list:

(defun tokenize-tabbed-line (line)
  (loop 
     for start = 0 then (+ space 1)
     for space = (position #\Tab line :start start)
     for token = (subseq line start space)
     collect token until (not space)))

which gives the following result:

CL-USER> (tokenize-tabbed-line "a1  b1  c1  d1  e1")
("a1" "b1" "c1" "d1" "e1")
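Since the question reads these lines from a file, here is a sketch that applies the same `position`-based tab splitting to every line of a stream and binds one row to five variables with `destructuring-bind`. The names `split-on-tab`, `read-tsv`, `describe-row` and the file name `data.tsv` are illustrative only. Note that `destructuring-bind` signals an error if a row does not have exactly five fields, and a file with CRLF line endings may leave a `#\Return` at the end of the last field, depending on the implementation.

```lisp
(defun split-on-tab (line)
  "Split LINE at every tab character. Adjacent tabs yield empty strings,
so empty TSV fields keep their column position."
  (loop for start = 0 then (1+ pos)
        for pos = (position #\Tab line :start start)
        collect (subseq line start pos)
        while pos))

(defun read-tsv (stream)
  "Return a list of field lists, one per line read from STREAM."
  (loop for line = (read-line stream nil)
        while line
        collect (split-on-tab line)))

(defun describe-row (fields)
  "Bind the five fields of one row to A..E, as the question asks."
  (destructuring-bind (a b c d e) fields
    (format nil "a=~A b=~A c=~A d=~A e=~A" a b c d e)))

;; Usage with a real file (data.tsv is a hypothetical name):
;; (with-open-file (in "data.tsv")
;;   (mapcar #'describe-row (read-tsv in)))
```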