Converting a plain-text list to HTML

Date: 2010-05-20 12:48:17

Tags: html ruby parsing

I have a plain-text list like this:

I am the first top-level list item
  I am his son
  Me too
Second one here
  His son
  His daughter
    I am the son of the one above
    Me too because of the indentation
  Another one

I want to turn it into:

<ul>
  <li>I am the first top-level list item
    <ul>
      <li>I am his son</li>
      <li>Me too</li>
    </ul>
  </li>
  <li>Second one here
    <ul>
      <li>His son</li>
      <li>His daughter
        <ul>
          <li>I am the son of the one above</li>
          <li>Me too because of the indentation</li>
        </ul>
      </li>
      <li>Another one</li>
    </ul>
  </li>
</ul>

How can I do this?

5 answers:

Answer 0 (score: 5)

I have never used Ruby, but the general algorithm stays the same:

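  1. Create a data structure like this:
    Node => (Text => string, Children => array of Nodes)
  2. Read a line.
  3. Check whether its indentation is greater than the current indentation.
  4. If so, append the line to the Children of the current node, then call the method recursively with that new node as the active node. Continue from 2.
  5. Check whether its indentation is equal to the current indentation.
  6. If so, append the line to the active node. Continue from 2.
  7. Check whether its indentation is less than the current indentation.
  8. If so, return from the method.
  9. Repeat until EOF.
  10. Output: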

    1. print <ul>
    2. Take the first node, print <li>node.Text
    3. If there are child nodes (count of node.Children > 0) recurse to 1.
    4. print </li>
    5. take next node, continue from 2.
    6. print </ul>
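
The steps above can be sketched in Ruby roughly as follows. This is only a minimal sketch, not the answerer's code: it assumes indentation is made of spaces, it keeps an explicit stack of open nodes in place of the recursive calls described in steps 4 to 8, and the names `Node`, `parse_list` and `render_list` are made up for illustration.

```ruby
require "erb" # only for ERB::Util.html_escape

# Step 1: a node holds its text and an array of child nodes.
Node = Struct.new(:text, :children)

# Steps 2 to 9: read the lines and hang each one under the nearest
# preceding line that is indented less (assumes space indentation).
def parse_list(text)
  root = Node.new(nil, [])
  stack = [[-1, root]] # [indent, node] pairs; the root sits below every real level
  text.each_line do |line|
    next if line.strip.empty?
    indent = line[/\A */].size
    # Drop every open node that is not a proper ancestor of this line.
    stack.pop while stack.last[0] >= indent
    node = Node.new(line.strip, [])
    stack.last[1].children << node
    stack.push([indent, node])
  end
  root
end

# Step 10: print <ul>, then each node as <li>, recursing into its children.
def render_list(nodes, depth = 0)
  pad = "  " * depth
  out = "#{pad}<ul>\n"
  nodes.each do |node|
    out << "#{pad}  <li>#{ERB::Util.html_escape(node.text)}"
    if node.children.empty?
      out << "</li>\n"
    else
      out << "\n" << render_list(node.children, depth + 2) << "#{pad}  </li>\n"
    end
  end
  out << "#{pad}</ul>\n"
end

puts render_list(parse_list("Second one here\n  His son\n  His daughter\n").children)
```

Because the nested `<ul>` is written before the parent's closing `</li>`, each sublist ends up inside its parent item, which is the nesting the question asks for.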
    

Answer 1 (score: 1)

This code works as expected, but the headings are printed on a new line.

require "rubygems"
require "builder"

def get_indent(line)
  line.to_s =~ /(\s*)(.*)/
  $1.size
end

def create_list(lines, list_indent = -1, 
       b = Builder::XmlMarkup.new(:indent => 2, :target => $stdout))
  while not lines.empty?
    line_indent = get_indent lines.first

    if line_indent == list_indent
      b.li {
        b.text! lines.shift.strip + $/
        if get_indent(lines.first) > line_indent
          create_list(lines, line_indent, b)
        end
      }
    elsif line_indent < list_indent
      break
    else
      b.ul {
        create_list(lines, line_indent, b)
      }
    end
  end
end

Answer 2 (score: 1)

Convert the input to Haml, then render it as HTML.

require 'haml'

def text_to_html(input)
  indent = -1
  haml = input.gsub(/^( *)/) do |match|
    line_indent = $1.length
    repl = line_indent > indent ? "#{$1}%ul\n" : ''
    indent = line_indent
    repl << "  #{$1}%li "
  end
  Haml::Engine.new(haml).render
end

puts text_to_html(<<END)
I am the first top-level list item
  I am his son
  Me too
Second one here
  His son
  His daughter
    I am the son of the one above
    Me too because of the indentation
  Another one
END

Result:

<ul>
  <li>I am the first top-level list item</li>
  <ul>
    <li>I am his son</li>
    <li>Me too</li>
  </ul>
  <li>Second one here</li>
  <ul>
    <li>His son</li>
    <li>His daughter</li>
    <ul>
      <li>I am the son of the one above</li>
      <li>Me too because of the indentation</li>
    </ul>
    <li>Another one</li>
  </ul>
</ul>

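Answer 3 (score: 1)

Old topic, but... it looks like I found a way to make Glenn Jackman's code produce valid HTML (avoiding a <ul> with a child <ul>).
I am working with strings that use tabs for indentation.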

    require 'haml'
    class String
       def text2htmllist
         tabs = -1
         topUL=true
         addme=''

         haml = self.gsub(/^([\t]*)/) do |match|
           line_tabs = match.length

           if ( line_tabs > tabs )
                if topUL
                    repl = "#{match}#{addme}%ul\n"
                    topUL=false
                else
                    repl = "#{match}#{addme}%li\n"
                    addme += "\t"
                    repl += "#{match}#{addme}%ul\n"
                end
           else
              repl = ''
              addme = addme.gsub(/^[\t]/,'') if ( line_tabs < tabs ) #remove one \t 
           end
           tabs = line_tabs
           repl << "\t#{match}#{addme}%li "

         end
         puts haml
         Haml::Engine.new(haml).render
       end
    end #String class

    str = <<FIM
    I am the first top-level list item
        I am his son
        Me too
    Second one here
        His son
        His daughter
            I am the son of the one above
            Me too because of the indentation
        Another one
    FIM

    puts str.text2htmllist

Produces:

%ul
    %li I am the first top-level list item
    %li
        %ul
            %li I am his son
            %li Me too
    %li Second one here
    %li
        %ul
            %li His son
            %li His daughter
            %li
                %ul
                    %li I am the son of the one above
                    %li Me too because of the indentation
            %li Another one
<ul>
  <li>I am the first top-level list item</li>
  <li>
    <ul>
      <li>I am his son</li>
      <li>Me too</li>
    </ul>
  </li>
  <li>Second one here</li>
  <li>
    <ul>
      <li>His son</li>
      <li>His daughter</li>
      <li>
        <ul>
          <li>I am the son of the one above</li>
          <li>Me too because of the indentation</li>
        </ul>
      </li>
      <li>Another one</li>
    </ul>
  </li>
</ul>

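Answer 4 (score: 0)

You can do this with some simple find-and-replace. Programs such as TextWrangler on Mac, Notepad++ on Windows, and possibly gedit on Linux (I am not sure how well its find feature handles complex patterns) can search for line breaks and replace them with something else. Start with the top-level items and work your way in (that is, start with the lines that have no leading spaces). You may need to experiment a bit to get it right. If this is something you want to do regularly, you could write a small script, but I doubt that is the case.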