Question

我正在使用Excel列表并希望转向：

Quercus agrifolia var。 oxyadenia （Torr。）J.T。豪威尔

进入：

<i>Quercus agrifolia</i> var. <i>oxyadenia</i> (Torr.) J.T. Howell

我有Rich Text格式列表，其中应用了格式但我想将其发送到Access，并在相关文本周围明确包含格式化标记。

Answer 1

我当时想做同样的事情，并在MSDN上找到答案： Convert contents of a formatted excel cell to HTML format

我希望这对你有所帮助，它使用了一个excel宏。

编辑：使用时我需要修改嵌套标签的代码，请在下面找到我对宏的更新：

Function fnConvert2HTML(myCell As Range) As String
    Dim bldTagOn, itlTagOn, ulnTagOn, colTagOn As Boolean
    Dim i, chrCount As Integer
    Dim chrCol, chrLastCol, htmlTxt, htmlEnd As String

    bldTagOn = False
    itlTagOn = False
    ulnTagOn = False
    colTagOn = False
    chrCol = "NONE"
    'htmlTxt = "<html>"
    htmlTxt = ""
    chrCount = myCell.Characters.Count

    For i = 1 To chrCount
    htmlEnd = ""
        With myCell.Characters(i, 1)
            If (.Font.Color) Then
                chrCol = fnGetCol(.Font.Color)
                If Not colTagOn Then
                    htmlTxt = htmlTxt & "<font color=#" & chrCol & ">"
                    colTagOn = True
                Else
                    If chrCol <> chrLastCol Then htmlTxt = htmlTxt & "</font><font color=#" & chrCol & ">"
                End If
            Else
                chrCol = "NONE"
                If colTagOn Then
                    htmlEnd = "</font>" & htmlEnd
                    'htmlTxt = htmlTxt & "</font>"
                    colTagOn = False
                End If
            End If
            chrLastCol = chrCol

            If .Font.Bold = True Then
                If Not bldTagOn Then
                    htmlTxt = htmlTxt & "<b>"
                    bldTagOn = True
                End If
            Else
                If bldTagOn Then
                    'htmlTxt = htmlTxt & "</b>"
                    htmlEnd = "</b>" & htmlEnd
                    bldTagOn = False
                End If
            End If

            If .Font.Italic = True Then
                If Not itlTagOn Then
                    htmlTxt = htmlTxt & "<i>"
                    itlTagOn = True
                End If
            Else
                If itlTagOn Then
                    'htmlTxt = htmlTxt & "</i>"
                    htmlEnd = "</i>" & htmlEnd
                    itlTagOn = False
                End If
            End If

            If .Font.Underline > 0 Then
                If Not ulnTagOn Then
                    htmlTxt = htmlTxt & "<u>"
                    ulnTagOn = True
                End If
            Else
                If ulnTagOn Then
                    'htmlTxt = htmlTxt & "</u>"
                    htmlEnd = "</u>" & htmlEnd
                    ulnTagOn = False
                End If
            End If

            If (Asc(.Text) = 10) Then
                htmlTxt = htmlTxt & htmlEnd & "<br>"
            Else
                htmlTxt = htmlTxt & htmlEnd & .Text
            End If

        End With
    Next

    If colTagOn Then
        htmlTxt = htmlTxt & "</font>"
        colTagOn = False
    End If
    If bldTagOn Then
        htmlTxt = htmlTxt & "</b>"
        bldTagOn = False
    End If
    If itlTagOn Then
        htmlTxt = htmlTxt & "</i>"
        itlTagOn = False
    End If
    If ulnTagOn Then
        htmlTxt = htmlTxt & "</u>"
        ulnTagOn = False
    End If
    'htmlTxt = htmlTxt & "</html>"
    fnConvert2HTML = htmlTxt
End Function

Function fnGetCol(strCol As String) As String
    Dim rVal, gVal, bVal As String
    strCol = Right("000000" & Hex(strCol), 6)
    bVal = Left(strCol, 2)
    gVal = Mid(strCol, 3, 2)
    rVal = Right(strCol, 2)
    fnGetCol = rVal & gVal & bVal
End Function

Answer 2

这是一个更快的替代解决方案，但是产生的输出更混乱（因为它使用Word的HTML引擎）。您需要为VBA项目添加以下引用：

Microsoft HTML对象库
Microsoft脚本运行时
Microsoft Word 16.0对象库

然后，通过运行eg调用以下代码。 convertToHtml(Range("A1:A100"))在立即窗口中：

' Given a temporary file path, load the HTML in that file
' and return the first paragraph's inner HTML.
Function extractFirstParagraph(filePath As String) As String
    Dim fs As New FileSystemObject, _
        html As New MSHTML.HTMLDocument, _
        par As MSHTML.HTMLGenericElement
    html.body.innerHTML = fs.OpenTextFile(filePath).ReadAll
    Set par = html.getElementsByTagName("P")(0)
    extractFirstParagraph = par.innerHTML
End Function

Sub convertToHtml(rng As Range)
    ' Open a single Word instance.
    Dim w As New Word.Application, d As Word.Document
    Set d = w.Documents.Add

    Dim cell As Range
    Const tempFile As String = "c:\temp\msword.html"
    ' For each cell in the range ...
    For Each cell In rng
        If cell.Value <> "" Then
            ' ... copy it into the Word document ...
            cell.Copy
            d.Range.PasteSpecial xlPasteFormats
            ' ... save the Word document as HTML
            ' in a temporary file ...
            d.SaveAs2 tempFile, wdFormatHTML
            ' ... and extract the first paragraph.
            cell.Value = extractFirstParagraph(tempFile)
            Debug.Print "Cell " & cell.Address & " done."
        End If
    Next cell

    ' Close Word once you're done. Note that if a bug
    ' is encountered, this cleanup won't occur and the 
    ' Word process will need to be killed to release
    ' file locks, otherwise you get an unhelpful error.
    w.Quit False
End Sub

您可以通过添加对Microsoft VBScript正则表达式5.5的引用并运行如下函数来使用正则表达式清理输出：

' Used to avoid duplication in cleanWordHtml.
Function eraseInPlace(ByRef r As RegExp, _
    ByRef s As String, p As String) As String
    r.Pattern = p
    s = r.Replace(s, "")
End Function

' Eliminate junk tags from HTML generated by Word.
Function cleanWordHtml(inputString As String)
    Dim r As New RegExp
    r.Global = True
    eraseInPlace r, inputString, "mso-[^;""]*(; )?"
    eraseInPlace r, inputString, " style="""""
    eraseInPlace r, inputString, "<\?xml[^>]*>"
    eraseInPlace r, inputString, "<\/?o:[^>]*>"
    eraseInPlace r, inputString, "<SPAN><\/SPAN>"
    cleanWordHtml = inputString
End Function

如果您需要将<span>标签转换为<font>标签（我也需要这样做，因为我正在导入不支持CSS的Access RT文本字段），请尝试调用此方法函数并传入在extractFirstParagraph函数中构造的MSHTML对象：

' Given a <p> DOM node, replace any children of the
' form <span style="color: foo"> with <font color="foo">.
Function convertSpanToFont(ByRef par As MSHTML.HTMLGenericElement, _
    ByRef doc As MSHTML.HTMLDocument)
    Dim span As MSHTML.HTMLSpanElement, _
        font As MSHTML.HTMLFontElement
    For Each span In par.getElementsByTagName("span")
        Set font = doc.createElement("font")
        If IsNull(span.Style.Color) _
            Or span.Style.Color <> "" Then
            font.Color = span.Style.Color
            font.innerHTML = span.innerHTML
            span.insertAdjacentElement "afterEnd", font
            span.removeNode True
        End If
    Next span
End Function

我还考虑过将整个电子表格从Excel中保存为HTML，然后使用另一种工具将其保存为Access可以处理的格式，但是Excel的HTML导出会生成CSS类，而不是内联样式。如果仅需要将电子表格的一部分转换为HTML，则此方法也很有用。

Answer 3

实际上有一个使用 Excel 的内部 XML 表示并使用 XSL 样式表对其进行转换的更简洁、更快速的解决方案。

您可以在 https://github.com/HeimMatthias/Excel-HTML-Tools-Public

上找到 VBA 代码和所需的 XSLT

免责声明： 在使用速度缓慢的 VBA 脚本多年后，我自己编写了这个工具。示例输出可以在此 fiddle 中看到。

有关技术细节：

Range.Value-Property 可以接受一个参数 xlRangeValueXMLSpreadsheet，它返回一个完全格式化的 xml-String，其中包括其内容的 html-object。在包含

的单元格上运行ActiveSheet.Range("A1").Value(xlRangeValueXMLSpreadsheet) <块引用>

Quercus agrifolia 变种。 oxyadenia (Torr.) J.T.豪尔

返回以下字符串：

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <Styles>
  <Style ss:ID="Default" ss:Name="Normal">
   <Alignment ss:Vertical="Bottom"/>
   <Borders/>
   <Font ss:FontName="Calibri" ss:Size="11" ss:Color="#000000"/>
   <Interior/>
   <NumberFormat/>
   <Protection/>
  </Style>
  <Style ss:ID="s62">
   <Font ss:FontName="Calibri" ss:Size="11" ss:Color="#000000"
    ss:Italic="1"/>
  </Style>
 </Styles>
 <Worksheet ss:Name="Tabelle1">
  <Table ss:ExpandedColumnCount="1" ss:ExpandedRowCount="1"
   ss:DefaultColumnWidth="61.714285714285708"
   ss:DefaultRowHeight="14.571428571428571">
   <Row>
    <Cell ss:StyleID="s62"><ss:Data ss:Type="String"
      xmlns="http://www.w3.org/TR/REC-html40"><I><Font html:Color="#000000">Quercus agrifolia</Font></I><Font
       html:Color="#000000"> var. </Font><I><Font html:Color="#000000">oxyadenia</Font></I><Font
       html:Color="#000000"> (Torr.) J.T. Howell</Font></ss:Data></Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>

Cell-Tag 围绕着一个 ss:Data-Object，其中包含 - 或多或少 - 干净的 html-Code。在任何情况下，清理这些数据以获得干净的 html 比解析每个字母并访问其样式要简单得多，速度也快得多。（快得多，我的意思是快 100 倍）。

注意事项： 单元格样式（来自模板和单个单元格）不表示为 html。这并不总是特别明显。在上面的示例中，单元格已自动接收斜体样式，因为第一个单词已被斜体化。由于字符串中还有罗马（直立）词，斜体段落的代码在 html 中表示。但是如果整个单元格都用斜体表示，那么 <i> 标签就会丢失。这尤其令人困惑，因为如果相应的样式具有 <i> 作为属性，您不能只用 ss:Italic="1"-Tag 包围整个 html，您还需要检查它是否在 html 中包含 -Tags -部分。这当然适用于所有单元格样式。

将Rich Text转换为HTML格式标签

3 个答案: