Question

I need to retrieve some values from an HTML file. I have to use Ant, so that I can use these values in other parts of my script.

Is this even possible in Ant?

Answer 1

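As stated in the other answers, you can't do this in "pure" XML. You need to embed a programming language. My personal favourite is Groovy, whose integration with ANT is excellent.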

Here is an example that retrieves the logo URL from the Groovy home page:

parse:

print:
     [echo] 
     [echo]         Logo URL: http://groovy.codehaus.org/images/groovy-logo-medium.png
     [echo]

build.xml

The build uses the Ivy plug-in to retrieve all third-party dependencies.

<project name="demo" default="print" xmlns:ivy="antlib:org.apache.ivy.ant">

    <target name="resolve">
        <ivy:resolve/>
        <ivy:cachepath pathid="build.path" conf="build"/>
    </target>

    <target name="parse" depends="resolve">
        <taskdef name="groovy" classname="org.codehaus.groovy.ant.Groovy" classpathref="build.path"/>

        <groovy>
        import org.htmlcleaner.*

        def address = 'http://groovy.codehaus.org/'

        // Clean any messy HTML
        def cleaner = new HtmlCleaner()
        def node = cleaner.clean(address.toURL())

        // Convert from HTML to XML
        def props = cleaner.getProperties()
        def serializer = new SimpleXmlSerializer(props)
        def xml = serializer.getXmlAsString(node)

        // Parse the XML into a document we can work with
        def page = new XmlSlurper(false,false).parseText(xml)

        // Retrieve the logo URL
        properties["logo"] = page.body.div[0].div[1].div[0].div[0].div[0].img.@src
        </groovy>
    </target>

    <target name="print" depends="parse">
        <echo>
        Logo URL: ${logo}
        </echo>
    </target>

</project>

The parsing logic is pure Groovy programming. I love how easily you can walk the page's DOM tree:

// Retrieve the logo URL
properties["logo"] = page.body.div[0].div[1].div[0].div[0].div[0].img.@src
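The positional path above breaks as soon as the page layout changes. A possibly more robust variant is sketched below. It assumes the logo is the first img element whose src contains the word "logo", and it searches the whole tree depth-first with GPath's '**' operator instead of counting divs. It is meant as a drop-in replacement for the groovy element in the parse target. The CDATA section is needed because the script contains &&, which is not allowed as raw text in XML.

```xml
<groovy><![CDATA[
    import org.htmlcleaner.*

    // Clean the page and serialize it to well-formed XML, as in the parse target
    def cleaner = new HtmlCleaner()
    def node = cleaner.clean('http://groovy.codehaus.org/'.toURL())
    def xml = new SimpleXmlSerializer(cleaner.getProperties()).getXmlAsString(node)
    def page = new XmlSlurper(false, false).parseText(xml)

    // '**' visits every descendant depth-first.
    // Assumption: the logo's src attribute contains the word "logo".
    def img = page.'**'.find { it.name() == 'img' && it.@src.text().contains('logo') }

    // Only set the property when a matching img element was found
    if (img.size() > 0) {
        project.setNewProperty('logo', img.@src.text())
    }
]]></groovy>
```

The print target can stay unchanged, because it only reads ${logo}.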

ivy.xml

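Ivy is similar to Maven: it manages your dependencies on third-party software. Here it is used to pull down Groovy and the HtmlCleaner library that the Groovy logic uses: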

<ivy-module version="2.0">
    <info organisation="org.myspotontheweb" module="demo"/>
    <configurations defaultconfmapping="build->default">
        <conf name="build" description="ANT tasks"/>
    </configurations>
    <dependencies>
        <dependency org="org.codehaus.groovy" name="groovy-all" rev="1.8.2"/>
        <dependency org="net.sourceforge.htmlcleaner" name="htmlcleaner" rev="2.2"/>
    </dependencies>
</ivy-module>

How to install Ivy

Ivy is a standard ANT plug-in. Download its jar and place it in one of the following directories:

$HOME/.ant/lib
$ANT_HOME/lib

I don't know why the ANT project doesn't ship with Ivy.
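If you would rather not copy the jar by hand, the build can download Ivy itself. This is similar in spirit to the bootstrap snippet in the Ivy installation documentation. The sketch below makes some assumptions of its own: it uses Ivy 2.5.2 from Maven Central and a project-local lib/ivy directory, and both are just my choices. The lines would go inside the project element of the build.xml above, and the resolve target would then declare depends="install-ivy".

```xml
<!-- Assumed values: choose the Ivy version and download location that suit you -->
<property name="ivy.version" value="2.5.2"/>
<property name="ivy.jar.dir" value="${basedir}/lib/ivy"/>
<property name="ivy.jar.file" value="${ivy.jar.dir}/ivy-${ivy.version}.jar"/>

<target name="install-ivy">
    <mkdir dir="${ivy.jar.dir}"/>
    <!-- Download the jar once; skipexisting avoids fetching it again on every build -->
    <get src="https://repo1.maven.org/maven2/org/apache/ivy/ivy/${ivy.version}/ivy-${ivy.version}.jar"
         dest="${ivy.jar.file}" skipexisting="true"/>
    <!-- Register the Ivy tasks under the antlib namespace declared as xmlns:ivy -->
    <taskdef resource="org/apache/ivy/ant/antlib.xml"
             uri="antlib:org.apache.ivy.ant"
             classpath="${ivy.jar.file}"/>
</target>
```

A project-local directory is used here, rather than $HOME/.ant/lib, so that the explicit taskdef does not clash with a copy of Ivy that Ant has already loaded at startup.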

Answer 2

Yes, this is quite possible.

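Note that to use this solution, JAVA_HOME must point to JRE 1.6 or later, because the javascript script language relies on the JavaScript engine bundled with the JDK. That engine was removed in Java 15, so on newer JDKs you have to put a JavaScript engine on Ant's classpath yourself.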

<project name="extractElement" default="test">
<!--Extract element from html file-->
<scriptdef name="findelement" language="javascript">
     <attribute name="tag" />
     <attribute name="file" />
     <attribute name="property" />
     <![CDATA[
       var tag = attributes.get("tag");
       // "file" holds the HTML text loaded by <loadfile>, not a file name
       var html = String(attributes.get("file"));
       // [\s\S]*? also matches across line breaks; the "i" flag ignores tag case
       var regex = "<" + tag + "[^>]*>([\\s\\S]*?)</" + tag + ">";
       var patt = new RegExp(regex, "i");
       // exec() returns [whole element, captured content], or null when nothing matches
       var match = patt.exec(html);
       project.setProperty(attributes.get("property"), match ? match[1] : "");
     ]]>
</scriptdef>

<!--Only available target...-->
<target name="test">
    <!--Load html file into property-->
    <loadfile srcFile="D:\Tools\CruiseControl\Build\artifacts\RECO\20110831100942\RECO_merged_report.html" property="html.file"/>
    <!--Find element with specific tag and save it to property element-->
    <findelement tag="title" file="${html.file}" property="element"/>
    <echo message="File : ${html.file}"/>
    <echo message="Title : ${element}"/>
</target>
</project>

Output: [echo] Title : Test Report

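Since I don't know exactly which values you are looking for, this solution returns the content of the first element whose tag you specify in the tag attribute. You can of course modify the regular expression to suit your own needs.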

This is also pure build.xml Ant, with no external dependencies.
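The regular expression in findelement only ever reports the first match. If you really need every element with a given tag, one way is sketched below. It calls exec() in a loop with the "g" flag, so that each call continues after the previous match. The findallelements name, its text attribute, the comma separator and the report.html path are my own hypothetical choices.

```xml
<!--Hypothetical variant: collect the content of every element with the given tag-->
<scriptdef name="findallelements" language="javascript">
     <attribute name="tag" />
     <attribute name="text" />
     <attribute name="property" />
     <![CDATA[
       var tag = attributes.get("tag");
       var text = String(attributes.get("text"));
       // "g" makes each exec() call resume where the previous match ended
       var patt = new RegExp("<" + tag + "[^>]*>([\\s\\S]*?)</" + tag + ">", "gi");
       var found = [];
       var m;
       while ((m = patt.exec(text)) !== null) {
         found.push(m[1]);
       }
       // Store all captured contents as one comma-separated property
       project.setProperty(attributes.get("property"), found.join(","));
     ]]>
</scriptdef>

<target name="test-all">
    <!--report.html is a placeholder path-->
    <loadfile srcFile="report.html" property="html.text"/>
    <findallelements tag="td" text="${html.text}" property="cells"/>
    <echo message="Cells : ${cells}"/>
</target>
```

Regular expressions are still only a rough tool for HTML. Nested elements with the same tag, for example, will not be captured correctly.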

Answer 3

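Sure, but you have to write the task yourself. For more information on writing your own Ant tasks, see http://ant.apache.org/manual/develop.html#writingowntask. Inside your task you can parse the HTML file however you like.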

I'd say that doing what you want directly in "pure" XML (build.xml) is not possible.
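Here is a sketch of what this looks like from the build file's side: the task is registered with taskdef and then called like any built-in task. The class com.example.ant.HtmlValueTask, its file/tag/property attributes and the build/tasks directory are all hypothetical. You would write and compile that class yourself, as described on the manual page linked above.

```xml
<project name="custom-task-demo" default="show">
    <target name="show">
        <!-- Register the (hypothetical) compiled task class under the name "htmlvalue" -->
        <taskdef name="htmlvalue"
                 classname="com.example.ant.HtmlValueTask"
                 classpath="build/tasks"/>
        <!-- Ant maps each attribute to a setter on the class: setFile, setTag, setProperty -->
        <htmlvalue file="report.html" tag="title" property="report.title"/>
        <echo message="Title : ${report.title}"/>
    </target>
</project>
```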

Answer 4

Have a look at the xmlproperty task (http://ant.apache.org/manual/Tasks/xmlproperty.html) and see whether it suits you. It's very straightforward:

<xmlproperty file="${html.file}"
   prefix="html."/>

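This works as long as the HTML is also well-formed XML (XHTML). HTML in general is not a subset of XML, so a page with, say, unclosed br or p tags will make the task fail. I've used it for this task before; there's no need to write your own task or script.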
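Here is a minimal sketch of the idea. It assumes a well-formed XHTML file called page.xhtml (a hypothetical name) without a DOCTYPE declaration, for example <html><head><title>Test Report</title></head><body/></html>. With the default keeproot="true", nested elements become dot-separated property names, so the title ends up in html.head.title:

```xml
<project name="xmlproperty-demo" default="show-title">
    <target name="show-title">
        <!-- Load the elements of the (well-formed) page as properties; no prefix used -->
        <xmlproperty file="page.xhtml" keeproot="true"/>
        <!-- <html><head><title>...</title></head></html> becomes html.head.title -->
        <echo message="Title : ${html.head.title}"/>
    </target>
</project>
```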

Parsing HTML with an Ant script
