Creating well-formed XML from malformed XML

Time: 2010-07-15 13:17:26

Tags: xml ajax jquery

I have to work with the following XML, which is returned from a web service:

<?xml version="1.0" encoding="utf-8" ?>
<root>
  <staticPage>
    <liStaticPageID>6165</liStaticPageID>
    <sTitle>Ethylene</sTitle>
    <sPageURL>Ethylene.htm</sPageURL>
    <sBody>
      <P>Ethylene is a colourless, odourless, extremely flammable compressed gas. It is slightly soluble in water and soluble in liquid hydrocarbons. It reacts with strong oxidants causing fire and explosion hazard. It may polymerise to form aromatic compounds under the influence of temperatures above 600°C.</P>
      <BR>
        <P>Around 59% of the world’s ethylene demand is consumed in polyethylene production. Other major derivatives are ethylene oxide/glycol (13%), ethylene dichloride/vinyl chloride monomer (13%) and ethyl benzene/styrene (6%), with other uses such as acetaldehyde, alpha-olefins, ethylene-propylene elastomers and vinyl acetate representing around 9% of demand. </P>
        <BR>
          <P>Although ethylene gas poses no risk to skin or eyes, the ethylene liquid can cause frostbite. Ethylene is a dangerous fire and explosion hazard. Exposure to ethylene occurs through inhalation, from leaks, spills, accidents, and cigarette smoke. While ethylene gas is invaluable due to its ability to initiate the ripening process in several fruits, it can also be very harmful to many fruits, vegetables, flowers and plants by accelerating the ageing process and decreasing the product quality and shelf life.</P>
          <BR>
            <P>ICIS pricing quotes ethylene in Europe, Asia-Pacific and the US Gulf. </P>
            <BR>
              <P>Frequency:</P>
              <BR>
                <P>Published weekly on Fridays and an Ethylene Daily (Asia) report is published Mondays-Fridays.</P>
                <P>Real time Price Alert Service (PAS) delivering market news and trends throughout the day. </P>
                <BR>
                  <P>Ethylene (EUROPE)</P>
                  <BR>
                    <P>Weekly Price Assessments:</P>
                    <BR>
                      <P>Ethylene Contract Prices</P>
                      <BR>
                        <P>FD NWE quarterly (EUR/MT &amp; conversion to US CTS/LB) </P>
                        <P>FD NWE monthly (EUR/MT &amp; conversion to US CTS/LB)</P>
                        <BR>
                          <P>Ethylene Spot Prices</P>
                          <BR>
                            <P>FD NWE PIPELINE (EUR/MT &amp; conversion to US CTS/LB) </P>
                            <P>CIF NWE (EUR/MT &amp; conversion to US CTS/LB) </P>
                            <P>CIF MED (EUR/MT &amp; conversion to US CTS/LB)</P>
                            <BR>
                              <P>Feedstock – Naphtha Spot Prices </P>
                              <BR>
                                <P>CIF NWE (USD/MT)</P>
                                <BR>
                                  <P>Ethylene (ASIA-PACIFIC)</P>
                                  <BR>
                                    <BR>
                                      <BR>
                                        <P>Daily and Weekly Price Assessments:</P>
                                        <BR>
                                          <P>Ethylene Daily Spot Prices</P>
                                          <BR>
                                            <P>CFR N.E.Asia (USD/MT &amp; conversion to US CTS/LB</P>
                                            <P>CFR S.E.Asia (USD/MT &amp; conversion to US CTS/LB)</P>
                                            <BR>
                                              <P>Weekly Price Assessments:</P>
                                              <BR>
                                                <P>FOB KOREA (USD/MT &amp; conversion to US CTS/LB) </P>
                                                <P>CFR N.E.ASIA (USD/MT &amp; conversion to US CTS/LB) </P>
                                                <P>CFF S.E.ASIA (USD/MT &amp; conversion to US CTS/LB)</P>
                                                <P>Feedstock – Naphtha Spot Prices </P>
                                                <BR>
                                                  <P>CFR Japan (USD/MT)</P>
                                                  <BR>
                                                    <P>Ethylene (US GULF)</P>
                                                    <BR>
                                                      <P>Weekly Price Assessments:</P>
                                                      <BR>
                                                        <P>Ethylene Net Contract Prices (FD):</P>
                                                        <BR>
                                                          <P>Pipeline monthly (US CTS/LB &amp; conversion to USD/MT)</P>
                                                          <BR>
                                                            <P>Ethylene Spot Prices (FD)</P>
                                                            <BR>
                                                              <P>Pipeline weekly (US CTS/LB &amp; conversion to USD/MT)</P>
                                                              <BR>
                                                                <P>Feedstock – Naphtha Spot Prices </P>
                                                                <BR>
                                                                  <P>DEL USG PARAFFINIC (USD/MT)</P>
                                                                  <BR>
                                                                    <P>General Information:</P>
                                                                    <BR>
                                                                      <P>Assessment window: Price assessments are based on information supplied by market participants through the week up to close of business on Fridays at 1800 hours in London, Singapore and Houston.</P>
                                                                      <BR>
                                                                        <P>Daily assessments are based on information gathered throughout the day up to the close of business at 1730 hours in Singapore.</P>
                                                                        <BR>
                                                                          <P>Specifications: Price quotes are provided on the basis of product of 99.9% purity. The European FD PIPELINE quote is for ARG specification. </P>
                                                                          <BR>
                                                                            <P>Timing: In Asia and Europe, business is usually concluded within a six week forward delivery window from date of publication. However, given arbitrage movements, a maximum forward delivery window of 60 days applies for the quotations. In the US, contract prices are tied to the delivery month referenced next to the price. US spot prices are quoted for one-to-two weeks out.</P>
                                                                            <BR>
                                                                              <P>Terms: 30-90 days after bill of lading date.</P>
                                                                              <BR>
                                                                                <P>Standard cargo size: Typical cargo sizes in Asia range from 2,300 to 3,000 tonnes while product from the Middle East ranges from 4,000-5,000 tonnes. Typical European cargo sizes range between 2,000 and 5,000 tonnes. US domestic deliveries are typically sold in 5-10 million lb parcels via pipeline. Imported cargo sizes can be up to 4,000-5,000 tonnes.</P>
                                                                                <BR>
                                                                                  <P>Assessment basis: ICIS pricing ethylene price assessments are based on information gathered throughout the week from producers, traders, end-consumers and the shipping market. The assessment takes into consideration: confirmed deals, reported deals, firm offers and bids, buy and sell indications, and rumoured deals. </P>
                                                                                  <BR>
                                                                                    <P>All efforts are made to confirm pricing levels with the respective buyer and seller before price assessments are adjusted. In the absence of confirmation and/or trades, price ranges may be adjusted at the discretion of the editor on a notional basis to better reflect levels at which trading activity could take place. Consideration is also given to all factors potentially influencing the price of ethylene at any given time, including supply/demand information; feedstock prices and derivative market prices.</P>
                                                                                    <BR>
                                                                                      <P>In Europe, contract prices are fixed on both a quarterly and a monthly basis. Monthly quotes were first introduced in January 2009. The bi-monthly contract quote was discontinued in Q4 2008. Contracts are negotiated between producers and consumers. </P>
                                                                                      <BR>
                                                                                        <P>It is understood that ICIS pricing price assessments are often used as a benchmark for spot/contract ethylene trades, on an ICIS pricing average +/- alpha basis. As a result special emphasis is given to ensuring that the ICIS pricing average spot price is a number that can be readily agreed upon by as wide a cross-section of the market place as possible. </P>
                                                                                        <BR>
                                                                                          <P>Netback calculations (i.e CFR prices derived from FOB numbers + freight) are not usually considered sufficient to warrant an automatic adjustment of CFR assessments on the basis of open market freights. The use of COA vessels in Asia-Pacific and the need for employment can lead to below-market freight components to apply. ICIS pricing prefers to adjust assessments on a like-for-like basis, CFR for CFR, FOB for FOB. Similarly southeast Asian price assessments are not adjusted on northeast Asian prices + freight component, or vice versa. </P>
                                                                                          <BR>
                                                                                            <P>
                                                                                              The Asia-Pacific report focuses on the regional spot market, however information on domestic contract pricing and prevailing formulae is carried in the text, where details are available. Northeast Asia comprises <?xml:namespace prefix = st1 /><st1:country-region w:st="on">Japan</st1:country-region>, <st1:country-region w:st="on">Korea</st1:country-region>, <st1:country-region w:st="on">Taiwan</st1:country-region> and <st1:country-region w:st="on">China</st1:country-region>, while southeast Asia comprises the <st1:country-region w:st="on">Philippines</st1:country-region>, <st1:country-region w:st="on">Thailand</st1:country-region>, <st1:country-region w:st="on">Malaysia</st1:country-region>, <st1:country-region w:st="on">Singapore</st1:country-region> and <st1:country-region w:st="on"><st1:place w:st="on">Indonesia</st1:place></st1:country-region>. </P><BR> <P>The Ethylene Daily (Asia) report covers spot deals on a CFR N.E.Asia and CFR S.E.Asia basis. The assessment takes into account deals, bids and offers and price ideas heard throughout the day. It also includes cracker production updates.</P><BR> <P>In the <st1:country-region w:st="on"><st1:place w:st="on">US</st1:place></st1:country-region>, the net contract price usually settles at the end of the month listed. US ethylene spot prices are on a free-delivered (FD) basis and represent confirmed business, bid/offer levels or general sentiment.</P><BR><BR></sBody> 
  <liNavigationItemID>1</liNavigationItemID> 
  <uiEnteredByID /> 
  <sEnteredBy /> 
  <dtEntered>07/03/2007 08:51:13</dtEntered> 
  <uiLastModifiedByID>641d1389-710f-42c6-8c10-38a2105f5149</uiLastModifiedByID> 
  <sLastModifiedBy>Barbara Ortner</sLastModifiedBy> 
  <dtLastModified>21/07/2009 16:06:17</dtLastModified> 
  <dtApproved>21/07/2009 16:06:20</dtApproved> 
  <uiApprovedByID>641d1389-710f-42c6-8c10-38a2105f5149</uiApprovedByID> 
  <sApprovedBy>Barbara Ortner</sApprovedBy> 
  <bLive>1</bLive> 
  <liVersionNo>11</liVersionNo> 
  <sMetaDescription /> 
  <sMetaKeywords /> 
  <sPageTitle>Ethylene Methodology ICIS pricing</sPageTitle> 
  </staticPage>
  </root>

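However, the jQuery AJAX call fails because the XML document is not well-formed.

Being new to XML, I don't know how to process the XML document before making the AJAX call so that it becomes well-formed. I have edited it by hand and managed to retrieve the data into the page, but this obviously needs to be automated.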

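5 answers:

Answer 0 (score: 6)

You should contact the creator of the web service and tell them that what their service delivers is not valid XML, even though it is supposed to be.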

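Answer 1 (score: 4)

The only way you could even remotely accomplish this would be to:

  • grab the output as a string
  • using parsing or regular expressions, extract everything between <sBody> and </sBody>
  • put that blob of text back inside the <sBody> section, wrapped in <![CDATA[........]]> markers

That way, you could at least parse that mess into valid XML. You can't do much about what's inside the CDATA, I'm afraid...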
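
The steps above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: `wrapBodyInCdata` is a hypothetical helper name, and it assumes the response contains a single `<sBody>` element written exactly as in the sample (no attributes, no nested `<sBody>`).

```javascript
// Wrap the raw markup inside <sBody>...</sBody> in a CDATA section so that the
// surrounding document can be parsed as XML.
// wrapBodyInCdata is a hypothetical helper; the element name comes from the sample.
function wrapBodyInCdata(xmlText) {
  return xmlText.replace(
    /<sBody>([\s\S]*?)<\/sBody>/,
    function (match, body) {
      // A literal "]]>" would end the CDATA section early, so split it
      // across two adjacent CDATA sections.
      var safeBody = body.split("]]>").join("]]]]><![CDATA[>");
      return "<sBody><![CDATA[" + safeBody + "]]></sBody>";
    }
  );
}

var sample = "<root><sBody><P>one</P><BR><P>two</P></sBody><bLive>1</bLive></root>";
console.log(wrapBodyInCdata(sample));
```

In the browser, one option is to request the response as plain text, pass it through a helper like this, and only then hand the result to an XML parser. The body then arrives as one text node that still contains the legacy HTML.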

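Answer 2 (score: 2)

I saved your sample to the file /tmp/nwf.xml and ran

xmllint /tmp/nwf.xml

This returns a nice list of errors. One possible approach is to filter the web service's output through a few regex-based replacements until the result is valid XML (run it through xmllint again to check), and then continue with regular XML processing.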
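
A minimal sketch of such a replacement filter is shown below. The rule list and the name `applyCleanupRules` are assumptions made for illustration. The two rules cover only the Word-generated `<?xml:namespace ... />` instruction and the `st1:` smart-tag elements that appear in the sample; unclosed tags such as `<BR>` would need their own rule, and a real response may need more.

```javascript
// Each rule is a [pattern, replacement] pair, applied in order.
// The patterns are derived from the sample in the question only.
var cleanupRules = [
  // Word's "<?xml:namespace prefix = st1 />" is not a well-formed
  // processing instruction (it never ends with "?>"), so drop it.
  [/<\?xml:namespace[^>]*>/gi, ""],
  // Unwrap smart-tag elements such as
  // <st1:country-region w:st="on">Japan</st1:country-region>, keeping the text.
  [/<\/?st1:[^>]*>/gi, ""]
];

// Hypothetical helper: run every rule over the text and return the result.
function applyCleanupRules(text, rules) {
  return rules.reduce(function (result, rule) {
    return result.replace(rule[0], rule[1]);
  }, text);
}

var dirty = 'comprises <?xml:namespace prefix = st1 />' +
  '<st1:country-region w:st="on">Japan</st1:country-region> and ' +
  '<st1:country-region w:st="on"><st1:place w:st="on">Indonesia</st1:place></st1:country-region>.';
console.log(applyCleanupRules(dirty, cleanupRules));
```

Keeping the rules in a list makes it easy to add one replacement per error that xmllint reports and to re-check the output after each addition.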

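I also ran this:

xmllint --html /tmp/nwf.xml

It accepts the input and returns a valid XML document. So a second approach is to filter the text through xmllint --html. (You don't necessarily need to invoke the xmllint command-line tool for this; it is based on libxml2, which many programming languages have bindings to. I doubt JavaScript is among them, but you could write your own server-side filter that does this and call it with AJAX.)

The other respondents are right: fixing this really shouldn't be your job.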

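Answer 3 (score: 0)

Just wanted to add: I use both Solaris and Linux, but xmllint is only available on Linux (at least for me). It is a good tool if you just need to do a pre-run check or prove to the provider that the structure is bad.

xmllint filename --noout

This returns only the errors, which makes things easier. For example, I got an error like this:

101: warning: xmlParsePITarget: invalid name prefix 'xml'
text, where details are available. Northeast Asia comprises <?xml:namespace
...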

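Answer 4 (score: 0)

As a kluge, you could read the file as a text document and use string library routines to replace <BR> with <BR />. The problem is that you are receiving legacy HTML from the provider, and some old legacy tags such as <BR> are not valid in XML, because tags must either be paired, as in <BR></BR>, or written in the short form <BR />.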
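
A rough sketch of that string replacement is shown below. The name `selfCloseVoidTags` and the tag list are assumptions for illustration; only `<BR>` actually occurs in the sample. The sketch also assumes the provider never writes these tags as explicit pairs such as `<BR></BR>`, because the closing tag would be left orphaned.

```javascript
// Rewrite unclosed HTML void tags (e.g. <BR>) as self-closing tags (<BR />).
// The tag list is an assumption; extend it to match what the provider sends.
var VOID_TAGS = ["br", "hr", "img"];

function selfCloseVoidTags(text) {
  // Match the tag name, any attributes, and an optional existing "/" before ">".
  var pattern = new RegExp("<(" + VOID_TAGS.join("|") + ")\\b([^>]*?)\\s*/?>", "gi");
  return text.replace(pattern, function (match, name, attrs) {
    // Keep the original tag name and attributes; normalise the ending to " />".
    return "<" + name + attrs + " />";
  });
}

console.log(selfCloseVoidTags("<P>Frequency:</P><BR><P>Published weekly</P>"));
```

Tags that are already self-closed are left in the same normalised form, so running the function twice is harmless. On the sample from the question this step alone may not be enough, since xmllint also complains about the `<?xml:namespace ... />` instruction.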