XML父级和子级到R中的数据帧

时间:2019-03-16 10:48:48

标签: r xml dataframe

我是R的新手,并且是第一次使用XML文件。已经从站点进行了大量发布,以将具有父级和多个子级的xml加载到R中的单个数据帧中。但是,要么行被重复,要么失败。

我要加载的xml文件格式

<?xml version="1.0" encoding="utf-8"?>
<!--File SUMMARY: Total Number of Accounts in file: 2, Total Amount 
Outstanding in file :$30-->
<OCADocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns:xsd="http://www.w3.org/2001/XMLSchema" CreationDateTime="2019-03- 
13T04:04:40.6537471-04:00" Jurisdiction="PUNE" 
xmlns="www.oesc.ca/XMLSchema">
<Consumer ConsumerOID="11">
<ConsumerInformation>
  <ConsumerLocationType>Residential</ConsumerLocationType>
</ConsumerInformation>
<BillingContact>
  <NameInformation>
    <LastName>ALIAS</LastName>
    <FirstName>Jibson</FirstName>
  </NameInformation>
  <AddressInformation AddressType="BillingAddress">
    <Address1>4608 32 abc</Address1>
    <City>pune</City>
    <StateProvince>MH</StateProvince>
    <PostalZip>411017</PostalZip>
    <CountryCode>CA</CountryCode>
  </AddressInformation>
  <PhoneInformation Type="Main">
    <PhoneNumber>985623147</PhoneNumber>
  </PhoneInformation>
  <PhoneInformation Type="Home">
    <PhoneNumber>785412369</PhoneNumber>
  </PhoneInformation>
  <CreditInformation>
    <CreditStatus>ACC</CreditStatus>
    <CreditScore>00001</CreditScore>
  </CreditInformation>
</BillingContact>
<SigningContact>
  <NameInformation>
    <FullName>Jibson ALIAS</FullName>
  </NameInformation>
  <AddressInformation AddressType="BillingAddress">
    <Address1>4608 32 abc</Address1>
    <City>pune</City>
    <StateProvince>MH</StateProvince>
    <PostalZip>411017</PostalZip>
    <CountryCode>CA</CountryCode>
  </AddressInformation>
</SigningContact>
<Site SiteOID="400002" MarketParticipant="ANORTH">
  <AccountID>
    <LDCAccountNumber>000167</LDCAccountNumber>
  </AccountID>
  <SiteAddress>
    <Address1>4608 32 abc</Address1>
    <City>pune</City>
    <StateProvince>MH</StateProvince>
    <PostalZip>411017</PostalZip>
    <CountryCode>CA</CountryCode>
  </SiteAddress>
  <PaymentPlan>
    <ESGPaymentPlan>None</ESGPaymentPlan>
  </PaymentPlan>
  <Contract ContractOID="10000002">
    <RtlrContractIdentifier>80000153</RtlrContractIdentifier>
    <ContractState>FLW</ContractState>
    <ContractPrice>7.1900000000</ContractPrice>
    <ContractSigningDate>2018-09-23</ContractSigningDate>
    <ContractFlowStartDate>2018-09-29</ContractFlowStartDate>
    <ContractFlowEndDate>2019-09-29</ContractFlowEndDate>
    <LanguagePreference>eng</LanguagePreference>
  </Contract>
  <BillingEntityOID>200214</BillingEntityOID>
</Site>
<Site SiteOID="45647" MarketParticipant="EPCOR">
  <AccountID>
    <LDCAccountNumber>0014587</LDCAccountNumber>
  </AccountID>
  <SiteAddress>
    <Address1>4608 32 abc</Address1>
    <City>pune</City>
    <StateProvince>MH</StateProvince>
    <PostalZip>411017</PostalZip>
    <CountryCode>CA</CountryCode>
  </SiteAddress>
  <PaymentPlan>
    <ESGPaymentPlan>None</ESGPaymentPlan>
  </PaymentPlan>
  <Contract ContractOID="100003">
    <RtlrContractIdentifier>805553</RtlrContractIdentifier>
    <ContractState>FLW</ContractState>
    <ContractPrice>0.45000000</ContractPrice>
    <ContractSigningDate>2018-09-23</ContractSigningDate>
    <ContractFlowStartDate>2018-09-29</ContractFlowStartDate>
    <ContractFlowEndDate>2019-09-29</ContractFlowEndDate>
    <LanguagePreference>eng</LanguagePreference>
  </Contract>
  <BillingEntityOID>200001</BillingEntityOID>
</Site>
<ConsumerBillingEntity BillingEntityOID="200001" BusinessSegmentIdentifier="JEA">
  <ExitFeesOutstanding>0</ExitFeesOutstanding>
  <EstimateExitFees>0</EstimateExitFees>
  <OldestOutstandingReceivMHlesDueDate>2019-01-11</OldestOutstandingReceivMHlesDueDate>
  <Outstanding_Current>264.24</Outstanding_Current>
  <Outstanding_1to30>221.19</Outstanding_1to30>
  <Outstanding_31to60>0</Outstanding_31to60>
  <Outstanding_61to90>143.66</Outstanding_61to90>
  <Outstanding_91to120>0</Outstanding_91to120>
  <Outstanding_121to150>0</Outstanding_121to150>
  <Outstanding_151to180>0</Outstanding_151to180>
  <Outstanding_181plus>0</Outstanding_181plus>
  <LastPaymentAmount>200.00</LastPaymentAmount>
  <LastPaymentDate>2018-12-24T00:00:00</LastPaymentDate>
  <CustomerAccountNo>904511110</CustomerAccountNo>
  <PaymentScoreInformation>
    <PaymentScore>0</PaymentScore>
  </PaymentScoreInformation>
</ConsumerBillingEntity>
</Consumer>
<Consumer ConsumerOID="421">
<ConsumerInformation>
  <ConsumerLocationType>Residential</ConsumerLocationType>
</ConsumerInformation>
<BillingContact>
  <NameInformation>
    <LastName>BORUDE</LastName>
    <FirstName>GANESH</FirstName>
  </NameInformation>
  <AddressInformation AddressType="BillingAddress">
    <Address1>1420 Pimpri</Address1>
    <City>PUNE</City>
    <StateProvince>MH</StateProvince>
    <PostalZip>41018</PostalZip>
    <CountryCode>CA</CountryCode>
  </AddressInformation>
  <PhoneInformation Type="Main">
    <PhoneNumber>4034789652</PhoneNumber>
  </PhoneInformation>
  <EMailInformation>
    <EMailAddress>abc@gmail.com</EMailAddress>
  </EMailInformation>
  <CreditInformation>
    <CreditStatus>ACC</CreditStatus>
    <CreditScore>00111</CreditScore>
  </CreditInformation>
</BillingContact>
<SigningContact>
  <NameInformation>
    <FullName>GANESH BORUDE</FullName>
  </NameInformation>
  <AddressInformation AddressType="BillingAddress">
    <Address1>1420 Pimpri</Address1>
    <City>PUNE</City>
    <StateProvince>MH</StateProvince>
    <PostalZip>41018</PostalZip>
    <CountryCode>CA</CountryCode>
  </AddressInformation>
</SigningContact>
<Site SiteOID="58796" MarketParticipant="ATCOSouth">
  <AccountID>
    <LDCAccountNumber>0000000416</LDCAccountNumber>
  </AccountID>
  <SiteAddress>
    <Address1>1420 Pimpri</Address1>
    <City>PUNE</City>
    <StateProvince>MH</StateProvince>
    <PostalZip>41018</PostalZip>
    <CountryCode>CA</CountryCode>
  </SiteAddress>
  <PaymentPlan>
    <ESGPaymentPlan>None</ESGPaymentPlan>
  </PaymentPlan>
  <Contract ContractOID="5400006">
    <RtlrContractIdentifier>1000016700</RtlrContractIdentifier>
    <ContractState>XXX</ContractState>
    <ContractPrice>8.8900000000</ContractPrice>
    <ContractSigningDate>2009-09-23</ContractSigningDate>
    <LanguagePreference>eng</LanguagePreference>
  </Contract>
  <BillingEntityOID>3000415</BillingEntityOID>
</Site>
<Site SiteOID="44789653" MarketParticipant="ENMAX">
  <AccountID>
    <LDCAccountNumber>0020003657650</LDCAccountNumber>
  </AccountID>
  <SiteAddress>
    <Address1>1420 Pimpri</Address1>
    <City>PUNE</City>
    <StateProvince>MH</StateProvince>
    <PostalZip>41018</PostalZip>
    <CountryCode>CA</CountryCode>
  </SiteAddress>
  <PaymentPlan>
    <ESGPaymentPlan>None</ESGPaymentPlan>
  </PaymentPlan>
  <Contract ContractOID="140605">
    <RtlrContractIdentifier>80010822</RtlrContractIdentifier>
    <ContractState>FLW</ContractState>
    <ContractPrice>0.0949000000</ContractPrice>
    <ContractSigningDate>2018-10-02</ContractSigningDate>
    <ContractFlowStartDate>2018-10-08</ContractFlowStartDate>
    <ContractFlowEndDate>2019-10-08</ContractFlowEndDate>
    <LanguagePreference>eng</LanguagePreference>
  </Contract>
  <BillingEntityOID>300045</BillingEntityOID>
</Site>
<ConsumerBillingEntity BillingEntityOID="300045" BusinessSegmentIdentifier="JEALP">
  <ExitFeesOutstanding>0</ExitFeesOutstanding>
  <EstimateExitFees>0</EstimateExitFees>
  <OldestOutstandingReceivMHlesDueDate>2019-01-24</OldestOutstandingReceivMHlesDueDate>
  <Outstanding_Current>121.47</Outstanding_Current>
  <Outstanding_1to30>121.94</Outstanding_1to30>
  <Outstanding_31to60>98.29</Outstanding_31to60>
  <Outstanding_61to90>0</Outstanding_61to90>
  <Outstanding_91to120>0</Outstanding_91to120>
  <Outstanding_121to150>0</Outstanding_121to150>
  <Outstanding_151to180>0</Outstanding_151to180>
  <Outstanding_181plus>0</Outstanding_181plus>
  <LastPaymentAmount>100.00</LastPaymentAmount>
  <LastPaymentDate>2019-03-01T00:00:00</LastPaymentDate>
  <CustomerAccountNo>982</CustomerAccountNo>
  <PaymentScoreInformation>
    <PaymentScore>15</PaymentScore>
  </PaymentScoreInformation>
</ConsumerBillingEntity>
</Consumer>
</OCADocument>

方法1

library(XML)
library(methods)

xml_raw<-xmlParse("test.xml",useInternalNodes = TRUE)

xml_data<-xmlToDataFrame(xml_raw)

我收到的错误消息如下所示

[<-.data.frame*tmp*中的错误,i,名称(nodes [[i]]),值= c(“ Residential” ,:列的下标重复

方法2

 df<-ldply(xmlToList(xml_raw), function(x) { data.frame(x) } )

这会将文件加载到数据帧,但重复同一事务两次以上。附加输出屏幕截图。

我希望有一排的ConsumerOID包含所有详细信息,并且不再重复。它应该将所有父节点和子节点作为数据帧中的一列。

Output screenshot

0 个答案:

没有答案