Importing a text file with <> angle brackets into R

Posted: 2015-10-11 16:49:12

Tags: r import gsub

My data file, from http://kavita-ganesan.com/opinosis-opinion-dataset, contains angle brackets:

<DOCNO>2007_acura_mdx</DOCNO>
<DOC>
<DATE>07/31/2009</DATE>
<AUTHOR>FlewByU</AUTHOR>
<TEXT>I just moved to Germany two months ago and bought an 07 MDX from another military member. It has everything I could want. We just returned from a week driving through the Alps and this SUV is simply amazing. Granted, I get to drive it much faster than I could in the states, but even at 120 MPH, it was rock solid. We need the AWD for the snow and the kids stay entertained with the AV system. Plenty of passing power and very comfortable on long trips. Acuras are rare in Germany and I get stares all the time by curious Bavarians wondering what kind of vehicle I have. If you are in the market for a luxury SUV for family touring, with cool tech toys to play with, MDX can't be beat. </TEXT>
<FAVORITE>The separate controls for the rear passengers are awesome. I can control temp and AV from the front or switch to rear. Sound system is amazing. I will sometimes sit in the driveway and just listen. Also has a 120v outlet in console. Great for us since we live with 220v and need 120 on occasion. </FAVORITE>
</DOC>
<DOC>
<DATE>07/30/2009</DATE>
<AUTHOR>cvillemdx</AUTHOR>
<TEXT>After months of careful research and test drives at BMW, Lexus, Volvo, etc. I settled on the MDX without a doubt in mind. I love the way the car handles, no stiffness or resistance in the steering or acceleration. The interior design is a little Star Trek for me, but once I figured everything out, it is a pleasure to have all the extras (XM radio, navigation, Bluetooth, backup camera, etc.)</TEXT>
<FAVORITE>The self-adjusting side mirrors which rotate to give you a view of the curb/lines as you back up. Makes backing into parking spaces and parallel parking a breeze, along with the back-up camera. Also a fan of the push-to-talk for my cell phone.</FAVORITE>
</DOC>
<DOC>
<DATE>06/22/2009</DATE>
<AUTHOR>Pleased</AUTHOR>
<TEXT>I'm two years into a three year lease and I love this car. The only thing I would change would be the shape of the grill...THAT'S IT. Everything else is perfect. Great performance, plenty of power and AWD when skiing, plenty of room for baggage, great MPG for an SUV, navi system is far superior to GM's Suburban (don't have to put in park to change your destination, etc). Zero problems...just gas and oil changes. One beautiful car...except for the sho-gun shield looking grill.</TEXT>
<FAVORITE>Navi is easy, hands-free is great, AWD is perfect.</FAVORITE>
</DOC>

It looks like an XML file, but when I try:

library(XML)
xml.url <- "2007_acura_mdx"
xmlfile <- xmlTreeParse(xml.url)
class(xmlfile)
xmltop <- xmlRoot(xmlfile)
topxml <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
xml_df <- data.frame(t(topxml), row.names=NULL)

I run into a problem at the data.frame step. Can anyone help? For now I am thinking of using grep() and gsub(), but that isn't easy either.
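For the grep()/gsub() route mentioned here, a minimal base-R sketch uses regmatches() with regexpr()/gregexpr() and Perl-style lazy matching to cut the file into <DOC> blocks and pull out each tag. The helper names extract_tag() and parse_docs() are hypothetical, and the sketch assumes each tag occurs at most once per <DOC> block and tags are never nested. The sample string is a shortened excerpt of the data above.

```r
# Hypothetical helpers (extract_tag, parse_docs); base R only, no packages.
# Assumes each tag appears at most once per <DOC> block and tags are not nested.

extract_tag <- function(doc, tag) {
  # (?s) lets '.' match newlines; '.*?' is lazy, so it stops at the first closing tag
  pattern <- sprintf("(?s)<%s>(.*?)</%s>", tag, tag)
  m <- regmatches(doc, regexpr(pattern, doc, perl = TRUE))
  if (length(m) == 0) return(NA_character_)  # tag missing in this block
  # Keep only the captured group and drop leading/trailing whitespace
  trimws(sub(pattern, "\\1", m, perl = TRUE))
}

parse_docs <- function(raw, fields = c("DATE", "AUTHOR", "TEXT", "FAVORITE")) {
  # One string per <DOC>...</DOC> block; "<DOCNO>" does not match "<DOC>"
  docs <- regmatches(raw, gregexpr("(?s)<DOC>.*?</DOC>", raw, perl = TRUE))[[1]]
  # One character column per field, one row per <DOC> block
  cols <- lapply(fields, function(f)
    vapply(docs, extract_tag, character(1), tag = f, USE.NAMES = FALSE))
  as.data.frame(setNames(cols, fields), stringsAsFactors = FALSE)
}

# To use the real file instead of the excerpt below:
# raw <- paste(readLines("2007_acura_mdx", warn = FALSE), collapse = "\n")
raw <- "<DOCNO>2007_acura_mdx</DOCNO>
<DOC>
<DATE>07/31/2009</DATE>
<AUTHOR>FlewByU</AUTHOR>
<TEXT>I just moved to Germany two months ago and bought an 07 MDX from another military member. </TEXT>
<FAVORITE>Sound system is amazing. </FAVORITE>
</DOC>
<DOC>
<DATE>06/22/2009</DATE>
<AUTHOR>Pleased</AUTHOR>
<TEXT>I'm two years into a three year lease and I love this car.</TEXT>
<FAVORITE>Navi is easy, hands-free is great, AWD is perfect.</FAVORITE>
</DOC>"

dd <- parse_docs(raw)
str(dd)
```

Unlike an XML parser, this does not need the file to be well-formed XML, but it also does not decode entities such as &amp;, and it trims the trailing spaces that xmlValue() would keep.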

1 answer:

Answer (score: 1)

Try this:

txt <- "<DOCNO>2007_acura_mdx</DOCNO>
<DOC>
<DATE>07/31/2009</DATE>
<AUTHOR>FlewByU</AUTHOR>
<TEXT>I just moved to Germany two months ago and bought an 07 MDX from another military member. It has everything I could want. We just returned from a week driving through the Alps and this SUV is simply amazing. Granted, I get to drive it much faster than I could in the states, but even at 120 MPH, it was rock solid. We need the AWD for the snow and the kids stay entertained with the AV system. Plenty of passing power and very comfortable on long trips. Acuras are rare in Germany and I get stares all the time by curious Bavarians wondering what kind of vehicle I have. If you are in the market for a luxury SUV for family touring, with cool tech toys to play with, MDX can't be beat. </TEXT>
<FAVORITE>The separate controls for the rear passengers are awesome. I can control temp and AV from the front or switch to rear. Sound system is amazing. I will sometimes sit in the driveway and just listen. Also has a 120v outlet in console. Great for us since we live with 220v and need 120 on occasion. </FAVORITE>
</DOC>
<DOC>
<DATE>07/30/2009</DATE>
<AUTHOR>cvillemdx</AUTHOR>
<TEXT>After months of careful research and test drives at BMW, Lexus, Volvo, etc. I settled on the MDX without a doubt in mind. I love the way the car handles, no stiffness or resistance in the steering or acceleration. The interior design is a little Star Trek for me, but once I figured everything out, it is a pleasure to have all the extras (XM radio, navigation, Bluetooth, backup camera, etc.)</TEXT>
<FAVORITE>The self-adjusting side mirrors which rotate to give you a view of the curb/lines as you back up. Makes backing into parking spaces and parallel parking a breeze, along with the back-up camera. Also a fan of the push-to-talk for my cell phone.</FAVORITE>
</DOC>
<DOC>
<DATE>06/22/2009</DATE>
<AUTHOR>Pleased</AUTHOR>
<TEXT>I'm two years into a three year lease and I love this car. The only thing I would change would be the shape of the grill...THAT'S IT. Everything else is perfect. Great performance, plenty of power and AWD when skiing, plenty of room for baggage, great MPG for an SUV, navi system is far superior to GM's Suburban (don't have to put in park to change your destination, etc). Zero problems...just gas and oil changes. One beautiful car...except for the sho-gun shield looking grill.</TEXT>
<FAVORITE>Navi is easy, hands-free is great, AWD is perfect.</FAVORITE>
</DOC>"

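library(XML)
# The data has several top-level elements (DOCNO plus several DOCs), so it is not
# well-formed XML on its own; wrapping it in a single root element fixes that
txt2 <- paste("<root>", txt, "</root>")
doc <- xmlTreeParse(txt2, asText = TRUE, useInternalNodes = TRUE)
# For each DOC node, get the text of each child (DATE, AUTHOR, TEXT, FAVORITE)
L <- xpathApply(doc, "//DOC", xmlApply, FUN = xmlValue)
# Turn each named list into a one-row data frame and stack them
dd <- do.call(rbind, lapply(L, as.data.frame, stringsAsFactors = FALSE))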

which gives:

> str(dd)
'data.frame':   3 obs. of  4 variables:
 $ DATE    : chr  "07/31/2009" "07/30/2009" "06/22/2009"
 $ AUTHOR  : chr  "FlewByU" "cvillemdx" "Pleased"
 $ TEXT    : chr  "I just moved to Germany two months ago and bought an 07 MDX from another military member. It has everything I could want. We ju"| __truncated__ "After months of careful research and test drives at BMW, Lexus, Volvo, etc. I settled on the MDX without a doubt in mind. I lov"| __truncated__ "I'm two years into a three year lease and I love this car. The only thing I would change would be the shape of the grill...THAT"| __truncated__
 $ FAVORITE: chr  "The separate controls for the rear passengers are awesome. I can control temp and AV from the front or switch to rear. Sound sy"| __truncated__ "The self-adjusting side mirrors which rotate to give you a view of the curb/lines as you back up. Makes backing into parking sp"| __truncated__ "Navi is easy, hands-free is great, AWD is perfect."
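The answer pastes the data in as a string. A minimal sketch of the same approach reading straight from the file, and keeping DOCNO as a column (the //DOC query drops it because it sits outside the DOC elements), could look like this. read_reviews() is a hypothetical helper; it assumes the file has exactly one DOCNO, that every DOC has the same child tags (otherwise rbind() fails), and that the file is otherwise valid XML, e.g. no unescaped & in the review text. xmlParse() is the XML package's shorthand for xmlTreeParse(..., useInternalNodes = TRUE).

```r
library(XML)

# Hypothetical helper: parse one review file using the same "wrap in a single
# root element" trick as the answer. Assumes exactly one <DOCNO> per file,
# identical child tags in every <DOC>, and otherwise valid XML.
read_reviews <- function(path) {
  # Read the whole file as one string
  txt <- paste(readLines(path, warn = FALSE), collapse = "\n")
  # xmlParse() is xmlTreeParse() with useInternalNodes = TRUE
  doc <- xmlParse(paste("<root>", txt, "</root>"), asText = TRUE)
  # One named list (DATE, AUTHOR, TEXT, FAVORITE) per <DOC>
  L <- xpathApply(doc, "//DOC", xmlApply, FUN = xmlValue)
  dd <- do.call(rbind, lapply(L, as.data.frame, stringsAsFactors = FALSE))
  # DOCNO lies outside every <DOC>, so add it back as a column
  dd$DOCNO <- xpathSApply(doc, "//DOCNO", xmlValue)
  dd
}

# Usage, with the file name from the question:
# dd <- read_reviews("2007_acura_mdx")
```

For a folder of such files, something like do.call(rbind, lapply(list.files("reviews", full.names = TRUE), read_reviews)) would stack them into one data frame; "reviews" is a placeholder path.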