What is the best way to parse an HTML option list with Node.js?

Posted: 2015-09-08 01:27:41

Tags: node.js parsing dom

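I'm trying to get the following HTML list into a more useful format, and I'm wondering whether anyone can recommend a best practice for this. I realize I'll probably need to split each category string on ">" (everything after a ">" is a subcategory), but parsing it recursively, given how the HTML is structured, seems tricky. Any help is appreciated.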
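The split step itself is small. A minimal sketch, assuming the option text has already been entity-decoded (an HTML parser does this for you; the raw markup contains `&gt;` and `&amp;`). `splitCategory` is a hypothetical helper name, not part of any library. The number of segments it returns is the nesting depth, so no recursion is needed to find where a label belongs.

```javascript
// Split one option label such as "Books/Movies/Music > Books > Fiction"
// into its path segments. Assumes the text is already entity-decoded,
// i.e. "&gt;" has become ">" and "&amp;" has become "&".
function splitCategory(label) {
  return label
    .split(">")
    .map((part) => part.trim())
    // Drop empty segments caused by stray or trailing separators.
    .filter((part) => part.length > 0);
}
```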

I'd like an array that looks something like this:

[
    {
        id: 1,
        name: "Antiques",
        subcount: 0,
        subcategory: [
            {
                id: ?,
                name: "Some Subcategory",
                subcount: 1,
                subcategory: [
                    {
                        id: ?,
                        name: "Some Sub Subcategory",
                        subcount: 1
                    }
                ]
            }
        ]
    },
    ...
]
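One way to get this shape without recursive parsing is to walk each label's path from the root and create any missing level on the way down. A minimal sketch, assuming the input is an array of `{ id, path }` objects where `path` is the label already split on ">"; `buildTree` is a hypothetical name. It also assumes `subcount` means the number of direct subcategories, which the example above leaves open. Several options in the sample skip a level (e.g. "Books > Fiction > Action" has no "Books > Fiction" entry), so such intermediate nodes are created with `id: null`.

```javascript
// Build the nested category array from flat { id, path } entries.
// Assumption: "subcount" = number of direct subcategories.
function buildTree(entries) {
  const roots = [];
  // Full path (as a JSON string) -> node, so each level is created only once.
  const byPath = new Map();

  for (const { id, path } of entries) {
    let siblings = roots;
    let parent = null;
    for (let depth = 0; depth < path.length; depth++) {
      const key = JSON.stringify(path.slice(0, depth + 1));
      let node = byPath.get(key);
      if (!node) {
        // Levels without an <option> of their own keep id: null.
        node = { id: null, name: path[depth], subcount: 0, subcategory: [] };
        byPath.set(key, node);
        siblings.push(node);
        if (parent) parent.subcount += 1;
      }
      parent = node;
      siblings = node.subcategory;
    }
    // The last node on the path is the one this option describes.
    if (parent) parent.id = id;
  }
  return roots;
}
```

Because nodes are looked up by their full path, the result does not depend on a parent option appearing before its children; a parent seen late simply fills in the `id` of a node that already exists. Leaf nodes keep an empty `subcategory` array rather than omitting the key, which keeps consumers from having to check for its presence.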

Here is a sample of the data I'm trying to parse:

<select id="catid" name="catid">
<option selected="" value="0">All Categories</option>
<option value="1">Antiques</option>
<option value="15">Art</option>
<option value="70">Art &gt; Drawings</option>
<option value="336">Bath &amp; Body</option>
<option value="339">Bath &amp; Body &gt; Beauty Products</option>
<option value="2">Books/Movies/Music</option>
<option value="99">Books/Movies/Music &gt; Books</option>
<option value="132">Books/Movies/Music &gt; Books &gt; Antique/Collectible</option>
<option value="244">Books/Movies/Music &gt; Books &gt; Fiction &gt; Action</option>
<option value="126">Books/Movies/Music &gt; Books &gt; Non-Fiction</option>
<option value="278">Books/Movies/Music &gt; Books &gt; Non-Fiction &gt; Architecture &amp; Design</option>
<option value="257">Books/Movies/Music &gt; Books &gt; Non-Fiction &gt; Art &amp; Photography</option>
<option value="256">Books/Movies/Music &gt; Books &gt; Reference</option>
<option value="75">Books/Movies/Music &gt; Movies</option>
<option value="241">Books/Movies/Music &gt; Movies &gt; Blu-ray Disc</option>
<option value="76">Books/Movies/Music &gt; Movies &gt; DVDs</option>
<option value="79">Books/Movies/Music &gt; Music</option>
<option value="81">Books/Movies/Music &gt; Music &gt; Cassette Tapes</option>
<option value="80">Books/Movies/Music &gt; Music &gt; CDs</option>
<option value="170">Cameras &amp; Camcorders</option>
<option value="173">Cameras &amp; Camcorders &gt; Camcorders</option>
<option value="171">Cameras &amp; Camcorders &gt; Digital Cameras</option>
<option value="172">Cameras &amp; Camcorders &gt; Film Cameras</option>
<option value="10">Clothing</option>
<option value="453">Clothing &gt; Baby</option>
<option value="29">Clothing &gt; Children's Clothing</option>
<option value="166">Clothing &gt; Children's Clothing &gt; Accessories</option>
<option value="346">Clothing &gt; Children's Clothing &gt; Athletic Apparel</option>
<option value="160">Clothing &gt; Children's Clothing &gt; Dresses</option>
<option value="169">Clothing &gt; Eyewear</option>
<option value="28">Clothing &gt; Men's Clothing</option>
<option value="165">Clothing &gt; Men's Clothing &gt; Accessories</option>
<option value="347">Clothing &gt; Men's Clothing &gt; Athletic Apparel</option>
<option value="27">Clothing &gt; Women's Clothing</option>
<option value="164">Clothing &gt; Women's Clothing &gt; Accessories</option>
<option value="348">Clothing &gt; Women's Clothing &gt; Athletic Apparel</option>
<option value="4">Collectibles</option>
<option value="367">Collectibles &gt; Action Figures, Maquettes &amp; Mini Busts</option>
<option value="434">Collectibles &gt; Advertising</option>
<option value="386">Collectibles &gt; Bells</option>
<option value="36">Collectibles &gt; Clocks</option>
<option value="419">Collectibles &gt; Coca-Cola Memorabilia</option>
<option value="378">Collectibles &gt; Coin Banks</option>
<option value="422">Collectibles &gt; Hollywood Memorabilia &gt; Disney</option>
<option value="421">Collectibles &gt; Music Memorabilia &gt; Beatles</option>
<option value="35">Collectibles &gt; Plush Toys</option>
<option value="43">Collectibles &gt; Post Cards/Photographs/Stationery</option>
<option value="326">Collectibles &gt; Salt &amp; Pepper Shakers</option>
<option value="343">Collectibles &gt; Spoons</option>
<option value="390">Collectibles &gt; Sports</option>
<option value="409">Collectibles &gt; Sports &gt; Baseball</option>
<option value="413">Collectibles &gt; Sports &gt; NASCAR</option>
<option value="44">Collectibles &gt; Sports Cards/Trading Cards</option>
<option value="230">Collectibles &gt; Stamps</option>
<option value="232">Collectibles &gt; Trains</option>
<option value="7">Computers &amp; Electronics</option>
<option value="366">Computers &amp; Electronics &gt; Car Audio</option>
<option value="30">Computers &amp; Electronics &gt; Computers</option>
<option value="179">Computers &amp; Electronics &gt; Computers &gt; Accessories</option>
<option value="372">Computers &amp; Electronics &gt; Computers &gt; Accessories &gt; Apple</option>
<option value="452">Computers &amp; Electronics &gt; Computers &gt; Laptops &gt; Parts &amp; Repair</option>
<option value="451">Computers &amp; Electronics &gt; Computers &gt; Laptops &gt; Refurbished</option>
<option value="350">Computers &amp; Electronics &gt; Computers &gt; Networking</option>
<option value="380">Computers &amp; Electronics &gt; Home &amp; Business Phones</option>
<option value="32">Computers &amp; Electronics &gt; Home Electronics</option>
<option value="33">Computers &amp; Electronics &gt; Home Electronics &gt; Gaming Systems &amp; Games</option>
<option value="305">Computers &amp; Electronics &gt; Home Electronics &gt; Gaming Systems &amp; Games &gt; Atari 2600</option>
<option value="306">Computers &amp; Electronics &gt; Home Electronics &gt; Gaming Systems &amp; Games &gt; Atari 5200</option>
<option value="182">Computers &amp; Electronics &gt; Home Electronics &gt; Home Audio/Theater</option>
<option value="403">Computers &amp; Electronics &gt; Home Electronics &gt; Home Audio/Theater &gt; CD Players</option>
<option value="404">Computers &amp; Electronics &gt; Home Electronics &gt; Home Audio/Theater &gt; MiniDisc Players</option>
<option value="180">Computers &amp; Electronics &gt; Home Electronics &gt; TVs</option>
<option value="181">Computers &amp; Electronics &gt; Home Electronics &gt; VCRs/DVDs/DVRs/Blu-ray</option>
<option value="183">Computers &amp; Electronics &gt; Personal Electronics</option>
<option value="185">Computers &amp; Electronics &gt; Personal Electronics &gt; Cell Phones &amp; Accessories</option>
<option value="373">Computers &amp; Electronics &gt; Personal Electronics &gt; Cell Phones &amp; Accessories &gt; Android</option>
<option value="426">Computers &amp; Electronics &gt; Personal Electronics &gt; Digital Photo Frames</option>
<option value="351">Computers &amp; Electronics &gt; Personal Electronics &gt; eBook Readers</option>
<option value="118">Computers &amp; Electronics &gt; Personal Electronics &gt; MP3/CD Players &amp; Accessories</option>
<option value="377">Computers &amp; Electronics &gt; Personal Electronics &gt; MP3/CD Players &amp; Accessories &gt; iPod</option>
<option value="186">Computers &amp; Electronics &gt; Personal Electronics &gt; PDAs</option>
<option value="417">Computers &amp; Electronics &gt; Personal Electronics &gt; Radios</option>
<option value="431">Computers &amp; Electronics &gt; Vintage Electronics</option>
<option value="8">Crafts &amp; Hobbies</option>
<option value="418">Crafts &amp; Hobbies &gt; Models &amp; Model Kits</option>
<option value="446">Crafts &amp; Hobbies &gt; Scouting &amp; Youth Groups</option>
<option value="415">Crafts &amp; Hobbies &gt; Sewing Machines</option>
<option value="195">For The Home</option>
<option value="196">For The Home &gt; Appliances</option>
<option value="197">For The Home &gt; Home Decor</option>
<option value="202">For The Home &gt; Home Decor &gt; Linens/Fabric/Textiles</option>
<option value="441">For The Home &gt; Home Decor &gt; Linens/Fabric/Textiles &gt; Quilts</option>
<option value="203">For The Home &gt; Home Decor &gt; Pottery</option>
<option value="204">For The Home &gt; Home Decor &gt; Silver &amp; Brass</option>
<option value="240">For The Home &gt; Outdoor/Garden</option>
<option value="14">Glass</option>
<option value="331">Glass &gt; Art Glass</option>
<option value="332">Glass &gt; Carnival Glass</option>
<option value="335">Glass &gt; Vintage Glass</option>
<option value="6">Jewelry &amp; Gemstones</option>
<option value="84">Jewelry &amp; Gemstones &gt; Bracelets</option>
<option value="100">Jewelry &amp; Gemstones &gt; Brooches/Pins</option>
<option value="101">Jewelry &amp; Gemstones &gt; Men's Accessories</option>
<option value="438">Jewelry &amp; Gemstones &gt; Men's Accessories &gt; Belt Buckles</option>
<option value="86">Jewelry &amp; Gemstones &gt; Necklaces</option>
<option value="87">Jewelry &amp; Gemstones &gt; Pendants</option>
<option value="353">Jewelry &amp; Gemstones &gt; Precious Metal Scrap</option>
<option value="88">Jewelry &amp; Gemstones &gt; Rings</option>
<option value="89">Jewelry &amp; Gemstones &gt; Watches</option>
<option value="342">Jewelry &amp; Gemstones &gt; Watches &gt; Children's Watches</option>
<option value="113">Miscellaneous</option>
<option value="447">Miscellaneous &gt; Binoculars &amp; Optics</option>
<option value="119">Miscellaneous &gt; Magazines</option>
<option value="379">Miscellaneous &gt; Pocket Knives</option>
<option value="13">Musical Instruments</option>
<option value="192">Musical Instruments &gt; Accessories</option>
<option value="190">Musical Instruments &gt; Brass</option>
<option value="215">Office Supplies</option>
<option value="424">Office Supplies &gt; Typewriters</option>
<option value="34">Pets</option>
<option value="115">Religious Items</option>
<option value="364">Science &amp; Education</option>
<option value="18">Seasonal &amp; Holiday</option>
<option value="12">Sports</option>
<option value="112">Sports &gt; Sporting Equipment</option>
<option value="281">Sports &gt; Sporting Equipment &gt; Baseball</option>
<option value="392">Sports &gt; Sporting Equipment &gt; Camping &amp; Hiking</option>
<option value="20">Tableware and Kitchenware</option>
<option value="47">Tableware and Kitchenware &gt; Barware</option>
<option value="68">Tableware and Kitchenware &gt; Dinnerware</option>
<option value="51">Tableware and Kitchenware &gt; Dinnerware &gt; Bowls</option>
<option value="61">Tableware and Kitchenware &gt; Glassware</option>
<option value="62">Tableware and Kitchenware &gt; Ovenware</option>
<option value="63">Tableware and Kitchenware &gt; Serving Pieces</option>
<option value="384">Tableware and Kitchenware &gt; Serving Pieces &gt; Holloware</option>
<option value="383">Tableware and Kitchenware &gt; Serving Pieces &gt; Trays</option>
<option value="67">Tableware and Kitchenware &gt; Tools &amp; Gadgets</option>
<option value="114">Tools</option>
<option value="9">Toys/Dolls/Games</option>
<option value="109">Toys/Dolls/Games &gt; Dolls</option>
<option value="398">Toys/Dolls/Games &gt; Dolls &gt; Antique</option>
<option value="396">Toys/Dolls/Games &gt; Dolls &gt; Vintage</option>
<option value="226">Toys/Dolls/Games &gt; Educational</option>
<option value="439">Toys/Dolls/Games &gt; Educational &gt; Educational Electronic Games</option>
<option value="440">Toys/Dolls/Games &gt; Educational &gt; Educational Video Games</option>
<option value="23">Transportation</option>
<option value="427">Travel/Luggage</option>
<option value="429">Travel/Luggage &gt; Backpacks</option>

1 answer:

Answer 0 (score: 0)

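I'm not sure what your use case is, but from your question I'd assume a Node HTML parser (such as htmlparser2) or an HTML-to-JSON converter (such as html-to-json) would do the trick. Hope that helps!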
