来自HTML的Microdata Java Parser

时间:2015-09-10 05:30:25

标签: java parsing metadata microdata

实际上我正在尝试使用java库从HTML源页面解析schema.org Microdata。我尝试了很多站点,最后我找到了来自https://maps.googleapis.com/maps/api/directions/json的任何23和Mf2j。对于.apache any23我找不到合适的文档所以我离开了,最后使用Mf2j我构建项目,我得到了jar。我创建了一个示例项目来执行Mf2j来解析Microdata。这是我写的示例代码。

package org.mypro;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Map;
import java.util.Properties;
import java.util.Map.Entry;

import com.kylewm.mf2j.Mf2Parser;

public class MySample {

    /**
     * @param args
     * @throws URISyntaxException 
     * @throws IOException 
     */
    public static void main(String[] args) throws IOException, URISyntaxException {
        // TODO Auto-generated method stub
        Mf2Parser parser = new Mf2Parser()
        .setIncludeAlternates(true)
        .setIncludeRelUrls(true);
        URL server = new URL("http://blogbasics.com/examples-of-blogs/");
        Properties systemProperties = System.getProperties();
        systemProperties.setProperty("http.proxyHost","pp.xyz.com");
        systemProperties.setProperty("http.proxyPort","1234");
        HttpURLConnection connection = (HttpURLConnection)server.openConnection();
        connection.connect();
    Map<String,Object> parsed = parser.parse(new URI("http://blogbasics.com/examples-of-blogs/"));
    for (Entry<String, Object> string : parsed.entrySet()) {
        System.out.println(string.getKey() +" = "+string.getValue());
    }

    }

}

但我得到了这样的输出。我得到了图像URL,但我需要所有Microdata元数据。

rels = {"Icon":["http://blogbasics.com/wp-content/uploads/cropped-Blog-Basics-favicon.png"],"Shortcut":["http://blogbasics.com/wp-content/uploads/cropped-Blog-Basics-favicon.png"],"apple-touch-icon-precomposed":["http://blogbasics.com/wp-content/uploads/apple-touch-icon-144x144.png","http://blogbasics.com/wp-content/uploads/apple-touch-icon-114x114.png","http://blogbasics.com/wp-content/uploads/apple-touch-icon-72x72.png","http://blogbasics.com/wp-content/uploads/apple-touch-icon-57x57.png"],"canonical":["http://blogbasics.com/examples-of-blogs/"],"external":["http://blogbasics.com","http://allisondduncan.com","http://www.kuldipsonsjewellers.com/Earring.html","http://jin6000.tumblr.com/","http://ckckarate.com","http://www.ferrypress.com/profile.php?u=Herman1879"],"nofollow":["http://bit.ly/1EQF6HG","https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/","http://blogbasics.com/examples-of-blogs/#comment-261","http://blogbasics.com","http://blogbasics.com/examples-of-blogs/#comment-262","http://allisondduncan.com","http://blogbasics.com/examples-of-blogs/#comment-270","http://blogbasics.com/examples-of-blogs/#comment-736","http://www.kuldipsonsjewellers.com/Earring.html","http://blogbasics.com/examples-of-blogs/#comment-1407","http://blogbasics.com/examples-of-blogs/#comment-3036","http://blogbasics.com/examples-of-blogs/#comment-5682","http://blogbasics.com/examples-of-blogs/#comment-6877","http://blogbasics.com/examples-of-blogs/#comment-8615","http://jin6000.tumblr.com/","http://blogbasics.com/examples-of-blogs/#comment-8684","http://ckckarate.com","http://blogbasics.com/examples-of-blogs/#comment-18326","http://www.ferrypress.com/profile.php?u=Herman1879","http://blogbasics.com/examples-of-blogs/#comment-22883","http://blogbasics.com/examples-of-blogs/#comment-26672","http://blogbasics.com/examples-of-blogs/#respond","http://www.blogbasics.com","http://www.blogbasics.com/privacy","http://www.blogbasics.com/terms-conditions/","http://www.blogbasics.com/contact/"],"publisher":["https://www.google.com/+Blogbasics"],"stylesheet":["http://blogbasics.com/wp-content/themes/tru/style.css?v=1427736009&ver=2.1.3","http://fonts.googleapis.com/css?family=Open+Sans%3A300%2C400italic%2C400%2C600%2C700%7CRokkitt&ver=1.5","http://optinskin.com/src-4/min/normalize.min.css?ver=4.3","http://blogbasics.com/wp-content/plugins/OptinSkin/skins/1/style.css?ver=4.3","http://blogbasics.com/wp-content/themes/tru/includes/lib/spyr_slidingshare/style.css?ver=0.9.3"]}
items = []
rel-urls = {"http://allisondduncan.com":{"rels":["external","nofollow"],"text":"Allison Duncan"},"http://bit.ly/1EQF6HG":{"rels":["nofollow"]},"http://blogbasics.com":{"rels":["external","nofollow"],"text":"Paul Odtaa"},"http://blogbasics.com/examples-of-blogs/":{"rels":["canonical"]},"http://blogbasics.com/examples-of-blogs/#comment-1407":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-18326":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-22883":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-261":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-262":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-26672":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-270":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-3036":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-5682":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-6877":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-736":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-8615":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#comment-8684":{"rels":["nofollow"],"text":"Reply"},"http://blogbasics.com/examples-of-blogs/#respond":{"rels":["nofollow"],"text":"Cancel reply"},"http://blogbasics.com/wp-content/plugins/OptinSkin/skins/1/style.css?ver=4.3":{"media":"all","rels":["stylesheet"],"type":"text/css"},"http://blogbasics.com/wp-content/themes/tru/includes/lib/spyr_slidingshare/style.css?ver=0.9.3":{"media":"all","rels":["stylesheet"],"type":"text/css"},"http://blogbasics.com/wp-content/themes/tru/style.css?v=1427736009&ver=2.1.3":{"media":"all","rels":["stylesheet"],"type":"text/css"},"http://blogbasics.com/wp-content/uploads/apple-touch-icon-114x114.png":{"rels":["apple-touch-icon-precomposed"]},"http://blogbasics.com/wp-content/uploads/apple-touch-icon-144x144.png":{"rels":["apple-touch-icon-precomposed"]},"http://blogbasics.com/wp-content/uploads/apple-touch-icon-57x57.png":{"rels":["apple-touch-icon-precomposed"]},"http://blogbasics.com/wp-content/uploads/apple-touch-icon-72x72.png":{"rels":["apple-touch-icon-precomposed"]},"http://blogbasics.com/wp-content/uploads/cropped-Blog-Basics-favicon.png":{"rels":["Shortcut","Icon"],"type":"image/x-icon"},"http://ckckarate.com":{"rels":["external","nofollow"],"text":"Ed JP"},"http://fonts.googleapis.com/css?family=Open+Sans%3A300%2C400italic%2C400%2C600%2C700%7CRokkitt&ver=1.5":{"media":"all","rels":["stylesheet"],"type":"text/css"},"http://jin6000.tumblr.com/":{"rels":["external","nofollow"],"text":"Edward Carty"},"http://optinskin.com/src-4/min/normalize.min.css?ver=4.3":{"media":"all","rels":["stylesheet"],"type":"text/css"},"http://www.blogbasics.com":{"rels":["nofollow"],"text":"Blog Basics"},"http://www.blogbasics.com/contact/":{"rels":["nofollow"],"text":"Contact"},"http://www.blogbasics.com/privacy":{"rels":["nofollow"],"text":"Privacy Policy"},"http://www.blogbasics.com/terms-conditions/":{"rels":["nofollow"],"text":"Terms and Conditions"},"http://www.ferrypress.com/profile.php?u=Herman1879":{"rels":["external","nofollow"],"text":"hgh xl"},"http://www.kuldipsonsjewellers.com/Earring.html":{"rels":["external","nofollow"],"text":"Jewellery Shop in Ranchi"},"https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/":{"rels":["nofollow"],"text":"(Click Here)"},"https://www.google.com/+Blogbasics":{"rels":["publisher"]}}

实际上我需要这种形式的输出,而不是json格式。我需要像这样的数据内容。我使用来自github的JavaScript Microdata Parser找到了这个,但我需要在java中使用相同类型的解析器。请建议我。感谢

"type": [ "http://schema.org/WebPage" ], "properties": { "mainContentOfPage": [ { "type": [ "http://schema.org/Blog" ], "properties": { "blogPost": [ { "type": [ "http://schema.org/BlogPosting" ], "properties": { "headline": [ "Examples of Blogs" ], "author": [ { "type": [ "http://schema.org/Person" ], "properties": { "name": [ "Kenneth Byrd" ] } } ],

Read more: http://foolip.org/microdatajs/live/#ixzz3lJJII7g0 

0 个答案:

没有答案
相关问题