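Some new features:
Multiple result data types (String, char, byte, short, int, long, double, float, String[], Set, List, Date)
Support for user-defined script processing functions (currently configured as JavaScript functions)
Support for swapping out the CSS and XPath engines
Support for filters
Caching of CSS and XPath engine objects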
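The caching feature can be pictured with a small sketch. The article does not show how woody caches its CSS and XPath engine objects, so everything below is an assumption: `SelectorCache` is a hypothetical name, and `java.util.regex.Pattern` stands in for a compiled CSS or XPath expression because it is in the standard library. The idea is only that each expression string is compiled once and the compiled object is reused afterwards.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.regex.Pattern;

/** Hypothetical cache of compiled expressions, keyed by the expression text. */
class SelectorCache<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> compiler;
    private final AtomicInteger compilations = new AtomicInteger();

    /** @param compiler turns an expression string into a compiled object */
    SelectorCache(Function<String, T> compiler) {
        this.compiler = compiler;
    }

    /** Returns the cached compiled object, compiling it on first use only. */
    T get(String expression) {
        return cache.computeIfAbsent(expression, e -> {
            compilations.incrementAndGet();
            return compiler.apply(e);
        });
    }

    /** Number of times the compiler was actually invoked. */
    int compilations() {
        return compilations.get();
    }

    public static void main(String[] args) {
        // Pattern::compile is a stand-in for a CSS or XPath compile call.
        SelectorCache<Pattern> cache = new SelectorCache<>(Pattern::compile);
        Pattern first = cache.get("(\\d+)");
        Pattern second = cache.get("(\\d+)");
        System.out.println(first == second);      // true: same cached instance
        System.out.println(cache.compilations()); // 1: compiled only once
    }
}
```

A cache like this pays off when the same annotated model is applied to many pages, since the expressions on its fields never change between runs. It is only safe to share the cached objects across threads if the compiled type is itself thread-safe, as `Pattern` is.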
A complete example:
import java.util.Date;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
// The woody imports (AnnotationExtractor, Model, Inject, ExtractBy, ComboExtract,
// Setting, Function, ExprType, OP) are omitted: the source does not give their packages.

public class OsChinaBlog {

    public static void main(String[] args) throws Exception {
        // Fetch the page with jsoup.
        Document doc = Jsoup.connect("http://www.oschina.net/news/43879/webmagic-0-3-0")
                .timeout(60000)
                .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0")
                .get();
        String html = doc.html();

        // Map the HTML onto the annotated model and print it as JSON.
        OsChinaBlogModel model = AnnotationExtractor.me().process(html, OsChinaBlogModel.class);
        System.out.println(model.toJson());
    }

    public static class OsChinaBlogModel extends Model {

        public OsChinaBlogModel() {
            // used for reflection
        }

        // Title: CSS selector h1.OSCTitle combined with XPath //title/text() via OP.OR.
        @Inject
        @ComboExtract(value = {
                @ExtractBy(value = "h1.OSCTitle", type = ExprType.CSS),
                @ExtractBy(value = "//title/text()", type = ExprType.XPATH) }, op = OP.OR)
        public String title;

        @Inject
        @ExtractBy(value = "p.PubDate a[href~=http://my\\.oschina\\.net/]", type = ExprType.CSS)
        public String author;

        // "發布于" means "published on"; the capture group holds the date text.
        @Inject
        @ExtractBy(value = "發布于.\\s*(\\d+年\\d+月\\d+日)", type = ExprType.REGEX)
        public Date publishDate;

        // Comment count: outer HTML of p.PubDate combined with a regex via OP.AND.
        @Inject
        @ComboExtract(value = {
                @ExtractBy(value = "p.PubDate", type = ExprType.CSS, setting = @Setting(outerHtml = true)),
                @ExtractBy(value = "(\\d+)評", type = ExprType.REGEX) }, op = OP.AND)
        public int commentNum;

        // Favorite count: the "replace" function is configured with the arguments "+" and "".
        @Inject
        @ExtractBy(value = "span#p_favor_count", type = ExprType.CSS,
                setting = @Setting(function = @Function(value = "replace", args = { "+", "" })))
        public int collectNum;

        @Inject
        @ComboExtract(value = {
                @ExtractBy(value = "p[id=userComments]", type = ExprType.CSS, setting = @Setting(outerHtml = true)),
                @ExtractBy(value = "p.TextContent", type = ExprType.CSS) }, op = OP.AND, multi = true)
        public List commentContents;

        // Toolbar text with the b and span elements filtered.
        @Inject
        @ExtractBy(value = "p[id=toolbar_wrapper]", setting = @Setting(fliters = { "b", "span" }),
                type = ExprType.CSS, impl = Document.class)
        public String weibo;
    }
}
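The example combines extractors with `OP.OR` and `OP.AND`, and the article does not spell out what those operators do. One plausible reading of the example, offered here as an assumption and not as woody's actual implementation, is that OR tries each extractor on the same input and keeps the first hit, while AND chains them so each extractor works on the previous one's output. The sketch below uses plain regular expressions in place of CSS and XPath; `ComboSketch`, `Step`, `regex`, `or`, and `and` are hypothetical names.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComboSketch {
    /** One extraction step: takes text, returns a value if it matched. */
    interface Step extends Function<String, Optional<String>> {}

    /** Hypothetical regex step that returns capture group 1 of the first match. */
    static Step regex(String pattern) {
        Pattern p = Pattern.compile(pattern);
        return input -> {
            Matcher m = p.matcher(input);
            return m.find() ? Optional.of(m.group(1)) : Optional.empty();
        };
    }

    /** Assumed OR semantics: every step sees the same input, first hit wins. */
    static Optional<String> or(String input, List<Step> steps) {
        for (Step step : steps) {
            Optional<String> result = step.apply(input);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    /** Assumed AND semantics: each step consumes the previous step's output. */
    static Optional<String> and(String input, List<Step> steps) {
        Optional<String> current = Optional.of(input);
        for (Step step : steps) {
            current = current.flatMap(step); // stays empty once any step fails
        }
        return current;
    }

    public static void main(String[] args) {
        String html = "<p class=\"PubDate\">posted by <a>alice</a>, 12 comments</p>";

        // AND: narrow to the paragraph first, then pull the number out of it.
        Optional<String> count = and(html, List.of(
                regex("(<p class=\"PubDate\">.*?</p>)"),
                regex("(\\d+) comments")));
        System.out.println(count.map(Integer::parseInt).orElse(0)); // 12

        // OR: the first pattern finds nothing, so the second one is used.
        Optional<String> name = or(html, List.of(
                regex("<h1>(.*?)</h1>"),
                regex("<a>(.*?)</a>")));
        System.out.println(name.orElse("")); // alice
    }
}
```

Under this reading, the `title` field falls back from the CSS selector to the XPath expression, and `commentNum` first isolates the `p.PubDate` markup and then applies the regex to it.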
That concludes this introduction to the HTML extractor (woody).