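Requirement: fetch query results from a website. Environment: VB.NET.
I started from zero. At first I had no idea which parts of .NET I would actually need, so I just searched around blindly on Google, Baidu, Yisou... with queries like ".net ie", "split web page .net", "embedded IE" and so on. Before long I learned about the WebBrowser control.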
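The article that helped me most was http://www.microsoft.com/china/msdn/archives/workshop/scrape.asp
It targets classic VB, but .NET is not much different. Don't laugh: at first it took me ages just to find shdocvw.dll and mshtml.dll when adding the references, because everyone calls it "WebBrowser" while .NET lists it as "Microsoft Web Browser".
Work through the article above first!
Enough chatter.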
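First, add a text box for entering the search text and a button for submitting it. (The code below needs Imports System.IO, System.Net and System.Text at the top of the file.)
In the button's Click event handler, write: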
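' URL-encode the input in the site's charset; HttpUtility lives in System.Web (reference System.Web.dll)
Dim postdata As String() = {"searchtext=" & System.Web.HttpUtility.UrlEncode(Me.searchtext.Text, Encoding.GetEncoding("gb2312"))}
Dim strurl As String = "http://"
Dim sessionhtml As String = postdate(strurl, postdata)
' Write the result to a temporary file
Using sw As New StreamWriter("d:/1.htm", False, Encoding.GetEncoding("gb2312"))
    sw.WriteLine(sessionhtml)
End Using
Me.axwebbrowserfill.Navigate("d:/1.htm")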
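The hard-coded d:/1.htm works, but it fails on machines without a D: drive. It also leaves one shared file behind. Below is a minimal sketch of the same temp-file step using the user's temp folder and a unique file name. WriteTempHtml is a hypothetical helper name and not part of the original code. The encoding is passed in, so you can use GB2312 here just as the handler above does.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Public Module TempPage
    ' Hypothetical helper: saves the fetched HTML under a unique .htm name in the
    ' user's temp folder and returns the full path, so the caller can Navigate to it.
    Public Function WriteTempHtml(html As String, enc As Encoding) As String
        Dim fileName As String = Guid.NewGuid().ToString("N") & ".htm"
        Dim filePath As String = Path.Combine(Path.GetTempPath(), fileName)
        ' WriteAllText creates (or overwrites) the file and closes it afterwards
        File.WriteAllText(filePath, html, enc)
        Return filePath
    End Function
End Module
```

In the Click handler, this could replace the StreamWriter lines with something like `Me.axwebbrowserfill.Navigate(WriteTempHtml(sessionhtml, Encoding.GetEncoding("gb2312")))`. Deleting old files afterwards is left to the caller.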
The postdate function is as follows:
Public Function postdate(ByVal url As String, ByVal postdata() As String) As String
    ' Join the "name=value" pairs into one form body (String.Join also copes with an empty array)
    Dim post As String = String.Join("&", postdata)
    Dim enc As Encoding = Encoding.GetEncoding("gb2312")
    Dim data As Byte() = enc.GetBytes(post)

    Dim myrequest As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
    myrequest.Method = "POST"   ' HTTP method names are case-sensitive
    myrequest.ContentType = "application/x-www-form-urlencoded"
    myrequest.ContentLength = data.Length

    ' Send the form body
    Using newstream As Stream = myrequest.GetRequestStream()
        newstream.Write(data, 0, data.Length)
    End Using

    ' Read the whole response as GB2312 text and return the HTML string;
    ' Using closes the reader and the response even if reading fails
    Using resp As HttpWebResponse = CType(myrequest.GetResponse(), HttpWebResponse)
        Using sr As New StreamReader(resp.GetResponseStream(), enc)
            Return sr.ReadToEnd()
        End Using
    End Using
End Function
That's all it takes.
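A form body must have each name and value percent-encoded in the charset the site expects. Otherwise characters such as `&`, `=`, spaces or Chinese text in the search box corrupt the request. Below is a minimal sketch of building such a body from name/value pairs. It follows the application/x-www-form-urlencoded rules: letters, digits and `-_.*` are kept, a space becomes `+`, and every other byte becomes `%XX`. EncodeField and BuildFormBody are hypothetical helper names, not part of the original code.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Public Module FormEncoding
    ' Percent-encodes one name or value: the text is first turned into bytes in the
    ' given encoding, then each byte is kept, mapped to "+", or written as %XX.
    Public Function EncodeField(value As String, enc As Encoding) As String
        Dim sb As New StringBuilder()
        For Each b As Byte In enc.GetBytes(value)
            Dim c As Char = ChrW(b)
            If (c >= "A"c AndAlso c <= "Z"c) OrElse (c >= "a"c AndAlso c <= "z"c) _
               OrElse (c >= "0"c AndAlso c <= "9"c) OrElse "-_.*".IndexOf(c) >= 0 Then
                sb.Append(c)
            ElseIf b = 32 Then
                sb.Append("+"c)                          ' space
            Else
                sb.Append("%"c).Append(b.ToString("X2")) ' any other byte
            End If
        Next
        Return sb.ToString()
    End Function

    ' Joins encoded name=value pairs with "&", keeping the caller's order.
    Public Function BuildFormBody(fields As IEnumerable(Of KeyValuePair(Of String, String)),
                                  enc As Encoding) As String
        Dim parts As New List(Of String)()
        For Each f As KeyValuePair(Of String, String) In fields
            parts.Add(EncodeField(f.Key, enc) & "=" & EncodeField(f.Value, enc))
        Next
        Return String.Join("&", parts)
    End Function
End Module
```

For a GB2312 site you would pass `Encoding.GetEncoding("gb2312")`. On .NET Core / .NET 5+ that code page is only available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`. The result is already encoded, so it would be sent as-is rather than re-joined by postdate.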
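As for displaying the HTML directly in the WebBrowser control without going through a temporary file, everything I found online was a Delphi solution, and .NET does not seem to have a perfect way to do it.
I tried: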
'axwebbrowserfill.navigate(sessionhtml)
'me.axwebbrowserfill.document.write(sessionhtml + "haga")
'me.axscriptlet.url = "about:blank" + sessionhtml
'me.axwebbrowserfill.document.write(sessionhtml)
'doc = me.axwebbrowserfill.document
'doc.body.innerhtml = sessionhtml
'doc.write(sessionhtml)
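These usually worked only the first time, and the double quotes inside the HTML also caused problems.
Some posts online suggest the following:
''Show the report content field in the WebBrowser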
'dim doc as ihtmldocument2 = ctype(axwebbrowserfill.document, ihtmldocument2)
'dim bodyelement as ihtmlelement = ctype(doc.body, ihtmlelement)
''bodyelement.innerhtml = sessionhtml + "haga"
This method never worked for me at all!