

A Tutorial on Batch-Scraping a List Page

2024-05-04 11:04:59
Contributed by: a site user

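This article shares a tutorial on batch-scraping the articles linked from a list page. Let's walk through it in detail; use it as a reference if you need it.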

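Some people treat a scraper as if it were a treasure, and some are still selling them for money to this day. Shame on them, honestly! My code below is probably pretty rough, though.
The code below has no database-write step. Once you have got this far, storing the data is easy, so if you need it, please add it yourself; the same goes for any other features. Copy the code, together with the helper functions at the end, into an ASP page and run it to see the result.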

Dim Url,List_PageCode,Array_ArticleID,i,ArticleID
Dim Content_PageCode,Content_TempCode
Dim Content_CategoryID,Content_CategoryName,BorderID,ClassID,BorderName,ClassName
Dim ArticleTitle,ArticleAuthor,ArticleFrom,ArticleContent

Url = "http://www.webasp.net/article/class/1.htm"
List_PageCode = getHTTPPage(Url)
List_PageCode = RegExpText(List_PageCode,"打印","

List_PageCode = RegExpText(List_PageCode," 'Get the article links on the current list page, separated by commas
Array_ArticleID = Split(List_PageCode,",")    'Create an array holding the article IDs

For i=0 To Ubound(Array_ArticleID)-1
    ArticleID = Array_ArticleID(i)    'Article ID
    Content_PageCode = getHTTPPage("http://www.webasp.net/article/"&ArticleID)    'Get the content of the article page

    '========= Get the article's category and related IDs: BEGIN =======================
    Content_TempCode = RegExpText(Content_PageCode,"技术教程 >> ",">> 内容",0)
    Content_CategoryID = RegExpText(Content_PageCode,"",1)
    BorderID = Split(Content_CategoryID,",")(0)    'Top-level category ID
    ClassID = Split(Content_CategoryID,",")(1)    'Subcategory ID
        '========== Check whether the top-level category exists: BEGIN ===============
        'If it does not exist, insert it into the database

        '========== Check whether the top-level category exists: END ===============
    'Response.Write(BorderID & "," & ClassID & "<br>")
    Content_CategoryName = RegExpText(Content_PageCode,"/'>","
",1)
    BorderName = Split(Content_CategoryName,",")(0)    'Top-level category name
    ClassName = Split(Content_CategoryName,",")(1)    'Subcategory name
        '========== Check whether the subcategory exists: BEGIN ===============
        'If it does not exist, insert it into the database

        '========== Check whether the subcategory exists: END ===============
    '========= Get the article's category and related IDs: END =======================

    '========= Get the article title and content: BEGIN =============================

    ArticleTitle = RegExpText(Content_PageCode,"","",0)
    ArticleAuthor = RegExpText(Content_PageCode," 作者:","",0)
    ArticleFrom = RegExpText(Content_PageCode," 来源:","",0)
    ArticleContent = RegExpText(Content_PageCode,"",""&VBCrlf&"        "&VBCrlf&"    ",0)
    '========= Get the article title and content: END =============================
    Response.Write(ArticleTitle& "<br><br>")
    Response.Flush()
Next
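Why does the loop stop at `Ubound(Array_ArticleID)-1`? With `n = 1`, `RegExpText` appends a comma after every match, so `Split` produces one extra, empty element at the end. The small sketch below, runnable with `cscript.exe`, shows this on a made-up ID list. The IDs (`1001.htm` and so on) are hypothetical, not real webasp.net IDs, and `ListArticleIDs` is an illustrative helper, not part of the original code.

```vbscript
Option Explicit

' Hypothetical helper mirroring the main loop: walks a comma-terminated
' ID list the same way For i=0 To Ubound(Array_ArticleID)-1 does.
Function ListArticleIDs(idList)
    Dim parts, i, out
    parts = Split(idList, ",")
    out = ""
    ' "a,b,c," splits into a, b, c and a trailing empty string, so stop
    ' one short of UBound. An empty list gives an empty array
    ' (UBound = -1), so the loop body never runs.
    For i = 0 To UBound(parts) - 1
        out = out & "[" & parts(i) & "]"
    Next
    ListArticleIDs = out
End Function

' Made-up IDs in the shape RegExpText(..., 1) returns
WScript.Echo ListArticleIDs("1001.htm,1002.htm,1003.htm,")   ' [1001.htm][1002.htm][1003.htm]
WScript.Echo ListArticleIDs("")                              ' prints an empty line
```

Without the `-1`, the last iteration would request `http://www.webasp.net/article/` with an empty ID.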


Here are the helper functions used above:

Function getHTTPPage(url)
    IF(IsObjInstalled("Microsoft.XMLHTTP") = False)THEN
        Response.Write "<br><br>The server does not support the Microsoft.XMLHTTP component"
        Err.Clear
        Response.End
    END IF
    On Error Resume Next
    Dim http
    SET http=Server.CreateObject("Msxml2.XMLHTTP")
    Http.open "GET",url,False
    Http.send()
    IF(Http.readystate<>4)THEN    'Request did not complete
        Exit Function
    END IF
    getHTTPPage=BytesToBSTR(Http.responseBody,"GB2312")
    SET http=NOTHING
    IF(Err.number<>0)THEN
        Response.Write "<br><br>Error while fetching the page content"
        'Response.End
        Err.Clear
    END IF
End Function


Function BytesToBstr(CodeBody,CodeSet)
    Dim objStream
    SET objStream = Server.CreateObject("adodb.stream")
    objStream.Type = 1
    objStream.Mode =3
    objStream.Open
    objStream.Write CodeBody
    objStream.Position = 0
    objStream.Type = 2
    objStream.Charset = CodeSet
    BytesToBstr = objStream.ReadText
    objStream.Close
    SET objStream = NOTHING
End Function
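`BytesToBstr` decodes the raw bytes in `Http.responseBody` with an explicit charset. The code passes `GB2312` to match the target site. The sketch below does a round trip under `cscript.exe`: it encodes a short string to GB2312 bytes, then decodes them once with the right charset and once with the wrong one. `StrToBytes` is a hypothetical helper standing in for the HTTP response, and `CreateObject` replaces ASP's `Server.CreateObject` so it runs outside IIS.

```vbscript
Option Explicit

' ADODB.Stream constants
Const adTypeBinary = 1
Const adTypeText = 2
Const adModeReadWrite = 3

' Same steps as BytesToBstr above, with CreateObject instead of Server.CreateObject
Function BytesToBstr(codeBody, codeSet)
    Dim objStream
    Set objStream = CreateObject("ADODB.Stream")
    objStream.Type = adTypeBinary      ' start as a binary stream
    objStream.Mode = adModeReadWrite
    objStream.Open
    objStream.Write codeBody           ' load the raw bytes
    objStream.Position = 0             ' Type can only be changed at position 0
    objStream.Type = adTypeText        ' reinterpret the bytes as text...
    objStream.Charset = codeSet        ' ...decoded with this charset
    BytesToBstr = objStream.ReadText
    objStream.Close
    Set objStream = Nothing
End Function

' Hypothetical test helper: encodes a string in the given charset,
' standing in for Http.responseBody (the page's raw bytes).
Function StrToBytes(text, codeSet)
    Dim objStream
    Set objStream = CreateObject("ADODB.Stream")
    objStream.Type = adTypeText
    objStream.Mode = adModeReadWrite
    objStream.Charset = codeSet
    objStream.Open
    objStream.WriteText text
    objStream.Position = 0
    objStream.Type = adTypeBinary
    StrToBytes = objStream.Read
    objStream.Close
    Set objStream = Nothing
End Function

Dim original, gbBytes
' U+6280 U+672F is "技术"; ChrW keeps this .vbs file pure ASCII
original = ChrW(&H6280) & ChrW(&H672F)
gbBytes = StrToBytes(original, "gb2312")

If BytesToBstr(gbBytes, "gb2312") = original Then WScript.Echo "gb2312: decoded correctly"
If BytesToBstr(gbBytes, "iso-8859-1") <> original Then WScript.Echo "iso-8859-1: garbled"
```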

'================================================
'Purpose: check whether a component is installed
'Returns: True  ---- installed
'         False ---- not installed
'================================================
Function IsObjInstalled(objName)
    On Error Resume Next
    IsObjInstalled = False
    Err = 0
    Dim testObj
    SET testObj = Server.CreateObject(objName)
    IF(0 = Err)THEN IsObjInstalled = True
    SET testObj = NOTHING
    Err = 0
End Function
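Note that `getHTTPPage` checks for `Microsoft.XMLHTTP` but then creates `Msxml2.XMLHTTP`. One way to keep the check and the created object in sync is to probe a list of ProgIDs with `IsObjInstalled` and use the first one that exists. Below is a minimal cscript sketch. `FirstInstalled` is a hypothetical helper, and the preference order is an assumption: it puts `Msxml2.ServerXMLHTTP`, the MSXML class intended for server-side use, first.

```vbscript
Option Explicit

' cscript version of IsObjInstalled: CreateObject instead of Server.CreateObject
Function IsObjInstalled(objName)
    Dim testObj
    On Error Resume Next
    IsObjInstalled = False
    Err.Clear
    Set testObj = CreateObject(objName)
    If Err.Number = 0 Then IsObjInstalled = True
    Set testObj = Nothing
    Err.Clear
End Function

' Hypothetical helper: returns the first ProgID in the array that can be
' created, or "" if none of them can.
Function FirstInstalled(progIDs)
    Dim i
    FirstInstalled = ""
    For i = 0 To UBound(progIDs)
        If IsObjInstalled(progIDs(i)) Then
            FirstInstalled = progIDs(i)
            Exit Function
        End If
    Next
End Function

Dim httpProgID
' Preference order is an assumption: server-side MSXML class first
httpProgID = FirstInstalled(Array("Msxml2.ServerXMLHTTP", "Msxml2.XMLHTTP", "Microsoft.XMLHTTP"))
If httpProgID = "" Then
    WScript.Echo "No XMLHTTP component found"
Else
    WScript.Echo "Using " & httpProgID
End If
```

In the ASP page, `getHTTPPage` could then call `Server.CreateObject(httpProgID)` instead of hard-coding one name for the check and another for the object.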

Function RegExpText(strng,strStart,strEnd,n)
    Dim regEx,Match,Matches,RetStr
    SET regEx = New RegExp
    regEx.Pattern = strStart&"([\s\S]*?)"&strEnd    'Non-greedy capture of anything (line breaks included) between the two markers
    regEx.IgnoreCase = True
    regEx.Global = True
    SET Matches = regEx.Execute(strng)
    For Each Match in Matches
        IF(n=1)THEN
            RetStr = RetStr & regEx.Replace(Match.Value,"$1") & ","    'n=1: follow each match with a comma
        ELSE
            RetStr = RetStr & regEx.Replace(Match.Value,"$1")
        END IF
    Next
    RegExpText = RetStr
    SET regEx=NOTHING
End Function
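Everything above depends on `RegExpText`. It is meant to build the pattern `strStart([\s\S]*?)strEnd`. Here `[\s\S]` matches any character, including line breaks, and `*?` is non-greedy, so each match stops at the first `strEnd`. With `n = 1`, every captured piece is followed by a comma; otherwise the pieces are simply concatenated. Because `strStart` and `strEnd` are pasted into the pattern unescaped, characters such as `.`, `?`, `(` or `[` in them act as regex syntax. The sketch below runs the function under `cscript.exe` on a made-up HTML fragment, since the real page's markup is not shown in this article.

```vbscript
Option Explicit

' Copy of RegExpText with the character class written as [\s\S]
Function RegExpText(strng, strStart, strEnd, n)
    Dim regEx, Match, Matches, RetStr
    Set regEx = New RegExp
    ' Non-greedy group: capture as little as possible between the markers
    regEx.Pattern = strStart & "([\s\S]*?)" & strEnd
    regEx.IgnoreCase = True
    regEx.Global = True
    Set Matches = regEx.Execute(strng)
    For Each Match In Matches
        ' Re-apply the pattern to the matched text, keeping only group 1
        If n = 1 Then
            RetStr = RetStr & regEx.Replace(Match.Value, "$1") & ","
        Else
            RetStr = RetStr & regEx.Replace(Match.Value, "$1")
        End If
    Next
    RegExpText = RetStr
    Set regEx = Nothing
End Function

Dim html
' Made-up list-page fragment (hypothetical markup)
html = "<a href='/article/1.htm'>First</a><br><a href='/article/2.htm'>Second</a>"

' n = 1: each captured piece followed by a comma, ready for Split()
WScript.Echo RegExpText(html, "<a href='/article/", "'>", 1)   ' 1.htm,2.htm,
' n = 0: captured pieces concatenated with no separator
WScript.Echo RegExpText(html, "'>", "</a>", 0)                 ' FirstSecond
```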
That's everything in this tutorial on batch-scraping a list page. I hope the material collected here is helpful to you.