Avoiding getting banned (Scrapy)

2019-11-08 02:49:03
Source: reposted
Contributed by: a site user

Some websites implement measures of varying sophistication to prevent bots from crawling them. Getting around those measures can be difficult and tricky, and may sometimes require special infrastructure. If in doubt, consider contacting commercial support.

Here are some tips to keep in mind when dealing with these kinds of sites:

- rotate your user agent from a pool of well-known browser user agents (search the web to find a list of them)
- disable cookies (see COOKIES_ENABLED), as some sites may use cookies to spot bot behaviour
- use download delays of 2 seconds or higher (see the DOWNLOAD_DELAY setting)
- if possible, use Google cache to fetch pages instead of hitting the sites directly (Google has since retired its public page cache, so this option may no longer be available)
- use a pool of rotating IPs, for example the free Tor project or paid services like ProxyMesh; an open-source alternative is scrapoxy, a super proxy that you can attach your own proxies to
- use a highly distributed downloader that circumvents bans internally, so you can just focus on parsing clean pages; one example of such a downloader is Crawlera

If you are still unable to prevent your bot from getting banned, consider contacting commercial support.
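The cookie and delay tips map directly onto Scrapy settings. A minimal `settings.py` sketch, assuming a standard Scrapy project layout; the values are illustrative rather than tuned for any particular site. `RANDOMIZE_DOWNLOAD_DELAY` is a built-in setting that is already on by default and is shown only to make the behaviour explicit:

```python
# settings.py (sketch): settings corresponding to the cookie and delay tips.

# Do not send or store cookies, so the site cannot tie requests together
# into one tracked session.
COOKIES_ENABLED = False

# Wait about 2 seconds between consecutive requests to the same website.
DOWNLOAD_DELAY = 2

# Scrapy's default behaviour: each wait is a random value between
# 0.5x and 1.5x DOWNLOAD_DELAY, so requests do not arrive at a fixed rhythm.
RANDOMIZE_DOWNLOAD_DELAY = True
```

With these values the actual pause between requests varies between roughly 1 and 3 seconds.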
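User-agent rotation is commonly done in a downloader middleware. The sketch below is a hypothetical `RotateUserAgentMiddleware`, not a class that ships with Scrapy; the `USER_AGENT_POOL` setting name and the three browser strings are assumptions for illustration. It uses only the standard downloader-middleware hooks `from_crawler` and `process_request`:

```python
import random


class RotateUserAgentMiddleware:
    """Hypothetical downloader middleware: random browser User-Agent per request."""

    # Example pool (assumption): replace with an up-to-date list of real browser strings.
    DEFAULT_USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
        "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
        "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
    ]

    def __init__(self, user_agents=None):
        # Fall back to the built-in example pool when no list is configured.
        self.user_agents = list(user_agents or self.DEFAULT_USER_AGENTS)

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT_POOL is a made-up project setting, not a Scrapy built-in;
        # getlist() returns an empty list when it is missing.
        return cls(crawler.settings.getlist("USER_AGENT_POOL"))

    def process_request(self, request, spider):
        # Overwrite the header on every outgoing request. Returning None tells
        # Scrapy to keep processing the request as usual.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None
```

To enable it, add its import path (for example the hypothetical `myproject.middlewares.RotateUserAgentMiddleware`) to `DOWNLOADER_MIDDLEWARES`, and map the built-in `scrapy.downloadermiddlewares.useragent.UserAgentMiddleware` to `None` so only one middleware manages the header.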
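For the rotating-IP tip, Scrapy's built-in HttpProxyMiddleware routes a request through the proxy URL stored in `request.meta["proxy"]`. A hypothetical round-robin middleware that fills that key from a pool could look like the sketch below; `RotateProxyMiddleware` and the `PROXY_POOL` setting are made-up names, and the proxy URLs would come from your own provider or proxy setup:

```python
import itertools


class RotateProxyMiddleware:
    """Hypothetical downloader middleware: assign proxies round-robin."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy pool is empty")
        # cycle() repeats the pool endlessly: p1, p2, ..., pN, p1, ...
        self._proxies = itertools.cycle(list(proxies))

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_POOL is a made-up setting, e.g. ["http://user:pass@host1:8000", ...]
        return cls(crawler.settings.getlist("PROXY_POOL"))

    def process_request(self, request, spider):
        # Respect a proxy the spider chose explicitly; otherwise take the next one.
        if "proxy" not in request.meta:
            request.meta["proxy"] = next(self._proxies)
        return None
```

Register it in `DOWNLOADER_MIDDLEWARES` with a priority below the built-in HttpProxyMiddleware's default of 750 (for example 740) so the proxy is set before that middleware reads it. Round-robin spreads requests evenly across the pool; `random.choice` would work too, but makes it harder to reason about how often each IP is used.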

