Scrapy: best way to select URLs based on MySQL

I built a Scrapy crawler that collects data from forum threads. The list page shows each thread's last-modified date, and I want to use that date to decide whether to crawl the thread again. I store the data in MySQL through an item pipeline. While my CrawlSpider processes the list page, I want to look up the thread's record in MySQL and, based on that record, either yield a Request or not. (I do NOT want to load the URL unless there is a new post.)
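To make the check concrete, here is a minimal sketch of the decision I have in mind. Everything in it is an assumption for illustration: the `threads` table, its `url` and `last_modified` columns, the helper names `needs_crawl` and `select_urls`, and the example URLs are all made up. It uses an in-memory `sqlite3` database as a stand-in for MySQL so that it runs on its own; with a MySQL driver the query would go through a cursor and use `%s` placeholders instead of `?`.

```python
import sqlite3
from datetime import datetime


def needs_crawl(conn, url, listed_modified):
    """Return True if the thread is unknown or the list page shows a newer date."""
    row = conn.execute(
        "SELECT last_modified FROM threads WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        return True  # never crawled before
    stored = datetime.fromisoformat(row[0])
    return listed_modified > stored


def select_urls(conn, listing):
    """Yield only the URLs worth requesting; listing holds (url, datetime) pairs."""
    for url, modified in listing:
        if needs_crawl(conn, url, modified):
            # In the spider's list-page callback this would be something like:
            #   yield scrapy.Request(url, callback=self.parse_thread)
            yield url


# Stand-in database (hypothetical schema; SQLite instead of MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE threads (url TEXT PRIMARY KEY, last_modified TEXT)")
conn.executemany(
    "INSERT INTO threads VALUES (?, ?)",
    [
        ("https://forum.example/t/1", "2024-01-10T12:00:00"),
        ("https://forum.example/t/2", "2024-01-05T08:00:00"),
    ],
)

# Dates as they would be scraped from the list page.
listing = [
    ("https://forum.example/t/1", datetime(2024, 1, 10, 12, 0)),  # unchanged
    ("https://forum.example/t/2", datetime(2024, 1, 11, 9, 30)),  # new post
    ("https://forum.example/t/3", datetime(2024, 1, 11, 10, 0)),  # not stored yet
]

to_crawl = list(select_urls(conn, listing))
print(to_crawl)
```

Only the second and third threads are selected, so the unchanged thread is never requested. This runs one query per thread on the list page; whether that is acceptable, or whether the stored dates should be loaded once when the spider starts, is part of what I am asking.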



