
(58) -- Crawling images layer by layer with regular expressions

Posted on: 2021-08-30

# Crawl images layer by layer with regular expressions

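import os
import re
from urllib import request

# List-page URL template; {} is filled with the page number.
base_url = 'https://www.mmonly.cc/wmtp/fjtp/list_21_{}.html'

def download(pic_url):
    """Save one image into the local images/ directory."""
    print('downloading...%s' % pic_url)
    fname = pic_url.split('/')[-1]
    # urlretrieve does not create missing directories, so create it first.
    os.makedirs('images', exist_ok=True)
    request.urlretrieve(pic_url, os.path.join('images', fname))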

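def getPage():
    # Compile both patterns once, outside the loops.
    # '.' and '?' are escaped so that '/go.html?url=' is matched literally.
    url_pat = re.compile(r'<div class="btns" > <a class="img_album_btn" href="/go\.html\?url=https://www\.mmonly\.cc/wmtp/fjtp/(.*?)"', re.S)
    img_pat = re.compile(r'<img alt=".*?" src="(.*?)"')

    # Level 1: walk list pages 1 to 72.
    for i in range(1, 73):
        fullurl = base_url.format(i)
        response = request.urlopen(fullurl)
        html = response.read().decode('gb2312', 'ignore')
        album_paths = url_pat.findall(html)

        # Level 2: open each album page found on the list page.
        for path in album_paths:
            new_url = 'https://www.mmonly.cc/wmtp/fjtp/' + path
            response = request.urlopen(new_url)
            detail_html = response.read().decode('gb2312', 'ignore')
            pic_urls = img_pat.findall(detail_html)

            # Level 3: download every image on the album page.
            for pic_url in pic_urls:
                download(pic_url)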


if __name__ == '__main__':
    getPage()
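
The two extraction steps can be tried offline before running the full crawl. In a regular expression an unescaped `?` is a quantifier and `.` matches any character, so a literal `/go.html?url=` only matches when both are escaped. The sketch below is a minimal illustration under stated assumptions: the HTML fragments, the `example.com` image URL, and the helper names `extract_album_paths` and `extract_image_urls` are hypothetical and only modeled on the markup the patterns in the script expect.

```python
import re

# Prefix shared by all album links in the script.
PREFIX = 'https://www.mmonly.cc/wmtp/fjtp/'

# Hypothetical fragments (assumptions), shaped like the markup the patterns target.
SAMPLE_LIST_HTML = (
    '<div class="btns" > <a class="img_album_btn" '
    'href="/go.html?url=https://www.mmonly.cc/wmtp/fjtp/12345.html" >view</a></div>'
)
SAMPLE_DETAIL_HTML = '<p><img alt="scenery" src="https://example.com/pics/a1.jpg" /></p>'


def extract_album_paths(html):
    """Return the album page names that follow the shared prefix."""
    # re.escape() escapes the dots in the prefix; '\.' and '\?' are escaped by hand.
    pat = re.compile(r'href="/go\.html\?url=' + re.escape(PREFIX) + r'(.*?)"', re.S)
    return pat.findall(html)


def extract_image_urls(html):
    """Return the src value of every <img alt=... src=...> tag."""
    return re.findall(r'<img alt=".*?" src="(.*?)"', html)


if __name__ == '__main__':
    print(extract_album_paths(SAMPLE_LIST_HTML))
    print(extract_image_urls(SAMPLE_DETAIL_HTML))
    # With '?' left unescaped the literal text is not matched at all.
    print(re.findall(r'/go.html?url=', '/go.html?url='))
```

Testing the patterns on a saved copy of a real page in the same way shows quickly whether the site's markup still matches what the crawler assumes.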




