
Scraping page content with a Python crawler

Source: the Internet

cnblogs (博客园) example (Ctrl+Alt+L formats the code)

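# coding: utf-8
import requests
from lxml import etree


def gettitle(url):
    # Download the page; fail fast on a slow server or an HTTP error status.
    html = requests.get(url, timeout=10)
    html.raise_for_status()
    selector = etree.HTML(html.text)
    # The XPath assumes the cnblogs layout: the title link has id="cb_post_title_url".
    title = selector.xpath('//a[@id="cb_post_title_url"]/text()')
    # xpath() returns a list; return an empty string when nothing matched.
    return title[0] if title else ''


def getcontent(url):
    html = requests.get(url, timeout=10)
    html.raise_for_status()
    selector = etree.HTML(html.text)
    # Collect the direct text of each <p> inside the post body.
    contentlist = selector.xpath('//div[@class="postBody"]/div/p/text()')
    # Join the paragraphs, one per line.
    return '\n'.join(contentlist)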


if __name__ == "__main__":
    print("Enter the URL of a cnblogs article:")
    url = input("")
    print(gettitle(url))
    print(getcontent(url))
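
The script above downloads the same page twice, once for the title and once for the body. A minimal sketch of a single-pass variant follows, run against an inline HTML string so it can be tried without network access. The names `parse_post` and `SAMPLE_HTML` are hypothetical, and the sample markup only imitates the structure the two XPath expressions expect; it is not taken from a real cnblogs page.

```python
from lxml import etree

# Hypothetical sample markup that mimics the structure the XPath expressions expect.
SAMPLE_HTML = """
<html><body>
  <a id="cb_post_title_url" href="#">Sample title</a>
  <div class="postBody">
    <div>
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </div>
  </div>
</body></html>
"""


def parse_post(html_text):
    """Parse the document once and return (title, content)."""
    selector = etree.HTML(html_text)
    title = selector.xpath('//a[@id="cb_post_title_url"]/text()')
    paragraphs = selector.xpath('//div[@class="postBody"]/div/p/text()')
    # Guard against an empty match so a layout change does not raise IndexError.
    return (title[0] if title else ''), '\n'.join(paragraphs)


title, content = parse_post(SAMPLE_HTML)
print(title)
print(content)
```

For a live page, the same function could be called with the `text` attribute of a `requests.get(url)` response, so the page is fetched only once.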


