Code:
#!/usr/bin/env python
import json

import requests


def get_one_page(url, headers):
    # Pass headers by keyword: the second positional argument of requests.get is params.
    response = requests.get(url, headers=headers)
    print(response.cookies)
    print(response.status_code)
    print(response.text)
    print(type(response.text))
    # response.json is not called here, so these two lines print the bound method itself.
    print(response.json)
    print(type(response.json))


if __name__ == '__main__':
    headers = {
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'accept-encoding': 'gzip, deflate',
        'accept-language': 'zh-CN',
        'cache-control': 'max-age=0',
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36 Maxthon/5.2.3.3000',
        'x-devtools-emulate-network-conditions-client-id': '0f286fdf-ae53-4784-9610-56f5b068a872',
    }
    url = 'https://www.toutiao.com/a6602782094278001159/'
    get_one_page(url, headers)
Output:
<RequestsCookieJar[]>
200
<meta charset="UTF-8"><meta content="width=device-width,initial-scale=1" name="viewport"><meta content="ie=edge" http-equiv="X-UA-Compatible"><link href="//s3a.pstatp.com/toutiao/resource/ntoutiao_web/static/image/favicon_8e9c9c7.ico" rel="shortcut icon" type="image/x-icon"><title>今日头条</title>
<class 'str'>
<bound method Response.json of <Response [200]>>
<class 'method'>
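Why does the page source come back empty, with no article content, when I run this? What is the cause? Is the site blocking the request? Could someone experienced take a look? Thanks.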
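One way to tell "blocked" apart from "rendered by JavaScript" is to measure how much visible text the response actually contains. The sketch below is only an illustration using the standard library: the names `VisibleTextCollector`, `visible_text`, and `looks_like_js_shell` are hypothetical, and the 50-character threshold is an arbitrary assumption, not a rule.

```python
from html.parser import HTMLParser


class VisibleTextCollector(HTMLParser):
    """Collect text that appears outside <script>, <style>, and <title>."""

    SKIPPED = {"script", "style", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a reader would see without running any JavaScript."""
    parser = VisibleTextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_like_js_shell(html, min_chars=50):
    """Guess that a page is an empty shell; min_chars is an assumed threshold."""
    return len(visible_text(html)) < min_chars
```

Passing `response.text` from the post to `looks_like_js_shell` would be a quick check: a 200 status together with almost no visible text suggests the article is filled in later by scripts rather than being withheld by the server.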
1
iSecret 2018-09-20 10:23:37 +08:00
Scraping Toutiao (今日头条) requires PhantomJS.
|
2
NoString 2018-09-20 10:30:06 +08:00
A JS-rendered page has to be run through PhantomJS first, and then..
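As a rough sketch of the idea in the replies above, the page can be loaded in a headless browser so that its scripts run before the HTML is read. Selenium with headless Chrome is swapped in here in place of PhantomJS; it assumes the `selenium` package and a matching Chrome driver are installed. The helper names `make_headless_chrome` and `fetch_rendered_html` are hypothetical.

```python
def make_headless_chrome():
    """Build a headless Chrome driver (assumes Selenium and Chrome are installed)."""
    # Imported lazily so the module loads even where Selenium is absent.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def fetch_rendered_html(url, driver_factory=make_headless_chrome):
    """Load url in a browser and return the DOM serialized after scripts ran."""
    driver = driver_factory()
    try:
        driver.get(url)
        # Content loaded late by scripts may need an explicit wait before this.
        return driver.page_source
    finally:
        # Always shut the browser down, even if loading the page fails.
        driver.quit()
```

With the URL from the post, `fetch_rendered_html(url)` would return the page as the browser sees it. Whether the article body is present at that point depends on how the site loads it, so this is a starting point rather than a guaranteed fix.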
|
3
misaka19000 2018-09-20 11:29:46 +08:00
Please format your code properly before posting.
|
4
happykjoy OP
OK, thank you all for the replies.
|