Scrapy 1

Web Spider
Python: How to connect to the Internet?

urllib (URL + lib): Python's standard-library package for opening and handling URLs.

URL format:

protocol://hostname[:port]/path/[;parameters][?query][#fragment]
protocol: http, https, ftp, file, ed2k, …
hostname: domain name or IP address
port: port number; the default for http is 80
path: location of the resource on the host (directories and file name)
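Python's standard library can split a URL into exactly these parts. The sketch below runs `urllib.parse.urlparse` on a made-up URL (the host, port, path, parameters, query, and fragment are placeholders, not taken from this post). Note that `.port` is `None` when the URL does not state a port: the default of 80 for http is implied by the protocol, not filled in by the parser.

```python
from urllib.parse import urlparse

# A made-up URL containing every optional part of the pattern above.
url = "http://www.example.com:8080/docs/page;type=a?key=value#top"

parts = urlparse(url)
print(parts.scheme)    # protocol -> http
print(parts.hostname)  # hostname without the port -> www.example.com
print(parts.port)      # port as an int -> 8080
print(parts.path)      # path -> /docs/page
print(parts.params)    # ;parameters of the last path segment -> type=a
print(parts.query)     # ?query -> key=value
print(parts.fragment)  # #fragment -> top

# Without an explicit port, urlparse does not fill in the default 80.
print(urlparse("http://www.example.com/").port)  # -> None
```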

We use the official Python documentation to learn the urllib module.
Within it, urllib.request (the submodule for opening URLs) is the part we focus on here.
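The central function in urllib.request is `urlopen()`, which accepts either a URL string or a `Request` object. A `Request` bundles the URL with extra information such as HTTP headers, which a spider often needs (for example, to send its own `User-Agent`). The minimal sketch below only builds and inspects a request, so it needs no network access; the User-Agent value `my-spider/0.1` is a hypothetical placeholder.

```python
import urllib.request

# Hypothetical User-Agent string, chosen for illustration only.
headers = {"User-Agent": "my-spider/0.1"}

req = urllib.request.Request("http://www.fishc.com", headers=headers)

print(req.full_url)      # the URL that would be requested
print(req.get_method())  # "GET", because no request body (data) was given
# urllib normalises header names with str.capitalize(), so the key is "User-agent".
print(req.get_header("User-agent"))

# Actually sending it (requires network access) would look like:
# with urllib.request.urlopen(req, timeout=10) as response:
#     html = response.read()
```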

>>> import urllib.request
>>> response = urllib.request.urlopen("http://www.fishc.com")
>>> html = response.read()
>>> print(html)
# prints a bytes object (b'...'), i.e. the raw, undecoded page
>>> html = html.decode("utf-8")
>>> print(html)
# after decoding, prints the page's HTML source as normal text
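Hard-coding `"utf-8"` works for this page, but not every site uses UTF-8. A slightly more robust sketch reads the charset the server declares in its `Content-Type` header and falls back to UTF-8 only when none is given. The helper name `fetch_html` and the 10-second timeout are assumptions for illustration; the `with` block closes the response when done.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download url and return its body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes, like `html` before decoding above
        # Charset declared by the server, e.g. "utf-8" from
        # "Content-Type: text/html; charset=utf-8"; None if absent.
        charset = response.info().get_content_charset()
    # Fall back to UTF-8 when no charset is declared; errors="replace"
    # substitutes U+FFFD for malformed bytes instead of raising.
    return raw.decode(charset or "utf-8", errors="replace")
```

Usage (needs network access): `html = fetch_html("http://www.fishc.com")`.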

Unless otherwise stated, all posts on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting.