Scrapy 2

Example 1:

We are going to visit "placekitten" (http://placekitten.com). Here we want to use Python to request a picture from the site and save it to disk.

import urllib.request

response = urllib.request.urlopen("http://placekitten.com/g/500/600")
# Equivalent form using an explicit Request object:
# req = urllib.request.Request("http://placekitten.com/g/500/600")
# response = urllib.request.urlopen(req)
# The returned 'response' is a file-like object:
# response.geturl()  -> the URL that was actually retrieved
# response.info()    -> the response headers, as a message object
# print(response.info()) -> prints the headers
# response.getcode() -> the HTTP status code (200 means success)
cat_img = response.read()  # the image as raw bytes

with open('cat_500_600.jpg', 'wb') as f:
    f.write(cat_img)

In the end, the picture is saved as cat_500_600.jpg in the directory the script was run from.
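The same steps can be wrapped in a small reusable function. This is only a sketch: the name download, its timeout default, and the returned tuple are assumptions made for illustration, not part of the original example. It uses a with block so the connection is closed even if reading fails.

```python
import urllib.request


def download(url, path, timeout=10):
    """Fetch url, write the raw bytes to path, return (final_url, content_type, size).

    Hypothetical helper for illustration; the names and defaults are assumptions.
    """
    # The with block closes the response even if read() raises.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        final_url = response.geturl()                       # URL actually retrieved
        content_type = response.info().get_content_type()   # taken from the response headers
        body = response.read()                              # raw bytes
    # Binary mode ('wb') because the body is bytes, not text.
    with open(path, "wb") as f:
        f.write(body)
    return final_url, content_type, len(body)
```

Calling download("http://placekitten.com/g/500/600", "cat_500_600.jpg") should then reproduce the example above, provided the site is reachable.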

Example 2:

Translate a sentence with the help of "http://fanyi.youdao.com".

First, we need to get to know the "Inspect" tool in Chrome. Opening it shows the source code of the page. Then choose the ‘Network’ tab: when we use the ‘translation’ function on the page, many new requests appear in the list. In the ‘Method’ column, we can see whether each request is a ‘POST’ or a ‘GET’.

‘POST’: submits data to the specified server to be processed.
‘GET’: requests data from the server; it is also often used to submit data.

Click on the POST request and we can see the Headers, Preview, Response, Cookies and Timing tabs. The Preview tab shows the result we need.

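In Headers:
Remote Address: IP address + port of the server.
Request URL: the address the request is actually sent to. This, rather than the page address http://fanyi.youdao.com, is the URL our code has to use.
Request Method: POST or GET.
Status Code: the HTTP status of the response (200 means success), not the speed of the server.
Request Headers: sent by the client; the server can use them to tell whether the visitor is a browser (Chrome) or a program (Python code).
User-Agent: identifies whether the visitor is a browser or code (it lists the OS version, rendering engine and browser version). It can easily be set to a custom value.
Form Data: the content submitted by the ‘POST’.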
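Since the User-Agent can be set by the client, here is a minimal sketch of how that might look with urllib.request.Request. The USER_AGENT value and the helper name build_request are made up for illustration; in practice we would copy the real string from the Network tab. Nothing is sent over the network: the sketch only builds the request and inspects it.

```python
import urllib.request

URL = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=null'
# Hypothetical browser-like value; copy the real one from Chrome's Network tab.
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0'


def build_request(url, user_agent, data=None):
    """Return a Request carrying a custom User-Agent header (illustrative helper)."""
    return urllib.request.Request(url, data=data, headers={'User-Agent': user_agent})


# What urllib announces when we do not override it: 'Python-urllib/<version>'.
default_agent = dict(urllib.request.build_opener().addheaders)['User-agent']

req = build_request(URL, USER_AGENT)
print(default_agent)
# Request stores header names capitalized, hence 'User-agent'.
print(req.get_header('User-agent'))
```

The default value starts with Python-urllib/, which is how a server can tell that the visitor is code rather than a browser.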

urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

data=None -> GET
data=value -> POST
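The rule can be checked without sending anything, by building Request objects and asking which method they would use. This is a small sketch; the form field used here is only a placeholder. Note that data has to be bytes, which is why the encoded form is passed rather than a plain string.

```python
import urllib.parse
import urllib.request

url = 'http://fanyi.youdao.com/translate'

# No data: urllib will issue a GET.
get_req = urllib.request.Request(url)

# With data (bytes): urllib will issue a POST.
form = urllib.parse.urlencode({'i': 'happy with Python'}).encode('utf-8')
post_req = urllib.request.Request(url, data=form)

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```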

import urllib.request
import urllib.parse

url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=null'
data = {}
data['type'] = 'AUTO'
data['i'] = 'happy with Python'
data['doctype'] = 'json'
data['xmlVersion'] = '1.8'
data['keyfrom'] = 'fanyi.web'
data['ue'] = 'UTF-8'
data['action'] = 'FY_BY_CLICKBUTTON'
data['typoResult'] = 'true'

data = urllib.parse.urlencode(data).encode('utf-8')
# encode: str (Unicode) -> UTF-8 bytes
response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')
# decode: UTF-8 bytes -> str (Unicode)

print(html)
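To see what the urlencode and encode steps in this program actually produce, here is a short offline sketch using a subset of the same form fields. The variable names are chosen for illustration only.

```python
import urllib.parse

form = {'type': 'AUTO', 'i': 'happy with Python', 'doctype': 'json'}

# urlencode turns the dict into a 'key=value&key=value' string;
# spaces become '+', and non-ASCII text is percent-encoded as UTF-8.
query = urllib.parse.urlencode(form)
print(query)  # type=AUTO&i=happy+with+Python&doctype=json
print(urllib.parse.urlencode({'i': '她'}))  # i=%E5%A5%B9

# encode turns the str into the bytes that urlopen expects for POST data.
body = query.encode('utf-8')
print(type(body).__name__)  # bytes

# parse_qs reverses the process; each value comes back as a list.
round_trip = urllib.parse.parse_qs(body.decode('utf-8'))
print(round_trip['i'])  # ['happy with Python']
```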

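The program prints a JSON string that looks like a Python dictionary, so we cannot read the translation directly from it. To perfect the program, we can parse the string with json and then pick out the answer by dictionary key and list index.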

import urllib.request
import urllib.parse
import json

content = input("Pls input the content you need to translate: ")

url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=null'
data = {}
data['type'] = 'AUTO'
data['i'] = content
data['doctype'] = 'json'
data['xmlVersion'] = '1.8'
data['keyfrom'] = 'fanyi.web'
data['ue'] = 'UTF-8'
data['action'] = 'FY_BY_CLICKBUTTON'
data['typoResult'] = 'true'

data = urllib.parse.urlencode(data).encode('utf-8')
# encode: str (Unicode) -> UTF-8 bytes
response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')
# decode: UTF-8 bytes -> str (Unicode)

# print(html)
# JSON is a lightweight data-interchange format; here it wraps the result in a string

target = json.loads(html)
print("Result: %s" % (target['translateResult'][0][0]['tgt']))

Here is the output of our code.

Pls input the content you need to translate: 她真漂亮！
Result: She is really beautiful!
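The parsing step can also be tried on its own, without the network. The sample string below is hypothetical: it is shaped only to match the lookup target['translateResult'][0][0]['tgt'] used in the program, and the 'src' key is an assumption. The helper name extract_translation is likewise made up; it returns None instead of raising when the reply does not have the expected shape.

```python
import json

# Hypothetical reply text; the real service may return more fields.
sample = '{"translateResult": [[{"src": "她真漂亮！", "tgt": "She is really beautiful!"}]]}'


def extract_translation(text):
    """Return the first translated segment, or None if the shape is unexpected."""
    try:
        # dict key -> first list item -> first list item -> dict key
        return json.loads(text)['translateResult'][0][0]['tgt']
    except (ValueError, KeyError, IndexError, TypeError):
        # ValueError covers invalid JSON; the others cover a changed structure.
        return None


print(extract_translation(sample))      # She is really beautiful!
print(extract_translation('not json'))  # None
```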

But some problems still remain, such as setting the User-Agent so that the server does not recognize the visitor as code.

Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!
