The Requests library

From Parts 1 to 7 we've been using Python's standard urllib library. Requests, a popular third-party library, offers much of the same functionality through a simplified API.

Many commands look similar between urllib.request and Requests. With urllib.request.urlopen:

from urllib.request import urlopen
response = urlopen("http://python.org")
print(response.url) # https://www.python.org/
print(response.status) # 200
print(response.reason) # OK
print(response.getheader('Content-Type')) # text/html; charset=utf-8

The Requests equivalent is below; note that the keys in the headers attribute are case-insensitive:

import requests
response = requests.get('https://python.org')
response.url # https://www.python.org/
response.status_code # 200
response.reason # OK
response.headers['content-type'] # text/html; charset=utf-8
response.ok # True if the status code is less than 400
response.is_redirect # False in this case
response.request.headers # {'User-Agent': 'python-requests/2.21.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
response.headers['Content-Encoding'] # 'gzip'

Notice that Requests automatically handles compression for us: the request headers include Accept-Encoding, and in response.headers we can see the key-value pair indicating gzip encoding. To get the raw response content, we can use response.content. But because Requests also performs automatic decoding (using values in the headers to determine a character set), we can use response.text instead to get the decoded content as Unicode. The returned value will be str rather than bytes.

import requests
response = requests.get('https://google.com')
print(response.encoding) # see what encoding Requests uses for this response
response.content # the raw bytes of the response body
response.text # the decoded content as str
  • We can change the encoding (e.g. response.encoding = 'utf-8').
  • We can get the cookies from our response (response.cookies), as sketched below.
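
A minimal sketch combining both (the URL and the override encoding here are just for illustration):

import requests

response = requests.get('https://google.com')
response.encoding = 'utf-8' # override whatever encoding Requests guessed
print(response.text) # now decoded with our chosen encoding
print(dict(response.cookies)) # cookies the server set on this response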

The Requests library has a Session class which allows the reuse of cookies, similar to how we used http.cookiejar.CookieJar and urllib.request.HTTPCookieProcessor:

import requests
from urllib.parse import parse_qs

s = requests.Session()
res = s.get('http://google.com')
print(dict(res.cookies)['NID'])
responsecookies = res.headers['Set-Cookie'] # raw Set-Cookie header from the first response

response = s.get('http://google.com') # second request on the same session
requestcookies = response.request.headers['Cookie'] # the cookie is sent back automatically

# quick-and-dirty parsing of the raw headers; the keys keep a leading space
# because the attributes are separated by '; ' (on newer Python versions
# parse_qs splits only on '&' by default, so you may need
# parse_qs(..., separator=';'))
print(parse_qs(responsecookies)[' domain'][0])
print(parse_qs(requestcookies)[' NID'][0])

After our first request, the server's response contains a Set-Cookie header. When we make another request with the same session, the request carries a Cookie header containing the same cookie the server assigned to us in the first response.
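
For comparison, the urllib approach from earlier parts looks roughly like this (a minimal sketch; the cookie name NID is just what Google happens to set):

from http.cookiejar import CookieJar
from urllib.request import build_opener, HTTPCookieProcessor

cj = CookieJar()
opener = build_opener(HTTPCookieProcessor(cj))
opener.open('http://google.com') # first request: the server's cookies are stored in cj
response = opener.open('http://google.com') # second request: cookies are sent back automatically
print([cookie.name for cookie in cj]) # e.g. ['NID']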

Compare the session example above to the code block below, which is syntactically similar, but notice that the second request does not contain a Cookie header, because each call is a separate request with no shared session:

import requests
from urllib.parse import parse_qs
res = requests.get('http://google.com')
print(dict(res.cookies)['NID']) # 180=Xcj5zaq9vwndLK9VciAQvNQ...

response = requests.get('http://google.com')
print(response.request.headers) # No 'Cookie' key

Instead of GET, we can use other HTTP methods directly with Requests:

# HEAD is identical to GET except the server MUST NOT return a message-body in the response
response = requests.head('http://google.com')
response.status_code # 200

# POST sends the form data in the request body
data = {'P': 'Python'}
response = requests.post('http://search.debian.org/cgi-bin/omega', data=data)

# a params dict is URL-encoded into the query string for us
params = {':action': 'search', 'term': 'models'}
response = requests.get('http://pypi.python.org/pypi', params=params)

One difference between Requests and urllib.request is how error conditions are handled. requests.get() doesn't raise an exception for an error status code unless we explicitly tell it to do so using response.raise_for_status():

import requests
response = requests.get('http://google.com/randompages')
print(response.status_code) # 404

The equivalent code in urllib, however, raises an exception:

from urllib.request import urlopen
response = urlopen('http://google.com/randompages') # raises urllib.error.HTTPError: HTTP Error 404: Not Found
print(response.status)

raise_for_status() raises requests.exceptions.HTTPError for 4xx and 5xx responses and returns None otherwise, making it suitable for a try-except block.
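
A minimal sketch of that pattern (the URL is just for illustration):

import requests

response = requests.get('http://google.com/randompages')
try:
    response.raise_for_status() # raises requests.exceptions.HTTPError for 4xx/5xx
except requests.exceptions.HTTPError as e:
    print(e) # e.g. 404 Client Error: Not Found for url: ...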