Tuesday, March 3, 2020

Encode and decode URLs in Python 3

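In Python 3+, you can URL encode any string using the quote() function of the urllib.parse module, which uses UTF-8 encoding by default:
>>> import urllib.parse
>>> query = 'Hellö Wörld@Python'
>>> urllib.parse.quote(query)
'Hell%C3%B6%20W%C3%B6rld%40Python'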
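Because quote() works on the UTF-8 bytes of the string, each non-ASCII character becomes one %XX escape per byte, and unquote() (covered in the decoding section) reverses the process. A minimal round-trip sketch using the same sample string:

```python
import urllib.parse

query = 'Hellö Wörld@Python'

# quote() encodes the text as UTF-8, then percent-escapes every unsafe byte.
encoded = urllib.parse.quote(query)
print(encoded)  # Hell%C3%B6%20W%C3%B6rld%40Python

# 'ö' is two bytes in UTF-8 (0xC3 0xB6), which is why it becomes two escapes.
print('ö'.encode('utf-8'))  # b'\xc3\xb6'

# unquote() reverses the transformation and restores the original text.
decoded = urllib.parse.unquote(encoded)
print(decoded == query)  # True
```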
Note that the quote() function treats the / character as safe by default, which means it does not encode / :
>>> urllib.parse.quote('/')
'/'
The quote() function accepts a keyword parameter called safe whose default value is '/'. If you want to encode the / character as well, pass an empty string as the safe parameter:
>>> urllib.parse.quote('/', safe='')
'%2F'
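A common reason to clear safe is placing a value inside a single path segment, where a literal / would be read as a separator. A minimal sketch; the helper name build_file_url and the base URL https://example.com/files are hypothetical, chosen only for illustration:

```python
import urllib.parse

def build_file_url(base, filename):
    """Hypothetical helper: append filename as one path segment."""
    # safe='' makes quote() escape '/' as %2F, so the filename
    # cannot introduce extra path segments.
    return base.rstrip('/') + '/' + urllib.parse.quote(filename, safe='')

# With the default safe='/', the slash is kept as-is.
print(urllib.parse.quote('reports/2020 Q1.pdf'))  # reports/2020%20Q1.pdf

# With safe='', the slash is escaped.
print(build_file_url('https://example.com/files', 'reports/2020 Q1.pdf'))
# https://example.com/files/reports%2F2020%20Q1.pdf
```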

The quote() function encodes space characters as %20. If you want to encode spaces as a plus sign (+) instead, use the quote_plus() function, also provided by the urllib.parse module:
>>> import urllib.parse
>>> query = 'Hellö Wörld@Python'
>>> urllib.parse.quote_plus(query)
'Hell%C3%B6+W%C3%B6rld%40Python'
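The two functions differ in more than spaces: quote_plus() defaults to safe='' (so / is escaped too), and a literal + is always escaped as %2B so it cannot be confused with an encoded space. A short comparison on an arbitrary sample string:

```python
import urllib.parse

text = 'a/b c+d'

# quote(): space -> %20, '/' kept because safe defaults to '/'.
quoted = urllib.parse.quote(text)
print(quoted)   # a/b%20c%2Bd

# quote_plus(): space -> '+', '/' escaped because safe defaults to ''.
plussed = urllib.parse.quote_plus(text)
print(plussed)  # a%2Fb+c%2Bd
```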

Encoding multiple parameters at once
You can encode multiple parameters at once using the urllib.parse.urlencode() function. This convenience function takes a dictionary of key-value pairs, or a sequence of two-element tuples, and by default uses the quote_plus() function to encode every key and value. The result is a series of key=value pairs separated by the & character.
Let’s see an example -
>>> import urllib.parse
>>> params = {'q': 'Python URL encoding', 'as_sitesearch': 'www.urlencoder.io'}
>>> urllib.parse.urlencode(params)
'q=Python+URL+encoding&as_sitesearch=www.urlencoder.io'
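Since urlencode() also accepts a sequence of two-element tuples, you can fix the parameter order explicitly and append the result to a base URL. A minimal sketch; the base URL https://example.com/search is a hypothetical endpoint used only for illustration:

```python
import urllib.parse

base_url = 'https://example.com/search'  # hypothetical endpoint

# A list of (key, value) tuples keeps the parameters in the given order.
params = [('q', 'Python URL encoding'), ('as_sitesearch', 'www.urlencoder.io')]

# Join the base URL and the encoded query string with '?'.
url = base_url + '?' + urllib.parse.urlencode(params)
print(url)
# https://example.com/search?q=Python+URL+encoding&as_sitesearch=www.urlencoder.io
```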
If you want urlencode() to use the quote() function instead, pass it through the quote_via parameter, so that spaces become %20:
>>> urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
'q=Python%20URL%20encoding&as_sitesearch=www.urlencoder.io'
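urlencode() also has a safe parameter (default '') that it forwards to whichever function is passed as quote_via. A short sketch, assuming you want the slashes in a path-like value to stay readable; the parameter name path is illustrative:

```python
import urllib.parse

params = {'path': '/docs/url encoding'}

# Default: quote_plus with safe='' escapes '/' and turns spaces into '+'.
default_encoded = urllib.parse.urlencode(params)
print(default_encoded)  # path=%2Fdocs%2Furl+encoding

# quote_via=quote with safe='/' keeps '/' and turns spaces into %20.
custom_encoded = urllib.parse.urlencode(params, safe='/', quote_via=urllib.parse.quote)
print(custom_encoded)   # path=/docs/url%20encoding
```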
Encoding multiple parameters at once where one parameter can have multiple values
The urlencode() function takes an optional argument called doseq. If a single key can have multiple values, set doseq to True so that each value in the sequence is encoded as a separate key=value pair:
>>> import urllib.parse
>>> params = {'name': 'Rajeev Singh', 'phone': ['+919999999999', '+628888888888']}
>>> urllib.parse.urlencode(params, doseq=True)
'name=Rajeev+Singh&phone=%2B919999999999&phone=%2B628888888888'
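To see why doseq matters, compare the two modes on the same dictionary. Without doseq, urlencode() converts the whole list with str() and encodes it as one value, which a server would not interpret as two phone numbers:

```python
import urllib.parse

params = {'name': 'Rajeev Singh', 'phone': ['+919999999999', '+628888888888']}

# doseq=True: each list element becomes its own phone=... pair.
with_doseq = urllib.parse.urlencode(params, doseq=True)
print(with_doseq)
# name=Rajeev+Singh&phone=%2B919999999999&phone=%2B628888888888

# doseq=False (the default): the list's str() form is encoded as one value.
without_doseq = urllib.parse.urlencode(params)
print(without_doseq)
# name=Rajeev+Singh&phone=%5B%27%2B919999999999%27%2C+%27%2B628888888888%27%5D
```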

------
URL Decoding query strings or form parameters in Python (3+)
In Python 3+, you can URL decode any string using the unquote() function provided by the urllib.parse module. The unquote() function uses UTF-8 encoding by default.
Let’s see an example -
>>> import urllib.parse
>>> encodedStr = 'Hell%C3%B6%20W%C3%B6rld%40Python'
>>> urllib.parse.unquote(encodedStr)
'Hellö Wörld@Python'
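The UTF-8 default matters when a string was encoded with a different charset. In the sketch below, %F6 is 'ö' in Latin-1 but is not valid UTF-8 on its own; unquote() decodes with errors='replace' by default, so the byte turns into the U+FFFD replacement character unless you pass the right encoding:

```python
import urllib.parse

latin1_encoded = 'Hell%F6'  # 'Hellö' percent-encoded as Latin-1

# Default UTF-8 decoding cannot read 0xF6 alone, so it is replaced with U+FFFD.
utf8_result = urllib.parse.unquote(latin1_encoded)
print(repr(utf8_result))  # 'Hell\ufffd'

# encoding= tells unquote() which charset the bytes were encoded with.
latin1_result = urllib.parse.unquote(latin1_encoded, encoding='latin-1')
print(latin1_result)      # Hellö
```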

Decoding plus sign (+) to space character
The unquote() function does not decode the plus sign (+):
>>> urllib.parse.unquote('My+name+is+Rajeev')
'My+name+is+Rajeev'
But if you’re working with HTML forms, you’ll need to convert plus signs (+) back to spaces, because HTML forms use the application/x-www-form-urlencoded MIME type, which encodes a space as a plus sign (+) instead of %20.
To replace plus signs with spaces while decoding, use the unquote_plus() function of the urllib.parse module:
>>> import urllib.parse
>>> encodedStr = 'My+name+is+Rajeev'
>>> urllib.parse.unquote_plus(encodedStr)
'My name is Rajeev'
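In form data, a literal plus sign is sent as %2B, which is how unquote_plus() can tell it apart from an encoded space. A short sketch comparing both decoders on a sample form value (the value itself is illustrative):

```python
import urllib.parse

form_value = '1%2B1+%3D+2'  # how a form encodes the text '1+1 = 2'

# unquote() leaves '+' untouched, so the encoded spaces stay as '+'.
raw = urllib.parse.unquote(form_value)
print(raw)   # 1+1+=+2

# unquote_plus() turns '+' into a space first, then decodes %2B to '+'.
form = urllib.parse.unquote_plus(form_value)
print(form)  # 1+1 = 2
```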

URL Decoding multiple query strings at once
If you want to decode or parse a query string of type application/x-www-form-urlencoded that contains multiple parameters (e.g. 'name=Rajeev+Singh&phone=%2B919999999999'), use the parse_qs() or parse_qsl() function provided by the urllib.parse module.
The parse_qs() function returns a dictionary that maps each key to a list of its values, whereas the parse_qsl() function returns a list of (key, value) tuples.
parse_qs
>>> import urllib.parse
>>> queryStr = 'name=Rajeev+Singh&phone=%2B919999999999&phone=%2B628888888888'
>>> urllib.parse.parse_qs(queryStr)
{'name': ['Rajeev Singh'], 'phone': ['+919999999999', '+628888888888']}
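Because parse_qs() returns a list for every key, its output can be passed straight back to urlencode() with doseq=True to rebuild the query string. A minimal round-trip sketch using the same query string:

```python
import urllib.parse

query_str = 'name=Rajeev+Singh&phone=%2B919999999999&phone=%2B628888888888'

parsed = urllib.parse.parse_qs(query_str)
# Every value is a list, even when a key appears only once.
print(parsed['name'])  # ['Rajeev Singh']

# doseq=True expands each list back into repeated key=value pairs.
rebuilt = urllib.parse.urlencode(parsed, doseq=True)
print(rebuilt == query_str)  # True
```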
parse_qsl
>>> import urllib.parse
>>> queryStr = 'name=Rajeev+Singh&phone=%2B919999999999&phone=%2B628888888888'
>>> urllib.parse.parse_qsl(queryStr)
[('name', 'Rajeev Singh'), ('phone', '+919999999999'), ('phone', '+628888888888')]
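In practice the query string often arrives as part of a full URL. The sketch below, using a hypothetical URL, extracts it with urlsplit() and also shows that parse_qsl() (like parse_qs()) drops parameters with empty values unless keep_blank_values=True is passed:

```python
import urllib.parse

url = 'https://example.com/search?q=Python+URL&page=&lang=en'  # hypothetical URL

# urlsplit() separates the query string from the rest of the URL.
query = urllib.parse.urlsplit(url).query
print(query)  # q=Python+URL&page=&lang=en

# By default, parameters with empty values (page=) are dropped.
blank_dropped = urllib.parse.parse_qsl(query)
print(blank_dropped)  # [('q', 'Python URL'), ('lang', 'en')]

# keep_blank_values=True keeps them as empty strings.
blank_kept = urllib.parse.parse_qsl(query, keep_blank_values=True)
print(blank_kept)     # [('q', 'Python URL'), ('page', ''), ('lang', 'en')]
```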