On Jan 26, 2006, at 12:12 PM, Patrik Simons wrote:
>
> And here quixote.html.url_quote does it wrong, imho. If you set
> quixote.DEFAULT_CHARSET to 'utf-8' and then url_quote a unicode string,
> url_quote should first encode the string as utf-8 and then quote it.

This is interesting. For a unicode argument, the url_quote in quixote
is really the same as urllib.quote.

> It doesn't and quixote breaks with a UnicodeDecodeError on urls like
> this one: u'/component?test=\xc4'

On python 2.3.5, urllib.quote(u'\xc4') returns '%C4'.
On python 2.4.2, urllib.quote(u'\xc4') raises KeyError.

From rfc3986:

    When a new URI scheme defines a component that represents textual
    data consisting of characters from the Universal Character Set [UCS],
    the data should first be encoded as octets according to the UTF-8
    character encoding [STD63]; then only those octets that do not
    correspond to characters in the unreserved set should be
    percent-encoded. For example, the character A would be represented
    as "A", the character LATIN CAPITAL LETTER A WITH GRAVE would be
    represented as "%C3%80", and the character KATAKANA LETTER A would
    be represented as "%E3%82%A2".

This suggests to me that urllib.quote should *always* encode unicode
arguments to 'utf8' first. Is this a bug in urllib.quote?
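
To make the encode-then-quote order concrete, here is a minimal sketch of
what I mean. It is not the quixote or urllib code: quote_utf8 is a
hypothetical helper name, and the sketch assumes Python 3's
urllib.parse.quote (which accepts bytes) rather than the 2.x urllib.quote
discussed above. The sample strings are the three from the RFC passage
plus Patrik's URL.

```python
from urllib.parse import quote


def quote_utf8(text, safe="/"):
    """Hypothetical helper: encode text as UTF-8 octets, then percent-encode.

    This follows the order described in the RFC 3986 passage: first turn
    the characters into UTF-8 octets, then percent-encode every octet
    that is not unreserved (or listed in `safe`).
    """
    if isinstance(text, str):
        octets = text.encode("utf-8")
    else:
        # Already a byte string; assume the caller encoded it.
        octets = text
    return quote(octets, safe=safe)


print(quote_utf8("A"))       # A
print(quote_utf8("\u00c0"))  # %C3%80  (LATIN CAPITAL LETTER A WITH GRAVE)
print(quote_utf8("\u30a2"))  # %E3%82%A2  (KATAKANA LETTER A)

# Patrik's URL; '?' and '=' are kept literal here only for illustration.
print(quote_utf8("/component?test=\u00c4", safe="/?="))  # /component?test=%C3%84
```

With this order, u'\xc4' comes out as '%C3%84' rather than the '%C4'
that 2.3.5 produces.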