I'm trying to figure out how Quixote handles non-ASCII characters in form input. Our users tend to paste text from Word documents, FileMaker databases, etc., which often contain:

- degree symbols
- "Word-enhanced" characters (curly quotes, long dashes, bullets)
- Spanish/Portuguese letters (less common)

The source charset is windows-1252 or mac_roman depending on which platform the document was created on. I want to use unicode in memory and utf-8 for display and MySQL.

I thought I would have to guess the charset or ask the user and then try to convert, but that gives me errors. When I inspected the form input, to my amazement it was already unicode. Is this happening at some lower layer? My form is embedded in a utf-8 webpage, so if the data comes back as utf-8 and something autoconverts it to unicode, that's (almost) OK. Is this how HTTP and Quixote work?

In this case, the only remaining problems would be:

- Will it safely interpret any 8-bit character string the user pastes in without raising an exception?
- What if the document's character set is different from that of the platform the user is running on? Of course, both will be different from utf-8 in any case. Will the browser convert the characters, reinterpret them as-is, or what?

If I start getting 8-bit strings as form input, I'll have to convert them using an algorithm to guess the charset, or add a companion pulldown for the user to tell me. I'll hold off on the details of this for now since I'm still navigating through the minefield of UnicodeDecodeError and UnicodeEncodeError in various circumstances.

* * *

By the way, Quixote's error handler can't display an error message that contains non-ASCII characters.
  File "/usr/local/lib/python2.4/Quixote-2.4-py2.4-linux-i686.egg/quixote/publish.py", line 195, in finish_failed_request
    tb)
  File "/usr/local/lib/python2.4/Quixote-2.4-py2.4-linux-i686.egg/quixote/publish.py", line 236, in _generate_cgitb_error
    error_file.write(str(util.dump_request(request)))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 41-42: ordinal not in range(128)

The problem is that str() call. However, if you take it out you get the same error, because the called method also calls str(). Ayayay! This happens regardless of whether display_exceptions is set to "html", "plain" or "none".

To see the real error I added "raise" before the call to publish.finish_failed_request (publish.py line 284). The real error was:

  File "./char_conversion_site.py", line 84, in _q_index
    text = text.decode("mac_roman", "replace")
  File "/usr/lib/python2.4/encodings/mac_roman.py", line 22, in decode
    return codecs.charmap_decode(input,errors,decoding_map)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

(The reason for the error was that 'text' is already unicode, so it should not be decoded.)

There don't seem to be any non-ASCII characters in that traceback, so I'm not 100% sure why Quixote blew up on it, but it did.

-- Mike Orr
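
For reference, the "don't decode what is already unicode" guard and the charset fallback mentioned above could be sketched roughly as follows. This is only an illustration, not anything Quixote provides: the name to_text and the default encoding order are assumptions, and it is written for Python 3, where the text type is str and the 8-bit type is bytes (in the Python 2.4 of this post those would be unicode and str). In Python 2, calling .decode() on a unicode object first encodes it implicitly with the ascii codec, which is why a decode call ended in a UnicodeEncodeError.

```python
def to_text(value, encodings=("utf-8", "windows-1252")):
    """Return value as text, decoding only if it is still a byte string.

    to_text and its default encoding order are hypothetical, chosen for
    illustration only.
    """
    if isinstance(value, str):
        # Already text (as the form input in this post turned out to be),
        # so there is nothing to decode.
        return value
    for encoding in encodings:
        try:
            # Strict decoding: the first charset that accepts every byte wins.
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue
    # No charset decoded cleanly: keep what we can and substitute U+FFFD
    # for the undecodable bytes instead of raising.
    return value.decode(encodings[-1], "replace")


if __name__ == "__main__":
    print(to_text("25\u00b0"))                  # text passes through untouched
    print(to_text("25\u00b0".encode("utf-8")))  # utf-8 bytes are decoded
    print(to_text(b"\x93quoted\x94"))           # falls back to windows-1252
```

Trying utf-8 first is reasonable because text in an 8-bit charset rarely happens to be valid utf-8. Telling windows-1252 from mac_roman this way is not reliable, though, since almost any byte string decodes without error under either, so the pulldown for the user would probably still be needed in that case.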