On 6/4/06, Neil Schemenauer wrote:
> Mike Orr wrote:
> > def filter(self, val, **kw):
> >     val = htmlescape(val)
> >     if isinstance(val, htmltext):
> >         return str(val)  # Cheetah > 1.0rc1 compatibility.
> >     else:
> >         return val
> >
> > In this case it's trying to filter U"A\xa0B" retrieved from the
> > database.  That's "AB" with the degree symbol in between.  Voila:
> >
> > UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in
> > position 1: ordinal not in range(128)

I finally got it to work with:

    return val.__str__().encode('latin1', 'xmlcharrefreplace')

The .__str__() is a Cheetah method that does return Unicode, which may
be wrong, but Guido has said str() will be allowed to return Unicode in
a future version to get around some of these problems.  .__str__() in
Cheetah has the side effect of calling the template's main method so
you don't have to hardcode its name.  That's why I was using str() in
the first place, and then switched to unicode() because I thought maybe
that would work.

I was surprised that XML entities can be higher than 255, but Python
thinks they can.  I don't really care what the browser displays for it
because we don't really know what the original character was supposed
to be anyway (it was pasted from another application using who knows
what charset, which may have been different from the browser's charset
of the person who uploaded it).  I just want the page to be readable
because it contains scientific data people need.

I still don't know why MySQLdb or MySQL or some Python or C library is
truncating the values on insert at the first non-ASCII or non-Latin-1
character, but I'm just running a conversion function to asciify new
values to sidestep the issue.  I'm also trying SQLAlchemy for my new
application, so maybe it'll do a better job.

--
Mike Orr
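
A minimal sketch of what the encode() call above does, written for
Python 3 (the post itself is about Python 2 unicode objects).  The
helper name to_latin1_bytes is made up for illustration and is not part
of Cheetah or Quixote.  Characters that fit in Latin-1 come out as
single bytes, and anything above 255 becomes a numeric character
reference:

```python
# Illustrative only: to_latin1_bytes is a hypothetical helper name.
def to_latin1_bytes(text):
    """Encode text as Latin-1, replacing characters that Latin-1
    cannot represent with &#NNNN; numeric character references."""
    return text.encode('latin1', 'xmlcharrefreplace')

# U+00A0 (no-break space) exists in Latin-1, so it stays one byte, 0xA0.
print(to_latin1_bytes("A\xa0B"))      # b'A\xa0B'

# U+20AC (euro sign) is above 255, so it becomes a numeric reference.
print(to_latin1_bytes("\u20ac5"))     # b'&#8364;5'

# With the ascii codec, every non-ASCII character becomes a reference.
print("A\xa0B".encode('ascii', 'xmlcharrefreplace'))  # b'A&#160;B'
```

The result is always a byte string, so it can be written to the
response without triggering the implicit ASCII encode that raised the
UnicodeEncodeError quoted above.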