Htmltext and latin-1 characters
Re: Htmltext and latin-1 characters
Neil Schemenauer
Mike Orr wrote:
>     def filter(self, val, **kw):
>         val = htmlescape(val)
>         if isinstance(val, htmltext):
>             return str(val)     # Cheetah > 1.0rc1 compatibility.
>         else:
>             return val
>
> In this case it's trying to filter U"A\xa0B" retrieved from the
> database.  That's "AB" with the degree symbol in between.  Voila:
>
>     UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in
>     position 1: ordinal not in range(128)

Right, that's the same as trying str(U"A\xa0B").

> OK, let's try returning Unicode instead.
>
>     return unicode(val, 'latin1')  # Cheetah > 1.0rc1 compatibility.
>
>     TypeError: coercing to Unicode: need string or buffer, htmltext found

You already have a unicode string.  Decoding from 'latin1' makes no
sense.  If you had a unicode object instead of a htmltext object, you
would still get an error:

    >>> unicode(U"A\xa0B", "latin1")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: decoding Unicode is not supported

> Darn it, why didn't htmltext subclass str!!!

Because then people who wanted to represent Unicode characters would be
out of luck.  Qpy makes htmltext a subclass of unicode.  That forces
everyone who uses it to correctly handle unicode strings.

> Peeking into the htmltext implementation, it stores the actual
> value in an attribute .s:
>
>     return unicode(val.s, 'latin1')  # Cheetah > 1.0rc1 compatibility.
>
>     TypeError: decoding Unicode is not supported

Same error as my code snippet above.  You already have a unicode
string.  Decoding it makes no sense.

> How about this?
>
>     return unicode(val.s)  # Cheetah > 1.0rc1 compatibility.
>
>     UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in
>     position 412: ordinal not in range(128)

That should work, although the following would do the same thing since
s is already a unicode string:

    return val.s

The UnicodeEncodeError is being raised by some other code, I expect.
See below.

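The two kinds of error discussed above can be reproduced with explicit codec calls. What follows is a minimal sketch, not code from the thread: the explicit .encode("ascii") stands in for the implicit ASCII conversion that str() performs on a unicode object in Python 2, and the helper name to_ascii_or_error is hypothetical. It is written so that it should run on Python 2 as well as Python 3.3 and later.

```python
# Sketch only: explicit codec calls standing in for the implicit ASCII
# conversion that str() applies to a unicode object in Python 2.
text = u"A\xa0B"  # already text (unicode); no decoding step is needed


def to_ascii_or_error(value):
    """Hypothetical helper: return the ASCII bytes, or the error's name."""
    try:
        return value.encode("ascii")
    except UnicodeEncodeError as exc:
        return type(exc).__name__


result = to_ascii_or_error(text)             # U+00A0 is outside ASCII
latin1_bytes = text.encode("latin-1")        # encoding to latin-1 succeeds
round_trip = latin1_bytes.decode("latin-1")  # decoding applies to bytes only
```

Encoding turns text into bytes and decoding turns bytes into text, which is why asking to decode something that is already unicode fails.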
> >>> print htmltext(U"A\xa0B")
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in
> position 1: ordinal not in range(128)

Does this work for you:

    >>> print U"A\xa0B"

It works for me because:

    >>> import sys
    >>> sys.stdout.encoding
    'UTF-8'

Sometimes stdout is 'ascii' and so you have to manually set the
encoding, e.g.:

    >>> import sys, codecs
    >>> sys.stdout = codecs.getwriter('utf-8')(sys.stdout)

The other problem you are running into is a bug in Python, IMHO.  You
can't print an object that has a __str__ (or __unicode__) method that
returns a unicode string:

    >>> class A:
    ...     def __str__(self):
    ...         return u"\u1234"
    ...
    >>> print A()
    UnicodeEncodeError

Another try:

    >>> class A:
    ...     def __unicode__(self):
    ...         return u"\u1234"
    ...
    >>> print A()
    <__main__.A instance at 0xb7dd1f4c>

I'm going to post a patch for the Python bug.  Hopefully it will get
applied for Python 2.5.

The answer you are looking for, I think, is:

    def filter(self, val, **kw):
        val = htmlescape(val)
        if isinstance(val, htmltext):
            return val.s     # Cheetah > 1.0rc1 compatibility.
        else:
            return val

Alternatively,

    def filter(self, val, **kw):
        val = htmlescape(val)
        return stringify(val)  # from quixote.html

> >>> print htmltext("A\xa0B")
> UnicodeEncodeError

Try:

    print stringify(htmltext("A\xa0B"))

Again, PyFile_WriteObject cannot print an object that has a unicode
representation.  You need to give PyFile_WriteObject a unicode string.
It surprises me that no one else is complaining about these Python
bugs.

  Neil
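
The stdout wrapping suggested in the message can be tried without touching the real stdout. This is a minimal sketch under an assumption: an in-memory io.BytesIO buffer stands in for a byte-oriented output stream. Only codecs.getwriter comes from the message; the buffer and the variable names are illustrative.

```python
import codecs
import io

# Sketch only: an in-memory byte buffer stands in for a byte-oriented stdout.
raw = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; instantiating it around a
# byte stream gives an object that encodes text to UTF-8 on every write.
writer = codecs.getwriter("utf-8")(raw)

writer.write(u"A\xa0B\n")  # text goes in
encoded = raw.getvalue()   # UTF-8 bytes come out
```

The wrapper takes over the encoding step, so the text never passes through the default 'ascii' codec.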