Mike Orr wrote:
> def filter(self, val, **kw):
>     val = htmlescape(val)
>     if isinstance(val, htmltext):
>         return str(val) # Cheetah > 1.0rc1 compatibility.
>     else:
>         return val
>
> In this case it's trying to filter U"A\xa0B" retrieved from the
> database. That's "AB" with the degree symbol in between. Voila:
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in
> position 1: ordinal not in range(128)
Right, that's the same as trying str(U"A\xa0B").
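To make the implicit step visible, here is a rough sketch of what that str() call amounts to: in Python 2 it encodes the unicode object with the default 'ascii' codec. The explicit .encode() call and the variable names are mine, for illustration only.

```python
# Sketch: the explicit equivalent of the implicit conversion that str()
# performs on a unicode object in Python 2 (default codec: 'ascii').
text = u"A\xa0B"

try:
    text.encode("ascii")
    failed = False
except UnicodeEncodeError:
    # U+00A0 is outside the 7-bit ASCII range, so the encode fails.
    failed = True

# Naming a codec that covers the character works fine.
latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # U+00A0 takes two bytes in UTF-8
```

So the error is about the target encoding, not about the data itself.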
> OK, let's try returning Unicode instead.
>
> return unicode(val, 'latin1') # Cheetah > 1.0rc1 compatibility.
>
> TypeError: coercing to Unicode: need string or buffer, htmltext found
You already have a unicode string, so decoding from 'latin1' makes
no sense. If you had a unicode object instead of an htmltext object,
you would still get an error:
>>> unicode(U"A\xa0B", "latin1")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: decoding Unicode is not supported
> Darn it, why didn't htmltext subclass str!!!
Because then people who wanted to represent Unicode characters would
be out of luck. Qpy makes htmltext a subclass of unicode. That
forces everyone who uses it to correctly handle unicode strings.
> Peeking into the htmltext implementation, it stores the actual
> value in an attribute .s:
>
> return unicode(val.s, 'latin1') # Cheetah > 1.0rc1 compatibility.
>
> TypeError: decoding Unicode is not supported
Same error as my code snippet above. You already have a unicode
string. Decoding it makes no sense.
> How about this?
>
> return unicode(val.s) # Cheetah > 1.0rc1 compatibility.
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in
> position 412: ordinal not in range(128)
That should work, although the following would do the same
thing since s is already a unicode string:
return val.s
The UnicodeEncodeError is being raised by some other code, I expect.
See below.
> >>> print htmltext(U"A\xa0B")
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in
> position 1: ordinal not in range(128)
Does this work for you:
>>> print U"A\xa0B"
It works for me because:
>>> import sys
>>> sys.stdout.encoding
'UTF-8'
Sometimes the stdout encoding is 'ascii', so you have to set the
encoding manually, e.g.:
>>> import sys, codecs
>>> sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
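To see what that wrapper does without touching the real stdout, here is a small sketch that wraps an in-memory byte stream instead. The BytesIO object is only a stand-in I chose for illustration; the codecs.getwriter call is the same one as above.

```python
import codecs
import io

# An in-memory byte stream stands in for a stdout that cannot encode
# non-ASCII text by itself (an assumption for illustration).
raw = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; instantiating it wraps
# the byte stream so that unicode written to it is encoded as UTF-8.
out = codecs.getwriter("utf-8")(raw)
out.write(u"A\xa0B\n")

# The underlying stream now holds the UTF-8 encoded bytes.
encoded = raw.getvalue()
```

The wrapper does the encoding on every write, so the code that prints never has to call .encode() itself.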
The other problem you are running into is a bug in Python, IMHO.
You can't print an object that has a __str__ (or __unicode__) method
that returns a unicode string:
>>> class A:
...     def __str__(self):
...         return u"\u1234"
...
>>> print A()
UnicodeEncodeError
Another try:
>>> class A:
...     def __unicode__(self):
...         return u"\u1234"
...
>>> print A()
<__main__.A instance at 0xb7dd1f4c>
I'm going to post a patch for the Python bug. Hopefully it will get
applied for Python 2.5. The answer you are looking for, I think,
is:
def filter(self, val, **kw):
    val = htmlescape(val)
    if isinstance(val, htmltext):
        return val.s # Cheetah > 1.0rc1 compatibility.
    else:
        return val
alternatively,
def filter(self, val, **kw):
    val = htmlescape(val)
    return stringify(val) # from quixote.html
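If you want to try the first variant without Quixote installed, something like the following sketch should behave the same way. FakeHtmltext, fake_htmlescape and filter_value are hypothetical stand-ins I made up for this example; the real htmltext and htmlescape do more (the stand-in only escapes &, < and >).

```python
class FakeHtmltext(object):
    """Hypothetical stand-in for htmltext; only the .s attribute is modelled."""

    def __init__(self, s):
        self.s = s


def fake_htmlescape(val):
    """Hypothetical, simplified stand-in for htmlescape."""
    if isinstance(val, FakeHtmltext):
        return val  # already escaped, leave it alone
    escaped = (val.replace(u"&", u"&amp;")
                  .replace(u"<", u"&lt;")
                  .replace(u">", u"&gt;"))
    return FakeHtmltext(escaped)


def filter_value(val):
    """Return the wrapped unicode string itself, never str() of it."""
    val = fake_htmlescape(val)
    if isinstance(val, FakeHtmltext):
        return val.s
    return val


result = filter_value(u"A\xa0B <b>")
```

The point is that no encoding step happens anywhere in the filter: the value stays unicode until whatever writes the final output encodes it.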
>
> >>> print htmltext("A\xa0B")
> UnicodeEncodeError
Try:
print stringify(htmltext("A\xa0B"))
Again, PyFile_WriteObject cannot print an object that has a unicode
representation. You need to give PyFile_WriteObject a unicode
string. It surprises me that no one else is complaining about these
Python bugs.
Neil