Skip Montanaro wrote:
> Graham> Neil, what would we have to do to set Quixote right? We have a
> Graham> lot of hands (though time is always scarce ;-) but I doubt most
> Graham> of us have a clear appreciation of the problem.
>
> Don't believe Neil when he says he's "not familiar with Unicode". He's
> apparently familiar enough to know where most of the stumbling blocks
> are. ;-) His previous message hit most/all the places I encountered.
>
> In theory, if all body text was accumulated as unicode objects instead of
> strings and there was a way to convey the desired encoding down to the level
> where stuff gets encoded into strings for transmission, there'd be little
> else to do. I don't know what sort of breakage lower-level APIs in Quixote
> can tolerate, but I personally have nothing yet to be backward-compatible
> with, so I can probably charge ahead like a bull in a china shop and get
> something which works for me but isn't terribly pretty.
>
That's kind of what I thought. Maybe I had a clue after all ;-)
I wonder whether "conveying the desired encoding" shouldn't just be done
at the top level:
    quixote.enable_ptl(encoding_in='iso-8859-1', encoding_out='utf-8')
unless the ptl_importer could check the encoding hint on the .ptl source
file by looking for a
    # -*- coding: iso-8859-1 -*-
line; then just an outbound encoding would be necessary.
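A rough sketch of that check, assuming the importer has the raw bytes of the .ptl file in hand. The function name detect_source_encoding and the 'ascii' default are made up for illustration; the regular expression is the one given in PEP 263. It is written for current Python, and it is simpler than what the interpreter itself does (it just looks at the first two lines):

```python
import re

# Pattern for a source-encoding declaration, as given in PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def detect_source_encoding(source_bytes, default="ascii"):
    """Return the encoding named on the first or second line, else default.

    Hypothetical helper: not part of Quixote.
    """
    for line in source_bytes.splitlines()[:2]:
        # latin-1 maps every byte to a character, so this decode never fails.
        match = CODING_RE.match(line.decode("latin-1"))
        if match:
            return match.group(1)
    return default
```

With something like this in the importer, only the outbound encoding would have to be passed in explicitly.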
Ultimately the Publisher object receives an 'output' value, which it
expects to be a string (or convertible to one). I'm picturing something
like this:
    if isunicode(output):
        out_enc = quixote.get_outbound_encoding() # made up
        output = output.encode(out_enc)
        request.response.set_content_type(encoding=out_enc) # made up
    return output
in Publisher.try_publish().
And perhaps if no inbound/outbound encodings were set, Quixote could
fall back to its current string-based implementation, avoiding any legacy
issues.
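To make the encode-or-pass-through idea concrete, here is a minimal standalone sketch. Everything in it is made up for illustration (the Response stand-in, set_charset, encode_output); none of it is Quixote's actual API. It is written for current Python, where str plays the role of the unicode object and bytes is the encoded form:

```python
class Response:
    """Hypothetical stand-in for the response object; not Quixote's class."""

    def __init__(self):
        self.charset = None

    def set_charset(self, charset):
        # Record the charset so it could be advertised in Content-Type.
        self.charset = charset


def encode_output(output, response, out_enc=None):
    """Encode text output when an outbound encoding is configured.

    If out_enc is None, or output is not text, the value is returned
    untouched, which is the fallback to the existing behaviour.
    """
    if out_enc is not None and isinstance(output, str):
        response.set_charset(out_enc)
        return output.encode(out_enc)
    return output
```

Called with out_enc='utf-8', text is encoded and the response records the charset; with no outbound encoding configured, the output passes through unchanged, so existing applications would see no difference.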
> Skip
-- Graham