[Quixote-users] Re: Unicode?

Skip Montanaro wrote:

>     Graham> Neil, what would we have to do to set Quixote right? We have a
>     Graham> lot of hands (though time is always scarce ;-) but I doubt most
>     Graham> of us have a clear appreciation of the problem.
>
> Don't believe Neil when he says he's "not familiar with Unicode".  He's
> apparently familiar enough to know where most of the stumbling blocks
> are. ;-)  His previous message hit most/all the places I encountered.
>
> In theory, if all body text was accumulated as unicode objects instead of
> strings and there was a way to convey the desired encoding down to the level
> where stuff gets encoded into strings for transmission, there'd be little
> else to do.  I don't know what sort of breakage lower-level APIs in Quixote
> can tolerate, but I personally have nothing yet to be backward-compatible
> with, so I can probably charge ahead like a bull in a china shop and get
> something which works for me but isn't terribly pretty.
>

That's kind of what I thought. Maybe I had a clue after all ;-)

I wonder whether "conveying the desired encoding" shouldn't just be done
at the top level:

    quixote.enable_ptl(encoding_in='iso-8859-1', encoding_out='utf-8')

unless the ptl_importer could check the encoding hint on the .ptl source
file by looking for a

   # -*- coding: iso-8859-1 -*-

line; then just an outbound encoding would be necessary.
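
For what it's worth, here is a rough sketch of how such a check might look,
loosely following the PEP 263 rule that the declaration sits on one of the
first two lines. The function name detect_source_encoding and its default
argument are made up for illustration; nothing like this exists in the
ptl_importer as far as I know.

```python
import re

# Encoding-declaration pattern as given in PEP 263, applied to raw bytes.
_CODING_RE = re.compile(br'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def detect_source_encoding(source_bytes, default=None):
    """Return the encoding named in a coding comment, or `default`.

    Hypothetical helper: only the first two lines are inspected.
    """
    for line in source_bytes.splitlines()[:2]:
        match = _CODING_RE.match(line)
        if match:
            return match.group(1).decode('ascii')
    return default
```

If the hint is missing, the caller would still need some default inbound
encoding, which is where a top-level setting could come back in.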

Ultimately the Publisher object receives an 'output' value, which it
expects to be a string (or convertible to one). I'm picturing something
like this:

   if isinstance(output, unicode):
       # get_outbound_encoding() is made up
       out_enc = quixote.get_outbound_encoding()
       output = output.encode(out_enc)
       # the encoding= argument is made up too
       request.response.set_content_type(encoding=out_enc)
   return output

in Publisher.try_publish().

And perhaps if no inbound/outbound encodings were set, Quixote could
fall back to its current string-based implementation, avoiding any legacy
issues.
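
To make the fallback idea concrete, here is a minimal standalone sketch. It
assumes a modern Python where text is `str` (the `unicode` type of older
versions), and every name in it (FakeResponse, set_charset, finish_output)
is invented for illustration rather than taken from Quixote.

```python
class FakeResponse:
    """Stand-in for a response object; not Quixote's real class."""

    def __init__(self):
        self.charset = None

    def set_charset(self, charset):
        self.charset = charset


def finish_output(output, response, out_enc=None):
    """Encode text output only when an outbound encoding is configured."""
    if out_enc is None:
        # No encoding configured: legacy path, pass the value through as-is.
        return output
    if isinstance(output, str):
        # Text output: encode it and record the charset on the response.
        output = output.encode(out_enc)
        response.set_charset(out_enc)
    # Byte strings are assumed to be encoded already and are left alone.
    return output
```

With no encoding set, nothing changes for existing applications; only code
that opts in and returns text gets encoded on the way out.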

> Skip

-- Graham