Skip Montanaro wrote:
>     Graham> Neil, what would we have to do to set Quixote right? We have a
>     Graham> lot of hands (though time is always scarce ;-) but I doubt most
>     Graham> of us have a clear appreciation of the problem.
>
> Don't believe it when Neil says he's "not familiar with Unicode". He's
> apparently familiar enough to know where most of the stumbling blocks
> are. ;-) His previous message hit most/all the places I encountered.
>
> In theory, if all body text was accumulated as unicode objects instead of
> strings and there was a way to convey the desired encoding down to the level
> where stuff gets encoded into strings for transmission, there'd be little
> else to do. I don't know what sort of breakage lower-level APIs in Quixote
> can tolerate, but I personally have nothing yet to be backward-compatible
> with, so I can probably charge ahead like a bull in a china shop and get
> something which works for me but isn't terribly pretty.
>

That's kind of what I thought. Maybe I had a clue after all ;-)

I wonder whether "conveying the desired encoding" shouldn't just be done
at the top level:

    quixote.enable_ptl(encoding_in='iso-8859-1', encoding_out='utf-8')

unless the ptl_importer could check the encoding hint on the .ptl source
file by looking for a

    # -*- coding: iso-8859-1 -*-

line; then just an outbound encoding would be necessary.

Ultimately the Publisher object receives an 'output' value, which it
expects to be a string (or convertible to one). I'm picturing something
like this:

    if isunicode(output):
        out_enc = quixote.get_outbound_encoding() # made up
        output = output.encode(out_enc)
        request.response.set_content_type(encoding=out_enc) # made up
    return output

in Publisher.try_publish(). And perhaps if no inbound/outbound encodings
were set, Quixote could fall back to its current string-based
implementation, avoiding any legacy issues.

> Skip

-- Graham
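
P.S. Here is a rough, standalone sketch of the two pieces discussed above:
sniffing a coding hint from the top of a source file, and encoding the
output only at the last moment. It is written for present-day Python 3
(where str is the unicode type) and does not touch Quixote at all. Every
name in it (set_outbound_encoding, sniff_source_encoding, StubResponse,
encode_output) is made up for illustration; only the coding-line pattern
comes from PEP 263.

```python
import re

# PEP 263 style pattern for a "# -*- coding: ... -*-" source hint.
_CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

# Hypothetical module-level setting; None means "no outbound encoding set".
_outbound_encoding = None


def set_outbound_encoding(encoding):
    """Made-up top-level switch, standing in for enable_ptl(encoding_out=...)."""
    global _outbound_encoding
    _outbound_encoding = encoding


def get_outbound_encoding():
    return _outbound_encoding


def sniff_source_encoding(lines, default=None):
    """Return the encoding named in the first two lines, else the default."""
    for line in lines[:2]:
        match = _CODING_RE.match(line)
        if match:
            return match.group(1)
    return default


class StubResponse:
    """Stand-in for request.response; not Quixote's real response class."""

    def __init__(self):
        self.content_type = "text/html"

    def set_content_type(self, encoding):
        self.content_type = "text/html; charset=%s" % encoding


def encode_output(output, response):
    """Encode text output if an encoding is set; otherwise pass it through."""
    out_enc = get_outbound_encoding()
    if isinstance(output, str) and out_enc is not None:
        response.set_content_type(encoding=out_enc)
        return output.encode(out_enc)
    return output  # nothing configured: behave as before


# Example use: configure once at the top level, encode at publish time.
set_outbound_encoding("utf-8")
response = StubResponse()
body = encode_output("caf\u00e9", response)
```

The point of the last branch is the legacy fallback: if nobody sets an
outbound encoding, the output is returned untouched and existing
applications should not notice any difference.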