If you assume UTF-8 anyway, there is a simpler workaround: encode the
Unicode string returned from gettext right away, before it reaches
htmltext. Here is a wrapper that makes that easy.
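
import gettext

class EncodingTranslationsProxy(gettext.NullTranslations):
    """Wrap a translations object; gettext() returns encoded byte strings."""

    def __init__(self, trans, encoding='utf-8'):
        self.trans = trans
        self.__encoding = encoding

    def __getattr__(self, name):
        # Called only when normal lookup fails; fall back to the wrapped object.
        return getattr(self.trans, name)

    def gettext(self, message):
        # Translate to Unicode, then encode immediately.
        return self.trans.ugettext(message).encode(self.__encoding)

def translation(domain, locale, language):
    # 'locale' is the locale directory handed to gettext.translation().
    return EncodingTranslationsProxy(gettext.translation(domain,
                                                         locale,
                                                         languages=[language]))
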
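
For anyone trying the same idea on Python 3, where gettext() already
returns str and ugettext() no longer exists, here is a rough sketch of
the proxy. DictTranslations and EncodingProxy are names I made up for
illustration; the in-memory catalog only stands in for a real .mo file
so the example runs on its own.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog, standing in for a real .mo file."""

    def __init__(self, catalog):
        super().__init__()
        self._catalog = catalog

    def gettext(self, message):
        # Fall back to the untranslated message when there is no entry.
        return self._catalog.get(message, message)


class EncodingProxy:
    """Hypothetical Python 3 analogue of the wrapper: encode every result."""

    def __init__(self, trans, encoding='utf-8'):
        self._trans = trans
        self._encoding = encoding

    def __getattr__(self, name):
        # Only reached for attributes the proxy does not define itself.
        return getattr(self._trans, name)

    def gettext(self, message):
        return self._trans.gettext(message).encode(self._encoding)


french = DictTranslations({'Next Answer': 'Prochaine R\xe9ponse'})
proxy = EncodingProxy(french)
encoded = proxy.gettext('Next Answer')   # UTF-8 encoded bytes
```

Everything except gettext() is passed through to the wrapped object,
so the proxy can be dropped in wherever the original translations
object was used.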
Regards,
Maas
> For the first time, I have a Quixote application that requires
> internationalization/localization. I wanted to share some notes about
> my experiences with Unicode and PTL, and the patches I'm using to make
> it work.
>
> In the application code, the interface is expressed in English. Our
> first translation is from English to French, but I need to allow for
> possible translations into other, non-European languages.
>
> (As an aside: I've also written a simple gettext() substitute to
> facilitate my translations. Its translation function returns Unicode
> instances. For example:
>
> >>> import xlate
> >>> xlate.set_locale('fr')
> >>> u = _('Next Answer')
> >>> u
> u'Prochaine R\xe9ponse'
>
> I use the translation function throughout my app, primarily in PTL
> modules. The key point is that it returns Unicode instances.)
>
> I don't want to juggle multiple encodings, so I'm settling on UTF-8 as
> a de-facto standard encoding for my application.
>
>
> The problem
> ===========
>
> Perhaps unsurprisingly, the 'htmltext' implementations have not played
> very nicely with Unicode strings. Examples of expressions that will
> raise exceptions:
>
> from quixote.html import htmltext
> phrase = u'Prochaine r\xe9ponse'
> htmltext(phrase) # will fail
> htmltext('%s') % phrase # will fail
>
> Both expressions will fail with "UnicodeEncodeError: 'ascii' codec
> can't encode character u'\xe9' in position 11: ordinal not in
> range(128)". The failure occurs in both the C and Python
> implementations of htmltext.
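
The root cause is the ASCII encode that happens when the Unicode
string is turned into a byte string. A small sketch of that step in
isolation, written for Python 3 (where the encode has to be spelled
out explicitly) and without any Quixote code:

```python
phrase = 'Prochaine r\xe9ponse'

try:
    phrase.encode('ascii')
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character ASCII cannot represent.
    failed_at = exc.start

# Encoding to UTF-8 instead always succeeds for any Unicode string.
utf8_bytes = phrase.encode('utf-8')
```
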
>
> Needless to say, this is a show-stopper of a problem. I needed to get
> Unicode expressions into my PTL functions in a sensible,
> straightforward manner.
>
>
> My workaround
> =============
>
> I've managed to work around the problem, though not very
> elegantly. What I did:
>
> - assume UTF-8 encoding as a de facto standard. All request handlers
> that return text must ensure that the response Content-Type is set to
> "text/*; charset=UTF-8" where "*" is the appropriate subtype
> (e.g. text/html).
>
> - patch quixote/html.py to ensure that the C implementation of
> htmltext is not used. (This will be necessary until I get around to
> fixing the C implementation.) Raising an ImportError at the right spot
> does the job.
>
> - patch quixote/_py_htmltext.py so that the classes 'htmltext' and
> '_QuoteWrapper' both anticipate Unicode input, and handle it by
> encoding it into UTF-8 text. (_QuoteWrapper was already Unicode-aware,
> but it chose to encode into Latin-1, which seems a short-sighted
> if expedient solution.)
>
> The two patches appear below.
>
> These three changes allow Unicode strings to work as expected within
> PTL functions (and other places where htmltext() is used). To be
> fair, my tests have been rather limited (only my English-to-French
> translations), but they should be representative.
>
>
> Problems with my approach
> =========================
>
> - although assuming UTF-8 is a reasonable standard (since all Unicode
> characters can be expressed in UTF-8), it's not optimal for all
> possible uses. Someone would likely prefer or require a different
> encoding. Nonetheless, assuming UTF-8 is better than assuming
> ASCII.
>
> - I need to explicitly set the encoding on each textual response. Not
> too big a deal, since most responses call some kind of header()
> function, where the content-type can be set. But in cases where
> non-UTF-8 text is to be returned, the behaviour I've introduced may be
> surprising. (For example, returning a text file encoded in Latin-1
> will result in some garbage characters if the UTF-8 content-type
> header isn't overridden.) But at least it's not magical: with an
> underlying assumption of UTF-8, such problems become obvious and
> can be fixed easily.
>
> - other problems? Nothing I've encountered yet, but I'm sure it can't
> be this simple.
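
The Latin-1 case from the second point can be illustrated without a
web server. This is only a sketch in Python 3: the decode call stands
in for a client that trusts a "charset=UTF-8" header.

```python
text = 'r\xe9ponse'

# A file stored as Latin-1 but served with a UTF-8 Content-Type.
latin1_bytes = text.encode('iso-8859-1')

# The client decodes as UTF-8; the lone 0xE9 byte is not valid UTF-8,
# so it shows up as a replacement character.
shown = latin1_bytes.decode('utf-8', errors='replace')

# When the declared and actual encodings agree, the text round-trips.
ok = text.encode('utf-8').decode('utf-8')
```
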
>
>
> * * *
>
> I would be interested in feedback from my fellow Quixote users! Have
> you encountered the same problem, and solved it in a better fashion?
> Ideally, I would like to see Quixote "fixed" so that it handles
> Unicode more gracefully, but I'm too focussed on my current app to
> know what would qualify as an appropriate general solution.
>
> -- Graham
>
>
>
> The patches
> ===========
>
>
> --- F:\quixote\html.py Wed Dec 03 18:30:30 2003
> +++ html.py Sat Jul 31 02:04:45 2004
> @@ -59,8 +59,9 @@
>  import urllib
>  from types import UnicodeType
>  try:
> +    raise ImportError
>      # faster C implementation
>      from quixote._c_htmltext import htmltext, htmlescape, _escape_string, \
>           TemplateIO
>  except ImportError:
>
>
>
> --- F:\quixote\_py_htmltext.py Wed Dec 03 18:30:30 2003
> +++ _py_htmltext.py Sat Jul 31 02:08:21 2004
> @@ -42,12 +42,14 @@
>      using entities.
>      """
>
>      __slots__ = ['s']
>
>      def __init__(self, s):
> +        if isinstance(s, unicode):
> +            s = s.encode('utf-8')
>          self.s = str(s)
>
>      # XXX make read-only
>      #def __setattr__(self, name, value):
>      #    raise AttributeError, 'immutable object'
>
> @@ -158,12 +160,14 @@
>      # helper for htmltext class __mod__
>
>      __slots__ = ['value', 'escape']
>
>      def __init__(self, value, escape):
>          self.value = value
> +        if isinstance(self.value, unicode):
> +            self.value = self.value.encode('utf-8')
>          self.escape = escape
>
>      def __str__(self):
>          return self.escape(str(self.value))
>
>      def __repr__(self):
> @@ -187,13 +191,13 @@
>      already a 'htmltext' object then the HTML markup characters \", <, >,
>      and & are first escaped.
>      """
>      if classof(s) is htmltext:
>          return s
>      elif isinstance(s, UnicodeType):
> -        s = s.encode('iso-8859-1')
> +        s = s.encode('utf-8')
>      else:
>          s = str(s)
>      # inline _escape_string for speed
>      s = s.replace("&", "&amp;") # must be done first
>      s = s.replace("<", "&lt;")
>      s = s.replace(">", "&gt;")