On Fri, Jul 11, 2003 at 01:09:35PM +0400, Oleg Broytmann wrote:
> On Fri, Jul 11, 2003 at 10:32:25AM +0200, Bud P. Bruegger wrote:
> > But the functions you pointed me to don't affect accented characters,
> > umlauts, etc.
>
> These functions are for *html* quoting. If you need *URL* quoting you
> need urllib.quote, urllib.quote_plus or quixote.html.url_quote.

I think the OP is looking for something that will turn 'é' into
'&eacute;' or '&#233;', for instance, for use where UTF-8 isn't
supported (and I'd suggest trying UTF-8 first!). Otherwise, something like:
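
    def escape_to_entities(string):
        # Escape '&' first so the entities added below aren't escaped again.
        string = string.replace('&', '&amp;')
        string = string.replace('<', '&lt;')
        string = string.replace('>', '&gt;')
        string = string.replace('"', '&quot;')
        result = []
        for s in string:
            # Anything outside ASCII becomes a numeric character reference.
            if ord(s) > 0x7f:
                s = '&#%d;'%ord(s)
            result.append(s)
        return ''.join(result)
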
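For the numeric case, the character loop can probably be replaced by the
'xmlcharrefreplace' error handler that the codecs gained in Python 2.3
(PEP 293). A minimal sketch, written in Python 3 syntax; the function name
escape_to_charrefs is made up for illustration:

```python
def escape_to_charrefs(text):
    """Escape HTML-special characters, then replace non-ASCII characters
    with numeric character references (&#NNN;)."""
    # Escape '&' first so the references produced below stay intact.
    text = text.replace('&', '&amp;')
    text = text.replace('<', '&lt;')
    text = text.replace('>', '&gt;')
    text = text.replace('"', '&quot;')
    # The ASCII codec cannot encode anything above 0x7f; the built-in
    # 'xmlcharrefreplace' handler substitutes '&#NNN;' for each such character.
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')
```

This lets the codec machinery do the per-character work instead of a Python
loop, so it should also be faster on long strings.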
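Alternatively, with Python 2.3, if you want to map to named entities:

    import htmlentitydefs

    # Map each code point that has a named entity to its '&name;' form.
    codepoint2entity = {}
    for c in htmlentitydefs.codepoint2name:
        codepoint2entity[c] = '&%s;'%unicode(htmlentitydefs.codepoint2name[c])

    def escape_to_entities(string):
        # translate() also covers '&', '<', '>' and '"', which have names.
        ustr = unicode(string).translate(codepoint2entity)
        result = []
        for s in ustr:
            # Characters with no named entity fall back to a numeric reference.
            if ord(s) > 0x7f:
                s = '&#%d;'%ord(s)
            result.append(s)
        return ''.join(result)
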
(This should also work in 2.2 if you use 2.3's htmlentitydefs)
'course, this would be a bit slow. Hmm, maybe there should be a codec so
you could do u'\u00e9'.encode('html-text').
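
Short of a full codec, one way to get close to that is an encoding error
handler registered with codecs.register_error. This is only a sketch,
assuming Python 3 (where htmlentitydefs lives on as html.entities); the
handler name 'html-entities' and the function names are invented for the
example and are not part of any library:

```python
import codecs
from html.entities import codepoint2name

def html_entity_errors(exc):
    """Error handler: replace unencodable characters with HTML entities."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        if name is not None:
            pieces.append('&%s;' % name)       # named entity, e.g. &eacute;
        else:
            pieces.append('&#%d;' % ord(ch))   # numeric fallback
    # Return the replacement text and the position at which to resume encoding.
    return ''.join(pieces), exc.end

# 'html-entities' is a hypothetical handler name chosen for this sketch.
codecs.register_error('html-entities', html_entity_errors)

def escape_to_named_entities(text):
    """Escape HTML-special characters, then entity-encode non-ASCII text."""
    text = (text.replace('&', '&amp;').replace('<', '&lt;')
                .replace('>', '&gt;').replace('"', '&quot;'))
    return text.encode('ascii', 'html-entities').decode('ascii')
```

With that registered, '\u00e9'.encode('ascii', 'html-entities') plays the
role of the .encode('html-text') call, and the handler only runs for the
characters the ASCII codec rejects.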
--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm@physics.mcmaster.ca