durusmail: quixote-users: html encode??? (special characters to entities)
html encode??? (special characters to entities)
David M. Cooke
2003-07-11
On Fri, Jul 11, 2003 at 01:09:35PM +0400, Oleg Broytmann wrote:
> On Fri, Jul 11, 2003 at 10:32:25AM +0200, Bud P. Bruegger wrote:
> > But the functions you pointed me to don't affect accented characters,
> > umlauts, etc.
>
>    These functions are for *html* quoting. If you need *URL* quoting you
> need urllib.quote, urllib.quote_plus or quixote.html.url_quote.

I think the OP is looking for something that will turn 'é' into
'&eacute;' or '&#233;', for instance, for use where UTF-8 isn't
supported (and I'd suggest trying UTF-8 first!). Otherwise, something like:

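def escape_to_entities(string):
    # Expects a unicode string (or Latin-1 bytes): ord() is taken per item.
    # Escape '&' first so the entities added below are not escaped again.
    string = string.replace('&', '&amp;')
    string = string.replace('<', '&lt;')
    string = string.replace('>', '&gt;')
    string = string.replace('"', '&quot;')
    result = []
    for s in string:
        # Anything outside ASCII becomes a decimal character reference.
        if ord(s) > 0x7f:
            s = '&#%d;'%ord(s)
        result.append(s)
    return ''.join(result)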
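
For comparison, here is a minimal sketch of the same numeric-entity escaping on Python 3. It assumes the standard library's html.escape and the built-in 'xmlcharrefreplace' error handler are acceptable stand-ins for the hand-written loop; the function name escape_to_numeric_entities is made up for illustration.

```python
import html


def escape_to_numeric_entities(text):
    # Escape &, <, > and " (quote=True also turns ' into &#x27;).
    escaped = html.escape(text, quote=True)
    # Characters outside ASCII become decimal references such as &#233;.
    return escaped.encode('ascii', 'xmlcharrefreplace').decode('ascii')


print(escape_to_numeric_entities('caf\u00e9 <b> & "x"'))
# prints: caf&#233; &lt;b&gt; &amp; &quot;x&quot;
```

The result is pure ASCII, so it can be served under any ASCII-compatible charset.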

Alternatively, with Python 2.3, if you want to map to named entities:

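import htmlentitydefs

# Build a translate() table: code point -> u'&name;' for each named entity.
# The table includes '&', '<', '>' and '"', so those get escaped as well.
codepoint2entity = {}
for c in htmlentitydefs.codepoint2name:
    codepoint2entity[c] = '&%s;'%unicode(htmlentitydefs.codepoint2name[c])

def escape_to_entities(string):
    # Pass a unicode object: unicode() on a non-ASCII byte string raises
    # UnicodeDecodeError under the default ASCII encoding.
    ustr = unicode(string).translate(codepoint2entity)
    result = []
    for s in ustr:
        # Characters without a named entity fall back to numeric references.
        if ord(s) > 0x7f:
            s = '&#%d;'%ord(s)
        result.append(s)
    return ''.join(result)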

(This should also work in 2.2 if you use 2.3's htmlentitydefs)
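
A rough Python 3 equivalent of the named-entity version, as a sketch only. It assumes the table now lives in html.entities (the renamed htmlentitydefs module); CODEPOINT_TO_ENTITY and escape_to_named_entities are illustrative names, not part of any library.

```python
from html.entities import codepoint2name

# Map each code point that has a named entity to '&name;'.
# The table covers '&', '<', '>' and '"' as well as accented letters.
CODEPOINT_TO_ENTITY = {cp: '&%s;' % name for cp, name in codepoint2name.items()}


def escape_to_named_entities(text):
    # str.translate replaces every mapped character in a single pass.
    named = text.translate(CODEPOINT_TO_ENTITY)
    # Anything still outside ASCII has no named entity: use a numeric reference.
    return named.encode('ascii', 'xmlcharrefreplace').decode('ascii')


print(escape_to_named_entities('\u00e9 < \u0142'))
# prints: &eacute; &lt; &#322;
```

Because the entity names are inserted after translation has already consumed the original '&' characters, nothing gets escaped twice.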

'course, this would be a bit slow. Hmm, maybe there should be a codec so
you could do u'\u00e9'.encode('html-text').
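
The codec idea can be approximated in Python 3 without writing a full codec. The sketch below is not the 'html-text' codec suggested above; it swaps in a related technique, a custom encoding error handler registered with codecs.register_error. The handler name 'htmlentityreplace' and the function html_entity_replace are made up for illustration.

```python
import codecs
from html.entities import codepoint2name


def html_entity_replace(error):
    # Only encoding errors carry the unencodable slice we need.
    if not isinstance(error, UnicodeEncodeError):
        raise error
    pieces = []
    for ch in error.object[error.start:error.end]:
        name = codepoint2name.get(ord(ch))
        # Prefer a named entity; otherwise emit a decimal reference.
        pieces.append('&%s;' % name if name else '&#%d;' % ord(ch))
    # Return the replacement text and the position to resume encoding from.
    return ''.join(pieces), error.end


# 'htmlentityreplace' is a hypothetical handler name chosen for this sketch.
codecs.register_error('htmlentityreplace', html_entity_replace)

print('\u00e9\u0142'.encode('ascii', 'htmlentityreplace'))
# prints: b'&eacute;&#322;'
```

Note that the handler only fires for characters the target encoding cannot represent, so '&', '<', '>' and '"' pass through untouched when encoding to ASCII and still need separate escaping.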

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm@physics.mcmaster.ca
