durusmail: qp: QPY h8 and unicode.replace
QPY h8 and unicode.replace
Mario Ruggier
2008-05-06
On May 4, 2008, at 2:02 PM, Binger David wrote:

>> In any case, what would seem to me to be natural and consistent
>> behaviour is to just quote the match and/or replacement strings, if
>> they are themselves not yet an h8 instance.
>
> The problem is that replace (like strip and slice)  can
> break entities.
>
> w.replace(';', '.').

Ah, that is a concern I was not even thinking about. It would indeed be
worrisome to have things like that done to your XML. On the other hand,
qpy constrains itself strictly to *generic* string manipulations, not
XML string manipulation. I.e. XML semantics, beyond the escape
characters, are by choice not addressed in any way. To do proper
XML-semantics-respecting string replacement, proper XML parsing would
be needed...

Plus, thinking more about the case of anyone actually doing the crazy
substitution above: whether you ensure that the match ";" and
replacement "." strings are first cast to h8, or whether you "downcast"
the ref string to unicode (and do the operation in unicode), I think
the result will always be the same... so the concern is not addressed
either way.
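
A minimal sketch of that point in present-day Python. Note the
assumptions: html.escape from the standard library stands in for h8
quoting, and the sample string is hypothetical; this is not qpy's own
API.

```python
from html import escape

# An already-quoted fragment: the "<" has become the entity "&lt;".
quoted = escape("a < b")

# ";" and "." contain no special characters, so quoting them is a no-op.
match, replacement = ";", "."
same_after_quoting = (escape(match) == match
                      and escape(replacement) == replacement)

# Replacing with the raw strings or with their quoted forms therefore
# gives the same result, and the entity is broken in both cases.
raw_result = quoted.replace(match, replacement)
quoted_result = quoted.replace(escape(match), escape(replacement))
```

Either way the result is "a &lt. b", so quoting the arguments does not
protect against this particular substitution.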

But, if the match or the replacement string was a special char, then
it would make a difference (but in the inverse sense!):

$ python
Python 2.5.1 (r251:54863, Feb  4 2008, 21:48:13)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
 >>> from qpy import h8
 >>> h = h8("")
 >>> h.replace(">", "<")
u'
 >>> h.replace(h8("")+">", h8("")+"<") # ensure match/replacements are safely escaped
u''
 >>>

So, if I understand correctly, the concern you raise is better
addressed when the match and replacement strings are ensured to be
safely quoted.
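
The special-char case can be sketched the same way. Again the
assumptions: html.escape stands in for h8 quoting, and the markup
string is a made-up example, not the one from the session above.

```python
from html import escape

# Hypothetical already-quoted markup: real tags plus an escaped ">".
markup = "<b>x &gt; y</b>"

# Raw match/replacement strings hit the tag delimiters themselves.
raw_result = markup.replace(">", "<")

# Quoted match/replacement ("&gt;" and "&lt;") only touch the escaped
# character data and leave the tags alone.
quoted_result = markup.replace(escape(">"), escape("<"))
```

Here raw_result is "<b<x &gt; y</b<" (broken tags), while quoted_result
is "<b>x &lt; y</b>" (tags intact, only the text changed).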

mario
