durusmail: quixote-users: urllib.quote() and cgi.escape()
urllib.quote() and cgi.escape()
Patrik Simons
2006-01-26
On Thu, 26 Jan 2006 13:16:02 +0100 mario ruggier wrote:

>
> On Jan 24, 2006, at 8:21 PM, Titus Brown wrote:
>
> > So, should htmlescape deal with this differently?
> >
> > right now it does this:
> >
> > >>> print str(htmlescape("'"))
> > '
> > >>> print str(htmlescape('"'))
> > &quot;
>
> If you were trying to use these characters in a URI value (for their
> normal meaning in that context!) then my understanding is that you have
> to use their HTML char entities: &amp; &lt; &gt; &quot;. This way, the
> HTML document can be valid.
>
> If however you are trying to use them as a string literal value in a
> URI context, then you should use the %xx mechanism.

And here quixote.html.url_quote does it wrong, imho. If you set
quixote.DEFAULT_CHARSET to 'utf-8' and then url_quote a unicode string,
url_quote should first encode the string as utf-8 and then quote it.

It doesn't, and quixote breaks with a UnicodeDecodeError on URLs like
this one: u'/component?test=\xc4'

Compare:
>>> url_quote(u'\xc4')
'%C4'
>>> url_quote(u'\xc4'.encode('utf-8'))
'%C3%84'

The problem happens in the functions quixote.http_request.parse_query
and _decode_string: '%C4' is unquoted to the byte string '\xc4', which
is not valid utf-8, so '\xc4'.decode('utf-8') -> UnicodeDecodeError.
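
A minimal sketch of the encode-then-quote order argued for here. It is
written against Python 3's urllib.parse, not the Python 2 / Quixote code
discussed in this thread. url_quote_utf8 and decode_param are hypothetical
helper names, not Quixote functions, and the latin-1 call only imitates
the '%C4' result shown above.

```python
from urllib.parse import quote, unquote_to_bytes


def url_quote_utf8(text):
    """Hypothetical helper: encode to UTF-8 first, then percent-quote."""
    return quote(text.encode("utf-8"))


def decode_param(quoted):
    """Hypothetical stand-in for a server that decodes query values as UTF-8."""
    return unquote_to_bytes(quoted).decode("utf-8")


# Quoting the raw code point as a single byte (imitated here via latin-1)
# gives '%C4', which is not a valid UTF-8 sequence once unquoted.
latin1_quoted = quote("\xc4", encoding="latin-1")

# Encoding as UTF-8 before quoting gives '%C3%84'.
utf8_quoted = url_quote_utf8("\xc4")

# The UTF-8 form round-trips through the decoder.
round_tripped = decode_param(utf8_quoted)

# The single-byte form fails in the decoder, as described above.
try:
    decode_param(latin1_quoted)
    single_byte_failed = False
except UnicodeDecodeError:
    single_byte_failed = True
```

Whether url_quote should consult DEFAULT_CHARSET or always use utf-8 is a
separate design question; the sketch only shows why the quoting side and
the decoding side have to agree on the charset.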

>
> (I was however unable to easily find a clear and convenient statement
> of the above in RFC 2396).
>
> Now, in your original question, you were actually trying to use such
> characters in the literal value attribute of an input element... as
> this value can become a part of the URL for the page (e.g. in the
> querystring) then it should follow that it should be escaped with
> urllib.quote(), i.e. the %xx mechanism.
>
> So, similar to your original example:
> ''
> %("""contains'different"quotes&stuff""")
>
> and assume some other input field:
> ''
>
> if we submit the form (or specify the fields in the querystring for the
> page) we should end up with a querystring such as:
> ?one=contains%27different%22quotes%26stuff&amp;two=normal
>
> Note that the &amp; (as delimiter!) is html escaped as it should be,
> but the & as literal value (%26) is url escaped (as it should be?).
>
> But, re your actual question above, I was under the impression that the
> "'" character should also be escaped with &apos; ... but, I see that
> this char entity is not even listed in
> . So, maybe not.
>
> mario
>
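
The two escaping layers mario describes above can be sketched as follows.
This uses Python 3's urllib.parse.quote and html.escape rather than the
Python 2 urllib.quote() / cgi.escape() from the subject line, and the
field names "one" and "two" are taken from his example. The variable
names are only illustrative.

```python
from html import escape
from urllib.parse import quote

value = """contains'different"quotes&stuff"""

# Layer 1: percent-quote the literal value. safe="" makes quote() encode
# every reserved character, including the literal "&".
query = "?one=" + quote(value, safe="") + "&two=normal"

# Layer 2: HTML-escape the finished URL before placing it in an attribute,
# so the "&" that delimits the two fields becomes "&amp;".
href = escape(query, quote=True)

print(query)
print(href)
```

The literal "&" in the value ends up as %26 and is untouched by the
second layer, while the delimiter "&" is only affected by the second.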


--
Patrik