[Quixote-users] "htmltext" commited to Quixote CVS

Neil Schemenauer
2002-10-18
Just in case someone is interested in playing with this stuff, I'm going
to give a quick overview of how it works.  Forgive me if I start
rambling.  It's late Friday afternoon here. :-)

The CVS version of the PTL compiler supports a new syntax for declaring
template functions in PTL modules.  You can use:

    def something [text] (...):

or

    def something [html] (...):

The first example creates a template equivalent to one declared with the
old syntax:

    template something (...):

The old syntax is still supported.  The interesting stuff happens when
you use [html] templates.

There is a new type implemented in quixote.html named 'htmltext'.  It is
a string-like object that is used for designating HTML markup.  There is
also a new function named 'htmlescape'.  It does the same thing as the
old 'html_quote' except that the return value is always of type
'htmltext' and that 'htmltext' arguments are returned unchanged.

Think about that last sentence for a while as it is the foundation of
the htmltext system.  Calling 'html_quote' twice on some data ends up
doing too much quoting.  People writing library functions are faced with
a dilemma.  If you call html_quote on data passed in then HTML markup
cannot be used in the data.  If you don't call html_quote then the
caller has the responsibility of calling html_quote on unsafe data and
could easily forget.
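
To make the double-quoting problem concrete, here is a small sketch.  It
does not use Quixote: the html_quote() below is a hypothetical stand-in
built on the standard library's html.escape(), assumed only to behave
like "always escape, no matter what you are given".

```python
import html


def html_quote(value):
    # Hypothetical stand-in for the old html_quote(): it escapes
    # unconditionally because it cannot tell whether the value was
    # already escaped by an earlier call.
    return html.escape(str(value))


data = "a < b & c"
once = html_quote(data)    # what a careful caller produces
twice = html_quote(once)   # what happens if a library function quotes again

print(once)
print(twice)
```

The second call escapes the ampersands that the first call introduced,
which is the "too much quoting" described above.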

With 'htmltext' and 'htmlescape' you don't need to worry about double
quoting.  If you are unsure about where some data came from, calling
htmlescape() on it doesn't hurt.  The cost of using this system is that
any bit of HTML markup must be in an 'htmltext' object.  Instead of a
system where everything is assumed to be safe and unsafe data must be
sanitized using 'html_quote', the system assumes everything is unsafe
unless wrapped in an 'htmltext' instance.
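
The idea can be sketched in a few lines of plain Python.  This is a toy
model, not the code in quixote.html: the class and function below only
borrow the names 'htmltext' and 'htmlescape', and html.escape() from the
standard library is assumed as the quoting routine.

```python
import html


class htmltext:
    """Toy marker type: wraps a string that is already safe HTML markup."""

    def __init__(self, s):
        self.s = str(s)

    def __str__(self):
        return self.s


def htmlescape(value):
    # htmltext arguments are returned unchanged; anything else is
    # escaped and the result is marked as safe.
    if isinstance(value, htmltext):
        return value
    return htmltext(html.escape(str(value)))


escaped = htmlescape("a < b")        # unsafe data gets quoted once
again = htmlescape(escaped)          # a second call is a no-op
markup = htmlescape(htmltext("<b>bold</b>"))  # markup passes through

print(str(escaped), str(again), str(markup))
```

Because the return value carries its own "already safe" marker, a library
function can call htmlescape() on whatever it receives without knowing
where the data came from.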

So, using 'htmltext' becomes the potential security hole.  To minimize
the risk that something unsafe gets marked as safe, it is best to use the
htmltext() constructor only on literal strings.  For example:

    a = htmltext('Hello world')

is much better than

    htmltext(a)

Since this is such a common thing, PTL automatically converts all
literal strings to the htmltext type if the template is declared as
[html].  For convenience, the 'htmltext' type is inserted into the
global namespace of PTL modules.  The new PTL compiler should be
completely backwards compatible and should work with Python 2.0 and 2.1
(although 2.2 is recommended).
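
As a rough illustration of why wrapping literals is safe and wrapping
variables is not, here is a hand-written approximation of what an [html]
template might amount to.  It is only a sketch: the toy htmltext class,
the greeting() function and the assumption that run-time values are
passed through htmlescape() are mine, not the output of the PTL compiler.

```python
import html


class htmltext:
    """Toy marker type: wraps a string that is already safe HTML markup."""

    def __init__(self, s):
        self.s = str(s)

    def __str__(self):
        return self.s


def htmlescape(value):
    # Return htmltext unchanged, escape everything else.
    if isinstance(value, htmltext):
        return value
    return htmltext(html.escape(str(value)))


def greeting(name):
    # Hypothetical plain-Python equivalent of "def greeting [html] (name):".
    parts = [
        htmltext("<p>Hello, "),  # literal from the template source: trusted
        htmlescape(name),        # run-time data: escaped unless already htmltext
        htmltext("</p>"),
    ]
    return htmltext("".join(str(p) for p in parts))


hostile = "<script>alert(1)</script>"
page = greeting(hostile)                      # hostile input is neutralized
trusted = greeting(htmltext("<b>Neil</b>"))   # deliberate markup survives
hole = htmltext(hostile)                      # the risky spelling: htmltext(a)

print(str(page))
print(str(trusted))
print(str(hole))
```

The last line shows the hole: calling htmltext() on a variable marks
whatever it holds as safe, so the script tag comes through untouched.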

In order to maintain backwards compatibility, most of the functions in
quixote.html grew 'htmltext' returning counterparts.  For html_quote()
there is htmlescape(), render_tag() has htmltag() and link() has href().
I didn't like creating all the new functions but it seemed to be the
only way to avoid breaking existing applications.  Hopefully the new
names are as good as or better than the old ones.

The forms library has been changed to generate htmltext.  That change is
not backwards compatible.  The demo applications still need to be fixed.

A few issues that might bite when converting an application to htmltext:

    * sometimes you need a real string inside an [html] template.
      Adding a str() call around the string literal does the trick.
      We have found that the code can often be improved by refactoring.
      For example, we had lots of UI code that used DateTime.strftime().
      Rather than add str() calls around all of the format
      specifications, we added helper functions to format dates in
      site-standard ways (e.g. format_date(), format_date_time()).

    * the join() method on strings only accepts sequences that contain
      strings.  Usually the fix is to add an explicit htmltext call
      around the joiner (e.g. htmltext('').join(...)).

    * there is a bug in the Python interpreter that causes in-place add
      (+=) to fail if the LHS is a 'str' object.  The fix is to spell out
      the operation the old long way (e.g. msg = msg + htmltext('...')).
      This bug should be fixed in Python 2.3.

    * extra str() calls that used to be harmless now cause things to
      be double quoted (think about htmlescape(str(htmltext('<br>')))).
      Basically, str() strips the "safe" flag attached to htmltext objects
      and causes the data to be quoted again.  Note that str() can also be
      written as: '%s' % something  (assuming the LHS literal is not
      in an [html] template)
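
Some of these pitfalls can be reproduced with a toy model.  The sketch
below is not the quixote.html implementation: the htmltext class is a
simplified stand-in, and the assumption that join() and + escape plain
strings before combining them is mine.

```python
import html


class htmltext:
    """Toy marker type: wraps a string that is already safe HTML markup."""

    def __init__(self, s):
        self.s = str(s)

    def __str__(self):
        return self.s

    def __add__(self, other):
        return htmltext(self.s + str(htmlescape(other)))

    def __radd__(self, other):
        # Called for "plain str + htmltext"; the plain side gets escaped.
        return htmltext(str(htmlescape(other)) + self.s)

    def join(self, items):
        return htmltext(self.s.join(str(htmlescape(i)) for i in items))


def htmlescape(value):
    # Return htmltext unchanged, escape everything else.
    if isinstance(value, htmltext):
        return value
    return htmltext(html.escape(str(value)))


items = [htmltext("<li>one</li>"), "two & three"]

# A plain string joiner rejects items that are not real strings.
try:
    "".join(items)
    plain_join_ok = True
except TypeError:
    plain_join_ok = False

# Wrapping the joiner in htmltext makes the join work.
joined = htmltext("").join(items)

# Addition spelled out the long way, with a plain string on the left.
msg = "Note: "
msg = msg + htmltext("<b>done</b>")

# str() drops the "safe" marker, so the markup is quoted on the next pass.
markup = htmltext("<b>hi</b>")
kept = htmlescape(markup)
requoted = htmlescape(str(markup))

print(plain_join_ok, str(joined), str(msg), str(kept), str(requoted))
```

In this model the stray str() call is the only difference between 'kept'
and 'requoted', which is why extra str() calls are worth hunting down
when converting an application.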

I still need to do some benchmarking to see how much the htmltext
business is costing in terms of performance.  The good news is that if
you don't use it the PTL compiler generates basically the same code as
it always did.

Well, I've rambled on enough.  I hope some people are interested enough
to do a CVS checkout and give the new htmltext type a spin and send
feedback.

  Neil