durusmail: quixote-users: Request for comments: changing the cache control headers
2009-12-17
Neil Schemenauer
2009-12-16
Greetings Quixote and QP users,

[tl;dr: see the patch below]

Recently I ran into trouble with a certain client and the problem
appears to be caused by unexpected caching of dynamic resources. I
haven't been able to conclusively determine the problem yet, but
while looking into it, I realized that Quixote's use of the Expires
header is not sufficient to prevent unwanted caching (QP inherited
the same code). Has anyone else seen caching problems "in the wild"
where "Expires: -1" is not sufficient?

Specifically, RFC 2616 (HTTP 1.1) recognizes the Expires header but
allows clients and proxies to ignore it (i.e. use a stale copy) in
certain cases (sections 13.1.5 and 14.9.4). You must set the
must-revalidate directive of the Cache-Control header to prevent
this. On the one hand, this seems like a kind of "do it harder"
stupidity but I can imagine use cases. For example, if CNN's news
page expired in 10 minutes but your connection was horribly slow
then allowing stale information for say 1 hour would be reasonable.
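
To make the rule concrete, here is a minimal sketch of the check a cache
would have to make before reusing a stale copy. It models only the
must-revalidate rule from section 14.9.4, not the rest of the caching
algorithm, and the function names are made up for this example. The
parsing is deliberately naive (it does not handle commas inside quoted
directive values).

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into {directive: argument}.

    Naive parser for illustration: splits on commas and does not
    handle commas inside quoted strings.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip() or None
    return directives


def forbids_stale_reuse(cache_control_value):
    """Return True if the response carries must-revalidate, i.e. a cache
    must not serve it once stale without revalidating first."""
    return "must-revalidate" in parse_cache_control(cache_control_value)


print(forbids_stale_reuse("max-age=0, must-revalidate"))
print(forbids_stale_reuse("max-age=600"))
```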

In any case, there is no point complaining about the design;
servers must work with the web as it is, not as we would like. The
simplest thing to do would be to set both "Expires: -1" and
"Cache-Control: no-cache" for dynamically generated pages. Modern
browsers should do what we expect.
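
As a rough illustration of that simplest option (a sketch only, not
Quixote code; the function name is hypothetical):

```python
def no_cache_headers():
    """Return (name, value) header pairs for a dynamically generated
    page that clients should not reuse without asking the server."""
    return [
        # An invalid date such as "-1" is treated as already expired.
        ("Expires", "-1"),
        # HTTP 1.1 directive: revalidate with the server before reuse.
        ("Cache-Control", "no-cache"),
    ]


for name, value in no_cache_headers():
    print("%s: %s" % (name, value))
```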

Unfortunately, it looks like older browsers conflated the cache with
the history mechanism (e.g. the back button). IMHO, the back button
should always work. RFC 2616 is quite clear, section 13.13:


    [...]History mechanisms and caches are different. In particular
    history mechanisms SHOULD NOT try to show a semantically transparent
    view of the current state of a resource. Rather, a history mechanism
    is meant to show exactly what the user saw at the time when the
    resource was retrieved.

    By default, an expiration time does not apply to history mechanisms.
    If the entity is still in storage, a history mechanism SHOULD
    display it even if the entity has expired, unless the user has
    specifically configured the agent to refresh expired history
    documents.[...]

Unfortunately older versions of IE do the wrong thing and prevent you
from using the back button if no-cache is set. It looks like IE >= 5
fixed that problem:

    http://support.microsoft.com/kb/199805

Since IE 4 is positively prehistoric, maybe we don't care and
should just use no-cache.

My proposed change is to keep the Expires header and add a
Cache-Control header with a max-age and a must-revalidate
directive if max-age is 0. I think this should be sufficient to
prevent detrimental caching and also shouldn't provoke any weird
browser behavior. See the patch below.
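
For readers who prefer to see the logic outside the diff, here is a
standalone sketch of the proposed behavior. It is not the Quixote code
itself: the function name and the `existing` argument (assumed to be a
dict of already-set headers with lowercase names) are inventions for
this example, and I use the standard library's email.utils.formatdate
for the date, which may differ from the helper used in
http_response.py.

```python
import time
from email.utils import formatdate


def cache_headers(cache, existing, now=None):
    """Return the (name, value) caching headers to add to a response.

    cache    -- seconds the response may be cached, 0 for "don't cache",
                or None to leave caching headers alone
    existing -- headers already set by the application (lowercase names)
    now      -- current time as a Unix timestamp (defaults to time.time())
    """
    if cache is None:
        return []
    if "expires" in existing or "cache-control" in existing:
        # The application set one of them itself; don't interfere.
        return []
    if now is None:
        now = time.time()
    if cache > 0:
        expire_date = formatdate(now + cache, usegmt=True)
        cache_control = "max-age=%d" % cache
    else:
        # "-1" is an invalid date and so counts as already expired;
        # must-revalidate forbids reuse of a stale copy under HTTP 1.1.
        expire_date = "-1"
        cache_control = "max-age=0, must-revalidate"
    return [("Expires", expire_date), ("Cache-Control", cache_control)]


print(cache_headers(0, {}))
print(cache_headers(600, {}))
```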

  Neil


--- a/quixote/http_response.py
+++ b/quixote/http_response.py
@@ -124,10 +124,10 @@ class HTTPResponse:
         future requests.  The cookie value is stored as the "value"
         attribute.  The other attributes are as specified by RFC 2109.
       cache : int | None
-        the number of seconds the response may be cached.  The default is 0,
-        meaning don't cache at all.  This variable is used to set the HTTP
-        expires header.  If set to None then the expires header will not be
-        added.
+        the number of seconds the response may be cached.  The default
+        is 0, meaning don't cache at all.  This variable is used to set
+        the HTTP expires and cache-control headers.  If set to None then
+        neither header will be added.
       javascript_code : { string : string }
         a collection of snippets of JavaScript code to be included in
         the response.  The collection is built by calling add_javascript(),
@@ -138,7 +138,7 @@ class HTTPResponse:
     DEFAULT_CONTENT_TYPE = 'text/html'
     DEFAULT_CHARSET = None # defaults to quixote.DEFAULT_CHARSET

-
+
     def __init__(self, status=200, body=None, content_type=None, charset=None):
         """
         Creates a new HTTP response.
@@ -412,14 +412,37 @@ class HTTPResponse:

         # Cache directives
         if self.cache is None:
-            pass # don't mess with the expires header
-        elif "expires" not in self.headers:
+            pass # don't mess with the expires or cache control header
+        else:
+            # We add both an Expires header and a Cache-Control header
+            # with a max-age directive.  The max-age directive takes
+            # priority when both Expires and max-age are present (even
+            # if Expires is more restrictive, RFC 2616 section 14.9.3).
             if self.cache > 0:
                 expire_date = formatdate(now + self.cache)
+                cache_control = "max-age=%d" % self.cache
             else:
-                expire_date = "-1" # allowed by HTTP spec and may work better
-                                   # with some clients
-            headers.append(("Expires", expire_date))
+                # This is the default case and makes sense for a dynamically
+                # generated response that can change on each request.
+                #
+                # Using the current date is not a good idea since clocks
+                # might not be synchronized. Any invalid date is treated
+                # as in the past but Microsoft recommends "-1" for
+                # Internet Explorer so that's what we use.
+                expire_date = "-1"
+                # The Expires header is sufficient for HTTP 1.0 but
+                # for HTTP 1.1 we must add a must-revalidate directive.
+                # Clients and proxies are allowed to ignore Expires in
+                # certain cases and use stale pages (RFC 2616 sections
+                # 13.1.5 and 14.9.4).
+                cache_control = "max-age=0, must-revalidate"
+            if ("expires" not in self.headers and
+                    "cache-control" not in self.headers):
+                # If either of these headers is set then don't add
+                # any of them. We assume the programmer knows what he
+                # is doing in that case.
+                headers.append(("Expires", expire_date))
+                headers.append(("Cache-Control", cache_control))

         # Content-type
         if "content-type" not in self.headers: