durusmail: quixote-users: Re: Need Medusa users to test patch (improving large response performance)...
Re: Need Medusa users to test patch (improving large response performance)...
Graham Fawcett
2003-06-05
Neil Schemenauer wrote:



> I'd like to see a more general fix.

Agreed, although I'd call your suggestion a feature, not a fix. In the meantime
I'd still merge the proposed patch since it fixes a pretty serious performance
issue.

Instead of the publisher always
> calling str() on the response it should be possible to return a
> "streamed response".  Something like:
>
>     # in http_response.py
>     class ResponseStream:
>         def __iter__(self):
>             ...
>
> and then in publisher.py publish():
>
>     if output:
>         request.response.set_body(output)
>     request.response.write(stdout)
>
> finally in http_response.py:
>
>     def write(self, file):
>         ...
>         if isinstance(self.body, ResponseStream):
>             for chunk in self.body:
>                 file.write(chunk)
>         else:
>             file.write(str(self.body))
>

I like the iterate-over-stream approach... could lead to some efficient and
elegant app code. Certainly it would be a favoured approach for serving large
static filestreams.
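
For instance (an untested sketch -- the name `file_chunks` and the block
size are invented here, not part of the patch), a large static file could
be streamed without ever building the whole body in memory:

```python
def file_chunks(path, blocksize=8192):
    """Yield a file's contents in fixed-size blocks, instead of
    reading it into one large string."""
    f = open(path, 'rb')
    try:
        while True:
            block = f.read(blocksize)
            if not block:  # EOF
                break
            yield block
    finally:
        f.close()
```

A response body set to `file_chunks('/some/big/file')` would then be
written out block by block by the streaming write() path.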

> There are obviously a few details to resolve but I think it would be a
> feasible solution.  Does someone want to try to make a patch?

A few thoughts first:

I would think that your approach should require, or automagically set,
Transfer-Encoding: chunked. Also, you wouldn't want to calculate
Content-length: it could easily be unknown without consuming the iterator.
You would actually want to suppress it, I think: if chunks = ['list', 'of',
'strings'], then a careless Content-length calculation might set it to
len(chunks) == 3, rather than the byte length of the body.
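
To make that concrete (my own illustration, not part of the patch):
chunked transfer coding frames each chunk with its byte length in hex, so
no up-front Content-length is needed at all:

```python
def chunked_encode(chunks):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body:
    each chunk is preceded by its size in hex, and a zero-size chunk
    terminates the body."""
    out = []
    for chunk in chunks:
        if chunk:  # a zero-length chunk would prematurely end the body
            out.append(b'%x\r\n' % len(chunk) + chunk + b'\r\n')
    out.append(b'0\r\n\r\n')  # terminating chunk
    return b''.join(out)

chunks = [b'list', b'of', b'strings']
# len(chunks) is 3 -- the careless "Content-length" -- but the actual
# body is sum(len(c) for c in chunks) == 13 bytes, which in general
# can't be known without consuming the iterator.
```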

In the case of async servers: they would probably do better with a
lazy writer, i.e. passing the chunk iterator to the server rather than
having Quixote make the file.write() calls; that way the iteration is left
to the last step in the chain, async performance (probably) improves, and
buffering along the chain is avoided. Otherwise, there may be no measurable
performance gain -- for example, in the 0.6 version of server.medusa_http,
the "file" in question would still be a StringIO, and the performance issue
that hit me would still be in effect.

To achieve this for a generic async_server_http script, I'd override the
publish() method with a variant that tested the response body for
"iterability", and could wrap the body in the iterator/producer provided by the
async server's framework.

async_publisher.publish() might look like:

        ...
        resp = request.response
        resp.set_body(output)
        if self.deemed_to_be_iterable(output):  # or resp.iterable()?
                resp.set_header('Transfer-Encoding', 'chunked')
                native_request.push(native_iterator_wrapper(output))
                native_request.send_headers(resp.headers)
                # but don't send Content-length, and avoid its calculation!
        else:
                ...
                request.response.write(my_stringio)
                native_request.push(str(my_stringio))
                ...
        ...
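
For Medusa in particular, the hypothetical `native_iterator_wrapper` could
be an adapter from the iteration protocol to Medusa's producer protocol --
as I understand it, a producer is just an object whose more() method
returns the next piece of data, or '' when exhausted. An untested sketch:

```python
class iterator_producer:
    """Adapt any iterable of strings to Medusa's producer protocol:
    each call to more() returns the next chunk, or '' when done."""
    def __init__(self, iterable):
        self.iterator = iter(iterable)

    def more(self):
        try:
            return next(self.iterator)
        except StopIteration:
            return ''  # Medusa treats '' as end-of-data
```

The async loop would then pull chunks only as the socket is ready for
them, which is exactly the lazy-writer behaviour described above.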

I'd vote for a protocol test rather than an instance-test of a new
ResponseStream class: something like `hasattr(self.body, 'iter_chunks')` rather
than `isinstance(self.body, ResponseStream)`.

Or perhaps even `hasattr(self.body, '__iter__')`: then one could return
iterators, or custom objects supporting the iteration protocol. This seems
more suitable for a general fix, and more in the spirit of Python. (+1 for
not saying "pythonic"... agh, I didn't mean to say that...)

cheers,

-- Graham


