Re: [Quixote-users] SCGI vs. mod_python (was RFC: SCGI graceful restart)

> Why is there a significant performance improvement, considering that
> mod_python embeds the Python interpreter in the web server, while SCGI
> keeps the interpreter in a separate process?

Trust me, I've spent a fair amount of time wondering about that.

To a great extent, I think it comes down to locality and efficient memory
use.  With the mod_python approach, I had some 50-60 Apache processes
(and only that many because I had to cap it there) running with the full
Python interpreter and all the site code loaded.  Traffic tended to be
spread across all of those processes.  The result was poor use of the
operating system's file caches,
and of the processor cache as well.  The most common hits on the LWN site
(front page without being logged in, and the RSS files) are optimized
enough that the relevant code and data should stay in the processor cache,
leading to better performance - if you don't have 60 of them running at
once.

Here's a ps from the server, running since last night:

nobody   10150  0.1  0.4  5408 2252 pts/4    S    Feb19   1:49 lwn-scgi
nobody   10151  2.1  2.5 15588 13164 pts/4   S    Feb19  30:32 lwn-scgi
nobody   10152  0.2  1.3  9660 6952 pts/4    S    Feb19   4:10 lwn-scgi
nobody   10153  0.0  1.1  8564 5664 pts/4    S    Feb19   0:44 lwn-scgi
nobody   10154  0.0  1.0  8100 5396 pts/4    S    Feb19   0:12 lwn-scgi
nobody   10155  0.0  0.9  8048 5116 pts/4    S    Feb19   0:04 lwn-scgi
nobody   10157  0.0  0.9  8040 4956 pts/4    S    Feb19   0:02 lwn-scgi
nobody   10159  0.0  0.9  8032 5036 pts/4    S    Feb19   0:02 lwn-scgi
nobody   10165  0.0  0.8  7888 4184 pts/4    S    Feb19   0:01 lwn-scgi
nobody   10166  0.0  0.7  7704 3848 pts/4    S    Feb19   0:01 lwn-scgi
nobody   10167  0.0  0.9  7992 4720 pts/4    S    Feb19   0:01 lwn-scgi

The first line is the dispatcher process; the second is the hard-working
server process that handles almost all the hits.  It's relatively rare that
the dispatcher has to go beyond that one process.  The result is better
cache performance and an almost-zero page fault rate.
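
For anyone who wants to put a number on "almost all the hits", here is a
rough sketch that parses the listing above and works out how much of the
workers' CPU time landed on the busiest process.  It assumes the columns
follow the usual "ps aux" layout (USER PID %CPU %MEM VSZ RSS TTY STAT START
TIME COMMAND), which the listing does not actually state, and it uses CPU
time as a stand-in for hits, which is only an approximation.  The function
and variable names are made up for this example.

```python
# Sketch only: column meanings assume the standard "ps aux" layout.
PS_OUTPUT = """\
nobody   10150  0.1  0.4  5408 2252 pts/4    S    Feb19   1:49 lwn-scgi
nobody   10151  2.1  2.5 15588 13164 pts/4   S    Feb19  30:32 lwn-scgi
nobody   10152  0.2  1.3  9660 6952 pts/4    S    Feb19   4:10 lwn-scgi
nobody   10153  0.0  1.1  8564 5664 pts/4    S    Feb19   0:44 lwn-scgi
nobody   10154  0.0  1.0  8100 5396 pts/4    S    Feb19   0:12 lwn-scgi
nobody   10155  0.0  0.9  8048 5116 pts/4    S    Feb19   0:04 lwn-scgi
nobody   10157  0.0  0.9  8040 4956 pts/4    S    Feb19   0:02 lwn-scgi
nobody   10159  0.0  0.9  8032 5036 pts/4    S    Feb19   0:02 lwn-scgi
nobody   10165  0.0  0.8  7888 4184 pts/4    S    Feb19   0:01 lwn-scgi
nobody   10166  0.0  0.7  7704 3848 pts/4    S    Feb19   0:01 lwn-scgi
nobody   10167  0.0  0.9  7992 4720 pts/4    S    Feb19   0:01 lwn-scgi
"""


def cpu_seconds(field):
    """Convert a TIME field of the form MM:SS into seconds."""
    minutes, seconds = field.split(":")
    return int(minutes) * 60 + int(seconds)


def parse_ps(text):
    """Return one dict per process line, keeping only the fields we need."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue  # skip blank or truncated lines
        rows.append({
            "pid": int(fields[1]),
            "rss_kb": int(fields[5]),          # assumed: resident set size in KB
            "cpu_s": cpu_seconds(fields[9]),   # assumed: cumulative CPU time
        })
    return rows


rows = parse_ps(PS_OUTPUT)
workers = rows[1:]  # the first line is the dispatcher, per the text above
total_cpu = sum(r["cpu_s"] for r in workers)
busiest = max(workers, key=lambda r: r["cpu_s"])
share = busiest["cpu_s"] / total_cpu

print("busiest worker: pid %d, %.0f%% of worker CPU time, RSS %d KB"
      % (busiest["pid"], 100 * share, busiest["rss_kb"]))
```

On the numbers above, that puts process 10151 at roughly 85% of the CPU
time used by the workers.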

I get an even bigger improvement, BTW, when testing on a multiprocessor
system.  SCGI divides the work nicely between the CPUs.

[And yes, before somebody points it out, there is clearly a memory leak in
there somewhere.  Concentrating the traffic on one process has brought that
out in a very clear way.  I'm not quite sure how I'm going to track that
down... you're not supposed to have to deal with memory leaks in Python...]

Anyway, that's my hand-waving explanation of why I get better performance
out of SCGI.

jon

Jonathan Corbet
Executive editor, LWN.net
corbet@lwn.net