> Why is there a significant performance improvement, considering that
> mod_python embeds the Python interpreter in the web server, while SCGI
> keeps the interpreter in a separate process?

Trust me, I've spent a fair amount of time wondering about that.

To a great extent, I think it comes down to locality and efficient
memory use. With the mod_python approach, I had some 50-60 Apache
processes (only that many because I had to cap it there) running with
the full Python interpreter and all the site code loaded. Traffic
tended to be distributed among the servers. The result was poor use of
the operating system's file caches, and of the processor cache as well.
The most common hits on the LWN site (front page without being logged
in, and the RSS files) are optimized enough that the relevant code and
data should stay in the processor cache, leading to better performance
- if you don't have 60 of them running at once.

Here's a ps from the server, running since last night:

    nobody   10150  0.1  0.4  5408  2252 pts/4  S  Feb19   1:49 lwn-scgi
    nobody   10151  2.1  2.5 15588 13164 pts/4  S  Feb19  30:32 lwn-scgi
    nobody   10152  0.2  1.3  9660  6952 pts/4  S  Feb19   4:10 lwn-scgi
    nobody   10153  0.0  1.1  8564  5664 pts/4  S  Feb19   0:44 lwn-scgi
    nobody   10154  0.0  1.0  8100  5396 pts/4  S  Feb19   0:12 lwn-scgi
    nobody   10155  0.0  0.9  8048  5116 pts/4  S  Feb19   0:04 lwn-scgi
    nobody   10157  0.0  0.9  8040  4956 pts/4  S  Feb19   0:02 lwn-scgi
    nobody   10159  0.0  0.9  8032  5036 pts/4  S  Feb19   0:02 lwn-scgi
    nobody   10165  0.0  0.8  7888  4184 pts/4  S  Feb19   0:01 lwn-scgi
    nobody   10166  0.0  0.7  7704  3848 pts/4  S  Feb19   0:01 lwn-scgi
    nobody   10167  0.0  0.9  7992  4720 pts/4  S  Feb19   0:01 lwn-scgi

The first line is the dispatcher process; the second is the
hard-working server process that handles almost all the hits. It's
relatively rare that the dispatcher has to go beyond that one process.
The result is better cache performance and an almost-zero page fault
rate. I get an even bigger improvement, BTW, when testing on a
multiprocessor system.
SCGI divides the work nicely between the CPUs.

[And yes, before somebody points it out, there is clearly a memory leak
in there somewhere. Concentrating the traffic on one process has
brought that out in a very clear way. I'm not quite sure how I'm going
to track that down... you're not supposed to have to deal with memory
leaks in Python...]

Anyway, that's my hand-waving explanation of why I get better
performance out of SCGI.

jon

Jonathan Corbet
Executive editor, LWN.net
corbet@lwn.net