FWIW, I switched LWN over to the SCGI mode of operation (from mod_python) a week or so ago. I've been well pleased with the change; the performance improvement is significant. The 1.0b1 release works well.

One thing that has been missing, for me, is a "graceful restart" mechanism. I've not found a reliable way of doing the "kill and restart" routine that does not return server errors for a second or two. On LWN, a second's downtime fails 5-20 requests, and that kind of bugs me. It's not the image we like to project... normally we may not know what we're talking about, but at least we can say it reliably...

In thinking about this, it occurred to me that you don't really have to kill the main SCGI server process; you just have to make it kill off its children and start over. As long as nothing of importance has been imported into the dispatcher process, the new children will pick up any code changes. And the whole thing goes forward with a brief hiccup, and no failed requests.

Here's the patch, in case anybody can think of a better way. I'm not running this in production yet, but it performs nicely under heavy load on my test box. I doubt it's something that anybody would want to merge into a 1.0beta release, but, if it's useful to others, maybe it could go in post 1.0.

(As an aside, does anybody know why the signal module doesn't have sigaction() and sigprocmask()? Is it just that nobody has ever implemented it, or is there a deeper reason?)

jon

Jonathan Corbet
Executive editor, LWN.net
corbet@lwn.net

--- scgi/scgi_server.py.1.0b1	2003-02-19 16:18:59.000000000 -0700
+++ scgi/scgi_server.py	2003-02-20 10:29:18.000000000 -0700
@@ -12,6 +12,7 @@
 import select
 import errno
 import fcntl
+import signal
 from scgi import passfd
 
 # netstring utility functions
@@ -48,11 +49,11 @@
 
     def serve(self):
         while 1:
-            os.write(self.parent_fd, "1") # indicates that child is ready
             try:
+                os.write(self.parent_fd, "1") # indicates that child is ready
                 fd = passfd.recvfd(self.parent_fd)
-            except IOError:
-                # parent probably exited
+            except (IOError, OSError):
+                # parent probably exited (EPIPE comes thru as OSError)
                 raise SystemExit
             conn = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
             # Make sure the socket is blocking. Appearently, on FreeBSD the
@@ -100,7 +101,14 @@
         self.max_children = max_children
         self.children = {} # { pid : fd }
         self.spawn_child()
-
+        self.restart = 0
+
+    #
+    # Deal with a hangup signal. All we can really do here is
+    # note that it happened.
+    #
+    def hup_signal(self, signum, frame):
+        self.restart = 1
 
     def spawn_child(self, conn=None):
         parent_fd, child_fd = passfd.socketpair(socket.AF_UNIX,
@@ -128,6 +136,27 @@
             os.close(self.children[pid])
             del self.children[pid]
 
+    def do_restart(self):
+        #
+        # First close connections to the children, which will cause them
+        # to exit after finishing what they are doing.
+        #
+        for fd in self.children.values():
+            os.close(fd)
+        #
+        # Then do a blocking wait on each until we have cleared the
+        # slate.
+        #
+        for pid in self.children.keys():
+            (pid, status) = os.waitpid(pid, 0)
+        self.children = {}
+        #
+        # Fire off a new child, we'll be wanting it soon.
+        #
+        self.spawn_child()
+        self.restart = 0
+
+
     def delegate_request(self, conn):
         """Pass a request fd to a child process to handle. This method
         blocks if all the children are busy and we have reached the
@@ -147,7 +176,12 @@
             timeout = 0
 
         while 1:
-            r, w, e = select.select(self.children.values(), [], [], timeout)
+            try:
+                r, w, e = select.select(self.children.values(), [], [], timeout)
+            except select.error, e:
+                if e[0] == errno.EINTR: # got a signal, try again
+                    continue
+                raise
             if r:
                 # One or more children look like they are ready. Sort
                 # the file descriptions so that we keep preferring the
@@ -209,10 +243,17 @@
         s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
         s.bind((self.host, self.port))
         s.listen(40)
+        signal.signal(signal.SIGHUP, self.hup_signal)
         while 1:
-            conn, addr = s.accept()
-            self.delegate_request(conn)
-            conn.close()
+            try:
+                conn, addr = s.accept()
+                self.delegate_request(conn)
+                conn.close()
+            except socket.error, e:
+                if e[0] != errno.EINTR:
+                    raise # something weird
+            if self.restart:
+                self.do_restart()
 
 
 def main():
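
For anybody who wants to try the restart idea outside of SCGI, the flow in the patch can be sketched as a small standalone script. This is only a rough sketch in present-day Python 3, not the SCGI code: the Dispatcher class and the pipe-based children are hypothetical stand-ins for the server and its passfd socket pairs, and only the hup_signal()/do_restart() logic mirrors the patch.

```python
import os
import signal


class Dispatcher:
    """Hypothetical stand-in for the SCGI parent process."""

    def __init__(self):
        self.children = {}    # { pid : parent's write end of the child's pipe }
        self.restart = False

    def hup_signal(self, signum, frame):
        # As in the patch: just note that the signal arrived.
        self.restart = True

    def spawn_child(self):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: close every inherited write end, so that EOF shows
            # up once the parent closes its own copy.
            os.close(write_fd)
            for fd in self.children.values():
                os.close(fd)
            # Stand-in for the request loop: block until the parent
            # hangs up, then exit without running parent cleanup code.
            while os.read(read_fd, 1):
                pass
            os._exit(0)
        # Parent: keep only the write end.
        os.close(read_fd)
        self.children[pid] = write_fd
        return pid

    def do_restart(self):
        # Closing the parent's end tells each child to finish and exit.
        for fd in self.children.values():
            os.close(fd)
        # Blocking wait on each child until the slate is clear.
        for pid in list(self.children):
            os.waitpid(pid, 0)
        self.children = {}
        # Start a fresh child; it would pick up any changed code.
        self.spawn_child()
        self.restart = False
```

A real main loop would install the handler with signal.signal(signal.SIGHUP, dispatcher.hup_signal) and call do_restart() whenever the flag is set. One difference from the 2003 code is worth noting: since Python 3.5 (PEP 475), interrupted system calls such as accept() are retried automatically when the handler returns normally, so the EINTR checks in the patch would no longer wake the loop; a loop written today would likely need a timeout on accept() or select() in order to notice the flag.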