On Mar 25, 2007, at 1:24 AM, David K. Hess wrote:

> Using qp 1.9.1, I on occasion get the following traceback from qp
> when stopping a site:
>
> Traceback (most recent call last):
>   File "/usr/local/bin/qp", line 102, in <module>
>     site.stop_durus()
>   File "/usr/local/lib/python2.5/site-packages/qp/lib/site.py", line 425, in stop_durus
>     unlink(self.get_durus_pidfile())
> OSError: [Errno 2] No such file or directory: '[edited].durus.pid'
>
> It looks like a race condition between the durus process removing
> the pidfile and qp also trying to remove the pidfile. Perhaps the
> unlink call in Site.stop_durus should be wrapped with a try: clause?

Durus itself knows nothing of pidfiles, so this is not a Durus bug or a race between qp and something in Durus. If there is a change to make to avoid this problem, it will be in qp, probably in site.py.

> Also, I've got another odd problem that occurs on occasion related
> to the pid file. If the durus server gets killed by a signal (say
> SIGTERM), its pidfile gets left around. It won't get cleaned up
> with a later call to stop_durus (say via qp) since is_durus_running
> blocks that action.

Perhaps stop_durus() should always remove the pidfile, if it exists? Should we register a signal handler to remove the pidfile?

> What can then happen is if you reboot the OS and happen to get a
> non-durus long running process with the pid of the previous one
> that exited on SIGTERM, is_durus_running will block durus from
> starting. You have to remove the .durus.pid file by hand to correct
> the situation.
>
> This would seem to be an extremely unlikely occurrence but I've had
> it bite me twice now.
>
> Would a reasonable fix be to add a call to wait_for_server in
> is_durus_running to verify that the process running with that PID
> is really the durus server we think it is?

That does sound reasonable, but there still could be some *other* process listening on the same address, or the address might possibly have been changed, so is_durus_running() is still not really definitive. I think maybe having the stop_* methods force pidfile removal is the easiest to implement and understand. Then "qp -d" would always clear up any pidfiles left from, say, a power failure. Would that suffice for your situation?
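For illustration, here is a minimal sketch of the "force pidfile removal" idea, written in Python 3 (the thread is about Python 2.5). The names `remove_pidfile`, `read_pid`, and `stop_server` are hypothetical and are not qp's actual API, and `is_running` is a caller-supplied stand-in for a check such as is_durus_running. The signal is only sent when that check passes, but the pidfile is removed unconditionally, and a pidfile that has already vanished is not treated as an error.

```python
import os
import signal


def remove_pidfile(path):
    """Remove a pidfile, treating an already-missing file as success.

    Another process (or an earlier stop) may have deleted it first,
    which is the race behind the OSError traceback above.
    """
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass


def read_pid(path):
    """Return the integer pid stored in `path`, or None if it is
    missing or does not contain a valid integer."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def stop_server(pidfile, is_running):
    """Hypothetical stop routine (not qp's Site.stop_durus).

    `is_running` is an assumed callable standing in for a check like
    is_durus_running; SIGTERM is only sent when it returns True, so a
    stale pid now owned by some unrelated process is never signalled.
    """
    if is_running():
        pid = read_pid(pidfile)
        if pid is not None:
            try:
                os.kill(pid, signal.SIGTERM)
            except ProcessLookupError:
                # The process exited between the check and the kill.
                pass
    # Always clear the pidfile, even if the server died earlier
    # (SIGTERM, power failure) and left a stale one behind.
    remove_pidfile(pidfile)
```

On Python 2.x, FileNotFoundError and ProcessLookupError do not exist; the equivalent there is to catch OSError and check `e.errno == errno.ENOENT` (for unlink) or `e.errno == errno.ESRCH` (for os.kill).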