durusmail: qp: Occasional bugs with durus
Occasional bugs with durus
David Binger
2007-03-26
On Mar 25, 2007, at 1:24 AM, David K. Hess wrote:

>
> Using qp 1.9.1, I on occasion get the following traceback from qp
> when stopping a site:
>
> Traceback (most recent call last):
>   File "/usr/local/bin/qp", line 102, in <module>
>     site.stop_durus()
>   File "/usr/local/lib/python2.5/site-packages/qp/lib/site.py",
> line 425, in stop_durus
>     unlink(self.get_durus_pidfile())
> OSError: [Errno 2] No such file or directory: '[edited].durus.pid'
>
> It looks like a race condition between the durus process removing
> the pidfile and qp also trying to remove the pidfile. Perhaps the
> unlink call in Site.stop_durus should be wrapped with a try: clause?

Durus itself knows nothing of pidfiles, so this is not a Durus bug or a
race between qp and something in Durus.
If there is a change to make to avoid this problem,
the change will be in qp, probably in site.py.
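A minimal sketch of the `try:` wrapping suggested above, in Python 3 syntax. `remove_pidfile` is a hypothetical helper, not part of qp's API: it treats "file already gone" (ENOENT) as success, whatever removed the file first, and re-raises any other error such as a permissions problem.

```python
import errno
import os


def remove_pidfile(path):
    """Delete a pidfile, treating 'already gone' as success.

    Hypothetical helper. ENOENT means someone else (or an earlier cleanup)
    removed the file first, which is the outcome we wanted anyway.
    Any other OSError (e.g. EACCES) is re-raised.
    """
    try:
        os.unlink(path)
    except OSError as exc:
        if exc.errno != errno.ENOENT:
            raise
```

Swallowing only ENOENT, rather than every `OSError`, keeps real problems such as a read-only directory visible.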

>
> Also, I've got another odd problem that occurs on occasion related
> to the pid file. If the durus server gets killed by a signal (say
> SIGTERM), its pidfile gets left around. It won't get cleaned up
> with a later call to stop_durus (say via qp) since is_durus_running
> blocks that action.

Perhaps stop_durus() should always remove the pidfile if it exists?
Should we register a signal handler to remove the pidfile?
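One way such a handler could look, as a sketch assuming the process that owns the pidfile installs it at startup (Python 3 syntax; `install_pidfile_cleanup` is a hypothetical name, not a qp or Durus function). Note that SIGKILL cannot be caught, and a power failure runs no handler at all, so stale pidfiles remain possible even with this in place.

```python
import errno
import os
import signal
import sys


def install_pidfile_cleanup(pidfile, signums=(signal.SIGTERM,)):
    """Hypothetical helper: remove pidfile when one of signums arrives.

    Must be called from the main thread (a requirement of signal.signal).
    Returns the installed handler so callers can inspect or test it.
    """
    def handler(signum, frame):
        try:
            os.unlink(pidfile)
        except OSError as exc:
            if exc.errno != errno.ENOENT:  # already gone is fine
                raise
        # Exit with the conventional 128 + signal-number status.
        sys.exit(128 + signum)

    for signum in signums:
        signal.signal(signum, handler)
    return handler
```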

>
> What can then happen is if you reboot the OS and happen to get a
> non-durus long running process with the pid of the previous one
> that exited on SIGTERM, is_durus_running will block durus from
> starting. You have to remove the .durus.pid file by hand to correct
> the situation.
>
> This would seem to be an extremely unlikely occurrence but I've had
> it bite me twice now.
>
> Would a reasonable fix be to add a call to wait_for_server in
> is_durus_running to verify that the process running with that PID
> is really the durus server we think it is?

That does sound reasonable, but there could still be some *other*
process listening on the same address, or the address might have been
changed, so is_durus_running() is still not really definitive.
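A sketch of the stricter check discussed above, in Python 3 syntax. It does not use qp's `wait_for_server`; instead it combines a plain "is this pid alive" test with a TCP connect to the server's address. All names here are hypothetical, and, as noted, a positive answer only means *something* is listening, not that it is the Durus server.

```python
import errno
import os
import socket


def pid_is_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    if pid <= 0:
        # 0 and negative pids address process groups in kill(); reject them.
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks existence; nothing is delivered
    except OSError as exc:
        # EPERM means the process exists but belongs to another user.
        return exc.errno == errno.EPERM
    return True


def address_answers(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return False
    sock.close()
    return True


def durus_seems_running(pidfile, host, port):
    """Hypothetical heuristic: the pidfile names a live process AND
    something accepts connections on the server's address."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False
    return pid_is_alive(pid) and address_answers(host, port)
```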

I think having the stop_* methods force pidfile removal is probably the
easiest option to implement and understand. Then "qp -d" would always
clear up any pidfiles left over from, say, a power failure.
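A sketch of the control flow that proposal implies, in Python 3 syntax. `stop_and_clear_pidfile` is hypothetical, and `is_running` and `stop` are passed in as callables standing in for qp's own checks rather than reproducing its API. The server is stopped only if it looks alive, but the pidfile is removed unconditionally, including when it was stale or when `stop()` raised.

```python
import errno
import os


def stop_and_clear_pidfile(pidfile, is_running, stop):
    """Hypothetical stop_* flow: stop the server if it looks alive,
    then always remove the pidfile, even if stale or if stop() failed."""
    try:
        if is_running():
            stop()
    finally:
        try:
            os.unlink(pidfile)
        except OSError as exc:
            if exc.errno != errno.ENOENT:  # no pidfile is fine
                raise
```

Using `finally` means a stale pidfile can no longer block a later start, which matches the "qp -d clears everything" behaviour described above.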

Would that suffice for your situation?