durusmail: qp: Occasional bugs with durus
Occasional bugs with durus
David K. Hess
2007-03-27
Thanks for fixing that traceback.

As to the lingering pidfile, I've shied away from the thought of
signal handlers due to situations like SIGSEGV.

I think forcing stop_durus to remove the pidfile could work for me.
I'm manipulating the Site object directly and calling start_durus.
With your change, I could add a call to stop_durus right before the
call to start_durus and I think that will prevent this situation.
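
A minimal sketch of what forced, tolerant pidfile removal might look
like. The helper name remove_pidfile is hypothetical and is not part of
qp; it only shows the idea of treating an already-missing pidfile as
success rather than an error, written for current Python:

```python
import errno
import os


def remove_pidfile(path):
    """Remove the pidfile at `path` (hypothetical helper, not qp's API).

    Returns True if a file was removed, False if it was already gone.
    Any other OSError (e.g. permission denied) is re-raised.
    """
    try:
        os.unlink(path)
    except OSError as e:
        # ENOENT means someone else removed the file first; that is fine.
        if e.errno != errno.ENOENT:
            raise
        return False
    return True
```

Called unconditionally before starting the server, a helper like this
would clear a stale pidfile and would not raise the "No such file or
directory" error when the file has already been removed.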

However, if durus *did* happen to be running, will I run into a new
race condition between the previous durus giving up the lock on the
database file before the new one tries to lock it?
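
One way to sidestep such a race, sketched under assumptions, is to poll
until the old server has actually let go before starting the new one.
The wait_for helper below is hypothetical and generic; the condition
passed in (for example "the database file is no longer locked") is up
to the caller:

```python
import time


def wait_for(condition, timeout=10.0, interval=0.1):
    """Poll `condition()` until it returns true or `timeout` seconds pass.

    Hypothetical helper: returns True if the condition became true in
    time, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A bounded wait like this keeps a restart from hanging forever if the
previous process never releases the file.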

Pondering how to handle that makes me start to think that maybe
is_durus_running might be best off trying to get a lock on the
database file.
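
A rough sketch of that idea, assuming a POSIX system and assuming the
running server holds an advisory flock() lock on the database file
(whether Durus locks the file this way is an assumption here, not
something confirmed in this thread). The function name is hypothetical:

```python
import fcntl


def is_file_locked(path):
    """Return True if some other open file holds a flock() lock on `path`.

    Hypothetical helper. It briefly tries to take a non-blocking
    exclusive lock; failure to get it means another holder exists.
    """
    with open(path, "rb") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Lock is held elsewhere, presumably by a live server.
            return True
        # We got the lock, so nobody else had it; release it again.
        fcntl.flock(f, fcntl.LOCK_UN)
        return False
```

Unlike a pid check, this cannot be fooled by an unrelated process that
happens to reuse the old pid, since the kernel drops the lock when the
holder exits. It is still only a snapshot: the answer can change
between the check and a subsequent start.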

Dave

------
David K. Hess
Verscend Technologies, Inc.
dhess@verscend.com
214-684-5448



On Mar 26, 2007, at 12:39 PM, David Binger wrote:

>
> On Mar 25, 2007, at 1:24 AM, David K. Hess wrote:
>
>>
>> Using qp 1.9.1, I on occasion get the following traceback from qp
>> when stopping a site:
>>
>> Traceback (most recent call last):
>>   File "/usr/local/bin/qp", line 102, in <module>
>>     site.stop_durus()
>>   File "/usr/local/lib/python2.5/site-packages/qp/lib/site.py",
>> line 425, in stop_durus
>>     unlink(self.get_durus_pidfile())
>> OSError: [Errno 2] No such file or directory: '[edited].durus.pid'
>>
>> It looks like a race condition between the durus process removing
>> the pidfile and qp also trying to remove the pidfile. Perhaps the
>> unlink call in Site.stop_durus should be wrapped with a try: clause?
>
> Durus itself knows nothing of pidfiles, so this is not a Durus bug or a
> race between qp and something in Durus. If there is a change to make to
> avoid this problem, the change will be in qp, probably in site.py.
>
>>
>> Also, I've got another odd problem that occurs on occasion related
>> to the pid file. If the durus server gets killed by a signal (say
>> SIGTERM), its pidfile gets left around. It won't get cleaned up
>> with a later call to stop_durus (say via qp) since
>> is_durus_running blocks that action.
>
> Perhaps stop_durus() should always remove the pidfile, if it exists?
> Should we register a signal handler to remove the pidfile?
>
>>
>> What can then happen is if you reboot the OS and happen to get a
>> non-durus long running process with the pid of the previous one
>> that exited on SIGTERM, is_durus_running will block durus from
>> starting. You have to remove the .durus.pid file by hand to
>> correct the situation.
>>
>> This would seem to be an extremely unlikely occurrence but I've
>> had it bite me twice now.
>>
>> Would a reasonable fix be to add a call to wait_for_server in
>> is_durus_running to verify that the process running with that PID
>> is really the durus server we think it is?
>
> That does sound reasonable, but there still could be some *other*
> process listening on the same address, or the address might possibly
> have been changed, so is_durus_running() is still not really definitive.
>
> I think maybe having the stop_* methods force pidfile removal is the
> easiest to implement and understand.  Then "qp -d" would always clear
> up any pidfiles left from, say, a power failure.
>
> Would that suffice for your situation?
>
>
>
>
> _______________________________________________
> QP mailing list
> QP@mems-exchange.org
> http://mail.mems-exchange.org/mailman/listinfo/qp
