durusmail: qp: Occasional bugs with startup
Occasional bugs with startup
David K. Hess
2007-03-27
In this particular case, I need to fix this in older releases of qp
and Durus, and I can't use the qp script for startup (I'm the maniac
using qp as a kind of library inside another long-running Python process).

So, for now, I think I'll do this in my startup sequence:

import time

site.stop_durus()
time.sleep(1)  # give the old Durus process a moment to exit
site.start_durus()

And I'll patch Site.stop_durus to always remove the pid file. That
should be forward compatible with your changes in future releases.
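Instead of patching Site.stop_durus itself, roughly the same effect can be had from the calling process with a small wrapper. This is only a sketch under assumptions, not qp's API: restart_durus and pid_path are hypothetical, and the stop, start and is_running arguments stand in for site.stop_durus, site.start_durus and site.is_durus_running. It always removes the pid file, then waits for the old server to go away, but raises RuntimeError after a timeout instead of looping forever (the concern raised in the quoted reply below).

```python
import os
import time


def restart_durus(stop, start, is_running, pid_path, timeout=10.0, interval=0.25):
    """Hypothetical restart helper; not part of qp.

    stop, start and is_running stand in for site.stop_durus,
    site.start_durus and site.is_durus_running. pid_path is an
    assumed path to the Durus server's pid file.
    """
    stop()
    # Always remove the pid file, even if stop() found no server running.
    try:
        os.remove(pid_path)
    except FileNotFoundError:
        pass
    # Wait for the old server to exit, but give up instead of looping forever.
    deadline = time.monotonic() + timeout
    while is_running():
        if time.monotonic() >= deadline:
            raise RuntimeError(
                "Durus server still running after %.1f seconds" % timeout)
        time.sleep(interval)
    start()
```

Usage would look like restart_durus(site.stop_durus, site.start_durus, site.is_durus_running, pid_path), with pid_path set to wherever your qp release actually writes the Durus pid file.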

Just FYI, there's even more complexity on top of this. On
CentOS 4.2, os.kill(pid, 0) is returning EPERM for a process that
isn't running (it's not even a zombie). It appears to be related to
that process having been a child of a process that is still running:
I was able to make it stop happening by killing off, one at a time,
the daemons with PIDs lower than the one in question.

Sigh.
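For reference, the POSIX kill() specification says that signal 0 performs only error checking: ESRCH means no process with that PID exists, while EPERM means a process with that PID does exist but the caller is not allowed to signal it. So a liveness check that treats EPERM as "running" is behaving as specified; if the old server really is gone, one possibility worth ruling out is that its PID has been reused by a process owned by another user. A minimal sketch of such a check (pid_exists is a hypothetical helper, not part of qp or Durus):

```python
import errno
import os


def pid_exists(pid):
    """Hypothetical helper: True if a process with this PID exists (POSIX)."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            return False  # no such process
        if exc.errno == errno.EPERM:
            return True  # process exists, but we may not signal it
        raise
    return True
```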

Dave

------
David K. Hess
Verscend Technologies, Inc.
dhess@verscend.com
214-684-5448



On Mar 27, 2007, at 12:45 PM, David Binger wrote:

> Here is another idea that seems really easy and effective
> for startup.  I can change the qp script so that
> "qp start" is actually the same as "qp -du".  That way,
> the pid files are cleaned up, if present, on a
> normal startup, and I think the "qp" script will
> work without changes as a startup script.
>>
>
>> site.stop_durus()
>> while site.is_durus_running():
>>   time.sleep(1)
>> site.start_durus()
>>
>
> I think a RuntimeError is better than this possibly infinite loop.
>
>
>> I'm leaning towards 2. Or, if is_durus_running() doesn't get
>> modified, I could just sleep for, say, 3 seconds between the
>> stop_durus and start_durus calls to create a window big enough
>> that the odds of a race condition are essentially zero. Still a
>> little kludgey, but it should be effective enough.
>
> The next Durus release delays grabbing the lock until a write
> operation is started, and there is a built-in delay in startup
> that should be enough to make sure that the previous process has
> plenty of time to die and release the lock. Also, stop_durus
> already waits for the bound socket to become available, and it
> seems quite unlikely that that will happen before the file lock
> is released.  I doubt that any delay is necessary.
>
>>
>>> I think there will always be the remote possibility of a race,
>>> even if we check that locks are available.  The critical thing
>>> is that only one writer wins the race.
>>
>> True. However, it looks like start_durus doesn't retry if Durus
>> can't lock the database file; Durus just exits with a
>> RuntimeError. Then start_durus calls wait_for_server and, after a
>> timeout, exits the parent process by raising SystemExit. I suppose
>> I could catch SystemExit, infer what happened to the Durus
>> process, and try to start it again...
>
> If wait_for_server times out, I don't think an automatic retry will
> help.
>
>
> _______________________________________________
> QP mailing list
> QP@mems-exchange.org
> http://mail.mems-exchange.org/mailman/listinfo/qp
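The quoted remark that "only one writer wins the race" is what an exclusive, non-blocking file lock provides: whichever process takes the lock first keeps it, and any later attempt fails immediately instead of waiting. The sketch below shows that general technique with fcntl.flock on a POSIX system; it is not Durus's own locking code, and try_acquire_writer_lock and the lock-file path are hypothetical.

```python
import fcntl
import os


def try_acquire_writer_lock(path):
    """Hypothetical helper: take an exclusive, non-blocking lock on path.

    Returns an open file object that holds the lock, or None if another
    open file (in this or any other process) already holds it.
    """
    f = open(path, "a+b")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else won the race; give up at once rather than block.
        f.close()
        return None
    return f  # keep this object open for as long as the lock is needed
```

Closing the returned file releases the lock, so a second server started after the first one exits can acquire it normally.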
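The quoted reply notes that stop_durus already waits for the server's bound socket to become available. One common way to implement that kind of wait, sketched here for a plain TCP address (wait_for_port_free is a hypothetical name, and this is not qp's implementation), is to poll by trying to bind the address until the bind succeeds or a timeout expires.

```python
import socket
import time


def wait_for_port_free(host, port, timeout=10.0, interval=0.1):
    """Hypothetical helper: poll until (host, port) can be bound.

    Returns True once a bind succeeds, or False after timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Servers commonly set this before binding, so the probe does too.
        probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            probe.bind((host, port))
            return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
        finally:
            probe.close()  # release the probe whether or not bind worked
```

Exact bind rules differ between operating systems, so treat a successful probe as a good approximation of "a new server could bind here now," not a guarantee.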
