durusmail: durus-users: server-less multi-process concurrency for durus
Matthew Scott
2009-09-15
David,

Thanks for the feedback.


On Tue, Sep 15, 2009 at 03:28, Binger David wrote:

> The net improvement, then, is that you don't need to maintain a separate
> process for the durus server.
> Maintaining the process seems like no inconvenience at all:  am I right
> that it is having to *start* the
> separate process that is the requirement that we would like to eliminate?
>

Somewhat correct.

Specifically, I'd like to be able to work with one or more Durus databases
directly as files during development, but have the same concurrency
semantics as one would expect to get with a client/server arrangement.

For instance, the way I perform tests with Durus databases is to create and
destroy a database for each test case.  This works rather well for me.
Having to start a server in a separate thread or process, connect to it,
use it, and shut it down for each test case seems like it would add a bit
of overhead.

Additionally, in some of my tests I reset some objects to an initial state
and keep other objects around, rather than doing a full destroy/recreate.
This dictates the use of multiple Durus files, which, from my
understanding, is only feasible with multiple Durus servers.



> An alternative approach would be to have every client attempt to start a
> server process whenever
> none is present, and this attempt will just fail for all except the first.
>  Then the human management can avoid
> thinking about the server process.


This could pose a "handoff" problem: process A (an application) starts the
Durus server, process B (a Python shell inspecting some things under the
hood) connects to the server started by A, and then A shuts down, taking
the Durus server with it, leaving process B hanging without Durus.

Not a problem with a deployment where one would have a long-running server
process, but a potential problem with a more chaotic development or
desktop-app environment.



> == Packing ==
>> I haven't thought through this part as of yet, but I'm not worried about
>> it at the moment.
>>
>
> I seem to recall that this is one of the more difficult issues to solve.
> It is important, too, so I'd suggest worrying about it now.
>

You got me there.  :)  It would be quite a dance indeed, and I still haven't
thought of what the sequence of operations could be.



> The other hard issue is oid allocation.  ShelfStorage gets a big advantage
> from having the entire space of oids be contained within a compact range
> of values, so you can't just spread oids out.  And of course, there must be
> some way to make sure that no two clients use the same new oid.


This is a good point.  Thank you for reminding me of these kinds of
details.  :)
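Thinking out loud about the oid problem: one way to keep the oid space compact and collision-free without a server might be a shared counter serialized with a file lock. This is only a sketch under invented assumptions -- the sidecar counter file, the `allocate_oids` helper, and the 8-byte counter format are all hypothetical, not anything Durus does, and `fcntl.flock` makes it POSIX-only:

```python
import fcntl
import os
import struct

def allocate_oids(path, count=1):
    """Allocate `count` sequential oids from a shared counter file.

    The counter file holds a single 8-byte big-endian integer.  An
    exclusive flock() serializes allocation, so two concurrent
    processes can never hand out the same oid, and oids stay in one
    compact range (which ShelfStorage's layout wants).
    """
    # Create the counter file on first use.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # block until we own the counter
        raw = os.read(fd, 8)
        next_oid = struct.unpack('>Q', raw)[0] if len(raw) == 8 else 0
        os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, struct.pack('>Q', next_oid + count))
        return list(range(next_oid, next_oid + count))
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

A client could grab a small batch of oids per transaction to amortize the locking cost, at the price of leaving gaps when a transaction aborts.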



> == commit() objects to a file that has been updated ==
>> 1. acquire exclusive lock
>> 2. seek(SEEK_END), tell(), offset is different from current EOF offset
>> 3. read records from current EOF offset to new EOF, update in-memory state
>>        a. if conflict, raise WriteConflictError
>>
>
> This can be done, but it sounds easier than it really is.  You'll need to
> read the tail, find the oids, and make sure that none of them have states
> loaded in your cache during this transaction.  It isn't enough just to
> look for conflict with the oids you are writing.  This could potentially be
> a slow operation, and it has a cost that the server-based durus avoids.


I'm guessing that using a Durus server avoids this because the Durus server
keeps such information in memory?

I can see where this would be a slowdown across several processes.  If 4
processes are working on the same file and one of them writes, the other 3
now have to read and process the tail.
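For what it's worth, here is roughly how I picture the commit steps above, incorporating your correction that the tail must be checked against every loaded oid, not just the ones being written. Everything here is a toy stand-in -- the record framing, the `commit` signature, and `WriteConflictError` as a local class are invented for illustration and are not Durus's actual FileStorage format:

```python
import fcntl
import struct

class WriteConflictError(Exception):
    pass

def commit(f, known_eof, loaded_oids, pending):
    """Append `pending` ({oid: state_bytes}) to open file object `f`.

    `known_eof` is the EOF offset this client last synced to, and
    `loaded_oids` is every oid whose state is in this client's cache.
    Toy record format: 4-byte big-endian length, 8-byte oid, state.
    Returns the new EOF offset on success.
    """
    fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # 1. acquire exclusive lock
    try:
        f.seek(0, 2)                        # 2. SEEK_END
        eof = f.tell()
        if eof != known_eof:
            # 3. Someone committed behind our back: replay the tail
            # and check it against *everything* we have loaded, not
            # just the oids we are about to write.
            f.seek(known_eof)
            while f.tell() < eof:
                (length,) = struct.unpack('>I', f.read(4))
                (oid,) = struct.unpack('>Q', f.read(8))
                f.seek(length - 8, 1)       # skip the state bytes
                if oid in loaded_oids:
                    raise WriteConflictError(oid)   # 3a.
        f.seek(0, 2)
        for oid, state in pending.items():
            f.write(struct.pack('>I', 8 + len(state)))
            f.write(struct.pack('>Q', oid))
            f.write(state)
        f.flush()
        return f.tell()
    finally:
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```

Even in this toy form you can see your point about cost: the tail scan is O(bytes written by others since last sync), work the server-based setup does once, centrally.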



> == Perform a consistent read across a data set ==
>> 1. call pause() on the Durus connection
>> 2. seek(SEEK_END), tell(), if offset is different from current EOF offset,
>> read records and update in-memory state
>>
>
> You must also check for conflicts here and raise an exception if any of the
> loaded objects
> have a new state.
>
> Any time that the length of the file has changed, you must read the oids,
> process them as invalidations, and raise the conflict exception if any of
> the changed or new oids have state that is loaded into your memory.
>

This scenario was more to support a situation where you are querying the
database in a read-only manner, but where the query might last for a period
of time during which a write occurs.

So, rather than reading and invalidating on each file length change, we'd
just "pretend" that the file isn't growing at all, and perform the analysis
on a snapshot of the database.  When the client code is finished, it would
call continue(), at which point the database state would be allowed to sync
with the latest changes -- the client code wouldn't care, though, since it
is done with its analysis.
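A minimal sketch of the semantics I have in mind, stripped down to the file level (all names here are invented, and `resume` stands in for continue(), which is a Python keyword):

```python
class PausedSnapshot:
    """Hypothetical pause()/continue() behavior for a file-backed store.

    While paused, the reader pretends the file ends at the offset it
    saw at pause() time, so a long read-only query sees one consistent
    snapshot even if other processes append records meanwhile.
    """

    def __init__(self, f):
        self.f = f
        f.seek(0, 2)                    # SEEK_END at pause() time
        self.snapshot_eof = f.tell()    # bytes past this are invisible

    def read_visible(self):
        # Serve only bytes up to the snapshot EOF, whatever the real
        # EOF has grown to since.
        self.f.seek(0)
        return self.f.read(self.snapshot_eof)

    def resume(self):
        """The continue() step: sync with records appended while paused."""
        self.f.seek(0, 2)
        real_eof = self.f.tell()
        if real_eof != self.snapshot_eof:
            self.f.seek(self.snapshot_eof)
            tail = self.f.read(real_eof - self.snapshot_eof)
            self.snapshot_eof = real_eof
            return tail     # caller would process these as invalidations
        return b''
```

In the real thing, resume() is where David's conflict check would live: if the tail touches any loaded oid, raise rather than silently sync.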

However, as this would add new semantics and API to Durus, perhaps this is
something that would be best tackled some other time. :)


--
Matthew R. Scott