durusmail: durus-users: server-less multi-process concurrency for durus
Binger David
2009-09-15
Hi Matt,

I'm glad to hear from you.

It seems like we've looked down this road before and never decided to
take it, but maybe it will look different this time.  One thing that
makes it more attractive now than before is the on-disk index provided
by ShelfStorage: it means that the in-memory index is not much of a
burden.


On Sep 15, 2009, at 12:28 AM, Matthew Scott wrote:

> David (and anyone else interested),
>
> I'd like you to nitpick this idea I have to add server-less, multi-
> process concurrency to Durus.
>
> Goal:  Allow more than one process and/or thread on a single machine
> to reliably access and write to a Durus file, without having to
> maintain a separate process for a Durus server, and without having
> to install more Python packages.

The net improvement, then, is that you don't need to maintain a
separate process for the durus server.  Maintaining the process seems
like no inconvenience at all: am I right that it is having to *start*
the separate process that is the requirement we would like to
eliminate?  An alternative approach would be to have every client
attempt to start a server process whenever none is present; the
attempt will simply fail for all but the first.  Then the human
management can avoid thinking about the server process.
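That alternative can be sketched without any Durus specifics: use the
ability to bind a well-known local port as the machine-wide mutex, so
the first client becomes the server and later clients simply connect.
The function name and port number here are illustrative, not part of
Durus:

```python
import socket

def try_become_server(port=2972):
    # Bind a well-known local port; the OS guarantees only one process
    # can hold it, so the first caller wins the server role.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(5)
        return sock      # we are the server: keep the socket and serve
    except OSError:
        sock.close()
        return None      # a server is already running: connect as a client
```

Every client calls this at startup; a non-None result means it should
go on to run the server loop, and None means it should connect as an
ordinary client.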

>
> I'll present several actions that a process or thread might take,
> and how Durus would behave to support those actions.
>
> Please let me know if you think this will work, any details or
> quirks you can think of that might get in the way, etc.  In return I
> will put some time into this to see if I can produce a working patch.
>
> == Lock file ==
> Kept as a separate file, "mydatabase.durus.lock" being locked only
> during write operations.
> This is to allow all processes to at least read up to a "known good"
> EOF marker while another process is writing a transaction.
>
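A minimal sketch of such a lock file, using POSIX advisory locks
(fcntl is POSIX-only; Windows would need msvcrt.locking or similar),
following the "mydatabase.durus.lock" naming convention above:

```python
import fcntl
import os
from contextlib import contextmanager

@contextmanager
def write_lock(path):
    # Hold an exclusive advisory lock on <path>.lock for the duration
    # of a write.  Readers never take the lock; they only trust data
    # up to the last known-good EOF offset.
    fd = os.open(path + ".lock", os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until the writer ahead of us finishes
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```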
> == Packing ==
> I haven't thought through this part as of yet, but I'm not worried
> about it at the moment.

I seem to recall that this is one of the more difficult issues to solve.
It is important, too, so I'd suggest worrying about it now.

The other hard issue is oid allocation.  ShelfStorage gets a big
advantage from having the entire space of oids contained within a
compact range of values, so you can't just spread oids out.  And of
course, there must be some way to make sure that no two clients use
the same new oid.
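One way to satisfy both constraints (a compact oid range and no
duplicates) is to reserve oids in batches from a shared counter file
that is only ever bumped under an exclusive lock.  A sketch under
assumed names, not anything Durus does today:

```python
import os
import struct
import fcntl

def allocate_oids(counter_path, count):
    # Reserve `count` consecutive oids by bumping a shared 8-byte
    # counter file under an exclusive lock.  Keeps the oid space
    # compact (good for ShelfStorage) while guaranteeing that no two
    # clients ever reuse an oid.
    fd = os.open(counter_path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        data = os.read(fd, 8)
        next_oid = struct.unpack(">Q", data)[0] if len(data) == 8 else 0
        os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, struct.pack(">Q", next_oid + count))
        return range(next_oid, next_oid + count)
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

Batching keeps the lock traffic low: a client that reserves, say, 64
oids at a time only touches the counter once per 64 new objects, at
the cost of small gaps in the oid range when a client exits early.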

>
> == Initial opening of a file ==
> 1. seek(SEEK_END), tell(), keep as current EOF offset
> 2. read initial state from file, create in-memory state
>
> == Retrieve from a file that has not been updated ==
> 1. seek(SEEK_END), tell(), offset is the same as the current EOF
> offset
> 2. read requested object
>
> == Retrieve from a file that has been updated ==
> 1. seek(SEEK_END), tell(), offset is different from the current EOF
> offset
> 2. read records from current EOF offset to new EOF, update in-memory
> state
> 3. read requested object
>
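The "seek/tell dance" in both retrieve cases reduces to one cheap
staleness check; a sketch:

```python
import os

def file_grew(file, known_eof):
    # Compare the real end-of-file against the EOF recorded at the last
    # read or commit.  Any growth means other writers appended records
    # that must be processed as invalidations before reading.
    file.seek(0, os.SEEK_END)
    return file.tell() != known_eof
```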
> == commit() objects to a file that has not been updated ==
> 1. acquire exclusive lock
> 2. seek(SEEK_END), tell(), offset is the same as the current EOF
> offset
> 3. write records for new objects, update current EOF to new EOF
> 4. upon commit or rollback, release lock
>
> == commit() objects to a file that has been updated ==
> 1. acquire exclusive lock
> 2. seek(SEEK_END), tell(), offset is different from current EOF offset
> 3. read records from current EOF offset to new EOF, update in-memory
> state
>       a. if conflict, raise WriteConflictError

This can be done, but it sounds easier than it really is.  You'll need
to read the tail, find the oids, and make sure that none of them have
states loaded in your cache during this transaction.  It isn't enough
just to look for conflicts with the oids you are writing.  This could
potentially be a slow operation, and it has a cost that the
server-based durus avoids.
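A sketch of that tail scan.  The record layout here (8-byte oid,
4-byte big-endian length, then the state bytes) is invented for
illustration and differs from the real FileStorage format; the point
is the conflict rule: any appended oid whose state is loaded in our
cache is a conflict, not just the oids we are writing.

```python
import struct

class WriteConflictError(Exception):
    pass

def scan_tail(file, known_eof, loaded_oids):
    # Read the records appended between our known EOF and the real EOF,
    # collect the oids they touch, and raise if any of those oids
    # already has a state loaded in our cache.
    file.seek(0, 2)                      # os.SEEK_END
    new_eof = file.tell()
    file.seek(known_eof)
    invalidated = set()
    while file.tell() < new_eof:
        oid = file.read(8)
        (length,) = struct.unpack(">I", file.read(4))
        file.seek(length, 1)             # skip the pickled state
        invalidated.add(oid)
    conflicts = invalidated & set(loaded_oids)
    if conflicts:
        raise WriteConflictError(sorted(conflicts))
    return invalidated, new_eof          # caller evicts these from its cache
```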


>       b. otherwise, write records for new objects, update current EOF to
> new EOF
> 4. always, release lock
>
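Putting steps 1-4 together, with the conflict check abstracted behind
a caller-supplied callable (all names here are hypothetical).  Note
the finally clause: the lock is released whether the commit succeeds
or the check raises, matching "always, release lock":

```python
import fcntl
import os

def locked_commit(file, lock_fd, check_tail, new_records):
    # check_tail is a caller-supplied callable that reads the records
    # appended since our last known EOF and raises a conflict error on
    # overlap with our cache.
    fcntl.flock(lock_fd, fcntl.LOCK_EX)      # step 1
    try:
        check_tail(file)                     # steps 2-3a: may raise
        file.seek(0, os.SEEK_END)
        for record in new_records:           # step 3b
            file.write(record)
        file.flush()
        os.fsync(file.fileno())              # durable before releasing the lock
        return file.tell()                   # new known-good EOF
    finally:
        fcntl.flock(lock_fd, fcntl.LOCK_UN)  # step 4: always release
```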
> == Perform a consistent read across a data set ==
> 1. call pause() on the Durus connection
> 2. seek(SEEK_END), tell(), if offset is different from current EOF
> offset, read records and update in-memory state

You must also check for conflicts here and raise an exception if any
of the loaded objects have a new state.

Any time that the length of the file has changed, you must read the
oids, process them as invalidations, and raise the conflict exception
if any of the changed or new oids have state that is loaded into your
memory.

> 3. read requested objects, never doing seek/tell dance
>     ... some time later ...
> 4. call continue() on the Durus connection
>
>
> Thanks!
>
> - Matt
>
> _______________________________________________
> Durus-users mailing list
> Durus-users@mems-exchange.org
> http://mail.mems-exchange.org/mailman/listinfo/durus-users
