Re: Backup, redundancy etc.
Andrew Bettison
2007-04-11
Andrew Bettison wrote:
> > I have recently implemented a variant of FileStorage, which I call
> > SharedFileStorage, which allows more than one process to have a
> > single storage file open for reading and writing at once.  [...]

David Binger writes:
> SharedFileStorage does sound fast, and interesting.  One tricky part
> must be in making sure that every process detects and reads every new
> transaction to figure out what records to invalidate before
> continuing.  I guess you use frequent stat calls to determine if the
> file has changed?

I just check the file size on every sync and commit.  If it's increased
since we last synced or committed, then other processes have committed
in the meantime.  We read the new transactions from the last point we
reached, up to the new eof.
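
For concreteness, here is a rough Python sketch of that size-check
scheme.  The class and method names are made up for illustration; this
is not the actual SharedFileStorage code:

    import os

    class SizeCheckReader:
        def __init__(self, filename):
            self.fp = open(filename, 'rb')
            # Offset up to which we have already read transactions.
            self.last_end = os.fstat(self.fp.fileno()).st_size

        def new_transaction_bytes(self):
            # Return bytes appended by other processes since our last
            # sync or commit, or b'' if the file has not grown.
            current_size = os.fstat(self.fp.fileno()).st_size
            if current_size <= self.last_end:
                return b''
            self.fp.seek(self.last_end)
            data = self.fp.read(current_size - self.last_end)
            self.last_end = current_size
            return data

On sync, those bytes get parsed into transaction records so the cache
knows which oids to invalidate before continuing.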

SharedFileStorage weathers the stress test that comes with Durus.  It
just runs and runs, even with dozens of processes, some of them packing.
BTW, many, many thanks to whoever contributed test/stress.py!  It made
my job a lot easier!

David Binger:
> A downside is that every "client" process would need to hold the
> offset index in RAM.  That can be a significant hit when you get
> into the millions of persistent objects.

Yup.  That's been on my mind a bit.  I am considering two options:

 1) Using some kind of shared memory or file mapping scheme to hold the
 in-memory index.  I haven't yet looked into what POSIX offers in those
 areas, but I'm not hopeful.  This may end up having to be a
 Linux-specific solution, which is fine for my needs but not so useful
 for the community.  (A rough sketch of this idea follows the list.)

 2) Using a BTree for the index, like ZODB does.  This idea is growing on
 me.
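
To make option 1 a bit more concrete, here is one POSIX-only way it
could work: keep the oid-to-offset index as fixed-size records, sorted
by oid, in a separate file that every process maps read-only and
binary-searches, so no process needs a private in-RAM copy.  The record
layout and class name are purely illustrative, not anything
SharedFileStorage actually does:

    import mmap
    import os
    import struct

    RECORD = struct.Struct('>QQ')  # 8-byte oid, 8-byte storage-file offset

    class MappedIndex:
        def __init__(self, index_path):
            fd = os.open(index_path, os.O_RDONLY)
            size = os.fstat(fd).st_size
            # Map the whole index read-only; the kernel shares the pages
            # between processes.
            self.map = mmap.mmap(fd, size, prot=mmap.PROT_READ)
            self.count = size // RECORD.size

        def offset_for(self, oid):
            # Binary search over the sorted records in the mapping.
            lo, hi = 0, self.count
            while lo < hi:
                mid = (lo + hi) // 2
                record_oid, offset = RECORD.unpack_from(
                    self.map, mid * RECORD.size)
                if record_oid == oid:
                    return offset
                if record_oid < oid:
                    lo = mid + 1
                else:
                    hi = mid
            raise KeyError(oid)

Keeping a sorted index like that up to date across commits is the hard
part, which is part of why option 2 appeals.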

Andrew Bettison:
> > I have contemplated using SharedFileStorage to implement support for
> > live "mirror" file storages [...]

David Binger:
> I'm not sure what the real advantage of this system would be over
> having rsync maintain constant mirror of the storage file on another
> server.

A mirror file kept up to date using rsync couldn't be used as a live
shared file, because a client process may try to read it while rsync has
partially written it -- the classic race condition.  If rsync could be
made to flock(LOCK_EX) the file while it wrote it, then that might solve
the problem.  But the case of rsyncing a just-packed file opens a can of
worms.
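
To sketch what I mean by the locking (purely hypothetical, since stock
rsync doesn't do this): if the writer held an exclusive flock on the
mirror while rewriting it, a reader could take a shared lock before
touching the file and would never see a half-written copy:

    import fcntl

    def read_mirror(path):
        fp = open(path, 'rb')
        try:
            # Blocks while the (hypothetical) writer holds LOCK_EX.
            fcntl.flock(fp.fileno(), fcntl.LOCK_SH)
            try:
                return fp.read()
            finally:
                fcntl.flock(fp.fileno(), fcntl.LOCK_UN)
        finally:
            fp.close()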

If you don't want to use the mirror as a live (read-only) database, then
rsync would surely be the way to go as a backup solution.

--
Andrew Bettison 