durusmail: durus-users: Replication - thoughts, patches, thinking out loud.
Peter Wilkinson
2009-07-02
Hi David,

Thanks for having a look and sorry for the slow response.

Underlying a lot of my ongoing experimentation is growing databases,
many GBs so far and continuing to grow. Getting foolproof, fast
replication running against big databases is a priority. I currently
use full rsyncs but have been trying to think of ways to be more
efficient.

One of my hats in my day job is as a sysadmin; we use rsync heavily
on busy machines and I've grown wary of the storm of IO it can
generate on lots of data. Getting rsync not to read all of the source
and destination requires the append option that has been spoken about
here before, which leads to the issue of ensuring that the destination
file is a strict prefix of the source one.
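
To make that concrete, the prefix check I have in mind is roughly
this (just a sketch, with made-up names; SHA-256 here is my choice
for illustration, not anything rsync or Durus does):

import hashlib

def is_strict_prefix(master_path, slave_len, slave_digest, chunk=1 << 20):
    # Hash the first slave_len bytes of the master and compare with a
    # digest the slave computed over its whole file.
    h = hashlib.sha256()
    remaining = slave_len
    with open(master_path, 'rb') as f:
        while remaining:
            data = f.read(min(chunk, remaining))
            if not data:
                return False  # master is shorter than the slave
            h.update(data)
            remaining -= len(data)
    return h.hexdigest() == slave_digest

If that returns True, appending from offset slave_len is safe; if
not, the only safe move is a full copy.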

This is where my experimentation started: how can we know that the
destination is that strict prefix? My first thought was to just give
each file a unique header and compare those, but then I realised that
to append anything to the slave, the data has to come from the
corresponding point in the master, which is hard to know after any
interruption of the appending occurs, e.g. if the slave server is
offline for 10 minutes while getting an update, from where in the
master does the data get read?

The two options here, as I see it, are to run a full rsync for each
replication run, so that the slave state can be anything at all and
it will be cleaned up, or to keep track of some structure in the
slave and master, compare where the slave is relative to the master,
and therefore be able to append cleanly. I was very pleasantly
surprised at how little change to the shelf code was needed to get
enough structure to use.
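
Concretely, the handshake for the second option looks something like
this (a rough sketch only; the names are made up, and the real
experiment works against the shelf's record structure rather than a
raw byte tail):

import hashlib
import os

TAIL = 4096  # how much of the slave's tail to verify against the master

def slave_state(path):
    # What the slave reports: its current length plus a digest of its tail.
    size = os.path.getsize(path)
    with open(path, 'rb') as f:
        f.seek(max(0, size - TAIL))
        return size, hashlib.sha256(f.read()).hexdigest()

def master_tail(path, slave_size, slave_digest):
    # On the master: check that this file matches the slave's tail at
    # the same offset, then hand back the bytes the slave is missing.
    with open(path, 'rb') as f:
        f.seek(max(0, slave_size - TAIL))
        if hashlib.sha256(f.read(min(TAIL, slave_size))).hexdigest() != slave_digest:
            return None  # slave has diverged (e.g. after a pack): full copy needed
        f.seek(slave_size)
        return f.read()  # in real code this would be streamed, not slurped

The point is just that the slave never has to guess the resume
offset: it proves its tail matches the master at that offset before
anything is appended.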

I hope that clears up where I'm coming from. If nothing else, I
continue to learn a lot about Durus, which is delightfully small for
what it does: small enough for the concepts and code to fit in my
head ;-)

Peter W.

On 26/06/2009, at 11:00 PM, Binger David wrote:

> It seems like your replication strategy works unnecessarily hard to
> track transaction
> boundaries.  If the master fails and the slave has a partial
> transaction,  then the
> new server process would need to truncate the partial transaction at
> startup,
> just as it would if the same condition happened without replication
> involved.
>
> The careful rsync strategy that I think I've posted here earlier can
> easily
> run every minute, and it recognizes when packs happen.  If you need
> more frequent
> updates, I think you can use the same inode-checking strategy along
> with
> a remote "tail -f" to get the job done.   Is that not right?
>
> I think what you've done is cool; I'm just not sure if it is cool
> enough to change the
> file format. Am I overlooking something?
>
>
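
For anyone following along, the inode check David refers to boils
down to something like this (my own sketch, not his posted code; a
pack rewrites the storage file and renames it into place, so a
changed inode means the slave's copy can no longer be treated as a
prefix):

import os

def pack_happened(path, last_inode):
    # Compare the storage file's current inode with the one recorded
    # on the previous replication run.
    inode = os.stat(path).st_ino
    return inode != last_inode, inode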
