[Durus-users] Speeding up packing, rsync

2010-07-15

Neil Schemenauer

While waiting for a fairly big Durus DB to pack today, I remembered
an optimization idea I had but never had time to play with.

The layout of objects in the disk file makes a difference for
performance.  For best packing performance, the reads should be
sequential since that would be most efficient for IO scheduling.
Also, it would be nice if packing moved objects around as little as
possible in order to speed up rsyncing of DBs.
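
To put a number on "sequential", something like the following could
be run over the offsets in the order a pack reads them.  This is only
a rough sketch: seek_stats() is a made-up helper, not part of Durus,
and it ignores record sizes, so the distance is an approximation.

```python
def seek_stats(offsets):
    """Return (backward_seeks, total_distance) for reads issued in this order.

    `offsets` is a list of file offsets in the order they were read.
    Record sizes are ignored, so `total_distance` only approximates
    how far the read position jumps around.
    """
    backward = 0
    distance = 0
    for prev, cur in zip(offsets, offsets[1:]):
        if cur < prev:
            backward += 1  # the read position had to move backwards
        distance += abs(cur - prev)
    return backward, distance


# Hypothetical offsets: a perfectly sequential read versus a shuffled one.
sequential = seek_stats([0, 100, 200, 300])
shuffled = seek_stats([0, 300, 100, 200])
```

Fewer backward seeks and a smaller total distance would suggest a
layout that is friendlier to packing.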

I haven't had time to dig into it, but I suspect the relevant logic
is in file_storage.py:gen_oid_record().  It looks like references
are found using a depth first search.  Perhaps it would be better to
use breadth-first (e.g. using collections.deque for efficiency).  One
way to approach this would be to have file_storage.py log file
offsets and examine how their distribution is affected by different
packing algorithms.
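
To illustrate the difference in visiting order, here is a toy sketch.
It is not the Durus code: the graph is a hypothetical dict mapping an
oid to the oids it references, and depth_first() and breadth_first()
are names invented for the example.

```python
from collections import deque


def depth_first(graph, root):
    """Visit oids depth-first using an explicit stack."""
    seen, order, stack = set(), [], [root]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        order.append(oid)
        # Push in reverse so the first reference is visited first.
        for ref in reversed(graph.get(oid, ())):
            if ref not in seen:
                stack.append(ref)
    return order


def breadth_first(graph, root):
    """Visit oids level by level; deque.popleft() is O(1)."""
    seen, order, queue = {root}, [], deque([root])
    while queue:
        oid = queue.popleft()
        order.append(oid)
        for ref in graph.get(oid, ()):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return order


# Hypothetical reference graph: the root (0) refers to 1 and 2, and so on.
graph = {0: [1, 2], 1: [3, 4], 2: [5], 3: [], 4: [], 5: []}

dfs_order = depth_first(graph, 0)    # [0, 1, 3, 4, 2, 5]
bfs_order = breadth_first(graph, 0)  # [0, 1, 2, 3, 4, 5]
```

If records were written out in visiting order, the two traversals
would produce different layouts.  Which one gives better packing or
rsync behaviour would still have to be measured.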

It's possible that optimizing the layout for packing would slow down
normal access.  Anyhow, I thought it was an interesting idea.

  Neil