Re: [Durus-users] Re: Anybody willing to try BerkeleyDB Storage?

On Mar 19, 2006, at 8:40 PM, Peter Wilkinson wrote:

> For me the bottom line is that a SQL data store is able to offer
> fast start-up and little memory usage, traded off against that is
> the added complexity and overhead of how to handle packing well.
> Personally I think if I were to pursue improving the large object
> count performance of Durus (which is where my travels started) I'd
> focus on just the on disk index of individual records so maybe
> somehow it can be made that such that it just doesn't need to be
> read completely on start-up.

Here is my idea for minimizing FileStorage startup time and memory
usage.

The FileStorage2 format includes an on-disk index of offsets for
records included in the last pack.  It stores this on-disk index as a
compressed pickle of a dict, a format which is nice if you want to
load the whole thing into memory, but not useful if you want to use
it directly.

My proposal is to write the index in a format that can be used
directly from the disk.  The server would maintain an in-memory index
of all records written since the last pack.  The pre-pack offsets
could be obtained as needed from the disk, or, if desired, they could
be loaded into memory too, as they are now.
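
To make the two-level lookup concrete, here is a rough sketch.  This
is not existing Durus code: the class name OffsetIndex, its methods,
and the disk_lookup callable are all made up for illustration, and
oids are treated as plain ints.

```python
class OffsetIndex:
    """Hypothetical two-level index: recent writes in memory, the rest on disk."""

    def __init__(self, disk_lookup):
        # oid -> offset for records written since the last pack.
        self.recent = {}
        # Assumed callable: disk_lookup(oid) returns an offset, or None if absent.
        self.disk_lookup = disk_lookup

    def record_write(self, oid, offset):
        # Called whenever a record is appended after the last pack.
        self.recent[oid] = offset

    def get(self, oid):
        # Newer records shadow the packed ones, so check memory first.
        offset = self.recent.get(oid)
        if offset is None:
            offset = self.disk_lookup(oid)
        return offset
```

Passing the get method of a dict as disk_lookup would give the
current load-everything behavior, so the two modes could share one
code path.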

The format I'm thinking about for the index is simple:

8 bytes length
8 bytes maximum oid
8 bytes maximum offset
Ordered array of (trimmed_oid, trimmed_offset) records.
(By trimming, I mean with left-side bytes removed
that are null for all values in the array.)
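
A minimal sketch of how this layout might be written and searched
follows.  Several details are my assumptions and not part of the
proposal above: the "length" field is taken to be the number of
records, integers are big-endian, oids are handled as ints rather
than 8-byte strings, and the record widths are derived from the two
maximum values in the header.  The function names are hypothetical.

```python
import io
import struct

# Assumed header: record count, maximum oid, maximum offset (big-endian).
HEADER = struct.Struct(">QQQ")

def byte_width(n):
    """Bytes needed to hold n once leading null bytes are trimmed (at least 1)."""
    return max(1, (n.bit_length() + 7) // 8)

def write_index(f, offsets):
    """Write an {oid: offset} dict as a header plus sorted fixed-width records."""
    items = sorted(offsets.items())
    max_oid = max((oid for oid, _ in items), default=0)
    max_off = max((off for _, off in items), default=0)
    f.write(HEADER.pack(len(items), max_oid, max_off))
    oid_w, off_w = byte_width(max_oid), byte_width(max_off)
    for oid, off in items:
        f.write(oid.to_bytes(oid_w, "big") + off.to_bytes(off_w, "big"))

def lookup(f, oid, start=0):
    """Binary-search the on-disk array; return the offset or None."""
    f.seek(start)
    count, max_oid, max_off = HEADER.unpack(f.read(HEADER.size))
    if oid > max_oid:
        return None
    oid_w, off_w = byte_width(max_oid), byte_width(max_off)
    rec_w = oid_w + off_w
    base = start + HEADER.size
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(base + mid * rec_w)
        rec = f.read(rec_w)
        key = int.from_bytes(rec[:oid_w], "big")
        if key == oid:
            return int.from_bytes(rec[oid_w:], "big")
        if key < oid:
            lo = mid + 1
        else:
            hi = mid
    return None

# Small in-memory demonstration.
buf = io.BytesIO()
write_index(buf, {1: 24, 300: 1000, 70000: 5})
found = lookup(buf, 300)
```

Because every record has the same width, a lookup needs only about
log2(n) seeks and never reads the whole array, which is the point of
keeping the records ordered and trimmed to a fixed size.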