On Mar 20, 2006, at 7:14 PM, Neil Schemenauer wrote:

> David Binger wrote:
>> The format I'm thinking about for the index is simple:
>>
>> 8 bytes length
>> 8 bytes maximum oid
>> 8 bytes maximum offset
>> Ordered array of (trimmed_oid, trimmed_offset) records.
>> (By trimming, I mean with left-side bytes removed
>> that are null for all values in the array.)
>
> How would this be read from disk in an efficient manner? Maybe we
> need something like CDB: http://cr.yp.to/cdb.html. There is a
> cdb.py module in the spambayes package.

If my reading of the cdb spec is right, cdb would require several times
more space for the index, both because it is designed for variable-length
keys and values (which we do not have), and because it is a hash table
with a non-perfect hash, so a third to a half of the allocated slots
must remain empty.

Whatever format we use for the on-disk index, it should take advantage
of the fact that we know the lengths of the keys and values in advance,
and we can afford (I think) to take the time to write them down in order.

My idea for the simple array format is to use a fixed-size index-index
in RAM to "hash" oids to the region of the array where the corresponding
offsets are stored. Within each zone, we can use a bisection search or
a linear search, just as we would for collisions in a hash table. The
fixed-size index-index serves the same purpose as cdb's initial index
of 256 hash tables, except that it is not limited to that predetermined
number of segments.

We also have an advantage over cdb in that we know we have a
long-running server that is going to use the index. The cdb format has
to be fast for one-shot accesses, but ours does not. With a fixed 100MB
of RAM, I think we could obtain offsets for at least hundreds of
millions of oids with pretty close to one disk read per oid.

> If we are serious about making Durus scale to huge databases then I
> think there are other issues to fix as well.
> For example, even with an on disk index for record offsets, packing
> still requires memory proportional to the number of objects in the DB.

I'm pretty sure we could modify the packing code to avoid that.

> Neil
>
> _______________________________________________
> Durus-users mailing list
> Durus-users@mems-exchange.org
> http://mail.mems-exchange.org/mailman/listinfo/durus-users