David Binger wrote:
> The format I'm thinking about for the index is simple:
>
> 8 bytes length
> 8 bytes maximum oid
> 8 bytes maximum offset
> Ordered array of (trimmed_oid, trimmed_offset) records.
> (By trimming, I mean with left-side bytes removed
> that are null for all values in the array.)

How would this be read from disk in an efficient manner? Maybe we
need something like CDB: http://cr.yp.to/cdb.html. There is a
cdb.py module in the spambayes package.

If we are serious about making Durus scale to huge databases then I
think there are other issues to fix as well. For example, even with
an on-disk index for record offsets, packing still requires memory
proportional to the number of objects in the DB.

  Neil
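
To make the question about reading from disk concrete, here is a minimal sketch of the quoted layout. It is not Durus code, and several details are assumptions: the "length" field is taken to be the number of records, all integers are big-endian, oids are treated as plain integers, and the names `write_index` and `lookup` are hypothetical. Because the trimmed records are fixed-width, one possible way to read the index is a binary search that seeks directly to each record, so a lookup costs O(log n) small reads without loading the array into memory.

```python
import io
import struct

# Assumed header: record count, maximum oid, maximum offset (8 bytes each).
HEADER = struct.Struct(">QQQ")


def _width(max_value):
    """Bytes left per value once the shared leading null bytes are trimmed."""
    return max(1, (max_value.bit_length() + 7) // 8)


def write_index(f, pairs):
    """Write (oid, offset) integer pairs, sorted by oid, to binary file f."""
    pairs = sorted(pairs)
    max_oid = max((oid for oid, _ in pairs), default=0)
    max_offset = max((off for _, off in pairs), default=0)
    f.write(HEADER.pack(len(pairs), max_oid, max_offset))
    oid_w, off_w = _width(max_oid), _width(max_offset)
    for oid, off in pairs:
        f.write(oid.to_bytes(oid_w, "big") + off.to_bytes(off_w, "big"))


def lookup(f, oid):
    """Binary-search the on-disk array; return the offset for oid, or None."""
    f.seek(0)
    count, max_oid, max_offset = HEADER.unpack(f.read(HEADER.size))
    if oid < 0 or oid > max_oid:
        return None
    oid_w, off_w = _width(max_oid), _width(max_offset)
    record_size = oid_w + off_w
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        # Fixed-width records let us seek straight to record number mid.
        f.seek(HEADER.size + mid * record_size)
        data = f.read(record_size)
        found = int.from_bytes(data[:oid_w], "big")
        if found == oid:
            return int.from_bytes(data[oid_w:], "big")
        if found < oid:
            lo = mid + 1
        else:
            hi = mid
    return None


# Usage: an in-memory buffer stands in for the index file.
buf = io.BytesIO()
write_index(buf, [(5, 1000), (1, 24), (300, 70000)])
print(lookup(buf, 300), lookup(buf, 2))
```

A hash-based file such as CDB would trade the sorted order for fewer seeks per lookup; the sketch above only shows that the sorted, trimmed array can be searched in place. It does nothing about the memory used by packing.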