Hi Jesus,

A few comments.

Jesus Cea wrote:
> While you are packing a database, access to objects is blocked.

Not true in version 3.

> - BerkeleyDB doesn't need packing to remove outdated instances.
>
> - With proper code, BerkeleyDB doesn't need packing to remove
>   unreachable instances, for example by keeping a reference count in
>   objects. If this problem were solved, you wouldn't need to pack the
>   storage, ever.

With an object database, there must be some sort of garbage collection
in order to determine which objects are no longer reachable. It might
be possible to make the GC more efficient when BerkeleyDB is used, but
there is no way to eliminate it.

For your application it sounds like reference counting would work best.
However, other applications may have reference cycles within the
persistent set of objects. If those applications have similar space
efficiency requirements then they probably need a non-copying collector
(e.g. mark and sweep). I suspect some current GC research would be
applicable, since a recent objective there is to minimize virtual
memory paging.

> - BerkeleyDB is not affected by write ratio.

I find that hard to believe. In any case, the Durus client will be
affected by a high write rate. If the write rate is high enough then it
is probably more efficient to use a database that supports locking.

> Moreover BerkeleyDB has several advantages:
>
> * BerkeleyDB is transactional, so you can guarantee no corruption
>   (except hard disk failure, of course). I know that the append-only
>   file that Durus currently uses is resilient to most failures too.

Durus FileStorage is transactional and no less safe than BerkeleyDB.
IMHO, it is safer, because the storage format is so simple that you
have a chance of recovering the database even when things go horribly
wrong (e.g. bad memory that causes random scribbling over the file).

> * By backing up the log files, BerkeleyDB allows hot and incremental
>   backups. Yes, I know that you can use rsync with current Durus to
>   incrementally back up the file.

Again, no advantage to BerkeleyDB, AFAIK. There is no incremental
backup tool included with Durus, but writing one would be trivial.

> 500 GB of hard disk :-p. I'm planning to migrate my mail system
> (currently about 300 GB of user mailboxes, stored in an UGLY mbox
> format) to Durus, but I need better write behaviour, since mailbox
> read/write is 1:1. I was planning to migrate the mboxes to BerkeleyDB,
> but developing a new backend for Durus could be more general, more
> useful to other people, more "pythonic", and nearly equal in
> development cost.

Sounds like a fun project. You might also look into DirectoryStorage as
an alternative to BerkeleyDB. In any case, the storage backend will not
be the only limitation you run into; 500 GB is far more data than Durus
is designed to handle. Durus was designed to be simple and we
sacrificed some scalability to achieve that. I see no reason why a
Python object database couldn't handle that much data, but you are
going to have to make some different design tradeoffs.

Good luck with your project.

  Neil
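
P.S. To back up the claim that an incremental backup tool would be
trivial, here is roughly the sort of thing I have in mind. It is an
untested sketch, not something that ships with Durus, and it assumes
the storage file is append-only and only ever grows between packs
(after a pack you would need a fresh full copy, which the script
detects by comparing file sizes):

    import os
    import sys

    def incremental_backup(storage_path, backup_path):
        # Copy only the bytes appended since the last backup.  Run this
        # when the storage server is idle (or against a snapshot) so
        # that a half-written commit record is not copied.
        copied = 0
        if os.path.exists(backup_path):
            copied = os.path.getsize(backup_path)
        if copied > os.path.getsize(storage_path):
            # The storage was packed and rewritten; start over.
            os.remove(backup_path)
            copied = 0
        src = open(storage_path, 'rb')
        src.seek(copied)
        dst = open(backup_path, 'ab')
        while True:
            block = src.read(1 << 16)   # copy in 64 KB chunks
            if not block:
                break
            dst.write(block)
        dst.close()
        src.close()

    if __name__ == '__main__':
        incremental_backup(sys.argv[1], sys.argv[2])

Run from cron against the live file and you get cheap incremental
backups for the cost of reading the tail of the file.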