On Dec 15, 2005, at 11:36 AM, Jesus Cea wrote:

> At this point it would be enough to have a commitment to answer
> questions and -perhaps- a future link or reference on the Durus
> web page :-p

We can do that much, for sure.

>> I'd like to hear more about it.
>
> You already touched the principal points in your email.
>
> I see these problems in the current Durus implementation (3.1):
>
> * You need memory proportional to the number of objects in the
>   database, even if you never load any object.
>
>   - BerkeleyDB is a proper database, keeping indexes on disk (with
>     memory caching, of course). You need a fixed amount of memory,
>     independent of the object count.
>
> * Start time is proportional to the number of live and dead objects
>   in the database.
>
>   I've read "file_storage.py" and see the new index object to
>   improve start time. Nice :-p. But the index is only updated when
>   the storage is packed.
>
>   - BerkeleyDB start time is zero, independent of database size.
>
> * Since the current storage is append-only, you need to pack the
>   database from time to time.

All accurate up to here.

> While you are packing a database, access to objects is blocked.

Packing is incremental now, so there isn't any interruption of
service during a pack.

> While you are packing a database, you need double the disk space.
>
> - BerkeleyDB doesn't need packing to remove outdated instances.
>
> - With the proper code, BerkeleyDB doesn't need packing to remove
>   unreachable instances either. For example, by keeping a reference
>   count in objects. If this problem were solved, you would never
>   need to pack the storage.

Understood, but this is like memory leaks in general.

> - Even if you need to pack a BerkeleyDB backend, you can do it
>   lazily, using another thread in the storage server and allowing
>   concurrent access from clients. And you don't need extra storage.
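To make the memory/start-time contrast above concrete, here is a rough sketch of the two index strategies. Python's stdlib `dbm` module stands in for BerkeleyDB's on-disk B-tree; the `MemoryIndex`/`DiskIndex` names and the record layout are illustrative, not actual Durus code:

```python
import dbm
import os
import tempfile

# In-memory index, roughly as in Durus 3.1's FileStorage: maps oid to
# a file offset.  Building it means scanning the storage file at
# startup, and its size grows with the number of objects.
class MemoryIndex:
    def __init__(self):
        self.offsets = {}            # oid (bytes) -> offset (int)

    def set(self, oid, offset):
        self.offsets[oid] = offset

    def get(self, oid):
        return self.offsets[oid]

# On-disk index, BerkeleyDB-style: the mapping lives in the database
# file itself, so opening is O(1) and memory use is bounded by the
# cache size, not by the object count.
class DiskIndex:
    def __init__(self, path):
        self.db = dbm.open(path, 'c')   # opens instantly; no scan

    def set(self, oid, offset):
        self.db[oid] = str(offset).encode()

    def get(self, oid):
        return int(self.db[oid])

    def close(self):
        self.db.close()

path = os.path.join(tempfile.mkdtemp(), 'index')
idx = DiskIndex(path)
idx.set(b'oid-1', 4096)
print(idx.get(b'oid-1'))    # 4096
idx.close()

# Reopening rebuilds nothing: the index is already on disk.
idx2 = DiskIndex(path)
print(idx2.get(b'oid-1'))   # 4096
idx2.close()
```

The second open is the point: a FileStorage-style backend would have to rescan the file (or a pack-time index snapshot) here, while the on-disk index is simply reopened.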
> * Current Durus "suffers" if objects are modified frequently, since
>   the index gets outdated, the storage needs more frequent packs,
>   restarts are slower, and so on.

The restarts get slower the longer you go without packing. That's
right.

> - BerkeleyDB is not affected by the write ratio.

Understood.

> Moreover, BerkeleyDB has several advantages:
>
> * BerkeleyDB is transactional, so you can guarantee no corruption
>   (except on hard disk failure, of course). I know that the
>   append-only file that Durus currently uses is resilient to most
>   failures too.
>
> * By backing up the log files, BerkeleyDB allows hot and incremental
>   backups. Yes, I know that you can use rsync with the current Durus
>   to incrementally back up the file.

So on these last two, it seems that both BerkeleyDB and FileStorage
have the desired feature.

> * You can use the replication capability of the latest BerkeleyDB.
>   Maybe a 2007 Durus release :-) could allow native replication,
>   load sharing, etc. ZODB/ZEO are nice, but you have a single point
>   of failure in the ZODB.

That sounds complicated, but obviously important at some point.

>> How many instances do you want to support?
>
> 500 GB of hard disk :-p. I'm planning to migrate my mail system
> (currently about 300 GB of user mailboxes, stored in an UGLY mbox
> format) to Durus, but I need better write behaviour, since the
> mailbox read/write ratio is 1:1. I was planning to migrate the
> mboxes to BerkeleyDB, but developing a new backend for Durus could
> be more general, more useful to other people, more "pythonic", and
> nearly equal in development cost.

I'm all for it. Storage is relatively simple, too, so I bet this will
be pretty easy for you, since you are already tuned up on BerkeleyDB.

> Also, I plan to export on my LAN a "generic" persistence storage
> system based on Durus, for things like "Linda" blackboard
> coordination.
> The idea is to provide a persistence system in a "blackbox" on the
> LAN, usable by any client.

Generic in what sense? Blackbox in what sense? Do you mean you'd just
use a few primitive Persistent classes, so the clients would not have
to be updated?

Both of these ideas seem valuable and achievable. Please keep us
updated.
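For readers following along, the backend being proposed in this thread might look something like the toy sketch below: object records keyed by oid in an on-disk key-value store, overwritten in place so no packing is ever needed. Python's stdlib `dbm` again stands in for BerkeleyDB, and the method names (`load`/`store`/`new_oid`) only loosely mirror Durus's Storage interface; a real backend would subclass Durus's Storage class and wrap writes in BerkeleyDB transactions.

```python
import dbm
import os
import struct
import tempfile

class KVStorage:
    """Toy key-value object storage (hypothetical, not Durus code)."""

    def __init__(self, path):
        self.db = dbm.open(path, 'c')

    def new_oid(self):
        # Persist the oid counter in the store itself, so a restart
        # needs no scan of the existing records.
        if b'_last_oid' in self.db:
            last = int(self.db[b'_last_oid'])
        else:
            last = 0
        self.db[b'_last_oid'] = str(last + 1).encode()
        return struct.pack('>Q', last + 1)   # 8-byte oid

    def store(self, oid, record):
        # Overwrites in place: no append-only log, so no pack is
        # needed to reclaim space from outdated versions.
        self.db[b'o' + oid] = record

    def load(self, oid):
        return self.db[b'o' + oid]

    def close(self):
        self.db.close()

path = os.path.join(tempfile.mkdtemp(), 'objects')
s = KVStorage(path)
oid = s.new_oid()
s.store(oid, b'pickled state v1')
s.store(oid, b'pickled state v2')   # replaces v1; no garbage left
print(s.load(oid))                  # b'pickled state v2'
s.close()
```

Reopening `KVStorage(path)` after `close()` finds both the records and the oid counter already on disk, which is exactly the zero-cost-start property argued for above.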