durusmail: durus-users: A newcomer and BerkeleyDB
A newcomer and BerkeleyDB
David Binger
2005-12-15
On Dec 15, 2005, at 11:36 AM, Jesus Cea wrote:

>
> At this point, a commitment to answer questions would be enough, and
> -perhaps- a future link or reference on the Durus webpage :-p

We can do that much, for sure.

>
>> I'd like to hear more about it.
>
> You already touched on the principal points in your email.
>
> I see these problems in the current Durus implementation (3.1):
>
> * You need memory proportional to the number of objects in the
> database, even if you never load any object.
>
>  - BerkeleyDB is a proper database, keeping indexes on disk (with
> memory caching, of course). You need a fixed amount of memory,
> independent of object count.
>
> * Start time is proportional to number of live and dead objects in
> database.
>
>  I've read "file_storage.py" and seen the new index object that
> improves start time. Nice :-p. But the index is only updated when
> the storage is packed.
>
>  - BerkeleyDB start time is zero, independent of database size.
>
> * Since the current storage is append only, you need to pack the
> database from time to time.

All accurate up to here.
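
To make the points above concrete, here is a minimal sketch of an append-only store with an in-memory index. It is a toy and an assumption on my part, not Durus's actual file format: the class name `AppendOnlyLog`, the 12-byte record header, and the `pack` method are all invented for illustration. It shows why memory grows with object count, why opening the file costs a scan over live and dead records, and why packing is needed to reclaim space.

```python
import io
import struct

HEADER = struct.Struct(">QI")  # hypothetical record header: oid, payload size


class AppendOnlyLog:
    """Toy append-only object store; not Durus's real on-disk format."""

    def __init__(self, fileobj):
        self.file = fileobj
        self.index = {}        # oid -> offset of newest record, held in RAM
        self._rebuild_index()  # scans every record, live or dead

    def _rebuild_index(self):
        self.file.seek(0)
        while True:
            offset = self.file.tell()
            header = self.file.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            oid, size = HEADER.unpack(header)
            self.file.seek(size, io.SEEK_CUR)  # skip the payload
            self.index[oid] = offset           # later records win

    def store(self, oid, data):
        # Never overwrite: a new version is appended, the old one is dead.
        self.file.seek(0, io.SEEK_END)
        self.index[oid] = self.file.tell()
        self.file.write(HEADER.pack(oid, len(data)) + data)

    def load(self, oid):
        self.file.seek(self.index[oid])
        _, size = HEADER.unpack(self.file.read(HEADER.size))
        return self.file.read(size)

    def pack(self):
        """Copy only the newest record of each oid into a fresh log."""
        packed = AppendOnlyLog(io.BytesIO())
        for oid in self.index:
            packed.store(oid, self.load(oid))
        return packed


log = AppendOnlyLog(io.BytesIO())
log.store(1, b"v1")
log.store(1, b"v2")      # the b"v1" record is now dead weight
log.store(2, b"other")
size_before = log.file.seek(0, io.SEEK_END)
packed = log.pack()
size_after = packed.file.seek(0, io.SEEK_END)
reopened = AppendOnlyLog(io.BytesIO(log.file.getvalue()))
```

Note that `pack` writes a second copy of the live data while the original still exists, which is where the extra disk space during a pack comes from.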
>
>  While you are packing a database, access to objects is blocked.

Packing is incremental now, so there isn't any interruption of
service during a pack.
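
As a rough illustration of what "incremental" can mean here, the sketch below copies live records in small batches and yields between batches so that a server loop can answer requests in the meantime. This is an assumption about the general technique, not a description of how Durus implements its pack; `pack_steps` and its parameters are hypothetical names.

```python
def pack_steps(records, live_oids, packed, batch_size=2):
    """Copy live records into `packed` a batch at a time.

    Yields the number of oids still pending after each batch, so the
    caller can do other work (such as serving reads) between steps.
    """
    pending = list(live_oids)
    while pending:
        for oid in pending[:batch_size]:
            packed[oid] = records[oid]
        del pending[:batch_size]
        yield len(pending)


records = {1: b"a", 2: b"b", 3: b"c", 4: b"dead"}
packed = {}
served = []
for remaining in pack_steps(records, [1, 2, 3], packed):
    # Between pack steps the old records are still readable.
    served.append(records[1])
```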

>
>  While you are packing a database, you need double disk space.
>
>  - BerkeleyDB doesn't need packing to remove outdated instances.
>
>  - With proper code, BerkeleyDB doesn't need packing to remove
> unreachable instances either, for example by keeping a reference
> count in objects. If this problem were solved, you would never need
> to pack the storage.

Understood, but this is like memory leaks in general.
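
The memory-leak comparison can be sketched as follows. This is a toy reference-counting store, with hypothetical names (`RefCountedStore`, `set_refs`) that do not come from Durus or BerkeleyDB. Records are freed as soon as their count drops to zero, but a cycle of records that only refer to each other is never freed, so some kind of occasional collection would presumably still be needed.

```python
class RefCountedStore:
    """Toy store that frees a record when its reference count hits zero."""

    def __init__(self):
        self.records = {}    # oid -> set of oids this record refers to
        self.refcount = {}   # oid -> number of incoming references

    def add(self, oid, refs=()):
        # Assumes oid has not been added before.
        self.records[oid] = set()
        self.refcount.setdefault(oid, 0)
        self.set_refs(oid, refs)

    def set_refs(self, oid, refs):
        old, new = self.records[oid], set(refs)
        self.records[oid] = new
        for target in new - old:
            self.refcount[target] = self.refcount.get(target, 0) + 1
        for target in old - new:
            self._decref(target)

    def _decref(self, oid):
        self.refcount[oid] -= 1
        if self.refcount[oid] == 0:
            # Free the record and release everything it referred to.
            for target in self.records.pop(oid, ()):
                self._decref(target)
            del self.refcount[oid]


store = RefCountedStore()
store.add("root")
store.refcount["root"] = 1            # pin the root so it is never freed
store.add("c", refs=["d"])            # a simple chain: c -> d
store.add("d")
store.add("a", refs=["b"])            # a cycle: a -> b -> a
store.add("b", refs=["a"])
store.set_refs("root", ["c", "a"])
store.set_refs("root", [])            # drop both references from the root
leaked = sorted(oid for oid in store.records if oid != "root")
```

After the last call the chain `c -> d` is gone, while `a` and `b` remain in the store even though nothing reachable from the root refers to them.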

>
>  - Even if you need to pack a BerkeleyDB backend, you can do it lazily
> using another thread in the storage server, allowing concurrent
> accesses from clients. And you don't need extra storage.
>
> * Current Durus "suffers" if objects are modified frequently, since
> the index gets outdated, the storage needs more frequent packs,
> restarts are slower, and so on.

The restarts get slower, the longer you go without packing.  That's
right.

>
>  - BerkeleyDB is not affected by write ratio.
>
Understood.

> Moreover BerkeleyDB has several advantages:
>
> * BerkeleyDB is transactional, so you can guarantee no corruption
> (except for hard disk failure, of course). I know that the append-only
> file that Durus currently uses is resilient to most failures too.
>
> * By backing up the log files, BerkeleyDB allows hot and incremental
> backups. Yes, I know that you can use rsync with current Durus to
> incrementally back up the file.

So on these last two, it seems that both BerkeleyDB and FileStorage
have the desired feature.

>
> * You can use the replication capability of the latest BerkeleyDB.
> Maybe a 2007 Durus release :-) could allow native replication, load
> sharing, etc. ZODB/ZEO are nice, but you have a single point of
> failure in the ZODB.

That sounds complicated, but obviously important at some point.

>
>> How many instances do you want to support?
>
> 500 GB of harddisk :-p. I'm planning to migrate my mail system
> (currently about 300 GB of user mailboxes, stored in an UGLY mbox
> format) to Durus, but I need better write behaviour, since mailbox
> read/write is 1:1. I was planning to migrate the mboxes to BerkeleyDB
> but developing a new backend for Durus could be more general, more
> useful to other people, more "pythonic", and nearly equal cost in
> development time.

I'm all for it.  Storage is relatively simple, too, so I bet this
will be pretty easy for you, since you are already tuned up on
BerkeleyDB.
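
For a sense of scale, a key-value-backed storage can be sketched in a few lines. Everything here is an assumption for illustration: the standard-library `dbm` module stands in for BerkeleyDB, and the class `KeyValueStorage` with its `new_oid`, `store`, and `load` methods is hypothetical, not the Durus storage interface. The point is only that the index lives on disk, records are overwritten in place, and reopening does not scan the data.

```python
import dbm
import os
import struct
import tempfile

OID = struct.Struct(">Q")          # hypothetical 8-byte big-endian oid key
NEXT_OID_KEY = b"meta:next_oid"    # hypothetical bookkeeping key


class KeyValueStorage:
    """Toy storage on top of an on-disk key-value database."""

    def __init__(self, path):
        # "c" opens the database read-write, creating it if needed.
        self.db = dbm.open(path, "c")

    def new_oid(self):
        if NEXT_OID_KEY in self.db:
            oid = OID.unpack(self.db[NEXT_OID_KEY])[0]
        else:
            oid = 1
        self.db[NEXT_OID_KEY] = OID.pack(oid + 1)
        return oid

    def store(self, oid, data):
        # Overwrites the previous version, so no dead records pile up.
        self.db[OID.pack(oid)] = data

    def load(self, oid):
        return self.db[OID.pack(oid)]

    def close(self):
        self.db.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "objects")
    storage = KeyValueStorage(path)
    oid = storage.new_oid()
    storage.store(oid, b"first version")
    storage.store(oid, b"second version")
    storage.close()

    reopened = KeyValueStorage(path)   # no index rebuild on open
    latest = reopened.load(oid)
    reopened.close()
```

A real backend would also need transactions, conflict detection, and the server protocol, none of which this sketch attempts.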

>
> Also, I plan to export on my LAN a "generic" persistence storage
> system based on Durus, for things like "Linda" blackboard
> coordination. The idea is to provide a persistence system in a
> "blackbox" on the LAN, usable by any client

Generic in what sense?
Blackbox in what sense?
Do you mean you'd just use a few primitive Persistent classes, so the
clients would not have to be updated?

Both of these ideas seem valuable and achievable.  Please
keep us updated.

