durusmail: durus-users: A newcomer and BerkeleyDB
A newcomer and BerkeleyDB
Jesus Cea
2005-12-15
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> I don't know of any such work.  I seem to remember that the ZODB
> once had a BerkeleyDB storage, but I think support for it was dropped
> for reasons I don't know.

I've read about the BerkeleyDB support in ZODB. The project is currently
abandoned:

http://www.zope.org/Wikis/ZODB/BerkeleyStorageDocs/BDBStorage.html

They describe several problems they encountered during development.
Except for the log issue (which I should investigate), I assume most
problems are caused by the fact that ZODB is very complex, very demanding
and very specific in its requirements.

Durus is considerably simpler, so a BerkeleyDB backend seems doable.

> Personally, I'd be glad if there were such an option for Durus.

Nice.

> In any case, please don't let the inclusion question
> influence your work.

At this point, a commitment to answer questions and -perhaps- a future
link or reference on the Durus webpage would be enough :-p

> I'd like to hear more about it.

You already touched on the principal points in your email.

I see these problems in the current Durus implementation (3.1):

* You need memory proportional to the number of objects in the database,
even if you never load any object.

 - BerkeleyDB is a proper database, keeping indexes on disk (with memory
caching, of course). You need a fixed amount of memory, independent of
object count.

* Start time is proportional to the number of live and dead objects in
the database.

 I've read "file_storage.py" and seen the new index object that improves
start time. Nice :-p. But the index is only updated when the storage is
packed.

 - BerkeleyDB start time is zero, independent of database size.

* Since the current storage is append only, you need to pack the
database from time to time.

 While you are packing a database, access to objects is blocked.

 While you are packing a database, you need twice the disk space.

 - BerkeleyDB doesn't need packing to remove outdated instances.

 - With proper code, BerkeleyDB doesn't need packing to remove
unreachable instances either, for example by keeping a reference count
in objects. If this problem were solved, you would never need to pack
the storage.

 - Even if you do need to pack a BerkeleyDB backend, you can do it lazily
using another thread in the storage server, allowing concurrent accesses
from clients. And you don't need extra storage.

* Current Durus "suffers" if objects are modified frequently, since the
index gets outdated, the storage needs more frequent packs, restarts
are slower, and so on.

 - BerkeleyDB is not affected by write ratio.
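
The costs listed above can be made concrete with a minimal sketch of a toy
append-only store. This is not the Durus file format or the Durus code: the
record layout, the class name AppendOnlyStore and all method names are
assumptions made only for illustration. It shows an in-memory index that
grows with the number of objects, a startup scan that reads every record
(live or superseded), and a pack step that writes a second file before
swapping it in.

```python
import os
import struct
import tempfile

# Hypothetical record layout for this sketch only (not the Durus format):
# 8-byte object id, 4-byte payload length, then the payload bytes.
HEADER = struct.Struct(">QI")


class AppendOnlyStore:
    """Toy append-only object store with an in-memory oid -> offset index."""

    def __init__(self, path):
        self.path = path
        self.index = {}            # one entry per object: memory grows with object count
        self.records_scanned = 0
        open(path, "ab").close()   # create the file if it does not exist
        self._rebuild_index()

    def _rebuild_index(self):
        # Startup cost: every record, live or superseded, is visited once.
        self.index.clear()
        self.records_scanned = 0
        with open(self.path, "rb") as f:
            while True:
                offset = f.tell()
                header = f.read(HEADER.size)
                if len(header) < HEADER.size:
                    break
                oid, length = HEADER.unpack(header)
                f.seek(length, os.SEEK_CUR)   # skip the payload
                self.index[oid] = offset      # a later record replaces an earlier one
                self.records_scanned += 1

    def store(self, oid, data):
        # Updates never overwrite: the old record stays in the file as garbage.
        with open(self.path, "r+b") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(HEADER.pack(oid, len(data)) + data)
        self.index[oid] = offset

    def load(self, oid):
        with open(self.path, "rb") as f:
            f.seek(self.index[oid])
            _, length = HEADER.unpack(f.read(HEADER.size))
            return f.read(length)

    def pack(self):
        # Copy the newest record of each object into a second file, then
        # swap it in. Both files exist on disk while the copy is running.
        tmp_path = self.path + ".pack"
        with open(tmp_path, "wb") as out:
            for oid in sorted(self.index):
                data = self.load(oid)
                out.write(HEADER.pack(oid, len(data)) + data)
        os.replace(tmp_path, self.path)
        self._rebuild_index()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.store")
    store = AppendOnlyStore(path)
    for version in range(5):          # rewrite the same 3 objects 5 times
        for oid in range(3):
            store.store(oid, b"v%d" % version)

    size_before = os.path.getsize(path)
    reopened = AppendOnlyStore(path)  # a restart scans all records
    scanned_before = reopened.records_scanned

    reopened.pack()
    size_after = os.path.getsize(path)
    scanned_after = reopened.records_scanned
    latest = reopened.load(1)
```

In this toy, the restart scans every version ever written, and only the pack
brings the scan back down to one record per object. A store that keeps its
index on disk and updates records in place would avoid both the scan and the
second file, which is the argument made in the list above.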

Moreover, BerkeleyDB has several advantages:

* BerkeleyDB is transactional, so you can guarantee no corruption (except
for hard disk failure, of course). I know that the append-only file that
Durus currently uses is resilient to most failures too.

* By backing up the log files, BerkeleyDB allows hot and incremental
backups. Yes, I know that you can use rsync with current Durus to
back up the file incrementally.

* You can use the replication capability of the latest BerkeleyDB
releases. Maybe a 2007 Durus release :-) could allow native replication,
load sharing, etc. ZODB/ZEO are nice, but you have a single point of
failure in the ZODB.

> How many instances do you want to support?

500 GB of hard disk :-p. I'm planning to migrate my mail system
(currently about 300 GB of user mailboxes, stored in an UGLY mbox
format) to Durus, but I need better write behaviour, since the mailbox
read/write ratio is 1:1. I was planning to migrate the mboxes to
BerkeleyDB, but developing a new backend for Durus could be more general,
more useful to other people, more "pythonic", and nearly equal in
development cost.

Also, I plan to offer on my LAN a "generic" persistent storage system
based on Durus, for things like "Linda" blackboard coordination. The
idea is to provide a persistence system as a "black box" on the LAN,
usable by any client.

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea@argo.es http://www.argo.es/~jcea/ _/_/    _/_/  _/_/    _/_/  _/_/
                                      _/_/    _/_/          _/_/_/_/_/
PGP Key Available at KeyServ   _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBQ6GbcJlgi5GaxT1NAQK6GQP/ZFnylQeFaP7vOaPdPi6vJoYSTIquU1Yw
cRROCXmRPngNvQynjrbIlyHXjKaXBMd8gywOkX9tI5Jf7RFz+wUlwDOqlIvDFoyZ
G2harJL6yPrLaHIYRpsnF3sSf8lfdZhyd+QR90bDQjgSrBv+quslyaxE2Whzclyh
n+Y7im3KSzs=
=x4Qc
-----END PGP SIGNATURE-----