Re: [Durus-users] Durus and write intensive applications

Durus and write intensive applications

2004-12-08

Syver Enstad

2004-12-08

2004-12-08

2004-12-08

2004-12-08

2004-12-08

2004-12-08

Durus and write intensive applications

David Binger

2004-12-08

The PersistentDict is a simple, reliable mapping structure.
It just wraps an underlying dict and makes sure that changes
are transactional.

A commit including a change to a PersistentDict (or any Persistent
instance) writes a pickle of the changed instance to disk.
For a PersistentDict, the pickle must include the pickle of the
underlying dict.  The dict's pickle is bigger when there are more items.
Roughly speaking, the space required to commit any change to a
PersistentDict is a linear function of the number of items.

A BTree acts like a PersistentDict, except that the mapping
is divided up into lots of internal BNode instances.
When you add or remove an item from a BTree, some of
these BNode instances must be changed, but only a small number.
The space-cost of changing a BTree is limited to the size
of the pickles of this small number of changed BNodes.

This is where the degree comes in.  BTrees with lower node degree
are divided into smaller parts.  A change will be more likely to
involve more nodes, but the size of each node's pickle is smaller.
There is a trade-off, then, and I don't think there is one answer
that fits all situations.  One thing to consider is that the
*minimum* space cost of the change of a BTree is the size of
the pickle of one of your BNodes.   If you have a high change rate,
that might lead you to have small node degree.  On the other
hand, higher-degree nodes require fewer load operations, so they
should provide for faster access.

If anyone wants to experiment with this, here is an easy way:
Run a durus server with logginglevel=1, so that everything is logged.
I think the log includes size information for loads and commits,
so you can run a client that tinkers with PersistentDict and BTree
instances and see just what is going on.   That might help you
decide what kind of mapping suits your application.