A final revision for durus Doc
Jesus Cea
2006-04-24

This document, along with any changes you may suggest from now
on, will be included in the next release of my BerkeleyDB
storage backend for Durus. If the Durus maintainers want to
distribute it with a future Durus release, let me know.

If you find any errors, please let me know.



=====

$Id: KNOW_HOW-DURUS 121 2006-04-24 19:29:52Z jcea $
#
# (c) 2006 jcea@argo.es / jcea@hispasec.com
#
# PGP/GPG Public Key:
# pub   1024R/9AC53D4D 1995-01-01
#       Key fingerprint = F4 07 90 C2 58 86 8A 75  45 40 33 1C 72 4C E5 E1
# uid                  Jesus Cea Avion 
# uid                  Jesus Cea Avion 
#
# This product is covered by the GNU PUBLIC LICENSE, VERSION 2.
# For more details, read the file "LICENSE" in the distribution.


This document is not about the BerkeleyDB storage engine
for Durus itself; it tries to clarify Durus operation and
inner workings.

This reference describes the operation of the Durus 3.3 release.

If you find an error in the documentation, or would like it to
be expanded to include new topics, send a mail to jcea@argo.es.

For easy navigation in this document, each section begins with
the sequence "###". You can search for that sequence to move
around this text. Each section also records the date of its
last modification.


### Concurrency using the Durus Storage Server (20060421)

The Durus Storage Server allows a single storage to be
shared between several (remote) clients. You can access
the storage remotely, and writes by one client become
visible to the others.

The Durus Storage Server listens for requests from all
connected clients, but while it services a request, the
server is busy with that client ALONE. Other requests
are queued until the current one finishes. If that client
is very slow, or disk access is slow, the server sits
idle even while other clients are demanding attention.

I hope a future Durus release can process multiple read
requests in parallel. Each client would wait less, and
the disk would be better utilized (it is better to sort
multiple seeks serving several requests than to do one
long seek serving a single request).

Remember, nevertheless, that Durus clients keep a local cache
to avoid hitting the storage server. Sizing that cache, and
learning how to use it effectively, are important issues in
any demanding Durus deployment.
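
As an illustration, here is a minimal sketch of opening a
client connection with an explicit cache size. The host, port
and size values are assumptions for the example (2972 is the
usual default port):

    from durus.client_storage import ClientStorage
    from durus.connection import Connection

    # Connect to a running Durus Storage Server.
    storage = ClientStorage(host='localhost', port=2972)
    # cache_size is the target number of objects kept in the
    # local client cache.
    connection = Connection(storage, cache_size=100000)
    root = connection.get_root()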


### ACID using the Durus Storage Server (20060421)

ACID = Atomicity, Consistency, Isolation, Durability

DSS = Durus Storage Server

Since the DSS only processes a request from a single client
at any time, commits are atomic. No other client will be
served until the commit completes.

Durability is guaranteed by the storage backend used by
Durus. Some backends (for example, my BerkeleyDB storage
backend) can be configured to not guarantee durability
in exchange for (vastly) improved performance. Some
applications can take advantage of that. Others require
durability.

Transactions under the DSS are isolated. If you don't play
any dirty tricks, the DSS guarantees "degree 3 isolation".
That is, you only see committed data, and reads are
repeatable.

You shouldn't do it, but if you manually request a cache
shrink, the DSS only guarantees "degree 2 isolation".
That is, you could get different data in two reads of the
same object.

This could be a bug:
http://mail.mems-exchange.org/durusmail/durus-users/514/

Consistency is also provided by the storage backend used
by Durus: no transaction can leave the storage in a
physically inconsistent state. If the application logic
has integrity constraints, they must be enforced by the
application.


### Durus Storage Server conflicts (20060423)

Durus clients implement a local cache to improve
performance by avoiding DSS accesses. Objects
fetched or written are kept in the cache. The cache
size is configurable, and evictions are transparent.

The eviction routine can be called directly
(losing "degree 3 isolation") or, better, run automatically
when you do a transaction commit or abort
(keeping "degree 3 isolation").

Cache consistency is checked when you do a commit or abort.

If you do an abort, locally modified objects are purged.
If the cache holds objects that other clients modified,
they are purged too. So, after an abort, your cache only
keeps objects unmodified both locally and remotely.

If you do a commit, it will fail if your cache holds any
object remotely modified by another client, even if you
didn't use that object in the current transaction. That can
be an issue, and I would like to see it improved. If your
commit conflicts, the eviction procedure is the same as in
the abort case.

If your commit succeeds, your cache was consistent, and
it remains untouched.

Some discussion about this issue:

http://mail.mems-exchange.org/durusmail/durus-users/508/
http://mail.mems-exchange.org/durusmail/durus-users/514/

Another important issue is that the DSS keeps a changeset
per connected client, with the OIDs of the objects
changed (and committed) by the other Durus clients. That
changeset is sent to its client (and then cleared) when
the client does a commit or an abort, in order to
synchronize its cache. This system has two consequences:

1. An idle Durus client will have a growing changeset
   stored in the DSS, waiting for a commit/abort. If
   the storage write rate is high, it can be advisable
   for idle clients to do a periodic "abort" to sync
   their caches and keep the changeset size low enough
   (see the sketch after this list).

   If the "idle" client has a very low "duty" cycle,
   could be better to simply break the link to DSS.

   The changeset size depends on the number of distinct
   objects changed, and on the change rate. If you have a
   lot of writes to a small object set, the changeset
   stays small: it tracks which objects were changed,
   not how many changes were done to each object.

2. If a client is going to start a new transaction, but
   its last activity was a while ago, it is advisable to
   do an "abort" just before beginning the transaction,
   to synchronize the "cold" cache and reduce the risk
   of Durus discovering stale data at commit time.

   http://mail.mems-exchange.org/durusmail/durus-users/379/

   Also, keep your transactions as short as you can,
   to reduce conflict risk.
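
A minimal sketch of the periodic abort for a mostly-idle
client (the interval is an arbitrary choice for the example):

    import time

    def keep_cache_fresh(connection, interval=60):
        # abort() fetches the pending invalidations and purges
        # stale objects, keeping the server-side changeset small.
        while True:
            time.sleep(interval)
            connection.abort()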

Remember: you will get a conflict for any object in your
cache that was modified remotely, even if you didn't
access that object in the current transaction.

The usual approach to conflicts is to repeat the computation
with up-to-date data and try again, as many times as needed.
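
For example, a minimal retry-loop sketch ("do_work" is a
hypothetical application function):

    from durus.error import ConflictError

    def run_transaction(connection, do_work):
        while True:
            try:
                do_work(connection.get_root())
                connection.commit()
                return
            except ConflictError:
                # Purge stale objects; the next try will see
                # fresh data.
                connection.abort()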


### Persistent object references and transactions (20060424)

Keeping persistent object references around between
transactions is asking for trouble. You SHOULDN'T
do it.

Your references can become stale without notice, especially
with storage backends like my BerkeleyDB storage backend,
which deletes garbage objects promptly. Example:

http://mail.mems-exchange.org/durusmail/durus-users/397/

Other problems with keeping references around are:

- Unbounded object cache growth, since a reference keeps
  the object in memory.

You should always reach objects from the "root" object,
without keeping intermediate references, unless you know
what you are doing and understand the lifecycle of those
objects. You can safely keep references while inside a
transaction, but discard them at commit/abort time.

You can, nevertheless, keep weak references to loaded
objects across transaction boundaries. More on that later.
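
A minimal sketch of the recommended pattern, given an open
connection (the "users" key and the attribute are
hypothetical examples):

    # Inside each transaction, start from the root again:
    root = connection.get_root()
    user = root['users']['alice']
    user.visits = user.visits + 1
    connection.commit()
    # Don't keep 'root' or 'user' around; look them up again
    # in the next transaction.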


### Durus clients, threading and multiple connections (20060424)

Durus is not thread-safe, but you can use it in a threaded
Python program if you are careful:

- At any given moment, only one thread should have access
  to objects fetched through a given Durus connection. That
  thread is the only one that should do a commit/abort.

  This is critical, since DSS accesses from multiple
  threads could be intermixed, and a crash would be the
  best possible outcome (you could also crash the DSS and
  corrupt your storage database).

- Different threads can access the DSS if each one uses
  a different DSS connection (see the sketch after this
  list). Objects from different connections MUST NOT be
  shared. The rules of the previous point apply.

  You can't coordinate transactions between different
  DSS connections.

  Since sharing objects is forbidden, the only way to
  exchange state between connections, even in the same
  program, is to go through the DSS, using normal
  transactions.

- - The "do not mix objects fetched from different
  DSS connections" rule applies also for single
  thread processes, if they use multiple DSS
  connections. You can't coordinate transactions
  between connections, but this pattern can be useful
  to keep different data over different connections
  for, for example, better durus cache behaviour.
  Beware, nevertheless.

- A newly created persistent object can only be linked to
  objects from one connection. When the object is created,
  it is free to be linked from anywhere. But once it is
  linked to other persistent objects and the program does
  a "commit", the new object is bound to that
  connection. More on that later.

- If you commit object changes based on data from objects
  owned by other connections, you risk committing data
  based on stale information, since the conflict logic
  can't detect that those dependencies became outdated.

  Don't do that.

- The same object fetched from different DSS connections
  will be different objects in RAM. If modified, the first
  "commit" wins. The others will get a conflict.
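
A minimal sketch of the one-connection-per-thread pattern
(host, port and the worker body are assumptions for the
example):

    import threading
    from durus.client_storage import ClientStorage
    from durus.connection import Connection

    def worker():
        # Each thread opens its OWN connection and never
        # shares its objects with other threads.
        connection = Connection(ClientStorage(host='localhost',
                                              port=2972))
        root = connection.get_root()
        # ... work with objects from this connection only ...
        connection.commit()

    for i in range(4):
        threading.Thread(target=worker).start()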


### Data compression in the Storage (20060424)

By default, Durus stores object data compressed on disk.
The algorithm used is zlib (http://www.zlib.net/).

In some situations compression can be inconvenient. For
example, the data may be big and already compressed (say,
graphics). Or perhaps a better algorithm could be used
for that data.

You can disable compression simply by setting
"WRITE_COMPRESSED_STATE_PICKLES" to False in "durus.serialize".
Durus will then save new and modified objects uncompressed.
Durus will, nevertheless, correctly load both compressed and
uncompressed objects, so you don't need to update your whole
database.

If you need to customize your compression, you can follow
this advice from David Binger:
(http://mail.mems-exchange.org/durusmail/durus-users/492/)

"""
Here's what I would do.
Set WRITE_COMPRESSED_STATE_PICKLES to False.
Add __getstate__() and __setstate__() methods to your persistent
classes that provide the customized compression behavior.
If you want compressed pickles for a certain class, make
the __getstate__() return a compressed pickle of self.__dict__
instead of the dict itself.  The __setstate__() must
have the corresponding inverse behavior.
"""

A curiosity note: all zlib streams start with the character
"x", so if your "__getstate__()" returns a string starting
with "x", Durus will try to unzip it when loading. That will
fail, of course, and then your "__setstate__()" will be
called. So, if you are worried about efficiency, make sure
your "__getstate__()" strings never start with an "x"
character :-).
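
A minimal sketch of that advice (the class name is
hypothetical, zlib merely stands in for whatever algorithm
suits your data, and WRITE_COMPRESSED_STATE_PICKLES is
assumed to be already set to False):

    import zlib
    from cPickle import dumps, loads
    from durus.persistent import Persistent

    class CompressedThing(Persistent):
        def __getstate__(self):
            # Save a compressed pickle of the dict instead of
            # the dict itself. The "C" prefix guarantees the
            # string never starts with "x" (see note above).
            return "C" + zlib.compress(dumps(self.__dict__, 2), 9)

        def __setstate__(self, state):
            self.__dict__.update(loads(zlib.decompress(state[1:])))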


### Weak references (20060424)

Persistent objects can't hold weak references to other
persistent objects; this eases garbage collection in the
storage. All inter-object references in the storage are
strong.

Your program can keep internal (non-persistent) weak
references to loaded persistent instances. Those references
are cleared automatically when necessary: cache shrinks,
conflicts, etc. You can keep such references across
transaction boundaries.
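
A minimal sketch, using the standard Python weakref module
and an open connection (the "counters" key is a hypothetical
example):

    import weakref

    obj = connection.get_root()['counters']
    ref = weakref.ref(obj)
    del obj

    # Later, possibly in another transaction:
    obj = ref()
    if obj is None:
        # Purged (cache shrink, conflict...); fetch it again.
        obj = connection.get_root()['counters']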


### Implicit object loading/dump (20060424)

Transparent object loading/dumping is the key to a successful
persistence system. The details are simple to understand
once you "get it". You can read more about this in:

http://mail.mems-exchange.org/durusmail/durus-users/533/

Some random details:

- A newly created object doesn't get its OID until you
  do a "commit".

- Objects are loaded into RAM when you access (read or
  write) any of their attributes.

- When an object is loaded, its state is in RAM and
  available. Any persistent reference in that object
  creates a "ghost" object: an "empty" object of the
  right class, ready to load its state if you touch it.

  So if you load a persistent object with references to 1000
  other persistent objects, only the state of the parent
  object is loaded, but 1000 ghost objects are created.

- If Durus is about to create a ghost object but that
  object is already in the cache, it reuses the cached
  object. So the same object reached through different
  graph paths is also the same object in RAM.

- You can override "__init__" and "__del__" in your
  persistent classes, but remember that "__del__" is called
  every time the object is "unloaded" from RAM, and is NOT
  called when the object is actually deleted from the
  storage. In general you shouldn't use "__del__".

  Remember also that "__init__" is called only when the
  object is first created, not each time it is loaded
  into RAM.
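
A minimal sketch illustrating those rules, given an open
connection (class and attribute names are hypothetical):

    from durus.persistent import Persistent

    class Node(Persistent):
        def __init__(self, name):
            # Runs ONCE, at creation time; not on later loads
            # from storage.
            self.name = name

    root = connection.get_root()
    root['tree'] = Node('top')
    connection.commit()        # the new Node gets its OID here
    # Touching an attribute of a ghost loads its state:
    name = root['tree'].name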

### "gen_oid_record()" (20060424)

Durus storage backends usually define a "gen_oid_record()"
method. That method iterates over all the objects in the
storage, in no particular order. Current backend
implementations have the following caveats
(http://mail.mems-exchange.org/durusmail/durus-users/500/):

- Don't write to the storage while you are iterating,
  since you could skip or repeat records.

- You can get deleted objects that are still not collected.

This method is usually used to convert a storage to another
format, or to update the classes of already-stored objects.
You can also use it for backup purposes.

The usual approach is to iterate over the source storage,
loading objects and storing them as-is in the destination
storage. When the migration is done, you do a "giant"
commit. This approach is feasible when your database is
small enough to be loaded in RAM+swap, but on a 32-bit
machine you are ultimately limited by the addressable
space you have, typically on the order of 2^30 bytes.
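
A minimal sketch of such a migration between two
FileStorages (a sketch assuming the begin()/store()/end()
storage interface; adapt it to your actual backends):

    from durus.file_storage import FileStorage

    src = FileStorage('old.durus', readonly=True)
    dest = FileStorage('new.durus')
    dest.begin()
    for oid, record in src.gen_oid_record():
        dest.store(oid, record)   # copy the raw records as-is
    dest.end()                    # the single "giant" commit
    src.close()
    dest.close()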

You can't do multiple smaller "commits", because some
storages (for example, my BerkeleyDB storage backend
implementation) would run a background garbage collection
and delete objects already copied but not yet referenced.

Future releases of my BerkeleyDB storage backend will include
utilities to migrate huge databases without eating all your RAM.

Remember also that "gen_oid_record()" in the ClientStorage (the
standard DSS implementation) is very inefficient. Time to
transfer all the objects will be O(MAX_OID) and not to O(N). That
is, time will be proportional to the number of OIDs ever generated,
not to the number of really existant objects.


### ComputedAttribute (20060424)

ComputedAttributes are special persistent objects
without state, used to keep (in RAM) cached
values of "costly" functions. The cached values
are discarded if the instance is purged from memory
(for instance, by a cache shrink) or if any other DSS
client sends an "invalidation".

Access to the cached value is done via a "get()"
method. If the cached value is still current, we get
it. If the cached value wasn't computed before,
or was invalidated, a new value is computed and
cached.

The function used to compute the cached value, if necessary,
is passed as a parameter to the "get()" method. That function
MUST NOT take any parameters. This seems like a big
limitation, but you can use a lambda or a closure to pass
"hidden" parameters.
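
A minimal usage sketch (assuming the
"durus.computed_attribute" module path; "compute_total" and
the data layout are hypothetical):

    from durus.computed_attribute import ComputedAttribute

    cached_total = ComputedAttribute()

    def compute_total(items):
        return sum(item.price for item in items)

    # get() requires a zero-argument function; a lambda
    # "hides" the parameter:
    total = cached_total.get(lambda: compute_total(root['items']))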

Some comments:

- Even though a ComputedAttribute has no data, it has an OID.

- Each time you do an invalidation, an (unnecessary) write
  is done to disk. The write is small, but it is
  synchronous, so the DSS will be busy for some time.

- The function used to recalculate the cached value is
  not stored in the storage, so the application must be
  careful to keep things consistent.


### Non persistent attributes (20060424)

Durus has no support for non-persistent attributes. That is,
all attributes are always stored on disk.

See: http://mail.mems-exchange.org/durusmail/durus-users/411/

I guess you can implement them in your own persistent
classes by overriding "__getstate__" (see the sketch below).
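
A minimal sketch of that idea (the class name and the "_v_"
prefix convention are hypothetical):

    from durus.persistent import Persistent

    class Cached(Persistent):
        def __getstate__(self):
            # Drop attributes with a "_v_" prefix so they
            # never reach the disk.
            state = self.__dict__.copy()
            for key in list(state):
                if key.startswith('_v_'):
                    del state[key]
            return state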

Keep in mind this comment from David Binger:

"""
In my opinion, convenience attributes on
Persistent instances like this invite trouble.
"""


### Newly created persistent objects and conflicts/aborts (20060424)

When you create a new persistent instance, it is not
associated with a particular connection, so transaction-related
actions (commits, aborts, conflicts) don't affect the new
object.

When you link your new persistent object to an already
persistent object, you are tying the new object to the
connection bound to that old object. Now you have three
cases (a sketch follows the list):

- Commit: your new object is committed. Now it is like any
  other persistent object. The new object is bound to that
  connection.

- Abort: your new object is not touched. It is free to be
  reassigned to another connection, if you wish. Remember,
  nevertheless, to first break the link from the old object.

- Conflict: like abort.
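
A minimal sketch of the commit case, given an open
connection (the "inbox" key and the attribute are
hypothetical):

    from durus.persistent import Persistent

    msg = Persistent()       # "free": bound to no connection yet
    msg.body = 'hello'
    root = connection.get_root()
    root['inbox'] = msg      # now tied to this connection...
    connection.commit()      # ...and committed: msg gets its OID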

If the object is untouched, you can reuse it as-is in a new
transaction attempt. You don't need to recreate it, unless
you had a conflict and the data in the new object was based
on stale objects. Of course, in that case you must
recalculate the data.

While the object is not bound to a connection, you can
transfer it to another connection or to another thread.
That is, "free" new objects can be shared between threads,
but only while the new object is not bound to a particular
connection via a link from another persistent object.

As a side note, if you get a conflict while committing a
transaction with new objects, you will "lose" OIDs. Not an
issue, since you have 2^64 of them available...


TODO:

=====

--
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea@argo.es http://www.argo.es/~jcea/ _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea@jabber.org         _/_/    _/_/          _/_/_/_/_/
                               _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz