Re: [Durus-users] OODB basics

Oleg Broytmann wrote:
> Hello!
>
>    I have started to experiment with Durus. First, I'd like to understand
> what are object-oriented databases in general. I have some (very little)
> experience with ZODB; I did a search on papers on OODBs; I spent some time
> reading the source code of some Zope Products that use ZODB, most notable
> is TextIndexNG. Below is what I've found. Would you be so kind to look at
> it and say if I get it right?

I have done extensive research on object databases and, unfortunately,
much of what is written merely reflects what current database products are
capable of doing.  I personally don't think they do enough.  So I wrote
Schevo.

>    Thank you in advance!
>
>    My English is a bit rough, sorry. Any correction will be appreciated.
> BTW, is "OODBs" the correct plural form?

There doesn't seem to be much of a standard for OODB terminology.

> Object-oriented databases.
>
> An OODB stores objects; actually serialized images of objects, and
> serializes and deserializes objects at needs. An OODB assigns every object
> an OID (object ID) and stores a global mapping from OIDs to objects.

This sort of definition (and I don't blame you for it) puts too much emphasis
on the storage side of things and too little emphasis on what the
objects look like and how data integrity and other rules are enforced.

> A user can and should create "indices" - mappings from keys to objects (or
> OIDs). These indices help to fetch objects; they also are there to be
> iterated over, so the user does not need to fetch and test an every object
> in the loop (which can be quite memory- and time-consuming) - it is enough
> to iterate over an index and fetch only those objects that the user really
> needs; the index iteration is usually optimized time- and memory-wise.

IMNSHO the database should do these things automatically, based on a
declarative syntax of the desired keys and indexes.  That's what Schevo
does.  Here is a snippet from a Schevo schema:

class Gender(E.Entity):
    """Gender of a person."""

    code = f.string()
    name = f.string()

    @f.integer()
    def count(self):
        return self.sys.count('Person', 'gender')

    _key(code)
    _key(name)

    _initial = [
        ('F', 'Female'),
        ('M', 'Male'),
        ('U', 'Unknown'),
        ]

class Person(E.Entity):
    """Individual person."""

    _plural = 'People'

    name = f.string()
    gender = f.entity('Gender')

    _key(name)

    _index(gender)
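
To make the contrast with hand-maintained indices concrete, here is a
minimal plain-Python sketch of what a declared key and a declared index
imply.  This is not Schevo's API or implementation; the Extent class and
all of its methods are hypothetical names invented for illustration.
The point is only that the container, not the user, enforces uniqueness
and keeps the index current.

```python
from collections import defaultdict


class Extent:
    """Toy in-memory collection that maintains its own key and index.

    Hypothetical illustration only; not part of Schevo or Durus.
    """

    def __init__(self, key_field, index_field):
        self.key_field = key_field
        self.index_field = index_field
        self._by_key = {}                    # unique key value -> record
        self._by_index = defaultdict(list)   # index value -> records

    def add(self, record):
        key = record[self.key_field]
        # The declared key is enforced here, before anything is stored.
        if key in self._by_key:
            raise KeyError(f"duplicate {self.key_field}: {key!r}")
        self._by_key[key] = record
        # The declared index is updated as a side effect of adding.
        self._by_index[record[self.index_field]].append(record)

    def find(self, key):
        """Look up one record by its unique key."""
        return self._by_key[key]

    def by(self, value):
        """Return the records whose indexed field equals value."""
        return list(self._by_index.get(value, []))

    def count(self, value):
        """Count records by indexed value without scanning everything."""
        return len(self._by_index.get(value, []))


people = Extent(key_field="name", index_field="gender")
people.add({"name": "Alice", "gender": "F"})
people.add({"name": "Bob", "gender": "M"})
people.add({"name": "Carol", "gender": "F"})
```

With this in place, people.count("F") is answered from the index, much
like the count field in the Gender class above, and adding a second
"Alice" is rejected by the key.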

> Serialized objects are the only "things" in a database. Indices are also
> such objects.

It doesn't matter what is in the database itself, only what is returned
from the database - if it returns Python objects, then it is an object
database.  This is the same for relational databases and is a form of
duck typing.  The storage can be anything at all.  What matters is what
one gets out of the database.

> Storing and fetching objects are the only operations (beside administration
> tasks, like packing) of an OODB. Removing objects is (sometimes? usually?
> always?) implemented as a modification of the parent object - the OODB
> breaks the link between the parent and the child, and stores the modified
> parent object. Index iteration is not a task of the OODB - indices are just
> special iterable objects (sequences or enumerators), usually implemented as
> (binary) trees for efficiency.

Why shouldn't the database do more, like enforce referential integrity,
automatically maintain bi-directional links, provide cascading (or
restricted, or unassigning) deletes, intelligent fields, enough metadata
to support a dynamically constructed UI, etc.?  Schevo does.
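
As a rough illustration of the three delete behaviours mentioned above
(restricted, cascading, unassigning), here is a small plain-Python
sketch.  It is not how Schevo implements them; ToyDatabase,
RestrictedDelete, and the on_delete parameter are hypothetical names
chosen for this example.

```python
class RestrictedDelete(Exception):
    """Raised when a delete would leave dangling references."""


class ToyDatabase:
    """Hypothetical in-memory store with Gender and Person records."""

    POLICIES = ("restrict", "cascade", "unassign")

    def __init__(self):
        self.genders = {}  # gender code -> gender name
        self.people = {}   # person name -> gender code, or None

    def add_person(self, name, gender_code):
        # Referential integrity on insert: the target must exist.
        if gender_code not in self.genders:
            raise LookupError(f"no such gender: {gender_code!r}")
        self.people[name] = gender_code

    def referrers(self, gender_code):
        """Names of people that currently point at this gender."""
        return [n for n, g in self.people.items() if g == gender_code]

    def delete_gender(self, gender_code, on_delete="restrict"):
        if on_delete not in self.POLICIES:
            raise ValueError(f"unknown policy: {on_delete!r}")
        refs = self.referrers(gender_code)
        if refs and on_delete == "restrict":
            # Refuse the delete while anything still refers to the target.
            raise RestrictedDelete(f"{gender_code!r} is used by {refs}")
        for name in refs:
            if on_delete == "cascade":
                del self.people[name]     # delete the referrers too
            else:                         # "unassign"
                self.people[name] = None  # clear the reference instead
        del self.genders[gender_code]
```

Whichever policy is chosen, no Person is ever left pointing at a Gender
that no longer exists, and the user does not have to remember to check.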

> Creating and deleting objects (including indices), and iterating over
> indices are main (if not the only) tasks related to OODBs that a user
> performs. All other manipulations with objects are outside the scope of
> OODBs, though OODBs that support automatic persistence take notes when
> objects are modified and store (serialize) new and modified objects at the
> end of a transaction.

Automatic persistence is the wrong approach and is a waste of time and
effort.  Treating transactions as explicit objects is the right approach.
You can guess which approach Schevo supports.  ;-)
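
For readers unfamiliar with the idea, here is one way "transactions as
explicit objects" can look in plain Python.  This is only a sketch under
my own assumptions, not Schevo's actual classes: CreatePerson, Database,
and execute are hypothetical names.  Each change is an object that the
database validates and applies as a unit, rather than a mutation that
the database has to notice after the fact.

```python
import copy


class CreatePerson:
    """A transaction object: describes one change, applied all-or-nothing."""

    def __init__(self, name, gender):
        self.name = name
        self.gender = gender

    def apply(self, state):
        # All rules are checked here, in one place, before any change.
        if self.name in state["people"]:
            raise ValueError(f"duplicate person: {self.name!r}")
        if self.gender not in state["genders"]:
            raise ValueError(f"unknown gender: {self.gender!r}")
        state["people"][self.name] = self.gender


class Database:
    """Hypothetical database whose state changes only via execute()."""

    def __init__(self):
        self._state = {"genders": {"F", "M", "U"}, "people": {}}
        self.log = []  # successfully executed transactions, in order

    def execute(self, tx):
        # Apply to a copy so a failing transaction leaves no trace.
        working = copy.deepcopy(self._state)
        tx.apply(working)
        self._state = working
        self.log.append(tx)

    def people(self):
        """Return a read-only snapshot of the stored people."""
        return dict(self._state["people"])


db = Database()
db.execute(CreatePerson("Alice", "F"))
```

Because every change passes through execute(), the database has a
single point at which to validate it, record it, and reject it cleanly
if it breaks a rule.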

> When developing programs a user is mainly confronted with the following 4
> problems related to OODBs:
>
> * devise an object scheme - classes, attributes, object hierarchies;
> * name the objects (create indices);
> * upgrade (change) the object scheme;
> * remember to update all indices whenever an object is created, modified or
>   deleted.

The stuff in your list is all the good stuff.  Leaving all that to the
user is why traditional object databases haven't had much of an impact
on the world of database technology, IMNSHO.  Schevo provides all of
that and much, much more.  All the low-level operations are provided by
Durus, which does a terrific job.  But the higher-level stuff is
provided by Schevo and I can't imagine creating a database application
without it.

BTW, we are very close to a formal release of Schevo, which has been in
stealth mode for about 4 years.  In the meantime, feel free to get the
code from our Subversion repository and ask any questions you have.
The version that we are about to release uses Durus exclusively as
its storage engine, and it is working so well that we may not need
to support ZODB, Pypersyst, or any other storage mechanism.

--
Patrick K. O'Brien
Schevo        http://www.schevo.org
Pypersyst     http://www.pypersyst.org
PyDispatcher  http://pydispatcher.sourceforge.net