[Durus-users] Re: OODB vs SQL

* Rodrigo Dias Arruda Senra wrote [2005-10-11 11:05:38 -0300]:

> Some possible rules of thumb (far from an exaustive list):
>
>  Relational - when you have a huge volume of data, sharing a
>               small set of stable data schemes.
>             - when you need ad hoc queries that could be
>               written as SQL+operators
>
>  OODB - when you have a considerable diversity of data schemes,
>         when you have to index textual content (==Information Retrieval),
>         when browsing a structure is enough as a query language
>         (== when you can do without cross products and joins).

A few comments - regarding relational, its ability to manage a huge volume of
data is largely because of the facilities provided to add automatic indexes
that span multiple keys. If you have a large table, unindexed, and do a
sequential scan (retrieval) through every row to find matches for some
criteria, its going to take more or less the same amount of time to do the
same against a large set of objects in an OODB.

One advantage not mentioned with Relational that can't be dismissed easily is
the preponderance of tools, and the facility by which integration with other
applications - most running on some sort of SQL back end - can be made. Of
course the latter point ultimately becomes less and less a distinction as
other integration techniques (REST, SOAP and the like) become more prevalent,
and the chicken-egg scenario with OODB moves forward.

Regarding OODB, is there anything truly in the OODB design that facilitates
more easily textual content management? I don't think so. In my past I worked
as a systems integrator implementing massive document and workflow management
systems, all of them back ended with SQL. A big blob of text, or anything
other big lump of information (as opposed to data), can be just as easily
shoved into a SQL column as into an OODB as a python type, or, better yet in
my opinion, into a darn file on the file system where it belongs.

I think the major advantage of the Relational approach is its been around for
a long time and is well understood. The major disadvantage of the relational
approach is that its painful to map objects on to relational data at times,
particularly when the data model is non trivial and the volume of data is
large.

Back to the trivial, for ad hoc queries, I found, quite to my surprise, that
its much easier in Python than in SQL, because after all I have the full
power of the object and all its methods available to me, where in SQL all I
have is SQL.

I'm sure no one responding to this discussion needs this example but no doubt
there are some lurkers out there that might benefit.  Lets say I want to
query the database in SQL to find out what Monty Python's balance is, who his
sales person is, and what his last shipment date was:

CREATE TABLE customer (
        id                      SEQUENCE,
        name            varchar(200),
        balance         numeric,
        FOREIGN KEY (salesperson_id) REFERENCES employee (id))

.... etc for each table. Note we'll have to at data definition time make all
sorts of
decisions that we would not have to make in Python, most commonly all sorts
of constraints as to string lengths.


        SELECT c.name, balance, e.name, last_shipment_date
        FROM
         customer c,  transactions t, employees e
        WHERE ar.customer_id = c.id AND
              t.customer_id = c.id AND
                  t.completed is True AND
                  c.salesperson_id = e.id AND
                  c.name = "Monty Python"

        ORDER BY t.last_shipment_date
        LIMIT 1

Or, adding a shipment:

        insert (customer_id, date, value) into transactions values(124,
        '2005-10-11', 1234.59)

        of course you needed that customer_id first but lets not go there.

Assuming you have even a simple object model behind you, doing the same in an
oodb becomes trivial:

        monty = customer_db.get('monty python')

        print "Customer Summary For: %s Sales person: %s Last Shipment: %s" \
               (monty.name, monty.salesperson.name, monty.get_transactions()[0])

And that's it.. a simple and perhaps more to the point understandable data
access... useful work done, possible even for a casual user of the system's
API, which is not always possible with complex SQL schemas. Just think back
to any hierarchical data structure modeled in SQL and how mind bending (for
me at least) that is, compared to a hierarchy of objects.

Newcomers to oodb world seem to obsess (I did) over querying and indexing;
but once you get past that, it seems that they are non issues or require only
a little thinking now and then. The benefits seem to outweigh the drawbacks
by a decent enough margin.

Also the "query" language is greatly enhanced in Python because we aren't
constrained to the language itself but to all the features of our objects;
that's something we don't have with straight SQL, although granted stored
procs can deliver some similar benefits.

A simple object like this will support that simple python 'query' above (this
is all off the top of my head and of course not tested or even proofread):

class Customer(Persistent):
        name = None
        receivables_balance = 0
        salesperson = None
        transactions = PersistentList()

        def add_transaction(self, trans):
                self.transactions.append(trans)

        def get_transactions(self):
                L = [trans for trans in self.transactions
                                if trans.completed is True]
                L = attr_sort(L, 'shipment_date')
                L.reverse()
                return L


class CustomerDatabase(Persistent):

        next_account_number = 1

        def add_customer(customer):
                assert customer.account_number is None
                customer.account_number = self.next_account_number
                self.next_account_number += 1

class Person(Persistent):
        name = None

class Transaction(Persistent):
        date = now()
        value = None
        items = PersistentList()

        def compute_value(self):
                value = 0
                for item in items:
                        value += item.value
                self.value = value


db = Database()
db.customer_db = CustomerDatabase()
db.people_db = PersistentDict()

monty = Customer('Monty Python', ... etc)
db.customer_db.add_customer(monty)


etc... I think I better do some real work now.