durusmail: durus-users: Re: OODB vs SQL
Re: OODB vs SQL
Michael Watkins
2005-10-11
* Rodrigo Dias Arruda Senra wrote [2005-10-11 14:45:44 -0300]:

> I simply have no scientific evidence as to which has
> better performance, Relational+SQL or Python+Durus.
> I have never tried to migrate relational data to Durus and see how it
> performed. For the time being, I'll gladly take your word on Durus
> performance ;o)

There are almost certainly limitations in tools such as Durus and ZODB; I've
no experience with other OODBs, commercial or otherwise.

I have at least as much confidence that a complex data structure can be
modeled in an OODB as in SQL (intuitively, we can assume the work will be
easier in the OODB anyway).

I have a little less confidence that even a trivial data structure with a huge
number of object instances can be managed well by an OODB, at least when
thinking about our Python favorites.

Intuitively I know I can shove tens of millions of rows into an Oracle (or
these days, a Postgresql) database, and, with appropriate indexing, I can
efficiently query the beast, shovel new data into the thing, and overall have
a high degree of confidence in its robustness and ability to perform under a
range of conditions.
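To illustrate what the relational engine buys you here, a minimal sketch
using Python's stdlib sqlite3 as a stand-in for Oracle/PostgreSQL (the table
and column names are invented for illustration):

```python
import sqlite3

# In-memory database standing in for a real server; schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE quote (
        symbol TEXT, day TEXT,
        open REAL, high REAL, low REAL, close REAL, volume INTEGER
    )
""")
# An index on (symbol, day) is what keeps per-symbol queries cheap even
# when the table holds tens of millions of rows.
conn.execute("CREATE INDEX idx_quote_symbol_day ON quote (symbol, day)")

conn.execute(
    "INSERT INTO quote VALUES ('IBM', '2005-10-11', 82.0, 83.1, 81.5, 82.7, 1000000)"
)
rows = conn.execute(
    "SELECT day, close FROM quote WHERE symbol = ? ORDER BY day", ("IBM",)
).fetchall()
print(rows)  # [('2005-10-11', 82.7)]
```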

For "fun" a couple of years ago I tried to shovel 20GB of data into a ZODB
database - fairly straightforward stuff - it was stock market data along the
lines of:

        class Symbol(Persistent):
                def __init__(self):
                        # each Symbol needs its own list; a class-level
                        # PersistentList would be shared by every instance
                        self.end_of_day_records = PersistentList()

        class Quote:
                open = None
                high = None
                low = None
                close = None
                volume = None

The initial load looked something like this, looping through 10,000-odd
symbols and the end-of-day quote data:

        symbol_db = PersistentDict()

        symbol_db['IBM'] = Symbol()

        symbol_db['IBM'].end_of_day_records.extend(abigorderedlist_of_quotes)

I also built a SQL model for same; importing the data into SQL was fairly
trivial, and relatively fast. Many SQL engines have decent bulk data loading
facilities that work on a lower level than SQL itself.
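The essence of those bulk-load paths (e.g. PostgreSQL's COPY) is streaming
many rows through one prepared statement; a rough stdlib stand-in for the
idea, with a schema invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quote (symbol TEXT, day TEXT, close REAL)")

# One prepared statement, many rows: the engine parses the SQL once and
# streams the rows in, which is the core of any bulk-load facility.
rows = [("IBM", "2005-10-%02d" % d, 80.0 + d) for d in range(1, 11)]
conn.executemany("INSERT INTO quote VALUES (?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM quote").fetchone()[0]
print(count)  # 10
```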

I can't recall how long it took to load up the ZODB, but it was many hours,
perhaps overnight. Regardless, even with the use of ZODB BTrees and meaningful
keys, simply accessing a random symbol and pulling out its list of quotes was
painfully slow.

I'd been interested in seeing whether I could do something as brain-dead as
loading a bunch of relatively static data into ZODB, but there were
alternatives: for the analysis work I wanted to do, I could easily have
written out 10,000 CSV files containing the quote records, and that would
have performed well enough.
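That CSV-per-symbol alternative is as simple as it sounds; a minimal sketch
with the stdlib csv module (the field names are assumptions, and io.StringIO
stands in for one on-disk file per symbol):

```python
import csv
import io

# End-of-day records for a single symbol.
quotes = [
    ("2005-10-10", 81.9, 83.0, 81.5, 82.4, 950000),
    ("2005-10-11", 82.0, 83.1, 81.5, 82.7, 1000000),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["day", "open", "high", "low", "close", "volume"])
writer.writerows(quotes)

# Reading a symbol's history back is one sequential scan of a small file.
buf.seek(0)
records = list(csv.DictReader(buf))
print(records[-1]["close"])  # 82.7
```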

Yet in Postgres (and experience tells me Oracle would have been no different)
the "solution" was workable enough. Clearly those engines are doing things at
the file I/O level that go beyond simply opening up a big file.

However, I've moved beyond my biases and past experience, and these days I
actively look for reasons to avoid SQL unless it really is the right tool for
the job.

Actually I'm faced with developing a membership/donation system capable of
managing long term historical data on several hundred thousand individuals -
as well as inter-relationships between individuals, polling information, etc.
I'd like to do this in Durus and will be mocking up some data to explore
performance soon.

Perhaps I should resurrect the stock data experiment first; I've kept the raw
data and experiment around, so I guess it's time to dust it off again...
