* David Binger wrote on [2007-09-12 21:37:40 -0400]:

>> One benefit of the Postgresql storage is that pg has a very fast
>> bulk import mechanism (the COPY statement); if you ever need to load
>> many (millions) records on anything more than a one-off basis, it can be
>> a real help.
>
> How could the bulk import mechanism be useful for a Durus Storage?

In practical terms... probably not very often. It could be useful when:

- migrating an existing storage. I wrote a little file-to-sql storage
  converter and a sql-to-sql converter; Postgresql's COPY statement is
  orders of magnitude faster than SQL inserts, so converting existing
  object oid/record values takes advantage of that. But that implies you
  already have serialized data... unless one is replicating a storage on
  a regular basis, I don't see this COPY ability being used frequently.
  (There's a rough sketch of the bulk-load idea at the end of this note.)

- alternatively, an experimental pgsql storage might want to take a
  different route than I did for new_oid(); for simplicity's sake I
  create an empty record and let the pg db handle the oid serial number.
  If one were to use Postgresql rather like a file system and simply keep
  adding new rows to the DB for every add or change in Durus, marking the
  old rows as invalid, COPY might be put to good effect. I've never
  benchmarked COPY for small numbers of inserts, but intuition tells me
  it would be fast enough to make this approach worth attempting if one
  wanted the extra layer. (The second sketch below shows the new_oid()
  shortcut I mean.)

Mostly I was interested in re-implementing the Postgresql storage
because, if I were to feel the need to use a SQL engine as a Durus
storage back end, I think I'd want the implied (but not guaranteed)
additional stability of Postgresql over sqlite.

Detour: I discovered pgsql - a useful alternative to the commonly used
psycopg2 DBAPI adaptor (you can find pgsql at
http://pypi.python.org/pypi/python-pgsql/). For people working with
Postgresql, you might want to look at this fairly recent piece of work.
It apparently builds upon the old PyGreSQL code but uses some of the
useful features Postgres offers, like prepared statements and
server-side cursors. Back in the day when I did a lot of python/SQL
work I would have loved this.

In the end, I think the sql-backed storages are useful in the following
cases, all assuming there is a large record count - many millions:

1. The system implementer has hardware (RAM) limitations that can't be
easily overcome.

2. System startup time is for some reason an issue. It takes minutes to
start up Durus (and probably also ZODB...) when there are many millions
of object instances in the database.

3. Silly: you need to meet some "check box" for a client's
implementation, i.e. "must use a SQL database". LOL

At any rate, if a) you control the hardware your app runs on, and b)
you have 100 bytes per record * x millions of records' worth of RAM at
your disposal (5 million records at ~100 bytes each already comes to
roughly 500 MB of record data alone), and c) your app isn't troubled by
a multi-minute start up time, then ShelfStorage or FileStorage might be
just the ticket.

Incidentally, I did some very limited testing (more throwing mud at the
wall than anything) of a 5M object Durus with either Shelf or File
storage and didn't come away from that noticing much in the way of
surprising differences. It might be I wasn't paying attention though...
what ought one expect to see with the newer ShelfStorage?
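For what it's worth, here is roughly what I mean by the bulk-load point
in the first bullet. This is only a sketch, not the converter I
mentioned: the table and column names (staging_records, durus_records,
oid, record) are made up for illustration, it assumes psycopg2 as the
adaptor, and it assumes the oids are already plain integers paired with
serialized record bytes.

import io
import psycopg2

def bulk_load(records, dsn):
    """Load (oid_int, record_bytes) pairs with COPY instead of INSERTs.

    Sketch only: 'staging_records' and 'durus_records' are hypothetical
    table names, not anything from an actual storage implementation.
    """
    # COPY works on a text stream, so hex-encode the serialized records
    # to keep them safe inside COPY's tab-separated text format.
    buf = io.StringIO()
    for oid, record in records:
        buf.write("%d\t%s\n" % (oid, record.hex()))
    buf.seek(0)

    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    cur.execute(
        "CREATE TEMP TABLE staging_records (oid bigint, record_hex text)")
    # One streamed round trip loads every row; this is where COPY beats
    # issuing millions of individual INSERT statements.
    cur.copy_from(buf, 'staging_records', columns=('oid', 'record_hex'))
    cur.execute(
        "INSERT INTO durus_records (oid, record) "
        "SELECT oid, decode(record_hex, 'hex') FROM staging_records")
    conn.commit()
    conn.close()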
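And the new_oid() shortcut from the second bullet, again as a sketch
rather than the code I actually wrote: it assumes a hypothetical table
whose oid column is a bigserial, and Postgresql 8.2+ for
INSERT ... RETURNING.

def new_oid(cursor):
    """Let Postgres allocate the next oid.

    Sketch: assumes a hypothetical table along the lines of
        CREATE TABLE durus_records (oid bigserial PRIMARY KEY, record bytea)
    so inserting a placeholder row makes the sequence do the work.
    """
    cursor.execute(
        "INSERT INTO durus_records (record) VALUES (NULL) RETURNING oid")
    (oid,) = cursor.fetchone()
    # A real Durus storage would pack this integer into the 8-byte
    # oid string the rest of Durus expects.
    return oid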
> I guess you realize you can have the Durus StorageServer running
> on another box from your client. (If anyone does that, however,
> please be certain access to the StorageServer port is strictly
> restricted to trusted clients.)

Yes, I fully realize that. When I was thinking of scalability I was not
only thinking of Postgres but also of the original poster's shared
hosting situation - in shared hosting the database instance commonly
runs on another box. I think a Durus StorageServer, and any clients,
running on the shared host would be a good "citizen", given the client
caching that comes for free (depending on the application, of course).
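For the shared-hosting picture above, the client side is just the
standard Durus client storage pointed at wherever the StorageServer
runs; the host, port and command-line options below are from memory, so
double-check them against your Durus version.

# A StorageServer is assumed to be running already, started with
# something like:  durus -s --port=2972 --file=/path/to/data.durus
# The client connects over TCP and gets Durus's per-connection object
# cache for free, which is what keeps it a polite neighbour.
from durus.client_storage import ClientStorage
from durus.connection import Connection

storage = ClientStorage(host='127.0.0.1', port=2972)  # 2972 is the usual default
connection = Connection(storage)
root = connection.get_root()           # persistent mapping at the top
root['hits'] = root.get('hits', 0) + 1
connection.commit()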