* David Binger wrote on [2007-09-12 21:37:40 -0400]:

>> One benefit of the Postgresql storage is that pg has a very fast
>> bulk import mechanism (the COPY statement); if you ever need to load
>> many (millions) records on anything more than a one-off basis, it can be
>> a real help.
>
> How could the bulk import mechanism be useful for a Durus Storage?

In practical terms... probably not very often. It could be useful when:

- migrating an existing storage. I wrote a little file-to-sql storage
  converter and a sql-to-sql converter; Postgresql's COPY statement is
  orders of magnitude faster than SQL inserts, so converting existing
  object oid/record values takes advantage of that. But that implies you
  already have serialized data... unless one is replicating a storage on
  a regular basis, I don't see this COPY ability being used frequently.
  (There's a rough sketch of the bulk-load idea at the end of this note.)

- alternatively, an experimental pgsql storage might want to take a
  different route than I did for new_oid(); for simplicity's sake I
  create an empty record and let the pg db handle the oid serial number.
  If one were to use Postgresql rather like a file system and simply keep
  adding new rows to the DB for every add or change in Durus, marking the
  old rows as invalid, COPY might be put to good effect. I've never
  benchmarked COPY for small numbers of inserts, but intuition tells me
  it would be fast enough to make this approach worth attempting if one
  wanted the extra layer. (The second sketch below shows the new_oid()
  shortcut I mean.)

Mostly I was interested in re-implementing the Postgresql storage
because, if I were to feel the need to use a SQL engine as a Durus
storage back end, I think I'd want the implied (but not guaranteed)
additional stability of Postgresql over sqlite.

Detour: I discovered pgsql - a useful alternative to the commonly used
psycopg2 DBAPI adaptor (you can find pgsql at
http://pypi.python.org/pypi/python-pgsql/). For people working with
Postgresql, you might want to look at this fairly recent piece of work.
It apparently builds upon the old PyGreSQL code but uses some of the
useful features Postgres offers, like prepared statements and
server-side cursors. Back in the day when I did a lot of python/SQL
work I would have loved this.

In the end, I think the sql-backed storages are useful in the following
cases, all assuming there is a large record count - many millions:

1. The system implementer has hardware (RAM) limitations that can't be
easily overcome.

2. System startup time is for some reason an issue. It takes minutes to
start up Durus (and probably also ZODB...) when there are many millions
of object instances in the database.

3. Silly: you need to meet some "check box" for a client's
implementation, i.e. "must use a SQL database". LOL

At any rate, if a) you control the hardware your app runs on, and b)
you have 100 bytes per record * x millions of records' worth of RAM at
your disposal (5 million records at ~100 bytes each already comes to
roughly 500 MB of record data alone), and c) your app isn't troubled by
a multi-minute start up time, then ShelfStorage or FileStorage might be
just the ticket.

Incidentally, I did some very limited testing (more throwing mud at the
wall than anything) of a 5M object Durus with either Shelf or File
storage and didn't come away from that noticing much in the way of
surprising differences. It might be I wasn't paying attention though...
what ought one expect to see with the newer ShelfStorage?
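For what it's worth, here is roughly what I mean by the bulk-load point
in the first bullet. This is only a sketch, not the converter I
mentioned: the table and column names (staging_records, durus_records,
oid, record) are made up for illustration, it assumes psycopg2 as the
adaptor, and it assumes the oids are already plain integers paired with
serialized record bytes.

import io
import psycopg2

def bulk_load(records, dsn):
    """Load (oid_int, record_bytes) pairs with COPY instead of INSERTs.

    Sketch only: 'staging_records' and 'durus_records' are hypothetical
    table names, not anything from an actual storage implementation.
    """
    # COPY works on a text stream, so hex-encode the serialized records
    # to keep them safe inside COPY's tab-separated text format.
    buf = io.StringIO()
    for oid, record in records:
        buf.write("%d\t%s\n" % (oid, record.hex()))
    buf.seek(0)

    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    cur.execute(
        "CREATE TEMP TABLE staging_records (oid bigint, record_hex text)")
    # One streamed round trip loads every row; this is where COPY beats
    # issuing millions of individual INSERT statements.
    cur.copy_from(buf, 'staging_records', columns=('oid', 'record_hex'))
    cur.execute(
        "INSERT INTO durus_records (oid, record) "
        "SELECT oid, decode(record_hex, 'hex') FROM staging_records")
    conn.commit()
    conn.close()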
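And the new_oid() shortcut from the second bullet, again as a sketch
rather than the code I actually wrote: it assumes a hypothetical table
whose oid column is a bigserial, and Postgresql 8.2+ for
INSERT ... RETURNING.

def new_oid(cursor):
    """Let Postgres allocate the next oid.

    Sketch: assumes a hypothetical table along the lines of
        CREATE TABLE durus_records (oid bigserial PRIMARY KEY, record bytea)
    so inserting a placeholder row makes the sequence do the work.
    """
    cursor.execute(
        "INSERT INTO durus_records (record) VALUES (NULL) RETURNING oid")
    (oid,) = cursor.fetchone()
    # A real Durus storage would pack this integer into the 8-byte
    # oid string the rest of Durus expects.
    return oid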
> I guess you realize you can have the Durus StorageServer running
> on another box from your client. (If anyone does that, however,
> please be certain access to the StorageServer port is strictly
> restricted to trusted clients.)

Yes, I fully realize that. When I was thinking of scalability I was not
only thinking of Postgres but also of the original poster's shared
hosting situation - in shared hosting the database instance commonly
runs on another box. I think a Durus StorageServer, and any clients,
running on the shared host would be a good "citizen", given the client
caching that comes for free (depending on the application, of course).
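For the shared-hosting picture above, the client side is just the
standard Durus client storage pointed at wherever the StorageServer
runs; the host, port and command-line options below are from memory, so
double-check them against your Durus version.

# A StorageServer is assumed to be running already, started with
# something like:  durus -s --port=2972 --file=/path/to/data.durus
# The client connects over TCP and gets Durus's per-connection object
# cache for free, which is what keeps it a polite neighbour.
from durus.client_storage import ClientStorage
from durus.connection import Connection

storage = ClientStorage(host='127.0.0.1', port=2972)  # 2972 is the usual default
connection = Connection(storage)
root = connection.get_root()           # persistent mapping at the top
root['hits'] = root.get('hits', 0) + 1
connection.commit()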