Here's the last version of this (I hope). Changes include:

- Adding SqliteStorage into the mix
- Providing more accurate numbers for BerkeleyDBStorage, PostgresqlStorage
  and SqliteStorage
- Adding access-time results using ClientStorage against a StorageServer
  running each of the storages. The intent was to see how well they play
  together.
- Running "stress.py" from durus.test against a StorageServer with each of
  the storage backends.

If I have any conclusion to draw from this, it's that StorageServer /
ClientStorage tends to smooth out the differences between storages.

With all these alternatives to FileStorage2, notably ShelfStorage, which
will be the new default, perhaps it's time to update the Durus README and
change "best suited to collections of less than a million instances" to
reflect the new situation. It's certainly now possible to have millions of
instances running on fairly cheap commodity hardware without gobbling up
all the server's RAM, although with any of the storages you will have to
devise a sane strategy for getting the information in there if a bulk load
is required.

1. Create a new storage
-----------------------

Create a new Durus db with 500,000 "NewsItem" instances (contained in a
NewsDatabase, which is built around BTree):

                       Seconds   RAM consumed (at max)
FileStorage2             341     341MB
ShelfStorage             535     375MB
PgsqlStorage             807     296MB (Python) + 36MB (Postgres process)
SqliteStorage            607     490MB
BerkeleyDBStorage        951     346MB

Note on file space consumed: the Shelf and File storages consume the least
disk space, as one might imagine; SqliteStorage adds a little more
overhead. I didn't attempt to measure Postgresql. When performing the
initial import, BerkeleyDBStorage filled up my /tmp partition: during the
initial import a number of transaction log files are still present,
bringing the storage's disk consumption to ~250MB; as the transaction logs
were rolled up, this dropped to ~150MB. By way of comparison, the file
storages consume approximately 98-105MB.

2. Time to Pack
---------------

Packing follows the initial commit of 500,000 new object instances; times
do not include start-up time (see the table in section 3 for start-up
times).

                       Seconds      RAM consumed during
FileStorage2              52        214MB
ShelfStorage*            248         75MB
PgsqlStorage             122         44MB (Postgresql server)
                                   + 36MB (Python process)
SqliteStorage             84        254MB
BerkeleyDBStorage**        0.014/59  30MB (6655 'garbage objects')

* ShelfStorage: it was observed that pack times (even for a just-packed
storage) vary more than one would expect.

** BerkeleyDBStorage tracks objects for garbage collection during normal
operation; pack() has nothing to do unless there is garbage to clean up,
since it does not examine every record in the storage the way all the
other storages do - hence the negligible first number (0.014s). I deleted
several thousand objects to give it something to do, and the second number
(59s, with 6655 'garbage objects') reflects that.

3. Start Up Times
-----------------

(time to get "root"; a sketch of the measurement follows the table)

                                   Seconds   RAM consumed
FileStorage2       Before pack     12.316     75MB
                   After pack       3.923    104MB
ShelfStorage       Before pack     18.696     75MB
                   After pack       0.006     11MB
PgsqlStorage                        0.029     15MB (+ Postgres 18MB)
SqliteStorage      Before pack      0.011     22MB
                   After pack       0.011     22MB
BerkeleyDBStorage  Before pack      0.49*     26MB
                   After pack       0.081     20MB

* Not so sure about this number - it may have been my system.
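The start-up measurement boils down to timing how long it takes to open a
storage and fetch its root object. A minimal sketch, assuming the stock
Durus Connection/FileStorage API (the 'news.durus' file name is a stand-in
for the test database):

    import time
    from durus.connection import Connection
    from durus.file_storage import FileStorage

    def time_to_root(make_storage):
        # Time opening the storage plus fetching the root object,
        # which is what the "time to get root" figures measure.
        start = time.time()
        connection = Connection(make_storage())
        connection.get_root()
        return time.time() - start

    # Swap in any of the storage classes measured above.
    print("%.3f seconds" % time_to_root(lambda: FileStorage('news.durus')))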
4. Time to Access Objects
-------------------------

Best of three runs. K = return one constant object (NewsItem 123);
Random = select the given number of random objects from within the
500,000 news items and return them. All times after pack, in seconds.

                       K     |--------- Random ---------
                       1        10       100      1000
FileStorage2         0.006    0.054    0.304    2.005
ShelfStorage         0.007    0.059    0.323    2.119
PgsqlStorage         0.009    0.072    0.414    2.690
SqliteStorage        0.021    0.060    0.398    2.491
BerkeleyDBStorage    0.008    0.042    0.343    2.229

ClientStorage accessing a StorageServer running:

FileStorage2         0.010    0.074    0.430    2.853
ShelfStorage         0.011    0.075    0.499    3.171
PostgresqlStorage    0.025    0.084    0.565    3.647
SqliteStorage        0.012    0.089    0.514    3.483
BerkeleyDBStorage    0.024    0.105    0.946    6.725

Editorial note: Durus caching seems to level the playing field for most
storages. (A sketch of this kind of access loop follows section 5.)

5. StorageServer - stress.py
----------------------------

Results of durus/test/stress.py - 50 loops, 2 runs:
(% time python stress.py --max-loops=50)

FileStorage2
  run 1 - 8.479u 3.503s 0:17.34 69.0%  1011+61333k 2+0io  0pf+0w
  run 2 - 1.382u 0.388s 0:04.89 35.9%  1012+7491k  0+0io  0pf+0w

ShelfStorage
  run 1 - 8.277u 3.509s 0:17.44 67.4%  1003+60590k 2+0io  0pf+0w
  run 2 - 1.578u 0.635s 0:06.88 31.9%  1011+8185k  0+0io  0pf+0w

PostgresqlStorage
  run 1 - 8.187u 3.417s 0:24.09 48.1%  1006+59575k 0+0io  0pf+0w
  run 2 - 1.684u 0.604s 0:06.99 32.6%   982+7865k  0+0io  0pf+0w

SqliteStorage
  run 1 - 8.376u 3.783s 0:25.17 48.2%  1002+59796k 2+0io  0pf+0w
  run 2 - 1.845u 0.792s 0:10.68 24.6%  1009+8466k  0+0io  0pf+0w

BerkeleyDBStorage
  run 1 - 8.431u 3.448s 1:04.15 18.5%   999+60854k 0+0io  0pf+0w
  run 2 - 1.726u 0.551s 0:08.51 26.6%  1048+7933k  16+0io 0pf+0w
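For reference, here is a minimal sketch of the kind of random-access
timing used in section 4, run through ClientStorage against a
StorageServer. The 'news' root key, integer keys 0-499999, and the
'headline' attribute are assumptions made for illustration; the
local-storage rows would use the same loop with, e.g.,
Connection(FileStorage('news.durus')) instead.

    import random
    import time
    from durus.client_storage import ClientStorage
    from durus.connection import Connection

    # Assumes a StorageServer is already listening on the default
    # host/port (started with something like: durus -s --file=news.durus).
    connection = Connection(ClientStorage(host='localhost', port=2972))
    items = connection.get_root()['news']  # hypothetical root key

    def time_random_reads(count):
        start = time.time()
        for _ in range(count):
            # Integer keys 0-499999 and the 'headline' attribute are
            # assumptions about the test data.
            item = items[random.randrange(500000)]
            item.headline  # attribute access forces the object to load
        return time.time() - start

    for count in (10, 100, 1000):
        print("%4d random reads: %.3f seconds"
              % (count, time_random_reads(count)))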