Here's a last version of this (I hope). Changes include:
- Adding SqliteStorage into the mix
- Providing more accurate numbers for BerkeleyDBStorage,
PostgresqlStorage, SqliteStorage
- Adding access time results using ClientStorage against a
StorageServer using each of the storages. The intent was to see
how well they play together.
- Running "stress.py" from durus.test against StorageServer with
each of the storage backends.
If I have any conclusion to draw from here, it's that StorageServer /
ClientStorage tends to smooth out the differences between backends.
With all these alternatives to FileStorage2, noting especially
ShelfStorage which will be the new default, perhaps it's time to
update the Durus README and change "best suited to collections of
less than a million instances" to reflect the new situation. It's
certainly now possible to have millions of instances, running on
fairly cheap commodity hardware without gobbling up all the server's
RAM, although with all storages you will have to devise a sane
strategy for getting the information in there if a bulk load is
required (a sketch of one approach follows).
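As an illustration only, here is a minimal sketch of such a bulk
load, committing in fixed-size chunks so uncommitted state doesn't
pile up in RAM. The NewsItem fields, file name and chunk size are
made up for the example; Connection, BTree, Persistent and
FileStorage2 are the real Durus names:

    from durus.file_storage import FileStorage2
    from durus.connection import Connection
    from durus.btree import BTree
    from durus.persistent import Persistent

    class NewsItem(Persistent):
        # stand-in for the real NewsItem class
        def __init__(self, number, title):
            self.number = number
            self.title = title

    connection = Connection(FileStorage2('news.durus'))
    root = connection.get_root()
    root['news'] = news = BTree()

    CHUNK = 10000   # commit every 10,000 inserts to bound memory use
    for i in xrange(500000):
        news[i] = NewsItem(i, 'item #%d' % i)
        if i % CHUNK == 0:
            connection.commit()
    connection.commit()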
1. Create a new storage
-----------------------
Create a new Durus db with 500,000 "NewsItem" instances (contained
in a NewsDatabase, which is built around a BTree):
                     Seconds   RAM consumed at max (MB)
FileStorage2           341     341
ShelfStorage           535     375
PostgresqlStorage      807     296 (Python) + 36 (Postgres process)
SqliteStorage          607     490
BerkeleyDBStorage      951     346
Note on disk space consumed: the Shelf and File storages consume the
least disk space, as one might imagine; SqliteStorage adds a little
more overhead. I didn't attempt to measure PostgreSQL.
BerkeleyDBStorage filled up my /tmp partition during the initial
import: while the import runs, a number of transaction log files are
still present, bringing disk consumption to ~250MB; as the
transaction logs were rolled up this dropped to ~150MB. By way of
comparison, the file storages consume approximately 98 - 105MB.
2. Time to Pack
---------------
Pack times following the initial commit of 500,000 new object
instances; these do not include start-up time (see the table in
section 3 for start-up times).
                     Seconds   RAM consumed during pack (MB)
FileStorage2            52     214
*ShelfStorage          248      75
PostgresqlStorage      122      44 (Postgresql server) + 36 (Python process)
SqliteStorage           84     254
**BerkeleyDBStorage      0.014 negligible
                        59      30 (6655 'garbage objects')
*ShelfStorage: pack times were observed to vary more than one would
expect, even for a just-packed storage.
**Note that BerkeleyDBStorage tracks objects for garbage collection
during normal operation; pack() has nothing to do unless there is
garbage to clean up, and it does not examine every record in the
storage as all the other storages do. I deleted several thousand
objects to give it something to do; the second row reflects that.
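For reference, the pack timings are just the wall-clock cost of
Connection.pack(); a minimal sketch (file name made up):

    import time
    from durus.file_storage import FileStorage2
    from durus.connection import Connection

    connection = Connection(FileStorage2('news.durus'))
    start = time.time()
    connection.pack()   # rewrite the storage, dropping unreachable records
    print 'pack took %.3fs' % (time.time() - start)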
3. Start Up Times
-----------------
(time to get "root")
                     Seconds   RAM consumed (MB)
FileStorage2
  Before pack         12.316    75
  After pack           3.923   104
ShelfStorage
  Before pack         18.696    75
  After pack           0.006    11
PostgresqlStorage      0.029    15 (+ 18 for the Postgres process)
SqliteStorage
  Before pack          0.011    22
  After pack           0.011    22
BerkeleyDBStorage
  Before pack          0.49*    26
  After pack           0.081    20
*Not so sure about this number; it may have been an artifact of my
system.
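The start-up numbers above are essentially the cost of opening the
storage and making the first get_root() call; a sketch of the
measurement (file name made up):

    import time
    from durus.file_storage import FileStorage2
    from durus.connection import Connection

    start = time.time()
    connection = Connection(FileStorage2('news.durus'))
    root = connection.get_root()    # forces the storage index to load
    print 'time to root: %.3fs' % (time.time() - start)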
4. Time to Access Objects
-------------------------
Best of three runs. The K column is a single fetch of one constant
object (NewsItem 123); the Random columns fetch 10, 100 and 1000
objects selected at random from among the 500,000 news items.
(All times after pack, in seconds.)
                      K     |-------- Random --------|
                      1       10      100     1000
FileStorage2        0.006   0.054   0.304   2.005
ShelfStorage        0.007   0.059   0.323   2.119
PostgresqlStorage   0.009   0.072   0.414   2.690
SqliteStorage       0.021   0.060   0.398   2.491
BerkeleyDBStorage   0.008   0.042   0.343   2.229
ClientStorage accessing a StorageServer running each backend:
FileStorage2 0.010 0.074 0.430 2.853
ShelfStorage 0.011 0.075 0.499 3.171
PostgresqlStorage 0.025 0.084 0.565 3.647
SqliteStorage 0.012 0.089 0.514 3.483
BerkeleyDBStorage 0.024 0.105 0.946 6.725
Editorial note: Durus caching seems to level the playing field for
most storages.
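To make the caching point concrete, here is a minimal sketch of an
access loop of the kind timed above. The 'news' key follows the
bulk-load sketch in section 1; cache_size is Connection's knob for
how many loaded objects are kept un-ghosted, shown here with what I
believe is its default value:

    import random, time
    from durus.file_storage import FileStorage2
    from durus.connection import Connection

    connection = Connection(FileStorage2('news.durus'), cache_size=100000)
    news = connection.get_root()['news']

    start = time.time()
    for key in random.sample(xrange(500000), 1000):
        item = news[key]    # ghost instances load from storage on first touch
    print '1000 random reads: %.3fs' % (time.time() - start)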
5. StorageServer - stress.py
----------------------------
Results of durus/test/stress.py - 50 loops, 2 runs per backend:
(% time python stress.py --max-loops=50)
FileStorage2
  run 1 - 8.479u 3.503s 0:17.34 69.0% 1011+61333k 2+0io 0pf+0w
  run 2 - 1.382u 0.388s 0:04.89 35.9% 1012+7491k 0+0io 0pf+0w
ShelfStorage
  run 1 - 8.277u 3.509s 0:17.44 67.4% 1003+60590k 2+0io 0pf+0w
  run 2 - 1.578u 0.635s 0:06.88 31.9% 1011+8185k 0+0io 0pf+0w
PostgresqlStorage
  run 1 - 8.187u 3.417s 0:24.09 48.1% 1006+59575k 0+0io 0pf+0w
  run 2 - 1.684u 0.604s 0:06.99 32.6% 982+7865k 0+0io 0pf+0w
SqliteStorage
  run 1 - 8.376u 3.783s 0:25.17 48.2% 1002+59796k 2+0io 0pf+0w
  run 2 - 1.845u 0.792s 0:10.68 24.6% 1009+8466k 0+0io 0pf+0w
BerkeleyDBStorage
  run 1 - 8.431u 3.448s 1:04.15 18.5% 999+60854k 0+0io 0pf+0w
  run 2 - 1.726u 0.551s 0:08.51 26.6% 1048+7933k 16+0io 0pf+0w
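For anyone wanting to reproduce the client/server runs, a rough
sketch of serving one backend and pointing a client at it; the host,
port (the usual Durus default) and the choice of FileStorage2 are
just for illustration:

    # server process
    from durus.file_storage import FileStorage2
    from durus.storage_server import StorageServer

    StorageServer(FileStorage2('news.durus'),
                  host='localhost', port=2972).serve()

    # client process (what stress.py effectively does)
    from durus.client_storage import ClientStorage
    from durus.connection import Connection

    connection = Connection(ClientStorage(host='localhost', port=2972))
    root = connection.get_root()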