Interesting trivia. I modified my code to use a pickle file instead of
Durus and the results were a bit surprising.
Durus with BTree of Persistent objects:
DB size: 23 MB
import speed & memory use: 46 seconds, 250 MB
search for "CHLOR": 78 seconds, 74 MB

Durus with BTree of regular objects:
DB size: 10 MB
import speed & memory use: 46 seconds, 236 MB
search for "CHLOR": 78 seconds, 74 MB

Pickle with dict of Python objects:
DB size: 40 MB
import speed & memory use: 36 seconds, 229 MB
search for "CHLOR": 26 seconds, 259 MB
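For reference, the pickle variant is essentially just this (a minimal sketch; `Record`, `dump_db`, `load_db`, and the field names are made up for illustration, not the real code):

```python
import pickle

class Record:
    """A plain (non-Persistent) object; the whole dict of these is
    pickled in one shot."""
    def __init__(self, id, name, text):
        self.id = id
        self.name = name
        self.text = text  # up to ~4 KB of strings per record in the real data

def dump_db(records, path):
    # One big pickle of the entire dict -- fast to write, but load_db()
    # must pull everything back into memory at once, which is where the
    # extra memory use comes from.
    with open(path, "wb") as f:
        pickle.dump(records, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_db(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```

No per-object loading or cache management as with Durus, which is presumably why the search is so much faster once the data is in.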
The search is a typical one a user would do, and it takes
unacceptably long under Durus. The import reads all the data from CSV
files into memory and creates the database. It's only run by admins
to prepare the db, so its efficiency doesn't matter.
As expected, Durus adds significant speed overhead to the search,
while pickle uses more memory. What's surprising is that using
Persistent objects in the BTree doubles the size of the database but
does not affect performance. The Persistent overhead is even more
surprising because each object contains up to 4 KB of strings, which
you'd think would dwarf the size of Persistent.
Another surprise is that the advanced search, which goes through every
data record (6112 of them), takes only three seconds even for complex
search criteria, while the simple search above, which uses separate
index objects, takes much longer. So I'm tempted to make the simple
search use the advanced search's algorithm. This would mean all the
records are in memory much of the time, which reduces the advantages
of Durus. (An advanced search with Durus takes 214 MB, and Python
doesn't release the memory when it's done anyway.)
Two issues with that:
- Three Quixote processes on the server would take 645 MB, a good
chunk of the 1 GB total, though one or two would be swapped out
sometimes.
- We'd like to target laptops with < 250 MB memory for the
standalone version, especially since the FileMaker database it's
replacing works on less than 128 MB.
So Durus may be necessary due to the memory issue.
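Incidentally, the scan-everything approach is just a linear pass with an arbitrary predicate, something like this (a sketch only; the dict-of-dicts records and the "text" field are placeholders for the real objects):

```python
def advanced_search(records, predicate):
    """Scan every record and keep the ones matching the predicate.
    With only ~6000 records this stays fast even for complex criteria,
    since there are no per-lookup index-object traversals."""
    return [r for r in records.values() if predicate(r)]

def simple_search(records, term):
    # The simple substring search expressed via the same full scan;
    # records here are assumed to be dicts with a "text" field.
    term = term.upper()
    return advanced_search(records, lambda r: term in r["text"].upper())
```

The cost, as noted above, is that the whole record set has to be resident in memory for every such search.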
Is there any way to make a multiprocess server access a shared chunk
of memory? Or conversely, has anyone tried Quixote with a threaded
server? That would cut the memory use dramatically in this case.
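The reason a threaded server helps: all request handlers read the same in-memory objects, so the data is paid for once rather than once per process. A rough sketch (the names are made up and nothing here is Quixote-specific):

```python
import threading

# One in-memory dataset shared by every worker thread.  In a threaded
# server each request handler sees the same objects, so the ~215 MB
# would be paid once instead of once per process.
DATABASE = {"records": {}}
DB_LOCK = threading.Lock()  # needed only if requests also write

def handle_request(term, results):
    # Read-only access: the handler works on references to the shared
    # data, no copy is made per thread.
    hits = [r for r in DATABASE["records"].values() if term in r]
    results.append(len(hits))
```

Writes would need the lock (or some other coordination), but for a mostly read-only database like this one the sharing is essentially free.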
--
Mike Orr