durusmail: durus-users: PersistentDict vs BTree
PersistentDict vs BTree
Mike Orr
2006-10-11
Interesting trivia. I modified my code to use a pickle file instead of
Durus and the results were a bit surprising.

Durus with BTree of Persistent objects:
    DB size:                    23 MB
    import speed & memory use:  46 seconds   250 MB
    search for "CHLOR":         78 seconds    74 MB

Durus with BTree of regular objects:
    DB size:                    10 MB
    import speed & memory use:  46 seconds   236 MB
    search for "CHLOR":         78 seconds    74 MB

Pickle with dict of Python objects:
    DB size:                    40 MB
    import speed & memory use:  36 seconds   229 MB
    search for "CHLOR":         26 seconds   259 MB

The search is typical of the searches users will do, and it takes
unacceptably long under Durus.  The import reads all the data from CSV
files into memory and creates the database.  It's run only by admins
to prepare the db, so its efficiency doesn't matter.
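
The pickle variant is nothing fancier than dumping the whole dict of
records to one file during the import and loading it all back before
searching.  Roughly (the file name and record layout here are made up
for illustration):

    import pickle

    DB_PATH = "plants.pickle"   # made-up file name

    def save_db(records):
        # records: an ordinary dict mapping record id -> plain Python object
        f = open(DB_PATH, "wb")
        pickle.dump(records, f, pickle.HIGHEST_PROTOCOL)
        f.close()

    def load_db():
        # The whole database is unpickled into memory at once, which is
        # why this version uses more memory than Durus for the same
        # search (259 MB vs 74 MB) but runs it faster.
        f = open(DB_PATH, "rb")
        records = pickle.load(f)
        f.close()
        return records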

As expected, Durus adds significant speed overhead to the search,
while Pickle uses more memory.  What's surprising is that using
Persistent objects in the BTree doubles the size of the database but
does not affect performance.  That size overhead is even more
surprising because each object contains up to 4 KB of strings, which
you would think would dwarf the per-object cost of Persistent.
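
For the record, the only difference between the two Durus setups is
whether the record class subclasses Persistent.  A stripped-down
sketch (the real record class and its fields are more involved than
this):

    from durus.btree import BTree
    from durus.connection import Connection
    from durus.file_storage import FileStorage
    from durus.persistent import Persistent

    class PersistentRecord(Persistent):
        # Each instance gets its own oid and its own pickle in the
        # storage file, which is presumably where the 23 MB vs 10 MB
        # difference comes from; in exchange it can be loaded and
        # evicted from the cache independently.
        def __init__(self, name, text):
            self.name = name
            self.text = text

    class PlainRecord:
        # Pickled inline as part of whatever persistent object refers
        # to it (here, a BTree node), so no per-record storage overhead.
        def __init__(self, name, text):
            self.name = name
            self.text = text

    def import_records(rows, use_persistent=True):
        # rows: iterable of (key, name, text) read from the CSV files
        cls = PersistentRecord if use_persistent else PlainRecord
        connection = Connection(FileStorage("plants.durus"))
        root = connection.get_root()
        tree = root["records"] = BTree()
        for key, name, text in rows:
            tree[key] = cls(name, text)
        connection.commit()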

Another surprise is that the advanced search, which goes through every
data record (6112 of them), takes only three seconds even for complex
search criteria, while the simple search above, which uses separate
index objects, takes much longer.  So I'm tempted to make the simple
search use the advanced search's algorithm (see the sketch after the
list below).  That would mean all the records are in memory much of
the time, which diminishes the advantages of Durus.  (An advanced
search with Durus takes 214 MB, and Python doesn't release the memory
when it's done anyway.)  Two issues with that:

    - Three Quixote processes on the server would take 645 MB, a good
chunk of the 1 GB total, though one or two would be swapped out
sometimes.
    - We'd like to target laptops with < 250 MB memory for the
standalone version, especially since the FileMaker database it's
replacing works on less than 128 MB.
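
For comparison, here is roughly what the two search paths look like;
"records" and "word_index" are stand-in names, and the real index
objects are more elaborate than this:

    def advanced_search(root, matches):
        # Full scan: walk every record (6112 of them) and apply the
        # match function in Python.  Only about three seconds, but it
        # pulls every record into the Durus cache.
        return [rec for key, rec in root["records"].items() if matches(rec)]

    def simple_search(root, word):
        # Index lookup: a separate persistent index maps each word to
        # the keys of the records containing it.
        tree = root["records"]
        keys = root["word_index"].get(word, [])
        return [tree[k] for k in keys]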

So Durus may be necessary due to the memory issue.

Is there any way to make a multiprocess server access a shared chunk
of memory?  Or conversely, has anyone tried Quixote with a threaded
server?  That would cut the memory use dramatically in this case.

--
Mike Orr 