I managed to cut the memory use in half by dropping the search index
tables and replacing a large string field with a boolean. That brought
it down from 250 MB to a more reasonable 131 MB. All searches now
iterate over the original data records, which you'd think would be
slow, but they complete in 3 seconds.
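The scan itself is nothing fancy; roughly like this (simplified, and
the container and function names here are made up, not the actual
code):

    def search(records, query):
        # Linear scan over every record; no index tables needed.
        query = query.lower()
        hits = []
        for record in records:
            # A record matches if the query appears in its name or in
            # any of its synonyms.
            if query in record.name.lower():
                hits.append(record)
            elif any(query in syn.lower() for syn in record.synonyms):
                hits.append(record)
        return hits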
There seems to be a multiplicative effect with Durus: adding or
removing a string or collection has a significantly larger effect on
memory use than the size of the object itself would suggest. For
instance, every record has:
    .name       # unicode
    .synonyms   # list of unicode
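(Each record is a Durus persistent object along these lines; a sketch
assuming the standard durus.persistent.Persistent base class, with a
made-up class name:)

    from durus.persistent import Persistent

    class Record(Persistent):
        # Simplified; the real class has more attributes than these.
        def __init__(self, name, synonyms):
            self.name = name          # unicode
            self.synonyms = synonyms  # list of unicode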
The total number of characters in all names and synonyms is just under
2 million. You might assume these multibyte characters would take 4 MB
total, or 8 MB at most. However, changing the names to short numeric
strings and replacing the synonyms with an empty list lowers the
memory usage by 60 MB! Presumably the per-object overhead of each
unicode string and each list (object headers, plus whatever Durus adds
per persistent object) dwarfs the character data itself.
I wish Python had a sizeof() operator so you could tell how many bytes
an object really takes up.
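(For what it's worth, Python 2.6 adds sys.getsizeof(), though it only
counts an object's own bytes, not anything it references. A rough
recursive tally can be built on top of it, something like this sketch:)

    import sys

    def deep_sizeof(obj, seen=None):
        # Rough recursive size estimate built on sys.getsizeof (2.6+).
        # Only descends into lists, tuples, dicts, and sets; anything
        # else is counted shallowly.  Shared objects are counted once.
        if seen is None:
            seen = set()
        if id(obj) in seen:
            return 0
        seen.add(id(obj))
        size = sys.getsizeof(obj)
        if isinstance(obj, dict):
            for key, value in obj.items():
                size += deep_sizeof(key, seen) + deep_sizeof(value, seen)
        elif isinstance(obj, (list, tuple, set, frozenset)):
            for item in obj:
                size += deep_sizeof(item, seen)
        return size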
--
Mike Orr