durusmail: durus-users: Improving caching performance
Improving caching performance
2005-05-20
A.M. Kuchling
We have a GUI that produces a display based on a large data structure
in a Durus database; this GUI is starting to become rather slow on a
very large model, so I'm poking around trying to speed things up.

The slowest step is an operation that loops over a 4153-element
PersistentList, doing a very simple check on each item.  Looking at
this step, it seems to me that the Connection's caching isn't very
effective for this kind of access pattern.

Connection.get() maintains a cache mapping OID -> object.  However,
most object accesses don't seem to go through .get(); when
you access a ghost object, Connection.load_state() is called, which
calls .get_stored_pickle(), which calls self.storage.load(), which
then does network I/O.  ClientStorage.load() doesn't do any caching of
its own.
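To make the call chain concrete, here is a minimal, hypothetical model of it. The method names (get(), load_state(), get_stored_pickle(), storage.load()) mirror the ones above, but the classes (CountingStorage, Ghost, Connection) are simplified stand-ins, not Durus's real code. Stored state is a plain string rather than an unpickled object, and each storage.load() call is counted as one network round trip.

```python
class CountingStorage:
    """Hypothetical stand-in for ClientStorage: each load() is one round trip."""

    def __init__(self, records):
        self.records = records  # oid -> stored pickle (a plain string here)
        self.round_trips = 0

    def load(self, oid):
        self.round_trips += 1  # in the real client this would be network I/O
        return self.records[oid]


class Ghost:
    """An object whose state has not been loaded yet."""

    def __init__(self, connection, oid):
        self._p_connection = connection
        self._p_oid = oid
        self._p_state = None  # None means "still a ghost"

    def value(self):
        # Touching a ghost triggers load_state(), not get().
        if self._p_state is None:
            self._p_connection.load_state(self)
        return self._p_state


class Connection:
    def __init__(self, storage):
        self.storage = storage
        self.cache = {}  # oid -> object; only consulted by get()

    def get(self, oid):
        obj = self.cache.get(oid)
        if obj is None:
            obj = Ghost(self, oid)
            self.cache[oid] = obj
        return obj

    def get_stored_pickle(self, oid):
        # No caching at this level: every call goes to storage.
        return self.storage.load(oid)

    def load_state(self, obj):
        obj._p_state = self.get_stored_pickle(obj._p_oid)


# Usage: one pass over 4153 ghosts costs one round trip per item.
storage = CountingStorage({oid: "state-%d" % oid for oid in range(4153)})
conn = Connection(storage)
items = [conn.get(oid) for oid in range(4153)]
for item in items:
    item.value()
```

In this model the OID -> object cache in get() is never consulted on the load path, so turning ghosts back into loaded objects always costs a storage call.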

This means that the most commonly called method, get_stored_pickle(),
isn't doing any caching at all, and the result is lots of round trips
and slow performance (a few seconds to pull thousands of objects
across the wire).  My loop is pulling all 4153 items across the wire,
and they never get recorded in any sort of cache.  .get() doesn't get
called very often, so the caching there doesn't buy much.
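As a rough check on why this adds up to seconds, the network time for one uncached pass is approximately the item count multiplied by the per-request round-trip latency. The latency values below are assumed examples, not measurements from this setup; pass_time_seconds is a hypothetical helper.

```python
def pass_time_seconds(n_items, round_trip_ms):
    """Estimated network time for one pass when every item needs a load().

    Ignores unpickling and server-side work; only counts round trips.
    """
    return n_items * round_trip_ms / 1000.0


# Assumed per-round-trip latencies in milliseconds (illustrative only).
for rtt_ms in (0.25, 0.5, 1.0):
    print("%.2f ms/round trip -> %.2f s per pass"
          % (rtt_ms, pass_time_seconds(4153, rtt_ms)))
```

Even at sub-millisecond latencies, 4153 sequential round trips land in the one-to-several-second range, which is consistent with the slowdown described above.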

It seems to me that .get_stored_pickle() should really be doing the
caching here.  As an experiment, I added a cache to
get_stored_pickle(), saving the pickle string.  That didn't help the
first pass through the loop very much (only 6 cache hits), but if I
didn't clear the pickle cache in Connection.sync(), subsequent passes
got faster.  The problem is that I'm not sure where this cache needs
to be cleared; obviously, invalidations need to result in the
corresponding entries being dropped from the cache.
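One possible shape for the experiment, sketched under assumptions: the pickle cache is keyed by OID, and rather than being cleared wholesale in sync(), it drops only the OIDs reported as invalidated. PickleCachingConnection and its invalidate() method are hypothetical names, not part of Durus, and the sketch assumes the list of invalidated OIDs is available wherever sync() processes invalidations.

```python
class CountingStorage:
    """Hypothetical stand-in for ClientStorage: each load() is one round trip."""

    def __init__(self, records):
        self.records = records  # oid -> stored pickle (a plain string here)
        self.round_trips = 0

    def load(self, oid):
        self.round_trips += 1
        return self.records[oid]


class PickleCachingConnection:
    """Hypothetical connection that caches stored pickles by OID."""

    def __init__(self, storage):
        self.storage = storage
        self.pickle_cache = {}  # oid -> pickle string

    def get_stored_pickle(self, oid):
        pickle = self.pickle_cache.get(oid)
        if pickle is None:
            pickle = self.storage.load(oid)  # round trip only on a miss
            self.pickle_cache[oid] = pickle
        return pickle

    def invalidate(self, oids):
        # Drop only the entries another client changed; everything else
        # stays cached across the sync.
        for oid in oids:
            self.pickle_cache.pop(oid, None)


# Usage: the second pass reloads only the one invalidated object.
storage = CountingStorage({oid: "v1-%d" % oid for oid in range(4153)})
conn = PickleCachingConnection(storage)
for oid in range(4153):  # first pass: every lookup is a miss
    conn.get_stored_pickle(oid)
storage.records[7] = "v2-7"  # another client commits a change to OID 7
conn.invalidate([7])  # ...and the invalidation is delivered
for oid in range(4153):  # second pass: only OID 7 goes back to storage
    conn.get_stored_pickle(oid)
```

The cache in this sketch is unbounded; a version for a very large model would probably need some size limit or eviction policy.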

Thoughts?

--amk