On Oct 17, 2005, at 4:20 PM, David Binger wrote:

> On Oct 17, 2005, at 8:23 AM, mario ruggier wrote:
>
>> OK, I have a few stats files if anyone cares to eyeball thru them.
>>
>> The more interesting is the second ordering, by tottime. Most of the
>> time seems to be consumed by _p_load_state, load_state, get_state, by
>> get_position, and __new__ (this last one is surprising, as to my
>> understanding there should be zero new persistent objects resulting
>> from this process). get_position's share of tottime is high for the
>> 100K and 200K objects runs, but surprisingly goes down for the 300K
>> run (where logically it should be doing more work).
>>
>> As the number of objects indexed increases, the share of time
>> consumed by get_state and __new__ grows to close to all of it.
>
> When an object is loaded, the __new__ method is called.
> All of the persistent instances are "new" to this process.

Ah, ok.

> I think your tests are measuring read() and paging times
> more than anything else. Your rebuild_index function apparently
> requires every object to be loaded and kept in RAM. I don't think
> you have enough RAM to do that.
>
> You might see some change (could be better or could be worse)
> by calling Connection.shrink_cache() occasionally, or by using a
> BTree with a different degree.

I will experiment with the shrink_cache() call. I am using the default
cache_size of 8000. To play with the BTree degree, I'd need to
regenerate the db in the first place... maybe on another machine ;)
I assume I should try a higher degree?

As for requiring all objects in memory, here's the pertinent code in
the container class for the rebuild_index function. Here the variables
self._items and index are BTrees of the same degree. The 3-line
iteration over self._items is where it all happens... and it is
(almost) pure BTree code. self.get_item_key() just builds a tuple of
attr values from item.
Looking at BTree's and BNode's __iter__ methods, it is not clear to me
why this would require all items to be in memory.

    class PersistentContainer(Persistent):
        ...
        def __init__(self, minimum_degree=16):
            self._minimum_degree = minimum_degree
            self._items = self.mBTree()  # uses class BNode%(minimum_degree)s
            self._indices = PersistentDict()
        ...
        def rebuild_index(self, index_key):
            if index_key == 'id':
                return None  # not allowed to rebuild 'id' index
            print 'Rebuilding %s index: %s' % (self.__class__.__name__, index_key)
            index = self._indices[index_key]
            index.clear()
            for idkey in self._items:
                item = self._items[idkey]
                index.add(self.get_item_key(item, index_key), item)
            assert len(self._items.keys()) == len(index.keys())

mario
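[Editor's note] David's shrink_cache() suggestion can be sketched as follows. This is a minimal illustrative stand-in, not Durus itself: FakeConnection, its load(), and rebuild_index_bounded are hypothetical names invented here, and the stand-in shrink_cache() simply evicts the oldest cache entries, whereas the real Durus Connection turns cached instances back into ghosts. The point is the pattern: calling shrink_cache() every few thousand items inside the rebuild loop keeps the number of resident objects near cache_size instead of letting it grow to the full container size.

```python
# Minimal sketch, NOT Durus: FakeConnection and rebuild_index_bounded are
# hypothetical stand-ins used only to illustrate the shrink_cache() pattern.

class FakeConnection:
    """Stand-in for a Durus Connection with a bounded object cache."""

    def __init__(self, cache_size=8000):
        self.cache_size = cache_size
        self.cache = {}  # oid -> loaded state; insertion order = load order

    def load(self, oid):
        # Pretend to read the object's state from disk on a cache miss.
        if oid not in self.cache:
            self.cache[oid] = 'state-%s' % oid
        return self.cache[oid]

    def shrink_cache(self):
        # Real Durus would ghostify cached instances; here we just evict
        # the oldest entries until we are back down to cache_size.
        while len(self.cache) > self.cache_size:
            self.cache.pop(next(iter(self.cache)))


def rebuild_index_bounded(connection, oids, shrink_every=1000):
    """Visit every object once, shrinking the cache periodically so the
    peak number of resident objects stays near cache_size rather than
    growing to len(oids)."""
    peak = 0
    for n, oid in enumerate(oids, 1):
        connection.load(oid)
        # ... index.add(self.get_item_key(item, index_key), item) here ...
        if n % shrink_every == 0:
            connection.shrink_cache()
        peak = max(peak, len(connection.cache))
    return peak


conn = FakeConnection(cache_size=8000)
peak = rebuild_index_bounded(conn, range(300000))
# peak stays a little above cache_size (8000), far below 300000
```

With 300K objects the peak resident count here stays a little above the cache_size of 8000, while the original loop effectively keeps every item it touches; in real code the same call site would use the actual Durus connection's shrink_cache().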