On Oct 19, 2005, at 6:14 PM, mario ruggier wrote:

> On Oct 17, 2005, at 5:15 PM, David Binger wrote:
>
>> On Oct 17, 2005, at 10:56 AM, mario ruggier wrote:
>>
>>> The 3-line iteration over self._items is where it all happens...
>>> and it is (almost) pure BTree code. self.get_item_key() just
>>> builds a tuple of attr values from item. Looking at BTree's and
>>> BNode's __iter__ methods, it is not clear to me why this would
>>> require all items to be in memory.
>>
>> It is because you can't get the attr values from the item without
>> the item being loaded, and Durus does not automatically flush
>> object state from memory except when shrink_cache() is called.
>> (Note that commit() and abort() both call shrink_cache(), so this
>> method is not normally called directly by an application.)
>
> I have tinkered further with this. There seems to be very different
> behavior between shrink_cache() and commit(). Here's a specific
> illustration. The two functions below (rebuild and repair index)
> have each been run on a container with 2.2 million objects. I am
> running Durus with:
> - logging level 10, to write out all the info messages
> - cache size of 20000
> - chunk size of cache_size/2 (frequency of commits/shrinking => 10000)
>
> connection.commit() calls connection.shrink_cache()... however, as
> can be seen from the info logging from shrink_cache() in each case,
> in the case of commit() the "size" of the cache (len(cache.objects))
> just keeps increasing with every 10K iterations, at the same rate,
> causing a machine with limited RAM to quickly become memory-bound.
> The "loaded" amount (len(loaded_oids)) stabilizes in both cases at
> around 60000 (but with cache_size set to 20000).

shrink_cache() won't remove any object that has been changed since the
last commit(). It has the best chance to actually remove objects when
it is called at the end of a commit() or abort() call.
Each time you call shrink_cache(), it only looks at a fraction of the
total available objects to see if they can be removed: this avoids long
delays. Do the items in your indices have uncommitted changes during
these loops? You might also consider calling _p_set_status_ghost() on
each item, if you are sure that you don't have anything there that
needs saving.
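To make the suggestion above concrete, here is a minimal sketch of a
read-only scan over a large container that periodically calls abort()
so that shrink_cache() has a chance to actually evict objects. The
`FakeConnection` class, `scan_items` function, and `chunk` parameter
are illustrative stand-ins, not Durus APIs; real code would use a
durus.connection.Connection and its items would be Persistent objects
(on which _p_set_status_ghost() could also be called, as noted above).

```python
class FakeConnection:
    """Stand-in for durus.connection.Connection (illustrative only)."""
    def __init__(self):
        self.aborts = 0

    def abort(self):
        # In Durus, abort() discards uncommitted changes and then calls
        # shrink_cache(); since nothing is modified afterward, the cache
        # can actually remove objects. Here we just count the calls.
        self.aborts += 1


def scan_items(items, connection, chunk=10000):
    """Iterate a large container, periodically releasing cache pressure.

    shrink_cache() won't remove any object changed since the last
    commit(), so a read-only scan should call abort() (or commit())
    every `chunk` items to let unmodified objects be evicted.
    """
    count = 0
    for item in items:
        # ... examine item, e.g. build an index key from its attrs ...
        count += 1
        if count % chunk == 0:
            connection.abort()
    return count


conn = FakeConnection()
n = scan_items(range(25000), conn, chunk=10000)
# 25000 items with chunk=10000 -> abort() is reached twice
```

The same structure works with commit() in place of abort() when the
loop does modify items; the key point is that objects with uncommitted
changes are pinned in the cache until one of the two is called.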