On Oct 17, 2005, at 5:15 PM, David Binger wrote:

> On Oct 17, 2005, at 10:56 AM, mario ruggier wrote:
>
>> The 3-line iteration over self._items is where it all happens... and
>> it is (almost) pure BTree code. self.get_item_key() just builds a
>> tuple of attr values from item. Looking at BTree's and BNode's
>> __iter__ methods, not clear to me why this will require all items to
>> be in memory.
>
> It is because you can't get the attr values from the item without the
> item being loaded, and Durus does not automatically flush object state
> from memory except when shrink_cache() is called. (Note that commit()
> and abort() both call shrink_cache(), so this method is not normally
> called directly by an application.)

I have tinkered further with this. There seem to be very different
behaviors between shrink_cache() and commit().

Here's a specific illustration. The 2 functions below (rebuild and
repair index) have each been run on a container with 2.2 million
objects. I am running durus with:
- logging level 10, to write out all the info messages
- cache size of 20000
- chunk size of cache_size/2 (frequency of commits/shrinking => 10000)

connection.commit() calls connection.shrink_cache()... however, as can
be seen from the info logging from shrink_cache() in each case, the two
runs do not behave the same. With the direct shrink_cache() calls
(rebuild_index), the "size" of the cache (len(cache.objects)) just keeps
increasing with every 10K iterations, at the same rate, causing a
machine with limited RAM to quickly become memory-bound. With commit()
(repair_index), it levels off at around 85000. The "loaded" amount
(len(loaded_oids)) stabilizes in both cases at around 60000 (even though
cache_size is set to 20000).

Here are the 2 functions, and the shrink_cache() log of some of the
first 10k iterations for each:

def rebuild_index(self, index_key):
    if index_key == 'id':
        return None # not allowed to modify 'id' index
    log(20, 'Rebuilding %s index: %s' %(self.__class__.__name__, index_key))
    index = self._indices[index_key]
    index.clear()
    chunk = int(self._p_connection.cache.get_size()/2)
    for i,idkey in enumerate(self._items):
        item = self._items.get(idkey)
        index.add(self.get_item_key(item, index_key), item)
        if (i%chunk)==0:
            self._p_connection.shrink_cache()
            log(20, 'Shrunk, chunk %s'%(i))
    assert self._items.get_count() == index.get_count()

def repair_index(self, index_key):
    if index_key == 'id':
        return None # not allowed to modify 'id' index
    log(20, 'Repairing %s index: %s'%(self.__class__.__name__, index_key))
    log(20, 'Maximal key: %s'%(self._items.get_max_item()[0]))
    index = self._indices[index_key]
    chunk = int(self._p_connection.cache.get_size()/2)
    for i in xrange(0, 1+self._items.get_max_item()[0]):
        item = self._items.get(i)
        if item is None:
            continue
        item_key = self.get_item_key(item, index_key)
        if item is not index.get(item_key):
            index.add(item_key, item)
        if (i%chunk)==0:
            self._p_connection.commit()
            log(20, 'Committed, chunk %s'%(i))
    self._p_connection.commit()
    log(20, 'Committed, chunk %s'%(self._items.get_max_item()[0]))
    assert self._items.get_count() == index.get_count()

#### shrink_cache() log output

Rebuilding Quotes index: ('symbol', 'date')
[840] cache size 10980 loaded 10686
Shrunk, chunk 10000
[840] shrink 0.028878s aged 1638 removed 0 ghosted 0 loaded 21358 size 21638
Shrunk, chunk 20000
[840] shrink 0.140190s aged 7359 removed 1 ghosted 719 loaded 31312 size 32326
Shrunk, chunk 30000
[840] shrink 0.390878s aged 8202 removed 0 ghosted 1834 loaded 40148 size 42980
Shrunk, chunk 40000
[840] shrink 0.300888s aged 8456 removed 16 ghosted 4886 loaded 45934 size 53651
Shrunk, chunk 50000
[840] shrink 0.375727s aged 10179 removed 55 ghosted 5814 loaded 50790 size 64252
Shrunk, chunk 60000
[840] shrink 0.623882s aged 11259 removed 0 ghosted 7460 loaded 54005 size 74974
Shrunk, chunk 70000
[840] shrink 0.715707s aged 12838 removed 655 ghosted 7881 loaded 56197 size 84991
Shrunk, chunk 80000
[840] shrink 0.809534s aged 9028 removed 401 ghosted 10438 loaded 56044 size 95248
Shrunk, chunk 90000
[840] shrink 5.728094s aged 8206 removed 185 ghosted 6897 loaded 59667 size 105748
Shrunk, chunk 100000
[840] shrink 4.636201s aged 11545 removed 278 ghosted 8520 loaded 61677 size 116124
Shrunk, chunk 110000
[840] shrink 0.974375s aged 16053 removed 677 ghosted 8178 loaded 63916 size 126134
Shrunk, chunk 120000
[840] shrink 1.104684s aged 14608 removed 586 ghosted 7957 loaded 66481 size 136206
Shrunk, chunk 130000
[840] shrink 2.781937s aged 9486 removed 294 ghosted 10408 loaded 66686 size 146630
Shrunk, chunk 140000
....

Repairing Quotes index: ('symbol', 'date')
Maximal key: 2200000
[667] shrink 0.068419s aged 7253 removed 0 ghosted 0 loaded 20561 size 29013
Committed, chunk 10000
[667] shrink 0.345206s aged 8331 removed 198 ghosted 1014 loaded 30682 size 40125
Committed, chunk 20000
[667] shrink 0.267090s aged 9482 removed 283 ghosted 2532 loaded 39249 size 51187
Committed, chunk 30000
[667] shrink 0.540040s aged 8758 removed 1496 ghosted 4420 loaded 45133 size 61003
Committed, chunk 40000
[667] shrink 0.555359s aged 9536 removed 947 ghosted 6357 loaded 49268 size 71403
Committed, chunk 50000
[667] shrink 0.751708s aged 8240 removed 5448 ghosted 5279 loaded 50609 size 77267
Committed, chunk 60000
[667] shrink 0.698542s aged 8379 removed 7552 ghosted 2842 loaded 54132 size 81090
Committed, chunk 70000
[667] shrink 0.974925s aged 8105 removed 7499 ghosted 5456 loaded 55953 size 84908
Committed, chunk 80000
[667] shrink 0.620938s aged 7610 removed 8687 ghosted 5311 loaded 58470 size 87535
Committed, chunk 90000
[667] shrink 0.933104s aged 6873 removed 12988 ghosted 3287 loaded 54961 size 85896
Committed, chunk 100000
[667] shrink 0.496229s aged 7372 removed 11629 ghosted 4013 loaded 52873 size 85586
Committed, chunk 110000
[667] shrink 0.624497s aged 5664 removed 13171 ghosted 4742 loaded 50292 size 83768
Committed, chunk 120000
[667] shrink 0.494200s aged 6898 removed 9909 ghosted 4605 loaded 51103 size 85183
Committed, chunk 130000
[667] shrink 0.836211s aged 6437 removed 9592 ghosted 5593 loaded 51572 size 86976
Committed, chunk 140000
....

I may either be missing the logic behind this, or it is a problem....

mario
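
P.S. To make the two runs easier to compare, here is a minimal sketch
(plain Python 3, nothing Durus-specific) that pulls the "size" figure
out of the shrink lines and computes the change between consecutive
calls. The regular expression assumes only the line layout seen in the
log output above; the names SHRINK_LINE, cache_sizes and size_deltas are
made up for this illustration and are not part of Durus.

```python
import re

# Matches the per-call summary lines from the log output, e.g.
# "[840] shrink 0.028878s aged 1638 removed 0 ghosted 0 loaded 21358 size 21638"
# (layout assumed from the log shown in this message, not from Durus docs).
SHRINK_LINE = re.compile(
    r"\[(?P<pid>\d+)\] shrink (?P<secs>[\d.]+)s aged (?P<aged>\d+) "
    r"removed (?P<removed>\d+) ghosted (?P<ghosted>\d+) "
    r"loaded (?P<loaded>\d+) size (?P<size>\d+)"
)


def cache_sizes(log_text):
    """Return the 'size' value of every shrink line in log_text, in order."""
    return [int(m.group("size")) for m in SHRINK_LINE.finditer(log_text)]


def size_deltas(sizes):
    """Return the change in cache size between consecutive shrink calls."""
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


# Last four shrink lines of each run, copied from the log output.
SHRINK_CACHE_RUN = """
[840] shrink 4.636201s aged 11545 removed 278 ghosted 8520 loaded 61677 size 116124
[840] shrink 0.974375s aged 16053 removed 677 ghosted 8178 loaded 63916 size 126134
[840] shrink 1.104684s aged 14608 removed 586 ghosted 7957 loaded 66481 size 136206
[840] shrink 2.781937s aged 9486 removed 294 ghosted 10408 loaded 66686 size 146630
"""

COMMIT_RUN = """
[667] shrink 0.496229s aged 7372 removed 11629 ghosted 4013 loaded 52873 size 85586
[667] shrink 0.624497s aged 5664 removed 13171 ghosted 4742 loaded 50292 size 83768
[667] shrink 0.494200s aged 6898 removed 9909 ghosted 4605 loaded 51103 size 85183
[667] shrink 0.836211s aged 6437 removed 9592 ghosted 5593 loaded 51572 size 86976
"""

if __name__ == "__main__":
    for label, text in (("shrink_cache()", SHRINK_CACHE_RUN),
                        ("commit()", COMMIT_RUN)):
        print(label, size_deltas(cache_sizes(text)))
```

On those lines the direct shrink_cache() run grows by roughly 10000
objects per chunk, while the commit() run moves by less than 2000 in
either direction.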