On Oct 17, 2005, at 4:20 PM, David Binger wrote:
> On Oct 17, 2005, at 8:23 AM, mario ruggier wrote:
>
>> OK, I have a few stats files, if anyone cares to eyeball through them.
>>
>> The more interesting one is the second ordering, by tottime. Most of
>> the time seems to be consumed by _p_load_state, load_state, get_state,
>> get_position, and __new__ (this last is surprising, as to my
>> understanding zero new persistent objects should result from this
>> process). get_position's share of tottime is high for the 100K and
>> 200K object runs, but surprisingly goes down for the 300K run (where
>> logically it should be doing more work).
>>
>> As the number of objects indexed increases, the share of time consumed
>> by get_state and __new__ grows to nearly all of it.
>
> When an object is loaded, the __new__ method is called.
> All of the persistent instances are "new" to this process.
Ah, ok.
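To convince myself, I might instrument __new__ with a counter,
something like this (a minimal sketch; CountedItem and _new_calls are
names of my own, assuming the items subclass
durus.persistent.Persistent):

from durus.persistent import Persistent

_new_calls = [0]  # module-level counter, hypothetical name

class CountedItem(Persistent):
    def __new__(klass, *args, **kwargs):
        # bumped both for genuinely new instances and for every
        # instance re-created when an object is loaded from storage
        _new_calls[0] += 1
        return Persistent.__new__(klass)

If __new__ really is called once per load, then after a rebuild
_new_calls[0] should be roughly the number of items, even though none
of them is logically new.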
> I think your tests are measuring read() and paging times
> more than anything else. Your rebuild_index function apparently
> requires every object to be loaded and kept in RAM. I don't think
> you have enough RAM to do that.
>
> You might see some change (could be better or could be worse)
> by calling Connection.shrink_cache() occasionally, or by using a
> BTree with a different degree.
Will experiment with the shrink_cache() call; a sketch of what I have
in mind follows the code below. I am using the default cache_size of
8000.
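(That is, the connection is opened with the equivalent of the
following; the storage file name is a placeholder:)

from durus.file_storage import FileStorage
from durus.connection import Connection

# cache_size=8000 is the default, made explicit here so it is
# easy to experiment with smaller (or larger) values
connection = Connection(FileStorage('test.durus'), cache_size=8000)
root = connection.get_root()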
To play with the BTree degree, I'd need to regenerate the db in the
first place... maybe on another machine ;) I assume I should try to use
a higher degree?
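If I do regenerate, I'd construct the items tree with something like
this (a sketch; it assumes BTree takes the node class as constructor
argument, which is how I read durus.btree, and 64 is just a guess at
a "higher" degree worth trying):

from durus.btree import BTree, BNode

class BNode64(BNode):
    # durus.btree names its node classes BNode<minimum_degree>
    minimum_degree = 64

items = BTree(BNode64)  # the default node class is BNode16

A higher degree means fewer, fatter nodes, so fewer persistent objects
to load per traversal, at the cost of a bigger state load per node.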
As for requiring all objects in memory, here's the pertinent code in
the container class for the rebuild_index function. Here the variables
self._items and index are BTrees of the same degree.
The 3-line iteration over self._items is where it all happens... and it
is (almost) pure BTree code. self.get_item_key() just builds a tuple of
attr values from the item. Looking at the __iter__ methods of BTree and
BNode, it is not clear to me why this would require all items to be in
memory.
class PersistentContainer(Persistent):
    ...
    def __init__(self, minimum_degree=16):
        self._minimum_degree = minimum_degree
        self._items = self.mBTree()  # uses class BNode%(minimum_degree)s
        self._indices = PersistentDict()
    ...
    def rebuild_index(self, index_key):
        if index_key == 'id':
            return None  # not allowed to rebuild 'id' index
        print 'Rebuilding %s index: %s' % (self.__class__.__name__,
                                           index_key)
        index = self._indices[index_key]
        index.clear()
        for idkey in self._items:
            item = self._items[idkey]
            index.add(self.get_item_key(item, index_key), item)
        assert len(self._items.keys()) == len(index.keys())
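For the shrink_cache() experiment, this is roughly what I have in mind
(a sketch; the connection parameter and the interval of 1000 are my
own guesses, not anything durus prescribes):

    def rebuild_index_shrinking(self, index_key, connection,
                                interval=1000):
        # variant of rebuild_index that lets the connection evict
        # clean (unmodified) loaded objects from its cache every
        # `interval` items; connection is the durus Connection the
        # container was loaded from
        if index_key == 'id':
            return None  # not allowed to rebuild 'id' index
        index = self._indices[index_key]
        index.clear()
        count = 0
        for idkey in self._items:
            item = self._items[idkey]
            index.add(self.get_item_key(item, index_key), item)
            count += 1
            if count % interval == 0:
                connection.shrink_cache()
        assert len(self._items.keys()) == len(index.keys())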
mario