Possible bug in cache code: Re: [Durus-users] Doubt about cache and ghost objects

Doubt about cache and ghost objects

2006-05-24

Jesus Cea

2006-05-25

David Binger

Possible bug in cache code: Re: [Durus-users] Doubt about cache and ghost objects

2006-05-25

Jesus Cea

Re: Possible bug in cache code: Re: [Durus-users] Doubt about cache and ghost objects

2006-05-25

Jesus Cea

2006-05-26

David Binger

Re: Possible bug in cache code: Re: [Durus-users] Doubt about cache and ghost objects

2006-05-25

David Binger

Possible bug in cache code: Re: [Durus-users] Doubt about cache and ghost objects

Jesus Cea

2006-05-25

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Binger wrote:
> For this purpose, the cache must provide access
> to all of this connection's persistent instances referenced from
> anywhere in this process.

Understood.

The weak/strong references game in the cache code is very smart. It
deserves to be documented better as enlightment of future generations :)

I think I find a bug in the cache shrink code, nevertheless:

"""
slice_size = max(min(self.size - current, current / 4), current / 64)
"""

When reaching this code, "current" is ALWAYS bigger than "self.size". So
the "min()" always gives a negative number, so the "max()"always gives
"current/64". That is, slice_size is ALWAYS "current/64".

I seeing other issues with current code. For example, since "shrink()"
is implicitely called when doing a commit/abort, and rules forbids to
keep persistent references across transactions, the "cache._get_heap()"
call will dispose a lot of objects when doing the
"reference.make_weak()", without seeing if the object was referenced
recently (because we don't keep references to persistent objects between
transactions, the cache could be the only link to an object even if it
is very "fresh"). In fact, if the cache is big and the commits touch few
objects, the "cache._get_heap()" will dispose (statistically) all
objects it touches.

A better way would be to take "X" objects from the cache, sort them by
"_p_touched" (that keeps the commit generation last touched such
object), and do the "make_weak()" game in the "X/2" (or "X/10") more old
objects. So we could keep the more recently touched objects in core.

Also, the object selection should be random. Current approach is
deterministic, and I can think about a cache content and access profile
where the same objects are always analyzed (and other are not examined),
or where a given slice has objects with the same "_p_touched" and so,
they are always purged even if they are recently accessed.

The obvious solution to this is to sort all cache by "_p_touched", do
the "make_weak()"/"ghosting" magic in the first "x/64" objects (let's
say) and increase its "_p_touched" to current level to allow new objects
to be examined in the next shrink. I don't know how expensive could be
to do a global sort by "_p_touched" compared to the "islice()" usage in
"cache._get_heap" when "start" has a high value...

Or maybe simply choose "x/64" random cache objects from the entire
cache, sort them by "_p_touched" and to the "make_weak()"/"ghosting"
miracle in the first (and so older) "x/128".

PS: The "islice()" performance worries me. I did the following experiment:

a) Creates a millon elements dictionary.
b) Use "islice()" to iterate over the last 100.000 elements, null loop.
Time in my machine: 0.0604 seconds.
c) Use "random.sample()" to choose 100.000 random elements: 0.27 seconds.

"islice()" is wonderfully efficient!!.

Nevertheless, in this test the loop was "noop". If the loop body does
"something", the difference with the concrete object used to iterate
over would become irrelevant...

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea@argo.es http://www.argo.es/~jcea/ _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea@jabber.org         _/_/    _/_/          _/_/_/_/_/
                               _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBRHYG55lgi5GaxT1NAQKQSAP/WDeW6TQEOY7MlV4XZeV90RZhfpzbVDjt
0eycsdTs7DpEWbpQDwgKrsnNypnQH5SUPFhLM9pu8VvItwi64NGIJboo4rg7SZ0d
kn8wL1LLvK/iblDgHYIckGN5T1LkfmPQjG+LkjQYW7T60Qb/Rpq4C9E8fikdiYeX
QiyvzSnHWl4=
=QTQC
-----END PGP SIGNATURE-----