durusmail: durus-users: Revised paging-in patch
Revised paging-in patch
A.M. Kuchling
2006-07-10
On Sat, Jul 08, 2006 at 05:15:29AM +0200, Jesus Cea wrote:
....

Thanks for your careful analysis; I'm updating the patch to take
your comments into account.

Responding to your two e-mails:
> Will this code be included in Durus 3.5?

That's up to David.  I'd like to craft something he finds acceptable,
which is why this patch is going through several cycles of revision,
but he's still free to reject the whole idea.  (In which case I'll just
maintain a patched version for our product build...)

> Could the quantity be something negotiated between the client and the
> server? I don't like static numbers, especially if those numbers could
> be tunable. Currently, to change that value we would need to edit both
> the clients and the server.

Maybe.  David, do you have an opinion here?  Is this too much
complexity for your taste?

> I think that filtering OIDs already seen should be done before splitting
> the bulk_load, since the current code could do a "bulk_load()" of far
> fewer OIDs than 100 if the filtering has a lot of hits. Also, a
> "filter()" functional call would probably be more efficient. Some
> profile comparison would be nice.

Here's what I used to time it.  filter took 1.7sec, the listcomp took
0.8sec.

import timeit

# Shared setup: 256 candidate OIDs, and a set holding the even numbers below 250.
setup = 'oid_list = list(range(256)); s = set(range(0, 250)[::2])'

# list() forces evaluation; on Python 3, filter() alone returns a lazy iterator.
t = timeit.Timer('list(filter(lambda x: x in s, oid_list))', setup=setup)
print('filter=', t.timeit(number=10000))

t = timeit.Timer('[x for x in oid_list if x in s]', setup=setup)
print('listcomp=', t.timeit(number=10000))

> > +                    queue.extend(split_oids(refdata))
> > +                    seen.add(oid)
>
> Perhaps you could filter "refdata" against "seen" before adding to
> queue. I'm not sure if that could be profitable.

I think it's not sufficient.  Consider objects A and B, which both
refer to object C.  When A is read, its references are added to the
queue, so C passes the 'seen' test and is added.  When B is read, C is
already in the queue and passes the 'seen' test again.
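
To make the A/B/C case concrete, here is a minimal sketch.  The `refs`
dictionary and the function name are hypothetical stand-ins, not code from
the patch.  It filters each object's references against 'seen' at the
moment they are queued, and C still ends up in the queue twice.

```python
# Hypothetical object graph: A and B both refer to C.
refs = {'A': ['C'], 'B': ['C'], 'C': []}

def queue_after_reading(oids_read):
    """Filter each object's references against 'seen' as they are queued."""
    seen = set()
    queue = []
    for oid in oids_read:
        # C has not been loaded yet, so it is absent from 'seen' both times.
        queue.extend(r for r in refs[oid] if r not in seen)
        seen.add(oid)
    return queue

print(queue_after_reading(['A', 'B']))
```

Filtering at enqueue time only catches objects that have already been
loaded, not ones that are merely waiting in the queue.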

I think I'll change it to filter the entire queue on every pass, so
that the bulk load always uses 100 items (except on the last one).
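
A rough sketch of that idea, under stated assumptions: `load_many` is a
hypothetical callable standing in for the bulk-load request (it takes a list
of OIDs and returns a dict mapping each OID to the OIDs it refers to), and
`page_in` here is an illustration rather than the code in the patch.

```python
CHUNK = 100  # chunk size discussed in this thread

def page_in(root_oid, load_many, chunk=CHUNK):
    """Load everything reachable from root_oid in batches of at most `chunk`.

    load_many(oids) is a hypothetical bulk loader returning
    {oid: [referenced oids]}.  Returns the set of loaded OIDs and the
    size of each batch that was requested.
    """
    seen = set()
    queue = [root_oid]
    batch_sizes = []
    while queue:
        # Re-filter the whole queue on every pass: drop OIDs that are
        # already loaded and duplicates within the queue, keeping order.
        pending = []
        queued = set()
        for oid in queue:
            if oid not in seen and oid not in queued:
                queued.add(oid)
                pending.append(oid)
        if not pending:
            break
        batch, queue = pending[:chunk], pending[chunk:]
        batch_sizes.append(len(batch))
        for oid, refdata in load_many(batch).items():
            seen.add(oid)
            queue.extend(refdata)
    return seen, batch_sizes
```

Because the filtering happens before the slice, a batch falls short of the
chunk size only when fewer unseen OIDs are known at that moment; the very
first request, for instance, can hold only the starting OID.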

> If I understand your code correctly, you could load the entire object
> closure in RAM. So if you do a "page_in()" on the "root" object you
> could load the entire object database into RAM. Am I right?

Correct.  The first version did everything server-side, but David was
worried about a denial-of-service where you'd request the root object.
The second version is client-side, so if the client wants to read the
entire database in 100-object chunks, the client is free to try it. :)

This DoS issue is why it uses chunks of 100, not 10000, even though
10000 objects probably would fit into memory OK and can be read pretty
quickly.

--amk