PersistentDict vs BTree
Jesus Cea
2006-10-05

Mike Orr wrote:
> It's truly read-only.  If the data is updated, I assemble the database
> on another computer, and stop the server while I'm copying it over.

Understood. Then I would recommend two things:

a) If the application is long-lived or you usually touch every data
item, use PersistentDict. That way all the data sits in RAM under a
single object. Load time will be an issue, but if the application is
long-lived this won't be a problem. Data access, once loaded, runs at
native Python speed.

b) If the application's lifetime is very brief or you only touch a few
elements in each invocation, use BTree. Object access will be a bit
slower, but the application will only load into RAM the data it
actually uses. (A sketch of both layouts follows.)
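To make the trade-off concrete, here is a minimal sketch of both
layouts, assuming the usual durus.connection / durus.file_storage API
(the storage file name and sample records are made up):

"""
from durus.file_storage import FileStorage
from durus.connection import Connection
from durus.persistent_dict import PersistentDict
from durus.btree import BTree

connection = Connection(FileStorage("catalog.durus"))
root = connection.get_root()

# (a) One PersistentDict: a single persistent object, loaded into RAM
# in one go; lookups then run at native Python dict speed.
root["by_name"] = PersistentDict()
root["by_name"]["chlorine"] = "record for chlorine"

# (b) One BTree: its nodes are separate persistent objects, so only
# the nodes actually traversed get loaded into the cache.
root["by_id"] = BTree()
root["by_id"][4438] = "record 4438"

connection.commit()
"""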

For such an application (static data) I would rather drop Durus
entirely and use Python's native "pickle" module.

For example, one of my applications:

"""
jcea@castor:~/mnt> ls -la torrents_ya_publicados.pickle
-rw-r--r--  1 jcea users 2318999 2006-10-05 21:00
torrents_ya_publicados.pickle
"""

So this is an approach I actually use:

"""
jcea@castor:~/mnt> python
Python 2.5 (r25:51908, Sep 20 2006, 16:19:18)
[GCC 4.0.2 20050901 (prerelease) (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> a=open("torrents_ya_publicados.pickle")
>>> a=pickle.load(a)
>>> len(a)
4438
>>> a.items()[0]
('d66cd66cd23b440d520d53e4374f354298021891', (1109172706.2036231,
'El_Mundo_en_Guerra-(16)-Dentro_del_Reich-
by_^Cronos^-(www.fwbz.net).avi.torrent',
{'creation date': 1109172179, 'announce': '*MASKED*', 'info': {'length':
365522944, 'piece length': 262144, 'name':
'El_Mundo_en_Guerra-(16)-Dentro_del_Reich-by_^Cronos^-(www.fwbz.net).avi'},
'Autor P2P Privado': 'jcea'}))
"""

(this particular application is waiting in line to migrate to a Durus
backend :-) )

> One thing I've been stunned about is the advanced search, which I just
> finished writing.  It goes through every record applying arbitrary
> evaluators to it (field1 contains "foo"? field2 > 5.0?), and it takes
> less than three seconds even with complex criteria that return 1000+
> results. Something in the simple search is bogging down with terms
> that return a lot of results (e.g., "lor" for chlorine, chlorazol,
> etc), in spite of the index dictionaries/tables I wrote to speed it
> up. So I may have to implement the simple search in terms of the
> advanced search, pulling all records and not using indexes,
> counterintuitive as it sounds.

If your data is entirely in RAM, access is fast. Any Python-managed
index you add will just get in the way of the (fast) CPython internal
implementation.
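For what it's worth, the evaluator-based full scan stays very simple
once everything is a plain dict in RAM. A sketch; the field names and
record layout here are hypothetical:

"""
records = {
    "chlorine": {"field1": "foo bar", "field2": 7.2},
    "chlorazol": {"field1": "baz", "field2": 1.0},
}

def search(records, predicates):
    # Keep the records that satisfy every predicate.
    return [r for r in records.itervalues()
            if all(p(r) for p in predicates)]

criteria = [
    lambda r: "foo" in r["field1"],  # field1 contains "foo"?
    lambda r: r["field2"] > 5.0,     # field2 > 5.0?
]
matches = search(records, criteria)
"""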

--
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea@argo.es http://www.argo.es/~jcea/ _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea@jabber.org         _/_/    _/_/          _/_/_/_/_/
                               _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"Love is putting your happiness in the happiness of another" - Leibniz