On 3/7/06, mario ruggier wrote:
> On Mar 7, 2006, at 12:29 AM, Mike Orr wrote:
>
> > My own project is having problems with SQLite
> > performance, and I may have to substitute Durus or put the 30 MB
> > database into memory.
>
> Do you mean you consider switching to Durus from SQLite due to a
> performance issue with this particular project?  SQLite is already a
> lean and mean db, and with such a small amount of data I would
> certainly be very curious what specific issues you are having that
> would make you consider such a switch... (and that could be a big
> separate discussion, so maybe over on the durus list?).

My application will have two modes: an ordinary web app using MySQL,
and a standalone application using SQLite for laptops without
Internet access.  I'm using SQLObject as a front end, so the queries
are as close to identical as possible.  But SQLite is being a
bottleneck.

My 30 MB database has a main table with a modest number of chemical
records (6000), but they are large records (200 fields).  It's linked
to several many:many tables, the biggest containing 85,000 records.
Each chemical has an official name and zero or more synonyms; the
same synonym can match multiple chemicals.  A foreign key in the main
table points to the official name, which is also included in the
many:many table of all names.

I built a standalone import routine.  It takes 4 minutes for MySQL.
It takes 52 minutes for SQLite on a good day, or 2-4 hours on a bad
day.  This is all due to swapping on my 512 MB workstation; the
virtual memory size goes to 450-650 MB.  Meanwhile, MySQL never goes
above 20 MB in its process.

I built a demo app with web pages to look up a chemical by ID#, list
all official names starting with a certain letter, and do a substring
search of both official names and synonyms, ranking the results.  All
searches take <= 4 seconds in MySQL.  In SQLite the substring search
(which joins three tables) hangs the server and blocks all subsequent
requests.
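To make the shape of the problem concrete, here is a miniature sketch of the schema and the three-table substring search described above, using Python's stdlib sqlite3 module.  All table and column names are hypothetical; the real database has ~200 columns on the main table and tens of thousands of link rows.

```python
import sqlite3

# Hypothetical, scaled-down version of the schema described above:
# a main chemical table, a name/synonym table, and a many:many link table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chemical (
        id INTEGER PRIMARY KEY,
        official_name_id INTEGER   -- FK into synonym: the official name
        -- ... ~200 more columns in the real database ...
    );
    CREATE TABLE synonym (
        id INTEGER PRIMARY KEY,
        name TEXT
    );
    -- many:many: the same synonym can match multiple chemicals
    CREATE TABLE chemical_synonym (
        chemical_id INTEGER,
        synonym_id INTEGER
    );
""")
conn.execute("INSERT INTO synonym VALUES (1, 'ethanol'), (2, 'ethyl alcohol')")
conn.execute("INSERT INTO chemical VALUES (1, 1)")
conn.executemany("INSERT INTO chemical_synonym VALUES (?, ?)", [(1, 1), (1, 2)])

# The three-table join + substring search that bogs down at full scale;
# LIKE '%...%' can't use an ordinary index, so SQLite scans the join.
rows = conn.execute("""
    SELECT DISTINCT c.id, s.name
    FROM chemical c
    JOIN chemical_synonym cs ON cs.chemical_id = c.id
    JOIN synonym s           ON s.id = cs.synonym_id
    WHERE s.name LIKE '%' || ? || '%'
""", ("alcohol",)).fetchall()
print(rows)  # → [(1, 'ethyl alcohol')]
```

At 6,000 chemicals and 85,000 link rows this same query is the one that effectively hangs; at toy scale it is of course instant.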
I don't know whether it really hangs or it just takes > 5 minutes to
calculate the answer, but either is unacceptable.  Apparently this is
because it's joining three tables.  It takes the same amount of time
in the SQLite command-line tool, and indexes don't help.  The hang is
in a C routine, so when I stop the main server (TG/CherryPy), the
threads continue until manually killed or until the browser does
"Stop".  Until these threads are killed, the server can be restarted
but it won't serve requests.

I avoided Durus because I thought the overhead of searching in Python
(and unpickling the records) would be too high, but David thinks it's
not unreasonable for this data size, so I'll try it and see.  I'm
also going to try replacing the many:many relations with 1:many.
Since this is just a copy of the data, it doesn't have to be
normalized, and we are about to eliminate a large field anyway, which
will offset the larger file size.

> W.r.t. your comments on the framework's "db component", working with
> QP I realize that the interface between QP and the special data
> classes it expects (i.e. user and session) is really quite minimal,
> and I'd say very feasibly replaceable by your ORM of choice, or any
> other OO layer of course.  (There is of course a lot of interaction
> within the classes themselves, but with QP "on the outside" of them
> there is surprisingly little.)  I mention this because I think this
> openness is a point in QP's favour (I'd hazard a guess that
> replacing the ORM in django or TG will be a whole other story), and
> because it makes me think that the conceptual difference between
> Quixote and QP isn't as big as one (me ;) might expect.  Of course
> QP has numerous improvements, such as QPY, but Quixote would anyway
> have to improve in the same ways...
> So, you talk about Quixote 3 -- I think that would be another name
> for QP made slightly more open, in particular giving QP the option
> to run without a db, and formalizing the interaction it would
> require from the db when it is used with one.  I think that would be
> an interesting route down which to merge the 2 frameworks.

Neil has specific ideas about what QP should be, and it's tightly
integrated with Durus, Dulcinea, QPY, and the MEMS Exchange's other
applications.  These base technologies are a radical departure from
the Python web mainstream, and thus unacceptable to a large
percentage of programmers.  If you've done a bit of work with
SQLObject, MySQL, and Cheetah before -- enough to trust them -- you
are unlikely to accept a different system like Durus that would have
to be thoroughly tested and doesn't have that many users.  Of course,
one can use an alternate template or database system with QP, but
this would have to be shown in the manual.

I don't think the use of QPY and Durus internally is a showstopper;
users can consider them an "implementation detail".  Since Durus is
small, it can be considered an acceptable baseline for persistent
sessions.  (Dictionary sessions, Quixote's default, are incompatible
with multiprocess deployments like SCGI.)

I had assumed that QP expected you to use Durus for your application
data as well as for the internal session stuff, but when I mentioned
this to David he said, "Well, I don't see any reason you can't do
that."  Meaning he'd just as likely expect an application to use some
other database for its application data.  Of course, this other
database can be anything.

Another thing about QP is the convention of URL callables (please,
can we call them "controller methods" instead?) calling header and
footer functions to build the entire HTML page.  This adds some
clutter.  In Cheetah or Kid you'd use template inheritance to take
care of this, or in my case I made ._q_traverse do it.
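The two styles above can be sketched roughly like this.  The function names are hypothetical illustrations, not QP's or Quixote's actual API; the point is only where the page-assembly clutter lives.

```python
# Hypothetical sketch -- none of these names are real QP/Quixote API.

def header(title):
    return "<html><head><title>%s</title></head><body>" % title

def footer():
    return "</body></html>"

# Style 1 (the QP convention): every URL callable ("controller
# method") calls header() and footer() itself to build the full page.
def chemical_page(name):
    return header("Chemical") + "<p>%s</p>" % name + footer()

# Style 2 (roughly what a custom ._q_traverse can do): the traversal
# step wraps the result once, so callables return only their own body.
def traverse(callable_, *args):
    return header("Chemical") + callable_(*args) + footer()

def chemical_body(name):
    return "<p>%s</p>" % name

print(chemical_page("ethanol"))
print(traverse(chemical_body, "ethanol"))
```

Both produce the same page; the second keeps the boilerplate out of each individual callable, which is the same effect template inheritance gives you in Cheetah or Kid.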
But of course, this is just a convention in QP.  It's just something
to mention in the manual, that you don't have to do it that way.

The main thing in QP I see as a potential problem is that you have to
make a symlink from the QP installation to your site directories.
This implies you can have only one set of QP sites on a system, and
users can't have their own in their home directories.  Is this
symlink really necessary, or can the sites directory be specified
another way, such as relative to the startup script or configuration
file?

-- 
Mike Orr (mso@oz.net address is semi-reliable)