On 3/7/06, mario ruggier wrote:
> On Mar 7, 2006, at 12:29 AM, Mike Orr wrote:
>
> > My own project is having problems with SQLite
> > performance, and I may have to substitute Durus or put the 30 MB
> > database into memory.
>
> Do you mean you consider switching to Durus from SQLite due to a
> performance issue with this particular project?  SQLite is already a
> lean and mean db, and with such a small amount of data I would
> certainly be very curious what specific issues you are having that
> would make you consider such a switch... (and that could be a big
> separate discussion, so maybe over on the durus list?).

My application will have two modes: an ordinary web app using MySQL,
and a standalone application using SQLite for laptops without
Internet access.  I'm using SQLObject as a front end, so the queries
are as close to identical as possible.  But SQLite is being a
bottleneck.

My 30 MB database has a main table with a modest number of chemical
records (6000), but they are large records (200 fields).  It's linked
to several many:many tables, the biggest containing 85,000 records.
Each chemical has an official name and zero or more synonyms; the
same synonym can match multiple chemicals.  A foreign key in the main
table points to the official name, which is also included in the
many:many table of all names.

I built a standalone import routine.  It takes 4 minutes for MySQL.
It takes 52 minutes for SQLite on a good day, or 2-4 hours on a bad
day.  This is all due to swapping on my 512 MB workstation; the
virtual memory size goes to 450-650 MB.  Meanwhile, MySQL never goes
above 20 MB in its process.

I built a demo app with web pages to look up a chemical by ID#, list
all official names starting with a certain letter, and do a substring
search of both official names and synonyms, ranking the results.  All
searches take <= 4 seconds in MySQL.  In SQLite the substring search
(which joins three tables) hangs the server and blocks all subsequent
requests.
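To make the shape of the problem concrete, here is a miniature sketch of the schema and the three-table substring search described above, using Python's stdlib sqlite3 module.  All table and column names are hypothetical; the real database has ~200 columns on the main table and tens of thousands of link rows.

```python
import sqlite3

# Hypothetical, scaled-down version of the schema described above:
# a main chemical table, a name/synonym table, and a many:many link table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chemical (
        id INTEGER PRIMARY KEY,
        official_name_id INTEGER   -- FK into synonym: the official name
        -- ... ~200 more columns in the real database ...
    );
    CREATE TABLE synonym (
        id INTEGER PRIMARY KEY,
        name TEXT
    );
    -- many:many: the same synonym can match multiple chemicals
    CREATE TABLE chemical_synonym (
        chemical_id INTEGER,
        synonym_id INTEGER
    );
""")
conn.execute("INSERT INTO synonym VALUES (1, 'ethanol'), (2, 'ethyl alcohol')")
conn.execute("INSERT INTO chemical VALUES (1, 1)")
conn.executemany("INSERT INTO chemical_synonym VALUES (?, ?)", [(1, 1), (1, 2)])

# The three-table join + substring search that bogs down at full scale;
# LIKE '%...%' can't use an ordinary index, so SQLite scans the join.
rows = conn.execute("""
    SELECT DISTINCT c.id, s.name
    FROM chemical c
    JOIN chemical_synonym cs ON cs.chemical_id = c.id
    JOIN synonym s           ON s.id = cs.synonym_id
    WHERE s.name LIKE '%' || ? || '%'
""", ("alcohol",)).fetchall()
print(rows)  # → [(1, 'ethyl alcohol')]
```

At 6,000 chemicals and 85,000 link rows this same query is the one that effectively hangs; at toy scale it is of course instant.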
I don't know whether it really hangs or it just takes > 5 minutes to
calculate the answer, but either is unacceptable.  Apparently this is
because it's joining three tables.  It takes the same amount of time
in the SQLite command-line tool, and indexes don't help.  The hang is
in a C routine, so when I stop the main server (TG/CherryPy), the
threads continue until manually killed or until the browser does
"Stop".  Until these threads are killed, the server can be restarted
but it won't serve requests.

I avoided Durus because I thought the overhead of searching in Python
(and unpickling the records) would be too high, but David thinks it's
not unreasonable for this data size, so I'll try it and see.  I'm
also going to try replacing the many:many relations with 1:many.
Since this is just a copy of the data, it doesn't have to be
normalized, and we are about to eliminate a large field anyway, which
will offset the larger file size.

> W.r.t. your comments on the framework's "db component", working with
> QP I realize that the interface between QP and the special data
> classes it expects (i.e. user and session) is really quite minimal,
> and I'd say very feasibly replaceable by your ORM of choice, or any
> other OO layer of course.  (There is of course a lot of interaction
> within the classes themselves, but with QP "on the outside" of them
> there is surprisingly little.)  I mention this because I think this
> openness is a point in QP's favour (I'd hazard a guess that
> replacing the ORM in django or TG will be a whole other story), and
> because it makes me think that the conceptual difference between
> Quixote and QP isn't as big as one (me ;) might expect.  Of course
> QP has numerous improvements, such as QPY, but Quixote would anyway
> have to improve in the same ways...
> So, you talk about Quixote 3 -- I think that would be another name
> for QP made slightly more open, in particular giving QP the option
> to run without a db, and formalizing the interaction it would
> require from the db when it is used with one.  I think that would be
> an interesting route down which to merge the 2 frameworks.

Neil has specific ideas about what QP should be, and it's tightly
integrated with Durus, Dulcinea, QPY, and the MEMS Exchange's other
applications.  These base technologies are a radical departure from
the Python web mainstream, and thus unacceptable to a large
percentage of programmers.  If you've done a bit of work with
SQLObject, MySQL, and Cheetah before -- enough to trust them -- you
are unlikely to accept a different system like Durus that would have
to be thoroughly tested and doesn't have that many users.  Of course,
one can use an alternate template or database system with QP, but
this would have to be shown in the manual.

I don't think the use of QPY and Durus internally is a showstopper;
users can consider them an "implementation detail".  Since Durus is
small, it can be considered an acceptable baseline for persistent
sessions.  (Dictionary sessions, Quixote's default, are incompatible
with multiprocess deployments like SCGI.)

I had assumed that QP expected you to use Durus for your application
data as well as for the internal session stuff, but when I mentioned
this to David he said, "Well, I don't see any reason you can't do
that."  Meaning he'd just as likely expect an application to use some
other database for its application data.  Of course, this other
database can be anything.

Another thing about QP is the convention of URL callables (please,
can we call them "controller methods" instead?) calling header and
footer functions to build the entire HTML page.  This adds some
clutter.  In Cheetah or Kid you'd use template inheritance to take
care of this, or in my case I made ._q_traverse do it.
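The two styles above can be sketched roughly like this.  The function names are hypothetical illustrations, not QP's or Quixote's actual API; the point is only where the page-assembly clutter lives.

```python
# Hypothetical sketch -- none of these names are real QP/Quixote API.

def header(title):
    return "<html><head><title>%s</title></head><body>" % title

def footer():
    return "</body></html>"

# Style 1 (the QP convention): every URL callable ("controller
# method") calls header() and footer() itself to build the full page.
def chemical_page(name):
    return header("Chemical") + "<p>%s</p>" % name + footer()

# Style 2 (roughly what a custom ._q_traverse can do): the traversal
# step wraps the result once, so callables return only their own body.
def traverse(callable_, *args):
    return header("Chemical") + callable_(*args) + footer()

def chemical_body(name):
    return "<p>%s</p>" % name

print(chemical_page("ethanol"))
print(traverse(chemical_body, "ethanol"))
```

Both produce the same page; the second keeps the boilerplate out of each individual callable, which is the same effect template inheritance gives you in Cheetah or Kid.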
But of course, this is just a convention in QP.  It's just something
to mention in the manual, that you don't have to do it that way.

The main thing in QP I see as a potential problem is that you have to
make a symlink from the QP installation to your site directories.
This implies you can have only one set of QP sites on a system, and
users can't have their own in their home directories.  Is this
symlink really necessary, or can the sites directory be specified
another way, such as relative to the startup script or configuration
file?

-- 
Mike Orr (mso@oz.net address is semi-reliable)