[ apologies if this seems OT; just yell at me and we'll move ;) ]

On Thu, Mar 04, 2004 at 01:52:01PM -0500, Graham Fawcett wrote:
-> Titus Brown wrote:

[ munch ]

-> >Overall, the system works quite well, with only one exception: most of
-> >the jobs involve calling out to an external binary program (I need to run
-> >several closed source binaries, groan), and if that program uses up a lot
-> >of memory or otherwise dies badly, the job can die w/o any record. There
-> >are ways to control for that, but they all involve adding complexity;
-> >controlling and monitoring the remote processes seems to be one of the
-> >places where Linda tuple spaces restrict your options.
->
-> As I feared. :-( I also will need calls out to remote binaries.

Mine sometimes involve jobs that run for ~20-30 hours (sequence database
searches) and that can crash. I've opted to "force" manual intervention
in those cases because they're relatively rare and that way I get some
idea of how frequently they occur. Obviously not a long term solution...

I do retain jobs in the database (with a 'taken' flag to indicate that
they're no longer available) so it's a simple matter to reset that. It's
merely annoying to people waiting for the job to finish ;).

[ munch of leases ]

-> I've considered having the agent process, which communicates with the
-> tuple space, spawn a second process to do the actual work. If anything
-> is going to die, my reasoning goes, it will be the heavy-lifting
-> process. If that process doesn't return an OK to the communications
-> process within an expected time, then the comms process could push the
-> job's tuple back into the space, and perhaps kill itself. Essentially
-> it's a "rollback on timeout" in a long-running transaction. I don't have
-> a whole lot of heavily synchronized stuff going on, so this is about as
-> transactional as I would need to get.

I have toyed with this idea, and will probably do this. It would more
than quadruple the amount of job-generic code running on the client,
though, and I've held off because of that. (OK, OK, quadrupling 10 lines
isn't so serious...)

-> I'm a bit leery of my ideas, and that's why I'm sharing them with you;
-> feedback is most welcome! The whole tuple space idea is clean and
-> elegant, and my ideas are messy and roughshod. I feel like a Visigoth
-> draping bearskins in the temple of Pallas, to make it feel more like home...

I just try to be an effective Visigoth these days ;).

-> >I've thought about using Pyro to do some inter-node communication, but
-> >I haven't had great luck with Pyro and am unwilling to add it in. You
-> >might consider taking a look at it, though; I suspect I just haven't put
-> >in the time to understand it properly.
->
-> I really would like to leave the door open for agents to be written in
-> any language (though certainly in Python for the near future), so Pyro
-> might be a step in the wrong direction for me. And I'm really taken with
-> the simplicity factor of tuple spaces. If I can write or find a more
-> efficient implementation down the road, it should be relatively easy to
-> integrate into my other work, since the tuple-space semantics (and API)
-> should be almost unchanged.

There might be room for a Web services-style implementation; your
project sounds interesting! OTOH tuple spaces are *so* simple that
unless you really solve a hard problem... well, you get the idea.

If you have any references you have found useful, I'd be interested in
them. I didn't even realize JavaSpaces existed; heck, I built the system
before I had ever heard of a tuple space...

cheers,
--titus
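
For concreteness, the "rollback on timeout" idea quoted above could look
roughly like the sketch below. It is only an illustration under
assumptions, not code from either system: ToySpace is a hypothetical
in-memory stand-in for the tuple space, jobs are assumed to be
(job_id, argv) tuples, and the heavy-lifting process is assumed to
signal "OK" through a zero exit status.

```python
import subprocess
import sys


class ToySpace:
    """Hypothetical stand-in for a tuple space: a plain list of job tuples."""

    def __init__(self):
        self.tuples = []

    def put(self, tup):
        self.tuples.append(tup)

    def take(self):
        # Remove and return the oldest tuple, or None if the space is empty.
        return self.tuples.pop(0) if self.tuples else None


def run_with_rollback(space, timeout):
    """Take one job, run it in a child process, and roll back on failure.

    Returns True if the job finished OK, False if it was rolled back,
    and None if there was no job to take.
    """
    job = space.take()
    if job is None:
        return None
    job_id, argv = job  # assumed job layout: (job_id, argv list)
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # if it has not finished within `timeout` seconds.
        result = subprocess.run(argv, timeout=timeout)
        ok = result.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        ok = False
    if not ok:
        # Rollback: push the job's tuple back so another agent can take it.
        space.put(job)
    return ok


if __name__ == "__main__":
    space = ToySpace()
    space.put(("demo", [sys.executable, "-c", "import time; time.sleep(5)"]))
    print(run_with_rollback(space, timeout=1), space.tuples)
```

The comms process stays small because it never does the heavy lifting
itself; it only waits, and re-queues the job when the child crashes or
overruns. For 20-30 hour jobs the timeout would have to be chosen
generously, and a real agent would also want to record how often
rollbacks happen.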