
A Fast Start For openMosix

axehind writes "Dr. Moshe Bar recently announced the creation of openMosix, a new open source project. The project has quickly attracted a team of volunteer developers from around the globe and is off to a very fast start. openMosix is a Linux kernel extension for single-system image clustering. openMosix is perfectly scalable and adaptive: once you have installed it, the nodes in the cluster start talking to one another and the cluster adapts itself to the workload."
  • by Gerdts ( 125105 ) on Monday April 15, 2002 @12:34PM (#3343833)

    Under some workloads, I can go along with the assertion that a MOSIX cluster is just like having a big machine with a lot of CPUs. It seems to be great for those workloads, and I would love to try it out. Those workloads tend to consist of multiple long-running (more than a few seconds), single-threaded processes. For MOSIX to be most efficient, there also need to be fewer jobs than there are CPUs to run them.

    Other workloads, however, will not benefit from MOSIX. These statements are based on reading the docs a couple weeks back, not on actual experience.

    Under the MOSIX model, when a process forks, the child may run on the current machine or it may migrate somewhere else. If the job is short-lived (ls, echo whatever | sed s/blah/baz, you get the point), MOSIX will perform poorly, because it will spend more time figuring out where the process should run than it would have spent simply running the program on the local host.

    If you need more CPU time than one CPU can provide and your program is multithreaded, a single multiprocessor machine will also work better. This is because MOSIX does not yet support threads running on different machines. A 128-node cluster of 386s is going to run Netscape slower than a single 486, because Netscape will only be using one 386 CPU.

    For cases where you just have too many jobs for the resources available (CPU or memory), you may be better off with something like Condor [wisc.edu]. It is great for submitting batch jobs, migrating those jobs around, and only running the number of jobs that the system can handle.

  • by JungleBoy ( 7578 ) on Monday April 15, 2002 @01:01PM (#3343963)
    I tried (vanilla) mosix a while back. It was cool, but had some real-world drawbacks. If you start a process on a node and that process opens a socket, opens a file, or uses shared memory, then that process is stuck on that node. So if you start 10 dnet processes on one node, they won't migrate to idle nodes, because they have open sockets (to the key server).

    I don't know if this is still the case; I heard a rumor that all these things were going to be implemented, so it'll be an interesting project to watch.

    Good luck, openMosix!

    -The JungleBoy
  • by siemce ( 544739 ) on Monday April 15, 2002 @01:12PM (#3344033)
    The main difference is that Mosix doesn't work with threads. You can spawn a separate process on a node and it can migrate to different nodes, but if your application is threaded, all the threads will run on one node rather than migrating between nodes.
  • by JeffL ( 5070 ) on Monday April 15, 2002 @01:26PM (#3344174)
    > Under some workloads, I can go along with the assertion that a MOSIX cluster is just like having a big machine with a lot of CPUs. It seems to be great for those workloads, and I would love to try it out. Those workloads tend to consist of multiple long-running (more than a few seconds), single-threaded processes. For MOSIX to be most efficient, there also need to be fewer jobs than there are CPUs to run them.
    >
    > Other workloads, however, will not benefit from MOSIX. These statements are based on reading the docs a couple weeks back, not on actual experience.

    Speaking from experience, you are pretty much correct. Jobs that use lots of CPU but little IO are good for mosix clusters, while jobs with heavy IO are bad. The mosix filesystem and other tricks can partly work around the IO problems if users plan carefully, but mostly users just want to start 30 jobs and forget about them for a few days.

    There is no reason that a mosix cluster can't be combined with a batch/queueing system. This lets lazy/stupid users run their CPU-bound jobs and lets mosix distribute them, while more savvy users can script their IO jobs to run on particular machines and use local disk for IO.

    It took a few months for the users of the cluster I set up to learn which jobs work well and which kill the cluster. The catch is that launching 40 "good" jobs on a single machine is fine, because they just shoot out to the other nodes, but launching 40 "bad" jobs on a single machine will make that machine almost unusable.

    This can have adverse effects on the cluster if the good jobs were started from the overloaded machine; for example, the good jobs might have to check back with their originating machine every few minutes to update a checkpoint file.

    Basically, mosix isn't some magic bullet to solve machine limitations, but it is a very cheap and effective way to solve certain problems.
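To put rough numbers on the first comment's point about short-lived processes, here is a toy cost model in Python. It is a hypothetical sketch, not how MOSIX or openMosix actually chooses a node: the function names, the assumption of single-CPU nodes with fair time slicing, and the fixed migration_cost (covering both picking a node and moving the process) are all illustrative assumptions.

```python
# Hypothetical toy model, NOT the real MOSIX/openMosix load balancer:
# does moving a freshly forked CPU-bound process to another node pay off?


def completion_time(cpu_seconds: float, runnable_procs: int) -> float:
    """Approximate wall-clock seconds for a CPU-bound job on a single-CPU
    node that time-slices fairly among `runnable_procs` processes (this one
    included), assuming the others stay runnable the whole time."""
    return cpu_seconds * runnable_procs


def should_migrate(cpu_seconds: float, local_procs: int,
                   remote_procs: int, migration_cost: float) -> bool:
    """Return True if running remotely would finish sooner than staying.

    local_procs:    runnable processes on the home node, including this job
    remote_procs:   runnable processes on the candidate node, excluding it
    migration_cost: assumed fixed overhead in seconds for choosing a node
                    and transferring the process there
    """
    stay = completion_time(cpu_seconds, local_procs)
    move = migration_cost + completion_time(cpu_seconds, remote_procs + 1)
    return move < stay


if __name__ == "__main__":
    # Something like `ls`: 10 ms of CPU on a home node with 4 runnable jobs.
    print(should_migrate(0.01, local_procs=4, remote_procs=0, migration_cost=0.2))   # False
    # A long simulation: 10 minutes of CPU, same nodes, same overhead.
    print(should_migrate(600.0, local_procs=4, remote_procs=0, migration_cost=0.2))  # True
```

With these made-up numbers the break-even point is a job needing more than 0.2 / (4 - 0 - 1) ≈ 0.067 seconds of CPU. Commands like ls or a quick sed pipeline fall far below it, while multi-minute jobs clear it easily, which matches the observation that long-running, CPU-bound work is what suits MOSIX.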

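The point made in two of the comments, that all threads of a process stay on one node, reduces to simple arithmetic: a multithreaded program can use at most one node's CPUs, however many nodes the cluster has. The sketch below is hypothetical; Machine and threaded_throughput are illustrative names, and treating a 486 as twice the speed of a 386 is an assumed figure, not a benchmark.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    nodes: int
    cpus_per_node: int
    cpu_speed: float  # relative speed of one CPU (hypothetical units)


def threaded_throughput(m: Machine, threads: int, threads_span_nodes: bool) -> float:
    """Relative throughput of one process running `threads` CPU-bound threads.

    If threads cannot span nodes (the MOSIX behaviour described in the
    comments), every thread competes for one node's CPUs, however many
    nodes the cluster has.
    """
    if threads_span_nodes:
        usable_cpus = min(threads, m.nodes * m.cpus_per_node)
    else:
        usable_cpus = min(threads, m.cpus_per_node)
    return usable_cpus * m.cpu_speed


if __name__ == "__main__":
    cluster_386 = Machine(nodes=128, cpus_per_node=1, cpu_speed=1.0)
    single_486 = Machine(nodes=1, cpus_per_node=1, cpu_speed=2.0)  # assumed 2x a 386 (hypothetical)
    smp_386 = Machine(nodes=1, cpus_per_node=4, cpu_speed=1.0)

    print(threaded_throughput(cluster_386, threads=4, threads_span_nodes=False))  # 1.0
    print(threaded_throughput(single_486, threads=4, threads_span_nodes=False))   # 2.0
    print(threaded_throughput(smp_386, threads=4, threads_span_nodes=False))      # 4.0
```

Under these assumptions the 128-node cluster delivers one 386's worth of speed to a threaded program, the lone 486 delivers two, and a modest 4-way SMP box delivers four, which is why a single multiprocessor machine wins for threaded workloads.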