pith. sign in

arxiv: 1907.00194 · v1 · pith:MNL65NXFnew · submitted 2019-06-29 · 💻 cs.DC · cs.NI· cs.PF

Open-MPI over MOSIX: paralleled computing in a clustered world

Pith reviewed 2026-05-25 13:19 UTC · model grok-4.3

classification 💻 cs.DC cs.NIcs.PF
keywords Open-MPIMOSIXprocess migrationload balancingDiCOMparallel computingcluster computingdirect communication
0
0 comments X

The pith

Integrating MOSIX migration into Open-MPI plus a DiCOM module reduces run-time by better resource allocation on clusters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper combines MOSIX preemptive process migration with Open-MPI to move running processes between nodes for load balancing when multiple jobs share a cluster. Migration alone would raise communication costs between processes, so the work adds a DiCOM module that provides direct communication to bypass extra TCP/IP latency. The result is an overall drop in execution time because resources can be reallocated more efficiently without the usual migration penalty.

Core claim

Incorporating the process migration capability of MOSIX into Open-MPI and adding a module for direct communication (DiCOM) between migrated processes overcomes the increased communication latency of TCP/IP, producing reduced run-time through improved resource allocation.

What carries the argument

The DiCOM module, which supplies direct communication between migrated Open-MPI processes to avoid TCP/IP overhead.

If this is right

  • Multiple concurrent jobs can share cluster nodes with dynamic reallocation of processes.
  • Load balancing improves without the communication penalty that normally follows migration.
  • Preemptive migration becomes practical for Open-MPI applications running on shared hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may extend to other MPI libraries if similar direct-communication hooks can be added after migration.
  • In cloud settings with variable node availability, the same migration-plus-direct-comm pattern could cut idle time across jobs.
  • Further work could test whether DiCOM-style shortcuts remain effective when migration frequency rises.

Load-bearing premise

Adding the DiCOM module will produce a net reduction in communication overhead and overall run time rather than introducing new costs that offset the migration benefit.

What would settle it

A side-by-side timing of the same parallel job on a cluster using standard Open-MPI versus the MOSIX-integrated version with DiCOM; if run-time does not decrease, the claim fails.

Figures

Figures reproduced from arXiv: 1907.00194 by Adam Lev-Libfeld, Alex Margolin, Amnon Barak.

Figure 1
Figure 1. Figure 1: DiCOM compared to TCP/IP TCP send/recv latency with and without migration (red and orange respectively) and DiCOM send/recv latency (blue) [mSec] as a function of message size [KiB]. iv [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 1.1
Figure 1.1. Figure 1.1: Computer clusters, now and in the near future [PITH_FULL_IMAGE:figures/full_fig_p007_1_1.png] view at source ↗
Figure 1.2
Figure 1.2. Figure 1.2: Load imbalance example Due to imbalanced load both job A and job B are slowed down, migration of processes from nodes 1 and 2 to nodes 5 and 6 will allow better performance to all the jobs. 1.4.2 When introducing MOSIX preemptive process migration Normally, MOSIX processes do all their I/O (and most system-calls) via their home-node. This can be slow since operations are limited by the network speed and … view at source ↗
Figure 1.3
Figure 1.3. Figure 1.3: DiCOM and TCP/IP in a ring topology Migrated TCP/IP(blue) and DiCOM (orange) connections in a ring topology, with the central node as the home node and all the processes are individually migrated to the outer ring of nodes [PITH_FULL_IMAGE:figures/full_fig_p009_1_3.png] view at source ↗
Figure 1.4
Figure 1.4. Figure 1.4: Network view before and after introducing DiCOM [PITH_FULL_IMAGE:figures/full_fig_p009_1_4.png] view at source ↗
Figure 2.1
Figure 2.1. Figure 2.1: TCP/IP and DiCOM communication TCP/IP (blue) and DiCOM (orange) communication path in a migrated configuration and the appropriate protocol diagram. The use of path optimizing mechanisms will shorten the communication path, improving latency and net fault rate. 2.1.2 Design concepts Since this project is part of both the MOSIX and the OMPI projects, it is necessary that the solution will be as simple and… view at source ↗
Figure 2.2
Figure 2.2. Figure 2.2: The project’s layer model The OMPI relevant modular structure with new MOSIX (DiCOM) addition. 2.1.3 Information Dissemination (Gossip) Algorithm The MOSIX system uses a style of gossip protocol called Information Dissemination protocol (rumor-mongering protocols). Which use gossip2 to spread information (see [PITH_FULL_IMAGE:figures/full_fig_p012_2_2.png] view at source ↗
Figure 2.3
Figure 2.3. Figure 2.3: Gossip Algorithms easily tell the world your deepest secrets with some help from your well connected friends. 2 “Gossip” has the same meaning in both computer and human communication – a bounded (or small), unreliable, semi random transfer of data between pairs of agents who share a common trait (Cluster in computer science, clique in humans etc.). For a more formal description (that more or less reflect… view at source ↗
Figure 2.4
Figure 2.4. Figure 2.4: Graphic view of the DiCOM protocol The Mailboxing and addresing concepts Transparency - DiCOM makes the location of processes transparent; so that the sender does not need to know where the receivers run, only to identify them by their home-node and process-ID (PID) in their home-node. Consistency - DiCOM guarantees that the order of messages per receiver is preserved, even when the sender(s) and receive… view at source ↗
Figure 2.5
Figure 2.5. Figure 2.5: Hardware Some of the hardware used in this project. 9 [PITH_FULL_IMAGE:figures/full_fig_p014_2_5.png] view at source ↗
Figure 3.1
Figure 3.1. Figure 3.1: Latency test: small messages TCP/IP with and without migration (red and orange respectively) and DiCOM (blue) send/recv latency [mSec] as a function of small message sizes [PITH_FULL_IMAGE:figures/full_fig_p016_3_1.png] view at source ↗
Figure 3.2
Figure 3.2. Figure 3.2: Latency test: typical messages TCP/IP with and without migration (red and orange respectively) and DiCOM (blue) send/recv latency [mSec] as a function of typical message size [KiB]. 11 [PITH_FULL_IMAGE:figures/full_fig_p016_3_2.png] view at source ↗
Figure 3.3
Figure 3.3. Figure 3.3: Latency test: large messages TCP/IP with and without migration (red and orange respectively) and DiCOM (blue) send/recv latency [mSec] as a function of large message sizes [Mbyte] [PITH_FULL_IMAGE:figures/full_fig_p017_3_3.png] view at source ↗
Figure 3.4
Figure 3.4. Figure 3.4: Limit test TCP/IP send/recv latency without migration (red) and DiCOM send/recv latency (blue) [Sec] as a function of large message size [Mbyte]. 12 [PITH_FULL_IMAGE:figures/full_fig_p017_3_4.png] view at source ↗
Figure 4.1
Figure 4.1. Figure 4.1: Future work - Suggested structure Suggested structure of mosix embedded DiCOM module with communication manner optimization mechanism. 14 [PITH_FULL_IMAGE:figures/full_fig_p019_4_1.png] view at source ↗
read the original abstract

Recent increased interest in Cloud computing emphasizes the need to find an adequate solution to the load-balancing problem in parallel computing -- efficiently running several jobs concurrently on a cluster of shared computers (nodes). One approach to solve this problem is by preemptive process migration -- the transfer of running processes between nodes. A possible drawback of this approach is the increased overhead between heavily communicating processes. This project presents a solution to this last problem by incorporating the process migration capability of MOSIX into Open-MPI and by reducing the resulting communication overhead. Specifically, we developed a module for direct communication (DiCOM) between migrated Open-MPI processes, to overcome the increased communication latency of TCP/IP between such processes. The outcome is reduced run-time by improved resource allocation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper proposes integrating MOSIX preemptive process migration into Open-MPI, augmented by a new DiCOM module for direct inter-process communication, to enable dynamic load balancing across cluster nodes while mitigating the TCP/IP latency penalty that would otherwise arise from migration; the central claim is that this yields a net reduction in application run-time.

Significance. A working implementation of this architecture could supply a practical, migration-based alternative to static scheduling or checkpoint-restart techniques for MPI workloads on shared clusters, directly addressing load imbalance without requiring application changes. The design itself is a concrete engineering contribution, but the lack of any performance model, overhead measurements, or comparative experiments prevents assessment of whether the claimed net benefit materializes.

major comments (1)
  1. [Abstract] Abstract: the assertion that 'the outcome is reduced run-time by improved resource allocation' is presented as an achieved result, yet the manuscript supplies neither runtime measurements, baseline comparisons, error bars, nor any analysis of DiCOM module overhead versus the migration benefit; this leaves the central empirical claim without support.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the abstract overstates the empirical outcome and will revise the manuscript to align claims with the presented content.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the assertion that 'the outcome is reduced run-time by improved resource allocation' is presented as an achieved result, yet the manuscript supplies neither runtime measurements, baseline comparisons, error bars, nor any analysis of DiCOM module overhead versus the migration benefit; this leaves the central empirical claim without support.

    Authors: We agree with this assessment. The manuscript describes the architecture, the integration of MOSIX preemptive migration into Open-MPI, and the design of the DiCOM module for direct communication after migration, but contains no runtime measurements, baselines, or overhead analysis. The abstract claim will be revised to state that the approach is intended to enable reduced run-time via improved resource allocation, rather than asserting it as a demonstrated result. This change will be made in the next version. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper is a systems/engineering description of integrating MOSIX preemptive migration into Open-MPI plus a new DiCOM module for direct inter-process communication. No equations, fitted parameters, predictions, uniqueness theorems, or ansatzes appear in the provided abstract or described content. The outcome claim (reduced run-time via better allocation) is presented as the direct result of the architecture rather than any derivation that reduces to its own inputs by construction. No self-citation load-bearing steps exist. This is a standard non-circular implementation paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no mathematical model, fitted constants, or postulated entities; ledger is empty.

pith-pipeline@v0.9.0 · 5659 in / 943 out tokens · 24089 ms · 2026-05-25T13:19:50.754907+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages

  1. [2]

    low-to-middle-end desktop computers rather then specialized high-end HPC hardware, which is referred to as just a \Cluster

    DiCOM is slightly slower than migrated TCP/IP for small message sizes (below 32k). For larger message sizes, DiCOM is increasingly better than TCP/IP between migrated processes, with an average improvement of 52% for all of the measured message sizes. Figure 1: DiCOM compared to TCP/IP TCP send/recv latency with and without migration (red and orange respe...

  2. [3]

    Today the DiCOM module is ready to be sub- mitted

    Open-MPI source code - Downloaded from openMPI.org, provided with some common usage code examples, no special relation to MOSIX existed. Today the DiCOM module is ready to be sub- mitted

  3. [4]

    Latest version of MOSIX - Updated several times during the project, including change from 32 bit to 64, some source code was also available. Mosix provides a well build information dissemination daemon (\infod"), and as for today usage of Direct Communication is enabled, but without TCP/IP compatibility (even though it exists and works awlessly, as can be...

  4. [5]

    CLIP - Cluster management tool

  5. [6]

    Standard Linux libraries and kernel - Due to the nature of MOSIX, parts of it are compiled in the Linux kernel; therefore, with every change done to the part of DiCOM which is embedded in the kernel, we had to recompile the entire OS

  6. [7]

    Matlab - Output data analysis and presentation

  7. [8]

    Virtual Box - virtual machine client. 2.2.2 Development and Test Environment In order to comply with very high compatibility, we developed and tested DiCOM on a verity of envi- ronments including: Clusters of raging sizes (2 to 50 nodes), Heterogeneous and homogeneous Clusters, Busy and free Clusters, 32 bit and 64 bit architectures, Intel, SPARC and virt...

  8. [9]

    The average slowdown is about 17% for all message sizes

    The performance of TCP/IP between non-migrated process is slightly better then DiCOM for all message sizes. The average slowdown is about 17% for all message sizes. This is due to the xed overhead of the DiCOM protocol

  9. [10]

    For larger message sizes, DiCOM is increasingly better than TCP/IP between migrated processes, with an average improve- ment of 52% for all of the measured message sizes

    DiCOM is slightly slower than migrated TCP/IP for small message sizes (below 10k). For larger message sizes, DiCOM is increasingly better than TCP/IP between migrated processes, with an average improve- ment of 52% for all of the measured message sizes. From Figure 3.4, which present the latencies for di erent message sizes of TCP/IP and DiCOM without pro...

  10. [11]

    We have solved problems concerning the integration of two vastly used scienti c tools, Open-MPI and MOSIX

  11. [12]

    We have constructed a tool enabling simple integration of DiCOM into various TCP/IP enabled applica- tions running on MOSIX

  12. [13]

    Extensive tests and measurements indicate that the developed tool is stable; it indeed reduces the total runtime of parallel OMPI jobs over MOSIX, and is ready for deployment . 4.3 Future Work The most signi cant of possible future developments of this project very well may be the integration of the DiCOM module into the standard MOSIX TCP (see Figure 4.1...

  13. [14]

    Parallel computing oriented scheduler built into the MOSIX system

  14. [15]

    Introduction of the DiCOM module to other parallel computing environments, such as OpenMP and MPICH

  15. [16]

    instance

    Creation socket oriented mailbox, this will enable MOSIX programmers to create a lter free, faster DiCOM module. Figure 4.1: Future work - Suggested structure Suggested structure of mosix embedded DiCOM module with communication manner optimization mechanism. 14 Appendix A MPI & Open-MPI (OMPI) MPI (Message Passing Interface) is a language-independent com...

  16. [17]

    The core of the protocol involves periodic, paired, inter-process interactions

  17. [18]

    The information exchanged during these interactions is of bounded size

  18. [19]

    A gossip interaction does not occur when A pings B just to measure the response time, as this does not involve the transmittal of state between agents

    When agents interact, the state of at least one agent changes to re ect the state of the other. A gossip interaction does not occur when A pings B just to measure the response time, as this does not involve the transmittal of state between agents

  19. [20]

    Reliable communication is not assumed

  20. [21]

    The frequency of the interactions is low compared to typical message latencies so that the protocol costs are negligible

  21. [22]

    neighbors

    There is some form of randomness in the peer selection. Peers might be selected from the full set of nodes or from a smaller set of "neighbors". As mention before, MOSIX is using a speci c gossip protocol: Dissemination protocols (or rumor- mongering protocols) - These use gossip to spread information; they basically work by ooding agents in the network, ...

  22. [23]

    R. H. Castain, T. S. Woodall, D. J. Daniel, J. M. Squyres, B. Barrett, and G. E. Fagg. The open run- time environment (OpenRTE): A transparent multi-cluster environment for high-performance com- puting. In Proceedings, 12th European PVM/MPI Users' Group Meeting, Sorrento, Italy, September 2005

  23. [24]

    K. M. Chandy and J. Misra. Parallel program design: a foundation . Addison-Wesley, 1988

  24. [25]

    R. K. Chellappa. Cloud computing - emerging paradigm for computing. in INFORMS, Dallas, TX, 1997

  25. [26]

    R. L. Graham, G. M. Shipman, B. W. Barrett, R. H. Castain, G. Bosilca, and A. Lumsdaine. Open MPI: A high-performance, heterogeneous MPI. In Proceedings, Fifth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks , Barcelona, Spain, September 2006

  26. [27]

    F. Halsall. Data communications, computer networks and open systems (4th ed.) . Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA, 1995

  27. [28]

    R. H. Katz. Tech titans building boom - the race to build the mega data centers that will power cloud computing. IEEE Spectrum, 46(2):36 { 39, 46 { 49, February 2009

  28. [29]

    Kauhaus, A

    C. Kauhaus, A. Knoth, T. Peiselt, and D. Fey. E cient message passing on multi-clusters: An ipv6 extension to Open MPI. In Proceedings of KiCC'07, Chemnitzer Informatik Berichte , February 2007

  29. [30]

    Keren and A

    A. Keren and A. Barak. Opportunity cost algorithms for reduction of I/O and interprocess com- munication overhead in a computing cluster. IEEE Trans. Parallel Distrib. Syst. , 14(1):39{50, 2003

  30. [31]

    A. L., B. A., D. Z., and O. M. Randomized gossip algorithms for maintaining a distributed bulletin board with guaranteed age properties. Concurrency and Computation: Practice and Experience , 21(15):1907 { 1927, March 2009

  31. [32]

    MOSIX man Pages , 2009

    MOSIX.org. MOSIX man Pages , 2009. Retrieved from mosix.org: http://www.mosix.org/wiki/index.php/MOSIX-wiki

  32. [33]

    W. R. Stevens and G. R. Wright. TCP/IP Illustrated: The implementation . Addison-Wesley, 1995

  33. [34]

    C. H. Still. Scienti c Programming: Portable parallel computing via the MPI message-passing standard. COMPUTERS IN PHYSICS , 8(5):533{538, sep/oct 1994

  34. [35]

    L. M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner. A break in the clouds: towards a cloud de nition. SIGCOMM Comput. Commun. Rev. , 39(1):50{55, 2009

  35. [36]

    L. Wall, T. Christiansen, and J. Orwant. Programming Perl. O'Reilly Media, Inc., 3 edition, 2000

  36. [37]

    D. Z. and B. A. E cient algorithms for routing information in a multicomputer system. In Dis- tributed Algorithms on Graphs: Proceedings of the 1st International Workshop on Distributed Algo- rithms, pages 41{48, Ottawa, Canada, August 1985. Carleton Univ. Press. 19 Index algorithm, gossip, 3, 4, 7, 17 ANSI C, 6, 9 balancing, load, iv, 2, 3, 6 BTL, 16, 18...