Efficient and fault tolerant HLA-based simulation
Date of Issue: 2012
School of Computer Engineering
Parallel and Distributed Computing Centre
Distributed simulation subdivides a complex simulation (federation) into a group of simulation components (federates) and executes them in a distributed manner. The High Level Architecture (HLA), an IEEE 1516 standard, provides a general framework for developing large-scale distributed simulations. The Runtime Infrastructure (RTI) is middleware that controls communication among federates according to the HLA interface specification. Simulation executions may involve a large number of computationally intensive federates and are therefore time and resource consuming. Worse, these federates may be subject to crash-stop and Byzantine failures, and the risk of federation failure increases with the federation scale. In this thesis, we propose mechanisms to support efficient and fault tolerant HLA-based simulation by exploiting the advantages of the decoupled federate architecture, in which a federate connects to the federation through its corresponding Decoupled RTI Component (DRC). Workload imbalance generally leads to poor distributed simulation performance. To achieve load balancing, we propose to migrate federates from heavily loaded computing nodes to lightly loaded ones. With the decoupled federate architecture, only the federate needs to be migrated to the destination computing node, whereas the DRC can stay where it is and keep its connection to the federation. A one-phase migration protocol is first proposed to illustrate the federate migration process. Two-phase and relay-based migration protocols are then developed to reduce migration overhead by overlapping federate migration with continued federate execution.
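The load-balancing idea described above can be illustrated with a minimal sketch. It is not the thesis's protocol: the `Federate` record, the numeric `load` values, and the greedy selection rule in `migrate_one` are all assumptions made for illustration. The only point it mirrors from the abstract is that a migration changes the federate's host node while its DRC stays in place.

```python
from dataclasses import dataclass


@dataclass
class Federate:
    name: str
    load: float    # hypothetical CPU demand of the federate
    node: str      # computing node currently hosting the federate
    drc_node: str  # node hosting its Decoupled RTI Component (DRC)


def node_loads(federates, nodes):
    """Sum the federate loads hosted on each computing node."""
    loads = {n: 0.0 for n in nodes}
    for f in federates:
        loads[f.node] += f.load
    return loads


def migrate_one(federates, nodes):
    """Greedily move one federate from the heaviest node to the lightest.

    Only the federate's host changes; drc_node is left untouched, so the
    DRC keeps its place (and, conceptually, its federation connection).
    Returns the migrated federate, or None if no move would help.
    """
    loads = node_loads(federates, nodes)
    src = max(nodes, key=lambda n: loads[n])
    dst = min(nodes, key=lambda n: loads[n])
    gap = loads[src] - loads[dst]
    # Moving a federate narrows the gap between src and dst only if its
    # load is strictly smaller than that gap.
    candidates = [f for f in federates if f.node == src and f.load < gap]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda f: f.load)
    chosen.node = dst
    return chosen
```

Calling `migrate_one` repeatedly until it returns `None` gives a simple rebalancing loop. A real migration would also have to transfer federate state and coordinate with the DRC, which this sketch leaves out.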
DRNTU::Engineering::Computer science and engineering::Computer systems organization::Computer system implementation