A Summary on “Improving Fault Tolerance by Virtualization and Software Rejuvenation”

Sanjog Sigdel, Computer Science, Kathmandu University
E-mail:- sigdelsanjog@gmail.com


Software aging is the phenomenon in which the state of a software system gradually degrades over time, eventually leading to performance degradation, transient failures, or even application crashes. The primary method of fighting aging is software rejuvenation. Fault tolerance is designed to allow a system to tolerate software faults that remain in the system after its development. The primary causes of aging are exhaustion of operating system resources, data corruption, and numerical error accumulation. Common examples are memory bloating and leaking, unreleased file locks, data corruption, storage space fragmentation, and accumulation of round-off errors. Software rejuvenation involves occasionally stopping the software application, cleaning its internal state and/or environment, and then restarting it.


Software rejuvenation is the most natural procedure to counteract software aging. Virtualization, a hot topic in the technology world, enables a single computer to run multiple operating systems simultaneously. A Virtual Machine Monitor (VMM) offers a degree of flexibility for managing complete IT environments and helps to increase system reliability. Together, virtualization technology and software rejuvenation can be used to prolong the availability of services. The paper proposes placing three virtual machines on top of a virtualization middleware layer for each application server. A software load balancer (VLM-LB) runs on VM1 and is responsible for detecting software aging. An active (primary) VM runs the main application server, and a standby VM holds a replica of the application server, acting as a hot standby so that the system does not experience downtime. Virtualization technology can thus improve the rejuvenation action and reduce both an application's downtime and the cost of that downtime.
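
The three-VM arrangement described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The class names, the use of free memory as the aging indicator, the leak rate, and the 0.3 threshold are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class VirtualMachine:
    """Hypothetical stand-in for a VM hosting one application server."""
    name: str
    free_memory: float = 1.0  # fraction of memory still free (assumed aging indicator)
    restarts: int = 0

    def serve(self, leak: float = 0.15) -> None:
        # Each request leaks a little memory to imitate software aging.
        self.free_memory = max(0.0, self.free_memory - leak)

    def rejuvenate(self) -> None:
        # Stop, clean internal state, restart: modelled here as resetting the indicator.
        self.free_memory = 1.0
        self.restarts += 1


class LoadBalancer:
    """Hypothetical balancer (the VM1 role) that detects aging and triggers failover."""

    def __init__(self, active: VirtualMachine, standby: VirtualMachine,
                 threshold: float = 0.3) -> None:
        self.active = active
        self.standby = standby
        self.threshold = threshold  # assumed aging threshold

    def handle_request(self) -> str:
        if self.active.free_memory < self.threshold:
            # Swap roles first so service continues, then rejuvenate the aged VM.
            aged = self.active
            self.active, self.standby = self.standby, aged
            aged.rejuvenate()
        self.active.serve()
        return self.active.name


lb = LoadBalancer(VirtualMachine("VM2"), VirtualMachine("VM3"))
served = [lb.handle_request() for _ in range(6)]
print(served)
```

In this sketch the standby takes over before the aged VM is restarted, which is the property that lets the virtualized setup keep serving requests while rejuvenation is in progress.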


The following statements in the paper were found to be either promising or contradictory:

  • The paper states that the system can provide continuous service during the rejuvenation process, except in the non-virtualized scenario. At the same time, it also states that the system is not available in the rejuvenation state, which appears contradictory and is not promising.
  • The paper proposes segregated models of virtual machines on top of virtualization middleware, which look well suited to complex, large-scale software applications.
  • The detailed system model, with equations for Availability (A), Downtime (DT), and Cost (C), looks promising for software companies considering virtual machines for fault tolerance.
  • Analysis such as “Predictable shutdown cost is far less than that of unexpected shutdown (Cf >> Cr)” can be very helpful for enterprises deciding whether to use a given software platform. Such results also make a strong case for moving towards virtualized systems and adopting preventive measures to improve fault tolerance.
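
The effect of Cf >> Cr can be illustrated with a small Python sketch. The paper's own equations for A, DT, and C are not reproduced in this summary, so the sketch assumes a deliberately simple model: availability is the standard ratio of uptime to total time, and total cost is the number of unexpected failures times Cf plus the number of planned rejuvenations times Cr. The function names and all numeric values are illustrative assumptions, not results from the paper.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of total time during which the service is up."""
    return uptime_hours / (uptime_hours + downtime_hours)


def downtime_cost(failures: int, rejuvenations: int,
                  cost_failure: float, cost_rejuvenation: float) -> float:
    """Assumed linear cost model: failures * Cf + rejuvenations * Cr."""
    return failures * cost_failure + rejuvenations * cost_rejuvenation


# Illustrative values only (not taken from the paper): Cf = 1000, Cr = 50.
CF, CR = 1000.0, 50.0

# Without rejuvenation: four unexpected failures in the period.
cost_without = downtime_cost(failures=4, rejuvenations=0,
                             cost_failure=CF, cost_rejuvenation=CR)

# With rejuvenation: twelve planned restarts, one residual failure.
cost_with = downtime_cost(failures=1, rejuvenations=12,
                          cost_failure=CF, cost_rejuvenation=CR)

print(cost_without, cost_with)  # 4000.0 1600.0
```

Under these assumed numbers, frequent planned restarts are cheaper than a few unplanned outages, and the gap grows as Cf increases relative to Cr.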