High Availability and Fault Tolerance

Every organization wants to provide seamless, continuous service to its internal and external clients, without interruption, during both planned maintenance and unplanned outages.

When it comes to unplanned or unpredictable circumstances, the answer is "Fault Tolerance." So what is Fault Tolerance, what kind of solution is it, and how can it help an organization provide seamless service? To explain in simple terms, I will give you an example of a physical server whose hard drives are in a redundant RAID configuration (RAID 1 or RAID 5, for example). In that case, if one of the hard drives fails, the server stays functional with no production impact. If a monitoring system is in place, for example a SCOM agent installed on the server, the agent generates an alert that a hard drive on that physical server has failed, so the support team can replace the faulty drive later. The benefit here is that there is no impact on any client. No client would know that the hard drive had failed and was later replaced and rebuilt.

That is fault tolerance, but bear in mind that this solution is considerably more expensive than a high availability solution. The same kinds of solution - HA and FT - are available in VMware's and Microsoft's virtualization environments, but as I said, FT in VMware, for example, is hard to maintain, because you need two copies of each VM on separate ESXi host machines. In short, there is no service interruption in a fault-tolerant environment, but it comes at a high cost.
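The RAID example can be sketched as a toy model in Python. This is only an illustration of the idea, not how a RAID controller or SCOM actually works: the `MirroredVolume` class, the drive names, and the alert text are all made up for this sketch, and it assumes a simple mirror (RAID 1 style) where every healthy drive holds a full copy of the data.

```python
class MirroredVolume:
    """Toy mirror: every healthy drive keeps a full copy of each block."""

    def __init__(self, drive_names):
        self.drives = {name: {} for name in drive_names}
        self.failed = set()
        self.alerts = []  # stand-in for what a monitoring agent would raise

    def write(self, block, data):
        # Write the block to every drive that is still healthy.
        for name, store in self.drives.items():
            if name not in self.failed:
                store[block] = data

    def fail(self, name):
        # Mark a drive as failed and record an alert for the support team.
        self.failed.add(name)
        self.alerts.append(f"drive {name} has failed")

    def read(self, block):
        # Serve the read from the first healthy drive; clients never notice.
        for name, store in self.drives.items():
            if name not in self.failed:
                return store[block]
        raise IOError("all drives have failed")

    def replace(self, name):
        # Swap in a new drive and rebuild it from a healthy copy.
        healthy = next(s for n, s in self.drives.items() if n not in self.failed)
        self.drives[name] = dict(healthy)
        self.failed.discard(name)


volume = MirroredVolume(["disk0", "disk1"])
volume.write(0, "payroll")
volume.fail("disk0")
data_after_failure = volume.read(0)   # still served from disk1
volume.replace("disk0")               # rebuilt later by the support team
```

The read after the failure still succeeds because the surviving drive has a full copy, which is the "no client would know" part of the example above.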

High Availability is used for planned maintenance, for example with highly available clusters in VMware and Microsoft Hyper-V environments. Say you have a failover cluster of five host machines, and you need to deploy patches to all of them. In this case, you put the host machines in maintenance mode and patch them one by one. You start with the first host machine and drain it, meaning you vMotion all VMs from this host to the other hosts or, in the case of Hyper-V, live migrate the VMs to the other hosts, so the host is empty before you patch it and there is no impact. One thing to note here is that VMs being live migrated from one host to another may lose heartbeats for a fraction of a second, which means there can be a small hiccup. Comparatively, this solution is less expensive than Fault Tolerance.

If an ESXi host machine has failed and stopped, vMotion won't work in that case. Likewise, if one of the host machines in a failover cluster has failed or bugchecked (i.e. BSOD), the VMs running on that host will be unavailable and will appear to be down until they are restarted on another host. In short, high availability comes with minimal service interruptions. Some organizations prefer to absorb that minimal downtime with high availability rather than pay a lot more money for fault tolerance.
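The drain-and-patch sequence described above can be sketched in a few lines of Python. This is a simplified simulation under my own assumptions, not VMware or Hyper-V tooling: the cluster is just a dictionary of host names to VM lists, `drain` and `rolling_patch` are hypothetical helper names, and "migration" simply moves a VM name to the least-loaded remaining host.

```python
from typing import Dict, List


def drain(cluster: Dict[str, List[str]], host: str) -> None:
    """Move every VM off `host` onto the least-loaded of the other hosts."""
    targets = [h for h in cluster if h != host]
    if not targets:
        raise RuntimeError("no other host available to receive the VMs")
    while cluster[host]:
        vm = cluster[host].pop()
        destination = min(targets, key=lambda h: len(cluster[h]))
        cluster[destination].append(vm)  # stands in for vMotion / live migration


def rolling_patch(cluster: Dict[str, List[str]]) -> List[str]:
    """Drain and patch the hosts one at a time; return the patch order."""
    patched = []
    for host in list(cluster):
        drain(cluster, host)        # host is now in "maintenance mode"
        assert not cluster[host]    # nothing may still run on it
        patched.append(host)        # patching and reboot would happen here
    return patched


# Five-host cluster with two VMs per host, as in the example above.
cluster = {f"host{i}": [f"vm{i}a", f"vm{i}b"] for i in range(1, 6)}
patch_order = rolling_patch(cluster)
total_vms = sum(len(vms) for vms in cluster.values())
```

All ten VMs are still running somewhere in the cluster after every host has been patched. Note that the sketch never rebalances, so the last host patched ends up empty; in a real cluster you would expect VMs to be moved back afterwards.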

It's all about Server Infrastructure Support - by Nirav Soni
