Fault tolerance ensures that if a failure happens, a system is still able to operate.
To achieve fault tolerance, redundant systems and components can be used so that a system or network remains operational.
Examples include multiple storage devices, network interface cards, power supply units, network devices and internet connections.
High availability defines how close a system or platform comes to achieving 100% uptime annually.
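As a rough illustration of what an annual uptime percentage means in practice, the sketch below converts a percentage into the downtime it allows per year. The function name and the 365-day year are assumptions made for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years are ignored in this sketch


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the hours of downtime per year allowed at a given uptime percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.0, 99.9, 99.99):
    hours = annual_downtime_hours(pct)
    print(f"{pct}% uptime allows about {hours:.2f} hours of downtime per year")
```

At 99.9% uptime, for example, the allowance works out to roughly 8.76 hours per year.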
Redundant Array of Independent Disks (RAID)
Allows a system to have multiple hard drives, with data written across the drives.
With the redundant RAID levels, if a disk fails, the data is not lost.
RAID 0 uses a minimum of 2 disks on a system.
Data is split into blocks.
The blocks are striped across the disks.
Provides improved performance.
Does not provide redundancy.
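The striping layout described above (commonly known as RAID 0) can be sketched in a few lines. This is only a toy model held in memory: the function names, the 4-byte block size and the sample data are assumptions for illustration, not how a real RAID controller is implemented.

```python
def stripe(data: bytes, disk_count: int = 2, block_size: int = 4) -> list[list[bytes]]:
    """Split data into blocks and deal the blocks out to the disks in turn."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    disks: list[list[bytes]] = [[] for _ in range(disk_count)]
    for index, block in enumerate(blocks):
        disks[index % disk_count].append(block)
    return disks


def read_back(disks: list[list[bytes]]) -> bytes:
    """Reassemble the original data by reading one block from each disk in turn."""
    blocks = []
    depth = max(len(disk) for disk in disks)
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                blocks.append(disk[row])
    return b"".join(blocks)


disks = stripe(b"ABCDEFGHIJ", disk_count=2, block_size=4)
print(disks)
print(read_back(disks))
```

Because each disk holds only part of the data, losing either disk makes the whole data set unreadable, which is why this layout offers no redundancy.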
RAID 1 uses 2 disks on a system.
Data is written identically to both disks (mirroring).
Does not improve write performance.
Provides data redundancy.
RAID 5 uses a minimum of 3 disks on a system.
Writes data in blocks across the disks.
Creates a parity block, which is stored on a different disk from the data blocks it protects.
Provides improved performance.
Provides data redundancy.
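The parity idea can be illustrated with XOR, which is the operation typically used for this kind of parity. The sketch below uses two data blocks and one parity block; the block contents and the helper name are assumptions for the example, and a real array also rotates the parity blocks across the disks.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


block_a = b"DATA"                       # stored on disk 1
block_b = b"MORE"                       # stored on disk 2
parity = xor_blocks(block_a, block_b)   # stored on disk 3

# Simulate losing disk 2: rebuild its block from the surviving block and the parity.
recovered_b = xor_blocks(block_a, parity)
print(recovered_b == block_b)
```

The same calculation rebuilds block_a if disk 1 is the one that fails, so any single disk can be lost without losing data.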
RAID 10 is RAID 1 + 0.
Uses a minimum of 4 disks on a system.
Each pair of disks is mirrored using RAID 1.
Data is striped across the mirrored pairs using RAID 0.
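To compare the levels covered above, the following sketch estimates usable capacity for each one using the usual rules of thumb. It assumes equal-sized disks; the function name and level labels are made up for this example.

```python
def usable_capacity_tb(level: str, disk_count: int, disk_size_tb: float) -> float:
    """Estimate usable capacity for a RAID level, assuming equal-sized disks."""
    if level == "0":
        minimum, usable_disks = 2, disk_count        # striping only, nothing reserved
    elif level == "1":
        minimum, usable_disks = 2, 1                 # every disk holds the same copy
    elif level == "5":
        minimum, usable_disks = 3, disk_count - 1    # one disk's worth of parity
    elif level == "10":
        minimum, usable_disks = 4, disk_count / 2    # half the disks are mirrors
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if disk_count < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} disks")
    if level == "10" and disk_count % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return usable_disks * disk_size_tb


for level, count in (("0", 2), ("1", 2), ("5", 3), ("10", 4)):
    print(f"RAID {level} with {count} x 1 TB disks: {usable_capacity_tb(level, count, 1.0)} TB usable")
```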
A Storage Area Network (SAN) is a dedicated network within an organization that connects servers and storage devices to provide high availability and high-speed data transfers.
Networking devices such as switches, as well as servers, use multiple network interface cards (NICs) to provide redundancy.
NIC teaming allows a server with multiple NICs to function in active/standby mode.
It can also allow both NICs to function as a single logical network interface card to provide higher bandwidth.
A cluster allows multiple servers to pool their resources together.
A load balancer distributes the load across multiple servers.
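One simple way a load balancer can spread requests is round robin, where each new request goes to the next server in the list. The notes do not name a particular strategy, so this is only one possible approach; the class and server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a repeating order (one of several possible strategies)."""

    def __init__(self, servers: list[str]):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def next_server(self) -> str:
        return next(self._servers)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])  # hypothetical server names
assignments = [balancer.next_server() for _ in range(5)]
print(assignments)
```

Real load balancers usually also check server health and may weight servers by capacity, which this sketch leaves out.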
Dual power supply units on a system.
Uninterruptible power supply (UPS).
Use of power generators.
Power distribution units (PDUs) are installed within a server rack and can be remotely managed.
A recovery site is a duplication of the organization's resources, such as hardware, software and data.
A hot site has everything duplicated and is ready to go when needed.
A warm site may have server racks and just enough equipment to be up and running.
The company has to bring their data and hardware resources.
A cold site has no hardware at all.
It's an empty building or room.
The company will be required to bring their data, software, hardware, network components and their employees.
When restoring applications, it's important to give priority to those applications which are mission-critical to the organization.
Organizations should clearly define the list of the most important applications and data, which should be restored first if a security incident or disaster occurs.
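A restoration order can be captured as a simple prioritized list. The sketch below is one hypothetical way to record it; the application names and priority numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Application:
    name: str
    priority: int  # 1 = mission-critical; larger numbers are restored later


def restoration_order(apps: list[Application]) -> list[str]:
    """Return application names sorted so mission-critical ones come first."""
    return [app.name for app in sorted(apps, key=lambda app: app.priority)]


apps = [
    Application("intranet wiki", 3),
    Application("payment processing", 1),
    Application("email", 2),
]
print(restoration_order(apps))
```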
Full - A full backup is when an administrator backs up everything on a system, such as the operating system and all the files.
Incremental - An incremental backup contains all the files that have been changed or created since the last backup of any type (full or incremental).
Differential - A differential backup contains all the files that have been changed or created since the last full backup.
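The difference between the three backup types comes down to which point in time the files are compared against. The sketch below models that with plain day numbers instead of real timestamps; the function name, file names and dates are assumptions for the example.

```python
def files_to_back_up(files: dict[str, int], backup_type: str,
                     last_full: int, last_backup: int) -> list[str]:
    """Pick files for a backup run.

    files maps a file name to the day it was last modified.
    last_full is the day of the last full backup; last_backup is the day of
    the most recent backup run before this one.
    """
    if backup_type == "full":
        return sorted(files)
    if backup_type == "incremental":
        cutoff = last_backup   # changes since the most recent backup
    elif backup_type == "differential":
        cutoff = last_full     # changes since the last full backup
    else:
        raise ValueError(f"unknown backup type: {backup_type}")
    return sorted(name for name, modified in files.items() if modified > cutoff)


# Full backup on day 1, an incremental on day 3, and today is day 5.
files = {"os.img": 0, "report.docx": 2, "budget.xlsx": 4}
for kind in ("full", "incremental", "differential"):
    print(kind, files_to_back_up(files, kind, last_full=1, last_backup=3))
```

In this example the incremental run picks up only the file changed after day 3, while the differential run picks up everything changed after the full backup on day 1.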
In a virtual environment, hypervisors support the creation of snapshots.
Snapshots allow you to save every detail of a guest operating system or virtual machine.
When planning a recovery site in the event of a security event or disaster, it's important to consider many factors such as:
Environmental threats
Geolocation of recovery site
Legal implications
Having an off-site location to back up your data is known as vaulting.
The off-site location may be owned by the organization or by a service provider.
Consider both the physical and digital security of the off-site location.
Consider whether the off-site location is compliant with the relevant regulatory practices and standards.
When considering the location of the recovery site, it's important to check that there is little to no history of disasters occurring in the area.
It's also important to consider how long it takes support staff to commute to the recovery site.
When choosing a recovery location, you need to consider the legal implications of where the data is being stored.
Ensure your legal team is consulted before choosing a location.
Data sovereignty means that if your data is stored within a facility located in another country, the data will be subject to the laws of that country.
Some laws may require that your data stays within your home country.
When planning for business continuity, it's important to perform regular exercises to ensure everyone is prepared.
These exercises may cost a lot of money and can be very time consuming.
A tabletop exercise allows an organization to reduce cost and time by simply discussing a simulated disaster.
In a tabletop exercise, participants do not physically act out the plan but rather discuss what happens at each stage of the plan.
After completing a disaster recovery exercise, an after-action report is required.
The report may contain the details of each step of the methodology and explanations of the procedures.
Ensure it includes details about everything that worked smoothly and everything that did not work as expected.
Having a failover site is important so that, if a disaster occurs, it's easy to migrate to the failover site.
Ensure all data is fully replicated or synchronized between the organization and the failover site.
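One way to confirm that data is actually in sync is to compare checksums of each item at the primary and failover sites. The sketch below does this with in-memory dictionaries standing in for the two sites; the names and sample data are hypothetical, and a real check would read files or query the replication tool.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of some data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def out_of_sync(primary: dict[str, bytes], replica: dict[str, bytes]) -> list[str]:
    """List items that are missing from the replica or differ from the primary."""
    return sorted(
        name for name, data in primary.items()
        if name not in replica or digest(replica[name]) != digest(data)
    )


primary_site = {"customers.db": b"v2", "orders.db": b"v7", "config.ini": b"a=1"}
failover_site = {"customers.db": b"v2", "orders.db": b"v6"}
print(out_of_sync(primary_site, failover_site))
```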
During a disaster, things may not always go as planned. It's important to have alternate methods of achieving the same task.
Alternate methods are useful when the network or devices such as printers are unavailable, for example when a receipt needs to be printed for a customer.
It's important to ensure proper documentation is kept for all the primary and alternate business processes before a disaster occurs.
Last known-good configuration boots the system into a known good working configuration.
Affects the boot process of a system.
The operating system allows system administrators to revert the system to a known good working state.
Reverting to a snapshot when working with virtualization technologies is one example.
Windows System Restore is another example of reverting to a known state.
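The idea of reverting to a known state can be sketched with a toy object that saves a copy of its configuration and later restores it. This is only a conceptual model; the class, method names and configuration keys are invented for the example and do not correspond to any hypervisor or Windows API.

```python
import copy


class Machine:
    """Toy model of a system whose configuration can be saved and restored."""

    def __init__(self, config: dict):
        self.config = config
        self._snapshots: dict[str, dict] = {}

    def take_snapshot(self, name: str) -> None:
        # Deep copy so later changes to the live config do not alter the snapshot.
        self._snapshots[name] = copy.deepcopy(self.config)

    def revert(self, name: str) -> None:
        self.config = copy.deepcopy(self._snapshots[name])


vm = Machine({"packages": ["web-server"], "patched": True})
vm.take_snapshot("known-good")

# Something goes wrong after the snapshot was taken.
vm.config["packages"].append("untrusted-tool")
vm.config["patched"] = False

vm.revert("known-good")
print(vm.config)
```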
Live boot media allows a system administrator to boot an operating system from bootable media such as a CD, DVD or USB drive.
The operating system boots from the media and loads into RAM.
Provides recovery options to repair the host operating system.
Boots an operating system across a network.