A Case for Redundant Arrays of Inexpensive Disks (RAID)

An example of Amdahl’s Law: consider a disk-bound application that spends 9 out of every 10 seconds waiting on disk and 1 second performing computation. A 2x faster CPU would provide only a 5% reduction in overall application run time, because only the 1 second of computation shrinks (to 0.5 seconds) while the application still spends 9 seconds waiting on disk, taking the total from 10 seconds down to just 9.5.
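The arithmetic is easy to verify; here is a minimal sketch of the calculation (the 9-second/1-second split comes from the example above):

    # Amdahl's Law applied to the disk-bound example above.
    disk_time = 9.0   # seconds spent waiting on disk
    cpu_time = 1.0    # seconds spent computing
    old_total = disk_time + cpu_time        # 10.0 s
    new_total = disk_time + cpu_time / 2.0  # 9.5 s with a 2x faster CPU
    reduction = (old_total - new_total) / old_total
    print(f"run-time reduction: {reduction:.0%}")  # 5%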
The key idea behind RAID is to replace a large, costly disk with an array of smaller disks that are cheaper per byte. Here we consider replacing one large disk with 4 small disks, each of which has 1/4th the capacity, 1/2 the throughput, and 1/8th the price (i.e., 1/2 the cost per byte) of the large disk. Such a replacement yields the same total capacity at twice the throughput and half the cost. The problem (akin to RAID-0, discussed later) is that the reliability of the data would drop, even assuming that each of the individual cheaper disks is as reliable as the expensive disk to begin with. This is because the more disks there are, the higher the probability that at least one of them will fail, resulting in data loss. RAID fixes this problem by adding redundant data to the array of disks.
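To make the reliability argument concrete, here is a minimal sketch that assumes independent disk failures and a hypothetical per-disk failure probability p (the article gives no actual number):

    # Probability that at least one of n independent disks fails,
    # given a per-disk failure probability p.
    def p_any_failure(n: int, p: float) -> float:
        return 1.0 - (1.0 - p) ** n

    p = 0.01  # hypothetical per-disk failure probability
    print(p_any_failure(1, p))  # 0.01   -- one large disk
    print(p_any_failure(4, p))  # ~0.039 -- four small disks: ~4x riskier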
RAID-0: Stripe data across disks without any redundancy. Here we consider sector-granular striping (S-N means sector N). A collection of stripe units, one from each disk (e.g., S-0, S-1, S-2, and S-3), forms a stripe.
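A minimal sketch of this mapping, assuming 4 disks and the standard round-robin assignment of stripe units (real controllers may differ):

    # Map a logical sector number to (disk index, sector offset on that disk)
    # for RAID-0 with sector-granular striping across `num_disks` disks.
    def raid0_map(logical_sector: int, num_disks: int = 4):
        return logical_sector % num_disks, logical_sector // num_disks

    # S-0..S-3 form the first stripe, one stripe unit per disk:
    for s in range(8):
        disk, offset = raid0_map(s)
        print(f"S-{s} -> disk {disk}, offset {offset}")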
RAID-1: Data is mirrored across two disks. An interesting thing to note is that each “disk” in a RAID-1 organization can itself be a RAID array (e.g., a RAID-5 array of 5 disks). In fact, any disk in any RAID array can itself be a RAID array. RAID is thus a nomenclature for storage organization, and organizations like a RAID-1 of RAID-5s, or a RAID-5 of RAID-1s, are valid storage architectures.
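To make the composability point concrete, here is a toy sketch (hypothetical classes, not a real controller API): a mirror only needs its members to expose read and write, so a member can itself be another array.

    # A RAID-1 mirror treats its two members as opaque block devices,
    # so each member may itself be another RAID array.
    class Disk:
        def __init__(self):
            self.sectors = {}
        def write(self, sector, data):
            self.sectors[sector] = data
        def read(self, sector):
            return self.sectors[sector]

    class Mirror:  # RAID-1 over any two block devices
        def __init__(self, a, b):
            self.a, self.b = a, b
        def write(self, sector, data):
            self.a.write(sector, data)  # every write goes to both members
            self.b.write(sector, data)
        def read(self, sector):
            return self.a.read(sector)  # reads may be served by either member

    # A RAID-1 whose first member is itself a RAID-1: a valid organization.
    nested = Mirror(Mirror(Disk(), Disk()), Disk())
    nested.write(0, b"hello")
    print(nested.read(0))  # b'hello'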
RAID-2: Data is striped at a byte granularity and an error-correcting code (ECC) is used for redundancy. Here, B-N represents byte-N, and we consider a 512-byte sector striped across 8 data disks, which means 64 bytes of each sector per disk.
RAID-3: Data is striped at a byte granularity and parity, a.k.a. XOR (i.e., addition modulo 2), is used for redundancy. Here, B-N represents byte-N, and we consider a 512-byte sector striped across 8 data disks, which means 64 bytes of each sector per disk.
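A minimal sketch of the parity computation, assuming 4 data disks (the count is illustrative): the parity unit is simply the bytewise XOR of the corresponding data units.

    # Parity is the XOR of the corresponding bytes on all data disks.
    def xor_parity(stripe_units):
        parity = bytearray(len(stripe_units[0]))
        for unit in stripe_units:
            for i, byte in enumerate(unit):
                parity[i] ^= byte
        return bytes(parity)

    data = [b"\x01", b"\x02", b"\x04", b"\x08"]  # one byte per data disk
    print(xor_parity(data))  # b'\x0f'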
RAID-4: Data is striped at a sector granularity and parity, a.k.a. XOR, is used for redundancy. Here, S-N represents sector-N.
RAID-5: Data is striped at a sector granularity and parity, a.k.a. XOR, is used for redundancy. The difference between RAID-4 and RAID-5 is that no single disk holds all the parity sectors in RAID-5. Instead, the parity sectors (as well as the data sectors) are spread across the disks in a round-robin fashion.
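A minimal sketch of the rotating parity placement for a 5-disk array (the rotation direction is one common convention; real controllers differ):

    # For each stripe, rotate which disk holds the parity stripe unit.
    def raid5_parity_disk(stripe: int, num_disks: int = 5) -> int:
        return (num_disks - 1 - stripe) % num_disks

    for stripe in range(5):
        p = raid5_parity_disk(stripe)
        layout = ["P" if d == p else "D" for d in range(5)]
        print(f"stripe {stripe}: {' '.join(layout)}")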
  • A key design choice for RAID systems is the stripe unit size, i.e., the granularity at which data is striped across the disks. For example, RAID-2 and RAID-3 stripe data at a byte granularity, which in hindsight is a sub-optimal choice, and they are rarely, if ever, used in practice today. Most systems use a stripe unit of a sector (like RAID-4 and RAID-5) or larger, but the optimal stripe unit size is hard to determine because it is workload dependent. Smaller stripe units leverage parallelism for large accesses, improving throughput. However, they can also increase latency, because the disk heads on all the disks must perform a seek and the largest seek time across the disks determines the latency. In contrast, with large stripe units, some accesses require seeking on only a single disk, improving latency at the cost of reduced parallelism.
  • Another interesting performance metric for RAID systems is their degraded read performance, i.e., the read performance when one of the disks has failed. For example, for RAID-4 and RAID-5, the degraded read performance for large reads is comparable to that of large reads without a failure: the controller reads the same number of sectors, except that one of them is a parity sector, and the controller must perform some computation to reconstruct the failed sector from the parity and the surviving data sectors (see the sketch after this list). Degraded read performance is particularly relevant for repairing a failed disk in a RAID system: upon failure, the lost data must be recovered via degraded reads and written to a new disk.
  • The RAID levels discussed here can tolerate only a single disk failure, which became a problem as disk capacities grew. As capacities increased, so did the time to recover the data of a failed disk onto a new disk. Eventually, the recovery window was long enough that there was a real risk of a second disk failing during the repair, which would render the RAID system irrecoverable. Consequently, RAID systems were developed to tolerate 2 (or even more) simultaneous disk failures (e.g., RAID-6).
  • RAID systems also face the challenge of crash consistency (the data on the disks in the array is inter-related and must be updated atomically) and solve it using one of the standard mechanisms (e.g., a battery-backed cache or journaling).
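As referenced in the degraded-read bullet above, here is a minimal sketch of reconstruction, assuming 4 data disks plus one parity disk (counts are illustrative): XOR-ing all surviving units, parity included, recovers the missing data unit.

    # Reconstruct the stripe unit of a failed disk from the survivors.
    # XOR of all surviving data units plus the parity unit yields the
    # missing data unit, because parity = XOR of all data units.
    def reconstruct(surviving_units):
        missing = bytearray(len(surviving_units[0]))
        for unit in surviving_units:
            for i, byte in enumerate(unit):
                missing[i] ^= byte
        return bytes(missing)

    data = [b"\x01", b"\x02", b"\x04", b"\x08"]
    parity = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*data))
    # Suppose the disk holding b"\x04" fails:
    print(reconstruct([data[0], data[1], data[3], parity]))  # b'\x04'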
