Originally, as envisaged in 1987 by Patterson, Gibson and Katz from the University of California in Berkeley, the acronym RAID stood for a "Redundant Array of Inexpensive Disks". In short, a larger number of smaller, cheaper disks could be used in place of a single, much more expensive large hard disk, or even to create a disk that was larger than any currently available.

They went a stage further and postulated a variety of options that would not only result in getting a big disk for a lower cost, but could improve performance or increase reliability at the same time. The options for improved reliability were required partly because using multiple disks reduces the Mean-Time-Between-Failure (MTBF): divide the MTBF for a drive in the array by the number of drives and, theoretically, a RAID without redundancy will fail more quickly than a single disk.

Today RAID is usually described as a "Redundant Array of Independent Disks"; technology has moved on and even the most costly disks are not particularly expensive.

Six basic levels of RAID are commonly described (the original paper defined levels 1 to 5, and RAID 0 was added later), some geared towards performance, others to improved fault tolerance, though the first of these does not have any redundancy or fault tolerance so might not truly be considered RAID.

RAID 0 - Striped and not really "RAID"

RAID 0 provides capacity and speed but not redundancy. Data is striped across the drives with all of the benefits that gives, but if one drive fails the RAID is dead, just as if a single hard disk drive had failed.

This is good for transient storage where performance matters but the data is either non-critical or a copy is also kept elsewhere. Other RAID levels are more suited to critical systems where backups might not be up-to-the-minute, or down-time is undesirable.

RAID 1 - Mirroring

RAID 1 is often used for the boot devices in servers or for critical data where reliability requirements are paramount. Usually 2 hard disk drives are used and any data written to one disk is also written to the other.

In the event of a failure of one drive the system can switch to single drive operation; the failed drive is then replaced and the data copied to the replacement drive to rebuild the mirror.

RAID 2

RAID 2 introduced error correction code generation to compensate for drives that did not have their own error detection. There are no such drives now, and have not been for a long time. RAID 2 is not really used anywhere.

RAID 3 - Dedicated Parity

RAID 3 uses striping, down to the byte level. This adds a hardware overhead for no apparent benefit. It also introduces "parity", or error correction data, on a separate drive, so an additional hard disk is needed that gives greater security but no additional space.

RAID 4 - Dedicated Parity

RAID 4 stripes to the block level and, like RAID 3, stores parity information on a dedicated drive.

RAID 5 - The most common format

RAID 5 stripes at the block level but does not use a single dedicated drive for storing parity. Instead, parity is interspersed within the data, so after each run of data strips there is a strip of parity data, and the drive that holds the parity changes for the next stripe.

This could mean, for example, that in a 3 disk RAID 5 there are data strips on disks 0 and 1 followed by a parity strip on disk 2. For the next stripe the data is on disks 0 and 2 with the parity on disk 1, then data on disks 1 and 2 with parity on disk 0.

RAID 5 is generally faster for smaller reads, so eminently suitable for server systems being shared by large numbers of users creating smaller data files or accessing smaller amounts of data each time. For some other applications, however, RAID 4 can outperform RAID 5.

Beyond RAID 5?

Advances on RAID 5 do exist, though in general these use RAID 5 techniques and enhance them, for example by mirroring two RAID 5 arrays, or by having 2 parity strips in each stripe.

RAID data recovery

It might be imagined that with all of this fault tolerance data recovery would not be a requirement, but things will still go wrong.

With all RAID levels logical corruption, that is damage to the file system, has just as devastating an effect as with a single hard disk. You might have a robustly stored file system, but it is a robustly stored and corrupted file system.

With RAID 0 the failure of one disk is terminal for the RAID. If data cannot be recovered from the failed disk then a percentage of the data is lost for good and, since RAID 0 uses data striping, this could be like losing 1 MB of data out of every 4 MB, and the chances of that leaving any major files intact are low. Smaller files, those less than the sum of one strip from each of the working drives, may fortunately be intact. For larger files (e.g. Exchange or SQL databases) there will be considerable data loss and structural damage, and low level work will be required to salvage any useful data from them.

For RAID levels where there is parity and the chance to recover from a single disk failure, the most common problems we see are:

Degraded running

A single disk fails and is ignored, or there is not a spare available and so one is ordered. Either way the RAID unit stays in operation but with a disk missing, so there is no longer any redundancy. Usually the hard disks in a RAID are part of the same manufacturing batch and have been stored and run in the same environment; if the unit has been mis-handled then each disk in the RAID has been mis-handled. So there is quite a good chance that another drive will fail sometime soon, if not for any of the reasons just given then because bad things don't happen singly.

Multiple failure

Striped RAID with parity is fault tolerant if a single drive fails nice and cleanly. If multiple drives fail then the RAID is lost, and the same is true if one drive fails and de-stabilises the SCSI bus. This can result in multiple drives appearing to fail; the RAID unit believes that they have failed, and so the RAID will not operate.

Configuration loss

When a RAID is configured, information is stored about the order of the disks, the size of a strip of data and so on. If there is a failure within the RAID controller and this information is lost then the RAID will not operate, and it is not always practicable to re-instate it.

Some RAID controllers will consider re-programming the RAID configuration as a rebuild request and re-write to each of the disks, destroying the data.

People making it worse

One of the worst sounds we hear with RAID problems is that of human panic, and frantic attempts to repair the problem. "We're just going to try one more thing" is often the sound that signals the end of the data, as a RAID is repaired with the disks in the wrong slots, or rebuilt and set back to its original state.

What to do when a RAID fails

STOP

THINK

Make sure that anything you do is going to be non-destructive.

Get Advice

Do not let anyone push you into precipitous action. They might have a deadline and be applying pressure, but they will quickly forget their part in driving proceedings when the RAID is fatally damaged by a hurried repair attempt.

How can data be recovered from a RAID?

Much of RAID recovery is the same as for a single disk recovery: data must be secured and backed up to guarantee that the problem will not be exacerbated. For logical problems the difficult work is all in the analysis of the file system; that it is from a RAID makes no major difference once the RAID scheme has been identified and the correct access to it worked out.

For mirrored RAID, data can be "mixed and matched" from the good sectors of two drives to rebuild a good drive. With striped RAID schemes that use parity, data can be rebuilt at the stripe level rather than on a per drive basis, so if there are bad sectors throughout more than one drive these can be corrected individually.

With non-redundant RAID schemes each sector that cannot be read from a disk results in data loss from the RAID set. For redundant RAID schemes, however, there is much that can be done to rebuild when data is missing. Whilst a RAID controller will take a disk off-line when it fails and operate in degraded mode, rebuilding the data from the missing disk on demand, a data recovery process can be somewhat more sophisticated. With properly written recovery software the level of granularity can be one sector rather than one disk, so for each sector that fails the data can be rebuilt so long as the corresponding sectors can be recovered from the remainder of the disks. Even if the next failed sector is on a different drive in the set, so long as the same sector can be read from the other disks then a complete rebuild can be made.

For levels of RAID that have greater redundancy, the number of failed sectors across a set of disks can be even greater without data loss.

Even as data recovery specialists we are, however, still bound by the rules of mathematics. If sector 99 is missing from both disks 0 and 4 in a RAID 5 set then rebuilding of the missing data is not a possibility.

Once the RAID/disk issues have been resolved then the data recovery process can continue just as it would for a single disk.
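
The sector-by-sector rebuild described above can be illustrated with a minimal Python sketch. It assumes a single-parity scheme (as in RAID 5) in which the parity strip is the byte-wise XOR of the data strips, which the text does not spell out. The names `xor_blocks` and `rebuild_sector` are hypothetical, and tiny 4-byte blocks stand in for real sectors.

```python
from functools import reduce
from typing import Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_sector(copies: Sequence[Optional[bytes]]) -> Optional[list]:
    """Rebuild one sector position across every disk of a single-parity set.

    `copies` holds what was read from each disk at this position (data and
    parity are treated alike); None marks an unreadable sector.
    Returns the completed list, or None when two or more copies are missing.
    """
    missing = [i for i, block in enumerate(copies) if block is None]
    if len(missing) > 1:
        return None  # single parity cannot solve for two unknowns
    repaired = list(copies)
    if missing:
        survivors = [block for block in copies if block is not None]
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


# Assumed toy set: two data disks plus parity, three sector positions each.
# Which disk holds the parity does not matter for the XOR arithmetic.
d0 = [b"AAAA", b"BBBB", b"CCCC"]
d1 = [b"DDDD", b"EEEE", b"FFFF"]
parity = [xor_blocks([a, b]) for a, b in zip(d0, d1)]
disks = [list(d0), list(d1), list(parity)]

disks[0][0] = None  # bad sector on disk 0
disks[1][1] = None  # bad sector on a different disk, different position
disks[0][2] = None  # same position lost on two disks...
disks[2][2] = None  # ...so this one cannot be rebuilt

results = [rebuild_sector([disk[pos] for disk in disks]) for pos in range(3)]
```

In this sketch positions 0 and 1 are rebuilt even though the bad sectors sit on different disks, while position 2 comes back as `None` because two disks lost the same sector, which mirrors the "sector 99" case in the text.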