Linux Software RAID Performance Comparisons
The goal of this study is to determine the cheapest reasonably
performant solution for a 5-spindle software RAID configuration
using Linux as an NFS file server for a home office. Normal I/O
includes home directory service, mostly-read-only large file service
(e.g., MP3s), and nightly rsync-based backup.
Those with more spindles, more users, or more complex I/O
patterns should not expect these results to apply to or scale
to their environment.
Experimental details are provided below. Here is a summary of
the recommendations.
When using software RAID, on mostly-single-threaded workloads,
the Supermicro AOC-SAT2-MV8 has superior price/performance
compared with the Promise SuperTrak STTX8650.
The chunksize for 5-spindle RAID-5
should be 128KB ("mdadm ... -c 128").
RAID-5 and RAID-6 have nearly identical performance for I/O
sizes less than about 256KB. Never use RAID-0.
AES-128 is fastest, although AES-256 is comparable for most
workloads and should be considered for potentially better security.
When using a chunksize of 128KB for 5-spindle RAID-5, use the
recommended full-stripe size of 512KB (1024 sectors) for the
--align-payload parameter of cryptsetup.
When using a 5-spindle RAID-5 with a chunksize of 128KB and a
value of 512KB for the align-payload parameter, using the
recommended stride of 128KB and stripe-width of 512KB is
reasonable for both ext3 and ext4.
Ext3 and ext4 performed quite similarly in these uncached tests.
Ext4dev may provide better multi-threaded and mixed read/write
performance.
Memory speed matters little.
CPU speed is significant.
A larger NFS rsize/wsize is better for multi-threaded workloads, but
worse for single-threaded workloads. 64KB is a good
compromise.
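As a rough illustration of the rsize/wsize recommendation, the sketch below builds the mount options for a 64KB transfer size. The NFS client takes these options in bytes. The function name, host name, and paths are placeholders invented for this example, and the mount command is only printed, not run.

```shell
#!/bin/sh
# Build an NFS option string with equal rsize and wsize.
# Argument: transfer size in KB. Output: options in bytes.
nfs_rw_opts() {
    bytes=$(( $1 * 1024 ))
    echo "rsize=${bytes},wsize=${bytes}"
}

opts=$(nfs_rw_opts 64)
# Placeholder host and paths; printed rather than executed.
echo "mount -t nfs4 -o ${opts} fileserver:/export/home /mnt/home"
```

The client and server may negotiate a smaller size than requested, so the effective values are worth checking on the client after mounting.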
For 5-spindle RAID-5, the chunk size should be 128KB. The AES-256
encrypted file system payload alignment should be 4 times this value,
or 512KB. The file system should be informed of these values as
a stride of 128KB and a stripe-width of 512KB. For NFS, the rsize
and wsize values should be at least 64KB. NFSv4 should be used
because it avoids lockd issues.
As a departure from this summary, consider AES-128 for
improved write performance.
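The alignment numbers in the summary follow from simple arithmetic, sketched here in shell. It assumes that each RAID-5 stripe spends one chunk on parity, so a 5-spindle array has 4 data chunks per stripe, and that cryptsetup's --align-payload is counted in 512-byte sectors. The function names are made up for this example.

```shell
#!/bin/sh
# Full-stripe size in KB for RAID-5: one chunk per stripe holds parity,
# so (spindles - 1) chunks hold data.
raid5_full_stripe_kb() {
    echo $(( ($1 - 1) * $2 ))
}

# Convert KB to 512-byte sectors, the unit used by --align-payload.
kb_to_sectors() {
    echo $(( $1 * 1024 / 512 ))
}

stripe_kb=$(raid5_full_stripe_kb 5 128)
echo "full stripe: ${stripe_kb}KB"
echo "align-payload: $(kb_to_sectors "$stripe_kb") sectors"
```

For 5 spindles and a 128KB chunk this prints a 512KB full stripe and 1024 sectors, matching the values recommended above.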
Final Build
After this analysis, the file server was built with the following
commands:
- fdisk -H 224 -S 56 /dev/sd[bcdef], start 1, end 155000
The fdisk parameters were recommended by Ted Ts'o
for SSDs and are not expected to help with spinning media.
- badblocks -b 4096 -s -v -w -t random /dev/sd[bcdef]1
When running 5 copies simultaneously, 2 drives sustained
30MB/s writes and the other 3 sustained 15MB/s; the difference
comes from which controller ports the drives are attached to.
Reads were 40MB/s on the two drives attached to the faster ports
and 20MB/s on the other ports.
- mdadm -C /dev/md0 /dev/sd[bcdef]1 -c 128 -n 5 -l 5
The -c parameter sets the chunk size to 128KB. RAID reconstruction
proceeded by reading at about 24MB/s from 4 disks and writing at
24MB/s to the other disk.
- cryptsetup -c aes-cbc-essiv:sha256 -s 128 -h sha256 --align-payload=1024 luksFormat /dev/md0
This uses AES-128 with a 512KB (1024-sector) payload alignment.
- cryptsetup luksOpen /dev/md0 r0
- pvcreate --metadatasize 506k /dev/mapper/r0
This is also from Ted's blog, but in this case I want to align the
boundaries to 512KB, since that is the payload alignment for
encryption.
- pvs /dev/mapper/r0 -o+pe_start
- vgcreate -s 1g v0 /dev/mapper/r0
- lvcreate -L 2t -n m v0
- lvcreate -L 300g -n o v0
- mke2fs -t ext4 -i1048576 -m0 -E stride=32,stripe-width=128 /dev/mapper/v0-m
- mke2fs -t ext4 -i65536 -m0 -E stride=32,stripe-width=128 /dev/mapper/v0-o
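The mke2fs values of stride=32 and stripe-width=128 look different from the 128KB and 512KB quoted in the summary because mke2fs takes both options in file system blocks rather than bytes. The sketch below shows the conversion, assuming a 4KB block size (block size is not set explicitly in the commands above, so this is an assumption) and 4 data disks in the 5-spindle RAID-5. The function names are illustrative only.

```shell
#!/bin/sh
# stride: number of file system blocks in one RAID chunk.
# Arguments: chunk size in KB, file system block size in KB.
ext_stride() {
    echo $(( $1 / $2 ))
}

# stripe-width: number of file system blocks in one full data stripe.
# Arguments: stride in blocks, number of data disks.
ext_stripe_width() {
    echo $(( $1 * $2 ))
}

stride=$(ext_stride 128 4)             # 128KB chunk, assumed 4KB blocks
width=$(ext_stripe_width "$stride" 4)  # 4 data disks in 5-spindle RAID-5
echo "-E stride=${stride},stripe-width=${width}"
```

With these inputs the script prints the same -E argument used in the two mke2fs commands above.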