A Comparison of Cache Size for Software RAID-5
Linux Software RAID Performance Comparisons (2012)
The Problem
Various web sites recommend setting
/sys/block/md0/md/stripe_cache_size to 32768 for improved read
performance, but they do not provide compelling evidence that
changing the default of 256 has broad benefits.
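For readers who want to reproduce the comparison, the tunable can be
read and changed at runtime through that sysfs file. The following is
a minimal sketch, assuming an array at /dev/md0 and root privileges;
the change does not persist across reboots.

    #!/usr/bin/env python3
    # Minimal sketch: read and change the md stripe cache size via sysfs.
    # Assumes the array is md0 and the script is run as root. The value
    # counts cache entries, not bytes, and resets at reboot.
    SYSFS = "/sys/block/md0/md/stripe_cache_size"

    def get_stripe_cache_size():
        with open(SYSFS) as f:
            return int(f.read().strip())

    def set_stripe_cache_size(entries):
        with open(SYSFS, "w") as f:
            f.write(str(entries))

    if __name__ == "__main__":
        print("current:", get_stripe_cache_size())
        set_stripe_cache_size(32768)  # the commonly recommended value
        print("new:", get_stripe_cache_size())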
The Controller
- LSI SAS9211-8i (SAS2008)
- 8 6Gbps ports
- PCIe 2.0
- Chipset: Fusion-MPT
- Linux driver: mpt2sas
- Cost: about $230 from amazon.com
- Configuration: JBOD
The Test System
- Motherboard: Supermicro MBD-H8DCL-IF-O
- Processors: Two 3.3GHz Opteron 4238 (Socket C32)
- RAM: 64GB 1600MHz DDR3 (PC3-12800)
- Slots: PCIe x8 (4000MB/s)
- Drives: Seagate Barracuda 7200 3TB ST3000DM001
- Drive cage: Supermicro CSE-M35T-1B 5-Bay Enclosure (fits in
three 5.25-inch chassis bays; sells for about $100-$120 from
newegg.com)
- Debian Wheezy, Linux 3.2.0-3-amd64
The Test Matrix
- Read Percentage: 100% (pure read), 0% (pure write)
- Random Percentage: 100% (random)
- Thread counts: 1
- Small block sizes: 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k,
1m, 2m, 4m
- Large block sizes: 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k,
1m, 2m, 4m, 8m, 16m, 32m, 64m, 128m, 256m, 512m, 1024m
- Targets: All 5 drives were tested simultaneously, as well as
each drive individually.
- All small block I/Os are issued using the O_DIRECT flag.
- All large block I/Os are issued as a sequence of 8KB writes,
followed by an fsync, followed by a seek to the next "large
block". This simulates random I/O at small block sizes and
sequential I/O at large block sizes. By calling fsync rather than
using O_DIRECT, the Linux block system is exercised, which better
simulates real-world NFS performance. (A sketch of this access
pattern appears after this list.)
- All tests last 30 seconds.
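The sketch below illustrates the large block write pattern just
described. It is not the author's actual test harness: the function
name, the zero-filled buffer, and the choice of a random
large-block-aligned starting offset (to match the 100% random setting
above) are assumptions made for illustration.

    import os
    import random
    import time

    def large_block_write_test(path, device_size, large_block,
                               duration=30, io_size=8 * 1024):
        """Write 'large blocks' as runs of 8KB writes, each run followed by fsync."""
        buf = b"\0" * io_size
        fd = os.open(path, os.O_WRONLY)        # note: no O_DIRECT here
        bytes_written = 0
        deadline = time.monotonic() + duration
        try:
            while time.monotonic() < deadline:
                # Seek to a large-block-aligned offset (assumed random).
                offset = random.randrange(device_size // large_block) * large_block
                os.lseek(fd, offset, os.SEEK_SET)
                # Fill one "large block" with a sequence of 8KB writes...
                for _ in range(large_block // io_size):
                    bytes_written += os.write(fd, buf)
                # ...then push it through the Linux block layer.
                os.fsync(fd)
        finally:
            os.close(fd)
        return bytes_written / duration        # bytes per second

    # Example: 1GB "large blocks" against a hypothetical md device.
    # rate = large_block_write_test("/dev/md0", 12 * 10**12, 1024**3)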
Conclusion
As expected, the setting of stripe_cache_size did not impact read
performance. For random small blocks, a smaller cache was
helpful, providing a 54% improvement over the suggested 32768.
However, the absolute performance difference at this point was
under 100KB/s.
For large sequential writes, however, the default setting of 256
limited throughput to about 200MB/s, while larger cache sizes
allowed throughput to scale with write size, with a 59% improvement
for 1GB writes using 32768 instead of 256.
(Not shown here, but available in the raw data: the difference for
large sequential writes was smaller with smaller chunk sizes and
larger with larger chunk sizes, with similar anomalies at the
64MB point, and without significant change in peak performance.)
Recommendation: 32768
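One caveat worth noting when adopting 32768: the kernel md
documentation gives the stripe cache footprint as roughly
page_size * nr_disks * stripe_cache_size, so the larger setting is
not free in memory terms. A rough estimate for the five-drive array
used here (page size assumed to be 4KB):

    # Rough stripe cache memory estimate (formula from the kernel md docs):
    #   memory_consumed = system_page_size * nr_disks * stripe_cache_size
    # The five drives and 4KB pages below match the test system above.
    page_size = 4096
    nr_disks = 5
    for stripe_cache_size in (256, 32768):
        mib = page_size * nr_disks * stripe_cache_size / 2**20
        print(f"stripe_cache_size={stripe_cache_size:>6}: ~{mib:.0f} MiB")
    # 256 -> ~5 MiB; 32768 -> ~640 MiB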
Small Block Tests Using Default Chunk Size
Large Block Tests Using Default Chunk Size
Small Block Tests Using Legacy Default Chunk Size
Large Block Tests Using Legacy Default Chunk Size