A Comparison of NFSv4 rsize and wsize Values

Linux Software RAID Performance Comparisons (2012)

The Problem

Various web sites recommend very large rsize and wsize values (and Linux defaults to 1MB).

The Controller

The Test System

The Test Matrix


For small block reads, r/wsize=32k should be avoided because 64k block read performance is degraded. For small block writes, performance was similar across r/wsizes, with 256k, 128k, and 1m performing better at some load points.

Large block read performance showed significant differences with changes in r/wsizes, with values of 32k, 64k, and 128k having the highest and most consistent performance.

For large block writes, behavior was consistent across all r/wsizes.

Asynchronous IO (the "async" mount option) was disappointing. The NFS server timed out often, making test results difficult to obtain (those shown are the best of 2 runs, and even then, several data points are missing). These timeouts may have occurred because the underlying disc subsystem could not keep up with the pending IO stream -- however, for large sequential writes, our earlier benchmarking showed local ext4fs performance several times higher than wire speed. Even if this is the problem, NFS should not time out.

When asynchronous IO did not cause server timeouts, peak sequential write throughput matched wire speed for smaller sequences. Hence, the server may accept bursts of IO more easily, but sustained IO caused timeouts instead of throttling, which ended up decreasing overall throughput below fully synchronous levels.

Based on various recommendations found on the Internet, another asynchronous run was done with 32 server threads (vs. the default of 8) and with the following sysctl settings:

These settings did not eliminate timeouts. Further, an additional run with synchronous IO showed no significant change in IO behavior, as demonstrated at the 64KB r/wsize point in the graphs below.

Recommendation: Avoid 32k and 1m. Use 64k or 128k. Use "sync" and other defaults.
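As an illustration of the recommendation, here is a minimal Python sketch that builds the matching mount option string and an /etc/fstab entry. The option names (vers, rsize, wsize, sync) are standard NFS mount options; the helper functions, server name, export path, and mount point are hypothetical and only there for the example.

```python
def nfs_mount_options(rwsize_kib=64, sync=True, vers=4):
    """Build an NFS mount option string with equal rsize and wsize."""
    size_bytes = rwsize_kib * 1024  # rsize/wsize are specified in bytes
    opts = [f"vers={vers}", f"rsize={size_bytes}", f"wsize={size_bytes}"]
    opts.append("sync" if sync else "async")
    return ",".join(opts)


def fstab_line(server, export, mountpoint, options):
    """Format one /etc/fstab entry for an NFS mount."""
    return f"{server}:{export} {mountpoint} nfs {options} 0 0"


# Hypothetical server, export, and mount point, for illustration only.
opts = nfs_mount_options(64)
print(opts)
print(fstab_line("nfsserver", "/export/data", "/mnt/data", opts))
```

Passing 128 instead of 64 gives the other recommended size. The server may negotiate a smaller value than requested, so the effective sizes are worth checking in /proc/mounts after mounting.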


Small Block Tests

Large Block Tests

Asynchronous (SEE NOTE)

NOTE: With only 8 NFS threads on the server, there were many NFS timeouts, resulting in poor performance. The following graphs are made from two different runs, taking the better value of the two at each point and removing values that were obviously taken during a timeout.
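The merge described in the note can be sketched in Python roughly as follows. This is not the script used for the graphs. The criterion for "obviously taken during a timeout" is not given above, so the min_valid cutoff is an assumption, and the numbers in the example are made up rather than measured.

```python
def merge_best(run_a, run_b, min_valid=0.0):
    """Per load point, keep the higher throughput of two runs.

    Values that are None or not above min_valid are treated as timeout
    artifacts and dropped. A point with no valid value in either run
    stays None, i.e. a missing data point in the graph.
    """
    merged = {}
    for point in sorted(set(run_a) | set(run_b)):
        candidates = [
            v for v in (run_a.get(point), run_b.get(point))
            if v is not None and v > min_valid
        ]
        merged[point] = max(candidates) if candidates else None
    return merged


# Made-up throughput values (MB/s) keyed by load point, for illustration only.
run_a = {1: 40.0, 2: None, 4: 3.0, 8: None}
run_b = {1: 35.0, 2: 70.0, 4: 90.0, 8: None}
print(merge_best(run_a, run_b, min_valid=5.0))
```

Keeping None for points where both runs failed preserves the gaps rather than hiding them, which matches the missing data points mentioned earlier.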

Small Block Tests

Large Block Tests

Synchronous with 32 Threads and r/wmem* Changes

Small Block Tests

Large Block Tests