How to deal with latency introduced by disk I/O activity…

The techniques Linux uses to get dirty pages onto persistent media have changed over the years.  Most recently, the kernel moved from a gang of threads called pdflush to a per-backing-device (bdi) thread model: basically one thread per LUN/mount.  Neither is perfect, and ideally you shouldn’t have to care about disk interference with your latency-sensitive app at all.  Right now, though, we can’t cleanly apply the normal affinity tuning to the flush-* bdi kthreads, and thus cannot effectively shield the latency-sensitive app entirely.

I’m going to stop short of handing out specific tuning advice, because I have no idea what your I/O pattern looks like, and that matters.  A lot.  Suffice it to say that, just as with your latency-sensitive application itself, you’d prefer more frequent, smaller transfers/writes over the less frequent, larger transfers that are optimized for throughput.
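To make that concrete, here is a minimal sketch of the “smaller, more frequent” pattern at the application level.  The chunk size, file name, and data are made up, and whether a per-chunk fdatasync() is affordable depends entirely on that I/O pattern of yours:

    import os

    CHUNK = 64 * 1024                             # small write unit (illustrative)
    records = (b"x" * CHUNK for _ in range(256))  # ~16 MB of fake data

    fd = os.open("datafile", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for rec in records:
            os.write(fd, rec)
            # Push each small chunk out right away, so dirty pages never
            # accumulate into one large, latency-spiking writeback burst.
            os.fdatasync(fd)
    finally:
        os.close(fd)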
Going a step further, you often hear about using tools like numactl or taskset to pin your application to certain cores, and chrt or nice to control the task’s scheduling policy and priority.  The flusher threads are not so easy to deal with.  We can’t apply the normal tuning with any of the above tools, because the flush-* threads are kernel threads, created through the kthread infrastructure, and they take a fundamentally different approach from other kthreads, which are instantiated at boot time (like migration) or at module-insertion time (like nfs).  There’s no way to set a “default affinity mask” on kthreads, and kthreads are not subject to isolcpus.
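For an ordinary userspace process, by contrast, the tuning itself is easy.  Here is roughly what taskset and chrt do under the hood, via the Python wrappers for sched_setaffinity(2) and sched_setscheduler(2); the CPU numbers and priority are just examples:

    import os

    # Pin the calling process to CPUs 2-3; same effect as `taskset -c 2,3`.
    os.sched_setaffinity(0, {2, 3})

    # Rough equivalent of `chrt -f 10`: SCHED_FIFO at priority 10.
    # Needs root (CAP_SYS_NICE).
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(10))

It is exactly this kind of call that has no stable target when the task in question is a flush-* kthread.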

Even in the current upstream kernel, the flush-* threads are started on demand (for example, when you mount a new filesystem), and they go away again after some idle time.  When they come back, they have a new pid.  That behavior doesn’t mesh well with affinity tuning.
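If you want to try anyway, about the best you can do today is a best-effort loop that rescans for the threads and re-applies a CPU mask.  This is only a sketch, assuming the flush-MAJOR:MINOR naming convention; the CPU set and polling interval are made up, and the kernel is free to refuse the affinity change:

    import os
    import time

    ALLOWED_CPUS = {2, 3}       # housekeeping CPUs; purely illustrative

    def flush_thread_pids():
        """Return pids whose comm matches the flush-* naming scheme."""
        pids = []
        for entry in os.listdir("/proc"):
            if not entry.isdigit():
                continue
            try:
                with open("/proc/%s/comm" % entry) as f:
                    comm = f.read().strip()
            except IOError:
                continue        # raced with a thread that just exited
            if comm.startswith("flush-"):
                pids.append(int(entry))
        return pids

    # Re-apply the mask periodically: a flush-* thread that idles out and
    # respawns comes back with a new pid and the default affinity.
    while True:
        for pid in flush_thread_pids():
            try:
                os.sched_setaffinity(pid, ALLOWED_CPUS)
            except OSError:
                pass            # thread exited, or the kernel said no
        time.sleep(5)

Racy and inelegant, which is rather the point: the per-bdi model was not designed with static affinity tuning in mind.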

Contrast that with the nfsd kthreads: since they do not come and go after they are first instantiated, you can apply typical affinity tuning to them and get measurable performance gains.

For now:

  • Write as little as possible, and/or write to (shared) memory and flush it later.
    • Take good care of any data you need, though!  Memory contents go bye-bye in a “power event”.
  • Get faster storage, such as a PCIe-based RAM disk or SSD.
  • Reduce the number of dirty pages kept in the page cache.
  • Increase the frequency at which dirty pages are flushed, so that less is written each time.  (See the sketch after this list for these last two.)
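The last two bullets map onto the vm.dirty_* sysctls.  A sketch with deliberately aggressive, illustrative values; the right numbers depend on your I/O pattern and should be measured, not copied:

    # Values are illustrative starting points, not recommendations.
    settings = {
        # Allow fewer dirty pages to pile up before writeback kicks in
        # (kernel defaults are commonly 10 and 20 percent).
        "/proc/sys/vm/dirty_background_ratio": "1",
        "/proc/sys/vm/dirty_ratio": "5",
        # Expire and flush dirty pages sooner, so each writeback pass
        # has less work to do (both values are in centiseconds).
        "/proc/sys/vm/dirty_expire_centisecs": "500",
        "/proc/sys/vm/dirty_writeback_centisecs": "100",
    }

    for path, value in settings.items():
        with open(path, "w") as f:      # needs root; same as `sysctl -w`
            f.write(value)

Put the same keys in /etc/sysctl.conf to make them persist across reboots.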
Further reading:
http://lwn.net/Articles/352144/ (subscription required)
http://lwn.net/Articles/324833/ (subscription required)
