[Perf] Implement async IO for the block device using io_uring #1600

Open
@ioanachirca

Description

After more debugging, it is clear that two major issues bring down the overall performance of our block device emulation:

  • serialized block file I/O
  • high guest interrupt rate

Our block device emulation processes guest I/O requests sequentially, so the latency of each request is serialized as well. This results in very low CPU usage (host and guest) when running fio benchmarks: the Firecracker emulation thread spends most of its time sleeping while it waits for I/O to complete, and fio maxes out at 4-5K IOPS.
To improve on this, we will need to add async block I/O support to Firecracker.
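
As a rough illustration of why serialization caps throughput, here is a back-of-the-envelope model based on Little's law (throughput = concurrency / latency). The function name and the per-request latency figures are assumptions for illustration only: 200-250 µs is simply what 4-5K IOPS implies with one request in flight, not a measured value.

```rust
/// Ideal IOPS when `in_flight` requests overlap and each request takes
/// `latency_us` microseconds to complete (Little's law).
/// This is an upper bound: it assumes the backing device can actually
/// sustain that much concurrency without added latency.
pub fn estimated_iops(in_flight: u32, latency_us: f64) -> f64 {
    assert!(latency_us > 0.0, "latency must be positive");
    f64::from(in_flight) * 1_000_000.0 / latency_us
}
```

With one request in flight, `estimated_iops(1, 250.0)` gives 4000 and `estimated_iops(1, 200.0)` gives 5000, which matches the ceiling observed with fio. Keeping several requests in flight raises the bound proportionally, which is the motivation for async I/O.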

The guest interrupt overhead becomes significant only when the Firecracker block device is doing a lot of IOPS (70-80K), which happens only if it is backed by a RAM disk to minimize latencies. As an improvement, we will need to implement virtio driver and device event suppression (interrupt mitigation). I've experimented a bit with this and got a nice performance boost.
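
For reference, one form of event suppression in the virtio specification is the `VIRTIO_F_EVENT_IDX` feature, where the driver publishes a `used_event` index and the device only interrupts when the used index moves past it. The sketch below ports the well-known `vring_need_event` check to Rust; the function name is hypothetical and this is not necessarily the approach used in the experiment mentioned above.

```rust
/// Decide whether the device should interrupt the driver after moving the
/// used ring index from `old_idx` to `new_idx`, given the `used_event`
/// index published by the driver (VIRTIO_F_EVENT_IDX).
/// All arithmetic is modulo 2^16, matching the 16-bit ring indices.
pub fn needs_interrupt(used_event: u16, new_idx: u16, old_idx: u16) -> bool {
    new_idx.wrapping_sub(used_event).wrapping_sub(1) < new_idx.wrapping_sub(old_idx)
}
```

If the driver sets `used_event` a few entries ahead, the device can complete several requests and raise a single interrupt for the batch instead of one per request.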

The current plan is to use io_uring to parallelize the block I/O operations.
The io_uring interface was added in kernel 5.1, so this will be available only when running on host kernels that support io_uring; otherwise the emulation will work serially as before.
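
A minimal sketch of how the fallback decision could be made from the kernel release string (as reported by `uname -r`). The helper names are hypothetical, and a version check is only a heuristic: io_uring can be unavailable on a 5.1+ kernel (for example, when it is compiled out or blocked by a seccomp filter), so probing the `io_uring_setup` syscall at startup would be a more robust alternative.

```rust
/// Parse the leading "major.minor" out of a kernel release string such as
/// "5.10.0-8-amd64". Returns None if the string does not start that way.
pub fn parse_kernel_version(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor_field = parts.next()?;
    // The minor field may carry a suffix (e.g. "1-rc1"); keep leading digits.
    let digits_end = minor_field
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(minor_field.len());
    let minor: u32 = minor_field[..digits_end].parse().ok()?;
    Some((major, minor))
}

/// Heuristic: io_uring was introduced in Linux 5.1, so older kernels (or
/// unparseable release strings) fall back to the serial code path.
pub fn kernel_may_support_io_uring(release: &str) -> bool {
    matches!(parse_kernel_version(release), Some(v) if v >= (5, 1))
}
```

For example, `kernel_may_support_io_uring("4.14.225")` is false and selects the existing serial path, while `kernel_may_support_io_uring("5.10.0-8-amd64")` is true.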

Metadata

Assignees

No one assigned

    Labels

    Roadmap: Tracked (Items tracked on the roadmap project.)

    Type

    No type

    Projects

    Status

    Developer Preview

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests