Description
After more debugging, it is clear that two major issues limit the overall performance of our block device emulation:
- serialized block file I/O
- high guest interrupt rate
Our block device emulation processes guest I/O requests sequentially, so request latencies add up: a new request is not submitted until the previous one completes. This results in very low CPU usage (host and guest) when running fio benchmarks. The Firecracker emulation thread mostly sleeps while waiting for I/O to complete, and fio maxes out at 4-5K IOPS.
To improve on this we will need to add block async I/O support in Firecracker.
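A rough back-of-the-envelope model of why serialized processing caps throughput is sketched below. The function names and the 200-250 µs per-request latency are illustrative assumptions chosen to match the 4-5K IOPS figure above, not measurements from Firecracker, and the overlapped case is an idealized upper bound that ignores device and CPU limits.

```rust
/// Upper bound on IOPS when requests are handled strictly one at a time:
/// each request waits for the previous one to complete.
/// `latency_us` is the per-request latency in microseconds (assumed value).
fn serial_iops(latency_us: u64) -> u64 {
    assert!(latency_us > 0, "latency must be non-zero");
    1_000_000 / latency_us
}

/// Idealized upper bound when `in_flight` requests overlap their latencies,
/// as an asynchronous backend would allow. Real devices saturate earlier.
fn overlapped_iops(latency_us: u64, in_flight: u64) -> u64 {
    assert!(latency_us > 0, "latency must be non-zero");
    in_flight * 1_000_000 / latency_us
}
```

Under these assumptions, `serial_iops(200)` gives 5000 and `serial_iops(250)` gives 4000, while `overlapped_iops(200, 8)` gives 40000, which is why keeping several requests in flight matters far more than shaving per-request overhead.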
The guest interrupt overhead becomes significant only when the Firecracker block device sustains a high IOPS rate (70-80K), which happens only when it is backed by a RAM disk to minimize latency. To improve on this, we will need to implement virtio driver and device event suppression (interrupt mitigation). I've experimented a bit with this and got a nice performance boost.
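For reference, here is a minimal sketch of the notification check used by the split-virtqueue event index feature (`VIRTIO_F_EVENT_IDX`) in the virtio specification, assuming that is the suppression mechanism meant here. It is a Rust transcription of the spec's `vring_need_event` helper, not code taken from Firecracker.

```rust
/// Returns true if the other side asked to be notified for an index that was
/// crossed while moving from `old_idx` to `new_idx`.
///
/// `event_idx` is the index published by the other side (for example the
/// driver's `used_event`). All arithmetic wraps modulo 2^16, as ring indices do.
fn vring_need_event(event_idx: u16, new_idx: u16, old_idx: u16) -> bool {
    new_idx.wrapping_sub(event_idx).wrapping_sub(1) < new_idx.wrapping_sub(old_idx)
}
```

With this check, a device that completes a batch of requests raises an interrupt only if the batch crossed the index the driver asked about, so a driver that is already polling the ring can push that index forward and avoid one interrupt per request.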
The current plan is to use io_uring to parallelize the block I/O operations.
The io_uring interface was added in kernel 5.1, so this will be available only on host kernels that support io_uring; otherwise the emulation will work serially, as before.
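The fallback described above could be gated roughly as follows. This is only a sketch: `BlockIoEngine`, `parse_kernel_version` and `choose_engine` are hypothetical names, and the release string is assumed to come from something like `uname -r`. A real implementation might instead attempt to set up an io_uring instance and fall back if the kernel rejects it.

```rust
/// Hypothetical selector for the block I/O backend.
#[derive(Debug, PartialEq)]
enum BlockIoEngine {
    /// Serial processing, as before.
    Sync,
    /// io_uring-based processing.
    Async,
}

/// Extracts (major, minor) from a kernel release string such as "5.10.0-amzn".
/// Returns None if the string does not start with "<major>.<minor>".
fn parse_kernel_version(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    // The minor field may carry a suffix (e.g. "1-rc1"), so keep leading digits only.
    let digits: String = parts
        .next()?
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    let minor: u32 = digits.parse().ok()?;
    Some((major, minor))
}

/// Picks the async engine on kernels 5.1 or newer, the sync engine otherwise
/// (including when the release string cannot be parsed).
fn choose_engine(release: &str) -> BlockIoEngine {
    match parse_kernel_version(release) {
        Some(version) if version >= (5, 1) => BlockIoEngine::Async,
        _ => BlockIoEngine::Sync,
    }
}
```

Defaulting to the serial engine on any parse failure keeps the behaviour identical to the current emulation whenever support cannot be established.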