In the Linux kernel, the following vulnerability has been resolved:
nbd: fix io hung while disconnecting device
In our tests, "qemu-nbd" triggers an I/O hang:
INFO: task qemu-nbd:11445 blocked for more than 368 seconds.
      Not tainted 5.18.0-rc3-next-20220422-00003-g2176915513ca #884
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:qemu-nbd state:D stack: 0 pid:11445 ppid: 1 flags:0x00000000
Call Trace:
 <TASK>
 __schedule+0x480/0x1050
 ? _raw_spin_lock_irqsave+0x3e/0xb0
 schedule+0x9c/0x1b0
 blk_mq_freeze_queue_wait+0x9d/0xf0
 ? ipi_rseq+0x70/0x70
 blk_mq_freeze_queue+0x2b/0x40
 nbd_add_socket+0x6b/0x270 [nbd]
 nbd_ioctl+0x383/0x510 [nbd]
 blkdev_ioctl+0x18e/0x3e0
 __x64_sys_ioctl+0xac/0x120
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fd8ff706577
RSP: 002b:00007fd8fcdfebf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000040000000 RCX: 00007fd8ff706577
RDX: 000000000000000d RSI: 000000000000ab00 RDI: 000000000000000f
RBP: 000000000000000f R08: 000000000000fbe8 R09: 000055fe497c62b0
R10: 00000002aff20000 R11: 0000000000000246 R12: 000000000000006d
R13: 0000000000000000 R14: 00007ffe82dc5e70 R15: 00007fd8fcdff9c0
"qemu-nbd -d" calls the ioctl 'NBD_DISCONNECT' first; however, the following message was found:
block nbd0: Send disconnect failed -32
This indicates that something is wrong with the server. "qemu-nbd -d" then calls the ioctl 'NBD_CLEAR_SOCK', but that ioctl can no longer clear requests after commit 2516ab1543fd ("nbd: only clear the queue on device teardown"). Meanwhile, the requests cannot complete through the timeout path either, because nbd_xmit_timeout() always returns 'BLK_EH_RESET_TIMER', so such requests are never completed in this situation.
Now that the flag 'NBD_CMD_INFLIGHT' ensures that requests won't complete multiple times, switch back to calling nbd_clear_sock() in nbd_clear_sock_ioctl(), so that inflight requests can be cleared.
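
The interaction described above can be illustrated with a small user-space model. This is only a sketch and not the kernel code: every name in it (model_cmd, model_complete, model_timeout, model_clear_sock_ioctl, model_inflight, and the clear_queue switch) is hypothetical, the model is single-threaded, and a plain boolean stands in for the 'NBD_CMD_INFLIGHT' flag. It assumes, as the text says, that the timeout handler always re-arms the timer, and it shows why an inflight flag makes it safe for the clear-socket path to complete requests again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the block layer's timeout return values. */
enum eh_result { EH_DONE, EH_RESET_TIMER };

/* Hypothetical model of one request; not the kernel's struct nbd_cmd. */
struct model_cmd {
    bool inflight;    /* stands in for the NBD_CMD_INFLIGHT flag */
    int completions;  /* number of times this request was completed */
};

/* Complete a request only if it is still in flight (test-and-clear),
 * so a second caller can never complete it again. */
static bool model_complete(struct model_cmd *cmd)
{
    if (!cmd->inflight)
        return false;
    cmd->inflight = false;
    cmd->completions++;
    assert(cmd->completions == 1); /* never completed twice */
    return true;
}

/* Timeout path in the situation described: the timer is always re-armed,
 * so the request is never completed from here. */
static enum eh_result model_timeout(struct model_cmd *cmd)
{
    (void)cmd;
    return EH_RESET_TIMER;
}

/* Clear-socket ioctl path. clear_queue == false models the behaviour after
 * commit 2516ab1543fd; clear_queue == true models the restored behaviour.
 * Returns how many requests this call completed. */
static size_t model_clear_sock_ioctl(struct model_cmd *cmds, size_t n,
                                     bool clear_queue)
{
    size_t completed = 0;

    if (!clear_queue)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (model_complete(&cmds[i]))
            completed++;
    return completed;
}

/* Count requests still in flight; a queue freeze has to wait until this
 * reaches zero. */
static size_t model_inflight(const struct model_cmd *cmds, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (cmds[i].inflight)
            count++;
    return count;
}
```

In this model, calling model_clear_sock_ioctl() with clear_queue set to false leaves every request in flight no matter how often model_timeout() fires, which corresponds to the task blocked in blk_mq_freeze_queue_wait() in the trace. With clear_queue set to true the requests are completed once, and a repeated call completes nothing further.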