In the Linux kernel, the following vulnerability has been resolved:
nvme-rdma: unquiesce admin_q before destroy it
The kernel hangs while destroying admin_q when controller creation fails, as in the following call trace:
PID: 23644  TASK: ff2d52b40f439fc0  CPU: 2  COMMAND: "nvme"
 #0 [ff61d23de260fb78] __schedule at ffffffff8323bc15
 #1 [ff61d23de260fc08] schedule at ffffffff8323c014
 #2 [ff61d23de260fc28] blk_mq_freeze_queue_wait at ffffffff82a3dba1
 #3 [ff61d23de260fc78] blk_freeze_queue at ffffffff82a4113a
 #4 [ff61d23de260fc90] blk_cleanup_queue at ffffffff82a33006
 #5 [ff61d23de260fcb0] nvme_rdma_destroy_admin_queue at ffffffffc12686ce
 #6 [ff61d23de260fcc8] nvme_rdma_setup_ctrl at ffffffffc1268ced
 #7 [ff61d23de260fd28] nvme_rdma_create_ctrl at ffffffffc126919b
 #8 [ff61d23de260fd68] nvmf_dev_write at ffffffffc024f362
 #9 [ff61d23de260fe38] vfs_write at ffffffff827d5f25
    RIP: 00007fda7891d574  RSP: 00007ffe2ef06958  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 000055e8122a4d90  RCX: 00007fda7891d574
    RDX: 000000000000012b  RSI: 000055e8122a4d90  RDI: 0000000000000004
    RBP: 00007ffe2ef079c0   R8: 000000000000012b   R9: 000055e8122a4d90
    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000004
    R13: 000055e8122923c0  R14: 000000000000012b  R15: 00007fda78a54500
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
This happens because admin_q is quiesced before its requests are canceled but is never unquiesced before it is destroyed. As a result, the pending requests cannot be drained, and the task waits in blk_mq_freeze_queue_wait() forever. The fix reuses nvme_rdma_teardown_admin_queue() to resolve this issue and simplify the code.
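
The interaction between quiescing and draining can be illustrated with a small user-space toy model. This is only a sketch and not kernel code: every name in it (toy_queue, toy_quiesce, toy_unquiesce, toy_run_queue, toy_freeze_wait) is hypothetical, and the freeze wait is bounded by a poll count so the example terminates, whereas the wait described above has no such bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the kernel's blk-mq structures. */
struct toy_queue {
    int  pending;   /* requests queued but not yet completed */
    bool quiesced;  /* while true, nothing is dispatched */
};

static void toy_quiesce(struct toy_queue *q)   { q->quiesced = true; }
static void toy_unquiesce(struct toy_queue *q) { q->quiesced = false; }

/* Dispatch and complete all pending requests; returns how many completed.
 * A quiesced queue dispatches nothing, so its pending count never drops. */
static int toy_run_queue(struct toy_queue *q)
{
    if (q->quiesced)
        return 0;
    int done = q->pending;
    q->pending = 0;
    return done;
}

/* Bounded stand-in for a freeze wait: poll until the queue is drained.
 * Returns true if it drained within max_polls, false if it would hang. */
static bool toy_freeze_wait(struct toy_queue *q, int max_polls)
{
    assert(q != NULL);
    for (int i = 0; i < max_polls; i++) {
        toy_run_queue(q);
        if (q->pending == 0)
            return true;
    }
    return false;
}
```

In this model, calling toy_freeze_wait() on a queue that is still quiesced and has pending requests never succeeds, while calling toy_unquiesce() first lets the same wait finish on the first poll. That mirrors the ordering problem described above, under the simplifying assumption that draining depends only on dispatch being allowed.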