In the Linux kernel, the following vulnerability has been resolved:
dpll: fix possible deadlock during netlink dump operation
Recently, I've been hitting the following deadlock warning during a dpll pin dump:
[52804.637962] ======================================================
[52804.638536] WARNING: possible circular locking dependency detected
[52804.639111] 6.8.0-rc2jiri+ #1 Not tainted
[52804.639529] ------------------------------------------------------
[52804.640104] python3/2984 is trying to acquire lock:
[52804.640581] ffff88810e642678 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}, at: netlink_dump+0xb3/0x780
[52804.641417] but task is already holding lock:
[52804.642010] ffffffff83bde4c8 (dpll_lock){+.+.}-{3:3}, at: dpll_lock_dumpit+0x13/0x20
[52804.642747] which lock already depends on the new lock.

[52804.643551] the existing dependency chain (in reverse order) is:
[52804.644259] -> #1 (dpll_lock){+.+.}-{3:3}:
[52804.644836]        lock_acquire+0x174/0x3e0
[52804.645271]        __mutex_lock+0x119/0x1150
[52804.645723]        dpll_lock_dumpit+0x13/0x20
[52804.646169]        genl_start+0x266/0x320
[52804.646578]        __netlink_dump_start+0x321/0x450
[52804.647056]        genl_family_rcv_msg_dumpit+0x155/0x1e0
[52804.647575]        genl_rcv_msg+0x1ed/0x3b0
[52804.648001]        netlink_rcv_skb+0xdc/0x210
[52804.648440]        genl_rcv+0x24/0x40
[52804.648831]        netlink_unicast+0x2f1/0x490
[52804.649290]        netlink_sendmsg+0x36d/0x660
[52804.649742]        __sock_sendmsg+0x73/0xc0
[52804.650165]        __sys_sendto+0x184/0x210
[52804.650597]        __x64_sys_sendto+0x72/0x80
[52804.651045]        do_syscall_64+0x6f/0x140
[52804.651474]        entry_SYSCALL_64_after_hwframe+0x46/0x4e
[52804.652001] -> #0 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}:
[52804.652650]        check_prev_add+0x1ae/0x1280
[52804.653107]        __lock_acquire+0x1ed3/0x29a0
[52804.653559]        lock_acquire+0x174/0x3e0
[52804.653984]        __mutex_lock+0x119/0x1150
[52804.654423]        netlink_dump+0xb3/0x780
[52804.654845]        __netlink_dump_start+0x389/0x450
[52804.655321]        genl_family_rcv_msg_dumpit+0x155/0x1e0
[52804.655842]        genl_rcv_msg+0x1ed/0x3b0
[52804.656272]        netlink_rcv_skb+0xdc/0x210
[52804.656721]        genl_rcv+0x24/0x40
[52804.657119]        netlink_unicast+0x2f1/0x490
[52804.657570]        netlink_sendmsg+0x36d/0x660
[52804.658022]        __sock_sendmsg+0x73/0xc0
[52804.658450]        __sys_sendto+0x184/0x210
[52804.658877]        __x64_sys_sendto+0x72/0x80
[52804.659322]        do_syscall_64+0x6f/0x140
[52804.659752]        entry_SYSCALL_64_after_hwframe+0x46/0x4e
[52804.660281] other info that might help us debug this:

[52804.661077]  Possible unsafe locking scenario:

[52804.661671]        CPU0                    CPU1
[52804.662129]        ----                    ----
[52804.662577]   lock(dpll_lock);
[52804.662924]                                lock(nlk_cb_mutex-GENERIC);
[52804.663538]                                lock(dpll_lock);
[52804.664073]   lock(nlk_cb_mutex-GENERIC);
[52804.664490]
                *** DEADLOCK ***
The issue is as follows: __netlink_dump_start() calls control->start(cb) with nlk->cb_mutex held. In control->start(cb), dpll_lock is taken. Then nlk->cb_mutex is released and taken again in netlink_dump() while dpll_lock is still held. This leads to an ABBA deadlock when another CPU races with the same operation: CPU0 holds dpll_lock and waits for nlk->cb_mutex in netlink_dump(), while CPU1 holds nlk->cb_mutex and waits for dpll_lock in control->start(cb).
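The lock-order inversion can be illustrated with a small user-space model. The sketch below is a hypothetical and heavily simplified imitation of what lockdep does: it records an edge "A before B" whenever lock B is taken while lock A is held, and reports an ABBA cycle once both directions have been observed. The lock IDs CB_MUTEX and DPLL_LOCK and every function name here are illustrative assumptions, not kernel APIs; real lockdep tracks lock classes and full dependency chains. Where exactly dpll_lock was released before the fix is also an assumption of the model and does not affect the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical IDs standing in for nlk->cb_mutex and dpll_lock. */
enum lock_id { CB_MUTEX, DPLL_LOCK, NLOCKS };

/* edge[a][b] becomes true once lock b is taken while lock a is held. */
static bool edge[NLOCKS][NLOCKS];
static bool held[NLOCKS];

static void order_reset(void)
{
	memset(edge, 0, sizeof(edge));
	memset(held, 0, sizeof(held));
}

static void order_lock(enum lock_id l)
{
	assert(!held[l]);	/* the model does not allow recursive locking */
	for (int h = 0; h < NLOCKS; h++)
		if (held[h])
			edge[h][l] = true;	/* record "h is taken before l" */
	held[l] = true;
}

static void order_unlock(enum lock_id l)
{
	assert(held[l]);
	held[l] = false;
}

/* With only two locks, a cycle means both orders were observed (ABBA). */
static bool order_has_abba(void)
{
	return edge[CB_MUTEX][DPLL_LOCK] && edge[DPLL_LOCK][CB_MUTEX];
}

/* Replays the pre-fix sequence described above within a single task. */
static bool replay_buggy_dump(void)
{
	order_reset();
	order_lock(CB_MUTEX);	/* __netlink_dump_start() takes cb_mutex */
	order_lock(DPLL_LOCK);	/* control->start(cb) takes dpll_lock */
	order_unlock(CB_MUTEX);	/* cb_mutex dropped, dpll_lock still held */
	order_lock(CB_MUTEX);	/* netlink_dump() takes cb_mutex again */
	order_unlock(DPLL_LOCK);	/* model: dpll_lock released at dump end */
	order_unlock(CB_MUTEX);
	return order_has_abba();
}
```

Calling replay_buggy_dump() returns true, because a single pass records both cb_mutex -> dpll_lock and dpll_lock -> cb_mutex. No actual race is needed to expose the inversion, which is why lockdep can warn before a real deadlock ever occurs.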
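Fix this by moving the acquisition of dpll_lock into the dumpit() callback. dumpit() runs from netlink_dump() with nlk->cb_mutex already held, so the locks are always taken in the same order: nlk->cb_mutex first, then dpll_lock.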
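The fixed lock ordering can be sketched with POSIX threads. In this hypothetical user-space model, two pthread mutexes stand in for nlk->cb_mutex and dpll_lock, and model_start(), model_dumpit(), model_dump() and run_fixed_dump_race() are illustrative names only. Because dpll_lock is taken only inside model_dumpit(), which always runs with cb_mutex held, every thread acquires the locks in the same order (cb_mutex, then dpll_lock), so no ABBA cycle can form. Using a single shared cb_mutex is a simplification of the kernel's per-socket mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-ins for nlk->cb_mutex and dpll_lock (hypothetical model). */
static pthread_mutex_t cb_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dpll_lock = PTHREAD_MUTEX_INITIALIZER;
static long dumped;	/* dumpit() passes completed, guarded by dpll_lock */

/* After the fix, start() no longer takes dpll_lock. */
static void model_start(void)
{
}

/* dumpit() takes and drops dpll_lock while the caller holds cb_mutex. */
static void model_dumpit(void)
{
	pthread_mutex_lock(&dpll_lock);
	dumped++;	/* the real callback would walk dpll objects here */
	pthread_mutex_unlock(&dpll_lock);
}

/* One dump: start() under cb_mutex, then each dumpit() pass under it. */
static void model_dump(int passes)
{
	pthread_mutex_lock(&cb_mutex);	/* __netlink_dump_start() */
	model_start();
	pthread_mutex_unlock(&cb_mutex);

	for (int i = 0; i < passes; i++) {
		pthread_mutex_lock(&cb_mutex);	/* netlink_dump() */
		model_dumpit();
		pthread_mutex_unlock(&cb_mutex);
	}
}

static void *dumper(void *arg)
{
	model_dump(*(const int *)arg);
	return NULL;
}

/* Runs up to 8 concurrent dumps; returns total dumpit() passes done. */
static long run_fixed_dump_race(int nthreads, int passes)
{
	pthread_t tid[8];
	int created = 0;

	assert(nthreads > 0 && nthreads <= 8);
	dumped = 0;
	while (created < nthreads &&
	       pthread_create(&tid[created], NULL, dumper, &passes) == 0)
		created++;
	for (int i = 0; i < created; i++)
		pthread_join(tid[i], NULL);	/* passes stays valid until here */
	return dumped;
}
```

For example, run_fixed_dump_race(4, 1000) runs four concurrent dumps of 1000 passes each and returns 4000 (build with -pthread on older toolchains). A run that completes without hanging does not by itself prove correctness; the argument rests on the single, consistent acquisition order.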