In the Linux kernel, the following vulnerability has been resolved:
bonding: change ipsec_lock from spin lock to mutex
In the cited commit, bond->ipsec_lock is added to protect ipsec_list, hence xdo_dev_state_add and xdo_dev_state_delete are called inside this lock. As ipsec_lock is a spin lock and such xfrmdev ops may sleep, "scheduling while atomic" will be triggered when changing the bond's active slave.
[ 101.055189] BUG: scheduling while atomic: bash/902/0x00000200
[ 101.055726] Modules linked in:
[ 101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1
[ 101.058760] Hardware name:
[ 101.059434] Call Trace:
[ 101.059436] <TASK>
[ 101.060873] dump_stack_lvl+0x51/0x60
[ 101.061275] __schedule_bug+0x4e/0x60
[ 101.061682] __schedule+0x612/0x7c0
[ 101.062078] ? __mod_timer+0x25c/0x370
[ 101.062486] schedule+0x25/0xd0
[ 101.062845] schedule_timeout+0x77/0xf0
[ 101.063265] ? asm_common_interrupt+0x22/0x40
[ 101.063724] ? __bpf_trace_itimer_state+0x10/0x10
[ 101.064215] __wait_for_common+0x87/0x190
[ 101.064648] ? usleep_range_state+0x90/0x90
[ 101.065091] cmd_exec+0x437/0xb20 [mlx5_core]
[ 101.065569] mlx5_cmd_do+0x1e/0x40 [mlx5_core]
[ 101.066051] mlx5_cmd_exec+0x18/0x30 [mlx5_core]
[ 101.066552] mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core]
[ 101.067163] ? bonding_sysfs_store_option+0x4d/0x80 [bonding]
[ 101.067738] ? kmalloc_trace+0x4d/0x350
[ 101.068156] mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core]
[ 101.068747] mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core]
[ 101.069312] bond_change_active_slave+0x392/0x900 [bonding]
[ 101.069868] bond_option_active_slave_set+0x1c2/0x240 [bonding]
[ 101.070454] __bond_opt_set+0xa6/0x430 [bonding]
[ 101.070935] __bond_opt_set_notify+0x2f/0x90 [bonding]
[ 101.071453] bond_opt_tryset_rtnl+0x72/0xb0 [bonding]
[ 101.071965] bonding_sysfs_store_option+0x4d/0x80 [bonding]
[ 101.072567] kernfs_fop_write_iter+0x10c/0x1a0
[ 101.073033] vfs_write+0x2d8/0x400
[ 101.073416] ? alloc_fd+0x48/0x180
[ 101.073798] ksys_write+0x5f/0xe0
[ 101.074175] do_syscall_64+0x52/0x110
[ 101.074576] entry_SYSCALL_64_after_hwframe+0x4b/0x53
bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called from bond_change_active_slave, which requires holding the RTNL lock, and bond_ipsec_add_sa and bond_ipsec_del_sa are the xfrm state xdo_dev_state_add and xdo_dev_state_delete APIs, which run in user context. All of these callers are allowed to sleep, so ipsec_lock does not have to be a spin lock. Change it to a mutex, which resolves the issue above.
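The locking pattern after the change can be illustrated with a small userspace analogy. This is only a sketch, not the bonding driver's code: pthread_mutex_t stands in for the kernel's struct mutex, nanosleep stands in for a driver op that waits on firmware, and every name (fake_bond, fake_sa, fake_dev_state_add, fake_add_sa_all, run_demo) is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
/* Userspace analogy only: pthread_mutex_t stands in for the kernel's
 * struct mutex. All names below are hypothetical, not bonding driver code. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

struct fake_sa {
    int offloaded;               /* set once the "device" accepted the state */
    struct fake_sa *next;
};

struct fake_bond {
    pthread_mutex_t ipsec_lock;  /* sleeping lock, like a kernel mutex */
    struct fake_sa *ipsec_list;  /* list protected by ipsec_lock */
};

/* Stand-in for a driver's xdo_dev_state_add callback: it blocks while
 * waiting for the "device", which is what a spin lock holder must not do. */
static int fake_dev_state_add(struct fake_sa *sa)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */

    nanosleep(&ts, NULL);
    sa->offloaded = 1;
    return 0;
}

/* Analogue of re-adding every SA when the active slave changes.
 * Holding a mutex here is fine even though the callback sleeps. */
static int fake_add_sa_all(struct fake_bond *bond)
{
    int added = 0;

    pthread_mutex_lock(&bond->ipsec_lock);
    for (struct fake_sa *sa = bond->ipsec_list; sa; sa = sa->next) {
        if (fake_dev_state_add(sa) == 0)
            added++;
    }
    pthread_mutex_unlock(&bond->ipsec_lock);
    return added;
}

/* Builds a two-entry list, walks it under the lock, returns the count. */
static int run_demo(void)
{
    struct fake_sa b = { 0, NULL };
    struct fake_sa a = { 0, &b };
    struct fake_bond bond = { .ipsec_list = &a };
    int added;

    pthread_mutex_init(&bond.ipsec_lock, NULL);
    added = fake_add_sa_all(&bond);
    pthread_mutex_destroy(&bond.ipsec_lock);

    assert(a.offloaded && b.offloaded);
    return added;
}
```

In the kernel the equivalent constraint is stricter than in userspace: a task holding a spin lock is in atomic context and must not call anything that can schedule, whereas a mutex holder may sleep, which is why the switch is safe only because every caller runs in process context.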