In the Linux kernel, the following vulnerability has been resolved:
mm/vmalloc: combine all TLB flush operations of KASAN shadow virtual address into one operation
When compiling the kernel source with 'make -j $(nproc)' on a running KASAN-enabled kernel on a 256-core machine, the following soft lockup is shown:
watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [kworker/28:1:1760]
CPU: 28 PID: 1760 Comm: kworker/28:1 Kdump: loaded Not tainted 6.10.0-rc5 #95
Workqueue: events drain_vmap_area_work
RIP: 0010:smp_call_function_many_cond+0x1d8/0xbb0
Code: 38 c8 7c 08 84 c9 0f 85 49 08 00 00 8b 45 08 a8 01 74 2e 48 89 f1 49 89 f7 48 c1 e9 03 41 83 e7 07 4c 01 e9 41 83 c7 03 f3 90 <0f> b6 01 41 38 c7 7c 08 84 c0 0f 85 d4 06 00 00 8b 45 08 a8 01 75
RSP: 0018:ffffc9000cb3fb60 EFLAGS: 00000202
RAX: 0000000000000011 RBX: ffff8883bc4469c0 RCX: ffffed10776e9949
RDX: 0000000000000002 RSI: ffff8883bb74ca48 RDI: ffffffff8434dc50
RBP: ffff8883bb74ca40 R08: ffff888103585dc0 R09: ffff8884533a1800
R10: 0000000000000004 R11: ffffffffffffffff R12: ffffed1077888d39
R13: dffffc0000000000 R14: ffffed1077888d38 R15: 0000000000000003
FS:  0000000000000000(0000) GS:ffff8883bc400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005577b5c8d158 CR3: 0000000004850000 CR4: 0000000000350ef0
Call Trace:
 <IRQ>
 ? watchdog_timer_fn+0x2cd/0x390
 ? __pfx_watchdog_timer_fn+0x10/0x10
 ? __hrtimer_run_queues+0x300/0x6d0
 ? sched_clock_cpu+0x69/0x4e0
 ? __pfx___hrtimer_run_queues+0x10/0x10
 ? srso_return_thunk+0x5/0x5f
 ? ktime_get_update_offsets_now+0x7f/0x2a0
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 ? hrtimer_interrupt+0x2ca/0x760
 ? __sysvec_apic_timer_interrupt+0x8c/0x2b0
 ? sysvec_apic_timer_interrupt+0x6a/0x90
 </IRQ>
 <TASK>
 ? asm_sysvec_apic_timer_interrupt+0x16/0x20
 ? smp_call_function_many_cond+0x1d8/0xbb0
 ? __pfx_do_kernel_range_flush+0x10/0x10
 on_each_cpu_cond_mask+0x20/0x40
 flush_tlb_kernel_range+0x19b/0x250
 ? srso_return_thunk+0x5/0x5f
 ? kasan_release_vmalloc+0xa7/0xc0
 purge_vmap_node+0x357/0x820
 ? __pfx_purge_vmap_node+0x10/0x10
 __purge_vmap_area_lazy+0x5b8/0xa10
 drain_vmap_area_work+0x21/0x30
 process_one_work+0x661/0x10b0
 worker_thread+0x844/0x10e0
 ? srso_return_thunk+0x5/0x5f
 ? __kthread_parkme+0x82/0x140
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x2a5/0x370
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x30/0x70
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
Debugging Analysis:
The following ftrace log shows that the locked-up CPU spends too much time iterating vmap_nodes and flushing the TLB when purging vm_area structures. (Some info is trimmed.)
kworker: funcgraph_entry:              |  drain_vmap_area_work() {
kworker: funcgraph_entry:              |    mutex_lock() {
kworker: funcgraph_entry:   1.092 us   |      __cond_resched();
kworker: funcgraph_exit:    3.306 us   |    }
... ...
kworker: funcgraph_entry:              |    flush_tlb_kernel_range() {
... ...
kworker: funcgraph_exit:  # 7533.649 us |    }
... ...
kworker: funcgraph_entry:   2.344 us   |    mutex_unlock();
kworker: funcgraph_exit:  $ 23871554 us |  }
The drain_vmap_area_work() function spends over 23 seconds.
There are 2805 flush_tlb_kernel_range() calls in the ftrace log.
Extending the soft lockup time can work around the issue (For example, # echo ---truncated---