In the Linux kernel, the following vulnerability has been resolved:
memcg: fix soft lockup in the OOM process
A soft lockup issue was found in the product with about 56,000 tasks were in the OOM cgroup, it was traversing them when the soft lockup was triggered.
watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066] CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G Hardware name: Huawei Cloud OpenStack Nova, BIOS RIP: 0010:consoleunlock+0x343/0x540 RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIGRAX: ffffffffffffff13 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247 RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040 R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0 R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: vprintkemit+0x193/0x280 printk+0x52/0x6e dumptask+0x114/0x130 memcgroupscantasks+0x76/0x100 dumpheader+0x1fe/0x210 oomkillprocess+0xd1/0x100 outofmemory+0x125/0x570 memcgroupoutofmemory+0xb5/0xd0 trycharge+0x720/0x770 memcgrouptrycharge+0x86/0x180 memcgrouptrychargedelay+0x1c/0x40 doanonymouspage+0xb5/0x390 handlemmfault+0xc4/0x1f0
This is because thousands of processes are in the OOM cgroup, it takes a long time to traverse all of them. As a result, this lead to soft lockup in the OOM process.
To fix this issue, call 'condresched' in the 'memcgroupscantasks' function per 1000 iterations. For global OOM, call 'touchsoftlockupwatchdog' per 1000 iterations to avoid this issue.