The Linux Kernel, the operating system core itself.
Security Fix(es):
In the Linux kernel, the following vulnerability has been resolved:
i40e: Fix kernel crash during module removal
The driver incorrectly frees the client instance, and subsequent i40e module removal leads to a kernel crash.
Reproducer:
1. Do ethtool offline test followed immediately by another one
   host# ethtool -t eth0 offline; ethtool -t eth0 offline
2. Remove recursively irdma module that also removes i40e module
   host# modprobe -r irdma
Result:
[ 8675.035651] i40e 0000:3d:00.0 eno1: offline testing starting
[ 8675.193774] i40e 0000:3d:00.0 eno1: testing finished
[ 8675.201316] i40e 0000:3d:00.0 eno1: offline testing starting
[ 8675.358921] i40e 0000:3d:00.0 eno1: testing finished
[ 8675.496921] i40e 0000:3d:00.0: IRDMA hardware initialization FAILED init_state=2 status=-110
[ 8686.188955] i40e 0000:3d:00.1: i40e_ptp_stop: removed PHC on eno2
[ 8686.943890] i40e 0000:3d:00.1: Deleted LAN device PF1 bus=0x3d dev=0x00 func=0x01
[ 8686.952669] i40e 0000:3d:00.0: i40e_ptp_stop: removed PHC on eno1
[ 8687.761787] BUG: kernel NULL pointer dereference, address: 0000000000000030
[ 8687.768755] #PF: supervisor read access in kernel mode
[ 8687.773895] #PF: error_code(0x0000) - not-present page
[ 8687.779034] PGD 0 P4D 0
[ 8687.781575] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 8687.785935] CPU: 51 PID: 172891 Comm: rmmod Kdump: loaded Tainted: G W I 5.19.0+ #2
[ 8687.794800] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0X.02.0001.051420190324 05/14/2019
[ 8687.805222] RIP: 0010:i40e_lan_del_device+0x13/0xb0 [i40e]
[ 8687.810719] Code: d4 84 c0 0f 84 b8 25 01 00 e9 9c 25 01 00 41 bc f4 ff ff ff eb 91 90 0f 1f 44 00 00 41 54 55 53 48 8b 87 58 08 00 00 48 89 fb <48> 8b 68 30 48 89 ef e8 21 8a 0f d5 48 89 ef e8 a9 78 0f d5 48 8b
[ 8687.829462] RSP: 0018:ffffa604072efce0 EFLAGS: 00010202
[ 8687.834689] RAX: 0000000000000000 RBX: ffff8f43833b2000 RCX: 0000000000000000
[ 8687.841821] RDX: 0000000000000000 RSI: ffff8f4b0545b298 RDI: ffff8f43833b2000
[ 8687.848955] RBP: ffff8f43833b2000 R08: 0000000000000001 R09: 0000000000000000
[ 8687.856086] R10: 0000000000000000 R11: 000ffffffffff000 R12: ffff8f43833b2ef0
[ 8687.863218] R13: ffff8f43833b2ef0 R14: ffff915103966000 R15: ffff8f43833b2008
[ 8687.870342] FS: 00007f79501c3740(0000) GS:ffff8f4adffc0000(0000) knlGS:0000000000000000
[ 8687.878427] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8687.884174] CR2: 0000000000000030 CR3: 000000014276e004 CR4: 00000000007706e0
[ 8687.891306] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8687.898441] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8687.905572] PKRU: 55555554
[ 8687.908286] Call Trace:
[ 8687.910737] <TASK>
[ 8687.912843] i40e_remove+0x2c0/0x330 [i40e]
[ 8687.917040] pci_device_remove+0x33/0xa0
[ 8687.920962] device_release_driver_internal+0x1aa/0x230
[ 8687.926188] driver_detach+0x44/0x90
[ 8687.929770] bus_remove_driver+0x55/0xe0
[ 8687.933693] pci_unregister_driver+0x2a/0xb0
[ 8687.937967] i40e_exit_module+0xc/0xf48 [i40e]
Two offline tests cause an IRDMA driver failure (ETIMEDOUT). This failure is reported back to i40e_client_subtask(), which calls i40e_client_del_instance() to free the client instance referenced by pf->cinst and sets this pointer to NULL. During module removal, i40e_remove() calls i40e_lan_del_device(), which dereferences pf->cinst, now NULL, and the kernel crashes. Do not remove the client instance when the client open callback fails; just clear the __I40E_CLIENT_INSTANCE_OPENED bit. The driver also needs to handle this situation (netdev is up and the client is NOT opened) in i40e_notify_client_of_netdev_close() and call the client close callback only when __I40E_CLIENT_INSTANCE_OPENED is set.(CVE-2022-48688)
In the Linux kernel, the following vulnerability has been resolved:
tracing/osnoise: Do not unregister events twice
Nicolas reported that using:
# trace-cmd record -e all -M 10 -p osnoise --poll
Resulted in the following kernel warning:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1217 at kernel/tracepoint.c:404 tracepoint_probe_unregister+0x280/0x370
[...]
CPU: 0 PID: 1217 Comm: trace-cmd Not tainted 5.17.0-rc6-next-20220307-nico+ #19
RIP: 0010:tracepoint_probe_unregister+0x280/0x370
[...]
CR2: 00007ff919b29497 CR3: 0000000109da4005 CR4: 0000000000170ef0
Call Trace:
<TASK>
osnoise_workload_stop+0x36/0x90
tracing_set_tracer+0x108/0x260
tracing_set_trace_write+0x94/0xd0
? __check_object_size.part.0+0x10a/0x150
? selinux_file_permission+0x104/0x150
vfs_write+0xb5/0x290
ksys_write+0x5f/0xe0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7ff919a18127
[...]
---[ end trace 0000000000000000 ]---
The warning complains about an attempt to unregister an unregistered tracepoint.
This happens with trace-cmd because it first stops tracing and then switches the tracer to nop, which is equivalent to:
# cd /sys/kernel/tracing/
# echo osnoise > current_tracer
# echo 0 > tracing_on
# echo nop > current_tracer
The osnoise tracer stops the workload when no trace instance is actually collecting data. This can be caused either by disabling tracing or by disabling the tracer itself.
To avoid unregistering events twice, use the existing trace_osnoise_callback_enabled variable to check if the events (and the workload) are actually active before trying to deactivate them.(CVE-2022-48848)
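The guard described above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: the Workload class, its fields and its method names are hypothetical stand-ins, and the "warning" is a simple counter.

```python
class Workload:
    """Toy stand-in for the osnoise workload and its tracepoint hooks."""

    def __init__(self):
        self.callback_enabled = False  # plays the role of the "enabled" flag
        self.registered = False        # whether the events are registered
        self.warnings = 0              # counts "unregister while unregistered"

    def start(self):
        if self.callback_enabled:
            return
        self.registered = True
        self.callback_enabled = True

    def _unregister_events(self):
        # Unregistering events that are not registered is the condition
        # that the warning in the advisory complains about.
        if not self.registered:
            self.warnings += 1
            return
        self.registered = False

    def stop_unguarded(self):
        # Old behaviour in this model: always try to unregister.
        self._unregister_events()
        self.callback_enabled = False

    def stop_guarded(self):
        # Fixed behaviour in this model: only stop what is actually active.
        if not self.callback_enabled:
            return
        self.callback_enabled = False
        self._unregister_events()
```

Calling stop_unguarded() twice in a row (tracing disabled, then tracer switched to nop) records one warning in this model, while stop_guarded() called twice records none.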
In the Linux kernel, the following vulnerability has been resolved:
USB: gadgetfs: Fix race between mounting and unmounting
The syzbot fuzzer and Gerald Lee have identified a use-after-free bug in the gadgetfs driver, involving processes concurrently mounting and unmounting the gadgetfs filesystem. In particular, gadgetfs_fill_super() can race with gadgetfs_kill_sb(), causing the latter to deallocate the_device while the former is using it. The output from KASAN says, in part:
BUG: KASAN: use-after-free in instrument_atomic_read_write include/linux/instrumented.h:102 [inline]
BUG: KASAN: use-after-free in atomic_fetch_sub_release include/linux/atomic/atomic-instrumented.h:176 [inline]
BUG: KASAN: use-after-free in __refcount_sub_and_test include/linux/refcount.h:272 [inline]
BUG: KASAN: use-after-free in __refcount_dec_and_test include/linux/refcount.h:315 [inline]
BUG: KASAN: use-after-free in refcount_dec_and_test include/linux/refcount.h:333 [inline]
BUG: KASAN: use-after-free in put_dev drivers/usb/gadget/legacy/inode.c:159 [inline]
BUG: KASAN: use-after-free in gadgetfs_kill_sb+0x33/0x100 drivers/usb/gadget/legacy/inode.c:2086
Write of size 4 at addr ffff8880276d7840 by task syz-executor126/18689
CPU: 0 PID: 18689 Comm: syz-executor126 Not tainted 6.1.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
<TASK>
...
atomic_fetch_sub_release include/linux/atomic/atomic-instrumented.h:176 [inline]
__refcount_sub_and_test include/linux/refcount.h:272 [inline]
__refcount_dec_and_test include/linux/refcount.h:315 [inline]
refcount_dec_and_test include/linux/refcount.h:333 [inline]
put_dev drivers/usb/gadget/legacy/inode.c:159 [inline]
gadgetfs_kill_sb+0x33/0x100 drivers/usb/gadget/legacy/inode.c:2086
deactivate_locked_super+0xa7/0xf0 fs/super.c:332
vfs_get_super fs/super.c:1190 [inline]
get_tree_single+0xd0/0x160 fs/super.c:1207
vfs_get_tree+0x88/0x270 fs/super.c:1531
vfs_fsconfig_locked fs/fsopen.c:232 [inline]
The simplest solution is to ensure that gadgetfs_fill_super() and gadgetfs_kill_sb() are serialized by making them both acquire a new mutex.(CVE-2022-48869)
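The serialization idea can be sketched in userspace as follows. This is a hedged model under stated assumptions, not the driver: GadgetFs, its dictionary-based "device" and the method bodies are hypothetical, and a Python lock stands in for the new mutex.

```python
import threading


class GadgetFs:
    """Toy model: one shared device, set up on mount and dropped on unmount."""

    def __init__(self):
        self.lock = threading.Lock()  # stands in for the new mutex
        self.the_device = None

    def fill_super(self):
        # Mount path: allocate the device and keep using it during setup.
        with self.lock:
            if self.the_device is not None:
                return False          # already mounted
            self.the_device = {"refs": 1}
            self.the_device["mounted"] = True
            return True

    def kill_sb(self):
        # Unmount path: drop the device. Because the same lock is held,
        # this cannot run in the middle of fill_super().
        with self.lock:
            if self.the_device is not None:
                self.the_device["refs"] -= 1
                self.the_device = None
```

Both paths take the same lock for their whole body, so the unmount side never observes or frees a device that the mount side is still initializing.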
In the Linux kernel, the following vulnerability has been resolved:
efi: fix NULL-deref in init error path
In cases where runtime services are not supported or have been disabled, the runtime services workqueue will never have been allocated.
To avoid dereferencing a NULL pointer, do not try to destroy the workqueue unconditionally in the unlikely event that EFI initialisation fails.(CVE-2022-48879)
In the Linux kernel, the following vulnerability has been resolved:
drm/i915/gt: Cleanup partial engine discovery failures
If we abort driver initialisation in the middle of gt/engine discovery, some engines will be fully setup and some not. Those incompletely setup engines only have 'engine->release == NULL' and so will leak any of the common objects allocated.
v2: - Drop the destroy_pinned_context() helper for now. It's not really worth it with just a single callsite at the moment. (Janusz)(CVE-2022-48893)
In the Linux kernel, the following vulnerability has been resolved:
media: vivid: fix compose size exceed boundary
syzkaller found a bug:
BUG: unable to handle page fault for address: ffffc9000a3b1000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 100000067 P4D 100000067 PUD 10015f067 PMD 1121ca067 PTE 0
Oops: 0002 [#1] PREEMPT SMP
CPU: 0 PID: 23489 Comm: vivid-000-vid-c Not tainted 6.1.0-rc1+ #512
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:memcpy_erms+0x6/0x10
[...]
Call Trace:
<TASK>
? tpg_fill_plane_buffer+0x856/0x15b0
vivid_fillbuff+0x8ac/0x1110
vivid_thread_vid_cap_tick+0x361/0xc90
vivid_thread_vid_cap+0x21a/0x3a0
kthread+0x143/0x180
ret_from_fork+0x1f/0x30
</TASK>
This happens because the boundary is not checked after compose->height is adjusted in the V4L2_SEL_TGT_CROP case. Add v4l2_rect_map_inside() to fix the problem for this case.(CVE-2022-48945)
A possible memory leak flaw was found in the Linux kernel's cpu_entry_area mapping of x86 CPU data to memory, in which a user can guess the location of exception stack(s) or other important data. A local user could use this flaw to gain access to important data at a predictable location in memory.(CVE-2023-0597)
From the upstream fix below: The watchdog_timer can schedule tx_timeout_task and watchdog_work can also arm watchdog_timer [..] Although del_timer_sync() and cancel_work_sync() are called in cyttsp4_remove(), the timer and workqueue could still be rearmed. As a result, use-after-free bugs could occur.
Upstream commit: https://github.com/torvalds/linux/commit/dbe836576f12743a7d2d170ad4ad4fd324c4d47a(CVE-2023-4134)
In the Linux kernel, the following vulnerability has been resolved:
efivarfs: force RO when remounting if SetVariable is not supported
If SetVariable at runtime is not supported by the firmware, we never assign a callback for that function. At the same time, we mount efivarfs as RO so no one can call it. However, we never check the permission flags when someone remounts the filesystem as RW. As a result this leads to a crash looking like this:
$ mount -o remount,rw /sys/firmware/efi/efivars
$ efi-updatevar -f PK.auth PK
[ 303.279166] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 303.280482] Mem abort info:
[ 303.280854] ESR = 0x0000000086000004
[ 303.281338] EC = 0x21: IABT (current EL), IL = 32 bits
[ 303.282016] SET = 0, FnV = 0
[ 303.282414] EA = 0, S1PTW = 0
[ 303.282821] FSC = 0x04: level 0 translation fault
[ 303.283771] user pgtable: 4k pages, 48-bit VAs, pgdp=000000004258c000
[ 303.284913] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[ 303.286076] Internal error: Oops: 0000000086000004 [#1] PREEMPT SMP
[ 303.286936] Modules linked in: qrtr tpm_tis tpm_tis_core crct10dif_ce arm_smccc_trng rng_core drm fuse ip_tables x_tables ipv6
[ 303.288586] CPU: 1 PID: 755 Comm: efi-updatevar Not tainted 6.3.0-rc1-00108-gc7d0c4695c68 #1
[ 303.289748] Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2023.04-00627-g88336918701d 04/01/2023
[ 303.291150] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 303.292123] pc : 0x0
[ 303.292443] lr : efivar_set_variable_locked+0x74/0xec
[ 303.293156] sp : ffff800008673c10
[ 303.293619] x29: ffff800008673c10 x28: ffff0000037e8000 x27: 0000000000000000
[ 303.294592] x26: 0000000000000800 x25: ffff000002467400 x24: 0000000000000027
[ 303.295572] x23: ffffd49ea9832000 x22: ffff0000020c9800 x21: ffff000002467000
[ 303.296566] x20: 0000000000000001 x19: 00000000000007fc x18: 0000000000000000
[ 303.297531] x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaaac807ab54
[ 303.298495] x14: ed37489f673633c0 x13: 71c45c606de13f80 x12: 47464259e219acf4
[ 303.299453] x11: ffff000002af7b01 x10: 0000000000000003 x9 : 0000000000000002
[ 303.300431] x8 : 0000000000000010 x7 : ffffd49ea8973230 x6 : 0000000000a85201
[ 303.301412] x5 : 0000000000000000 x4 : ffff0000020c9800 x3 : 00000000000007fc
[ 303.302370] x2 : 0000000000000027 x1 : ffff000002467400 x0 : ffff000002467000
[ 303.303341] Call trace:
[ 303.303679] 0x0
[ 303.303938] efivar_entry_set_get_size+0x98/0x16c
[ 303.304585] efivarfs_file_write+0xd0/0x1a4
[ 303.305148] vfs_write+0xc4/0x2e4
[ 303.305601] ksys_write+0x70/0x104
[ 303.306073] __arm64_sys_write+0x1c/0x28
[ 303.306622] invoke_syscall+0x48/0x114
[ 303.307156] el0_svc_common.constprop.0+0x44/0xec
[ 303.307803] do_el0_svc+0x38/0x98
[ 303.308268] el0_svc+0x2c/0x84
[ 303.308702] el0t_64_sync_handler+0xf4/0x120
[ 303.309293] el0t_64_sync+0x190/0x194
[ 303.309794] Code: ???????? ???????? ???????? ???????? (????????)
[ 303.310612] ---[ end trace 0000000000000000 ]---
Fix this by adding a .reconfigure() function to the fs operations which we can use to check the requested flags and deny anything that's not RO if the firmware doesn't implement SetVariable at runtime.(CVE-2023-52463)
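The remount check can be pictured with the following sketch. It is an assumption-laden model rather than the kernel implementation: the function name, the boolean parameter, the SB_RDONLY value used here and the choice of EINVAL as the error are all illustrative.

```python
import errno

SB_RDONLY = 0x1  # illustrative flag value for "mounted read-only"


def efivarfs_reconfigure(requested_flags, set_variable_supported):
    """Model of a reconfigure hook: return 0 on success, negative errno on refusal.

    A remount that asks for read-write is refused when the firmware does not
    implement SetVariable at runtime, so no write path can reach the missing
    callback.
    """
    wants_rw = not (requested_flags & SB_RDONLY)
    if wants_rw and not set_variable_supported:
        return -errno.EINVAL  # assumed error code for this sketch
    return 0
```

In this model a read-only remount always succeeds, and a read-write remount succeeds only when writes are supported.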
In the Linux kernel, the following vulnerability has been resolved:
sched/psi: Fix use-after-free in ep_remove_wait_queue()
If a non-root cgroup gets removed when there is a thread that registered a trigger and is polling on a pressure file within the cgroup, the polling waitqueue gets freed in the following path:
do_rmdir
  cgroup_rmdir
    kernfs_drain_open_files
      cgroup_file_release
        cgroup_pressure_release
          psi_trigger_destroy
However, the polling thread still has a reference to the pressure file and will access the freed waitqueue when the file is closed or upon exit:
fput
  ep_eventpoll_release
    ep_free
      ep_remove_wait_queue
        remove_wait_queue
This results in use-after-free as pasted below.
The fundamental problem here is that cgroup_file_release() (and consequently the waitqueue's lifetime) is not tied to the file's real lifetime. Using wake_up_pollfree() here might be less than ideal, but it is in line with the comment at commit 42288cb44c4b ("wait: add wake_up_pollfree()") since the waitqueue's lifetime is not tied to the file's and can be considered another special case. While this would be fixable by somehow tying cgroup_file_release() to the fput(), it would require sizable refactoring at the cgroups or higher layer, which might be more justifiable if we identify more cases like this.
BUG: KASAN: use-after-free in _raw_spin_lock_irqsave+0x60/0xc0
Write of size 4 at addr ffff88810e625328 by task a.out/4404
CPU: 19 PID: 4404 Comm: a.out Not tainted 6.2.0-rc6 #38
Hardware name: Amazon EC2 c5a.8xlarge/, BIOS 1.0 10/16/2017
Call Trace:
<TASK>
dump_stack_lvl+0x73/0xa0
print_report+0x16c/0x4e0
kasan_report+0xc3/0xf0
kasan_check_range+0x2d2/0x310
_raw_spin_lock_irqsave+0x60/0xc0
remove_wait_queue+0x1a/0xa0
ep_free+0x12c/0x170
ep_eventpoll_release+0x26/0x30
__fput+0x202/0x400
task_work_run+0x11d/0x170
do_exit+0x495/0x1130
do_group_exit+0x100/0x100
get_signal+0xd67/0xde0
arch_do_signal_or_restart+0x2a/0x2b0
exit_to_user_mode_prepare+0x94/0x100
syscall_exit_to_user_mode+0x20/0x40
do_syscall_64+0x52/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
</TASK>
Allocated by task 4404:
kasan_set_track+0x3d/0x60
__kasan_kmalloc+0x85/0x90
psi_trigger_create+0x113/0x3e0
pressure_write+0x146/0x2e0
cgroup_file_write+0x11c/0x250
kernfs_fop_write_iter+0x186/0x220
vfs_write+0x3d8/0x5c0
ksys_write+0x90/0x110
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 4407:
kasan_set_track+0x3d/0x60
kasan_save_free_info+0x27/0x40
____kasan_slab_free+0x11d/0x170
slab_free_freelist_hook+0x87/0x150
__kmem_cache_free+0xcb/0x180
psi_trigger_destroy+0x2e8/0x310
cgroup_file_release+0x4f/0xb0
kernfs_drain_open_files+0x165/0x1f0
kernfs_drain+0x162/0x1a0
__kernfs_remove+0x1fb/0x310
kernfs_remove_by_name_ns+0x95/0xe0
cgroup_addrm_files+0x67f/0x700
cgroup_destroy_locked+0x283/0x3c0
cgroup_rmdir+0x29/0x100
kernfs_iop_rmdir+0xd1/0x140
vfs_rmdir+0xfe/0x240
do_rmdir+0x13d/0x280
__x64_sys_rmdir+0x2c/0x30
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd(CVE-2023-52707)
In the Linux kernel, the following vulnerability has been resolved:
cifs: Fix use-after-free in rdata->read_into_pages()
When the network status is unstable, a use-after-free may occur when reading data from the server.
BUG: KASAN: use-after-free in readpages_fill_pages+0x14c/0x7e0
Call Trace:
<TASK>
dump_stack_lvl+0x38/0x4c
print_report+0x16f/0x4a6
kasan_report+0xb7/0x130
readpages_fill_pages+0x14c/0x7e0
cifs_readv_receive+0x46d/0xa40
cifs_demultiplex_thread+0x121c/0x1490
kthread+0x16b/0x1a0
ret_from_fork+0x2c/0x50
</TASK>
Allocated by task 2535:
kasan_save_stack+0x22/0x50
kasan_set_track+0x25/0x30
__kasan_kmalloc+0x82/0x90
cifs_readdata_direct_alloc+0x2c/0x110
cifs_readdata_alloc+0x2d/0x60
cifs_readahead+0x393/0xfe0
read_pages+0x12f/0x470
page_cache_ra_unbounded+0x1b1/0x240
filemap_get_pages+0x1c8/0x9a0
filemap_read+0x1c0/0x540
cifs_strict_readv+0x21b/0x240
vfs_read+0x395/0x4b0
ksys_read+0xb8/0x150
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
Freed by task 79:
kasan_save_stack+0x22/0x50
kasan_set_track+0x25/0x30
kasan_save_free_info+0x2e/0x50
____kasan_slab_free+0x10e/0x1a0
__kmem_cache_free+0x7a/0x1a0
cifs_readdata_release+0x49/0x60
process_one_work+0x46c/0x760
worker_thread+0x2a4/0x6f0
kthread+0x16b/0x1a0
ret_from_fork+0x2c/0x50
Last potentially related work creation:
kasan_save_stack+0x22/0x50
__kasan_record_aux_stack+0x95/0xb0
insert_work+0x2b/0x130
__queue_work+0x1fe/0x660
queue_work_on+0x4b/0x60
smb2_readv_callback+0x396/0x800
cifs_abort_connection+0x474/0x6a0
cifs_reconnect+0x5cb/0xa50
cifs_readv_from_socket.cold+0x22/0x6c
cifs_read_page_from_socket+0xc1/0x100
readpages_fill_pages.cold+0x2f/0x46
cifs_readv_receive+0x46d/0xa40
cifs_demultiplex_thread+0x121c/0x1490
kthread+0x16b/0x1a0
ret_from_fork+0x2c/0x50
The following function calls will cause a UAF of the rdata pointer.
readpages_fill_pages
 cifs_read_page_from_socket
  cifs_readv_from_socket
   cifs_reconnect
    __cifs_reconnect
     cifs_abort_connection
      mid->callback() --> smb2_readv_callback
       queue_work(&rdata->work)  # if the worker completes first,
                                 # the rdata is freed
          cifs_readv_complete
            kref_put
              cifs_readdata_release
                kfree(rdata)
 return rdata->...               # UAF in readpages_fill_pages()
Similarly, this problem also occurs in uncache_fill_pages().
Fix this by adjusting the order of the conditions evaluated in the return statement.(CVE-2023-52741)
In the Linux kernel, the following vulnerability has been resolved:
tty: n_gsm: require CAP_NET_ADMIN to attach N_GSM0710 ldisc
Any unprivileged user can attach N_GSM0710 ldisc, but it requires CAP_NET_ADMIN to create a GSM network anyway.
Require initial namespace CAP_NET_ADMIN to do that.(CVE-2023-52880)
In the Linux kernel, the following vulnerability has been resolved:
keys: Fix overwrite of key expiration on instantiation
The expiry time of a key is unconditionally overwritten during instantiation, which by default makes the key permanent. This causes a problem for DNS resolution, as the expiration set by user-space is overwritten to TIME64_MAX, disabling further DNS updates. Fix this by restoring the condition that key_set_expiry() is only called when the pre-parser sets a specific expiry.(CVE-2024-36031)
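The restored condition can be sketched as below. This is a simplified model, assuming TIME64_MAX means "no specific expiry was set by the pre-parser"; the Key class and both function names are hypothetical.

```python
TIME64_MAX = (1 << 63) - 1  # sentinel used here for "no specific expiry"


class Key:
    def __init__(self, expiry):
        self.expiry = expiry


def instantiate_unconditional(key, prep_expiry):
    # Problematic behaviour in this model: always overwrite the expiry.
    key.expiry = prep_expiry


def instantiate_conditional(key, prep_expiry):
    # Restored behaviour in this model: only overwrite when the pre-parser
    # supplied a specific expiry.
    if prep_expiry != TIME64_MAX:
        key.expiry = prep_expiry
```

With the unconditional variant, an expiry chosen earlier by user space is replaced by TIME64_MAX whenever the pre-parser leaves the default in place; the conditional variant keeps it.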
In the Linux kernel, the following vulnerability has been resolved:
ext4: fix uninitialized ratelimit_state->lock access in __ext4_fill_super()
In the following concurrency we will access the uninitialized rs->lock:
ext4_fill_super
 ext4_register_sysfs
  // sysfs registered msg_ratelimit_interval_ms
                             // Other processes modify rs->interval to
                             // non-zero via msg_ratelimit_interval_ms
 ext4_orphan_cleanup
  ext4_msg(sb, KERN_INFO, "Errors on filesystem, "
   __ext4_msg
    ___ratelimit(&(EXT4_SB(sb)->s_msg_ratelimit_state)
     if (!rs->interval)  // do nothing if interval is 0
      return 1;
     raw_spin_trylock_irqsave(&rs->lock, flags)
      raw_spin_trylock(lock)
       _raw_spin_trylock
        __raw_spin_trylock
         spin_acquire(&lock->dep_map, 0, 1, _RET_IP_)
          lock_acquire
           __lock_acquire
            register_lock_class
             assign_lock_key
              dump_stack();
 ratelimit_state_init(&sbi->s_msg_ratelimit_state, 5 * HZ, 10);
  raw_spin_lock_init(&rs->lock);
  // init rs->lock here
and get the following dump_stack:
=========================================================
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 12 PID: 753 Comm: mount Tainted: G E 6.7.0-rc6-next-20231222 #504
[...]
Call Trace:
dump_stack_lvl+0xc5/0x170
dump_stack+0x18/0x30
register_lock_class+0x740/0x7c0
__lock_acquire+0x69/0x13a0
lock_acquire+0x120/0x450
_raw_spin_trylock+0x98/0xd0
___ratelimit+0xf6/0x220
__ext4_msg+0x7f/0x160 [ext4]
ext4_orphan_cleanup+0x665/0x740 [ext4]
__ext4_fill_super+0x21ea/0x2b10 [ext4]
ext4_fill_super+0x14d/0x360 [ext4]
Normally interval is 0 until s_msg_ratelimit_state is initialized, so ___ratelimit() does nothing. But registering sysfs precedes initializing rs->lock, so it is possible to change rs->interval to a non-zero value via the msg_ratelimit_interval_ms interface of sysfs while rs->lock is uninitialized, and then a call to ext4_msg triggers the problem by accessing an uninitialized rs->lock. Therefore register sysfs after all initializations are complete to avoid such problems.(CVE-2024-40998)
In the Linux kernel, the following vulnerability has been resolved:
bpf: Take return from set_memory_rox() into account with bpf_jit_binary_lock_ro()
set_memory_rox() can fail, leaving memory unprotected.
Check return and bail out when bpf_jit_binary_lock_ro() returns an error.(CVE-2024-42067)
In the Linux kernel, the following vulnerability has been resolved:
net: nexthop: Initialize all fields in dumped nexthops
struct nexthop_grp contains two reserved fields that are not initialized by nla_put_nh_group(), and carry garbage. This can be observed e.g. with strace (edited for clarity):
# ip nexthop add id 1 dev lo
# ip nexthop add id 101 group 1
# strace -e recvmsg ip nexthop get id 101
...
recvmsg(... [{nla_len=12, nla_type=NHA_GROUP},
[{id=1, weight=0, resvd1=0x69, resvd2=0x67}]] ...) = 52
The fields are reserved and therefore not currently used. But as they are, they leak kernel memory, and the fact they are not just zero complicates repurposing of the fields for new ends. Initialize the full structure.(CVE-2024-42283)
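The effect of leaving reserved fields unwritten can be reproduced in miniature with a reused buffer. The sketch below is only an analogy: the 8-byte layout (32-bit id, 8-bit weight, 8-bit and 16-bit reserved fields) is assumed from the strace output above, and the helper names are hypothetical.

```python
import struct

# Assumed layout for this sketch: id (u32), weight (u8), resvd1 (u8), resvd2 (u16).
ENTRY = struct.Struct("<IBBH")


def fill_partial(buf, nh_id, weight):
    # Writes only id and weight; the reserved bytes keep whatever the
    # buffer held before.
    struct.pack_into("<IB", buf, 0, nh_id, weight)


def fill_full(buf, nh_id, weight):
    # Writes every field, with the reserved ones set explicitly to zero.
    ENTRY.pack_into(buf, 0, nh_id, weight, 0, 0)


def dump(buf):
    # Returns (id, weight, resvd1, resvd2) as a reader of the message sees it.
    return ENTRY.unpack_from(buf, 0)
```

Starting from a buffer that previously held other data, dump() after fill_partial() shows the old bytes in the reserved fields, while fill_full() yields zeros there.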
In the Linux kernel, the following vulnerability has been resolved:
irqchip/imx-irqsteer: Handle runtime power management correctly
The power domain is automatically activated from clk_prepare(). However, on certain platforms like i.MX8QM and i.MX8QXP, the power-on handling invokes sleeping functions, which triggers the 'scheduling while atomic' bug in the context switch path during device probing:
BUG: scheduling while atomic: kworker/u13:1/48/0x00000002
Call trace:
__schedule_bug+0x54/0x6c
__schedule+0x7f0/0xa94
schedule+0x5c/0xc4
schedule_preempt_disabled+0x24/0x40
__mutex_lock.constprop.0+0x2c0/0x540
__mutex_lock_slowpath+0x14/0x20
mutex_lock+0x48/0x54
clk_prepare_lock+0x44/0xa0
clk_prepare+0x20/0x44
imx_irqsteer_resume+0x28/0xe0
pm_generic_runtime_resume+0x2c/0x44
__genpd_runtime_resume+0x30/0x80
genpd_runtime_resume+0xc8/0x2c0
__rpm_callback+0x48/0x1d8
rpm_callback+0x6c/0x78
rpm_resume+0x490/0x6b4
__pm_runtime_resume+0x50/0x94
irq_chip_pm_get+0x2c/0xa0
__irq_do_set_handler+0x178/0x24c
irq_set_chained_handler_and_data+0x60/0xa4
mxc_gpio_probe+0x160/0x4b0
Cure this by implementing the irq_bus_lock/sync_unlock() interrupt chip callbacks and handling power management in them, as they are invoked from non-atomic context.
tglx: Rewrote change log, added Fixes tag
In the Linux kernel, the following vulnerability has been resolved:
drm/gma500: fix null pointer dereference in psb_intel_lvds_get_modes
In psb_intel_lvds_get_modes(), the return value of drm_mode_duplicate() is assigned to mode, which will lead to a possible NULL pointer dereference on failure of drm_mode_duplicate(). Add a check to avoid the NULL pointer dereference.(CVE-2024-42309)
In the Linux kernel, the following vulnerability has been resolved:
media: venus: fix use after free in vdec_close
There appears to be a possible use after free with vdec_close(). The firmware will add buffer release work to the work queue through HFI callbacks as a normal part of decoding. Randomly closing the decoder device from userspace during normal decoding can incur a read after free for inst.
Fix it by cancelling the work in vdec_close.(CVE-2024-42313)
In the Linux kernel, the following vulnerability has been resolved:
ipvs: properly dereference pe in ip_vs_add_service
Use pe directly to resolve sparse warning:
net/netfilter/ipvs/ip_vs_ctl.c:1471:27: warning: dereference of noderef expression(CVE-2024-42322)
In the Linux kernel, the following vulnerability has been resolved:
PCI: keystone: Fix NULL pointer dereference in case of DT error in ks_pcie_setup_rc_app_regs()
If IORESOURCE_MEM is not provided in Device Tree due to any error, resource_list_first_type() will return NULL and pci_parse_request_of_pci_ranges() will just emit a warning.
This will cause a NULL pointer dereference. Fix this bug by adding NULL return check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.(CVE-2024-43823)
In the Linux kernel, the following vulnerability has been resolved:
leds: trigger: Unregister sysfs attributes before calling deactivate()
Triggers which have trigger specific sysfs attributes typically store related data in trigger-data allocated by the activate() callback and freed by the deactivate() callback.
Calling device_remove_groups() after calling deactivate() leaves a window where the sysfs attributes show/store functions could be called after deactivation and then operate on the just freed trigger-data.
Move the device_remove_groups() call to before deactivate() to close this race window.
This also makes the deactivation path properly do things in reverse order of the activation path which calls the activate() callback before calling device_add_groups().(CVE-2024-43830)
In the Linux kernel, the following vulnerability has been resolved:
md: fix deadlock between mddev_suspend and flush bio
Deadlock occurs when mddev is being suspended while some flush bio is in progress. It is a complex issue.
T1. The first flush is at the ending stage; it clears 'mddev->flush_bio' and tries to submit data, but is blocked because mddev is suspended by T4.
T2. The second flush sets 'mddev->flush_bio' and attempts to queue md_submit_flush_data(), which is already running (T1) and won't execute again if on the same CPU as T1.
T3. The third flush increments active_io and tries to flush, but is blocked because 'mddev->flush_bio' is not NULL (set by T2).
T4. mddev_suspend() is called and waits for active_io, which was incremented by T3, to drop to 0.
T1              T2              T3              T4
(flush 1)       (flush 2)       (third 3)       (suspend)
md_submit_flush_data
 mddev->flush_bio = NULL;
 .
 .              md_flush_request
 .               mddev->flush_bio = bio
 .               queue submit_flushes
 .               .
 .               .              md_handle_request
 .               .               active_io + 1
 .               .               md_flush_request
 .               .                wait !mddev->flush_bio
 .               .
 .               .                              mddev_suspend
 .               .                               wait !active_io
 .               .
 .               submit_flushes
 .               queue_work md_submit_flush_data
 .               //md_submit_flush_data is already running (T1)
 .
 md_handle_request
  wait resume
The root issue is the non-atomic inc/dec of active_io during the flush process. active_io is decremented before md_submit_flush_data is queued, and incremented soon after md_submit_flush_data() runs.
md_flush_request
  active_io + 1
  submit_flushes
    active_io - 1
    md_submit_flush_data
      md_handle_request
        active_io + 1
          make_request
        active_io - 1
If active_io is decremented after md_handle_request() instead of within submit_flushes(), make_request() can be called directly instead of md_handle_request() in md_submit_flush_data(), and active_io will only be incremented and decremented once in the whole flush process. The deadlock will be fixed.
Additionally, the only difference between fixing the issue and before is that there is no return error handling of make_request(). But after the previous patch cleaned up md_write_start(), make_request() only returns an error in raid5_make_request() by dm-raid; see commit 41425f96d7aa ("dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape"). Since dm always splits data and flush operations into two separate ios, the io size of a flush submitted by dm is always 0, and make_request() will not be called in md_submit_flush_data(). To prevent future modifications from introducing issues, add WARN_ON to ensure make_request() returns no error in this context.(CVE-2024-43855)
In the Linux kernel, the following vulnerability has been resolved:
memcg: protect concurrent access to mem_cgroup_idr
Commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs") decoupled the memcg IDs from the CSS ID space to fix the cgroup creation failures. It introduced IDR to maintain the memcg ID space. The IDR depends on external synchronization mechanisms for modifications. For the mem_cgroup_idr, the idr_alloc() and idr_replace() happen within css callback and thus are protected through cgroup_mutex from concurrent modifications. However idr_remove() for mem_cgroup_idr was not protected against concurrency and can be run concurrently for different memcgs when they hit their refcnt to zero. Fix that.
We have been seeing list_lru based kernel crashes at a low frequency in our fleet for a long time. These crashes were in different parts of the list_lru code including list_lru_add(), list_lru_del() and the reparenting code. Upon further inspection, it looked like for a given object (dentry and inode), the super_block's list_lru didn't have list_lru_one for the memcg of that object. The initial suspicions were either that the object is not allocated through kmem_cache_alloc_lru() or that somehow memcg_list_lru_alloc() failed to allocate list_lru_one() for a memcg but returned success. No evidence was found for these cases.
Looking more deeply, we started seeing situations where a valid memcg's id is not present in mem_cgroup_idr, and in some cases multiple valid memcgs have the same id and mem_cgroup_idr is pointing to one of them. So, the most reasonable explanation is that these situations can happen due to a race between multiple idr_remove() calls or a race between idr_alloc()/idr_replace() and idr_remove(). These races are causing multiple memcgs to acquire the same ID, and then offlining of one of them would clean up list_lrus on the system for all of them. Later access from other memcgs to the list_lru causes crashes due to the missing list_lru_one.(CVE-2024-43892)
In the Linux kernel, the following vulnerability has been resolved:
serial: core: check uartclk for zero to avoid divide by zero
Calling ioctl TIOCSSERIAL with an invalid baud_base can result in uartclk being zero, which will result in a divide by zero error in uart_get_divisor(). The check for uartclk being zero in uart_set_info() needs to be done before other settings are made as subsequent calls to ioctl TIOCSSERIAL for the same port would be impacted if the uartclk check was done where uartclk gets set.
Oops: divide error: 0000 PREEMPT SMP KASAN PTI
RIP: 0010:uart_get_divisor (drivers/tty/serial/serial_core.c:580)
Call Trace:
<TASK>
serial8250_get_divisor (drivers/tty/serial/8250/8250_port.c:2576 drivers/tty/serial/8250/8250_port.c:2589)
serial8250_do_set_termios (drivers/tty/serial/8250/8250_port.c:502 drivers/tty/serial/8250/8250_port.c:2741)
serial8250_set_termios (drivers/tty/serial/8250/8250_port.c:2862)
uart_change_line_settings (./include/linux/spinlock.h:376 ./include/linux/serial_core.h:608 drivers/tty/serial/serial_core.c:222)
uart_port_startup (drivers/tty/serial/serial_core.c:342)
uart_startup (drivers/tty/serial/serial_core.c:368)
uart_set_info (drivers/tty/serial/serial_core.c:1034)
uart_set_info_user (drivers/tty/serial/serial_core.c:1059)
tty_set_serial (drivers/tty/tty_io.c:2637)
tty_ioctl (drivers/tty/tty_io.c:2647 drivers/tty/tty_io.c:2791)
__x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:907 fs/ioctl.c:893 fs/ioctl.c:893)
do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Rule: add(CVE-2024-43893)
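The ordering argument for the uartclk check (validate first, then change the port) can be sketched with a toy model. Everything here is a simplified assumption: the Port class, the way the baud rate is capped by the clock, and the EINVAL return value are illustrative and do not reproduce the real serial core.

```python
import errno


class Port:
    def __init__(self, uartclk):
        self.uartclk = uartclk
        self.custom_divisor = 0


def get_divisor(port, requested_baud):
    # Simplified: the usable baud rate is capped by the clock, so a zero
    # clock gives a zero baud rate and the division below fails.
    baud = min(requested_baud, port.uartclk // 16)
    return (port.uartclk + 8 * baud) // (16 * baud)


def set_info_unchecked(port, baud_base, custom_divisor):
    # Applies settings without validating the resulting clock.
    port.custom_divisor = custom_divisor
    port.uartclk = baud_base * 16
    return 0


def set_info_checked(port, baud_base, custom_divisor):
    # Rejects a zero clock before any field of the port is modified, so a
    # failed call leaves the port usable for later calls.
    new_uartclk = baud_base * 16
    if new_uartclk == 0:
        return -errno.EINVAL  # assumed error code for this sketch
    port.custom_divisor = custom_divisor
    port.uartclk = new_uartclk
    return 0
```

In this model, a rejected request leaves both uartclk and custom_divisor untouched, whereas the unchecked variant leaves the port in a state where the next divisor computation divides by zero.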
In the Linux kernel, the following vulnerability has been resolved:
fou: remove warn in gue_gro_receive on unsupported protocol
Drop the WARN_ON_ONCE in gue_gro_receive if the encapsulated type is not known or does not have a GRO handler.
Such a packet is easily constructed. Syzbot generates them and sets off this warning.
Remove the warning as it is expected and not actionable.
The warning was previously reduced from WARN_ON to WARN_ON_ONCE in commit 270136613bf7 ("fou: Do WARN_ON_ONCE in gue_gro_receive for bad proto callbacks").(CVE-2024-44940)
In the Linux kernel, the following vulnerability has been resolved:
atm: idt77252: prevent use after free in dequeue_rx()
We can't dereference "skb" after calling vcc->push() because the skb is released.(CVE-2024-44998)
In the Linux kernel, the following vulnerability has been resolved:
xhci: Fix Panther point NULL pointer deref at full-speed re-enumeration
Re-enumerating full-speed devices after a failed address device command can trigger a NULL pointer dereference.
Full-speed devices may need to reconfigure the endpoint 0 Max Packet Size value during enumeration. USB core calls usb_ep0_reinit() in this case, which ends up calling xhci_configure_endpoint().
On Panther Point xHC, the xhci_configure_endpoint() function will additionally check and reserve bandwidth in software. Other hosts do this in hardware.
If the xHC address device command fails, a new xhci_virt_device structure is allocated as part of re-enabling the slot, but the bandwidth table pointers are not set up properly here. This triggers the NULL pointer dereference the next time usb_ep0_reinit() is called and xhci_configure_endpoint() tries to check and reserve bandwidth.
[46710.713538] usb 3-1: new full-speed USB device number 5 using xhci_hcd
[46710.713699] usb 3-1: Device not responding to setup address.
[46710.917684] usb 3-1: Device not responding to setup address.
[46711.125536] usb 3-1: device not accepting address 5, error -71
[46711.125594] BUG: kernel NULL pointer dereference, address: 0000000000000008
[46711.125600] #PF: supervisor read access in kernel mode
[46711.125603] #PF: error_code(0x0000) - not-present page
[46711.125606] PGD 0 P4D 0
[46711.125610] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[46711.125615] CPU: 1 PID: 25760 Comm: kworker/1:2 Not tainted 6.10.32 #1
[46711.125620] Hardware name: Gigabyte Technology Co., Ltd.
[46711.125623] Workqueue: usb_hub_wq hub_event [usbcore]
[46711.125668] RIP: 0010:xhci_reserve_bandwidth (drivers/usb/host/xhci.c
Fix this by making sure bandwidth table pointers are set up correctly after a failed address device command, and additionally by avoiding checking for bandwidth in cases like this where no actual endpoints are added or removed, i.e. only context for default control endpoint 0 is evaluated.(CVE-2024-45006)
In the Linux kernel, the following vulnerability has been resolved:
s390/dasd: fix error recovery leading to data corruption on ESE devices
Extent Space Efficient (ESE) or thin provisioned volumes need to be formatted on demand during usual IO processing.
The dasd_ese_needs_format function checks for error codes that signal the non-existence of a proper track format.
The check for incorrect length is too imprecise, since other error cases leading to transport of insufficient data also have this flag set. This might lead to data corruption in certain error cases, for example during a storage server warmstart.
Fix by removing the check for incorrect length and replacing it with an explicit check for invalid track format in transport mode.
Also remove the check for file protected since this is not a valid ESE handling case.(CVE-2024-45026)
In the Linux kernel, the following vulnerability has been resolved:
nfc: pn533: Add poll mod list filling check
If the im_protocols value is 1 and the tm_protocols value is 0, this combination successfully passes the check 'if (!im_protocols && !tm_protocols)' in nfc_start_poll(). But then, after the pn533_poll_create_mod_list() call in pn533_start_poll(), the poll mod list will remain empty and dev->poll_mod_count will remain 0, which leads to division by zero.
Normally no im protocol has value 1 in the mask, so this combination is not expected by the driver. But these protocol values actually come from userspace via the Netlink interface (NFC_CMD_START_POLL operation). So a broken or malicious program may pass a message containing a "bad" combination of protocol parameter values so that dev->poll_mod_count is not incremented inside pn533_poll_create_mod_list(), thus leading to division by zero. The call trace looks like:
nfc_genl_start_poll()
nfc_start_poll()
->start_poll()
pn533_start_poll()
Add poll mod list filling check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.(CVE-2024-46676)
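The failure mode described for pn533 (a non-zero mask that selects no poll modulation, followed by a modulo on the list length) can be sketched as follows. This is a minimal model under stated assumptions: the bit values and modulation names in `KNOWN_IM_BITS` are hypothetical, and `create_mod_list()`, `start_poll()` and `next_mod()` are illustrative stand-ins rather than the driver's functions.

```python
# Hypothetical protocol bits; the real NFC protocol masks differ.
KNOWN_IM_BITS = {0x02: "type_a", 0x04: "type_b", 0x08: "felica"}


def create_mod_list(im_protocols: int, tm_protocols: int) -> list:
    """Build the list of poll modulations selected by the two masks."""
    mods = [name for bit, name in KNOWN_IM_BITS.items() if im_protocols & bit]
    if tm_protocols:
        mods.append("target")
    return mods


def start_poll(im_protocols: int, tm_protocols: int) -> list:
    # Existing check: passes as soon as either mask is non-zero.
    if not im_protocols and not tm_protocols:
        raise ValueError("no protocols requested")
    mods = create_mod_list(im_protocols, tm_protocols)
    # Added check: a non-zero mask (such as 1 here) can still select nothing.
    if not mods:
        raise ValueError("poll mod list is empty")
    return mods


def next_mod(mods: list, current: int) -> int:
    """Advance round-robin; would divide by zero if mods were empty."""
    return (current + 1) % len(mods)
```

With the second check in place, `next_mod()` is only ever reached with a non-empty list, so the modulo is safe regardless of what userspace passes in.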
In the Linux kernel, the following vulnerability has been resolved:
usb: typec: ucsi: Fix null pointer dereference in trace
ucsi_register_altmode checks IS_ERR for the alt pointer and treats NULL as valid. When CONFIG_TYPEC_DP_ALTMODE is not enabled, ucsi_register_displayport returns NULL which causes a NULL pointer dereference in trace. Rather than return NULL, call typec_port_register_altmode to register DisplayPort alternate mode as a non-controllable mode when CONFIG_TYPEC_DP_ALTMODE is not enabled.(CVE-2024-46719)
In the Linux kernel, the following vulnerability has been resolved:
bpf: Remove tst_run from lwt_seg6local_prog_ops.
The syzbot reported that the lwt_seg6 related BPF ops can be invoked via bpf_test_run() without entering input_action_end_bpf() first.
Martin KaFai Lau said that the self test for BPF_PROG_TYPE_LWT_SEG6LOCAL probably didn't work since it was introduced in commit 04d4b274e2a ("ipv6: sr: Add seg6local action End.BPF"). The reason is that the per-CPU variable seg6_bpf_srh_states::srh is never assigned in the self test case but each BPF function expects it.
Remove test_run for BPF_PROG_TYPE_LWT_SEG6LOCAL.(CVE-2024-46754)
In the Linux kernel, the following vulnerability has been resolved:
ice: Add netif_device_attach/detach into PF reset flow
Ethtool callbacks can be executed while reset is in progress and try to access deleted resources, e.g. getting coalesce settings can result in a NULL pointer dereference seen below.
Reproduction steps:
Once the driver is fully initialized, trigger a reset:
# echo 1 > /sys/class/net/<interface>/device/reset
While the reset is in progress, try to get coalesce settings using ethtool:
# ethtool -c <interface>
BUG: kernel NULL pointer dereference, address: 0000000000000020
PGD 0 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 11 PID: 19713 Comm: ethtool Tainted: G S 6.10.0-rc7+ #7
RIP: 0010:ice_get_q_coalesce+0x2e/0xa0 [ice]
RSP: 0018:ffffbab1e9bcf6a8 EFLAGS: 00010206
RAX: 000000000000000c RBX: ffff94512305b028 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9451c3f2e588 RDI: ffff9451c3f2e588
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff9451c3f2e580 R11: 000000000000001f R12: ffff945121fa9000
R13: ffffbab1e9bcf760 R14: 0000000000000013 R15: ffffffff9e65dd40
FS: 00007faee5fbe740(0000) GS:ffff94546fd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000020 CR3: 0000000106c2e005 CR4: 00000000001706f0
Call Trace:
<TASK>
ice_get_coalesce+0x17/0x30 [ice]
coalesce_prepare_data+0x61/0x80
ethnl_default_doit+0xde/0x340
genl_family_rcv_msg_doit+0xf2/0x150
genl_rcv_msg+0x1b3/0x2c0
netlink_rcv_skb+0x5b/0x110
genl_rcv+0x28/0x40
netlink_unicast+0x19c/0x290
netlink_sendmsg+0x222/0x490
__sys_sendto+0x1df/0x1f0
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x82/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7faee60d8e27
Calling netif_device_detach() before reset makes the net core not call the driver when an ethtool command is issued. An attempt to execute an ethtool command during reset then results in the following message:
netlink error: No such device
instead of a NULL pointer dereference. Once reset is done and ice_rebuild() is executing, netif_device_attach() is called to allow ethtool operations to occur again in a safe manner.(CVE-2024-46770)
In the Linux kernel, the following vulnerability has been resolved:
ksmbd: unset the binding mark of a reused connection
Steve French reported a null pointer dereference error from the sha256 lib. cifs.ko can send session setup requests on a reused connection. If a reused connection is used for binding a session, conn->binding can still remain true, so generate_preauth_hash() will not set sess->Preauth_HashValue and it will be NULL. It is used as material to create an encryption key in ksmbd_gen_smb311_encryptionkey. A NULL ->Preauth_HashValue causes a null pointer dereference error in crypto_shash_update().
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 8 PID: 429254 Comm: kworker/8:39
Hardware name: LENOVO 20MAS08500/20MAS08500, BIOS N2CET69W (1.52 )
Workqueue: ksmbd-io handle_ksmbd_work [ksmbd]
RIP: 0010:lib_sha256_base_do_update.isra.0+0x11e/0x1d0 [sha256_ssse3]
<TASK>
? show_regs+0x6d/0x80
? __die+0x24/0x80
? page_fault_oops+0x99/0x1b0
? do_user_addr_fault+0x2ee/0x6b0
? exc_page_fault+0x83/0x1b0
? asm_exc_page_fault+0x27/0x30
? __pfx_sha256_transform_rorx+0x10/0x10 [sha256_ssse3]
? lib_sha256_base_do_update.isra.0+0x11e/0x1d0 [sha256_ssse3]
? __pfx_sha256_transform_rorx+0x10/0x10 [sha256_ssse3]
? __pfx_sha256_transform_rorx+0x10/0x10 [sha256_ssse3]
_sha256_update+0x77/0xa0 [sha256_ssse3]
sha256_avx2_update+0x15/0x30 [sha256_ssse3]
crypto_shash_update+0x1e/0x40
hmac_update+0x12/0x20
crypto_shash_update+0x1e/0x40
generate_key+0x234/0x380 [ksmbd]
generate_smb3encryptionkey+0x40/0x1c0 [ksmbd]
ksmbd_gen_smb311_encryptionkey+0x72/0xa0 [ksmbd]
ntlm_authenticate.isra.0+0x423/0x5d0 [ksmbd]
smb2_sess_setup+0x952/0xaa0 [ksmbd]
__process_request+0xa3/0x1d0 [ksmbd]
__handle_ksmbd_work+0x1c4/0x2f0 [ksmbd]
handle_ksmbd_work+0x2d/0xa0 [ksmbd]
process_one_work+0x16c/0x350
worker_thread+0x306/0x440
? __pfx_worker_thread+0x10/0x10
kthread+0xef/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x44/0x70
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>(CVE-2024-46795)
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: the warning dereferencing obj for nbio_v7_4
If the ras_manager obj is NULL, don't print NBIO error data.(CVE-2024-46819)
In the Linux kernel, the following vulnerability has been resolved:
sched: sch_cake: fix bulk flow accounting logic for host fairness
In sch_cake, we keep track of the count of active bulk flows per host, when running in dst/src host fairness mode, which is used as the round-robin weight when iterating through flows. The count of active bulk flows is updated whenever a flow changes state.
This has a peculiar interaction with the hash collision handling: when a hash collision occurs (after the set-associative hashing), the state of the hash bucket is simply updated to match the new packet that collided, and if host fairness is enabled, that also means assigning new per-host state to the flow. For this reason, the bulk flow counters of the host(s) assigned to the flow are decremented, before new state is assigned (and the counters, which may not belong to the same host anymore, are incremented again).
Back when this code was introduced, the host fairness mode was always enabled, so the decrement was unconditional. When the configuration flags were introduced, the increment was made conditional but the decrement was not, which can lead to a spurious decrement (and an associated wrap-around to U16_MAX).
AFAICT, when host fairness is disabled, the decrement and wrap-around happens as soon as a hash collision occurs (which is not that common in itself, due to the set-associative hashing). However, in most cases this is harmless, as the value is only used when host fairness mode is enabled. So in order to trigger an array overflow, sch_cake has to first be configured with host fairness disabled, and while running in this mode, a hash collision has to occur to cause the overflow. Then, the qdisc has to be reconfigured to enable host fairness, which leads to the array out-of-bounds because the wrapped-around value is retained and used as an array index. It seems that syzbot managed to trigger this, which is quite impressive in its own right.
This patch fixes the issue by introducing the same conditional check on decrement as is used on increment.
The original bug predates the upstreaming of cake, but the commit listed in the Fixes tag touched that code, meaning that this patch won't apply before that.(CVE-2024-46828)
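The wrap-around described above can be reproduced with a small model of an unsigned 16-bit counter. This is a sketch only: `Host`, `on_collision_buggy()` and `on_collision_fixed()` are hypothetical names, the old and new host are collapsed into a single object for brevity, and Python integers are masked to emulate u16 arithmetic.

```python
U16_MAX = 0xFFFF


def u16(value: int) -> int:
    """Emulate unsigned 16-bit wrap-around."""
    return value & U16_MAX


class Host:
    def __init__(self) -> None:
        self.bulk_flow_count = 0


def on_collision_buggy(host: Host, host_fairness: bool) -> None:
    # Decrement is unconditional, increment is conditional: asymmetric.
    host.bulk_flow_count = u16(host.bulk_flow_count - 1)
    if host_fairness:
        host.bulk_flow_count = u16(host.bulk_flow_count + 1)


def on_collision_fixed(host: Host, host_fairness: bool) -> None:
    # Same condition guards both the decrement and the increment.
    if host_fairness:
        host.bulk_flow_count = u16(host.bulk_flow_count - 1)
        host.bulk_flow_count = u16(host.bulk_flow_count + 1)
```

With host fairness disabled, the buggy variant turns a count of 0 into U16_MAX on the first collision, and that stale value is what later gets used as an index once host fairness is enabled. The fixed variant leaves the counter untouched in that mode.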
In the Linux kernel, the following vulnerability has been resolved:
btrfs: clean up our handling of refs == 0 in snapshot delete
In reada we BUG_ON(refs == 0), which could be unkind since we aren't holding a lock on the extent leaf and thus could get a transient incorrect answer. In walk_down_proc we also BUG_ON(refs == 0), which could happen if we have extent tree corruption. Change that to return -EUCLEAN. In do_walk_down() we catch this case and handle it correctly, however we return -EIO, for which -EUCLEAN is a more appropriate error code. Finally, in walk_up_proc we have the same BUG_ON(refs == 0), so convert that to proper error handling. Also adjust the error message so we can actually do something with the information.(CVE-2024-46840)
In the Linux kernel, the following vulnerability has been resolved:
perf/x86/intel: Limit the period on Haswell
Running the ltp test cve-2015-3290 concurrently reports the following warnings.
perfevents: irq loop stuck!
WARNING: CPU: 31 PID: 32438 at arch/x86/events/intel/core.c:3174 intel_pmu_handle_irq+0x285/0x370
Call Trace:
<NMI>
? __warn+0xa4/0x220
? intel_pmu_handle_irq+0x285/0x370
? __report_bug+0x123/0x130
? intel_pmu_handle_irq+0x285/0x370
? __report_bug+0x123/0x130
? intel_pmu_handle_irq+0x285/0x370
? report_bug+0x3e/0xa0
? handle_bug+0x3c/0x70
? exc_invalid_op+0x18/0x50
? asm_exc_invalid_op+0x1a/0x20
? irq_work_claim+0x1e/0x40
? intel_pmu_handle_irq+0x285/0x370
perf_event_nmi_handler+0x3d/0x60
nmi_handle+0x104/0x330
Thanks to Thomas Gleixner's analysis, the issue is caused by the low initial period (1) of the frequency estimation algorithm, which triggers the defects of the HW, specifically errata HSW11 and HSW143. (For the details, please refer to https://lore.kernel.org/lkml/87plq9l5d2.ffs@tglx/)
HSW11 requires a period larger than 100 for the INST_RETIRED.ALL event, but the initial period in the freq mode is 1. The erratum is the same as BDM11, which is already supported in the kernel. A minimum period of 128 is enforced as well on HSW.
HSW143 states that fixed counter 1 may overcount by 32 when Hyper-Threading is enabled. However, based on the test, the hardware has more issues than the erratum describes. Besides fixed counter 1, the message 'interrupt took too long' can be observed on any counter which was armed with a period < 32 and two events expired in the same NMI. A minimum period of 32 is enforced for the rest of the events. The recommended workaround code for HSW143 is not implemented, because it only addresses the issue for the fixed counter and it brings extra overhead through extra MSR writes. No related overcounting issue has been reported so far.(CVE-2024-46848)
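The two minimum periods stated above (128 for INST_RETIRED.ALL, 32 for every other event) amount to a clamp on the sampling period. The following is a minimal sketch of just that clamp; `limit_period()` and its boolean parameter are hypothetical, and any further adjustment the kernel may apply to the period is deliberately left out.

```python
def limit_period(is_inst_retired_all: bool, period: int) -> int:
    """Raise a too-small sampling period to the minimum given in the text.

    128 for INST_RETIRED.ALL (HSW11), 32 for all other events (HSW143).
    Periods already at or above the minimum are returned unchanged.
    """
    min_period = 128 if is_inst_retired_all else 32
    return max(period, min_period)
```

For example, the frequency-mode initial period of 1 becomes 128 for INST_RETIRED.ALL and 32 for any other event, while a period of 1000 passes through unchanged.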
In the Linux kernel, the following vulnerability has been resolved:
net: dpaa: Pad packets to ETH_ZLEN
When sending packets under 60 bytes, up to three bytes of the buffer following the data may be leaked. Avoid this by extending all packets to ETH_ZLEN, ensuring nothing is leaked in the padding. This bug can be reproduced by running
$ ping -s 11 destination(CVE-2024-46854)
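The idea of the fix, extending short frames to ETH_ZLEN (60 bytes) with known zero bytes so that no stale buffer contents are sent, can be sketched as follows. This is an illustration on a plain byte string, not the driver code; `pad_frame()` is a hypothetical helper.

```python
ETH_ZLEN = 60  # minimum Ethernet frame length without FCS, in bytes


def pad_frame(frame: bytes) -> bytes:
    """Return the frame extended with zero bytes up to ETH_ZLEN.

    The padding is explicitly zero-filled, so whatever followed the data
    in the original buffer can never end up on the wire.
    """
    if len(frame) < ETH_ZLEN:
        return frame + bytes(ETH_ZLEN - len(frame))
    return frame
```

A 53-byte frame comes back as 60 bytes with seven trailing zeros, and frames of 60 bytes or more are returned unchanged.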
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nft_socket: fix sk refcount leaks
We must put 'sk' reference before returning.(CVE-2024-46855)
In the Linux kernel, the following vulnerability has been resolved:
mptcp: pm: Fix uaf in __timer_delete_sync
There are two paths to access mptcp_pm_del_add_timer, resulting in a race condition:
CPU1                            CPU2
====                            ====
net_rx_action
napi_poll                       netlink_sendmsg
__napi_poll                     netlink_unicast
process_backlog                 netlink_unicast_kernel
__netif_receive_skb             genl_rcv
__netif_receive_skb_one_core    netlink_rcv_skb
NF_HOOK                         genl_rcv_msg
ip_local_deliver_finish         genl_family_rcv_msg
ip_protocol_deliver_rcu         genl_family_rcv_msg_doit
tcp_v4_rcv                      mptcp_pm_nl_flush_addrs_doit
tcp_v4_do_rcv                   mptcp_nl_remove_addrs_list
tcp_rcv_established             mptcp_pm_remove_addrs_and_subflows
tcp_data_queue                  remove_anno_list_by_saddr
mptcp_incoming_options          mptcp_pm_del_add_timer
mptcp_pm_del_add_timer          kfree(entry)
In remove_anno_list_by_saddr() (running on CPU2), after leaving the critical section protected by "pm.lock", the entry will be released, which leads to a use-after-free in mptcp_pm_del_add_timer() (running on CPU1).
Fix this by keeping a reference to add_timer inside the lock and calling sk_stop_timer_sync() with this reference, instead of "entry->add_timer".
Move list_del(&entry->list) into mptcp_pm_del_add_timer() and inside the pm lock, and do not directly access any members of the entry outside the pm lock, which avoids similar "entry->x" use-after-free bugs.(CVE-2024-46858)
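The locking pattern described in the fix (look up, unlink, and capture the timer reference all under the lock, then touch only the captured reference outside it) can be modelled in a few lines. This is a sketch under stated assumptions: `Entry`, `PathManager`, `stop_timer_sync()` and the `stopped` list are hypothetical stand-ins, and Python's garbage collection means the model shows the ownership hand-off rather than an actual use-after-free.

```python
import threading

stopped = []  # records which timers were stopped, for illustration


def stop_timer_sync(timer) -> None:
    """Stand-in for a blocking timer cancel; must run outside the lock."""
    stopped.append(timer)


class Entry:
    def __init__(self, addr: str) -> None:
        self.addr = addr
        self.add_timer = object()  # stand-in for a timer object


class PathManager:
    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.anno_list = []

    def del_add_timer(self, addr: str):
        """Unlink the entry for addr and stop its timer.

        Lookup, unlink and reading entry.add_timer all happen under the
        lock. Only one caller can therefore obtain a given entry; any
        other caller racing on the same address gets None.
        """
        with self.lock:
            entry = next((e for e in self.anno_list if e.addr == addr), None)
            if entry is None:
                return None
            self.anno_list.remove(entry)
            timer = entry.add_timer  # reference captured inside the lock
        # Outside the lock: use the local reference, never entry.<field>.
        stop_timer_sync(timer)
        return entry
```

Because the entry is removed from the list before the lock is dropped, the caller that receives it is its sole owner and is the only one allowed to free it.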
In the Linux kernel, the following vulnerability has been resolved:
crypto: stm32/cryp - call finalize with bh disabled
The finalize operation in interrupt mode produces a spinlock recursion warning. The reason is that BH must be disabled during this process.(CVE-2024-47658)
In the Linux kernel, the following vulnerability has been resolved:
spi: hisi-kunpeng: Add verification for the max_frequency provided by the firmware
If the value of max_speed_hz is 0, it may cause a division by zero error in hisi_calc_effective_speed(). The value of max_speed_hz is provided by firmware. Firmware is generally considered a trusted domain. However, as division by zero errors can cause system failure, the value of max_speed is validated here as a defensive measure. So 0 is regarded as invalid and an error code is returned.(CVE-2024-47664)
In the Linux kernel, the following vulnerability has been resolved:
ocfs2: add bounds checking to ocfs2_xattr_find_entry()
Add a paranoia check to make sure it doesn't stray beyond valid memory region containing ocfs2 xattr entries when scanning for a match. It will prevent out-of-bound access in case of crafted images.(CVE-2024-47670)
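The kind of check described above can be illustrated on a toy on-disk format in which a header claims more entries than the buffer holds. This is a minimal sketch and not the ocfs2 layout: the 2-byte count header, the `ENTRY` record format and `find_entry()` are all hypothetical.

```python
import struct

# Hypothetical entry layout: name hash (u32) followed by value offset (u16).
ENTRY = struct.Struct("<IH")
HEADER = struct.Struct("<H")  # hypothetical header: entry count (u16)


def find_entry(buf: bytes, name_hash: int):
    """Scan entries for name_hash, refusing to read past the buffer.

    The entry count comes from the (possibly crafted) image, so every
    entry is bounds-checked against the real buffer length before it is
    unpacked.
    """
    (count,) = HEADER.unpack_from(buf, 0)
    for i in range(count):
        offset = HEADER.size + i * ENTRY.size
        if offset + ENTRY.size > len(buf):
            raise ValueError("corrupt block: entry lies beyond the buffer")
        entry_hash, value_offset = ENTRY.unpack_from(buf, offset)
        if entry_hash == name_hash:
            return value_offset
    return None
```

A well-formed buffer behaves as before, while a buffer whose header overstates the entry count produces an error as soon as the scan would leave the valid region, instead of reading adjacent memory.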
In the Linux kernel, the following vulnerability has been resolved:
USB: usbtmc: prevent kernel-usb-infoleak
The syzbot reported a kernel-usb-infoleak in usbtmc_write, we need to clear the structure before filling fields.(CVE-2024-47671)
In the Linux kernel, the following vulnerability has been resolved:
wifi: iwlwifi: mvm: don't wait for tx queues if firmware is dead
There is a WARNING in iwl_trans_wait_txqueues_empty() (that was recently converted from just a message), that can be hit if we wait for TX queues to become empty after firmware died. Clearly, we can't expect anything from the firmware after it's declared dead.
Don't call iwl_trans_wait_txqueues_empty() in this case. While it could be a good idea to stop the flow earlier, the flush functions do some maintenance work that is not related to the firmware, so keep that part of the code running even when the firmware is not running.
{ "severity": "Critical" }
{ "x86_64": [ "kernel-5.10.0-136.97.0.178.oe2203sp1.x86_64.rpm", "kernel-debuginfo-5.10.0-136.97.0.178.oe2203sp1.x86_64.rpm", "kernel-debugsource-5.10.0-136.97.0.178.oe2203sp1.x86_64.rpm", "kernel-devel-5.10.0-136.97.0.178.oe2203sp1.x86_64.rpm", "kernel-headers-5.10.0-136.97.0.178.oe2203sp1.x86_64.rpm", "kernel-source-5.10.0-136.97.0.178.oe2203sp1.x86_64.rpm", "kernel-tools-5.10.0-136.97.0.178.oe2203sp1.x86_64.rpm", "kernel-tools-debuginfo-5.10.0-136.97.0.178.oe2203sp1.x86_64.rpm", "kernel-tools-devel-5.10.0-136.97.0.178.oe2203sp1.x86_64.rpm", "perf-5.10.0-136.97.0.178.oe2203sp1.x86_64.rpm", "perf-debuginfo-5.10.0-136.97.0.178.oe2203sp1.x86_64.rpm", "python3-perf-5.10.0-136.97.0.178.oe2203sp1.x86_64.rpm", "python3-perf-debuginfo-5.10.0-136.97.0.178.oe2203sp1.x86_64.rpm" ], "aarch64": [ "kernel-5.10.0-136.97.0.178.oe2203sp1.aarch64.rpm", "kernel-debuginfo-5.10.0-136.97.0.178.oe2203sp1.aarch64.rpm", "kernel-debugsource-5.10.0-136.97.0.178.oe2203sp1.aarch64.rpm", "kernel-devel-5.10.0-136.97.0.178.oe2203sp1.aarch64.rpm", "kernel-headers-5.10.0-136.97.0.178.oe2203sp1.aarch64.rpm", "kernel-source-5.10.0-136.97.0.178.oe2203sp1.aarch64.rpm", "kernel-tools-5.10.0-136.97.0.178.oe2203sp1.aarch64.rpm", "kernel-tools-debuginfo-5.10.0-136.97.0.178.oe2203sp1.aarch64.rpm", "kernel-tools-devel-5.10.0-136.97.0.178.oe2203sp1.aarch64.rpm", "perf-5.10.0-136.97.0.178.oe2203sp1.aarch64.rpm", "perf-debuginfo-5.10.0-136.97.0.178.oe2203sp1.aarch64.rpm", "python3-perf-5.10.0-136.97.0.178.oe2203sp1.aarch64.rpm", "python3-perf-debuginfo-5.10.0-136.97.0.178.oe2203sp1.aarch64.rpm" ], "src": [ "kernel-5.10.0-136.97.0.178.oe2203sp1.src.rpm" ] }