OESA-2024-2182

Currently, unbinding a CCU driver unmaps the device's MMIO region, while leaving its clocks/resets and their providers registered. This can cause a page fault later when some clock operation tries to perform MMIO. Fix this by separating the CCU initialization from the memory allocation, and then using a devres callback to unregister the clocks and resets.

This also fixes a memory leak of the struct ccu_reset, and uses the correct owner (the specific platform driver) for the clocks and resets.

Early OF clock providers are never unregistered, and limited error handling is possible, so they are mostly unchanged. The error reporting is made more consistent by moving the message inside ofsunxiccu_probe.(CVE-2021-47205)

In the Linux kernel, the following vulnerability has been resolved:

NFSD: Fix ia_size underflow

iattr::iasize is a lofft, which is a signed 64-bit type. NFSv3 and NFSv4 both define file size as an unsigned 64-bit type. Thus there is a range of valid file size values an NFS client can send that is already larger than Linux can handle.

Currently decodefattr4() dumps a full u64 value into iasize. If that value happens to be larger than S64MAX, then iasize underflows. I'm about to fix up the NFSv3 behavior as well, so let's catch the underflow in the common code path: nfsd_setattr().(CVE-2022-48828)

In the Linux kernel, the following vulnerability has been resolved:

net: mvpp2: clear BM pool before initialization

Register value persist after booting the kernel using kexec which results in kernel panic. Thus clear the BM pool registers before initialisation to fix the issue.(CVE-2024-35837)

In the Linux kernel, the following vulnerability has been resolved:

drivers: core: synchronize reallyprobe() and devuevent()

Synchronize the dev->driver usage in reallyprobe() and devuevent(). These can run in different threads, what can result in the following race condition for dev->driver uninitialization:

Thread #1:

reallyprobe() { ... probefailed: ... deviceunbindcleanup(dev) { ... dev->driver = NULL; // <= Failed probe sets dev->driver to NULL ... } ... }

Thread #2:

devuevent() { ... if (dev->driver) // If dev->driver is NULLed from reallyprobe() from here on, // after above check, the system crashes addueventvar(env, "DRIVER=%s", dev->driver->name); ... }

reallyprobe() holds the lock, already. So nothing needs to be done there. devuevent() is called with lock held, often, too. But not always. What implies that we can't add any locking in dev_uevent() itself. So fix this race by adding the lock to the non-protected path. This is the path where above race is observed:

devuevent+0x235/0x380 ueventshow+0x10c/0x1f0 <= Add lock here devattrshow+0x3a/0xa0 sysfskfseqshow+0x17c/0x250 kernfsseqshow+0x7c/0x90 seqreaditer+0x2d7/0x940 kernfsfopreaditer+0xc6/0x310 vfsread+0x5bc/0x6b0 ksysread+0xeb/0x1b0 _x64sysread+0x42/0x50 x64syscall+0x27ad/0x2d30 dosyscall64+0xcd/0x1d0 entrySYSCALL64after_hwframe+0x77/0x7f

Similar cases are reported by syzkaller in

https://syzkaller.appspot.com/bug?extid=ffa8143439596313a85a

But these are regarding the initialization of dev->driver

dev->driver = drv;

As this switches dev->driver to non-NULL these reports can be considered to be false-positives (which should be "fixed" by this commit, as well, though).

The same issue was reported and tried to be fixed back in 2015 in

https://lore.kernel.org/lkml/1421259054-2574-1-git-send-email-a.sangwan@samsung.com/

already.(CVE-2024-39501)

In the Linux kernel, the following vulnerability has been resolved:

scsi: qedi: Fix crash while reading debugfs attribute

The qedidbgdonotrecovercmdread() function invokes sprintf() directly on a __user pointer, which results into the crash.

To fix this issue, use a small local stack buffer for sprintf() and then call simplereadfrombuffer(), which in turns make the copyto_user() call.

BUG: unable to handle page fault for address: 00007f4801111000 PGD 8000000864df6067 P4D 8000000864df6067 PUD 864df7067 PMD 846028067 PTE 0 Oops: 0002 [#1] PREEMPT SMP PTI Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 06/15/2023 RIP: 0010:memcpyorig+0xcd/0x130 RSP: 0018:ffffb7a18c3ffc40 EFLAGS: 00010202 RAX: 00007f4801111000 RBX: 00007f4801111000 RCX: 000000000000000f RDX: 000000000000000f RSI: ffffffffc0bfd7a0 RDI: 00007f4801111000 RBP: ffffffffc0bfd7a0 R08: 725f746f6e5f6f64 R09: 3d7265766f636572 R10: ffffb7a18c3ffd08 R11: 0000000000000000 R12: 00007f4881110fff R13: 000000007fffffff R14: ffffb7a18c3ffca0 R15: ffffffffc0bfd7af FS: 00007f480118a740(0000) GS:ffff98e38af00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4801111000 CR3: 0000000864b8e001 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? _diebody+0x1a/0x60 ? pagefaultoops+0x183/0x510 ? excpagefault+0x69/0x150 ? asmexcpagefault+0x22/0x30 ? memcpyorig+0xcd/0x130 vsnprintf+0x102/0x4c0 sprintf+0x51/0x80 qedidbgdonotrecovercmdread+0x2f/0x50 [qedi 6bcfdeeecdea037da47069eca2ba717c84a77324] fullproxyread+0x50/0x80 vfsread+0xa5/0x2e0 ? folioaddnewanonrmap+0x44/0xa0 ? setpteat+0x15/0x30 ? doptemissing+0x426/0x7f0 ksysread+0xa5/0xe0 dosyscall64+0x58/0x80 ? _countmemcgevents+0x46/0x90 ? countmemcgeventmm+0x3d/0x60 ? handlemmfault+0x196/0x2f0 ? douseraddrfault+0x267/0x890 ? excpagefault+0x69/0x150 entrySYSCALL64afterhwframe+0x72/0xdc RIP: 0033:0x7f4800f20b4d(CVE-2024-40978)

In the Linux kernel, the following vulnerability has been resolved:

dropmonitor: replace spinlock by rawspinlock

tracedropcommon() is called with preemption disabled, and it acquires a spinlock. This is problematic for RT kernels because spinlocks are sleeping locks in this configuration, which causes the following splat:

BUG: sleeping function called from invalid context at kernel/locking/spinlockrt.c:48 inatomic(): 1, irqsdisabled(): 1, nonblock: 0, pid: 449, name: rcuc/47 preemptcount: 1, expected: 0 RCU nest depth: 2, expected: 2 5 locks held by rcuc/47/449: #0: ff1100086ec30a60 ((softirqctrl.lock)){+.+.}-{2:2}, at: _localbhdisableip+0x105/0x210 #1: ffffffffb394a280 (rcureadlock){....}-{1:2}, at: rtspinlock+0xbf/0x130 #2: ffffffffb394a280 (rcureadlock){....}-{1:2}, at: _localbhdisableip+0x11c/0x210 #3: ffffffffb394a160 (rcucallback){....}-{0:0}, at: rcudobatch+0x360/0xc70 #4: ff1100086ee07520 (&data->lock){+.+.}-{2:2}, at: tracedropcommon.constprop.0+0xb5/0x290 irq event stamp: 139909 hardirqs last enabled at (139908): [<ffffffffb1df2b33>] _rawspinunlockirqrestore+0x63/0x80 hardirqs last disabled at (139909): [<ffffffffb19bd03d>] tracedropcommon.constprop.0+0x26d/0x290 softirqs last enabled at (139892): [<ffffffffb07a1083>] _localbhenableip+0x103/0x170 softirqs last disabled at (139898): [<ffffffffb0909b33>] rcucpukthread+0x93/0x1f0 Preemption disabled at: [<ffffffffb1de786b>] rtmutexslowunlock+0xab/0x2e0 CPU: 47 PID: 449 Comm: rcuc/47 Not tainted 6.9.0-rc2-rt1+ #7 Hardware name: Dell Inc. PowerEdge R650/0Y2G81, BIOS 1.6.5 04/15/2022 Call Trace: <TASK> dumpstacklvl+0x8c/0xd0 dumpstack+0x14/0x20 _mightresched+0x21e/0x2f0 rtspinlock+0x5e/0x130 ? tracedropcommon.constprop.0+0xb5/0x290 ? skbqueuepurgereason.part.0+0x1bf/0x230 tracedropcommon.constprop.0+0xb5/0x290 ? preemptcountsub+0x1c/0xd0 ? rawspinunlockirqrestore+0x4a/0x80 ? _pfxtracedropcommon.constprop.0+0x10/0x10 ? rtmutexslowunlock+0x26a/0x2e0 ? skbqueuepurgereason.part.0+0x1bf/0x230 ? _pfxrtmutexslowunlock+0x10/0x10 ? skbqueuepurgereason.part.0+0x1bf/0x230 tracekfreeskbhit+0x15/0x20 tracekfreeskb+0xe9/0x150 kfreeskbreason+0x7b/0x110 skbqueuepurgereason.part.0+0x1bf/0x230 ? _pfxskbqueuepurgereason.part.0+0x10/0x10 ? marklock.part.0+0x8a/0x520 ...

tracedropcommon() also disables interrupts, but this is a minor issue because we could easily replace it with a local_lock.

Replace the spinlock with rawspin_lock to avoid sleeping in atomic context.(CVE-2024-40980)

In the Linux kernel, the following vulnerability has been resolved:

jfs: don't walk off the end of ealist

Add a check before visiting the members of ea to make sure each ea stays within the ealist.(CVE-2024-41017)

In the Linux kernel, the following vulnerability has been resolved:

ata: libata-core: Fix null pointer dereference on error

If the ataportalloc() call in atahostalloc() fails, atahostrelease() will get called.

However, the code in atahostrelease() tries to free ata_port struct members unconditionally, which can lead to the following:

BUG: unable to handle page fault for address: 0000000000003990 PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 10 PID: 594 Comm: (udev-worker) Not tainted 6.10.0-rc5 #44 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:atahostrelease.cold+0x2f/0x6e [libata] Code: e4 4d 63 f4 44 89 e2 48 c7 c6 90 ad 32 c0 48 c7 c7 d0 70 33 c0 49 83 c6 0e 41 RSP: 0018:ffffc90000ebb968 EFLAGS: 00010246 RAX: 0000000000000041 RBX: ffff88810fb52e78 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88813b3218c0 RDI: ffff88813b3218c0 RBP: ffff88810fb52e40 R08: 0000000000000000 R09: 6c65725f74736f68 R10: ffffc90000ebb738 R11: 73692033203a746e R12: 0000000000000004 R13: 0000000000000000 R14: 0000000000000011 R15: 0000000000000006 FS: 00007f6cc55b9980(0000) GS:ffff88813b300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000003990 CR3: 00000001122a2000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? _diebody.cold+0x19/0x27 ? pagefaultoops+0x15a/0x2f0 ? excpagefault+0x7e/0x180 ? asmexcpagefault+0x26/0x30 ? atahostrelease.cold+0x2f/0x6e [libata] ? atahostrelease.cold+0x2f/0x6e [libata] releasenodes+0x35/0xb0 devresreleasegroup+0x113/0x140 atahostalloc+0xed/0x120 [libata] atahostallocpinfo+0x14/0xa0 [libata] ahciinit_one+0x6c9/0xd20 [ahci]

Do not access ata_port struct members unconditionally.(CVE-2024-41098)

In the Linux kernel, the following vulnerability has been resolved:

nilfs2: add missing check for inode numbers on directory entries

Syzbot reported that mounting and unmounting a specific pattern of corrupted nilfs2 filesystem images causes a use-after-free of metadata file inodes, which triggers a kernel bug in lruaddfn().

As Jan Kara pointed out, this is because the link count of a metadata file gets corrupted to 0, and nilfsevictinode(), which is called from iput(), tries to delete that inode (ifile inode in this case).

The inconsistency occurs because directories containing the inode numbers of these metadata files that should not be visible in the namespace are read without checking.

Fix this issue by treating the inode numbers of these internal files as errors in the sanity check helper when reading directory folios/pages.

Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer analysis.(CVE-2024-42104)

In the Linux kernel, the following vulnerability has been resolved:

drm/amd/display: Skip finding free audio for unknown engine_id

[WHY] ENGINEIDUNKNOWN = -1 and can not be used as an array index. Plus, it also means it is uninitialized and does not need free audio.

[HOW] Skip and return NULL.

This fixes 2 OVERRUN issues reported by Coverity.(CVE-2024-42119)

In the Linux kernel, the following vulnerability has been resolved:

kobjectuevent: Fix OOB access within zapmodalias_env()

zapmodaliasenv() wrongly calculates size of memory block to move, so will cause OOB memory access issue if variable MODALIAS is not the last one within its @env parameter, fixed by correcting size to memmove.(CVE-2024-42292)

In the Linux kernel, the following vulnerability has been resolved:

lib: objagg: Fix general protection fault

The library supports aggregation of objects into other objects only if the parent object does not have a parent itself. That is, nesting is not supported.

Aggregation happens in two cases: Without and with hints, where hints are a pre-computed recommendation on how to aggregate the provided objects.

Nesting is not possible in the first case due to a check that prevents it, but in the second case there is no check because the assumption is that nesting cannot happen when creating objects based on hints. The violation of this assumption leads to various warnings and eventually to a general protection fault [1].

Before fixing the root cause, error out when nesting happens and warn.

[1] general protection fault, probably for non-canonical address 0xdead000000000d90: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 1083 Comm: kworker/1:9 Tainted: G W 6.9.0-rc6-custom-gd9b4f1cca7fb #7 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxswcore mlxswspacltcamvregionrehashwork RIP: 0010:mlxswspaclerpbfinsert+0x25/0x80 [...] Call Trace: <TASK> mlxswspaclatcamentryadd+0x256/0x3c0 mlxswspacltcamentrycreate+0x5e/0xa0 mlxswspacltcamvchunkmigrateone+0x16b/0x270 mlxswspacltcamvregionrehashwork+0xbe/0x510 processonework+0x151/0x370 workerthread+0x2cb/0x3e0 kthread+0xd0/0x100 retfromfork+0x34/0x50 retfromforkasm+0x1a/0x30 </TASK>(CVE-2024-43846)

In the Linux kernel, the following vulnerability has been resolved:

drm/vmwgfx: Fix a deadlock in dma buf fence polling

Introduce a version of the fence ops that on release doesn't remove the fence from the pending list, and thus doesn't require a lock to fix poll->fence wait->fence unref deadlocks.

vmwgfx overwrites the wait callback to iterate over the list of all fences and update their status, to do that it holds a lock to prevent the list modifcations from other threads. The fence destroy callback both deletes the fence and removes it from the list of pending fences, for which it holds a lock.

dma buf polling cb unrefs a fence after it's been signaled: so the poll calls the wait, which signals the fences, which are being destroyed. The destruction tries to acquire the lock on the pending fences list which it can never get because it's held by the wait from which it was called.

Old bug, but not a lot of userspace apps were using dma-buf polling interfaces. Fix those, in particular this fixes KDE stalls/deadlock.(CVE-2024-43863)

In the Linux kernel, the following vulnerability has been resolved:

jfs: fix null ptr deref in dtInsertEntry

[syzbot reported] general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 5061 Comm: syz-executor404 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 RIP: 0010:dtInsertEntry+0xd0c/0x1780 fs/jfs/jfsdtree.c:3713 ... [Analyze] In dtInsertEntry(), when the pointer h has the same value as p, after writing name in UniStrncpyto_le(), p->header.flag will be cleared. This will cause the previously true judgment "p->header.flag & BT-LEAF" to change to no after writing the name operation, this leads to entering an incorrect branch and accessing the uninitialized object ih when judging this condition for the second time.

[Fix] After got the page, check freelist first, if freelist == 0 then exit dtInsert() and return -EINVAL.(CVE-2024-44939)

In the Linux kernel, the following vulnerability has been resolved:

x86/mm: Fix pticlonepgtable() alignment assumption

Guenter reported dodgy crashes on an i386-nosmp build using GCC-11 that had the form of endless traps until entry stack exhaust and then

DF from the stack guard.

It turned out that pticlonepgtable() had alignment assumptions on the start address, notably it hard assumes start is PMD aligned. This is true on x86_64, but very much not true on i386.

These assumptions can cause the end condition to malfunction, leading to a 'short' clone. Guess what happens when the user mapping has a short copy of the entry text?

Use the correct increment form for addr to avoid alignment assumptions.(CVE-2024-44965)

In the Linux kernel, the following vulnerability has been resolved:

net: hns3: fix a deadlock problem when config TC during resetting

When config TC during the reset process, may cause a deadlock, the flow is as below: pf reset start │ ▼ ...... setup tc │ │ ▼ ▼ DOWN: napidisable() napidisable()(skip) │ │ │ ▼ ▼ ...... ...... │ │ ▼ │ napienable() │ ▼ UINIT: netifnapidel() │ ▼ ...... │ ▼ INIT: netifnapiadd() │ ▼ ...... global reset start │ │ ▼ ▼ UP: napienable()(skip) ...... │ │ ▼ ▼ ...... napi_disable()

In reset process, the driver will DOWN the port and then UINIT, in this case, the setup tc process will UP the port before UINIT, so cause the problem. Adds a DOWN process in UINIT to fix it.(CVE-2024-44995)

In the Linux kernel, the following vulnerability has been resolved:

gtp: pull network headers in gtpdevxmit()

syzbot/KMSAN reported use of uninit-value in getdevxmit() [1]

We must make sure the IPv4 or Ipv6 header is pulled in skb->head before accessing fields in them.

Use pskbinetmay_pull() to fix this issue.

[1] BUG: KMSAN: uninit-value in ipv6pdpfind drivers/net/gtp.c:220 [inline] BUG: KMSAN: uninit-value in gtpbuildskbip6 drivers/net/gtp.c:1229 [inline] BUG: KMSAN: uninit-value in gtpdevxmit+0x1424/0x2540 drivers/net/gtp.c:1281 ipv6pdpfind drivers/net/gtp.c:220 [inline] gtpbuildskbip6 drivers/net/gtp.c:1229 [inline] gtpdevxmit+0x1424/0x2540 drivers/net/gtp.c:1281 _netdevstartxmit include/linux/netdevice.h:4913 [inline] netdevstartxmit include/linux/netdevice.h:4922 [inline] xmitone net/core/dev.c:3580 [inline] devhardstartxmit+0x247/0xa20 net/core/dev.c:3596 _devqueuexmit+0x358c/0x5610 net/core/dev.c:4423 devqueuexmit include/linux/netdevice.h:3105 [inline] packetxmit+0x9c/0x6c0 net/packet/afpacket.c:276 packetsnd net/packet/afpacket.c:3145 [inline] packetsendmsg+0x90e3/0xa3a0 net/packet/afpacket.c:3177 socksendmsgnosec net/socket.c:730 [inline] _socksendmsg+0x30f/0x380 net/socket.c:745 _syssendto+0x685/0x830 net/socket.c:2204 _dosyssendto net/socket.c:2216 [inline] _sesyssendto net/socket.c:2212 [inline] _x64syssendto+0x125/0x1d0 net/socket.c:2212 x64syscall+0x3799/0x3c10 arch/x86/include/generated/asm/syscalls64.h:45 dosyscallx64 arch/x86/entry/common.c:52 [inline] dosyscall64+0xcd/0x1e0 arch/x86/entry/common.c:83 entrySYSCALL64afterhwframe+0x77/0x7f

Uninit was created at: slabpostallochook mm/slub.c:3994 [inline] slaballocnode mm/slub.c:4037 [inline] kmemcacheallocnodenoprof+0x6bf/0xb80 mm/slub.c:4080 kmallocreserve+0x13d/0x4a0 net/core/skbuff.c:583 _allocskb+0x363/0x7b0 net/core/skbuff.c:674 allocskb include/linux/skbuff.h:1320 [inline] allocskbwithfrags+0xc8/0xbf0 net/core/skbuff.c:6526 sockallocsendpskb+0xa81/0xbf0 net/core/sock.c:2815 packetallocskb net/packet/afpacket.c:2994 [inline] packetsnd net/packet/afpacket.c:3088 [inline] packetsendmsg+0x749c/0xa3a0 net/packet/afpacket.c:3177 socksendmsgnosec net/socket.c:730 [inline] _socksendmsg+0x30f/0x380 net/socket.c:745 _syssendto+0x685/0x830 net/socket.c:2204 _dosyssendto net/socket.c:2216 [inline] _sesyssendto net/socket.c:2212 [inline] _x64syssendto+0x125/0x1d0 net/socket.c:2212 x64syscall+0x3799/0x3c10 arch/x86/include/generated/asm/syscalls64.h:45 dosyscallx64 arch/x86/entry/common.c:52 [inline] dosyscall64+0xcd/0x1e0 arch/x86/entry/common.c:83 entrySYSCALL64afterhwframe+0x77/0x7f

CPU: 0 UID: 0 PID: 7115 Comm: syz.1.515 Not tainted 6.11.0-rc1-syzkaller-00043-g94ede2a3e913 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024(CVE-2024-44999)

In the Linux kernel, the following vulnerability has been resolved:

vfs: Don't evict inode under the inode lru traversing context

The inode reclaiming process(See function pruneicachesb) collects all reclaimable inodes and mark them with IFREEING flag at first, at that time, other processes will be stuck if they try getting these inodes (See function findinodefast), then the reclaiming process destroy the inodes by function disposelist(). Some filesystems(eg. ext4 with ea_inode feature, ubifs with xattr) may do inode lookup in the inode evicting callback function, if the inode lookup is operated under the inode lru traversing context, deadlock problems may happen.

Case 1: In function ext4evictinode(), the ea inode lookup could happen if ea_inode feature is enabled, the lookup process will be stuck under the evicting context like this:

File A has inode ireg and an ea inode iea
getfattr(A, xattrbuf) // iea is added into lru // lru->i_ea
Then, following three processes running like this:

PA PB echo 2 > /proc/sys/vm/dropcaches shrinkslab prunedcachesb // ireg is added into lru, lru->iea->ireg pruneicachesb listlruwalkone inodelruisolate iea->istate |= IFREEING // set inode state inodelruisolate _iget(ireg) spinunlock(&ireg->ilock) spinunlock(lrulock) rm file A ireg->nlink = 0 iput(ireg) // ireg->nlink is 0, do evict ext4evictinode ext4xattrdeleteinode ext4xattrinodedecrefall ext4xattrinodeiget ext4iget(iea->iino) igetlocked findinodefast _waitonfreeinginode(iea) ----→ AA deadlock disposelist // cannot be executed by pruneicachesb wakeupbit(&iea->istate)

Case 2: In deleted inode writing function ubifsjnlwriteinode(), file deleting process holds BASEHD's wbuf->iomutex while getting the xattr inode, which could race with inode reclaiming process(The reclaiming process could try locking BASEHD's wbuf->io_mutex in inode evicting function), then an ABBA deadlock problem would happen as following:

File A has inode ia and a xattr(with inode ixa), regular file B has inode ib and a xattr.
getfattr(A, xattr_buf) // ixa is added into lru // lru->ixa

Then, following three processes running like this:

   PA                PB                        PC
           echo 2 &gt; /proc/sys/vm/drop_caches
            shrink_slab
             prune_dcache_sb
             // ib and ia are added into lru, lru-&gt;ixa-&gt;ib-&gt;ia
             prune_icache_sb
              list_lru_walk_one
               inode_lru_isolate
                ixa-&gt;i_state |= I_FREEING // set inode state
               inode_lru_isolate
                __iget(ib)
                spin_unlock(&amp;ib-&gt;i_lock)
                spin_unlock(lru_lock)
                                              rm file B
                                               ib-&gt;nlink = 0

rm file A iput(ia) ubifsevictinode(ia) ubifsjnldeleteinode(ia) ubifsjnlwriteinode(ia) makereservation(BASEHD) // Lock wbuf->iomutex ubifsiget(ixa->iino) igetlocked findinodefast _waitonfreeinginode(ixa) | iput(ib) // ib->nlink is 0, do evict | ubifsevictinode | ubifsjnldeleteinode(ib) ↓ ubifsjnlwriteinode ABBA deadlock ←-----makereservation(BASEHD) disposelist // cannot be executed by pruneicachesb wakeupbit(&ixa->istate)

Fix the possible deadlock by using new inode state flag ILRUISOLATING to pin the inode in memory while inodelruisolate( ---truncated---(CVE-2024-45003)

In the Linux kernel, the following vulnerability has been resolved:

fix bitmap corruption on closerange() with CLOSERANGE_UNSHARE

copyfdbitmaps(new, old, count) is expected to copy the first count/BITSPERLONG bits from old->fullfdsbits[] and fill the rest with zeroes. What it does is copying enough words (BITSTOLONGS(count/BITSPERLONG)), then memsets the rest. That works fine, if all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied.

For most of the callers that is true - expandfdtable() has count equal to old->maxfds, so there's no open descriptors past count, let alone fully occupied words in ->openfds[], which is what bits in ->fullfds_bits[] correspond to.

The other caller (dupfd()) passes sanefdtablesize(oldfdt, maxfds), which is the smallest multiple of BITSPERLONG that covers all opened descriptors below maxfds. In the common case (copying on fork()) maxfds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expandfdtable() is safe.

Unfortunately, there is a case where maxfds is less than that and where we might, indeed, end up with junk in ->fullfdsbits[] - closerange(from, to, CLOSERANGEUNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONEFILES, get all descriptors in range 0..127 open, then closerange(64, ~0U, CLOSERANGEUNSHARE) and watch dup(0) ending up with descriptor #128, despite #64 being observably not open.

The minimally invasive fix would be to deal with that in dupfd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copyfd_bitmaps() first.

new helper: bitmapcopyandexpand(to, from, bitsto_copy, size).
make copyfdbitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITSPERLONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITSPERLONG for the large ones, so it'll generate plain memcpy()+memset().

Reproducer added to tools/testing/selftests/core/closerangetest.c(CVE-2024-45025)

In the Linux kernel, the following vulnerability has been resolved:

mmc: mmc_test: Fix NULL dereference on allocation failure

If the "test->highmem = allocpages()" allocation fails then calling _free_pages(test->highmem) will result in a NULL dereference. Also change the error code to -ENOMEM instead of returning success.(CVE-2024-45028)

In the Linux kernel, the following vulnerability has been resolved:

drm/amd/display: Skip wbsclsetscaler_filter if filter is null

Callers can pass null in filter (i.e. from returned from the function wbsclgetfiltercoeffs16p) and a null check is added to ensure that is not the case.

This fixes 4 NULL_RETURNS issues reported by Coverity.(CVE-2024-46714)

In the Linux kernel, the following vulnerability has been resolved:

drm/amdgpu: fix ucode out-of-bounds read warning

Clear warning that read ucode[] may out-of-bounds.(CVE-2024-46723)

In the Linux kernel, the following vulnerability has been resolved:

drm/amd/pm: fix the Out-of-bounds read warning

using index i - 1U may beyond element index for mc_data[] when i = 0.(CVE-2024-46731)

In the Linux kernel, the following vulnerability has been resolved:

btrfs: fix qgroup reserve leaks in cowfilerange

In the buffered write path, the dirty page owns the qgroup reserve until it creates an ordered_extent.

Therefore, any errors that occur before the orderedextent is created must free that reservation, or else the space is leaked. The fstest generic/475 exercises various IO error paths, and is able to trigger errors in cowfile_range where we fail to get to allocating the ordered extent. Note that because we do clear delalloc, we are likely to remove the inode from the delalloc list, so the inodes/pages to not have invalidate/launder called on them in the commit abort path.

This results in failures at the unmount stage of the test that look like:

BTRFS: error (device dm-8 state EA) in cleanuptransaction:2018: errno=-5 IO failure BTRFS: error (device dm-8 state EA) in btrfsreplacefileextents:2416: errno=-5 IO failure BTRFS warning (device dm-8 state EA): qgroup 0/5 has unreleased space, type 0 rsv 28672 ------------[ cut here ]------------ WARNING: CPU: 3 PID: 22588 at fs/btrfs/disk-io.c:4333 closectree+0x222/0x4d0 [btrfs] Modules linked in: btrfs blake2bgeneric libcrc32c xor zstdcompress raid6pq CPU: 3 PID: 22588 Comm: umount Kdump: loaded Tainted: G W 6.10.0-rc7-gab56fde445b8 #21 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:closectree+0x222/0x4d0 [btrfs] RSP: 0018:ffffb4465283be00 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffffa1a1818e1000 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffb4465283bbe0 RDI: ffffa1a19374fcb8 RBP: ffffa1a1818e13c0 R08: 0000000100028b16 R09: 0000000000000000 R10: 0000000000000003 R11: 0000000000000003 R12: ffffa1a18ad7972c R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f9168312b80(0000) GS:ffffa1a4afcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f91683c9140 CR3: 000000010acaa000 CR4: 00000000000006f0 Call Trace: <TASK> ? closectree+0x222/0x4d0 [btrfs] ? _warn.cold+0x8e/0xea ? closectree+0x222/0x4d0 [btrfs] ? reportbug+0xff/0x140 ? handlebug+0x3b/0x70 ? excinvalidop+0x17/0x70 ? asmexcinvalidop+0x1a/0x20 ? closectree+0x222/0x4d0 [btrfs] genericshutdownsuper+0x70/0x160 killanonsuper+0x11/0x40 btrfskillsuper+0x11/0x20 [btrfs] deactivatelockedsuper+0x2e/0xa0 cleanupmnt+0xb5/0x150 taskworkrun+0x57/0x80 syscallexittousermode+0x121/0x130 dosyscall64+0xab/0x1a0 entrySYSCALL64after_hwframe+0x77/0x7f RIP: 0033:0x7f916847a887 ---[ end trace 0000000000000000 ]--- BTRFS error (device dm-8 state EA): qgroup reserved space leaked

Cases 2 and 3 in the outreserve path both pertain to this type of leak and must free the reserved qgroup data. Because it is already an error path, I opted not to handle the possible errors in btrfsfreeqgroupdata.(CVE-2024-46733)

In the Linux kernel, the following vulnerability has been resolved:

smb/server: fix potential null-ptr-deref of leasectxinfo in smb2_open()

null-ptr-deref will occur when (reqoplevel == SMB2OPLOCKLEVELLEASE) and parselease_state() return NULL.

Fix this by check if 'leasectxinfo' is NULL.

Additionally, remove the redundant parentheses in parsedurablehandle_context().(CVE-2024-46742)

In the Linux kernel, the following vulnerability has been resolved:

Squashfs: sanity check symbolic link size

Syzkiller reports a "KMSAN: uninit-value in pick_link" bug.

This is caused by an uninitialised page, which is ultimately caused by a corrupted symbolic link size read from disk.

The reason why the corrupted symlink size causes an uninitialised page is due to the following sequence of events:

squashfsreadinode() is called to read the symbolic link from disk. This assigns the corrupted value 3875536935 to inode->i_size.
Later squashfssymlinkread_folio() is called, which assigns this corrupted value to the length variable, which being a signed int, overflows producing a negative number.
The following loop that fills in the page contents checks that the copied bytes is less than length, which being negative means the loop is skipped, producing an uninitialised page.

This patch adds a sanity check which checks that the symbolic link size is not larger than expected.

V2: fix spelling mistake.(CVE-2024-46744)

In the Linux kernel, the following vulnerability has been resolved:

Input: uinput - reject requests with unreasonable number of slots

When exercising uinput interface syzkaller may try setting up device with a really large number of slots, which causes memory allocation failure in inputmtinit_slots(). While this allocation failure is handled properly and request is rejected, it results in syzkaller reports. Additionally, such request may put undue burden on the system which will try to free a lot of memory for a bogus request.

Fix it by limiting allowed number of slots to 100. This can easily be extended if we see devices that can track more than 100 contacts.(CVE-2024-46745)

In the Linux kernel, the following vulnerability has been resolved:

HID: cougar: fix slab-out-of-bounds Read in cougarreportfixup

report_fixup for the Cougar 500k Gaming Keyboard was not verifying that the report descriptor size was correct before accessing it(CVE-2024-46747)

In the Linux kernel, the following vulnerability has been resolved:

btrfs: don't BUGON() when 0 reference count at btrfslookupextentinfo()

Instead of doing a BUG_ON() handle the error by returning -EUCLEAN, aborting the transaction and logging an error message.(CVE-2024-46751)

In the Linux kernel, the following vulnerability has been resolved:

btrfs: replace BUGON() with error handling at updaterefforcow()

Instead of a BUG_ON() just return an error, log an error message and abort the transaction in case we find an extent buffer belonging to the relocation tree that doesn't have the full backref flag set. This is unexpected and should never happen (save for bugs or a potential bad memory).(CVE-2024-46752)

In the Linux kernel, the following vulnerability has been resolved:

userfaultfd: fix checks for huge PMDs

Patch series "userfaultfd: fix races around pmdtranshuge() check", v2.

The pmdtranshuge() code in mfill_atomic() is wrong in three different ways depending on kernel version:

The pmdtranshuge() check is racy and can lead to a BUG_ON() (if you hit the right two race windows) - I've tested this in a kernel build with some extra mdelay() calls. See the commit message for a description of the race scenario. On older kernels (before 6.5), I think the same bug can even theoretically lead to accessing transhuge page contents as a page table if you hit the right 5 narrow race windows (I haven't tested this case).
As pointed out by Qi Zheng, pmdtranshuge() is not sufficient for detecting PMDs that don't point to page tables. On older kernels (before 6.5), you'd just have to win a single fairly wide race to hit this. I've tested this on 6.1 stable by racing migration (with a mdelay() patched into trytomigrate()) against UFFDIOZEROPAGE - on my x86 VM, that causes a kernel oops in ptlockptr().
On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed to yank page tables out from under us (though I haven't tested that), so I think the BUGON() checks in mfillatomic() are just wrong.

I decided to write two separate fixes for these (one fix for bugs 1+2, one fix for bug 3), so that the first fix can be backported to kernels affected by bugs 1+2.

This patch (of 2):

This fixes two issues.

I discovered that the following race can occur:

mfillatomic other thread ============ ============ <zap PMD> pmdpgetlockless() [reads none pmd] <bail if transhuge> <if none:> <pagefault creates transhuge zeropage> _ptealloc [no-op] <zap PMD> <bail if pmdtranshuge(dst_pmd)> BUG_ON(pmd_none(dst_pmd))

I have experimentally verified this in a kernel with extra mdelay() calls; the BUGON(pmdnone(*dst_pmd)) triggers.

On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow pteoffsetmaplock to fail"), this can't lead to anything worse than a BUGON(), since the page table access helpers are actually designed to deal with page tables concurrently disappearing; but on older kernels (<=6.4), I think we could probably theoretically race past the two BUG_ON() checks and end up treating a hugepage as a page table.

The second issue is that, as Qi Zheng pointed out, there are other types of huge PMDs that pmdtranshuge() can't catch: devmap PMDs and swap PMDs (in particular, migration PMDs).

On <=6.4, this is worse than the first issue: If mfillatomic() runs on a PMD that contains a migration entry (which just requires winning a single, fairly wide race), it will pass the PMD to pteoffsetmaplock(), which assumes that the PMD points to a page table.

Breakage follows: First, the kernel tries to take the PTE lock (which will crash or maybe worse if there is no "struct page" for the address bits in the migration entry PMD - I think at least on X86 there usually is no corresponding "struct page" thanks to the PTE inversion mitigation, amd64 looks different).

If that didn't crash, the kernel would next try to write a PTE into what it wrongly thinks is a page table.

As part of fixing these issues, get rid of the check for pmdtranshuge() before _ptealloc() - that's redundant, we're going to have to check for that after the _ptealloc() anyway.

Backport note: pmdpgetlockless() is pmdreadatomic() in older kernels.(CVE-2024-46787)

In the Linux kernel, the following vulnerability has been resolved:

sch/netem: fix use after free in netem_dequeue

If netemdequeue() enqueues packet to inner qdisc and that qdisc returns _NETXMITSTOLEN. The packet is dropped but qdisctreereduce_backlog() is not called to update the parent's q.qlen, leading to the similar use-after-free as Commit e04991a48dbaf382 ("netem: fix return value if duplicate enqueue fails")

Commands to trigger KASAN UaF:

ip link add type dummy ip link set lo up ip link set dummy0 up tc qdisc add dev lo parent root handle 1: drr tc filter add dev lo parent 1: basic classid 1:1 tc class add dev lo classid 1:1 drr tc qdisc add dev lo parent 1:1 handle 2: netem tc qdisc add dev lo parent 2: handle 3: drr tc filter add dev lo parent 3: basic classid 3:1 action mirred egress redirect dev dummy0 tc class add dev lo classid 3:1 drr ping -c1 -W0.01 localhost # Trigger bug tc class del dev lo classid 1:1 tc class add dev lo classid 1:1 drr ping -c1 -W0.01 localhost # UaF(CVE-2024-46800)

Database specific

{
    "severity": "High"
}

References

Affected packages

openEuler:22.03-LTS-SP4 / kernel

Package

Name: kernel
Purl: pkg:rpm/openEuler/kernel&distro=openEuler-22.03-LTS-SP4

Affected ranges

Type: ECOSYSTEM
Events: Introduced

0Unknown introduced version / All previous versions are affected

Fixed

5.10.0-230.0.0.129.oe2203sp4

Ecosystem specific

{
    "x86_64": [
        "bpftool-5.10.0-230.0.0.129.oe2203sp4.x86_64.rpm",
        "bpftool-debuginfo-5.10.0-230.0.0.129.oe2203sp4.x86_64.rpm",
        "kernel-5.10.0-230.0.0.129.oe2203sp4.x86_64.rpm",
        "kernel-debuginfo-5.10.0-230.0.0.129.oe2203sp4.x86_64.rpm",
        "kernel-debugsource-5.10.0-230.0.0.129.oe2203sp4.x86_64.rpm",
        "kernel-devel-5.10.0-230.0.0.129.oe2203sp4.x86_64.rpm",
        "kernel-headers-5.10.0-230.0.0.129.oe2203sp4.x86_64.rpm",
        "kernel-source-5.10.0-230.0.0.129.oe2203sp4.x86_64.rpm",
        "kernel-tools-5.10.0-230.0.0.129.oe2203sp4.x86_64.rpm",
        "kernel-tools-debuginfo-5.10.0-230.0.0.129.oe2203sp4.x86_64.rpm",
        "kernel-tools-devel-5.10.0-230.0.0.129.oe2203sp4.x86_64.rpm",
        "perf-5.10.0-230.0.0.129.oe2203sp4.x86_64.rpm",
        "perf-debuginfo-5.10.0-230.0.0.129.oe2203sp4.x86_64.rpm",
        "python3-perf-5.10.0-230.0.0.129.oe2203sp4.x86_64.rpm",
        "python3-perf-debuginfo-5.10.0-230.0.0.129.oe2203sp4.x86_64.rpm"
    ],
    "aarch64": [
        "bpftool-5.10.0-230.0.0.129.oe2203sp4.aarch64.rpm",
        "bpftool-debuginfo-5.10.0-230.0.0.129.oe2203sp4.aarch64.rpm",
        "kernel-5.10.0-230.0.0.129.oe2203sp4.aarch64.rpm",
        "kernel-debuginfo-5.10.0-230.0.0.129.oe2203sp4.aarch64.rpm",
        "kernel-debugsource-5.10.0-230.0.0.129.oe2203sp4.aarch64.rpm",
        "kernel-devel-5.10.0-230.0.0.129.oe2203sp4.aarch64.rpm",
        "kernel-headers-5.10.0-230.0.0.129.oe2203sp4.aarch64.rpm",
        "kernel-source-5.10.0-230.0.0.129.oe2203sp4.aarch64.rpm",
        "kernel-tools-5.10.0-230.0.0.129.oe2203sp4.aarch64.rpm",
        "kernel-tools-debuginfo-5.10.0-230.0.0.129.oe2203sp4.aarch64.rpm",
        "kernel-tools-devel-5.10.0-230.0.0.129.oe2203sp4.aarch64.rpm",
        "perf-5.10.0-230.0.0.129.oe2203sp4.aarch64.rpm",
        "perf-debuginfo-5.10.0-230.0.0.129.oe2203sp4.aarch64.rpm",
        "python3-perf-5.10.0-230.0.0.129.oe2203sp4.aarch64.rpm",
        "python3-perf-debuginfo-5.10.0-230.0.0.129.oe2203sp4.aarch64.rpm"
    ],
    "src": [
        "kernel-5.10.0-230.0.0.129.oe2203sp4.src.rpm"
    ]
}

Database specific

source

"https://repo.openeuler.org/security/data/osv/OESA-2024-2182.json"