In the Linux kernel, the following vulnerability has been resolved:
btrfs: do proper folio cleanup when run_delalloc_nocow() failed
[BUG]
With CONFIG_DEBUG_VM set, test case generic/476 has some chance to crash with the following VM_BUG_ON_FOLIO():
 BTRFS error (device dm-3): cow_file_range failed, start 1146880 end 1253375 len 106496 ret -28
 BTRFS error (device dm-3): run_delalloc_nocow failed, start 1146880 end 1253375 len 106496 ret -28
 page: refcount:4 mapcount:0 mapping:00000000592787cc index:0x12 pfn:0x10664
 aops:btrfs_aops [btrfs] ino:101 dentry name(?):"f1774"
 flags: 0x2fffff80004028(uptodate|lru|private|node=0|zone=2|lastcpupid=0xfffff)
 page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio))
 ------------[ cut here ]------------
 kernel BUG at mm/page-writeback.c:2992!
 Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
 CPU: 2 UID: 0 PID: 3943513 Comm: kworker/u24:15 Tainted: G OE 6.12.0-rc7-custom+ #87
 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
 Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
 Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
 pc : folio_clear_dirty_for_io+0x128/0x258
 lr : folio_clear_dirty_for_io+0x128/0x258
 Call trace:
  folio_clear_dirty_for_io+0x128/0x258
  btrfs_folio_clamp_clear_dirty+0x80/0xd0 [btrfs]
  __process_folios_contig+0x154/0x268 [btrfs]
  extent_clear_unlock_delalloc+0x5c/0x80 [btrfs]
  run_delalloc_nocow+0x5f8/0x760 [btrfs]
  btrfs_run_delalloc_range+0xa8/0x220 [btrfs]
  writepage_delalloc+0x230/0x4c8 [btrfs]
  extent_writepage+0xb8/0x358 [btrfs]
  extent_write_cache_pages+0x21c/0x4e8 [btrfs]
  btrfs_writepages+0x94/0x150 [btrfs]
  do_writepages+0x74/0x190
  filemap_fdatawrite_wbc+0x88/0xc8
  start_delalloc_inodes+0x178/0x3a8 [btrfs]
  btrfs_start_delalloc_roots+0x174/0x280 [btrfs]
  shrink_delalloc+0x114/0x280 [btrfs]
  flush_space+0x250/0x2f8 [btrfs]
  btrfs_async_reclaim_data_space+0x180/0x228 [btrfs]
  process_one_work+0x164/0x408
  worker_thread+0x25c/0x388
  kthread+0x100/0x118
  ret_from_fork+0x10/0x20
 Code: 910a8021 a90363f7 a9046bf9 94012379 (d4210000)
 ---[ end trace 0000000000000000 ]---
[CAUSE]
The first two lines of extra debug messages show the problem is caused by the error handling of run_delalloc_nocow().
E.g. we have the following dirtied range (4K block size, 4K page size):
    0                 16K                  32K
    |//////////////////////////////////////|
    |  Pre-allocated  |
And the range [0, 16K) has a preallocated extent.
- Enter run_delalloc_nocow() for range [0, 16K)
  It finds that range [0, 16K) is preallocated, so it can do the proper
  NOCOW write.

- Enter fallback_to_cow() for range [16K, 32K)
  Since the range [16K, 32K) is not backed by a preallocated extent, we
  have to go COW.

- cow_file_range() failed for range [16K, 32K)
  So cow_file_range() will do the cleanup by clearing folio dirty and
  unlocking the folios.

  Now the folios in range [16K, 32K) are unlocked.

- Enter extent_clear_unlock_delalloc() from run_delalloc_nocow()
  It is called with PAGE_START_WRITEBACK to start page writeback.
  But a folio can only be marked writeback while it is properly locked,
  thus this triggered the VM_BUG_ON_FOLIO().
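
The sequence above can be replayed with a small userspace model. This is only a sketch for illustration, not kernel code: struct model_folio and every model_*() function below are hypothetical stand-ins, and the model assumes all folios of the range are locked and dirty on entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in; not the kernel's struct folio. */
struct model_folio {
	bool locked;
	bool dirty;
};

#define NR_FOLIOS   8	/* [0, 32K) with 4K folios */
#define COW_START   4	/* folios [4, 8) model the COW range [16K, 32K) */

/* Models the cleanup of a failed COW: clear dirty, then unlock. */
void model_failed_cow_cleanup(struct model_folio *f, size_t first, size_t last)
{
	assert(first <= last && last <= NR_FOLIOS);
	for (size_t i = first; i < last; i++) {
		f[i].dirty = false;
		f[i].locked = false;
	}
}

/*
 * Models the old error path, which walks the whole range and assumes
 * every folio is still locked. Returns how many folios would have
 * tripped the "folio must be locked" check.
 */
size_t model_old_error_path(struct model_folio *f, size_t first, size_t last)
{
	size_t violations = 0;

	assert(first <= last && last <= NR_FOLIOS);
	for (size_t i = first; i < last; i++) {
		if (!f[i].locked)
			violations++;
		f[i].dirty = false;
		f[i].locked = false;
	}
	return violations;
}

/* Replays the steps: COW fails for [16K, 32K), then the old error path runs. */
size_t model_bug_sequence(void)
{
	struct model_folio f[NR_FOLIOS];

	/* Assumption of the model: all folios start locked and dirty. */
	for (size_t i = 0; i < NR_FOLIOS; i++)
		f[i] = (struct model_folio){ .locked = true, .dirty = true };

	model_failed_cow_cleanup(f, COW_START, NR_FOLIOS);
	return model_old_error_path(f, 0, NR_FOLIOS);
}
```

In this model, model_bug_sequence() reports one violation for each of the four folios in [16K, 32K), which is the situation the VM_BUG_ON_FOLIO() catches in the real kernel.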
Furthermore, there is another hidden but common bug: run_delalloc_nocow() does not clear the folio dirty flags in its error handling path. This bug is shared between run_delalloc_nocow() and cow_file_range().
[FIX]
- Clear folio dirty for range [@start, @cur_offset)
  Introduce a helper, cleanup_dirty_folios(), which will find and lock
  the folios in the range, clear the dirty flag and start/end the
  writeback, with extra handling for the @locked_folio.

- Introduce a helper to clear folio dirty, start and end writeback

- Introduce a helper to record the last failed COW range end
  This tracks which range we should skip, to avoid double unlocking.

- Skip the failed COW range for the e ---truncated---
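
The cleanup order described in the [FIX] list can be sketched in userspace as below. This is a model under stated assumptions, not the actual patch: struct model_folio, the model_*() functions and the last_failed_cow_end variable are hypothetical, the folios of the finished NOCOW range are assumed to be unlocked but still dirty (except the one modeling @locked_folio), and the two folios past the failed COW range are added purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in; not the kernel's struct folio. */
struct model_folio {
	bool locked;
	bool dirty;
	bool writeback;
};

#define NR_FOLIOS   10	/* 8 folios for [0, 32K) plus 2 illustrative extras */
#define COW_START   4	/* folios [4, 8) model the failed COW range */
#define COW_END     8
#define LOCKED_IDX  0	/* models @locked_folio, still locked by the caller */

/* Clear dirty and cycle writeback; returns 1 if the folio was not locked. */
int model_clear_dirty_locked(struct model_folio *folio)
{
	int violation = !folio->locked;

	folio->dirty = false;
	folio->writeback = true;	/* start writeback */
	folio->writeback = false;	/* end writeback */
	return violation;
}

/*
 * Modeled on the description of cleanup_dirty_folios(): lock each folio,
 * clean it, then unlock it. The folio at locked_idx is already locked
 * by the caller and is left locked.
 */
int model_cleanup_dirty_folios(struct model_folio *f, size_t first,
			       size_t last, size_t locked_idx)
{
	int violations = 0;

	assert(first <= last && last <= NR_FOLIOS);
	for (size_t i = first; i < last; i++) {
		if (i != locked_idx)
			f[i].locked = true;
		violations += model_clear_dirty_locked(&f[i]);
		if (i != locked_idx)
			f[i].locked = false;
	}
	return violations;
}

/* Returns lock violations plus folios left dirty; 0 means a clean exit. */
int model_fixed_error_path(void)
{
	struct model_folio f[NR_FOLIOS];
	size_t last_failed_cow_end;
	int problems = 0;

	for (size_t i = 0; i < NR_FOLIOS; i++) {
		f[i].writeback = false;
		if (i < COW_START) {		/* NOCOW done: unlocked, dirty */
			f[i].locked = (i == LOCKED_IDX);
			f[i].dirty = true;
		} else if (i < COW_END) {	/* failed COW: already cleaned */
			f[i].locked = false;
			f[i].dirty = false;
		} else {			/* untouched: locked, dirty */
			f[i].locked = true;
			f[i].dirty = true;
		}
	}
	last_failed_cow_end = COW_END;	/* recorded when COW failed */

	/* Clean [start, cur_offset) with proper locking. */
	problems += model_cleanup_dirty_folios(f, 0, COW_START, LOCKED_IDX);

	/* Skip [COW_START, last_failed_cow_end) to avoid double unlocking. */
	for (size_t i = last_failed_cow_end; i < NR_FOLIOS; i++) {
		problems += model_clear_dirty_locked(&f[i]);
		f[i].locked = false;
	}

	for (size_t i = 0; i < NR_FOLIOS; i++)
		problems += f[i].dirty;
	return problems;
}
```

With the failed COW range skipped and the NOCOW range re-locked before it is cleaned, model_fixed_error_path() finds no folio touched without its lock and no folio left dirty.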