In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix deadlock when aborting transaction during relocation with scrub
Before relocating a block group we pause scrub, do the relocation, and then unpause scrub. The relocation process requires starting and committing a transaction, and if a failure occurs in the critical section of the transaction commit path (transaction state >= TRANS_STATE_COMMIT_START), we deadlock if there is a paused scrub.
That results in stack traces like the following:
[42.479] BTRFS info (device sdc): relocating block group 53876686848 flags metadata|raid6
[42.936] BTRFS warning (device sdc): Skipping commit of aborted transaction.
[42.936] ------------[ cut here ]------------
[42.936] BTRFS: Transaction aborted (error -28)
[42.936] WARNING: CPU: 11 PID: 346822 at fs/btrfs/transaction.c:1977 btrfs_commit_transaction+0xcc8/0xeb0 [btrfs]
[42.936] Modules linked in: dm_flakey dm_mod loop btrfs (...)
[42.936] CPU: 11 PID: 346822 Comm: btrfs Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1
[42.936] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[42.936] RIP: 0010:btrfs_commit_transaction+0xcc8/0xeb0 [btrfs]
[42.936] Code: ff ff 45 8b (...)
[42.936] RSP: 0018:ffffb58649633b48 EFLAGS: 00010282
[42.936] RAX: 0000000000000000 RBX: ffff8be6ef4d5bd8 RCX: 0000000000000000
[42.936] RDX: 0000000000000002 RSI: ffffffffb35e7782 RDI: 00000000ffffffff
[42.936] RBP: ffff8be6ef4d5c98 R08: 0000000000000000 R09: ffffb586496339e8
[42.936] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8be6d38c7c00
[42.936] R13: 00000000ffffffe4 R14: ffff8be6c268c000 R15: ffff8be6ef4d5cf0
[42.936] FS: 00007f381a82b340(0000) GS:ffff8beddfcc0000(0000) knlGS:0000000000000000
[42.936] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42.936] CR2: 00007f1e35fb7638 CR3: 0000000117680006 CR4: 0000000000370ee0
[42.936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[42.936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[42.936] Call Trace:
[42.936] <TASK>
[42.936] ? start_transaction+0xcb/0x610 [btrfs]
[42.936] prepare_to_relocate+0x111/0x1a0 [btrfs]
[42.936] relocate_block_group+0x57/0x5d0 [btrfs]
[42.936] ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs]
[42.936] btrfs_relocate_block_group+0x248/0x3c0 [btrfs]
[42.936] ? __pfx_autoremove_wake_function+0x10/0x10
[42.936] btrfs_relocate_chunk+0x3b/0x150 [btrfs]
[42.936] btrfs_balance+0x8ff/0x11d0 [btrfs]
[42.936] ? __kmem_cache_alloc_node+0x14a/0x410
[42.936] btrfs_ioctl+0x2334/0x32c0 [btrfs]
[42.937] ? mod_objcg_state+0xd2/0x360
[42.937] ? refill_obj_stock+0xb0/0x160
[42.937] ? seq_release+0x25/0x30
[42.937] ? __rseq_handle_notify_resume+0x3b5/0x4b0
[42.937] ? percpu_counter_add_batch+0x2e/0xa0
[42.937] ? __x64_sys_ioctl+0x88/0xc0
[42.937] __x64_sys_ioctl+0x88/0xc0
[42.937] do_syscall_64+0x38/0x90
[42.937] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[42.937] RIP: 0033:0x7f381a6ffe9b
[42.937] Code: 00 48 89 44 24 (...)
[42.937] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[42.937] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b
[42.937] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003
[42.937] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000
[42.937] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423
[42.937] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148
[42.937] </TASK>
[42.937] ---[ end trace 0000000000000000 ]---
[42.937] BTRFS: error (device sdc: state A) in cleanup_transaction:1977: errno=-28 No space left
[59.196] INFO: task btrfs:346772 blocked for more than 120 seconds.
[59.196] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1
[59.196] "echo 0 > /proc/sys/kernel/hung ---truncated---
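The pause, relocate, unpause ordering described at the top of this entry can be illustrated with a small user-space model. This is only a sketch and not kernel code: the struct and every model_* function are hypothetical names invented for illustration. It also assumes, which the text above does not state, that the transaction abort path waits for a running scrub to finish. Under that assumption a paused scrub can never finish, so the abort path never returns and the unpause step is never reached.

```c
#include <stdbool.h>

/* Hypothetical model of the state involved; not the kernel's data structures. */
struct scrub_model {
    int pause_requests;  /* > 0 means relocation has asked scrub to stay paused */
    bool scrub_running;  /* a scrub was in progress when relocation started */
};

/* Relocation asks scrub to pause before it touches the block group. */
static void model_pause_scrub(struct scrub_model *m)
{
    m->pause_requests++;
}

/* Relocation lets scrub continue once it is done. */
static void model_unpause_scrub(struct scrub_model *m)
{
    m->pause_requests--;
}

/* A paused scrub cannot make progress, so it cannot finish. */
static bool model_scrub_can_finish(const struct scrub_model *m)
{
    return m->pause_requests == 0;
}

/*
 * ASSUMPTION (not stated in the text above): the abort path waits for a
 * running scrub to finish. Returns true when that wait could never complete.
 */
static bool model_abort_would_hang(const struct scrub_model *m)
{
    return m->scrub_running && !model_scrub_can_finish(m);
}

/*
 * Models pause -> relocate -> unpause. Returns 0 on success, or -1 at the
 * point where the real task would block forever instead of returning.
 */
static int model_relocate(struct scrub_model *m, bool commit_fails)
{
    model_pause_scrub(m);
    if (commit_fails && model_abort_would_hang(m))
        return -1;  /* the unpause below is never reached */
    model_unpause_scrub(m);
    return 0;
}
```

In this model, calling model_relocate() with commit_fails set to true while scrub_running is true returns -1 and leaves pause_requests at 1, which corresponds to the hung task reported in the log. With commit_fails set to false the function returns 0 and the pause request is released.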