In the Linux kernel, the following vulnerability has been resolved:
mm: gup: stop abusing try_grab_folio
A kernel warning was reported when pinning a folio in CMA memory while launching an SEV virtual machine. The splat looks like:
[ 464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520
[ 464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6
[ 464.325477] RIP: 0010:__get_user_pages+0x423/0x520
[ 464.325515] Call Trace:
[ 464.325520]  <TASK>
[ 464.325523]  ? __get_user_pages+0x423/0x520
[ 464.325528]  ? __warn+0x81/0x130
[ 464.325536]  ? __get_user_pages+0x423/0x520
[ 464.325541]  ? report_bug+0x171/0x1a0
[ 464.325549]  ? handle_bug+0x3c/0x70
[ 464.325554]  ? exc_invalid_op+0x17/0x70
[ 464.325558]  ? asm_exc_invalid_op+0x1a/0x20
[ 464.325567]  ? __get_user_pages+0x423/0x520
[ 464.325575]  __gup_longterm_locked+0x212/0x7a0
[ 464.325583]  internal_get_user_pages_fast+0xfb/0x190
[ 464.325590]  pin_user_pages_fast+0x47/0x60
[ 464.325598]  sev_pin_memory+0xca/0x170 [kvm_amd]
[ 464.325616]  sev_mem_enc_register_region+0x81/0x130 [kvm_amd]
Per the analysis done by yangge, starting the SEV virtual machine calls pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin the memory. But the page is in the CMA area, so fast GUP fails and falls back to the slow path because of the long-term pinnable check in try_grab_folio().
The slow path tries to pin the pages and then migrate them out of the CMA area. But the slow path also uses try_grab_folio() to pin the page, so it fails on the same check, and the above warning is triggered.
In addition, try_grab_folio() is meant to be used in the fast path, where it elevates the folio refcount with an add-ref-unless-zero operation. In the slow path we are guaranteed to hold at least one stable reference, so a simple atomic add can be used instead. The performance difference should be trivial, but the misuse is confusing and misleading.
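The difference between the two refcount styles can be illustrated with a small userspace model. This is only a sketch using C11 atomics: `struct model_folio`, `model_ref_add_unless_zero()` and `model_ref_add()` are hypothetical names invented for illustration and are not the kernel's folio API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a folio; not the kernel's struct folio. */
struct model_folio {
    atomic_int refcount;
};

/*
 * Fast-path style: take references only if the count is currently non-zero.
 * Without a stable reference the object could be on its way to being freed,
 * so a compare-and-swap loop avoids resurrecting a zero refcount.
 */
static bool model_ref_add_unless_zero(struct model_folio *f, int refs)
{
    int old = atomic_load(&f->refcount);

    while (old != 0) {
        /* On failure, 'old' is updated to the value currently stored. */
        if (atomic_compare_exchange_weak(&f->refcount, &old, old + refs))
            return true;
    }
    return false;
}

/*
 * Slow-path style: the caller already holds a stable reference, so the
 * count cannot be zero and an unconditional atomic add is enough.
 */
static void model_ref_add(struct model_folio *f, int refs)
{
    atomic_fetch_add(&f->refcount, refs);
}
```

In this model, the conditional variant refuses to touch an object whose count has already dropped to zero, while the unconditional variant relies on the caller's existing reference for safety.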
Redefine try_grab_folio() as try_grab_folio_fast(), and try_grab_page() as try_grab_folio(), and use them in the proper paths. This solves both the abuse and the kernel warning.
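As a rough illustration of why separating the helpers removes the warning, the following userspace model contrasts a slow path that reuses the fast-path helper with one that uses its own helper. It is a simplified sketch under stated assumptions: every `model_*` / `MODEL_*` name is hypothetical, the `in_cma` flag stands in for the long-term pinnable check, and the later migration out of CMA is not modelled. It does not reproduce the kernel's actual function bodies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model definitions; none of these are kernel definitions. */
#define MODEL_FOLL_LONGTERM 0x1u
#define MODEL_EFAULT        14

struct model_folio {
    int  refcount;
    bool in_cma;    /* stands in for "not long-term pinnable" */
};

/* Fast-path helper: refuses long-term pins of CMA-resident folios. */
static struct model_folio *
model_try_grab_folio_fast(struct model_folio *f, int refs, unsigned int flags)
{
    if (f->refcount == 0)
        return NULL;                /* models add-ref-unless-zero failing */
    if ((flags & MODEL_FOLL_LONGTERM) && f->in_cma)
        return NULL;                /* caller must fall back to slow path */
    f->refcount += refs;
    return f;
}

/* Slow-path helper: caller holds a reference; no pinnable check here. */
static int model_try_grab_folio(struct model_folio *f, int refs)
{
    if (f->refcount <= 0)
        return -MODEL_EFAULT;       /* sanity check only */
    f->refcount += refs;
    return 0;
}

/*
 * Model of a long-term pin: try the fast path, then fall back.
 * 'slow_reuses_fast_helper' selects the behaviour described as the bug.
 */
static int model_pin_longterm(struct model_folio *f, bool slow_reuses_fast_helper)
{
    unsigned int flags = MODEL_FOLL_LONGTERM;

    if (model_try_grab_folio_fast(f, 1, flags))
        return 0;

    if (slow_reuses_fast_helper) {
        /* Same check fails again; the real code hits WARN_ON_ONCE here. */
        if (!model_try_grab_folio_fast(f, 1, flags))
            return -MODEL_EFAULT;
        return 0;
    }

    return model_try_grab_folio(f, 1);
}
```

For a folio with `in_cma` set, the model returns `-MODEL_EFAULT` when the slow path reuses the fast helper and 0 when it uses the dedicated slow-path helper.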
The proper naming makes their use cases clearer and should prevent similar abuse in the future.
peterx said:
: The user will see the pin fails, for gup-slow it further triggers the WARN
: right below that failure (as in the original report):
:
:         folio = try_grab_folio(page, page_increm - 1,
:                                foll_flags);
:         if (WARN_ON_ONCE(!folio)) { <------------------------ here
:                 /*
:                  * Release the 1st page ref if the
:                  * folio is problematic, fail hard.
:                  */
:                 gup_put_folio(page_folio(page), 1,
:                               foll_flags);
:                 ret = -EFAULT;
:                 goto out;
:         }
[1] https://lore.kernel.org/linux-mm/1719478388-31917-1-git-send-email-yangge1116@126.com/
[shy828301@gmail.com: fix implicit declaration of function try_grab_folio_fast]
Link: https://lkml.kernel.org/r/CAHbLzkowMSso-4Nufc9hcMehQsK9PNz3OSu-+eniU-2Mm-xjhA@mail.gmail.com