In the Linux kernel, the following vulnerability has been resolved:
fs: Prevent file descriptor table allocations exceeding INT_MAX
When sysctl_nr_open is set to a very high value (for example, 1073741816 as set by systemd), processes attempting to use file descriptors near the limit can trigger massive memory allocation attempts that exceed INT_MAX, resulting in a WARNING in mm/slub.c:
WARNING: CPU: 0 PID: 44 at mm/slub.c:5027 __kvmalloc_node_noprof+0x21a/0x288
This happens because kvmalloc_array() and kvmalloc() check whether the requested size exceeds INT_MAX and emit a warning when the allocation is not flagged with __GFP_NOWARN.
Specifically, when nr_open is set to 1073741816 (0x3ffffff8) and a process calls dup2(oldfd, 1073741880), the kernel attempts to allocate:
- File descriptor array: 1073741880 * 8 bytes = 8,589,935,040 bytes
- Multiple bitmaps: ~400MB
- Total allocation size: > 8GB (exceeding INT_MAX = 2,147,483,647)
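The arithmetic behind the figures above can be checked with a small userspace sketch. It is only an illustration, not kernel code: the helper names and the 8-byte slot size (one pointer per descriptor on a 64-bit system) are assumptions, and the bitmaps are left out because the descriptor array alone already exceeds INT_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumption: one 8-byte pointer per descriptor slot (64-bit system). */
#define ASSUMED_SLOT_BYTES 8u

/* Hypothetical helper: bytes needed for an array of nr_fds descriptor slots. */
static uint64_t fd_array_bytes(uint64_t nr_fds)
{
    /* Guard the multiplication against 64-bit overflow. */
    assert(nr_fds <= UINT64_MAX / ASSUMED_SLOT_BYTES);
    return nr_fds * ASSUMED_SLOT_BYTES;
}

/* Hypothetical helper: 1 if the byte count is above INT_MAX, else 0. */
static int exceeds_int_max(uint64_t bytes)
{
    return bytes > (uint64_t)INT_MAX;
}
```

Calling fd_array_bytes(1073741880) yields 8589935040, which is roughly four times INT_MAX, so exceeds_int_max() reports 1 for it.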
Reproducer:
1. Set /proc/sys/fs/nr_open to 1073741816:
   # echo 1073741816 > /proc/sys/fs/nr_open

2. Run a program that uses a high file descriptor:
   #include <sys/resource.h>
   #include <unistd.h>

   int main() {
       struct rlimit rlim = {1073741824, 1073741824};
       setrlimit(RLIMIT_NOFILE, &rlim);
       dup2(2, 1073741880);  // Triggers the warning
       return 0;
   }

3. Observe WARNING in dmesg at mm/slub.c:5027
systemd commit a8b627a introduced automatic bumping of fs.nr_open to the maximum possible value. The rationale was that systems with memory control groups (memcg) no longer need separate file descriptor limits since memory is properly accounted. However, this change overlooked that the kernel's allocation functions still warn on requests larger than INT_MAX, regardless of memcg accounting.
systemd's algorithm starts with INT_MAX and keeps halving the value until the kernel accepts it. On most systems, this results in nr_open being set to 1073741816 (0x3ffffff8), which is just under 2^30 (about 1.07 billion) file descriptors.
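The halving search can be modelled in userspace as follows. This is a sketch under stated assumptions, not systemd's code: the ceiling ASSUMED_NR_OPEN_MAX (INT_MAX rounded down to a multiple of 64) and the rounding of each candidate down to a multiple of 8 are not given in the text above. They are chosen here because together they reproduce the reported value. kernel_accepts() is a hypothetical stand-in for the sysctl write.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: ceiling the kernel applies to fs.nr_open on a 64-bit system. */
#define ASSUMED_NR_OPEN_MAX (INT_MAX & ~63)

/* Hypothetical stand-in for writing the sysctl: 1 if accepted, 0 if rejected. */
static int kernel_accepts(int value, int kernel_max)
{
    return value <= kernel_max;
}

/* Start from `start` and halve until the (modelled) kernel accepts the value. */
static int probe_nr_open(int start, int kernel_max)
{
    assert(start > 0 && kernel_max >= 8);
    int v = start;
    for (;;) {
        v = (v / 8) * 8;            /* assumption: round down to a multiple of 8 */
        if (kernel_accepts(v, kernel_max))
            return v;
        v /= 2;                     /* rejected: try half the value */
    }
}
```

Under these assumptions, probe_nr_open(INT_MAX, ASSUMED_NR_OPEN_MAX) rejects the first candidate and settles on 1073741816 (0x3ffffff8) after a single halving.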
While processes rarely use file descriptors near this limit in normal operation, certain selftests (like tools/testing/selftests/core/unshare_test.c) and programs that test file descriptor limits can trigger this issue.
Fix this by adding a check in alloc_fdtable() to ensure the requested allocation size does not exceed INT_MAX. This causes the operation to fail with -EMFILE instead of triggering a kernel warning and avoids the impractical >8GB memory allocation request.
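A userspace model of such a guard might look like the sketch below. It is not the kernel patch itself: check_fdtable_size() and its slot_size parameter are hypothetical names, and only the descriptor array is considered. The comparison divides INT_MAX by the slot size rather than multiplying, so the check itself cannot overflow.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical model of the size check: returns 0 when an array of `nr`
 * slots of `slot_size` bytes fits within INT_MAX bytes, -EMFILE otherwise.
 */
static int check_fdtable_size(size_t nr, size_t slot_size)
{
    assert(slot_size > 0);
    /* Division form avoids overflowing nr * slot_size. */
    if (nr > (size_t)INT_MAX / slot_size)
        return -EMFILE;
    return 0;
}
```

With 8-byte slots, the largest table that passes has 268435455 entries. The request from the reproducer, 1073741880 entries, is rejected with -EMFILE before any allocation is attempted.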