vhost-net: guest to host kernel escape during migration
A buffer overflow vulnerability was found in the networking virtualization functionality (vhost-net) that could be abused during live migration of virtual machines. A privileged guest user may pass descriptors with invalid length to the host when live migration is underway to crash the host kernel or, potentially, escalate their privileges on the host.
The warning in mem_cgroup_reparent_charges() was triggered too early and too often in certain cases.
kvm: potential system hang due to an error in mmu_shrink_scan().
nfs: NULL pointer dereference due to an anomalized NFS message sequence.
An attacker, who is able to mount an exported NFS filesystem, is able to trigger a null pointer dereference by using an invalid NFS sequence. This can panic the machine and deny access to the NFS server. Any outstanding disk writes to the NFS server will be lost.
fuse_kio_pcs: kernel crash in pcs_sockio_xmit().
Processes could get stuck in copy_net_ns() forever.
vziolimit: kernel crash due to a division by zero in throttle_charge().
mem_cgroup_reparent_charges() could get stuck while holding cgroup_mutex and make the whole system hang.
kvm: inefficient memory shrinking for VMs.
It was discovered that a node with dozens of CPU cores, lots of RAM and many VMs running could get into a situation when almost all CPU cores were busy in mmu_shrink_scan(). This could happen because memory shrinking was done under kvm_lock spinlock and only for one VM at a time. All CPU cores but one just waited for kvm_lock in such cases, while the last one was busy with the actual memory shrinking for a VM.
fuse_kio_pcs: latency was calculated incorrectly.
It was found that the in-kernel implementation of Virtuozzo Storage client stored latency values in milliseconds rather than in microseconds, resulting in bogus statistics data.
tcp: integer overflow while processing SACK blocks allows remote denial of service.
An integer overflow was found in the way the Linux kernel's networking subsystem processed TCP Selective Acknowledgment (SACK) segments. While processing SACK segments, the Linux kernel's socket buffer (SKB) data structure becomes fragmented. Each fragment is about TCP maximum segment size (MSS) bytes. To efficiently process SACK blocks, the Linux kernel merges multiple fragmented SKBs into one, potentially overflowing the variable holding the number of segments. A remote attacker could use this flaw to crash the Linux kernel by sending a crafted sequence of SACK segments on a TCP connection with small value of TCP MSS, resulting in a denial of service.
OOM killer would kill tasks from cgroups without memory guarantees first.
If the amount of free memory is low, OOM killer would kill the tasks from cgroups without memory guarantees first. However, it seems more reasonable to kill the tasks from cgroups exceeding their guarantees the most.
virtio_scsi: a race condition in the Linux block layer could cause certain I/O requests to hang.
ploop: kernel crash in ploop_congested().
ext4: inode tables created during online resize were not zeroed.
It was discovered that inode tables created during online resize of an ext4 filesystem were not zeroed after that. This could potentially result in lower performance of the filesystem.
Windows Server 2016 Essentials failed to install into a QEMU VM with disabled PMU.
It was found that if no PMU counters were exposed to guest, KVM skipped the whole remaining PMU-related initialization, including filling of LBR-related data. As it turned out, Windows Server 2016 Essentials tried to access these data during the installation and failed to install as a result.
ploop: 'pcompact' could hang if run simultaneously with 'ploop-balloon status'
Memory leak in the implementation of IPv4 routing.
It was discovered that a certain sequence of operations related to IPv4 routing could trigger a kernel memory leak. An attacker could potentially exploit that from a container to cause a denial of service.
KVM: potential use-after-free via kvm_ioctl_create_device().
A use-after-free vulnerability was found in the way KVM implements its device control API. When a device is created via kvm_ioctl_create_device(), it holds a reference to a VM object. This reference is transferred to file descriptor table of the caller. If such file descriptor was closed, reference count to the VM object could become zero, which could lead to a use-after-free issue. A user/process could use this flaw to crash the guest VM resulting in a denial of service or, potentially, gain privileged access to a system.
KVM: use-after-free in the emulation of the preemption timer for the L2 guest systems.
A use-after-free vulnerability was found in the way KVM emulates a preemption timer for L2 guests when nested virtualization is enabled. A guest user/process could use this flaw to crash the host kernel resulting in a denial of service or, potentially, gain privileged access to a system.
I/O errors were reported after a successful replacement of the ploop images.
'ploop replace' did not clear 'abort' flag.
It was found that if a ploop image was revoked and then replaced using 'ploop replace', 'abort' flag was not cleared. As a result, subsequent I/O operations would fail.
ploop: potential data corruption due to a race between 'prepare_merge' and 'submit_alloc' operations.
vzstat shows incorrect per-CT scheduling latency (MLAT).
High order page allocations were made in neigh_probe() in certain cases.
High order page allocations were triggered by CRIU while restoring TCP sockets.
Network performance issues due to the usage of pfmemalloc reserves.
It was discovered that network drivers could allocate memory for the socket buffers from pfmemalloc memory reserves, even when it was unnecessary. As a result, the network packets were dropped by sk_filter_trim_cap() causing performance issues.
fuse_kio_pcs: kernel crash in process_pcs_init_reply() caused by a double free.
skb drops due to the usage of pfmemalloc reserves were difficult to debug.
Additional diagnostics was introduced to make it easier to detect and analyze skb drops due to the usage of pfmemalloc reserves.
KVM did not update CPUID bits OSXSAVE and OSPKE in some cases.
It was discovered that CPUID bits OSXSAVE and OSPKE were not updated properly by KVM when the guest system rebooted. As a result, the guest system could crash.
The per-container limit on the network interfaces was too low for Docker in some cases.
It was discovered that Docker running inside a Virtuozzo container could hit the limit on the network interfaces (256) when it tried to start 50+ its containers. This fix allows changing that limit for the running containers and increases the default limit to 1024.
txqueuelen could not be changed via SIOCSIFTXQLEN ioctl on the host.
Kernel crash in ext4_clear_inode().
A large tarball with a lot of small files can fail to unpack inside a container if kmem limit is set.
It was found that unpacking a large tarball with a lot of small files could fail inside a container. This could happen because kmem limit was hit prematurely, while reclaimable memory was still available.
sr_mod: kernel crash in sr_block_revalidate_disk().
overlayfs: kernel crash in may_open().
CVE-2019-10140. An attacker with local access can create a denial of service situation via NULL pointer dereference in ovl_posix_acl_create(). The ovl_create() function can return a positive number leading to a null pointer derference of path in may_open(). This can allow attackers with ability to create directories on overlayfs to crash the kernel creating a Denial Of Service (DOS).
Links to certain files in /proc/ inside containers were not validated.
It was discovered that a malicious user inside a Virtuozzo container could potentially overwrite "vzctl" binary on the host. The attacker could replace executables in that container with symlinks to /proc/self/exe. After that, "vzctl exec" called from the host to run one of such executables would try to run the host's "vzctl" there instead. If the attacker managed to intercept that, they would be able to change the contents of the host's "vzctl" binary. The issue is similar to CVE-2019-5736, but affects "vzctl" rather than "runc".
Kernel crash (BUG_ON) ploop_relocblks_ioc().
/proc/sys/net/core/somaxconn was not available in the containers.
userfaultfd bypasses tmpfs file permissions.
A flaw was found in the implementation of userfaultfd. An attacker is able to bypass file permissions on filesystems mounted with tmpfs/hugetlbs to modify a file and possibly disrupt normal system behaviour. At this time there is an understanding there is no crash or priviledge escalation but the impact of modifications on these filesystems of files in production systems may have adverse affects.
ipvs: an unneeded debug message is output when a network namespace is initialized.
Debug message 'IPVS: Creating netns size=... id=...' could be output many times to the system log when the network namespaces are initialized, making the log less readable.
'perf record -a' causes segfaults in applications executing vsyscalls.
Some operations with ebtables could consume large amounts of memory, resulting in DoS.
A flaw was found in the implementation of ebtables in the Linux kernel. A local attacker in a container could exploit it to consume large amounts of memory, eventually causing denial of service on the host.
Kernel crash (access out of bounds) in SyS_mincore().
(enhancement) FUSE: backported immediate-write support for the fast path.