vhost-net: guest-to-host kernel escape during migration.
A buffer overflow vulnerability was found in the networking virtualization functionality (vhost-net) that could be abused during live migration of virtual machines. While live migration is underway, a privileged guest user could pass descriptors with an invalid length to the host to crash the host kernel or, potentially, escalate their privileges on the host.
The warning in mem_cgroup_reparent_charges() was triggered too early and too often in certain cases.
kvm: potential system hang due to an error in mmu_shrink_scan().
nfs: NULL pointer dereference due to an anomalous NFS message sequence.
An attacker who is able to mount an exported NFS filesystem can trigger a NULL pointer dereference by sending an invalid NFS message sequence. This can panic the machine and deny access to the NFS server. Any outstanding disk writes to the NFS server will be lost.
fuse_kio_pcs: kernel crash in pcs_sockio_xmit().
Processes could get stuck in copy_net_ns() forever.
vziolimit: kernel crash due to a division by zero in throttle_charge().
mem_cgroup_reparent_charges() could get stuck while holding cgroup_mutex and make the whole system hang.
kvm: inefficient memory shrinking for VMs.
It was discovered that a node with dozens of CPU cores, lots of RAM and many running VMs could get into a situation where almost all CPU cores were busy in mmu_shrink_scan(). This could happen because memory shrinking was done under the kvm_lock spinlock and only for one VM at a time. In such cases, all CPU cores but one just waited for kvm_lock, while the remaining one was busy with the actual memory shrinking for a VM.
fuse_kio_pcs: latency was calculated incorrectly.
It was found that the in-kernel implementation of the Virtuozzo Storage client stored latency values in milliseconds rather than in microseconds, resulting in bogus statistics data.
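The effect of such a unit mismatch can be sketched as follows. This is an illustration only, not the fuse_kio_pcs code: the function names and the nanosecond timestamps are assumptions made for the example.

```python
NS_PER_US = 1_000
NS_PER_MS = 1_000_000


def latency_us(start_ns: int, end_ns: int) -> int:
    """Latency in microseconds, the unit the statistics consumer expects."""
    return (end_ns - start_ns) // NS_PER_US


def latency_ms_stored_as_us(start_ns: int, end_ns: int) -> int:
    """Hypothetical buggy variant: the value is really in milliseconds,
    but it is stored in a field that readers interpret as microseconds."""
    return (end_ns - start_ns) // NS_PER_MS


# A 2.5 ms request: the correct value is 2500 us, the buggy one reads as 2 us.
correct = latency_us(0, 2_500_000)
bogus = latency_ms_stored_as_us(0, 2_500_000)
```

With these hypothetical helpers, every reported latency is three orders of magnitude too small, and sub-millisecond requests collapse to zero.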
tcp: integer overflow while processing SACK blocks allows remote denial of service.
An integer overflow was found in the way the Linux kernel's networking subsystem processed TCP Selective Acknowledgment (SACK) segments. While processing SACK segments, the Linux kernel's socket buffer (SKB) data structure becomes fragmented. Each fragment is about TCP maximum segment size (MSS) bytes. To efficiently process SACK blocks, the Linux kernel merges multiple fragmented SKBs into one, potentially overflowing the variable holding the number of segments. A remote attacker could use this flaw to crash the Linux kernel by sending a crafted sequence of SACK segments on a TCP connection with a small TCP MSS value, resulting in a denial of service.
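The arithmetic behind such an overflow can be sketched as follows. The concrete numbers (a 16-bit segment counter, 17 fragments of 32 KiB each, 8 bytes of payload per segment) are assumptions chosen for illustration and are not stated in this note; the helper names are hypothetical as well.

```python
U16_MAX = 0xFFFF  # assumed width of the segment counter


def segment_count(total_bytes: int, payload_per_segment: int) -> int:
    """Number of segments needed to carry total_bytes (rounded up)."""
    return -(-total_bytes // payload_per_segment)


def as_u16(value: int) -> int:
    """Value as it would be stored in an unsigned 16-bit variable."""
    return value & U16_MAX


# Assumed worst case: one merged SKB with 17 fragments of 32 KiB each,
# and only 8 bytes of payload per segment because the MSS is tiny.
merged_bytes = 17 * 32 * 1024
real_segments = segment_count(merged_bytes, 8)   # 69632
stored_segments = as_u16(real_segments)          # 4096 after wrap-around
```

Under these assumptions the real segment count no longer fits into the counter, so the stored value disagrees with the actual contents of the merged SKB.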
Processes could hang while closing a file located on the storage cluster.
The OOM killer would kill tasks from cgroups without memory guarantees first.
If the amount of free memory was low, the OOM killer would first kill tasks from cgroups without memory guarantees. However, it seems more reasonable to first kill tasks from the cgroups that exceed their guarantees the most.
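The difference between the two policies can be sketched as follows. This is a simplified model, not the kernel's OOM killer: the Cgroup type, the field names, and the use of an absolute overdraft (usage minus guarantee) as the ranking key are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cgroup:
    name: str
    usage: int       # memory in use, arbitrary units
    guarantee: int   # guaranteed memory, 0 means no guarantee


def overdraft(cg: Cgroup) -> int:
    """How far the cgroup is above its guarantee (never negative)."""
    return max(0, cg.usage - cg.guarantee)


def pick_victim_old(cgroups: List[Cgroup]) -> Cgroup:
    """Old policy (as modeled here): cgroups without guarantees go first."""
    return min(cgroups, key=lambda cg: cg.guarantee > 0)


def pick_victim_new(cgroups: List[Cgroup]) -> Cgroup:
    """New policy (as modeled here): the largest overdraft goes first."""
    return max(cgroups, key=overdraft)


groups = [
    Cgroup("small-no-guarantee", usage=100, guarantee=0),
    Cgroup("big-over-guarantee", usage=900, guarantee=200),
]
```

In this model the old policy picks "small-no-guarantee" even though "big-over-guarantee" is 700 units above its guarantee, while the new policy picks the latter.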
virtio_scsi: a race condition in the Linux block layer could cause certain I/O requests to hang.
ploop: kernel crash in ploop_congested().
ext4: inode tables created during online resize were not zeroed.
It was discovered that inode tables created during online resize of an ext4 filesystem were left unzeroed. This could potentially result in lower performance of the filesystem.
Windows Server 2016 Essentials failed to install into a QEMU VM with disabled PMU.
It was found that if no PMU counters were exposed to the guest, KVM skipped all of the remaining PMU-related initialization, including the filling of LBR-related data. As it turned out, Windows Server 2016 Essentials tried to access this data during installation and failed to install as a result.
ploop: 'pcompact' could hang if run simultaneously with 'ploop-balloon status'.
Memory leak in the implementation of IPv4 routing.
It was discovered that a certain sequence of operations related to IPv4 routing could trigger a kernel memory leak. An attacker could potentially exploit that from a container to cause a denial of service.
KVM: potential use-after-free via kvm_ioctl_create_device().
A use-after-free vulnerability was found in the way KVM implements its device control API. When a device is created via kvm_ioctl_create_device(), it holds a reference to a VM object. This reference is transferred to the file descriptor table of the caller. If such a file descriptor was closed, the reference count of the VM object could drop to zero, which could lead to a use-after-free issue. A user/process could use this flaw to crash the guest VM, resulting in a denial of service, or, potentially, gain privileged access to a system.
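The reference counting problem can be sketched with a toy model. This is not KVM code: the VM class, the create_device() helper and its parameters are hypothetical, and the assumption that the problem comes from taking the device's VM reference only after the file descriptor has become visible to the caller is an interpretation of the description above.

```python
class VM:
    """Toy reference-counted object standing in for a VM."""

    def __init__(self) -> None:
        self.refs = 1        # reference held by the VM's own file descriptor
        self.freed = False

    def get(self) -> None:
        self.refs += 1

    def put(self) -> None:
        self.refs -= 1
        if self.refs == 0:
            self.freed = True  # the object would be released here


def create_device(vm: VM, take_ref_before_publish: bool, racing_close: bool) -> bool:
    """Return True if the VM was freed while still in use (toy model)."""
    if take_ref_before_publish:
        vm.get()             # safe order: reference taken first
    # At this point the device fd is visible to the caller (assumed).
    if racing_close:
        vm.put()             # closing the device fd drops its VM reference
    if not take_ref_before_publish:
        vm.get()             # unsafe order: too late, the VM may be gone
    return vm.freed
```

In this model, create_device(VM(), take_ref_before_publish=False, racing_close=True) reports that the VM was freed, while the same call with take_ref_before_publish=True does not.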
KVM: use-after-free in the emulation of the preemption timer for the L2 guest systems.
A use-after-free vulnerability was found in the way KVM emulates a preemption timer for L2 guests when nested virtualization is enabled. A guest user/process could use this flaw to crash the host kernel, resulting in a denial of service, or, potentially, gain privileged access to a system.
I/O errors were reported after a successful replacement of the ploop images.
'ploop replace' did not clear the 'abort' flag.
It was found that if a ploop image was revoked and then replaced using 'ploop replace', the 'abort' flag was not cleared. As a result, subsequent I/O operations would fail.
ploop: potential data corruption due to a race between 'prepare_merge' and 'submit_alloc' operations.
vzstat showed incorrect per-CT scheduling latency (MLAT).
High order page allocations were made in neigh_probe() in certain cases.
High order page allocations were triggered by CRIU while restoring TCP sockets.
Network performance issues due to the usage of pfmemalloc reserves.
It was discovered that network drivers could allocate memory for the socket buffers from pfmemalloc memory reserves, even when it was unnecessary. As a result, the network packets were dropped by sk_filter_trim_cap() causing performance issues.
fuse_kio_pcs: kernel crash in process_pcs_init_reply() caused by a double free.
fuse_kio_pcs: kernel crash in kpcs_kill_requests().
skb drops due to the usage of pfmemalloc reserves were difficult to debug.
Additional diagnostics were introduced to make it easier to detect and analyze skb drops due to the usage of pfmemalloc reserves.
KVM did not update CPUID bits OSXSAVE and OSPKE in some cases.
It was discovered that CPUID bits OSXSAVE and OSPKE were not updated properly by KVM when the guest system rebooted. As a result, the guest system could crash.
The per-container limit on the network interfaces was too low for Docker in some cases.
It was discovered that Docker running inside a Virtuozzo container could hit the limit on the network interfaces (256) when it tried to start more than 50 of its containers. This fix allows changing that limit for running containers and increases the default limit to 1024.
txqueuelen could not be changed via SIOCSIFTXQLEN ioctl on the host.
Kernel crash in ext4_clear_inode().
A large tarball with a lot of small files could fail to unpack inside a container if a kmem limit was set.
It was found that unpacking a large tarball with a lot of small files could fail inside a container. This could happen because the kmem limit was hit prematurely, while reclaimable memory was still available.
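The idea of hitting a limit prematurely can be sketched with a toy accounting model. This is not the kernel's memory cgroup code: the KmemAccount class, its fields, and the reclaim_first switch are hypothetical and only illustrate trying to reclaim before failing a charge.

```python
class KmemAccount:
    """Toy kernel-memory accounting for one container."""

    def __init__(self, limit: int, used: int, reclaimable: int) -> None:
        self.limit = limit
        self.used = used                # includes the reclaimable part
        self.reclaimable = reclaimable  # e.g. cached objects that can be dropped

    def charge(self, amount: int, reclaim_first: bool) -> bool:
        """Try to account `amount` more units; return False on failure."""
        over = self.used + amount - self.limit
        if over > 0 and reclaim_first:
            # Free only as much reclaimable memory as is needed.
            freed = min(over, self.reclaimable)
            self.reclaimable -= freed
            self.used -= freed
        if self.used + amount > self.limit:
            return False                # limit hit
        self.used += amount
        return True


# 95 of 100 units used, 50 of them reclaimable; a 10-unit charge is requested.
premature = KmemAccount(100, 95, 50).charge(10, reclaim_first=False)
with_reclaim = KmemAccount(100, 95, 50).charge(10, reclaim_first=True)
```

In this model the charge fails when no reclaim is attempted, although half of the used memory could have been freed, and succeeds when reclaim is tried first.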
sr_mod: kernel crash in sr_block_revalidate_disk().