How to Fix L1 Terminal Fault (L1TF) in VMware vSphere
As predicted by VMware’s CEO Pat Gelsinger, further vulnerabilities similar to Spectre and Meltdown have been identified that expose VMware vSphere environments to comparable and significant security threats. We added automated checks to Runecast Analyzer to detect the Spectre and Meltdown vulnerabilities when they became public, and have now done the same for this threat.
The new vulnerability is named “L1 Terminal Fault”, or L1TF. The most obvious difference from Spectre/Meltdown is that this vulnerability only affects Intel processors, so fewer environments are affected. However, it is important to recognise that the consequences of remediation can be even more disruptive than was the case with Spectre/Meltdown. The overview below discusses the potential impact of the vulnerability and the method of remediation.
Potential Impact from L1TF
The potential impact of the vulnerability is that malicious code running on an Intel CPU could read data it is not entitled to from another process, whether that process is running at the same time (the “concurrent-context attack vector”) or ran at a previous time (the “sequential-context attack vector”).
Within a virtual environment, VMs run as processes in ESXi, so a VM could read (or steal) data it is not entitled to access from another VM. This could be any data, including highly sensitive information such as passwords or credit card details. As VMs may be owned by different groups or organizations, the potential risk of a breach, and the resulting impact, is extremely serious.
Potential Impact from Remediation
If you are using Intel processors in your environment, then your hypervisors will be exposed. The impact from performing remediation depends significantly on whether you have hyperthreading enabled for your hosts. This is because hosts with hyperthreading disabled only need protection against the “sequential-context attack vector”, whereas hosts with hyperthreading enabled also have to be protected against the “concurrent-context attack vector”.
Mitigation of the sequential-context attack vector has minimal impact on the performance of enterprise applications. Upgrading vCenter and ESXi with the latest patches provides protection against the sequential-context attack vector, which gives hosts with hyperthreading disabled full protection. However, patching alone will not fully protect hosts with hyperthreading enabled.
When remediating hosts with hyperthreading enabled, it is necessary to enable the “ESXi Side-Channel-Aware Scheduler”. This scheduler will then schedule on only one Hyperthread of a Hyperthread-enabled core, with potentially significant consequences. If you decide to carry out this remediation then you need to be very careful to assess, plan and execute the process. VMware describe this process fully in KB 55806, and have built a custom (PowerCLI) tool (“HTAware Mitigation Tool”) to assess the environment for impact and carry out the important changes within your hosts.
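The decision described above can be sketched in a few lines of Python. This is only an illustration of the logic, not a VMware tool: the `Host` class, the `required_steps` function and the step labels are hypothetical names introduced here, and the host attributes are assumed to have been collected by other means.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical step labels, used only for this illustration.
PATCH = "patch vCenter and ESXi"
SCHEDULER = "assess, then enable the ESXi Side-Channel-Aware Scheduler"


@dataclass
class Host:
    """Minimal, assumed description of an ESXi host."""
    name: str
    intel_cpu: bool
    hyperthreading_enabled: bool


def required_steps(host: Host) -> List[str]:
    """Return the remediation steps implied by the text for one host."""
    if not host.intel_cpu:
        # L1TF only affects Intel processors.
        return []
    # Patching covers the sequential-context attack vector.
    steps = [PATCH]
    if host.hyperthreading_enabled:
        # The concurrent-context vector also needs the scheduler.
        steps.append(SCHEDULER)
    return steps


if __name__ == "__main__":
    for h in (Host("esx01", True, False), Host("esx02", True, True)):
        print(h.name, required_steps(h))
```

The second step is deliberately labelled “assess, then enable”: as the following section explains, enabling the scheduler has consequences that need to be planned for.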
If you do enable the Side-Channel-Aware Scheduler (either manually, or using the tool) then you will have to accept significant compromises afterwards, such as the following:
- A VM whose total count of vCPUs is greater than the number of physical cores on the running host will not power on after the remediation has been carried out. The VM is considered too wide to run on the existing host, and this must be corrected prior to mitigation by reconfiguring the VM to use fewer vCPUs or by moving it to a host with more cores. Reconfiguration of the VM must be done while the VM is powered off.
- VMs pinned to specific PCPUs on the host may not power on after the remediation has been carried out. The PCPU numbers may need to be altered prior to enabling the HTAware Mitigation, as the former physical designations may no longer be valid. Reconfiguration of the VM must be done while the VM is powered off.
- After carrying out the remediation, the cluster may not have the spare capacity for rolling upgrades.
- The high latency sensitivity setting of a VM cannot be honored once the remediation is carried out if the number of high-latency-sensitive vCPUs >= (number of cores - 1). This is due to the host having insufficient capacity to run its own jobs.
- Hosts that are using the second Hyper-Thread on a core to satisfy their load may not be able to satisfy the load without suffering throughput or response time degradation after the remediation has been carried out.
There are other considerations, and the full list of performance, capacity, availability, and operational compromises is given in KB 55767. This must be read, understood in full, and applied to your use case before you decide whether to perform the remediation. The risk to security needs to be balanced carefully against the risk that remediation poses to runtime operation.
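Two of the conditions in the list above, the “too wide” VM and the latency sensitivity threshold, are simple arithmetic and can be sketched as follows. This is a rough illustration rather than a replacement for the HTAware Mitigation Tool or KB 55767: the `VM` class and both function names are hypothetical, and counting the high-latency-sensitive vCPUs across all VMs on one host is an assumed reading of the threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    """Minimal, assumed description of a VM on a host."""
    name: str
    vcpus: int
    high_latency_sensitivity: bool = False


def too_wide(vm: VM, host_cores: int) -> bool:
    """True if the VM has more vCPUs than the host has physical cores."""
    return vm.vcpus > host_cores


def latency_sensitivity_at_risk(vms: List[VM], host_cores: int) -> bool:
    """True if high-latency-sensitive vCPUs >= (number of cores - 1).

    Assumption: the vCPU count is summed over all such VMs on the host.
    """
    sensitive_vcpus = sum(vm.vcpus for vm in vms if vm.high_latency_sensitivity)
    return sensitive_vcpus >= host_cores - 1


if __name__ == "__main__":
    cores = 16  # physical cores, not hyperthreads
    vms = [VM("db01", 20), VM("rt01", 8, True), VM("rt02", 8, True)]
    print([vm.name for vm in vms if too_wide(vm, cores)])
    print(latency_sensitivity_at_risk(vms, cores))
```

Note that both checks compare against physical cores, not logical processors, which is why a VM that powers on today can fail to do so once the scheduler is enabled.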
Remediation Process
To protect against L1TF you first need to carry out a patch upgrade in the normal sequence: vCenter (and any external PSCs) must be patched first, and then the ESXi hosts must be updated. VMware describe this as the “Update Phase” in KB 55806, and it should be considered low-risk, as this stage does not enable the disruptive Side-Channel-Aware Scheduler. As usual, it is very important to carry out the remediation in this order (vCenter first with any related PSCs, then ESXi hosts).
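If you drive patching from an inventory list, the ordering rule above can be expressed as a stable sort. The sketch below is illustrative only: the component type labels (`"vcenter"`, `"psc"`, `"esxi"`) and the `patch_sequence` function are hypothetical, and vCenter and external PSCs are simply treated as one stage because the text does not order them relative to each other.

```python
from typing import List, Tuple

# Stage 0 is patched before stage 1. The type labels are assumptions.
PATCH_STAGE = {"vcenter": 0, "psc": 0, "esxi": 1}


def patch_sequence(components: List[Tuple[str, str]]) -> List[str]:
    """Order (name, type) pairs so vCenter/PSCs come before ESXi hosts.

    sorted() is stable, so the original order is kept within each stage.
    """
    ordered = sorted(components, key=lambda item: PATCH_STAGE[item[1]])
    return [name for name, _ in ordered]


if __name__ == "__main__":
    inventory = [
        ("esx01", "esxi"),
        ("vc01", "vcenter"),
        ("esx02", "esxi"),
        ("psc01", "psc"),
    ]
    print(patch_sequence(inventory))
```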
vCenter
To update the vCenter application you should use your normal procedure to install the latest patch available for the version you are running. VMware have released the following builds for the various supported versions; these directly address L1TF:
Following the successful upgrade of the vCenter application, it is also necessary to protect the operating system that vCenter is running on. This is because L1TF also creates vulnerabilities at the OS level.
If you are running vCenter on Windows, then the relevant Microsoft patches need to be applied. If you are running the VMware vCSA, then no patch is currently available from VMware (though one is pending). A workaround has been provided, and the vCSA OS should be remediated by following the instructions in KB 52312.
Note: The vCSA v5.5 OS is not affected and so the workaround does not need to be carried out. However, all other supported versions of vCenter (v6.0 onwards) are affected (see VMSA-2018-0021).
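The OS-level rules above can be summarised as a small lookup. This is a sketch of the guidance as stated in this article only: the `os_remediation` function, the platform labels and the returned strings are hypothetical, and anything outside the versions mentioned here raises an error instead of guessing.

```python
def os_remediation(platform: str, version: str) -> str:
    """Return the OS-level action for a vCenter deployment.

    platform: "windows" or "vcsa" (assumed labels).
    version:  vCenter version string such as "6.5".
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    if platform == "windows":
        return "apply the relevant Microsoft patches"
    if platform == "vcsa":
        if (major, minor) == (5, 5):
            # The vCSA v5.5 OS is not affected.
            return "none"
        if (major, minor) >= (6, 0):
            # No vCSA patch yet, so use the documented workaround.
            return "apply the KB 52312 workaround"
    raise ValueError(f"unsupported combination: {platform} {version}")


if __name__ == "__main__":
    for deployment in (("vcsa", "5.5"), ("vcsa", "6.7"), ("windows", "6.5")):
        print(deployment, "->", os_remediation(*deployment))
```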
Once vCenter has been successfully upgraded, and the OS protected, you will then need to patch the ESXi hosts using your normal patching procedure. The relevant patches are as follows:
N.B. It is important to note that the ESXi patches include the necessary microcode updates for the CPU so these do not need to be installed separately from the vendor.
Once the hosts have been upgraded, you will be protected against the sequential-context attack vector, and so all hosts with hyperthreading disabled will be fully protected.
If you have hosts with hyperthreading enabled, you will then be in a position to continue with the “Planning Phase” to determine the impact of enabling the ESXi Side-Channel-Aware Scheduler for these hosts. You can use the custom-built VMware tool (“HTAware Mitigation Tool”) to assist with this. The tool is designed to help you make an informed decision as to whether you should enter the “Scheduler-Enablement Phase” after evaluating the risks associated with your use case. It is also possible to enable the scheduler for affected hosts using the tool.
The workflow for the “Planning Phase” and “Scheduler-Enablement Phase” that follow the Update Phase is set out in KB 55806.
We recommend following this KB content through the Planning and Scheduler-Enablement phases, if they are required in your use case. A full description of the “HTAware Mitigation Tool” and its operation within these phases is available in KB 56931.
Runecast Analyzer will automatically detect the presence of L1TF within your environment and will identify your exposure to both the sequential-context and concurrent-context attack vectors. It will show you exactly which hosts are vulnerable and also give you the relevant KB information within the application. This vulnerability is one among many that have been reported in the VMware Knowledge Base or in VMSAs and that can be detected automatically by Runecast Analyzer.
Warren Legg
Senior Product Developer