Broadcom announced VMware Cloud Foundation 9.1 last week, and there is quite a bit to unpack. If you are already running VCF 9.0, you will recognise a lot of the direction here. This release builds on that foundation and focuses on efficiency, security and AI workload support. I’ve tried to cover the key features in this blog.

The Context

Infrastructure teams are being asked to do more with the same budget. AI workloads are landing on-premises, developers want self-service provisioning, and at the same time compliance and security requirements are getting stricter. VCF 9.1 is clearly built with all of this in mind. It is not a revolution but a focused set of improvements, and it can be seen as an evolution of VCF 9.0.

Infrastructure Efficiency

The efficiency of an IT environment has always been a hot topic, but with recent hardware prices it is more relevant than ever. Something I always keep an eye on is what can be done to squeeze more out of existing hardware, and 9.1 has some interesting additions here. Enhanced NVMe memory tiering gets a meaningful upgrade. The idea is that hot data stays in DRAM while colder pages get offloaded to NVMe, effectively expanding usable memory capacity without buying more RAM. Broadcom claims this can reduce total cost of ownership by up to 40 percent for the right workloads. That is a bold number, but even a fraction of that is worth exploring if you are running memory-heavy environments.
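To make the tiering economics a little more concrete, here is a back-of-the-envelope model. The per-GB prices and the 50/50 DRAM-to-NVMe split are illustrative assumptions of mine, not Broadcom figures:

```python
# Back-of-the-envelope model of NVMe memory tiering economics.
# All prices and the DRAM:NVMe ratio are illustrative assumptions,
# not figures from Broadcom or any hardware vendor.

def tiered_memory(dram_gb, nvme_gb, dram_cost_per_gb=4.0, nvme_cost_per_gb=0.2):
    """Return (effective capacity in GB, total cost, cost per effective GB)."""
    effective_gb = dram_gb + nvme_gb          # hot pages in DRAM, cold pages on NVMe
    total_cost = dram_gb * dram_cost_per_gb + nvme_gb * nvme_cost_per_gb
    return effective_gb, total_cost, total_cost / effective_gb

# A host with 1 TB of DRAM only:
cap, cost, per_gb = tiered_memory(1024, 0)
# The same effective capacity with 512 GB DRAM tiered onto 512 GB NVMe:
cap2, cost2, per_gb2 = tiered_memory(512, 512)

print(f"DRAM only : {cap} GB at ${cost:.0f} (${per_gb:.2f}/GB)")
print(f"Tiered    : {cap2} GB at ${cost2:.0f} (${per_gb2:.2f}/GB)")
print(f"Cost saving at equal capacity: {1 - cost2 / cost:.0%}")
```

Under these made-up prices the saving lands close to the claimed range, but the real number depends entirely on how much of your working set is genuinely cold.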

vSAN global deduplication and compression improvements are also part of the package. These run continuously in the background and now also support encrypted environments, which removes a previous limitation that was frustrating for security-conscious admins.

Kubernetes and Modern App Delivery

VKS (vSphere Kubernetes Service) gets a significant scale bump in this release. You can now run up to 500 Kubernetes clusters per Supervisor. For platform teams supporting many development squads from a single management backbone, this removes a ceiling that was getting in the way.

There is also a new lightweight Kubernetes environment aimed at test and dev, so you no longer need to dedicate a full cluster to spinning up ephemeral environments. That is a sensible addition; dedicated dev clusters have always felt like overkill for what is essentially throwaway infrastructure.

VKS and VM Fast Deploy

This is one of the features I was surprised to see. VCF 9.1 introduces Fast Deploy for both VKS clusters and VMs. VKS cluster provisioning drops from around 37 minutes to 11 minutes (!), a roughly 70 percent improvement. Cluster upgrades improve even more dramatically, from nearly 7 hours down to about 1.7 hours. That is over 5 hours saved per upgrade cycle in a busy environment.
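The quoted percentages are easy to sanity-check against the raw numbers:

```python
# Sanity-check the improvement figures quoted for Fast Deploy.

def improvement(before, after):
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before

provisioning = improvement(37, 11)        # minutes per VKS cluster deploy
upgrade = improvement(7 * 60, 1.7 * 60)   # minutes per cluster upgrade

print(f"Provisioning: {provisioning:.0%} faster")      # roughly 70%
print(f"Upgrade:      {upgrade:.0%} faster")           # roughly 76%
print(f"Time saved per upgrade: {7 - 1.7:.1f} hours")  # 5.3 hours
```

The upgrade improvement actually works out slightly better than the provisioning one, which matches the "even more dramatically" framing.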

On the VM side, a new Direct-Mode provisioning approach accelerates deployment based on image size and concurrent operations, while maintaining full disk integrity from the start. For application teams that spin up dev or test environments on demand, or need to scale quickly to handle traffic spikes, this changes the calculus on what "fast enough" actually looks like.

AI Workload Support

It is clear that AI is a primary driver for this release. A few things stand out:

  • AMD Instinct MI350 GPU support is now included. AMD claims a 4x generational increase in AI compute over previous AMD GPUs. VCF now supports these natively, which gives organisations an alternative to NVIDIA for on-premises AI inference workloads.
  • AI-aware vMotion — workloads running on GPUs can now live-migrate between GPU hosts with zero downtime. Anyone who has tried to maintain GPU workload availability during hardware maintenance will appreciate this.
  • Real-time observability for AI lets you monitor token consumption and track active AI agents and the models they are using. Useful for capacity planning and chargebacks once AI becomes a real cost centre.

For AI development, there is also native support for PyTorch and vLLM, standardised architecture via OPEA, and integration with Hugging Face giving access to over 1.8 million open-source models out of the box.

Security and Resilience

This is the area where 9.1 brings some genuinely useful improvements.

vSphere Live Patching — Now with TPM Support

Live patching for ESX hosts has been around since vSphere 8 Update 3, and VCF 9.0 expanded it quite a bit. One limitation that stuck around, though, was TPM-enabled hosts: they were excluded from the live patching workflow entirely. In 9.1 that gap is closed. TPM-enabled hosts can now participate in live patching, and it is enabled by default on those hosts. This matters more than it might seem. Nearly 90 percent of new server hardware ships with TPM enabled, so the previous limitation was increasingly blocking live patching in practice. With 9.1 you can apply security patches without evacuating workloads or putting hosts into full maintenance mode across the vast majority of your fleet. For something like an AI inference cluster where availability is critical, this is a genuine operational improvement.

vCenter Quick Patch

This one is straightforward but very welcome. Patching vCenter has traditionally meant a maintenance window — roughly 20 minutes of downtime and up to 40 minutes of total operation time when you account for everything around it. With Quick Patch in 9.1, that shrinks to approximately 5 minutes with no workload disruption. The way it works is that vCenter services are classified by their impact level, and updates are applied in a targeted way rather than taking the whole appliance offline. Broadcom quotes an overall operation time reduction of around 80 percent. In environments where vCenter patching has been deferred because the window just never felt worth it, this removes that excuse entirely, which ultimately means better compliance posture and fewer unpatched management components sitting around longer than they should.

On-Premises Ransomware Recovery

VCF 9.1 introduces a sovereign ransomware recovery path that stays entirely within the platform — no dependency on external services or cloud-based recovery orchestration. Combined with vSAN for Recovery, the idea is that you can go from a destructive attack to a recovered state without relying on anything outside your own infrastructure. For organisations with strict data sovereignty requirements or limited external connectivity, this is a meaningful addition.

Continuous Compliance Enforcement

Rather than running compliance scans periodically or scrambling before an audit, VCF 9.1 makes compliance a continuous runtime condition. Configuration drift gets caught and addressed in real time rather than discovered weeks later. If you work in a regulated environment where a misconfiguration sitting undetected for 30 days is a real risk, this changes the conversation around audit readiness considerably.
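The underlying idea is a desired-state reconciliation loop rather than a periodic scan. The sketch below models that concept only; the configuration keys are made-up examples and this is not VCF's actual enforcement engine or API:

```python
# Conceptual sketch of continuous compliance enforcement: a desired-state
# reconciliation model. The keys below are hypothetical example settings;
# VCF's real enforcement engine and its APIs are not shown here.

DESIRED = {"ssh_enabled": False, "ntp_server": "pool.ntp.org", "lockdown_mode": True}

def detect_drift(actual: dict) -> dict:
    """Return the settings whose actual value differs from the desired state."""
    return {k: v for k, v in DESIRED.items() if actual.get(k) != v}

def reconcile(actual: dict) -> dict:
    """Remediate drift as soon as it is detected, not at the next audit."""
    drift = detect_drift(actual)
    actual.update(drift)   # in reality: push the setting back via the platform
    return drift

host_config = {"ssh_enabled": True, "ntp_server": "pool.ntp.org", "lockdown_mode": True}
fixed = reconcile(host_config)
print(f"Remediated drift: {fixed}")                        # {'ssh_enabled': False}
print(f"Compliant now: {not detect_drift(host_config)}")   # True
```

The difference from a scheduled scan is purely when this loop runs: continuously, on change, instead of on a calendar.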

CrowdStrike EDR Integration

CrowdStrike is now integrated into the ransomware recovery workflow for endpoint detection. If you are already running CrowdStrike in your environment this is a natural fit — it means your existing EDR investment participates in the recovery loop rather than sitting alongside it as a separate track.

Networking

Networking is an area that got a lot of attention in 9.1, and it is worth covering in some detail because there are a few genuinely interesting changes here, not just incremental improvements.

EVPN-VXLAN Fabric Interoperability

This is probably the headline networking feature. VCF 9.1 introduces EVPN-VXLAN based interoperability with the physical network fabric, supporting Arista UCN, Cisco Nexus ONE and SONiC. What this means in practice is a consistent overlay protocol running end-to-end, from the ESX host all the way down to the physical switching layer, rather than NSX doing its own thing on top of whatever the network team has built underneath. The operational implications are worth thinking through. For network teams, it removes the boundary between what NSX manages and what the physical fabric manages, which has historically been a source of finger-pointing when things go wrong. For virtualisation admins, it means VPCs and workloads connect to pre-configured external connections without needing to understand the physical fabric topology. There is also a cost angle here: distributed connectivity from ESX hosts directly to the physical fabric eliminates the need for dedicated NSX edge nodes in some topologies, which is a real capital saving in larger environments where you might be running multiple edge clusters today.

VPC and Transit Gateway Improvements

Transit Gateway gets more flexible in 9.1. You can now have multiple external connections and multiple Transit Gateways per tenant, with distributed VLAN connections, isolated VPN, static routes and custom NAT configurations all available. This covers multi-site routing scenarios that previously required external routing equipment or creative workarounds. VPC scope can now also be defined directly from vCenter or NSX: you can pin a VPC to specific clusters or let it span all vCenters in the networking domain, depending on what you need. That is a useful administrative control that was missing before. The Distributed Transit Gateway (DTGW) model also gets expanded service capabilities in 9.1. Previously it was limited to basic DHCP and external NAT; now stateful services including various NAT options and load balancing are available through a Virtual Network Appliance cluster, without adding complexity to the distributed architecture.

VPC Policy-Based Connectivity

For environments with strict isolation requirements, VCF 9.1 introduces VPC Communities: a way to define which VPCs can talk to each other without manually configuring firewall rules everywhere. There are three flavours: regular communities where VPCs within the group can communicate, isolated communities for sensitive workloads that should have no lateral communication, and shared communities for services like DNS that need to be reachable from everywhere. This is available through the Advanced Cyber Compliance add-on.
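The three flavours boil down to a simple reachability rule. Here is an illustrative model of that logic; the community and VPC names are invented, and this is a conceptual sketch, not the NSX policy API:

```python
# Illustrative model of the three VPC community flavours. This is a
# conceptual sketch with made-up names, not the actual NSX policy API.

from dataclasses import dataclass

@dataclass(frozen=True)
class Community:
    name: str
    kind: str  # "regular" | "isolated" | "shared"
    members: frozenset

def may_communicate(a: str, b: str, communities: list) -> bool:
    """Decide whether VPC `a` may reach VPC `b` under community policy."""
    if a == b:
        return True
    for c in communities:
        # Shared communities (e.g. DNS) are reachable from everywhere.
        if c.kind == "shared" and b in c.members:
            return True
        # Regular communities allow lateral traffic between members.
        if c.kind == "regular" and a in c.members and b in c.members:
            return True
        # Isolated communities forbid lateral traffic even between members.
    return False

communities = [
    Community("app-tier", "regular", frozenset({"vpc-web", "vpc-api"})),
    Community("pci", "isolated", frozenset({"vpc-pay1", "vpc-pay2"})),
    Community("core-services", "shared", frozenset({"vpc-dns"})),
]

print(may_communicate("vpc-web", "vpc-api", communities))    # True  (regular)
print(may_communicate("vpc-pay1", "vpc-pay2", communities))  # False (isolated)
print(may_communicate("vpc-pay1", "vpc-dns", communities))   # True  (shared)
```

The point of the feature is that this rule is expressed once as policy instead of being hand-built as N-squared firewall rules.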

IPAM and Infoblox Integration

IP address management gets a native integration with Infoblox in 9.1. If you are already running Infoblox as your IPAM and DNS source of truth, VCF can now hook into it directly, consuming network containers for subnet creation and keeping IP and FQDN information in sync automatically. This prevents the all-too-common situation of VCF and Infoblox drifting out of sync and causing conflicts that are a pain to track down. The built-in IPAM capabilities also improve: a single IP block can now support up to 10 CIDRs and 10 IP ranges, with the ability to exclude specific IPs, which gives more flexibility when fitting VCF into an existing addressing scheme.
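Python's standard ipaddress module makes it easy to visualise what that block flexibility looks like in practice; the block and the excluded addresses below are made-up example values:

```python
# Illustration of the expanded built-in IPAM flexibility: one IP block
# carved into multiple CIDRs, with specific IPs excluded. The block and
# exclusions are made-up example values, not a VCF API call.

import ipaddress

block = ipaddress.ip_network("10.20.0.0/16")

# Carve several CIDRs out of the block (9.1 allows up to 10 per block).
cidrs = list(block.subnets(new_prefix=24))[:3]

# Exclude addresses already in use elsewhere (gateways, appliances, ...).
excluded = {ipaddress.ip_address("10.20.0.1"), ipaddress.ip_address("10.20.1.1")}

def usable_hosts(cidr):
    """Hosts in a CIDR minus the excluded addresses."""
    return [h for h in cidr.hosts() if h not in excluded]

for cidr in cidrs:
    print(f"{cidr}: {len(usable_hosts(cidr))} usable hosts")
```

The win with the Infoblox integration is that this kind of carving happens against the authoritative IPAM data instead of a parallel copy that drifts.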

High-Performance NICs: EDPIO and UPT for NVIDIA Adapters

For AI and high-performance networking workloads, VCF 9.1 adds proper support for NVIDIA ConnectX and BlueField adapters. Enhanced DirectPath I/O (EDPIO) for ConnectX-6 DX, ConnectX-7 and BlueField-3 provides near-bare-metal NIC performance with GPUDirect RDMA support, which is useful for distributed AI training jobs that need fast GPU-to-GPU communication across hosts. Uniform Passthrough (UPT) covers the non-AI high-performance workload cases, leveraging VMXNET3 hardware emulation without requiring custom guest OS drivers. Both retain compatibility with vMotion, HA and DRS, which has historically been the sticking point with passthrough networking.

Operations at Scale

vSphere Elastic Provisioning

Adding hosts to an existing cluster has never been complicated in principle, but it has always required someone to be present for it: watching the process, confirming steps, dealing with anything that does not go quite right. Elastic Provisioning in 9.1 makes this genuinely zero-touch. Combined with expanded fleet size and upgrade scale, a single VCF instance can now govern significantly larger environments, and growth events no longer need to be scheduled around someone's availability. For organisations that are scaling fast or running lean ops teams, that matters.

Real-Time Operational Observability

The five-minute polling interval in traditional vSphere monitoring has always been a blind spot. A spike that lasts 90 seconds, causes a problem, and then disappears looks like nothing in a graph built on five-minute averages. VCF 9.1 replaces that model with high-velocity data streams that capture what is actually happening in real time. If you have ever spent time in a war room trying to reproduce an intermittent performance issue that the monitoring says never happened, this is the fix for that. It does not change what is going wrong, but it means you will actually be able to see it when it does.
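The averaging blind spot is easy to demonstrate with a few lines of arithmetic: a 90-second spike to 100 percent CPU inside an otherwise quiet five-minute window averages out to an unremarkable number:

```python
# Why five-minute averages hide short spikes: a 90-second CPU spike to 100%
# inside an otherwise-quiet five-minute window barely registers.

baseline, spike = 10.0, 100.0   # percent CPU
window_s, spike_s = 300, 90     # five-minute window, 90-second spike

# Per-second samples over the window, with the spike in the middle.
samples = [spike if 100 <= t < 100 + spike_s else baseline for t in range(window_s)]

avg = sum(samples) / len(samples)
print(f"Peak during window:  {max(samples):.0f}%")   # 100%
print(f"Five-minute average: {avg:.0f}%")            # 37%
```

A per-second stream would show the plateau at 100 percent; the five-minute datapoint reports 37 percent and looks like a healthy host.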

Final Thoughts

VCF 9.1 is a solid step forward, particularly if efficiency, networking and security are high on your agenda. A few things stand out as worth your attention.

Operational efficiency wins:

  • VKS Fast Deploy: provisioning times dropping from 37 to 11 minutes will change how platform teams actually work day to day
  • vCenter Quick Patch: an 80% reduction in patch operation time removes the main excuse for deferring vCenter updates
  • Live Patching for TPM-enabled hosts: finally viable for the 90% of modern hardware that ships with TPM, making zero-downtime patching practical at scale

Networking maturity:

  • EVPN-VXLAN fabric interoperability is the headline here; it removes a long-standing friction point between virtualisation and network teams
  • The potential to eliminate dedicated NSX edge nodes in certain topologies is a real capital saving
  • Infoblox integration means IPAM actually stays in sync rather than drifting into conflict hell

AI and performance:

  • AMD Instinct MI350 GPU support provides an alternative to NVIDIA for on-prem inference
  • EDPIO and UPT for NVIDIA ConnectX/BlueField adapters give near-bare-metal network performance while keeping vMotion/HA/DRS compatibility
  • AI-aware vMotion means GPU workloads can finally live-migrate without downtime

If you are still on VCF 5.x and weighing up the move to 9, this release strengthens that case. And if you are already on 9.0, this looks like a worthwhile upgrade.
