Stretch Management Domain in VCF 5.1.1
In this article, we’ll guide you through the process of stretching a VMware Cloud Foundation (VCF) management domain across two availability zones, enhancing your infrastructure’s resilience and ensuring continuous availability. By the end of this tutorial, you’ll have a solid understanding of how to configure and manage a stretched VCF deployment to protect your critical workloads.
Prerequisites
The requirements for stretching a management domain in VCF are documented on the VMware by Broadcom website. Our environment deviates from the VCF blueprint: we have two dedicated Edge nodes in AZ1 and two in AZ2. The table below lists the subnet requirements we are using.
Function | AZ1 | AZ2 | HA Layer 3 Gateway |
VM management VLAN | ✓ | ✓ | ✓ |
ESXi Management VLAN (AZ1) | ✓ | X | ✓ |
vMotion VLAN (AZ1) | ✓ | X | ✓ |
vSAN VLAN (AZ1) | ✓ | X | ✓ |
NSX Host Overlay (AZ1) | ✓ | X | ✓ |
NSX Edge Uplink01 (AZ1) | ✓ | X | X |
NSX Edge Uplink02 (AZ1) | ✓ | X | X |
NSX Edge Overlay (AZ1) | ✓ | X | ✓ |
ESXi Management VLAN (AZ2) | X | ✓ | ✓ |
vMotion VLAN (AZ2) | X | ✓ | ✓ |
vSAN VLAN (AZ2) | X | ✓ | ✓ |
NSX Host Overlay (AZ2) | X | ✓ | ✓ |
NSX Edge Uplink01 (AZ2) | X | ✓ | X |
NSX Edge Uplink02 (AZ2) | X | ✓ | X |
NSX Edge Overlay (AZ2) | X | ✓ | ✓ |
Network Pool
By default, you will have only one network pool after deploying VCF with the VMware Cloud Builder. One of the first tasks is to commission the hosts in AZ2 within the SDDC Manager. To do this, you need to have a second network pool that will contain AZ2-specific networks for vMotion and vSAN for the hosts in AZ2.
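As a rough illustration of what such a pool contains, the sketch below builds a network pool definition with a vMotion and a vSAN network for AZ2. The field names follow our understanding of the SDDC Manager network pool API and should be checked in the API Explorer; the pool name, VLAN IDs, MTU, and addresses are hypothetical placeholders, not values from our environment.

```python
import json


def build_network_pool(name, networks):
    """Build a network pool definition from a list of simple network dicts.

    Each entry in `networks` needs the keys: type, vlan_id, mtu, subnet,
    mask, gateway, start, end. The output field names are an assumption
    based on the SDDC Manager network pool API.
    """
    return {
        "name": name,
        "networks": [
            {
                "type": n["type"],
                "vlanId": n["vlan_id"],
                "mtu": n["mtu"],
                "subnet": n["subnet"],
                "mask": n["mask"],
                "gateway": n["gateway"],
                "ipPools": [{"start": n["start"], "end": n["end"]}],
            }
            for n in networks
        ],
    }


# Hypothetical AZ2 networks (documentation address ranges, not real values).
az2_pool = build_network_pool(
    "az2-network-pool",
    [
        {"type": "VMOTION", "vlan_id": 2001, "mtu": 9000,
         "subnet": "192.0.2.0", "mask": "255.255.255.0",
         "gateway": "192.0.2.1", "start": "192.0.2.20", "end": "192.0.2.60"},
        {"type": "VSAN", "vlan_id": 2002, "mtu": 9000,
         "subnet": "198.51.100.0", "mask": "255.255.255.0",
         "gateway": "198.51.100.1", "start": "198.51.100.20", "end": "198.51.100.60"},
    ],
)
print(json.dumps(az2_pool, indent=2))
```

Keeping the AZ2 networks in a separate pool means the AZ1 hosts keep their existing vMotion and vSAN addressing untouched.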
Commission AZ2 hosts
Once the AZ2-specific network pool has been created, you can commission the hosts in the SDDC Manager. Click on the “Commission Hosts” button to start the wizard.
Make sure you perform all the necessary checks from the checklist and click the “Proceed” button.
Add all AZ2 hosts and perform the validation. Once the validation is complete, click the “Next” button.
Click the “Commission” button to initiate the host commissioning task in the SDDC Manager.
Once the commission task has been completed, you will see the AZ2 hosts in the inventory as Unassigned. These ESXi hosts can eventually be used to stretch the management domain.
vSAN witness node
Make sure that you have already deployed and configured the vSAN Witness node according to the VMware by Broadcom documentation, and register the vSAN Witness in the management domain as shown below.
Stretching the Management Domain
Retrieving IDs of Unassigned ESXi Hosts
With the prerequisites in place, we are now ready to communicate with the API of the SDDC Manager. Let’s start by collecting the ID numbers of the Unassigned ESXi hosts that we commissioned earlier.
Click on Developer Center, then API Explorer, and select Hosts. Choose GET /v1/hosts from the options, enter “UNASSIGNED_USEABLE” as the value for the status parameter, and click Execute.
You will eventually receive a response with all the ESXi hosts that have the status UNASSIGNED_USEABLE. Note down all the IDs of these ESXi hosts, as we will use them in a later step.
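If you save the response body, a few lines of Python can pull out the IDs instead of copying them by hand. This is a minimal sketch that assumes the response wraps the hosts in an "elements" list with "fqdn", "id", and "status" fields; the sample response is trimmed to two of our hosts for illustration.

```python
def unassigned_host_ids(response):
    """Map each unassigned host FQDN to its ID.

    Assumes `response` is the parsed JSON body of the hosts query, with the
    hosts listed under an "elements" key.
    """
    return {
        host["fqdn"]: host["id"]
        for host in response.get("elements", [])
        if host.get("status") == "UNASSIGNED_USEABLE"
    }


# Trimmed sample response (only the fields used above).
sample_response = {
    "elements": [
        {"id": "1e826b08-52c5-4e95-85c9-b205c777f55a",
         "fqdn": "m01-esx20001.domain.internal",
         "status": "UNASSIGNED_USEABLE"},
        {"id": "c59ec647-90fa-4537-a330-0debc63dc26b",
         "fqdn": "m01-esx20002.domain.internal",
         "status": "UNASSIGNED_USEABLE"},
    ]
}

for fqdn, host_id in unassigned_host_ids(sample_response).items():
    print(fqdn, host_id)
```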
Retrieving ID of cluster
We now need to retrieve the ID of the management domain cluster that we want to stretch between availability zones.
Click on Developer Center, then API Explorer, and select Clusters. Choose GET /v1/clusters from the options and click Execute.
You will eventually receive a response with the details of the management domain cluster. Note down the ID of the cluster, as we will use it in a later step.
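When the response lists more than one cluster, selecting by name avoids picking the wrong ID. The sketch below assumes the clusters are returned under an "elements" key with "id" and "name" fields; the cluster name m01-cl01 is inferred from our vDS name and the ID is a hypothetical placeholder.

```python
def cluster_id_by_name(response, name):
    """Return the ID of the single cluster called `name`.

    Raises LookupError when zero or several clusters match, so a wrong ID
    is never silently used in the stretch request.
    """
    matches = [
        cluster["id"]
        for cluster in response.get("elements", [])
        if cluster.get("name") == name
    ]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one cluster named {name!r}, found {len(matches)}"
        )
    return matches[0]


# Hypothetical response with a placeholder cluster ID.
sample_clusters = {
    "elements": [
        {"id": "00000000-0000-0000-0000-000000000001", "name": "m01-cl01"},
    ]
}
print(cluster_id_by_name(sample_clusters, "m01-cl01"))
```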
Creating the JSON specification
Based on the information gathered in the previous steps, we are now ready to create a JSON specification to initiate the stretch management domain task. Since the VCF management domain has been deployed with the vSphere Distributed Switch profile “Profile-1,” there is no need to configure additional uplinks in the JSON specification.
In the hostSpecs, enter the following details for each host:
Key | Value |
hostname | FQDN of the ESXi Host |
networkProfileName | The network profile name we have created earlier |
vdsName | The vSphere Distributed Switch name that is in use by AZ1 |
id | The ID of the host we have gathered earlier |
licenseKey | The ESXi license that needs to be applied to the ESXi host |
In the networkSpec section, under networkProfiles, enter the following details:
Key | Value |
name | The network profile name we have created earlier |
ipAddressPoolName | The NSX IP Pool that will be used (will be created in the next specification section) |
uplinkProfileName | The NSX uplink profile that will be used (will be created in the next specification section) |
vdsName | The vSphere Distributed Switch name that is in use by AZ1 |
In the networkSpec section, under nsxClusterSpec, enter the following details:
Key | Value |
description | Description that will be used for the IP Pool in NSX |
name | A name that will be used for the IP Pool in NSX |
uplinkProfileName | A name that will be used for the uplink Profile in NSX |
vdsName | The vSphere Distributed Switch name that is in use by AZ1 |
In the networkSpec section, under uplinkProfiles, provide the following details:
Key | Value |
name | A name that will be used for the uplink profile |
transportVlan | The VLAN for the host TEP communication |
In the witnessSpec, enter the following details:
Key | Value |
fqdn | The FQDN of the vSAN witness node |
vsanCidr | The CIDR that is used by the vSAN witness node |
vsanIp | The IP address of the vSAN witness node |
{
  "clusterStretchSpec": {
    "hostSpecs": [
      {
        "hostname": "m01-esx20001.domain.internal",
        "hostNetworkSpec": {
          "networkProfileName": "m01-np02",
          "vmNics": [
            { "id": "vmnic0", "uplink": "uplink1", "vdsName": "m01-cl01-vds01" },
            { "id": "vmnic1", "uplink": "uplink2", "vdsName": "m01-cl01-vds01" }
          ]
        },
        "id": "1e826b08-52c5-4e95-85c9-b205c777f55a",
        "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX"
      },
      {
        "hostname": "m01-esx20002.domain.internal",
        "hostNetworkSpec": {
          "networkProfileName": "m01-np02",
          "vmNics": [
            { "id": "vmnic0", "uplink": "uplink1", "vdsName": "m01-cl01-vds01" },
            { "id": "vmnic1", "uplink": "uplink2", "vdsName": "m01-cl01-vds01" }
          ]
        },
        "id": "c59ec647-90fa-4537-a330-0debc63dc26b",
        "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX"
      },
      {
        "hostname": "m01-esx20003.domain.internal",
        "hostNetworkSpec": {
          "networkProfileName": "m01-np02",
          "vmNics": [
            { "id": "vmnic0", "uplink": "uplink1", "vdsName": "m01-cl01-vds01" },
            { "id": "vmnic1", "uplink": "uplink2", "vdsName": "m01-cl01-vds01" }
          ]
        },
        "id": "b1e3481f-fd65-4db9-8c8a-2e2138d017a8",
        "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX"
      },
      {
        "hostname": "m01-esx20004.domain.internal",
        "hostNetworkSpec": {
          "networkProfileName": "m01-np02",
          "vmNics": [
            { "id": "vmnic0", "uplink": "uplink1", "vdsName": "m01-cl01-vds01" },
            { "id": "vmnic1", "uplink": "uplink2", "vdsName": "m01-cl01-vds01" }
          ]
        },
        "id": "ae56f9b0-ddee-4f28-8389-f0ec82ea2055",
        "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX"
      }
    ],
    "isEdgeClusterConfiguredForMultiAZ": false,
    "networkSpec": {
      "networkProfiles": [
        {
          "isDefault": false,
          "name": "m01-np02",
          "nsxtHostSwitchConfigs": [
            {
              "ipAddressPoolName": "m01-cl01-az2-tep01",
              "uplinkProfileName": "m01-cl01-vds01-az2-hostswitch",
              "vdsName": "m01-cl01-vds01",
              "vdsUplinkToNsxUplink": [
                { "nsxUplinkName": "uplink-1", "vdsUplinkName": "uplink1" },
                { "nsxUplinkName": "uplink-2", "vdsUplinkName": "uplink2" }
              ]
            }
          ]
        }
      ],
      "nsxClusterSpec": {
        "ipAddressPoolsSpec": [
          {
            "description": "AZ2 ESXi Host Overlay TEP IP Pool",
            "name": "m01-cl01-az2-tep01",
            "subnets": [
              {
                "cidr": "xx.xx.57.0/24",
                "gateway": "xx.xx.57.1",
                "ipAddressPoolRanges": [
                  { "end": "xx.xx.57.60", "start": "xx.xx.57.20" }
                ]
              }
            ]
          }
        ],
        "uplinkProfiles": [
          {
            "name": "m01-cl01-vds01-az2-hostswitch",
            "teamings": [
              {
                "activeUplinks": [ "uplink-1", "uplink-2" ],
                "name": "DEFAULT",
                "policy": "LOADBALANCE_SRCID",
                "standByUplinks": []
              }
            ],
            "transportVlan": 1108
          }
        ]
      }
    },
    "witnessSpec": {
      "fqdn": "w01-vsw30001.domain.internal",
      "vsanCidr": "xx.xx.80.0/24",
      "vsanIp": "xx.xx.80.22"
    },
    "witnessTrafficSharedWithVsanTraffic": true
  }
}
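Because the four hostSpecs entries differ only in hostname and ID, they can be generated rather than typed out. The following is a sketch, not part of the official procedure: the helper name build_host_specs is our own, and it assumes every host uses the same two vmnics, uplinks, vDS, and license key as in the specification above.

```python
import json


def build_host_specs(hosts, network_profile, vds_name, license_key):
    """Build the hostSpecs list from a {fqdn: host_id} mapping.

    Every host gets vmnic0 -> uplink1 and vmnic1 -> uplink2 on the same
    vDS, mirroring the hand-written specification.
    """
    return [
        {
            "hostname": fqdn,
            "hostNetworkSpec": {
                "networkProfileName": network_profile,
                "vmNics": [
                    {"id": "vmnic0", "uplink": "uplink1", "vdsName": vds_name},
                    {"id": "vmnic1", "uplink": "uplink2", "vdsName": vds_name},
                ],
            },
            "id": host_id,
            "licenseKey": license_key,
        }
        for fqdn, host_id in hosts.items()
    ]


# Host FQDNs and IDs as gathered from the API Explorer earlier.
az2_hosts = {
    "m01-esx20001.domain.internal": "1e826b08-52c5-4e95-85c9-b205c777f55a",
    "m01-esx20002.domain.internal": "c59ec647-90fa-4537-a330-0debc63dc26b",
    "m01-esx20003.domain.internal": "b1e3481f-fd65-4db9-8c8a-2e2138d017a8",
    "m01-esx20004.domain.internal": "ae56f9b0-ddee-4f28-8389-f0ec82ea2055",
}

host_specs = build_host_specs(
    az2_hosts, "m01-np02", "m01-cl01-vds01", "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX"
)
print(json.dumps(host_specs, indent=2))
```

The printed list can be pasted into the hostSpecs key of the specification; the rest of the document still has to be written by hand.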
Perform validation of the JSON specification
We now have the JSON specification ready. Before applying the cluster stretch task, we need to validate the JSON specifications.
To do this, navigate to the Developer Center, then to API Explorer, and select Clusters. Choose the POST /v1/clusters/{id}/validations option. Enter the cluster ID we gathered earlier, paste the JSON specification into the clusterUpdateSpec field, and click Execute.
If the input is valid, you should receive a response with resultStatus set to “SUCCEEDED.” Once you see this success message, you can proceed with creating the Cluster Stretch task.
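When scripting this step, the same check can be made on the parsed response. This is a minimal sketch: resultStatus and its SUCCEEDED value come from the response described above, while the executionStatus field and its IN_PROGRESS value are our assumption about the response shape and should be verified against your own output.

```python
def validation_passed(validation):
    """Return True only when the validation finished with resultStatus SUCCEEDED.

    Raises RuntimeError while the validation is still running
    (executionStatus is an assumed field name).
    """
    if validation.get("executionStatus") == "IN_PROGRESS":
        raise RuntimeError("validation still running; query it again later")
    return validation.get("resultStatus") == "SUCCEEDED"


# Hypothetical, trimmed validation responses.
print(validation_passed({"executionStatus": "COMPLETED", "resultStatus": "SUCCEEDED"}))
print(validation_passed({"executionStatus": "COMPLETED", "resultStatus": "FAILED"}))
```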
Update cluster by Stretching a standard vSAN cluster
After a successful validation of the JSON specification, you are ready to update the cluster for stretching.
To do this, navigate to the Developer Center, then to API Explorer, and select Clusters. Choose the PATCH /v1/clusters/{id} option. Enter the cluster ID we gathered earlier, paste the JSON specification into the clusterUpdateSpec field, and click Execute.
In the response, you should see a task created for stretching the management domain vSAN cluster. This task will also appear in the SDDC Manager’s tasks view at the bottom of the screen.
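The same call can be made outside the API Explorer. The sketch below only constructs the request and does not send it; the SDDC Manager hostname, the cluster ID, and the token are hypothetical placeholders, and it assumes the API accepts a bearer access token in the Authorization header.

```python
import json
import urllib.request


def build_stretch_request(sddc_manager, cluster_id, stretch_spec, token):
    """Build (but do not send) the PATCH request that starts the stretch task."""
    url = f"https://{sddc_manager}/v1/clusters/{cluster_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(stretch_spec).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed bearer-token auth
        },
        method="PATCH",
    )


# Placeholder values; replace with your own before sending.
request = build_stretch_request(
    "sddc-manager.domain.internal",
    "00000000-0000-0000-0000-000000000001",
    {"clusterStretchSpec": {}},
    "ACCESS-TOKEN-PLACEHOLDER",
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would submit it; we left that out so the sketch can be run safely without touching a live environment.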
Issue during stretch task
One issue we encountered was a failure in the subtask “Configure Static Routes On ESXi Hosts Along With Witness Validation.” We were unsure of the cause of this error, so we began by investigating the log files.
In the log file /var/log/vmware/vcf/domainmanager/domainmanager.log, we found the following error message: “Could not connect to the SSH server @ esxi-fqdn for configuration.” This led us to believe that SSH was disabled on the ESXi hosts.
Error:
2024-07-11T08:53:37.320+0000 DEBUG [vcf_dm,668f9d8c598a98687e2d1a17ee230635,8c2c] [c.v.v.c.f.p.a.i.ConfigureStaticRouteOnHostsWithWitnessValidationAction,dm-exec-5] Trying to ping xx.xx.80.22 from m01-esx20003.domain.internal
2024-07-11T08:53:37.330+0000 DEBUG [vcf_dm,668f9d8c598a98687e2d1a17ee230635,8c2c] [c.v.v.s.c.s.SecurityConfigurationServiceImpl,dm-exec-5] Security config retrieved {"fipsMode":false}
2024-07-11T08:53:37.332+0000 ERROR [vcf_dm,668f9d8c598a98687e2d1a17ee230635,8c2c] [c.v.evo.sddc.common.util.SshUtil,dm-exec-5] Unable to create jsch CLI session: com.jcraft.jsch.JSchException: java.net.ConnectException: Connection refused
    at com.jcraft.jsch.Util.createSocket(Util.java:394)
    at com.jcraft.jsch.Session.connect(Session.java:215)
    at com.vmware.evo.sddc.common.util.SshUtil.getSession(SshUtil.java:678)
    at com.vmware.evo.sddc.common.util.SshUtil.getSession(SshUtil.java:626)
    at com.vmware.evo.sddc.common.util.command.SshCommandExecuter.<init>(SshCommandExecuter.java:46)
    at com.vmware.evo.sddc.common.util.command.SshCommandExecuterFactory.createSshCommandExecuter(SshCommandExecuterFactory.java:71)
    at com.vmware.evo.sddc.common.util.command.SshCommandExecuterFactory.createSshCommandExecuter(SshCommandExecuterFactory.java:42)
    at jdk.internal.reflect.GeneratedMethodAccessor2203.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:196)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
    at org.springframework.validation.beanvalidation.MethodValidationInterceptor.invoke(MethodValidationInterceptor.java:141)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:702)
    at com.vmware.evo.sddc.common.util.command.SshCommandExecuterFactory$$SpringCGLIB$$0.createSshCommandExecuter(<generated>)
    at com.vmware.vcf.common.fsm.plugins.action.impl.ConfigureStaticRouteOnHostsWithWitnessValidationAction.witnessConnectivityCheck(ConfigureStaticRouteOnHostsWithWitnessValidationAction.java:78)
    at com.vmware.vcf.common.fsm.plugins.action.impl.ConfigureStaticRouteOnHostsWithWitnessValidationAction.postValidate(ConfigureStaticRouteOnHostsWithWitnessValidationAction.java:66)
    at com.vmware.vcf.common.fsm.plugins.action.impl.ConfigureStaticRouteOnHostsWithWitnessValidationAction.postValidate(ConfigureStaticRouteOnHostsWithWitnessValidationAction.java:29)
    at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionState.lambda$static$1(FsmActionState.java:23)
    at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionState.invoke(FsmActionState.java:62)
    at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionPlugin.invoke(FsmActionPlugin.java:159)
    at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionPlugin.invoke(FsmActionPlugin.java:144)
    at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.invokeMethod(ProcessingTaskSubscriber.java:400)
    at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.processTask(ProcessingTaskSubscriber.java:561)
    at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.accept(ProcessingTaskSubscriber.java:124)
    at jdk.internal.reflect.GeneratedMethodAccessor887.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:85)
    at com.google.common.eventbus.Subscriber.lambda$dispatchEvent$0(Subscriber.java:71)
    at com.vmware.vcf.common.tracing.TraceRunnable.run(TraceRunnable.java:59)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.Net.connect0(Native Method)
    at java.base/sun.nio.ch.Net.connect(Net.java:579)
    at java.base/sun.nio.ch.Net.connect(Net.java:568)
    at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:588)
    at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
    at java.base/java.net.Socket.connect(Socket.java:633)
    at java.base/java.net.Socket.connect(Socket.java:583)
    at java.base/java.net.Socket.<init>(Socket.java:507)
    at java.base/java.net.Socket.<init>(Socket.java:287)
    at com.jcraft.jsch.Util$1.run(Util.java:362)
    ... 1 common frames omitted
2024-07-11T08:53:37.333+0000 ERROR [vcf_dm,668f9d8c598a98687e2d1a17ee230635,8c2c] [c.v.e.s.c.u.c.SshCommandExecuter,dm-exec-5] Could not connect to the SSH server @ m01-esx20003.domain.internal for configuration.
com.jcraft.jsch.JSchException: java.net.ConnectException: Connection refused
    at com.jcraft.jsch.Util.createSocket(Util.java:394)
    at com.jcraft.jsch.Session.connect(Session.java:215)
    at com.vmware.evo.sddc.common.util.SshUtil.getSession(SshUtil.java:678)
    at com.vmware.evo.sddc.common.util.SshUtil.getSession(SshUtil.java:626)
    at com.vmware.evo.sddc.common.util.command.SshCommandExecuter.<init>(SshCommandExecuter.java:46)
    at com.vmware.evo.sddc.common.util.command.SshCommandExecuterFactory.createSshCommandExecuter(SshCommandExecuterFactory.java:71)
    at com.vmware.evo.sddc.common.util.command.SshCommandExecuterFactory.createSshCommandExecuter(SshCommandExecuterFactory.java:42)
    at jdk.internal.reflect.GeneratedMethodAccessor2203.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:196)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
    at org.springframework.validation.beanvalidation.MethodValidationInterceptor.invoke(MethodValidationInterceptor.java:141)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:702)
    at com.vmware.evo.sddc.common.util.command.SshCommandExecuterFactory$$SpringCGLIB$$0.createSshCommandExecuter(<generated>)
    at com.vmware.vcf.common.fsm.plugins.action.impl.ConfigureStaticRouteOnHostsWithWitnessValidationAction.witnessConnectivityCheck(ConfigureStaticRouteOnHostsWithWitnessValidationAction.java:78)
    at com.vmware.vcf.common.fsm.plugins.action.impl.ConfigureStaticRouteOnHostsWithWitnessValidationAction.postValidate(ConfigureStaticRouteOnHostsWithWitnessValidationAction.java:66)
    at com.vmware.vcf.common.fsm.plugins.action.impl.ConfigureStaticRouteOnHostsWithWitnessValidationAction.postValidate(ConfigureStaticRouteOnHostsWithWitnessValidationAction.java:29)
    at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionState.lambda$static$1(FsmActionState.java:23)
    at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionState.invoke(FsmActionState.java:62)
    at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionPlugin.invoke(FsmActionPlugin.java:159)
    at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionPlugin.invoke(FsmActionPlugin.java:144)
    at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.invokeMethod(ProcessingTaskSubscriber.java:400)
    at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.processTask(ProcessingTaskSubscriber.java:561)
    at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.accept(ProcessingTaskSubscriber.java:124)
    at jdk.internal.reflect.GeneratedMethodAccessor887.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:85)
    at com.google.common.eventbus.Subscriber.lambda$dispatchEvent$0(Subscriber.java:71)
    at com.vmware.vcf.common.tracing.TraceRunnable.run(TraceRunnable.java:59)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.Net.connect0(Native Method)
    at java.base/sun.nio.ch.Net.connect(Net.java:579)
    at java.base/sun.nio.ch.Net.connect(Net.java:568)
    at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:588)
    at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
    at java.base/java.net.Socket.connect(Socket.java:633)
    at java.base/java.net.Socket.connect(Socket.java:583)
    at java.base/java.net.Socket.<init>(Socket.java:507)
    at java.base/java.net.Socket.<init>(Socket.java:287)
    at com.jcraft.jsch.Util$1.run(Util.java:362)
    ... 1 common frames omitted
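A “Connection refused” like the one in this log can be confirmed, or ruled out before starting the stretch task, with a plain TCP check against port 22 on each AZ2 host. The function below is a small sketch using only the Python standard library; its name and the timeout value are our own choices, and it only tests whether the port accepts connections, not whether a login would succeed.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    Connection refused, timeouts, and name resolution failures are all
    reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling ssh_port_open("m01-esx20003.domain.internal") from a machine on the management network should return False while the SSH service is stopped on that host and True once it has been started.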
Stretch task completed
After enabling SSH services on the AZ2 ESXi hosts and retrying the task, it completed successfully. Upon consulting with VMware by Broadcom, we learned that the commissioning task has been modified and that SSH is disabled during this process.
As shown in the picture below, we now have a stretched VCF management domain cluster that spans both availability zones.