Stretch Management Domain in VCF 5.1.1

In this article, we’ll guide you through the process of stretching a VMware Cloud Foundation (VCF) management domain across two availability zones, enhancing your infrastructure’s resilience and ensuring continuous availability. By the end of this tutorial, you’ll have a solid understanding of how to configure and manage a stretched VCF deployment to protect your critical workloads.

Prerequisites

The requirements for stretching a management domain in VCF are documented on the VMware by Broadcom website. Our environment deviates from the VCF blueprint: we have two dedicated NSX Edge nodes in AZ1 and two in AZ2. The subnet requirements we are using are listed in the table below.

Function | AZ1 | AZ2 | HA Layer 3 Gateway
VM management VLAN
ESXi Management VLAN (AZ1) X
vMotion VLAN (AZ1) X
vSAN VLAN (AZ1) X
NSX Host Overlay (AZ1) X
NSX Edge Uplink01 (AZ1) X X
NSX Edge Uplink02 (AZ1) X X
NSX Edge Overlay (AZ1) X
ESXi Management VLAN (AZ2) X
vMotion VLAN (AZ2) X
vSAN VLAN (AZ2) X
NSX Host Overlay (AZ2) X
NSX Edge Uplink01 (AZ2) X X
NSX Edge Uplink02 (AZ2) X X
NSX Edge Overlay (AZ2) X

Network Pool

By default, deploying VCF with VMware Cloud Builder creates only one network pool. One of the first tasks is to commission the AZ2 hosts in SDDC Manager, and for that you need a second network pool that contains the AZ2-specific vMotion and vSAN networks for those hosts.
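If you prefer the API over the UI, the AZ2 network pool could also be created with POST /v1/network-pools. The Python sketch below only builds the request body. The field names (name, networks, type, vlanId, mtu, subnet, mask, gateway, ipPools) reflect our reading of the VCF API reference and should be checked against the API Explorer in your SDDC Manager; the pool name, VLAN IDs, subnets, and MTU are hypothetical placeholders, not values from our environment.

```python
import json


def pool_network(net_type, vlan_id, subnet, mask, gateway, start, end, mtu=9000):
    """One network entry of a network pool (assumed VCF API field names)."""
    return {
        "type": net_type,  # "VMOTION" or "VSAN"
        "vlanId": vlan_id,
        "mtu": mtu,  # assumed jumbo frames; match your physical network
        "subnet": subnet,
        "mask": mask,
        "gateway": gateway,
        "ipPools": [{"start": start, "end": end}],  # addresses handed out to hosts
    }


def network_pool_spec(name, networks):
    """Request body for POST /v1/network-pools (sketch, not the official schema)."""
    return {"name": name, "networks": networks}


# Hypothetical AZ2 values for illustration only.
az2_pool = network_pool_spec(
    "az2-network-pool",
    [
        pool_network("VMOTION", 2106, "192.168.106.0", "255.255.255.0",
                     "192.168.106.1", "192.168.106.10", "192.168.106.50"),
        pool_network("VSAN", 2107, "192.168.107.0", "255.255.255.0",
                     "192.168.107.1", "192.168.107.10", "192.168.107.50"),
    ],
)
print(json.dumps(az2_pool, indent=2))
```

Keeping each network as a small helper call makes it easy to see that the AZ2 pool carries only the vMotion and vSAN networks, which is all host commissioning needs from it.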

Commission AZ2 hosts

Once the AZ2-specific network pool has been created, you can commission the hosts in the SDDC Manager. Click on the “Commission Hosts” button to start the wizard.

Confirm that every item in the checklist is met, then click the “Proceed” button.

Add all AZ2 hosts and perform the validation. Once the validation is complete, click the “Next” button.

Click the “Commission” button to start the host commissioning task in SDDC Manager.

Once the commissioning task has completed, the AZ2 hosts appear in the inventory as Unassigned. These ESXi hosts are now available for stretching the management domain.
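The same commissioning can be scripted. As a sketch, assuming the host commission fields fqdn, username, password, storageType, and networkPoolId from the VCF API reference, the list built below would be sent first to POST /v1/hosts/validations and, once that succeeds, to POST /v1/hosts. The network pool ID and password are placeholders; the FQDNs are the AZ2 hosts used later in our specification.

```python
import json


def commission_specs(fqdns, network_pool_id, password, username="root"):
    """Build one host commission entry per ESXi host (assumed field names)."""
    return [
        {
            "fqdn": fqdn,
            "username": username,
            "password": password,
            "storageType": "VSAN",  # the hosts will join a vSAN cluster
            "networkPoolId": network_pool_id,  # ID of the AZ2 network pool
        }
        for fqdn in fqdns
    ]


az2_hosts = [f"m01-esx2000{i}.domain.internal" for i in range(1, 5)]
specs = commission_specs(az2_hosts, "<az2-network-pool-id>", "<root-password>")
print(json.dumps(specs, indent=2))
```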

vSAN witness node

Make sure that you have already deployed and configured the vSAN Witness node according to the VMware by Broadcom documentation and have registered it in the management domain, as shown below.

Stretching the Management Domain

Retrieving IDs of Unassigned ESXi Hosts

With the prerequisites in place, we can now work with the SDDC Manager API. Let’s start by collecting the IDs of the unassigned ESXi hosts that we commissioned earlier.

Click Developer Center, then API Explorer, and select Hosts. Choose GET /v1/hosts, enter “UNASSIGNED_USEABLE” as the value of the status parameter, and click Execute.

The response lists all ESXi hosts with the status UNASSIGNED_USEABLE. Note down the ID of each of these hosts, as we will use them in a later step.
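Outside the API Explorer, the same lookup is a GET request to /v1/hosts with a bearer token. Below is a minimal Python sketch using only the standard library. It assumes the token comes from POST /v1/tokens (which returns an accessToken) and that the hosts response is a page whose elements each carry fqdn and id. The SDDC Manager name and credentials are placeholders, and cafile is an optional path to the CA that signed the SDDC Manager certificate.

```python
import json
import ssl
import urllib.parse
import urllib.request


def _open_json(req, cafile=None):
    """Send a request and decode the JSON body; cafile can point at an internal CA bundle."""
    ctx = ssl.create_default_context(cafile=cafile)
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)


def get_token(sddc_manager, username, password, cafile=None):
    """Request an API access token from SDDC Manager (POST /v1/tokens)."""
    body = json.dumps({"username": username, "password": password}).encode()
    req = urllib.request.Request(
        f"https://{sddc_manager}/v1/tokens",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return _open_json(req, cafile)["accessToken"]


def hosts_url(sddc_manager, status="UNASSIGNED_USEABLE"):
    """URL for GET /v1/hosts filtered by host status."""
    return f"https://{sddc_manager}/v1/hosts?" + urllib.parse.urlencode({"status": status})


def get_hosts(sddc_manager, token, status="UNASSIGNED_USEABLE", cafile=None):
    """Fetch the hosts page for the given status."""
    req = urllib.request.Request(
        hosts_url(sddc_manager, status),
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    return _open_json(req, cafile)


def host_ids(response):
    """Map each host FQDN to its ID from a GET /v1/hosts response."""
    return {h["fqdn"]: h["id"] for h in response.get("elements", [])}
```

Calling host_ids(get_hosts(sddc, get_token(sddc, user, password))) would return a dictionary mapping each unassigned host's FQDN to its ID, which is exactly what the hostSpecs section needs later.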

Retrieving the ID of the cluster

We now need to retrieve the ID of the management domain cluster that we want to stretch between availability zones.

Click Developer Center, then API Explorer, and select Clusters. Choose GET /v1/clusters and click Execute.

The response contains the details of the management domain cluster. Note down the cluster ID, as we will use it in a later step.
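With the GET /v1/clusters response saved as JSON, the cluster ID can be picked by name rather than by eye. This is a small sketch that assumes each element has id and name fields; the cluster names and IDs in the sample response are hypothetical.

```python
def cluster_id(response, name):
    """Return the ID of the single cluster called `name` in a GET /v1/clusters response."""
    matches = [c["id"] for c in response.get("elements", []) if c.get("name") == name]
    if len(matches) != 1:
        # Zero or several matches means the name is wrong or ambiguous; stop early.
        raise ValueError(f"expected one cluster named {name!r}, found {len(matches)}")
    return matches[0]


# Hypothetical response for illustration.
clusters = {
    "elements": [
        {"id": "hypothetical-mgmt-cluster-id", "name": "m01-cl01"},
        {"id": "hypothetical-wld-cluster-id", "name": "w01-cl01"},
    ]
}
print(cluster_id(clusters, "m01-cl01"))
```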

Creating the JSON specification

Based on the information gathered in the previous steps, we are now ready to create a JSON specification to initiate the stretch management domain task. Since the VCF management domain has been deployed with the vSphere Distributed Switch profile “Profile-1,” there is no need to configure additional uplinks in the JSON specification.


In the hostSpecs section, enter the following details for each host:

Key | Value
hostname | FQDN of the ESXi host
networkProfileName | The name of the network profile defined under networkSpec > networkProfiles
vdsName | The vSphere Distributed Switch name that is in use by AZ1, set for each entry in vmNics
id | The host ID we gathered earlier
licenseKey | The ESXi license key to apply to the host
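Writing four nearly identical hostSpecs entries by hand is error-prone. The sketch below generates entries in the same shape as the specification in this article. It assumes, as in our environment, that vmnic0 maps to uplink1 and vmnic1 to uplink2 on the AZ1 vSphere Distributed Switch. The host IDs are the ones from our example and the license key is a placeholder.

```python
import json


def host_spec(hostname, host_id, license_key, network_profile, vds_name,
              vmnics=("vmnic0", "vmnic1")):
    """Build one hostSpecs entry; the n-th vmnic is mapped to vDS uplink n."""
    return {
        "hostname": hostname,
        "hostNetworkSpec": {
            "networkProfileName": network_profile,
            "vmNics": [
                {"id": nic, "uplink": f"uplink{i}", "vdsName": vds_name}
                for i, nic in enumerate(vmnics, start=1)
            ],
        },
        "id": host_id,
        "licenseKey": license_key,
    }


# FQDN -> host ID, as collected from GET /v1/hosts.
az2_host_ids = {
    "m01-esx20001.domain.internal": "1e826b08-52c5-4e95-85c9-b205c777f55a",
    "m01-esx20002.domain.internal": "c59ec647-90fa-4537-a330-0debc63dc26b",
    "m01-esx20003.domain.internal": "b1e3481f-fd65-4db9-8c8a-2e2138d017a8",
    "m01-esx20004.domain.internal": "ae56f9b0-ddee-4f28-8389-f0ec82ea2055",
}
host_specs = [
    host_spec(fqdn, hid, "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX", "m01-np02", "m01-cl01-vds01")
    for fqdn, hid in az2_host_ids.items()
]
print(json.dumps(host_specs, indent=2))
```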

In the networkSpec section, under networkProfiles and its nsxtHostSwitchConfigs, enter the following details:

Key | Value
name | The network profile name (referenced by networkProfileName in each hostSpecs entry)
ipAddressPoolName | The NSX IP pool to use (defined in the nsxClusterSpec section)
uplinkProfileName | The NSX uplink profile to use (defined in the nsxClusterSpec section)
vdsName | The vSphere Distributed Switch name that is in use by AZ1

In the networkSpec section, under nsxClusterSpec, enter the following details:

Key | Value
description | A description for the IP pool in NSX
name | A name for the IP pool in NSX
subnets | The TEP subnet for AZ2: cidr, gateway, and ipAddressPoolRanges (start and end)
uplinkProfileName | A name for the uplink profile in NSX
vdsName | The vSphere Distributed Switch name that is in use by AZ1

In the networkSpec section, under nsxClusterSpec > uplinkProfiles, provide the following details:

Key | Value
name | A name for the uplink profile
transportVlan | The VLAN ID for host TEP (overlay) traffic

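In the witnessSpec section, enter the following details:

Key | Value
fqdn | The FQDN of the vSAN witness node
vsanCidr | The CIDR of the network used by the vSAN witness node for vSAN traffic
vsanIp | The IP address of the vSAN witness node on that network

The complete JSON specification for our environment: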
{
"clusterStretchSpec": {
  "hostSpecs": [
   {
    "hostname": "m01-esx20001.domain.internal",
    "hostNetworkSpec": {
     "networkProfileName": "m01-np02",
     "vmNics": [
      {
       "id": "vmnic0",
       "uplink": "uplink1",
       "vdsName": "m01-cl01-vds01"
      },
      {
       "id": "vmnic1",
       "uplink": "uplink2",
       "vdsName": "m01-cl01-vds01"
      }
     ]
    },
    "id": "1e826b08-52c5-4e95-85c9-b205c777f55a",
    "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX"
   },
   {
    "hostname": "m01-esx20002.domain.internal",
    "hostNetworkSpec": {
     "networkProfileName": "m01-np02",
     "vmNics": [
      {
       "id": "vmnic0",
       "uplink": "uplink1",
       "vdsName": "m01-cl01-vds01"
      },
      {
       "id": "vmnic1",
       "uplink": "uplink2",
       "vdsName": "m01-cl01-vds01"
      }
     ]
    },
    "id": "c59ec647-90fa-4537-a330-0debc63dc26b",
    "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX"
   },
   {
    "hostname": "m01-esx20003.domain.internal",
    "hostNetworkSpec": {
     "networkProfileName": "m01-np02",
     "vmNics": [
      {
       "id": "vmnic0",
       "uplink": "uplink1",
       "vdsName": "m01-cl01-vds01"
      },
      {
       "id": "vmnic1",
       "uplink": "uplink2",
       "vdsName": "m01-cl01-vds01"
      }
     ]
    },
    "id": "b1e3481f-fd65-4db9-8c8a-2e2138d017a8",
    "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX"
   },
   {
    "hostname": "m01-esx20004.domain.internal",
    "hostNetworkSpec": {
     "networkProfileName": "m01-np02",
     "vmNics": [
      {
       "id": "vmnic0",
       "uplink": "uplink1",
       "vdsName": "m01-cl01-vds01"
      },
      {
       "id": "vmnic1",
       "uplink": "uplink2",
       "vdsName": "m01-cl01-vds01"
      }
     ]
    },
    "id": "ae56f9b0-ddee-4f28-8389-f0ec82ea2055",
    "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX"
   }
  ],
  "isEdgeClusterConfiguredForMultiAZ": false,
  "networkSpec": {
   "networkProfiles": [
    {
     "isDefault": false,
     "name": "m01-np02",
     "nsxtHostSwitchConfigs": [
      {
       "ipAddressPoolName": "m01-cl01-az2-tep01",
       "uplinkProfileName": "m01-cl01-vds01-az2-hostswitch",
       "vdsName": "m01-cl01-vds01",
       "vdsUplinkToNsxUplink": [
        {
         "nsxUplinkName": "uplink-1",
         "vdsUplinkName": "uplink1"
        },
        {
         "nsxUplinkName": "uplink-2",
         "vdsUplinkName": "uplink2"
        }
       ]
      }
     ]
    }
   ],
   "nsxClusterSpec": {
    "ipAddressPoolsSpec": [
     {
      "description": "AZ2 ESXi Host Overlay TEP IP Pool",
      "name": "m01-cl01-az2-tep01",
      "subnets": [
       {
        "cidr": "xx.xx.57.0/24",
        "gateway": "xx.xx.57.1",
        "ipAddressPoolRanges": [
         {
          "end": "xx.xx.57.60",
          "start": "xx.xx.57.20"
         }
        ]
       }
      ]
     }
    ],
    "uplinkProfiles": [
     {
      "name": "m01-cl01-vds01-az2-hostswitch",
      "teamings": [
       {
        "activeUplinks": [
         "uplink-1",
         "uplink-2"
        ],
        "name": "DEFAULT",
        "policy": "LOADBALANCE_SRCID",
        "standByUplinks": []
       }
      ],
      "transportVlan": 1108
     }
    ]
   }
  },
  "witnessSpec": {
   "fqdn": "w01-vsw30001.domain.internal",
   "vsanCidr": "xx.xx.80.0/24",
   "vsanIp": "xx.xx.80.22"
  },
  "witnessTrafficSharedWithVsanTraffic": true
}
}
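The validation API catches most mistakes, but broken cross-references inside the specification (network profile, IP pool, and uplink profile names) and addresses outside their subnet are easy to spot locally first. The following hedged sketch only checks the fields shown in the specification above and does not replace POST /v1/clusters/{id}/validations. Because our addresses are redacted (xx.xx), the example uses hypothetical 10.0.x.x addresses, and the real specification must contain valid IP addresses for ipaddress to parse it.

```python
import ipaddress


def check_stretch_spec(spec):
    """Return a list of problems found in a {"clusterStretchSpec": ...} document."""
    problems = []
    stretch = spec["clusterStretchSpec"]
    net = stretch["networkSpec"]
    nsx = net["nsxClusterSpec"]
    profiles = {p["name"] for p in net["networkProfiles"]}
    pools = {p["name"] for p in nsx["ipAddressPoolsSpec"]}
    uplink_profiles = {u["name"] for u in nsx["uplinkProfiles"]}

    # Every host must reference a network profile defined in networkSpec.
    for host in stretch["hostSpecs"]:
        profile = host["hostNetworkSpec"]["networkProfileName"]
        if profile not in profiles:
            problems.append(f"{host['hostname']}: unknown network profile {profile}")

    # Every host switch config must reference a defined IP pool and uplink profile.
    for profile in net["networkProfiles"]:
        for cfg in profile["nsxtHostSwitchConfigs"]:
            if cfg["ipAddressPoolName"] not in pools:
                problems.append(f"{profile['name']}: unknown IP pool {cfg['ipAddressPoolName']}")
            if cfg["uplinkProfileName"] not in uplink_profiles:
                problems.append(f"{profile['name']}: unknown uplink profile {cfg['uplinkProfileName']}")

    # The gateway and every pool range boundary must sit inside the pool's CIDR.
    for pool in nsx["ipAddressPoolsSpec"]:
        for subnet in pool["subnets"]:
            cidr = ipaddress.ip_network(subnet["cidr"])
            addresses = [subnet["gateway"]]
            for rng in subnet["ipAddressPoolRanges"]:
                addresses += [rng["start"], rng["end"]]
            for addr in addresses:
                if ipaddress.ip_address(addr) not in cidr:
                    problems.append(f"{pool['name']}: {addr} is outside {cidr}")

    # The witness vSAN IP must be inside the witness vSAN CIDR.
    witness = stretch["witnessSpec"]
    if ipaddress.ip_address(witness["vsanIp"]) not in ipaddress.ip_network(witness["vsanCidr"]):
        problems.append(f"witness: {witness['vsanIp']} is outside {witness['vsanCidr']}")
    return problems


# Trimmed copy of our specification with hypothetical 10.0.x.x addresses.
example = {
    "clusterStretchSpec": {
        "hostSpecs": [{"hostname": "m01-esx20001.domain.internal",
                       "hostNetworkSpec": {"networkProfileName": "m01-np02", "vmNics": []}}],
        "networkSpec": {
            "networkProfiles": [{"name": "m01-np02", "nsxtHostSwitchConfigs": [
                {"ipAddressPoolName": "m01-cl01-az2-tep01",
                 "uplinkProfileName": "m01-cl01-vds01-az2-hostswitch"}]}],
            "nsxClusterSpec": {
                "ipAddressPoolsSpec": [{"name": "m01-cl01-az2-tep01", "subnets": [
                    {"cidr": "10.0.57.0/24", "gateway": "10.0.57.1",
                     "ipAddressPoolRanges": [{"start": "10.0.57.20", "end": "10.0.57.60"}]}]}],
                "uplinkProfiles": [{"name": "m01-cl01-vds01-az2-hostswitch"}],
            },
        },
        "witnessSpec": {"vsanCidr": "10.0.80.0/24", "vsanIp": "10.0.80.22"},
    }
}
print(check_stretch_spec(example))
```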

Perform validation of the JSON specification

We now have the JSON specification ready. Before running the cluster stretch task, we need to validate the JSON specification.

To do this, navigate to the Developer Center, then to API Explorer, and select Clusters. Choose the POST /v1/clusters/{id}/validations option. Enter the cluster ID we gathered earlier, paste the JSON specification into the clusterUpdateSpec field, and click Execute.

If the input is valid, you should receive a response with resultStatus set to “SUCCEEDED.” Once you see this success message, you can proceed with creating the Cluster Stretch task.
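Scripted, the validation is a POST of the same JSON document to /v1/clusters/{id}/validations. This sketch builds the request with the standard library and checks resultStatus as described above. The validationChecks list, with a resultStatus and description per check, is an assumption about the response shape, so inspect a real response in the API Explorer before relying on it. The SDDC Manager name, cluster ID, and token are placeholders.

```python
import json
import urllib.request


def validation_request(sddc_manager, cluster_id, spec, token):
    """Build POST /v1/clusters/{id}/validations; the body is the full clusterUpdateSpec."""
    return urllib.request.Request(
        f"https://{sddc_manager}/v1/clusters/{cluster_id}/validations",
        data=json.dumps(spec).encode(),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def validation_passed(result):
    """True when the validation response reports resultStatus SUCCEEDED."""
    return result.get("resultStatus") == "SUCCEEDED"


def failed_checks(result):
    """Descriptions of individual checks that did not succeed (assumed response shape)."""
    return [
        check.get("description", "<no description>")
        for check in result.get("validationChecks", [])
        if check.get("resultStatus") != "SUCCEEDED"
    ]
```

The request can be sent with urllib.request.urlopen and an SSL context that trusts the SDDC Manager certificate; passing the decoded JSON to validation_passed gives a single go/no-go answer, and failed_checks narrows down what to fix.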

Updating the cluster by stretching a standard vSAN cluster

After the JSON specification has been validated successfully, you are ready to update the cluster to stretch it.

To do this, navigate to the Developer Center, then to API Explorer, and select Clusters. Choose the PATCH /v1/clusters/{id} option. Enter the cluster ID we gathered earlier, paste the JSON specification into the clusterUpdateSpec field, and click Execute.

In the response, you should see a task created for stretching the management domain vSAN cluster. This task will also appear in the SDDC Manager’s tasks view at the bottom of the screen.
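The stretch itself sends the same document with PATCH to /v1/clusters/{id}, and the response describes the new task. The sketch below builds that request and polls the task until it finishes. get_task is a function you supply that returns the task JSON (for example by calling GET /v1/tasks/{id}). The terminal status values SUCCESSFUL and FAILED are our assumption about the task model, and the 30-second interval and four-hour timeout are arbitrary choices.

```python
import json
import time
import urllib.request

# Assumed terminal task states; verify against your SDDC Manager's task responses.
TERMINAL_STATES = {"SUCCESSFUL", "FAILED"}


def stretch_request(sddc_manager, cluster_id, spec, token):
    """Build PATCH /v1/clusters/{id} carrying the clusterStretchSpec document."""
    return urllib.request.Request(
        f"https://{sddc_manager}/v1/clusters/{cluster_id}",
        data=json.dumps(spec).encode(),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="PATCH",
    )


def wait_for_task(get_task, task_id, interval=30, timeout=4 * 3600):
    """Poll get_task(task_id) until the task reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        task = get_task(task_id)
        if task.get("status") in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {task.get('status')} after {timeout}s")
        time.sleep(interval)
```

Injecting get_task keeps the polling logic independent of how you authenticate and send requests, and makes it easy to test without an SDDC Manager.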

Issue during stretch task

One issue we encountered was a failure in the “Configure Static Routes On ESXi Hosts Along With Witness Validation” step. Because the cause was not obvious, we started by investigating the log files.

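In the log file /var/log/vmware/vcf/domainmanager/domainmanager.log, we found the error message “Could not connect to the SSH server @ esxi-fqdn for configuration.” The underlying exception, java.net.ConnectException: Connection refused, typically means that no service is accepting connections on the host's SSH port, which led us to suspect that SSH was disabled on the ESXi hosts.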

Error:

2024-07-11T08:53:37.320+0000 DEBUG [vcf_dm,668f9d8c598a98687e2d1a17ee230635,8c2c] [c.v.v.c.f.p.a.i.ConfigureStaticRouteOnHostsWithWitnessValidationAction,dm-exec-5]  Trying to ping xx.xx.80.22 from m01-esx20003.domain.internal
2024-07-11T08:53:37.330+0000 DEBUG [vcf_dm,668f9d8c598a98687e2d1a17ee230635,8c2c] [c.v.v.s.c.s.SecurityConfigurationServiceImpl,dm-exec-5]  Security config retrieved {"fipsMode":false}
2024-07-11T08:53:37.332+0000 ERROR [vcf_dm,668f9d8c598a98687e2d1a17ee230635,8c2c] [c.v.evo.sddc.common.util.SshUtil,dm-exec-5]  Unable to create jsch CLI session:
com.jcraft.jsch.JSchException: java.net.ConnectException: Connection refused
        at com.jcraft.jsch.Util.createSocket(Util.java:394)
        at com.jcraft.jsch.Session.connect(Session.java:215)
        at com.vmware.evo.sddc.common.util.SshUtil.getSession(SshUtil.java:678)
        at com.vmware.evo.sddc.common.util.SshUtil.getSession(SshUtil.java:626)
        at com.vmware.evo.sddc.common.util.command.SshCommandExecuter.<init>(SshCommandExecuter.java:46)
        at com.vmware.evo.sddc.common.util.command.SshCommandExecuterFactory.createSshCommandExecuter(SshCommandExecuterFactory.java:71)
        at com.vmware.evo.sddc.common.util.command.SshCommandExecuterFactory.createSshCommandExecuter(SshCommandExecuterFactory.java:42)
        at jdk.internal.reflect.GeneratedMethodAccessor2203.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:196)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
        at org.springframework.validation.beanvalidation.MethodValidationInterceptor.invoke(MethodValidationInterceptor.java:141)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:702)
        at com.vmware.evo.sddc.common.util.command.SshCommandExecuterFactory$$SpringCGLIB$$0.createSshCommandExecuter(<generated>)
        at com.vmware.vcf.common.fsm.plugins.action.impl.ConfigureStaticRouteOnHostsWithWitnessValidationAction.witnessConnectivityCheck(ConfigureStaticRouteOnHostsWithWitnessValidationAction.java:78)
        at com.vmware.vcf.common.fsm.plugins.action.impl.ConfigureStaticRouteOnHostsWithWitnessValidationAction.postValidate(ConfigureStaticRouteOnHostsWithWitnessValidationAction.java:66)
        at com.vmware.vcf.common.fsm.plugins.action.impl.ConfigureStaticRouteOnHostsWithWitnessValidationAction.postValidate(ConfigureStaticRouteOnHostsWithWitnessValidationAction.java:29)
        at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionState.lambda$static$1(FsmActionState.java:23)
        at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionState.invoke(FsmActionState.java:62)
        at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionPlugin.invoke(FsmActionPlugin.java:159)
        at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionPlugin.invoke(FsmActionPlugin.java:144)
        at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.invokeMethod(ProcessingTaskSubscriber.java:400)
        at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.processTask(ProcessingTaskSubscriber.java:561)
        at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.accept(ProcessingTaskSubscriber.java:124)
        at jdk.internal.reflect.GeneratedMethodAccessor887.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:85)
        at com.google.common.eventbus.Subscriber.lambda$dispatchEvent$0(Subscriber.java:71)
        at com.vmware.vcf.common.tracing.TraceRunnable.run(TraceRunnable.java:59)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.net.ConnectException: Connection refused
        at java.base/sun.nio.ch.Net.connect0(Native Method)
        at java.base/sun.nio.ch.Net.connect(Net.java:579)
        at java.base/sun.nio.ch.Net.connect(Net.java:568)
        at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:588)
        at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
        at java.base/java.net.Socket.connect(Socket.java:633)
        at java.base/java.net.Socket.connect(Socket.java:583)
        at java.base/java.net.Socket.<init>(Socket.java:507)
        at java.base/java.net.Socket.<init>(Socket.java:287)
        at com.jcraft.jsch.Util$1.run(Util.java:362)
        ... 1 common frames omitted
2024-07-11T08:53:37.333+0000 ERROR [vcf_dm,668f9d8c598a98687e2d1a17ee230635,8c2c] [c.v.e.s.c.u.c.SshCommandExecuter,dm-exec-5]  Could not connect to the SSH server @ m01-esx20003.domain.internal for configuration.
com.jcraft.jsch.JSchException: java.net.ConnectException: Connection refused
        at com.jcraft.jsch.Util.createSocket(Util.java:394)
        at com.jcraft.jsch.Session.connect(Session.java:215)
        at com.vmware.evo.sddc.common.util.SshUtil.getSession(SshUtil.java:678)
        at com.vmware.evo.sddc.common.util.SshUtil.getSession(SshUtil.java:626)
        at com.vmware.evo.sddc.common.util.command.SshCommandExecuter.<init>(SshCommandExecuter.java:46)
        at com.vmware.evo.sddc.common.util.command.SshCommandExecuterFactory.createSshCommandExecuter(SshCommandExecuterFactory.java:71)
        at com.vmware.evo.sddc.common.util.command.SshCommandExecuterFactory.createSshCommandExecuter(SshCommandExecuterFactory.java:42)
        at jdk.internal.reflect.GeneratedMethodAccessor2203.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:196)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
        at org.springframework.validation.beanvalidation.MethodValidationInterceptor.invoke(MethodValidationInterceptor.java:141)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:702)
        at com.vmware.evo.sddc.common.util.command.SshCommandExecuterFactory$$SpringCGLIB$$0.createSshCommandExecuter(<generated>)
        at com.vmware.vcf.common.fsm.plugins.action.impl.ConfigureStaticRouteOnHostsWithWitnessValidationAction.witnessConnectivityCheck(ConfigureStaticRouteOnHostsWithWitnessValidationAction.java:78)
        at com.vmware.vcf.common.fsm.plugins.action.impl.ConfigureStaticRouteOnHostsWithWitnessValidationAction.postValidate(ConfigureStaticRouteOnHostsWithWitnessValidationAction.java:66)
        at com.vmware.vcf.common.fsm.plugins.action.impl.ConfigureStaticRouteOnHostsWithWitnessValidationAction.postValidate(ConfigureStaticRouteOnHostsWithWitnessValidationAction.java:29)
        at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionState.lambda$static$1(FsmActionState.java:23)
        at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionState.invoke(FsmActionState.java:62)
        at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionPlugin.invoke(FsmActionPlugin.java:159)
        at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionPlugin.invoke(FsmActionPlugin.java:144)
        at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.invokeMethod(ProcessingTaskSubscriber.java:400)
        at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.processTask(ProcessingTaskSubscriber.java:561)
        at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.accept(ProcessingTaskSubscriber.java:124)
        at jdk.internal.reflect.GeneratedMethodAccessor887.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:85)
        at com.google.common.eventbus.Subscriber.lambda$dispatchEvent$0(Subscriber.java:71)
        at com.vmware.vcf.common.tracing.TraceRunnable.run(TraceRunnable.java:59)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.net.ConnectException: Connection refused
        at java.base/sun.nio.ch.Net.connect0(Native Method)
        at java.base/sun.nio.ch.Net.connect(Net.java:579)
        at java.base/sun.nio.ch.Net.connect(Net.java:568)
        at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:588)
        at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
        at java.base/java.net.Socket.connect(Socket.java:633)
        at java.base/java.net.Socket.connect(Socket.java:583)
        at java.base/java.net.Socket.<init>(Socket.java:507)
        at java.base/java.net.Socket.<init>(Socket.java:287)
        at com.jcraft.jsch.Util$1.run(Util.java:362)
        ... 1 common frames omitted
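To see which hosts were affected without reading through the stack traces, the log can be searched for the SSH failure message. This small sketch assumes the message format shown in the log above (“Could not connect to the SSH server @ <host> for configuration.”). It can be run on SDDC Manager or against a copied log file.

```python
import re

# Matches the SSH failure message written by the domain manager, capturing the host FQDN.
SSH_FAILURE = re.compile(r"Could not connect to the SSH server @ (\S+) for configuration")


def hosts_with_ssh_failures(lines):
    """Return the unique hosts named in SSH connection failures, in order of first appearance."""
    hosts = []
    for line in lines:
        match = SSH_FAILURE.search(line)
        if match and match.group(1) not in hosts:
            hosts.append(match.group(1))
    return hosts


def scan_log(path="/var/log/vmware/vcf/domainmanager/domainmanager.log"):
    """Read the domain manager log and list the hosts it could not reach over SSH."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return hosts_with_ssh_failures(log)
```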

Stretch task completed

After we enabled the SSH service on the AZ2 ESXi hosts and retried the task, it completed successfully. When we consulted VMware by Broadcom, we learned that the host commissioning task has been modified so that it disables SSH on the hosts.
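Before retrying, it can help to confirm that each AZ2 host now accepts connections on the SSH port. “Connection refused” is what a plain TCP connect reports when nothing is listening, so a socket check reproduces the first step that failed in the log. This is a minimal sketch; port 22 and the 5-second timeout are assumptions, and it tests reachability only, not SSH authentication.

```python
import socket


def ssh_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (the error in domainmanager.log),
        # timeouts, and name-resolution failures.
        return False


def report(hosts, port=22):
    """Print one line per host showing whether the SSH port answers."""
    for host in hosts:
        state = "open" if ssh_port_open(host, port) else "closed or unreachable"
        print(f"{host}: {state}")
```

For our hosts, report([f"m01-esx2000{i}.domain.internal" for i in range(1, 5)]) would print the SSH port state of m01-esx20001 through m01-esx20004.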

As shown in the picture below, we now have a stretched VCF management domain cluster spanning two availability zones.
