Unable to bring cluster online in vROPS 8.6.3

This week, I was troubleshooting an issue with one of a customer’s vROPS (vRealize Operations Manager) clusters. After a restart, the “bring cluster online” task on the vROPS admin page kept failing.

The issue

I was unable to bring the cluster online. The cluster status was “failure” with the error message: Cluster failed to come online.

Performing the “bring cluster online” task results in a failure in the admin portal of vROPS.

Troubleshooting steps

Rebooting the vROPS nodes didn’t make any difference, so let’s take a look at the vROPS log files.
Tasks like “Bring Cluster Online” or “Take Cluster Offline” are logged in /var/log/casa_logs/casa.log:

2022-10-18T10:59:56,906+0000 ERROR [ajp-nio-127.0.0.1-8011-exec-2] [Ff00006U] sysadmin.cassandra.PythonCassandraCommand:296 - Could not run command='/usr/lib/vmware-python-3/bin/python /usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py --action startServices --force cassandra'
stdout:
2022-10-18T10:59:56,881+0000 [11722] - admin - An unhandled exception occurred, exiting with exit code: 1,
Type: "<class 'urllib.error.URLError'>"
 Value: "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076)>"
 Traceback: "Traceback (most recent call last):
  File "/usr/lib/python3.7/urllib/request.py", line 1348, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/usr/lib/python3.7/http/client.py", line 1281, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1327, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1276, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/usr/lib/python3.7/http/client.py", line 976, in send
    self.connect()
  File "/usr/lib/python3.7/http/client.py", line 1451, in connect
    server_hostname=server_hostname)
  File "/usr/lib/python3.7/ssl.py", line 423, in wrap_socket
    session=session
  File "/usr/lib/python3.7/ssl.py", line 870, in _create
    self.do_handshake()
  File "/usr/lib/python3.7/ssl.py", line 1139, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 1487, in <module>
    main()
  File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 1481, in main
    doConfigurationsAndActions(loadStateFile, runConfigureRoles, rolesToModify, adminRoleConnectionString, context, runBringSliceOffline, runBringSliceOnline, runRepairRoles, doInitSliceId, runWriteRolesToStateFile, startServices, startServicesOnConfig, stopServices, serviceStatus, joinCasaCluster, waitForFirstbootScripts, setLock, enableDisableServices, disableAllServices, runPromoteNewMaster, oldPostgresMaster, enableHA, replica, enrollmentUserString, enrollmentThumbprintString, offlineReason, useHTTPSOnly, force, jsonOutput, args)
  File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 1430, in doConfigurationsAndActions
    runStartServices(runningRoleStateFile, rolesToModify, enableServices = enableDisableServices, services = args, force = force, context = context)
  File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 370, in runStartServices
    force=force, context=context)
  File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsPlatformServices.py", line 307, in startPlatformServices
    cassandra_check.wait_for_cassandra(this_node_only=True)
  File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/check.py", line 512, in wait_for_cassandra
    if self._retry_timeout_sec == 0 or self._are_enough_nodes_up(this_node_only):
  File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/check.py", line 284, in _are_enough_nodes_up
    is_CA_enabled = CassandraCheck.get_CA_enabled(logger)
  File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/check.py", line 618, in get_CA_enabled
    ca_state = CassandraGetApiExecutor.execute_get_api_and_return_response('localhost', '/config/cassandra/cluster/ca', logger)
  File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/cassandra_get_api_executor.py", line 51, in execute_get_api_and_return_response
    status_code, token_pair = vc_ops_http_utilities.login(hostname)
  File "/usr/lib/vmware-vcopssuite/utilities/lib/vc_ops_http_utilities.py", line 295, in login
    response = opener.open(authorization_request)
  File "/usr/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 1391, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/lib/python3.7/urllib/request.py", line 1350, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076)>

2022-10-18T10:59:56 ERROR [11722] - root - An unhandled exception occurred, exiting with exit code: 1,
Type: "<class 'urllib.error.URLError'>"
 Value: "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076)>"
 Traceback: "Traceback (most recent call last):
  File "/usr/lib/python3.7/urllib/request.py", line 1348, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/usr/lib/python3.7/http/client.py", line 1281, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1327, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1276, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/usr/lib/python3.7/http/client.py", line 976, in send
    self.connect()
  File "/usr/lib/python3.7/http/client.py", line 1451, in connect
    server_hostname=server_hostname)
  File "/usr/lib/python3.7/ssl.py", line 423, in wrap_socket
    session=session
  File "/usr/lib/python3.7/ssl.py", line 870, in _create
    self.do_handshake()
  File "/usr/lib/python3.7/ssl.py", line 1139, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 1487, in <module>
    main()
  File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 1481, in main
    doConfigurationsAndActions(loadStateFile, runConfigureRoles, rolesToModify, adminRoleConnectionString, context, runBringSliceOffline, runBringSliceOnline, runRepairRoles, doInitSliceId, runWriteRolesToStateFile, startServices, startServicesOnConfig, stopServices, serviceStatus, joinCasaCluster, waitForFirstbootScripts, setLock, enableDisableServices, disableAllServices, runPromoteNewMaster, oldPostgresMaster, enableHA, replica, enrollmentUserString, enrollmentThumbprintString, offlineReason, useHTTPSOnly, force, jsonOutput, args)
  File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 1430, in doConfigurationsAndActions
    runStartServices(runningRoleStateFile, rolesToModify, enableServices = enableDisableServices, services = args, force = force, context = context)
  File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 370, in runStartServices
    force=force, context=context)
  File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsPlatformServices.py", line 307, in startPlatformServices
    cassandra_check.wait_for_cassandra(this_node_only=True)
  File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/check.py", line 512, in wait_for_cassandra
    if self._retry_timeout_sec == 0 or self._are_enough_nodes_up(this_node_only):
  File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/check.py", line 284, in _are_enough_nodes_up
    is_CA_enabled = CassandraCheck.get_CA_enabled(logger)
  File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/check.py", line 618, in get_CA_enabled
    ca_state = CassandraGetApiExecutor.execute_get_api_and_return_response('localhost', '/config/cassandra/cluster/ca', logger)
  File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/cassandra_get_api_executor.py", line 51, in execute_get_api_and_return_response
    status_code, token_pair = vc_ops_http_utilities.login(hostname)
  File "/usr/lib/vmware-vcopssuite/utilities/lib/vc_ops_http_utilities.py", line 295, in login
    response = opener.open(authorization_request)
  File "/usr/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 1391, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/lib/python3.7/urllib/request.py", line 1350, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076)>
"

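The relevant error logged while bringing the vROPS cluster online is: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired. The traceback shows where it comes from: while starting the Cassandra service, vcopsConfigureRoles.py makes an HTTPS login request to localhost, and that request fails because the certificate presented has expired.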
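To pull only the verification failures out of a long casa.log, a small script can help. The following is a minimal Python sketch; the function name and the regular expression are my own (they are not part of vROPS) and assume the message format seen in the excerpt above.

```python
import re

# Matches the OpenSSL verification message and captures the stated reason,
# e.g. "certificate has expired". The pattern is an assumption based on the
# log excerpt; other vROPS versions may format the message differently.
CERT_ERROR = re.compile(
    r"CERTIFICATE_VERIFY_FAILED.*?certificate verify failed: ([^(>]+)"
)


def find_cert_errors(lines):
    """Return (line_number, reason) for every line reporting a TLS verification failure."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = CERT_ERROR.search(line)
        if match:
            hits.append((number, match.group(1).strip()))
    return hits
```

On a node it could be called as find_cert_errors(open("/var/log/casa_logs/casa.log", errors="replace")), which returns the line number and the reason for each match.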

The solution

Note:

Make sure you have a snapshot of the vROPS nodes in place before making any changes.

The first thing that came to mind was: does the issue still persist with the default self-signed certificates?
So I followed the VMware KB article on reloading the default certificates on the vROPS nodes.

The following steps need to be performed on all of the vROPS nodes:

unset -f pathprepend
unset -f pathremove
unset -f pathappend
$VMWARE_PYTHON_BIN /usr/lib/vmware-casa/bin/activate_web_certificate.py DEFAULT
$VMWARE_PYTHON_BIN /usr/lib/vmware-vcopssuite/utilities/bin/restartHttpd.py

Note:

The unset commands are required to avoid errors caused by the Python version differences between 6.x/7.x and 8.x.

Running the commands on all of the vROPS nodes.
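With more than a couple of nodes, typing the same five commands on each one gets tedious. The sketch below only builds the ssh command line for a node, so you can review it before running anything. It rests on several assumptions that are not from the KB article: root SSH access to the nodes is enabled, $VMWARE_PYTHON_BIN is set by the remote login shell (hence bash -l), and the node name in the usage note is a placeholder.

```python
import shlex

# The commands from the steps above, in the same order.
RELOAD_STEPS = [
    "unset -f pathprepend",
    "unset -f pathremove",
    "unset -f pathappend",
    "$VMWARE_PYTHON_BIN /usr/lib/vmware-casa/bin/activate_web_certificate.py DEFAULT",
    "$VMWARE_PYTHON_BIN /usr/lib/vmware-vcopssuite/utilities/bin/restartHttpd.py",
]


def ssh_command(node, user="root"):
    """Build the argv list that runs the reload steps on one node over SSH."""
    # "&&" stops at the first failing step, so httpd is not restarted
    # if activating the default certificate failed.
    script = " && ".join(RELOAD_STEPS)
    # Quote the script so $VMWARE_PYTHON_BIN is expanded by the remote
    # login shell (assumed to define it), not by the local shell.
    return ["ssh", f"{user}@{node}", "bash -lc " + shlex.quote(script)]
```

For example, print(ssh_command("vrops-node-1.example.com")) shows what would be executed; passing the list to subprocess.run(..., check=True) would actually run it.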

The new self-signed SSL certificate becomes active on the vROPS nodes once the default certificates have been reloaded.
You can verify this in your browser by checking the certificate details.

The self-signed certificates have been configured.
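The same check can be done without a browser by fetching the certificate a node serves and comparing its SHA-256 fingerprint with the one shown in the certificate details. This is a minimal sketch using only the Python standard library; the function names are my own, and the host name you pass in is whatever your node is called.

```python
import hashlib
import ssl


def fingerprint_from_pem(pem):
    """Return the SHA-256 fingerprint of a PEM certificate as colon-separated hex."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def served_fingerprint(host, port=443):
    """Fetch the certificate served by host:port and return its fingerprint."""
    # get_server_certificate() does not validate the certificate by default,
    # so it also works against self-signed or expired certificates.
    pem = ssl.get_server_certificate((host, port))
    return fingerprint_from_pem(pem)
```

If the fingerprint changes after the reload and matches what the browser reports, the node is serving the new certificate.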

This is the moment we have all been waiting for: will the “bring cluster online” task work with the new default SSL certificates?

The cluster is finally online again.

After confirming that the SSL certificate was causing the issue, we requested a new domain-signed SSL certificate and configured it on the vROPS nodes. After that, a cluster restart completed without any issues. We can conclude that an expired domain-signed SSL certificate prevents the vROPS cluster from coming online.
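To avoid being surprised by this again, the expiry date can be checked ahead of time. The following is a minimal Python sketch, not a monitoring solution: the function names are my own, and check_node assumes the machine running it trusts the CA that issued the certificate. Against the default self-signed certificate it would report a verification failure for a different reason.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter timestamp; negative once expired."""
    # not_after uses the format returned by getpeercert(),
    # e.g. "Jan  5 09:34:43 2018 GMT".
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400


def check_node(host, port=443, timeout=5.0):
    """Do a verified TLS handshake with host:port and describe the result."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                not_after = tls.getpeercert()["notAfter"]
    except ssl.SSLCertVerificationError as exc:
        # verify_message carries the OpenSSL reason, such as
        # "certificate has expired".
        return f"{host}: verification failed ({exc.verify_message})"
    return f"{host}: valid, {days_until_expiry(not_after):.0f} days left"
```

Running check_node for every node on a schedule and alerting when the remaining days drop below a threshold of your choosing would flag the certificate before a restart exposes the problem.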
