
Unable to bring cluster online in vROPS 8.6.3
This week, I was troubleshooting an issue with one of the vROPS (vRealize Operations Manager) clusters at a customer. I was unable to perform the “bring the cluster online” task from the admin page of vROPS after a restart.
The issue
As shown in the picture below, I was unable to bring the cluster online. The cluster status was “failure” with the error message: Cluster failed to come online.

Troubleshooting steps
Rebooting the vROPS nodes didn’t make any difference, so let’s take a look at the vROPS log files.
Tasks like “Bring Cluster Online” or “Take Cluster Offline” are logged in the following file: /var/log/casa_logs/casa.log.
2022-10-18T10:59:56,906+0000 ERROR [ajp-nio-127.0.0.1-8011-exec-2] [Ff00006U] sysadmin.cassandra.PythonCassandraCommand:296 - Could not run command='/usr/lib/vmware-python-3/bin/python /usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py --action startServices --force cassandra' stdout: 2022-10-18T10:59:56,881+0000 [11722] - admin - An unhandled exception occurred, exiting with exit code: 1, Type: "<class 'urllib.error.URLError'>" Value: "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076)>" Traceback: "Traceback (most recent call last): File "/usr/lib/python3.7/urllib/request.py", line 1348, in do_open encode_chunked=req.has_header('Transfer-encoding')) File "/usr/lib/python3.7/http/client.py", line 1281, in request self._send_request(method, url, body, headers, encode_chunked) File "/usr/lib/python3.7/http/client.py", line 1327, in _send_request self.endheaders(body, encode_chunked=encode_chunked) File "/usr/lib/python3.7/http/client.py", line 1276, in endheaders self._send_output(message_body, encode_chunked=encode_chunked) File "/usr/lib/python3.7/http/client.py", line 1036, in _send_output self.send(msg) File "/usr/lib/python3.7/http/client.py", line 976, in send self.connect() File "/usr/lib/python3.7/http/client.py", line 1451, in connect server_hostname=server_hostname) File "/usr/lib/python3.7/ssl.py", line 423, in wrap_socket session=session File "/usr/lib/python3.7/ssl.py", line 870, in _create self.do_handshake() File "/usr/lib/python3.7/ssl.py", line 1139, in do_handshake self._sslobj.do_handshake() ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076) During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 1487, in <module> main() File 
"/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 1481, in main doConfigurationsAndActions(loadStateFile, runConfigureRoles, rolesToModify, adminRoleConnectionString, context, runBringSliceOffline, runBringSliceOnline, runRepairRoles, doInitSliceId, runWriteRolesToStateFile, startServices, startServicesOnConfig, stopServices, serviceStatus, joinCasaCluster, waitForFirstbootScripts, setLock, enableDisableServices, disableAllServices, runPromoteNewMaster, oldPostgresMaster, enableHA, replica, enrollmentUserString, enrollmentThumbprintString, offlineReason, useHTTPSOnly, force, jsonOutput, args) File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 1430, in doConfigurationsAndActions runStartServices(runningRoleStateFile, rolesToModify, enableServices = enableDisableServices, services = args, force = force, context = context) File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 370, in runStartServices force=force, context=context) File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsPlatformServices.py", line 307, in startPlatformServices cassandra_check.wait_for_cassandra(this_node_only=True) File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/check.py", line 512, in wait_for_cassandra if self._retry_timeout_sec == 0 or self._are_enough_nodes_up(this_node_only): File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/check.py", line 284, in _are_enough_nodes_up is_CA_enabled = CassandraCheck.get_CA_enabled(logger) File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/check.py", line 618, in get_CA_enabled ca_state = CassandraGetApiExecutor.execute_get_api_and_return_response('localhost', '/config/cassandra/cluster/ca', logger) File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/cassandra_get_api_executor.py", line 51, in execute_get_api_and_return_response status_code, 
token_pair = vc_ops_http_utilities.login(hostname) File "/usr/lib/vmware-vcopssuite/utilities/lib/vc_ops_http_utilities.py", line 295, in login response = opener.open(authorization_request) File "/usr/lib/python3.7/urllib/request.py", line 525, in open response = self._open(req, data) File "/usr/lib/python3.7/urllib/request.py", line 543, in _open '_open', req) File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain result = func(*args) File "/usr/lib/python3.7/urllib/request.py", line 1391, in https_open context=self._context, check_hostname=self._check_hostname) File "/usr/lib/python3.7/urllib/request.py", line 1350, in do_open raise URLError(err) urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076)> 2022-10-18T10:59:56 ERROR [11722] - root - An unhandled exception occurred, exiting with exit code: 1, Type: "<class 'urllib.error.URLError'>" Value: "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076)>" Traceback: "Traceback (most recent call last): File "/usr/lib/python3.7/urllib/request.py", line 1348, in do_open encode_chunked=req.has_header('Transfer-encoding')) File "/usr/lib/python3.7/http/client.py", line 1281, in request self._send_request(method, url, body, headers, encode_chunked) File "/usr/lib/python3.7/http/client.py", line 1327, in _send_request self.endheaders(body, encode_chunked=encode_chunked) File "/usr/lib/python3.7/http/client.py", line 1276, in endheaders self._send_output(message_body, encode_chunked=encode_chunked) File "/usr/lib/python3.7/http/client.py", line 1036, in _send_output self.send(msg) File "/usr/lib/python3.7/http/client.py", line 976, in send self.connect() File "/usr/lib/python3.7/http/client.py", line 1451, in connect server_hostname=server_hostname) File "/usr/lib/python3.7/ssl.py", line 423, in wrap_socket session=session File "/usr/lib/python3.7/ssl.py", line 870, in 
_create self.do_handshake() File "/usr/lib/python3.7/ssl.py", line 1139, in do_handshake self._sslobj.do_handshake() ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076) During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 1487, in <module> main() File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 1481, in main doConfigurationsAndActions(loadStateFile, runConfigureRoles, rolesToModify, adminRoleConnectionString, context, runBringSliceOffline, runBringSliceOnline, runRepairRoles, doInitSliceId, runWriteRolesToStateFile, startServices, startServicesOnConfig, stopServices, serviceStatus, joinCasaCluster, waitForFirstbootScripts, setLock, enableDisableServices, disableAllServices, runPromoteNewMaster, oldPostgresMaster, enableHA, replica, enrollmentUserString, enrollmentThumbprintString, offlineReason, useHTTPSOnly, force, jsonOutput, args) File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 1430, in doConfigurationsAndActions runStartServices(runningRoleStateFile, rolesToModify, enableServices = enableDisableServices, services = args, force = force, context = context) File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py", line 370, in runStartServices force=force, context=context) File "/usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsPlatformServices.py", line 307, in startPlatformServices cassandra_check.wait_for_cassandra(this_node_only=True) File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/check.py", line 512, in wait_for_cassandra if self._retry_timeout_sec == 0 or self._are_enough_nodes_up(this_node_only): File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/check.py", line 
284, in _are_enough_nodes_up is_CA_enabled = CassandraCheck.get_CA_enabled(logger) File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/check.py", line 618, in get_CA_enabled ca_state = CassandraGetApiExecutor.execute_get_api_and_return_response('localhost', '/config/cassandra/cluster/ca', logger) File "/usr/lib/vmware-vcopssuite/utilities/vmware/vcops/cassandra/cassandra_get_api_executor.py", line 51, in execute_get_api_and_return_response status_code, token_pair = vc_ops_http_utilities.login(hostname) File "/usr/lib/vmware-vcopssuite/utilities/lib/vc_ops_http_utilities.py", line 295, in login response = opener.open(authorization_request) File "/usr/lib/python3.7/urllib/request.py", line 525, in open response = self._open(req, data) File "/usr/lib/python3.7/urllib/request.py", line 543, in _open '_open', req) File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain result = func(*args) File "/usr/lib/python3.7/urllib/request.py", line 1391, in https_open context=self._context, check_hostname=self._check_hostname) File "/usr/lib/python3.7/urllib/request.py", line 1350, in do_open raise URLError(err) urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076)> "
The relevant error logged while the vROPS cluster tries to come online is: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired.
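Because the casa.log entries are very long, it can help to filter out just the certificate failures. The following is a minimal Python sketch of that idea, not a VMware tool: the function name find_cert_errors and both regular expressions are my own assumptions, based only on the log format shown above.

```python
import re
from pathlib import Path

# Path taken from this article; adjust if your appliance logs elsewhere.
CASA_LOG = Path("/var/log/casa_logs/casa.log")

# Matches the OpenSSL verification failure and captures the stated reason.
CERT_ERROR = re.compile(
    r"CERTIFICATE_VERIFY_FAILED\] certificate verify failed: ([^(>]+)"
)
# Matches the timestamp at the start of an entry, e.g. 2022-10-18T10:59:56
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def find_cert_errors(lines):
    """Return (timestamp, reason) pairs for lines reporting a certificate failure."""
    hits = []
    for line in lines:
        match = CERT_ERROR.search(line)
        if match is None:
            continue
        stamp = TIMESTAMP.match(line)
        # Continuation lines of a traceback carry no timestamp of their own.
        hits.append((stamp.group(1) if stamp else "unknown", match.group(1).strip()))
    return hits


if __name__ == "__main__":
    if CASA_LOG.exists():
        with CASA_LOG.open(errors="replace") as handle:
            for stamp, reason in find_cert_errors(handle):
                print(f"{stamp}  {reason}")
    else:
        print(f"{CASA_LOG} not found; run this on a vROPS node")
```

Only the first match per line is reported, which is enough to see when the failures started and what reason OpenSSL gave.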
The solution
Note:
Make sure you have a snapshot of the vROPS nodes in place before making any changes.
The first thing that came to my mind was: does the issue still persist with the default self-signed certificates?
So I followed the VMware KB on replacing the vROPS node certificates with the default certificates.
The following steps need to be performed on all of the vROPS nodes:
unset -f pathprepend
unset -f pathremove
unset -f pathappend
$VMWARE_PYTHON_BIN /usr/lib/vmware-casa/bin/activate_web_certificate.py DEFAULT
$VMWARE_PYTHON_BIN /usr/lib/vmware-vcopssuite/utilities/bin/restartHttpd.py
Note:
The unset commands are required to avoid errors caused by the Python version differences between 6.x/7.x and 8.x.

After the default certificates are reloaded, the newly generated self-signed SSL certificate is active on the vROPS nodes.
You can verify this in your web browser by checking the certificate details.
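If you prefer a script over the browser, the check can be sketched in Python using only the standard library. This is an illustration under assumptions: the function names check_certificate and days_until_expiry and the host name in the usage example are hypothetical, and port 443 is assumed for the vROPS web interface. Note that a default self-signed certificate will still fail verification, but with a self-signed reason instead of "certificate has expired", which is exactly the difference we are looking for here.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter date (negative if expired).

    not_after uses the format returned by getpeercert(),
    e.g. "Oct 18 10:59:56 2022 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def check_certificate(host, port=443, timeout=5.0):
    """Attempt a verified TLS handshake and describe the outcome as text."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                not_after = tls.getpeercert()["notAfter"]
                days = days_until_expiry(not_after)
                return f"valid until {not_after} ({days:.0f} days left)"
    except ssl.SSLCertVerificationError as err:
        # Same class of error as in casa.log, e.g. "certificate has expired".
        return f"verification failed: {err.verify_message}"
    except OSError as err:
        return f"connection failed: {err}"


if __name__ == "__main__":
    # Hypothetical host name; replace with one of your vROPS nodes.
    print(check_certificate("vrops-node-01.example.local"))
```

Running this against every node before a planned restart gives an early warning when a certificate is about to expire.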

This is the moment we have all been waiting for. Will the “bring the cluster online” task work with the new default SSL certificates?

After discovering that the SSL certificate was causing the issue, we requested a new domain-signed SSL certificate and configured it on the vROPS nodes. A subsequent cluster restart completed without any issues. We can conclude that an expired domain-signed SSL certificate prevents the vROPS cluster from coming online.