Troubleshooting SAN Datastore Maintenance Mode Issue: Virtual Machines Stuck at 1%

Introduction:

Maintaining and managing storage area network (SAN) datastores is crucial for ensuring the smooth operation of virtualized environments. However, administrators may encounter challenges when attempting to enter maintenance mode for SAN datastores, especially when virtual machines appear to be preventing the process from completing. In this article, we will explore a common issue where the datastore maintenance mode stalls at 1% completion, indicating that virtual machines are still active on the storage.

Understanding the Issue:

Entering maintenance mode on a SAN datastore involves migrating virtual machine disks and configurations to alternative storage resources, allowing for maintenance tasks such as hardware upgrades or storage maintenance. However, if any virtual machines remain on the datastore, the maintenance mode process may be halted to prevent data loss or disruption to active workloads.


In the case described here, the task reported that three virtual machines were still on the datastore and stopped progressing. Browsing the datastore showed no VM folders, yet the datastore's Virtual Machines tab listed three VMs: one orphaned and two inaccessible.
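To confirm from the host side which registered VMs still point at the datastore, the output of `vim-cmd vmsvc/getallvms` can be filtered by datastore name. The following is a minimal sketch rather than a definitive procedure: the helper name `vms_on_datastore` and the datastore name `san-ds01` are hypothetical, and it assumes the listing shows each VM's .vmx path in the usual `[datastore] folder/file.vmx` form.

```shell
#!/bin/sh
# Hypothetical helper: reads a VM listing on stdin and prints only the
# lines whose .vmx path sits on the named datastore.
# Assumes paths are shown as "[datastore] folder/file.vmx".
vms_on_datastore() {
    datastore="$1"
    # -F treats the brackets literally instead of as a regex character class.
    grep -F -- "[$datastore]"
}

# Example on an ESXi host (datastore name is a placeholder):
#   vim-cmd vmsvc/getallvms | vms_on_datastore san-ds01
```

VMs that the host cannot read may not appear with a path in this listing, so an empty result does not by itself prove the datastore is clear.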

Symptoms:

When attempting to place a SAN datastore into maintenance mode, administrators may observe the following symptoms:

1. The maintenance mode process stalls at 1% completion and fails to progress further.

2. Error messages or warnings indicate that virtual machines are still using the datastore and must be migrated before maintenance mode can proceed.

3. Virtual machines appear to be powered off or migrated, but the maintenance mode process remains stuck at 1%.

Root Cause Analysis:

Several factors can contribute to virtual machines appearing to be active on a SAN datastore during maintenance mode initiation:

1. **VMware HA (High Availability) Failover:** If VMware HA is enabled, virtual machines may be automatically restarted on the SAN datastore during maintenance mode, preventing it from entering maintenance mode.

2. **Locked Files or Resources:** Locked files or resources associated with virtual machines, such as ISO files, snapshots, or vSphere HA heartbeat files, may prevent the datastore from entering maintenance mode.

3. **Incomplete VM Migration:** Failed or incomplete virtual machine migrations may leave remnants of virtual machines on the datastore, causing maintenance mode to stall.

4. **External Processes or Scripts:** External processes or scripts accessing virtual machine files on the SAN datastore may prevent maintenance mode initiation.
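For the locked-file cause, `vmkfstools -D <file>` prints lock metadata for a file on a VMFS volume, including an owner field whose last segment is commonly read as the MAC address of the host holding the lock. A minimal sketch of extracting that segment follows. The helper name `lock_owner_mac` is hypothetical, the file path in the usage comment is a placeholder, and the exact output layout is an assumption that may vary between ESXi versions.

```shell
#!/bin/sh
# Hypothetical helper: reads `vmkfstools -D` output on stdin and prints the
# last 12 hex digits of the "owner" field (assumed layout:
# "owner xxxxxxxx-xxxxxxxx-xxxx-<12 hex digits>").
lock_owner_mac() {
    sed -n 's/.*owner [0-9a-f]*-[0-9a-f]*-[0-9a-f]*-\([0-9a-f]\{12\}\).*/\1/p'
}

# Example on an ESXi host (path is a placeholder; 2>&1 is used in case the
# lock details are written to stderr on your build):
#   vmkfstools -D /vmfs/volumes/<datastore>/<vm>/<vm>.vmx 2>&1 | lock_owner_mac
```

An owner made up entirely of zeros generally means no host currently holds a lock on the file; any other value can be compared against the MAC addresses of the hosts in the cluster.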

Troubleshooting Steps:

To resolve the issue and successfully enter maintenance mode for the SAN datastore, administrators can take the following steps:

1. Verify VMware HA Configuration: Ensure that VMware HA settings are configured appropriately and consider temporarily disabling VMware HA to prevent automatic virtual machine failover during maintenance mode.

2. Check for Locked Files: Identify and unlock any files or resources associated with virtual machines on the SAN datastore using tools such as `lsof` or `vmkfstools`.

3. Review VM Migration Status: Confirm that virtual machines are successfully migrated to alternative datastores and reattempt maintenance mode initiation.

4. Monitor External Processes: Monitor for any external processes or scripts accessing virtual machine files on the SAN datastore and temporarily suspend or terminate them during maintenance mode initiation.

5. Check for vCLS virtual machines: vSphere Cluster Services (vCLS) VMs are required to maintain the health of vSphere Cluster Services. Their power state and resources are managed by vSphere Cluster Services, not by the administrator.

6. Check the datastore for any remaining virtual machines, including ones that are powered off or inaccessible.

7. Select the cluster, click Configure, go to vSphere Cluster Services > General, edit vCLS Mode, change it to Retreat Mode, and click OK. If the datastore then enters maintenance mode, no further action is required.

8. In this case, once the new datastore had been added to the vCLS allowed list, the vCLS VMs moved to the new datastore, and the original datastore, which had been stuck at 1%, entered maintenance mode successfully.

9. Review the ESXi host logs to verify what is blocking the task, and record the ESXi version:

[root@1:~] cd /var/run/log/

[root@1:/vmfs/volumes/65d61937-f2e9bf83-5e98-043201166500/log] less -I vobd.log

[root@01:/vmfs/volumes/65d61937-f2e9bf83-5e98-043201166500/log] less -I vmkernel.log

[root@01:/vmfs/volumes/65d61937-f2e9bf83-5e98-043201166500/log] less -I hostd.log

[root@01:/vmfs/volumes/65d61937-f2e9bf83-5e98-043201166500/log] vmware -vl
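Paging through each log with `less` works, but a quick count of matching lines can show which log is worth opening first. The sketch below is one possible way to do that, not part of the original procedure: the helper name `count_log_matches` is hypothetical, and searching for the string `vcls` is an assumption, since the logs on a given host may record the blocking VMs under a different name.

```shell
#!/bin/sh
# Hypothetical helper: for each log reviewed in the step above, print how
# many lines match a pattern (case-insensitive), or note that the log is
# missing from the given directory.
count_log_matches() {
    log_dir="$1"
    pattern="$2"
    for name in vobd.log vmkernel.log hostd.log; do
        file="$log_dir/$name"
        if [ -r "$file" ]; then
            # grep -c prints the number of matching lines, including 0.
            printf '%s: %s\n' "$name" "$(grep -c -i -- "$pattern" "$file")"
        else
            printf '%s: not found\n' "$name"
        fi
    done
}

# Example on an ESXi host (the search string is an assumption):
#   count_log_matches /var/run/log vcls
```

A log with a non-zero count can then be opened with `less -I` and searched for the same string.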

Solution Recommendation:

  • Because the vCLS VMs cannot be migrated manually, the recommendation was to provide a different datastore to hold them. This was done by adding the new datastore to the vCLS allowed list under Cluster > Configure > vSphere Cluster Services > Datastores > vCLS Allowed.

Conclusion:

Entering maintenance mode for SAN datastores is a critical aspect of storage management in virtualized environments. When the maintenance mode process stalls at 1% completion, indicating that virtual machines are still active on the storage, administrators must troubleshoot the issue promptly to avoid downtime and ensure the successful completion of maintenance tasks. By understanding the root causes of the issue and following appropriate troubleshooting steps, administrators can mitigate the issue and maintain the stability and reliability of their virtualized infrastructure.

Disclaimer:

The information provided in this article is for informational purposes only. Administrators should exercise caution and follow best practices when performing maintenance tasks on SAN datastores to minimize the risk of data loss or system downtime. Always consult official documentation and seek assistance from qualified professionals when dealing with complex storage and virtualization issues.

https://knowledge.broadcom.com/external/article?legacyId=91890

https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.resmgmt.doc/GUID-F98C3C93-875D-4570-852B-37A38878CE0F.html
