ESXi datastore with corrupted partition table


My colleague told me that vMotion to one particular datastore always failed for him. The error message is not very informative: it only says that the virtual machine files on the destination datastore are not accessible, even though the vMotion has not started yet.

I found some relevant information in the vmkernel log:

2014-03-26T04:36:20.189Z cpu28:6188)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x2a (0x412480f9af40, 6188) to dev "naa.60050768028081713c0000000000000f" on path "vmhba1:C0:T3:L5" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x21 0x0. Act:NONE
2014-03-26T04:36:20.189Z cpu28:6188)ScsiDeviceIO: 2324: Cmd(0x412480f9af40) 0x2a, CmdSN 0x2b from world 6188 to dev "naa.60050768028081713c0000000000000f" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x21 0x0.
2014-03-26T04:36:20.189Z cpu28:6188)WARNING: J3: 2663: Error committing txn callerID: 0xc1d00001 to slot 0: Not supported
2014-03-26T04:36:20.190Z cpu28:6188)WARNING: J3: 2782: Committing transaction failed: Not supported

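naa.60050768028081713c0000000000000f is the disk that has the problem. In the log you can see H:0x0 D:0x2 P:0x0. H stands for host status, and H:0x0 means there is no error on the host side. D stands for device status, and D:0x2 means Check Condition, i.e. the device has sense data to report. P stands for plugin status, and P:0x0 means the plugin is OK. The sense data 0x5 0x21 0x0 is sense key ILLEGAL REQUEST with ASC/ASCQ 21h/00h (Logical Block Address Out of Range), and the failing command 0x2a is a WRITE(10). So it is most likely a storage-side error.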
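To make the status decoding above less of a manual lookup, here is a minimal Python sketch that pulls the H/D/P values and the sense data out of a vmkernel log line. The function name `decode_scsi_status` and the lookup tables are my own illustration, not part of any VMware tool, and the tables only cover a handful of standard SCSI codes rather than the full set.

```python
import re

# Deliberately small lookup tables (standard SCSI values, not exhaustive).
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY",
                 0x18: "RESERVATION CONFLICT"}
SENSE_KEY = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x21, 0x00): "LOGICAL BLOCK ADDRESS OUT OF RANGE"}

HEX = r"(0x[0-9a-fA-F]+)"
PATTERN = re.compile(
    rf"H:{HEX} D:{HEX} P:{HEX}"
    rf"(?: Valid sense data: {HEX} {HEX} {HEX})?"
)

def decode_scsi_status(line):
    """Return a dict describing the H/D/P status in a vmkernel log line,
    or None if the line does not contain one."""
    m = PATTERN.search(line)
    if m is None:
        return None
    host, device, plugin = (int(v, 16) for v in m.group(1, 2, 3))
    result = {
        "host": "OK" if host == 0 else f"error {host:#x}",
        "device": DEVICE_STATUS.get(device, f"unknown {device:#x}"),
        "plugin": "OK" if plugin == 0 else f"error {plugin:#x}",
        "sense": None,
    }
    if m.group(4) is not None:
        key, asc, ascq = (int(v, 16) for v in m.group(4, 5, 6))
        result["sense"] = (
            SENSE_KEY.get(key, f"sense key {key:#x}"),
            ASC_ASCQ.get((asc, ascq), f"ASC/ASCQ {asc:#x}/{ascq:#x}"),
        )
    return result

sample = "failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x21 0x0."
print(decode_scsi_status(sample))
```

For the log lines in this post, the device status comes back as CHECK CONDITION and the sense data as ILLEGAL REQUEST / LOGICAL BLOCK ADDRESS OUT OF RANGE, which points at writes landing beyond what the array considers the end of the LUN.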

I double-checked that datastore and found an interesting thing: the LUN is 512GB on the SAN, and the ESXi host storage adapter also detects it as 512GB, but the datastore shows as 2TB in the vCenter storage view. Weird, isn’t it?

Then I used partedUtil to check the partition table of that disk. Now I think I have found the problem: the partition table is corrupted.

~ # partedUtil get /dev/disks/naa.60050768028081713c0000000000000f
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? diskPath (/dev/disks/naa.60050768028081713c0000000000000f) disk->dev->length (1073741824) gpt->AlternateLBA (4294967295)
Error: Can't have a partition outside the disk!
Error: Can't have a partition outside the disk!
Unable to read partition table for device /dev/disks/naa.60050768028081713c0000000000000f
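The two numbers in the partedUtil error line up with the 512GB versus 2TB mismatch. A quick Python check, assuming both values are counts of 512-byte sectors (the sector size is my assumption; it is consistent with the 512GB LUN reported by the SAN):

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors

def sectors_to_gib(sectors, sector_bytes=SECTOR_BYTES):
    """Convert a sector count to GiB."""
    return sectors * sector_bytes / 2**30

disk_length = 1073741824      # disk->dev->length from the error message
alternate_lba = 4294967295    # gpt->AlternateLBA from the error message

# The backup GPT header sits on the last sector, so the disk size the
# GPT expects is AlternateLBA + 1 sectors.
actual_gib = sectors_to_gib(disk_length)
expected_gib = sectors_to_gib(alternate_lba + 1)

print(f"actual disk size:  {actual_gib:.0f} GiB")
print(f"GPT expects:       {expected_gib:.0f} GiB")
```

Under that assumption the disk is 512 GiB, while the GPT expects 2048 GiB (2 TiB), i.e. the partition table describes a disk four times larger than the LUN the host actually sees.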

I am not sure exactly what the cause is. Someone may have shrunk the LUN on the SAN, but I cannot confirm that. I tried to fix the corrupted partition table but failed, which makes sense, as the disk size is not correct at all.

Fortunately, the existing virtual machines are still running fine on that datastore. What I need to do is vMotion all virtual machines off the datastore, then detach the datastore from all hosts and unpresent the LUN on the SAN. (To permanently remove a detached disk's configuration from a host, run this command: esxcli storage core device detached remove -d naa.60050768028081713c0000000000000f)

One thought on “ESXi datastore with corrupted partition table”

  1. Thanks for sharing this info, Jackie,

    So after the detach process, did you reconfigure the iSCSI LUN again from the NAS
    to fix this issue?
