April 7th, 2008 07:00

Replication Manager on VMWare Hosts (on Symmetrix)

All,

Firstly, I am not a SAN expert (that is handled by another org.) and secondly, I'm tearing my hair out trying to configure EMC Replication Manager!!!

Some background...
We have a Symmetrix DMX3 SAN, to which I have two Windows 2003 Server VMs connected (no PowerPath, etc.; only SolutionsEnabler 6.5 and the same Gatekeeper drive on both VMs). We are using TimeFinder cloning.

What I am trying to do is to clone a source RDM LUN from VM1 and present it to VM2. Unfortunately when I do this I receive the error "1 devices that are not visible on the mount host" and the EMC RM clone job fails.

This is how the VMs are configured (all using the latest SolutionsEnabler software, and in the EMC RM Console the Symmetrix SAN is correctly detected)...

VM1
Hard Disk 1 - Virtual Disk (not being cloned)
Hard Disk 2 - Mapped RAW LUN (e.g. called 0BDF - source LUN being cloned)
This VM sees both LUNs in Windows and can access both.

VM2
Hard Disk 1 - Virtual Disk (not being cloned)
Hard Disk 2 - Mapped RAW LUN (e.g. called 0BEF - target cloned LUN)
This VM sees both LUNs in Windows and can access both.

In EMC Replication Manager Console
Hosts:
VM1 - Windows 2003 (Agent version: 5100)
VM2 - Windows 2003 (Agent version: 5100)
Application Sets (named: Test)
Objects: VM1 - File Systems - Z:\

Storage Pools (named: Cloning Pool)
Device: 26_0BF7, 1110001110, Not In Use, 136GB, bcv
(The storage array is not really named 1110001110; that value stands in for the name of our storage array)
Storage Services
1110001110 - device:
26_0BF7, Not in Use, 136GB, BCV, Local Clone, Local TFClone, SAN Copy


My Problem!
When I create a Job to clone the VM1 Z:\ drive to VM2 it fails constantly on the message ".... After removing 0 in use devices, 1 devices that are not visible on the mount host..." - but there is 1 RDM device visible on the mount host (VM2)?

Any ideas on what is wrong with my configuration that makes EMC RM constantly think the mount host does not have the target RDM disk mapped?
What commands can I use to troubleshoot this (please bear in mind my very limited SAN knowledge), and what command do I use to release an in-use LUN?

Any help would be much appreciated
thanks
Stuart

257 Posts

April 8th, 2008 01:00

Hello Stu
Welcome to the forums !

Right, let's start -

Firstly,
As a best practice, you should be using dedicated Gatekeepers and not sharing Gatekeepers between hosts. The reason is that one host may hold exclusive SYMAPI locks on that Gatekeeper, so the other host will not be able to access the GK on the Symm and whatever SYMAPI operation it was trying to perform will fail. You should have at least 2 dedicated GKs per host.

Secondly,
"This VM sees both LUNs in Windows and can access both."

Never share the same LUN across multiple hosts, especially Windows!

Windows attempts to access and "keep an eye" on presented volumes continuously. It may be OK now while the clone LUN has no NTFS partitions/volumes, but as soon as you take a clone/BCV and split it off, that clone LUN has the exact same signature, NTFS sysvol, etc. as your production volume.
This means the potential is there for your production host to mistake the clone LUN for your production LUN, and partmgr would swap the mount point to the clone LUN disk, leading to corruption.

Unmask the clone lun from the production host, and unmask the production lun from the mount host asap.


Thirdly,
In a VMware Symmetrix environment in RM 5.1 onwards (important), we now support

Production hosts - VM with RDM Mapping
Mount hosts - VM with RDM Mapping

So, you do meet the criteria there.

My concern is that you are running an unsupported version of Solutions Enabler.
RM 5.1 (and SP1) supports up to SE 6.4.2 only. Because of this, RM may be having issues "seeing" that BCV on the mount host. Can you see the device clearly in sympd list and syminq?
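
One way to script that check is to capture the output of both commands and look for the target device number. The following is only a minimal Python sketch, not an EMC tool. It assumes that `sympd list` and `syminq` are on the PATH and that the device number (0BEF earlier in this thread) appears somewhere in their text output; the column layout is not relied on, and the helper names are made up for illustration.

```python
import subprocess


def device_listed(command_output, device, exact=True):
    """Return True if `device` appears in the command output.

    exact=True  : the device number must be a whitespace-separated field.
    exact=False : a case-insensitive substring match is enough (useful if a
                  command embeds the number inside a longer field).
    """
    wanted = device.upper()
    for line in command_output.splitlines():
        if exact:
            if wanted in (field.upper() for field in line.split()):
                return True
        elif wanted in line.upper():
            return True
    return False


def check_mount_host(device, commands=(("sympd", "list"), ("syminq",)), exact=True):
    """Run each command on this host and report whether it lists `device`."""
    results = {}
    for command in commands:
        completed = subprocess.run(list(command), capture_output=True, text=True)
        results[" ".join(command)] = device_listed(completed.stdout, device, exact)
    return results
```

Calling `check_mount_host("0BEF")` on the mount host would return a dictionary with one True/False entry per command. A False entry only means the number was not found in that output, so it is worth reading the raw output as well.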

If you go to Powerlink and RM software downloads, you will see a download for the x86 Solutions Enabler 6.4.2.20 - I recommend you download and install that version on the mount host.

Let us know how it goes.
HTH
James.

8 Posts

April 8th, 2008 05:00

Hi

Sorry, I seem to have misled you as to the VM configuration. When I said "This VM sees both LUNs in Windows and can access both." I meant that the VM can see both the VMFS Hard Disk 1 (e.g. C:\ - System Partition for Windows) and the RDM mapping LUN (either the source or target LUN).
The VMs only see their own RDM drives (VM1 sees the RDM1 LUN and VM2 sees the cloned RDM2 LUN).

As for installing SolutionsEnabler 6.4.2.20, unfortunately I don't see this in my portal; I only see 6.4.2.0. I have downloaded and installed that version, but now when I discover hosts or run any jobs the EMC RM Console stays on the 'Starting Array Discovery on client..' step (if I look in Active Tasks, it just says Array Discover at 10% and stays there). I've tried this several times and it remains in this state for a good few hours (then I've cancelled it!). I didn't have this issue with SolutionsEnabler 6.5.

Can someone point me to the exact location for 6.4.2.20 or email it to me (stuart,cash@gxs.com)?

As for the Gatekeeper, I am in discussion with the SAN team and hopefully they'll ensure it's set up according to your instructions.

Is there anything else I can do/check to see why a) it now hangs at 10% Discover Array and b) the original problem of the mount host not seeing the target LUN even though it is assigned this LUN in the VM and can see it in Windows?

thanks in advance
Stuart

257 Posts

April 8th, 2008 06:00

Hi Stuart

Can you access the following x86 SE 6.4.2.20 file link?

http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Software_Download/se64220-WINDOWS-x86.msi?mtcs=ZXZlbnRUeXBlPUttQ2xpY2tDb250ZW50RXZlbnQsZG9jdW1lbnRJZD0wOTAxNDA2NjgwMzEzNmJiLG5hdmVOb2RlPVNvZndhcmVEb3dubG9hZHMtMg__

1. Restart the RM client service.
2. Set the RM Client agent into debug mode (right click RM client in UI and properties)

3. You can examine the latest RM client debug log files in
C:\Program Files\emc\rm\logs\client\

and the SYMAPI logs in
C:\Program Files\emc\symapi\log\

to see what operation the RM client is currently performing.

If you like, you can post the last 50 lines of the latest RM debug log and the last 10 lines of the latest symapi log here (just make sure to strip out any sensitive information in the logs first).
I recommend you also consider opening a support call on this; support can work with you live via WebEx to understand the problem better.
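
Pulling those log tails can be scripted. The Python sketch below picks the most recently modified file in a log directory, keeps the last N lines, and masks any strings you list as sensitive (host names, array serial numbers). The function names are illustrative, and the list of strings to mask is something you have to supply yourself, so read the result before posting it.

```python
import glob
import os
import re


def newest_file(directory):
    """Return the most recently modified file in `directory`, or None."""
    files = [p for p in glob.glob(os.path.join(directory, "*")) if os.path.isfile(p)]
    return max(files, key=os.path.getmtime) if files else None


def tail_redacted(path, count, secrets=()):
    """Return the last `count` lines of `path` with each secret masked.

    `secrets` is a list of literal strings; matching is case-insensitive.
    """
    with open(path, "r", errors="replace") as handle:
        lines = handle.readlines()
    lines = lines[-count:] if count > 0 else []
    cleaned = []
    for line in lines:
        for secret in secrets:
            line = re.sub(re.escape(secret), "<redacted>", line, flags=re.IGNORECASE)
        cleaned.append(line.rstrip("\n"))
    return cleaned
```

For example, `tail_redacted(newest_file(r"C:\Program Files\emc\rm\logs\client"), 50, secrets=["myhost"])` would give the RM client lines, and the same call with `r"C:\Program Files\emc\symapi\log"` and a count of 10 would give the SYMAPI lines; `"myhost"` is a placeholder.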

James

8 Posts

April 14th, 2008 03:00

Hi James,

Thanks for the reply.
I still cannot access the link you provided, some error message about 'Access Violation'.

I have made some progress though (with the help of our SAN Engineers); we've got it to the state where the replica is being mounted on the Mount host - unfortunately this step fails with the following error..

"ERROR: ProcessPostImport failed. Verify that (1) STORPort driver, not SCSIPort, is installed on the mount host. See RM Support Matrix for correct version. (2) All the latest VSS and STORPort hot fixes are installed. (3) The correct version of PowerPath is installed on the mount host; see RM Support Matrix. If the failures are intermittent, then (1) Use the environmental variable EMC_ERM_LUNSNAP_WAIT to increase the time (in seconds) for the newly added devices to appear following a rescan on the system. (2) Use the RM Consoles Job Retry option to start an automatic retry if the job fails. If the problem persists, turn on the VSS tracing and send the output to the customer service.

2008 04 11 19:05:02 amsud389 026223 ERROR: An unexpected internal error occurred: SS_base: executeMount_implementation: failed to import shadow copy.'

I have checked and the STORPort is installed (there are 2 services called: EMC storapid and EMC storsrvd - I assume this is it?).
I have also looked for the RM Support Matrix and can only find an old version of it (May 2007); have checked and I have all the latest Windows Hotfixes installed (actually Microsoft Windows 2003 SP2 is installed on the mount host and this has the hotfixes listed).

Do I need any specific hotfixes (Microsoft or EMC RM) applied to the mount host? If so, can someone please list them (or point me in the right direction)?

In addition there are no errors in the Event Logs on the Mount host.

Any help you can provide would be much appreciated.

thanks
Stuart

2.2K Posts

April 14th, 2008 09:00

Stuart,
The STORPort driver is a Microsoft hotfix that is not included in SP2 or in Windows Updates. The latest one supported by EMC is KB943545.

Use the E-Lab Interoperability Navigator to determine the right driver and agent levels for your hosts. For example, add in RM5.1 and Windows 2003 SP2 and you will see the required hotfixes, SAN agents etc... There are one or two hotfixes that are not included in SP2 that you will have to download and install for a supported RM configuration.

Aran

8 Posts

April 16th, 2008 08:00

Hello,

I have applied the hotfix mentioned in the previous response (KB943545) and the relevant ones listed on the E-Lab Interoperability Navigator, and now receive the following errors when performing the replica of a SQL Server database (the relevant RM SQL Agent is installed on both nodes)...

2008 04 16 16:29:42 someserver 100719 ERROR: A E_OUTOFMEMORY error occurred while querying the status of the DoSnapshotSet operation. The error code is: 0x8007000e.

2008 04 16 16:29:42 someserver 026202 ERROR: Storage Services operation waitVSSCompletion failed with an error as follows: Wait Snapshot completion failed.

2008 04 16 16:29:42 someserver 026084 ERROR: Storage Services operation processExecute failed with an error as follows: Storage Services has terminated with an error.
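
For reference, the error code in the first line can be split into its HRESULT fields. This short Python sketch is just the bit arithmetic; the function name is made up. For 0x8007000e it gives a failure with facility 7 (Win32) and code 14, which is the Win32 error ERROR_OUTOFMEMORY, so the code is the generic out-of-memory error wrapped as an HRESULT.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into its failure flag, facility and code fields."""
    return {
        "failed": bool(value & 0x80000000),  # bit 31: severity (1 = failure)
        "facility": (value >> 16) & 0x7FF,   # bits 16-26: facility
        "code": value & 0xFFFF,              # bits 0-15: error code
    }


print(decode_hresult(0x8007000E))
# prints {'failed': True, 'facility': 7, 'code': 14}
```

The decode only identifies the error; it does not say which component reported it or which resource was exhausted.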

The server in question (someserver) has ample RAM (>10GB).
Any ideas on the above errors would be much appreciated.

thanks
Stuart

257 Posts

April 17th, 2008 02:00

Hi Stuart

Can you please tell me what service pack you are using on Windows 2003? SP1 or SP2?

If SP1, can you please let me know the latest VSS hotfix you have installed?

Is your VM currently out of physical memory and swapping?

Cheers
James.

8 Posts

April 17th, 2008 06:00

Hi James,

The source VM is running Windows 2003 Advanced Server with SP2 and the destination/Mount Host is running Windows 2003 Standard Edition with SP2. Both have the same hotfixes applied.

Neither of the VMs is out of memory or swapping; there is no activity apart from this RM job on these servers/VMs, as this is still in the test/config stage.

thanks
Stuart

8 Posts

April 21st, 2008 03:00

Hi,

I have made some developments and have successfully managed to run the job and mount 2 drives on the Mount Host.

However, when I re-run the job; during the Unmount operation I receive the following error:

2008 04 21 11:05:35 mountserver Unable to refresh \\.\PHYSICALDRIVE6 because of a Windows error. This might cause unexpected behavior in the next mount..

When the job then runs and tries to mount the new replicas on the Mount host it fails and cannot mount the LUN.
However, if I Unmount the replicas off the Mount host manually, then re-run the job it works correctly and the Job automatically mounts the LUNs as expected.

Any ideas on why re-running the job fails to unmount the LUN (and hence fails the job), whereas if I manually unmount the LUN first the job runs without issue?

thanks in advance
Stuart

257 Posts

April 21st, 2008 04:00

Hi Stuart

Please make sure that nothing (user or process) has a handle open to the filesystem you are trying to unmount. The error you are getting is an issue refreshing the devnode information in Windows to show that the volume is gone: something in the device tree still has a handle on it, perhaps at the disk, volume or NTFS layer.
Some things that could cause this are:
1) someone with Explorer open in a directory of a mount point while RM is trying to unmount it
2) a third-party app which puts exclusive or open monitoring locks on filesystems, for example OpenManage, etc.
3) an application, e.g. SQL, running on that filesystem.

Make sure the above 1,2,3 are not your issue and let us know please.

Thanks - James.

8 Posts

April 21st, 2008 08:00

Hi James,

I have checked the VM and I can't see that any app has access to the relevant filesystem. SQL Server does use this LUN (as the EMC RM SQL Agent components are installed on both source/prod and mount hosts), and once RM mounts the LUN there are scripts which attach this database back into SQL on the mount host, so I don't expect that this is what's causing the issue.

There is no anti-virus or any process that I can see that is accessing or holding open this replica LUN. There is nothing in the Event Logs either.

I'm at a loss as to what's causing this...!??

thanks
Stuart

257 Posts

April 22nd, 2008 00:00

Hey Stuart

So, this is a SQL replica, which you mount and then use scripts to attach databases on the mounted volumes. At this point SQLserver.exe ** has exclusive locks & handles open on the datafiles and logs ** - this will cause the unmount issues you see if you don't release those handles prior to unmount.

So, what you need to do prior to unmount (in a callout script or a user pre-script) is to
1) kill all existing connections to the attached (replica) databases
2) T-SQL - forcefully detach the replica databases
3) T-SQL - run a SQL script to ensure the DBs are no longer attached
4) Call RM to unmount

That is of course if you want to keep SQL Server running. If you don't need to keep SQL Server running, just stop the SQL service and XPAgent (SQLAgent) and then unmount.
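
As a rough illustration of steps 1 to 3, the Python sketch below only builds the T-SQL and the matching osql command lines; it does not run anything, and it is not an RM feature. The function names are made up, `replicaDB` in the usage note is a placeholder database name, and step 4 (the RM unmount itself) is left out. Steps 1 and 2 are put in one batch so that no other connection can slip in between them.

```python
def quote_identifier(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text):
    """Quote a T-SQL Unicode string literal, doubling any single quote."""
    return "N'" + text.replace("'", "''") + "'"


def detach_batch(database):
    """Steps 1 and 2: drop other connections, then detach the database."""
    return (
        "ALTER DATABASE " + quote_identifier(database)
        + " SET SINGLE_USER WITH ROLLBACK IMMEDIATE; "
        + "EXEC sp_detach_db @dbname = " + quote_literal(database)
    )


def verify_query(database):
    """Step 3: returns 0 once the database is no longer attached."""
    return (
        "SELECT COUNT(*) FROM master.dbo.sysdatabases WHERE name = "
        + quote_literal(database)
    )


def osql_command(statement, output_file=None):
    """Build an osql argument list: -E trusted connection, -Q run and exit."""
    command = ["osql", "-E", "-Q", statement]
    if output_file:
        command += ["-o", output_file]
    return command
```

For example, `osql_command(detach_batch("replicaDB"))` gives the argument list for the detach, and `osql_command(verify_query("replicaDB"), "verify.txt")` writes the check result to a file; both lists can be passed to `subprocess.run` from the script that runs before the unmount.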

Cheers
James.

2.2K Posts

April 22nd, 2008 08:00

Stuart,
I have a similar environment where we use clones of SQL databases to mount copies of production for our developers. Since in RM there is not an option to run pre-scripts on the mount host, I have a scheduled task on the mount host that runs a few minutes before the scheduled clone job each night. The script detaches the database from SQL. That way when the clone job runs there are no conflicts. I use a post-mount script in the job to then attach the database.

Aran

8 Posts

April 23rd, 2008 07:00

Hi James, Aran,

Thanks for your responses.

I have created an IR_CALLOUT_ _ _1100.bat and placed it in the relevant directory on the mount host.
The contents of the IR_CALLOUT file place the replica database into restricted mode and then detach this database (as below):

contents:
osql /E /q "EXIT(ALTER DATABASE replicaDB SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE)"
osql /E /q "EXIT(sp_detach_db 'replicaDB')"
EXIT 0



The file is called and executed by the RM job, but the job still presents the 'Unable to Refresh' error.

So I'm still unsure what else could be causing this.

thanks
Stuart

2.2K Posts

April 23rd, 2008 08:00

I remember reading about using the IR_CALLOUT scripts to launch a pre-job process but haven't had time to play around with them yet. Can you confirm though that the script is processing properly and detaching the database?

You could add " -o c:\detach_database.txt" at the end of the osql command to view the results of the detach job.
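
Once the output is in a file, a small script can check it for SQL Server errors. This Python sketch assumes that errors show up in osql output as lines starting with "Msg <number>, Level <severity>, State <state>", and it treats severity 11 and above as an error; the function name is made up.

```python
import re

# Assumption: SQL Server errors appear in osql output as lines beginning
# "Msg <number>, Level <severity>, State <state>".
ERROR_LINE = re.compile(r"^\s*Msg\s+(\d+),\s*Level\s+(\d+),\s*State\s+(\d+)")


def find_sql_errors(output_text, min_severity=11):
    """Return (message_number, severity) pairs at or above `min_severity`."""
    errors = []
    for line in output_text.splitlines():
        match = ERROR_LINE.match(line)
        if match and int(match.group(2)) >= min_severity:
            errors.append((int(match.group(1)), int(match.group(2))))
    return errors
```

Reading `c:\detach_database.txt` and passing its text to `find_sql_errors` returns an empty list if no such error line is present. An empty list does not prove that the database was detached, so it is still worth checking in SQL Server that the database is gone.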