nicolasecarnot

21 Posts

49839

June 18th, 2014 04:00

Unable to resume system update, or retrieve progress.

Hi,

Between our OME 1.3 server and a CentOS 6.5 host (on an M620), I was hit by the bug described in 2012 about the SSH key being stored in the wrong place or in the wrong manner, and I was able to work around it using the deploy trick.

Now the communication between OME and this server is OK; I can see in /var/log/secure that the SSH connections are OK.

I'm trying to apply a BIOS update, and though the package download is successful, the apply step seems stalled, saying:

"Unable to resume system update, or retrieve progress."

The next message is in French; sorry, I don't know how to translate it correctly:

"No update condition may be given when simultaneous attempt of software upgrade is run on the same server" or something.

I have absolutely no idea where to look or how to debug this.

Best regards.

Responses(13)

nicolasecarnot

21 Posts

0

August 20th, 2014 16:00

Bump for Rob.

DELL-Pupul M

1K Posts

0

August 20th, 2014 23:00

Hi Nicolas,

Sometimes, if a system update or OMSA deploy process is already running on the Linux or Windows server, you might get this error message.

There are two things I can suggest. First, check which processes are running on the Linux server and kill any that is using either runbada or omexec. You can also try rebooting the server once and see whether that solves your problem.

The other possibility is that OMSA is not installed properly, so try uninstalling and reinstalling OMSA, then check whether that helps. Let us know.
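
A minimal sketch of the first suggestion, assuming a standard `ps` and `grep` on the managed node. The helper name `filter_update_procs` is made up for illustration; only the process names `runbada` and `omexec` come from the reply above.

```shell
# Hypothetical helper: keep only process lines that mention runbada or omexec.
filter_update_procs() {
    grep -E 'runbada|omexec' | grep -v 'grep'
}

# List PID and full command line of every process, then filter.
ps -eo pid,args | filter_update_procs || echo "no runbada/omexec process found"
```

Any PID it prints could then be stopped with `kill <PID>` before retrying the update task.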

nicolasecarnot

21 Posts

0

August 21st, 2014 01:00

Here is what I tried :

- reboot the target server

- reboot the OME server itself

- carefully watch /var/log/secure on the target, looking at the SSH information

- from the OME server itself, run a single task of updating the BIOS. OMSA method is applied.

- I choose immediate launch, no GPG check, no sudo, and I provide the linux root password

- On the target, I see the SSH connection running OK :

Aug 21 08:36:23 serv-vm-adm23 sshd[22846]: Accepted password for root from 192.168.39.152 port 50049 ssh2
Aug 21 08:36:23 serv-vm-adm23 sshd[22846]: pam_unix(sshd:session): session opened for user root by (uid=0)
Aug 21 08:36:23 serv-vm-adm23 sshd[22846]: subsystem request for sftp
Aug 21 08:36:26 serv-vm-adm23 sshd[22846]: pam_unix(sshd:session): session closed for user root
Aug 21 08:36:27 serv-vm-adm23 sshd[22869]: Accepted password for root from 192.168.39.152 port 50062 ssh2
Aug 21 08:36:27 serv-vm-adm23 sshd[22869]: pam_unix(sshd:session): session opened for user root by (uid=0)
Aug 21 08:36:27 serv-vm-adm23 sshd[22869]: pam_unix(sshd:session): session closed for user root
Aug 21 08:36:28 serv-vm-adm23 sshd[22888]: Accepted password for root from 192.168.39.152 port 50075 ssh2
Aug 21 08:36:28 serv-vm-adm23 sshd[22888]: pam_unix(sshd:session): session opened for user root by (uid=0)
Aug 21 08:36:28 serv-vm-adm23 sshd[22888]: pam_unix(sshd:session): session closed for user root
Aug 21 08:37:01 serv-vm-adm23 sshd[22964]: Accepted password for root from 192.168.39.152 port 50170 ssh2
Aug 21 08:37:01 serv-vm-adm23 sshd[22964]: pam_unix(sshd:session): session opened for user root by (uid=0)
Aug 21 08:37:01 serv-vm-adm23 sshd[22964]: pam_unix(sshd:session): session closed for user root
Aug 21 08:42:01 serv-vm-adm23 sshd[23300]: Received disconnect from 192.168.39.152: 11: PPA says bye
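
One way to pull just these lines out of the log, as a rough sketch: the helper name `ssh_events_from` is hypothetical, and it assumes the sshd messages land in /var/log/secure as they do on this CentOS host.

```shell
# Hypothetical helper: show sshd log lines that mention a given client IP.
# $1 = client IP (here, the OME server), $2 = log file (defaults to /var/log/secure).
ssh_events_from() {
    grep 'sshd\[' "${2:-/var/log/secure}" | grep -F -- "$1"
}
```

For the session above, that would be called as `ssh_events_from 192.168.39.152` on the target.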

I have seen this behaviour before: I waited long enough to get discouraged.

On the OME web GUI, in the execution details, I see that:

- the file has been correctly downloaded from the Dell FTP site (and I checked that it is stored on the OME server, ready to be served)

- the execution details line #0 is hung, saying the task is still running, but with the message "TotalStatus Unable to resume system update, or retrieve progress. No system update is in progress."

- this hangs forever

I searched under /opt/dell/srvadmin for any useful log files, but the tree is HUGE and I got lost.

I also tried to :

- wait forever

- restart all the OMSA related services (but rebooting the target has already been tried)

- reinstall OMSA. This changed nothing, as OMSA was already perfectly installed and had been used for months for various tasks with no issue.

Please tell me if there are log files I could look at, both on the OME server and on the target server.

Thank you.

Nicolas ECARNOT
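
To narrow down a large tree like /opt/dell/srvadmin, one option is to list only the files touched shortly after launching the task. This is a sketch, not a Dell-documented procedure; the helper name `recent_files` and the 30-minute default are assumptions.

```shell
# Hypothetical helper: list regular files under a directory that were
# modified within the last N minutes.
# $1 = directory, $2 = minutes (defaults to 30).
recent_files() {
    find "$1" -type f -mmin "-${2:-30}" 2>/dev/null
}
```

Run right after the update task stalls, for example `recent_files /opt/dell/srvadmin 15`, it would show which files, if any, OMSA wrote to during the attempt.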

DELL-Pupul M

1K Posts

0

August 22nd, 2014 05:00

Hi Nicolas,

There are log files that can be looked into, but they are created at the time of the system update process. I would suggest you log a trouble ticket at 800-945-3355 so that the support team can have a look at the logs.

If you want to do some research yourself, though, there will be two files, runbada.xml and dup.log, in the temp folder. See if you find anything interesting happening there.

nicolasecarnot

21 Posts

0

August 22nd, 2014 06:00

Hi,

As I wrote above, I'm French, so I preferred to talk with the French Dell support team.

I then opened a ticket here and the guys are having a look.

I'll report here if the situation improves.

Nicolas ECARNOT

nicolasecarnot

21 Posts

0

August 25th, 2014 06:00

Hi,

I received the answer from support today, and it says:

"The issue has not been accepted by the OME experts as CentOS is not part of the supported OS list

Complete supported list can be found here :

ftp://ftp.dell.com/Manuals/all-products/esuprt_software/esuprt_ent_sys_mgmt/esuprt_ent_sys_mgmt_opnmng_essentials/dell-opnmang-essentials-v1.3_User%27s%20Guide2_en-us.pdf ".

As CentOS 6.5 is a clone of Redhat 6.5, do I have to install a Redhat 6.5 to reproduce the issue, and re-open a ticket, or may we all save some time and try to make progress on this bug?

--

Nicolas ECARNOT

DELL-Rob C

2.8K Posts

0

August 26th, 2014 09:00

Hi Nicolas,

Well it could be the case that the CentOS distro does not have all the same packages as RHEL. So this tends to impact our software and how it may or may not work. That's why we tend to certify specific distros...but we can look at this a bit more.

Let me make sure I understand some basic details of your trouble.

1. You have a CentOS managed node.

2. You have installed OMSA on it.

3. This server is discovered using the SNMP protocol and only SNMP (not SSH).

4. The server shows up correctly under the Server node in the device tree

5. You can see inventory and health of the CentOS server.

Is this all correct so far?

Ok, good.

6. Now you are trying to push a BIOS/FW update to that server and it fails.

Is this all correct? Just trying to make sure I understand all the details.

Regards,

Rob

nicolasecarnot

21 Posts

0

August 27th, 2014 02:00

1. You have a CentOS managed node.

Correct. CentOS 6.5, 64 bits. selinux disabled, iptables disabled.

2. You have installed OMSA on it.

Correct : 7.4.0. I'm frequently using https://myServerName:1311 and I'm happy with it.

3. This server is discovered using the SNMP protocol and only SNMP (not SSH).

This part is very unclear to me:

How can I know what protocol is now associated with this server?

In the details page of the server, I see a table called Data sources showing this :

Server Administrator	7.4.0
Server Administrator (Storage Management)	4.4.0
Inventory Collector Agent	7.4.1 (BLD_266)
Integrated Dell Remote Access Controller 7	1.40.40

In the discovery range, I have set up as many discovery methods as I could, hoping to maximize the information that could be retrieved from any device. Was that wrong?

So, in this range, the protocols configured are :

SNMP
WMI (close to useless, no windows there at present)
ICMP
IPMI
SSH

The troubleshooting tool was used to confirm SNMP is working towards this host target.

What is unclear to me is:

On day one, I set up a range with some protocols and some logins + passwords, then run a discovery. Hosts are found. Good.
On day two, I modify the logins + passwords on some hosts, and modify the logins + passwords accordingly in the range protocols. What will happen to the hosts previously discovered and managed by OME?
Anyway, in the present case, I did not change anything, and the troubleshooting tool also confirms that the SSH connection is all right.

4. The server shows up correctly under the Server node in the device tree

Correct. It shows up in :

Modular systems > Poweredge M1000e > name_of_the_rack > servers > myLovelyBlade

5. You can see inventory and health of the CentOS server.

Correct. I can see tons of details of it, I can see the alerts tab showing when this blade is up / down / rebooted. I can see the hardware log file showing when I play unplugging / replugging a disk.

6. Now you are trying to push a BIOS/FW update to that server and it fails.

Yes. When pushing it, the first part, where the OME server retrieves the .BIN file from Dell's FTP and puts it on its own disk, works fine. The second part, where the OME server opens an SSH session on the CentOS target host, also works fine. Then the update task does not progress anymore, and the details show "TotalStatus Unable to resume system update, or retrieve progress. No system update is in progress.". Well, I already wrote all that in this same thread. I tried rebooting the target and the OME server, with no gain.

On the target host, I see no log in /tmp. The only thing that may be relevant is the creation of:

/tmp/hsperfdata_root, which contains a binary, whose strings are showing Java related things.

I'll be glad to show whatever log files Dell's OME experts ask me for.

--

Nicolas ECARNOT

DELL-Rob C

2.8K Posts

0

August 27th, 2014 08:00

Ok, thanks for the clear answers. This helps a bit.

So on item #3, what I'm asking about is what protocols are checked in the discovery wizard associated with that range for the CentOS managed node. Given that you are using Linux, you should only have SNMP and nothing else checked. Sometimes this can be a source of trouble.

But if you have a bunch of IP addresses in there and _must_ have multiple protocols, so be it. My point is: it is best to only have the protocols you need checked.

We are probably close to the end of what we can do in the forum here since we will need to get the support guys to look at logs and that is a challenge in a forum like this.

But if you can dig up the XML file in the tmp folder on the CentOS box that has the same timestamp as the task execution, perhaps the details in there would give us a clue. As I said, you will likely need to raise this with support, but you might take a quick look and see if you notice anything significant in there.

Rob
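
A sketch of how such a file could be located by timestamp, assuming GNU find (its `-newermt` test) on the CentOS box. The helper name `xml_in_window` is hypothetical, and the time window has to be taken from the task execution shown in OME.

```shell
# Hypothetical helper: list *.xml files in a directory whose modification
# time falls between two timestamps (GNU find's -newermt is assumed).
# $1 = directory, $2 = window start, $3 = window end.
xml_in_window() {
    find "$1" -type f -name '*.xml' -newermt "$2" ! -newermt "$3" 2>/dev/null
}
```

For the attempt logged on August 21st, a call such as `xml_in_window /tmp '2014-08-21 08:30' '2014-08-21 08:45'` would list candidate files; an empty result suggests nothing was written there during the task.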

nicolasecarnot

21 Posts

0

September 18th, 2014 06:00

Just to add some info :

- I just experienced the exact same scenario with an R620 running Oracle Linux 6.5 (you know, the Redhat clone)

DELL-Rob C

2.8K Posts

0

September 18th, 2014 09:00

Ok, so CentOS and Oracle Linux...can you try on a _supported_ OS just to see? :-)

Clone is a bit too strong of a word I think...they do package differently.

But in any event, just to confirm, you did have OMSA on Oracle yes?

And what BIOS DUP was it that failed?

Thanks!

Rob

nicolasecarnot

21 Posts

0

September 19th, 2014 04:00

When I find some time to play with it, yes, I will try on a supported Redhat.

Yes OMSA is installed and working fine on CentOS 6.5 and OL 6.5.

Good news here : on OL 6.5, things are acting differently :

- I was about to upgrade bios from 2.2.2 to 2.2.3, and though it failed on a OL6.5, I see way more things happening on the server side : I see the packages stored on the server in /var/tmp, with the xml describing the packages list (bundle.xml IIRC ?)

- Then the upgrade itself is failing, but meanwhile, I copied on the fly the big BIOS_blahblah.BIN somewhere else, and was able to run it manually, and it worked.

- Other upgrades not related to BIOS (other firmwares) DID work on this OL 6.5 !

What is difficult is that on CentOS, I get really NO LOG FILE, no xml, no /tmp/something, nothing. This makes it hard to help you help us.

Jones16

6 Posts

0

December 31st, 2014 12:00

I don't know if you ever found a solution to this or not, but I spent the better part of the morning trying to track this down as well and in the process of trying to recreate the steps used by OME, I discovered that my server didn't have the zip/unzip utilities as they are not part of the CentOS minimal install. OME appears to send over the update packages as zip files for OMSA to unzip and execute. So make sure you have those commands on your system:

yum install zip unzip
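
Before or after installing, a quick check like the following could confirm whether the two utilities are on the PATH. The helper name `missing_tools` is made up for illustration; it only relies on the shell's standard `command -v`.

```shell
# Hypothetical helper: print the name of each listed command that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Prints nothing when both utilities are available.
missing_tools zip unzip
```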
