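October 18, 2013 21:00

System fails to boot after connecting to the 5300

The operating system is AIX 5.1. With the storage unplugged, it boots fine.

362 Posts

October 18, 2013 21:00

This is because the host has been set to boot from SAN.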

Help with debugging IBM AIX system hang issues when booting off an EMC fibre device.

Article Number: 000046860 Version: 1

Key Information

Audience: Level 30 = Customers

Article Type: Break Fix

Last Published: Fri May 31 00:24:15 GMT 2013

Validation Status: Final Approved

Article Content

Impact

Help with debugging IBM AIX system hang issues when booting off an EMC fibre device.

Issue

Host is booting off an EMC Fibre device

IBM AIX rootvg contains only one EMC device

IBM AIX host gets LED error 552 when booting off an EMC copy of the rootvg

Error msg: LED 552

Environment

Product: Symmetrix

Product: CLARiiON

System: IBM RS/6000

OS: IBM AIX 5.1 and below

OS: IBM AIX and below

Change

Customer made a copy of the rootvg using TimeFinder

Customer made a copy of the rootvg using a RDF

Secondary host booting from Cloned rootvg device

Resolution

When the initial case is opened, it is assumed that:

1) All pertinent data is collected, including but not limited to:

- An emcgrab from all related hosts

- A detailed description of the Storage Area Network (SAN) including hosts, FC HBAs, and FC Switches

- Details on what the customer is trying to do and on the exact steps being used

- Details on the error messages and/or problem(s) seen.

2) All components in the environment are configured correctly and at an OS/firmware level that is supported by EMC (reference the EMC Open Host Matrix for details), including but not limited to:

- All CLARiiON parameters and EMC Director flags

- The AIX OS level and EMC AIX ODM file set on all AIX nodes

- The System Firmware on all AIX nodes

- The firmware level on all IBM FC HBAs

- The firmware on all Fibre Switches

- All components are in good working order before implementing boot off Fibre.

- At no time is there a duplicate PVID for more than one device seen by the same AIX node (i.e. testing on one AIX node where the STD copy and the BCV copy are seen by the same AIX host).

If the host is currently hung, reboot the node and go into Maintenance Mode:

When the AIX node hangs trying to boot off an external device, you will need to boot into Maintenance Mode (MM) using disk 1 of the AIX OS install CD.  Insert the CD, reboot the node, and before the 5th keyword appears on the banner screen (i.e. mouse, keyboard,.....speaker), hit 5 (F5) to get into MM.  Note that 1 (F1) = SMS and 8 (F8) = Open Firmware mode.  Once in MM, select the option to access and mount rootvg.

Collect and send the IPL Data:

When debugging AIX boot issues where the boot image is on an external EMC device (a.k.a. EMC Fibre Boot), the outputs from the list of commands below provide detailed information about the state of the EMC device and how AIX thinks it is currently configured to the host / ODM.  Once "rootvg" is mounted, run the IPL Data Collect commands below and capture the output to a file for review later.

date

oslevel -r

lslpp -L | grep EMC

/tmp/inq                               (must use 32-bit "inq" in AIX Maintenance Mode)

lsdev -Cc disk

pprootdev fix                       (only if the PVID of rootvg is on an hdiskpower device)

lspv

lsvg -p rootvg

lslv -m hd5

ls -l /dev/ipldevice

ls -l /dev/rhdisk*                   (look for "ipldevice" major / minor number match)

bootlist -om normal               (should match hdisk with "ipldevice" major / minor number)

lquerypv -h /dev/rhdiskX      (where X is the device that has the PVID of "rootvg")
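Once the output of the commands above has been captured to a file, it can help to split the capture by command when reviewing it later on a workstation. The following is a minimal Python sketch, not part of the EMC procedure. It assumes every command was typed at a "# " root prompt on its own line, as in the sample session further down; the function name split_capture and the EXPECTED list are illustrative only.

```python
# Sketch: split a captured console log into per-command sections.
# Assumption: each command line starts with the "# " root prompt.

EXPECTED = [
    "date", "oslevel -r", "lsdev -Cc disk", "lspv",
    "lsvg -p rootvg", "lslv -m hd5", "ls -l /dev/ipldevice",
    "bootlist -om normal",
]

def split_capture(text):
    """Return {command: [non-empty output lines]} in capture order."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("# "):
            current = line[2:].strip()
            sections[current] = []  # a repeated command overwrites the earlier one
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections

sample = """\
# date
Mon Sep 27 16:12:04 UTC 2004
# oslevel -r
5100-03
# bootlist -om normal
-
"""

sections = split_capture(sample)
missing = [cmd for cmd in EXPECTED if cmd not in sections]
print("captured:", list(sections))
print("still missing:", missing)
```

The missing list is only a reminder of which outputs have not been collected yet; it does not judge the content of any output.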

Collect and send the AIX "boot debugger" data (if needed):

The AIX kernel has an "enter_dbg" variable in it that can be set at the beginning of the boot processing which will cause cfgmgr output to be sent to the system console.  In some cases this data can be useful in debugging boot issues.  The procedure for setting the boot debugger is as follows.

1.Preparing the System

a.  Set up KDB to present an initial debugger screen

        # bosboot -ad /dev/ipldevice -I

b.  Reboot the machine

        # shutdown -Fr

2.Setting up for Kernel boot trace

a.  When the debugger screen appears, set enter_dbg to 42 and then continue with g

         ************* Welcome to KDB *************

         Call gimmeabreak...

         Static breakpoint:

         .gimmeabreak+000000     tweq    r8,r8               r8=0000000A

         .gimmeabreak+000004      blr                        <.kdb_init+0002C0> r3=0

         KDB(0)> mw enter_dbg

         enter_dbg+000000:  00000000  = 42

         xmdbg+000000:  00000000  = .

         KDB(0)> g

Now, detailed boot output will be displayed on the console. It should stop when it hangs (typically at 552 or 554). Capture this output and copy it to a file for review later.

b.  If your system completes booting, you will want to turn enter_dbg off

         ************* Welcome to KDB *************

         Call gimmeabreak...

         Static breakpoint:

         .gimmeabreak+000000     tweq    r8,r8               r8=0000000A

         .gimmeabreak+000004      blr                        <.kdb_init+0002C0> r3=0

         KDB(0)> mw enter_dbg

         enter_dbg+000000:  00000042  = 0

         xmdbg+000000:  00000000  = .

         KDB(0)> g

3.When finished using the boot debugger, disable it by running:

         # bosdebug -o

         # bosboot -ad /dev/ipldevice

Other Notes and commands that may be useful or needed on occasion:

Feel free to add any other outputs you think will help explain the current problem!  Capturing the outputs of all the commands above may be hard from some AIX System Console setups.  I use a "tip" session through a SUN Workstation to help cut and paste the info into a text document that I can then email.  You can also use a DOS HyperTerminal window through a PC or laptop connected to the serial port of the AIX node.

lquerypv -h /dev/rhdiskX             (3 outputs possible, a) the PVID, b) prompt after a 30 sec timeout, c) prompt back quickly)

ln /dev/rhdiskX /dev/ipldevice      to create a new link for /dev/ipldevice.

powermt display dev=X               X = the hdiskpower #, or "all" to display all.

bosboot -ad /dev/ipldevice          ( -vd to get verbose error info.)

odmget CuAt > /tmp/cuat_ .txt

odmget CuDv > /tmp/cudv_ .txt

odmget PdAt > /tmp/pdat_ .txt

odmget PdDv > /tmp/pddv_ .txt
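Because ipl_varyon matches the boot PVID against the ODM, the saved CuAt dump is a natural place to check which devices the ODM associates with a given PVID. Below is a hedged Python sketch for reading such a dump on a workstation. It assumes the usual odmget stanza layout (a "CuAt:" header followed by indented key = "value" lines) and that the pvid value is the 16-digit PVID followed by zero padding, as in the ipl_varyon line of the sample trace. The stanzas in sample are hypothetical, built from the PVID in the lspv sample, and the function names are illustrative.

```python
# Sketch: list the PVID that the ODM (CuAt) records for each device.
# Assumption: odmget output uses "Class:" headers and key = "value" lines.

def parse_odm_stanzas(text):
    """Return a list of dicts, one per stanza, with the class in '_class'."""
    stanzas, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            current = {"_class": line[:-1]}
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip().strip('"')
    return stanzas

def pvid_attrs(stanzas):
    """Map device name -> first 16 hex digits of its CuAt pvid attribute."""
    return {
        s["name"]: s["value"][:16]
        for s in stanzas
        if s.get("_class") == "CuAt" and s.get("attribute") == "pvid"
    }

# Hypothetical dump contents, for illustration only.
sample = '''
CuAt:
        name = "hdisk2"
        attribute = "pvid"
        value = "0003c89fd40264120000000000000000"

CuAt:
        name = "hdisk2"
        attribute = "max_transfer"
        value = "0x40000"

CuAt:
        name = "hdisk9"
        attribute = "pvid"
        value = "0003c89fd40264120000000000000000"
'''

odm_pvids = pvid_attrs(parse_odm_stanzas(sample))
print(odm_pvids)
```

Comparing this mapping with what lquerypv reads from the disks themselves shows where the ODM and the on-disk data disagree.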

Notes

Carefully consider the following when deciding to boot your host (any vendor) off the internal host storage or an external storage array (any vendor). When debugging a complex problem in the host system or Storage Area Network (SAN), it is a great asset to be able to temporarily disconnect some or all of the external storage arrays from the host and confirm that the problem does or does not still exist. If the host is booting off the external storage arrays the System Administrator loses this debug tool.

Also see EMC Knowledgebase solution 46816, List of known "Boot off Fibre" (Symmetrix or CLARiiON) issues

Sample of IPL collect data:

In MM to access rootvg I selected #3 from the display below.

   1)   Volume Group 0003c89f00004c00000000fedeccfa4f contains these disks:

          hdisk4   3768     20-58-01             hdisk5   3768     20-58-01

          hdisk6   3768     20-58-01

   2)   Volume Group 0003c79f00004c00000000f76d36b466 contains these disks:

          hdisk0   8678 10-88-00-8,0             hdisk1   8678 10-88-00-9,0

   3)   Volume Group 0003c89f00004c00000000ff2b4cd6ca contains these disks:

          hdisk3   9215     20-58-01

I entered 3 here to boot off the rootvg image which is on hdisk3.

# date

Mon Sep 27 16:12:04 UTC 2004

# oslevel -r

5100-03

# lslpp -L | grep EMC

     EMC.Symmetrix.aix.rte      5.1.0.0    C     F    EMC Symmetrix AIX Support

     EMC.Symmetrix.fcp.rte      5.1.0.0    C     F    EMC Symmetrix Fibre Channel

     EMCpower.base                 3.0.5.0    C     F    PowerPath Base Driver and

     EMCpower.multi_path        3.0.5.0    C     F    PowerPath Multi_Pathing

# lsdev -Cc disk

     hdisk0      Available 10-88-00-8,0 16 Bit LVD SCSI Disk Drive

     hdisk1      Available 20-60-01     EMC Symmetrix FCP Disk

     hdisk2      Available 20-60-01     EMC Symmetrix FCP Disk

     hdisk3      Available 20-60-01     EMC Symmetrix FCP Disk

     hdisk4      Available 20-60-01     EMC Symmetrix FCP Disk

     hdisk5      Available 20-60-01     EMC Symmetrix FCP Disk

     hdisk6      Available 20-60-01     EMC Symmetrix FCP Disk

     hdiskpower0 Available 20-60-01     PowerPath Device

     hdiskpower1 Available 20-60-01     PowerPath Device

     hdiskpower2 Available 20-60-01     PowerPath Device

     hdiskpower3 Available 20-60-01     PowerPath Device

     hdiskpower4 Available 20-60-01     PowerPath Device

     hdisk8      Available 20-58-01     EMC Symmetrix FCP Disk

     hdisk9      Available 20-58-01     EMC Symmetrix FCP Disk

     hdisk10     Available 20-58-01     EMC Symmetrix FCP Disk

     hdisk11     Available 20-58-01     EMC Symmetrix FCP Disk

     hdisk12     Available 20-58-01     EMC Symmetrix FCP Disk

     hdisk13     Available 20-58-01     EMC Symmetrix FCP Disk

# lspv

     hdisk0                0003c79ffad466aa         None

     hdisk1                none                                None

     hdisk2                0003c89fd4026412         rootvg    <-- PVID of 'rootvg'

     hdisk3                none                                None

     hdisk4                none                                None

     hdisk5                none                                None

     hdisk6                none                                None

     hdiskpower0      none                                None

     hdiskpower1      0003c89fb704e821         None

     hdiskpower2      0003c89fd8f622ea          None

     hdiskpower3      0003c89fd8f6239e          None

     hdiskpower4      0003c89fdeccecad          None

     hdisk8                none                                None

     hdisk9                0003c89fd4026412         rootvg

     hdisk10         none                                None

     hdisk11         none                                None

     hdisk12         none                                None

     hdisk13         none                                None

# lsvg -p rootvg

rootvg:

PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION

hdisk2                active                575              465             114..83..38..115..115

# lslv -m hd5

hd5:N/A

LP       PP1    PV1       PP2  PV2         PP3  PV3

0001  0001   hdisk2

Notice, up to now, it looks like hdisk2 or hdisk9 is the boot device.
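The lspv listing above shows the same PVID on hdisk2 and hdisk9. A small Python sketch for spotting such repeats in a saved lspv output follows. It is only an aid for reviewing the capture on a workstation; the function names are illustrative, and the sample is an excerpt of the listing above. Whether a shared PVID is a real conflict (for example a STD and its BCV seen by the same host) or simply two paths to one device still has to be judged from the SAN layout.

```python
# Sketch: find PVIDs that lspv reports on more than one device.
from collections import defaultdict

def pvid_owners(lspv_output):
    """Map each PVID to the list of device names that report it."""
    owners = defaultdict(list)
    for line in lspv_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        name, pvid = fields[0], fields[1].lower()
        if pvid != "none":
            owners[pvid].append(name)
    return dict(owners)

def duplicates(owners):
    """Keep only PVIDs reported by two or more devices."""
    return {pvid: devs for pvid, devs in owners.items() if len(devs) > 1}

sample = """\
hdisk0       0003c79ffad466aa   None
hdisk1       none               None
hdisk2       0003c89fd4026412   rootvg
hdiskpower1  0003c89fb704e821   None
hdisk9       0003c89fd4026412   rootvg
"""

dups = duplicates(pvid_owners(sample))
print(dups)  # {'0003c89fd4026412': ['hdisk2', 'hdisk9']}
```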

# ls -l /dev/ipldevice

brw-------   2 root     system       21,  1 Sep 27 16:12 /dev/ipldevice

# ls -l /dev/rhdisk*

crw-------   1 root     system       14,  1 Sep 27 16:12 /dev/rhdisk0

crw-------   1 root     system       14,  2 Sep 27 16:12 /dev/rhdisk1

crw-------   1 root     system       21,  3 Sep 27 16:12 /dev/rhdisk2

crw-------   1 root     system       21,  1 Sep 27 16:12 /dev/rhdisk3

crw-------   1 root     system       21,  2 Sep 27 16:12 /dev/rhdisk4

crw-------   1 root     system       21,  4 Sep 27 16:12 /dev/rhdisk5

crw-------   1 root     system       21,  0 Sep 27 16:12 /dev/rhdisk6

# bootlist -om normal

-
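The major / minor comparison between /dev/ipldevice and the /dev/rhdisk* entries can also be done mechanically on the saved listings. The Python sketch below is one way to do it on a workstation, assuming ls -l prints device nodes in the "major, minor" form seen above; the regular expression and function names are illustrative.

```python
# Sketch: find which /dev/rhdisk* node has the same major/minor
# numbers as /dev/ipldevice in captured "ls -l" output.
import re

DEV_RE = re.compile(
    r"^[bc]\S+\s+\d+\s+\S+\s+\S+\s+(\d+),\s*(\d+)\s+.*\s(/dev/\S+)$"
)

def parse_devs(ls_output):
    """Map device path -> (major, minor) for block/char device lines."""
    devs = {}
    for line in ls_output.splitlines():
        m = DEV_RE.match(line.strip())
        if m:
            devs[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return devs

def ipl_matches(devs):
    """Return the device paths sharing major/minor with /dev/ipldevice."""
    target = devs.get("/dev/ipldevice")
    return sorted(
        path for path, numbers in devs.items()
        if path != "/dev/ipldevice" and numbers == target
    )

sample = """\
brw-------   2 root     system       21,  1 Sep 27 16:12 /dev/ipldevice
crw-------   1 root     system       14,  1 Sep 27 16:12 /dev/rhdisk0
crw-------   1 root     system       21,  3 Sep 27 16:12 /dev/rhdisk2
crw-------   1 root     system       21,  1 Sep 27 16:12 /dev/rhdisk3
crw-------   1 root     system       21,  2 Sep 27 16:12 /dev/rhdisk4
"""

matches = ipl_matches(parse_devs(sample))
print(matches)  # ['/dev/rhdisk3']
```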

But notice that /dev/ipldevice actually points to hdisk3 as the boot device!  This is a common problem.  To confirm which hdisk is really the boot device, use the lquerypv command as shown below.  It is one of the few commands that actually reads the data from disk rather than from the ODM.

# lquerypv -h /dev/rhdisk2

00000000   00000000 00000000 00000000 00000000  "................"

00000010   00000000 00000000 00000000 00000000  "................"

00000020   00000000 00000000 00000000 00000000  "................"

00000030   00000000 00000000 00000000 00000000  "................"

00000040   00000000 00000000 00000000 00000000  "................"

00000050   00000000 00000000 00000000 00000000  "................"

00000060   00000000 00000000 00000000 00000000  "................"

00000070   00000000 00000000 00000000 00000000  "................"

00000080   00000000 00000000 00000000 00000000  "................"   <-- has no PVID

00000090   00000000 00000000 00000000 00000000  "................"

000000A0   00000000 00000000 00000000 00000000  "................"

000000B0   00000000 00000000 00000000 00000000  "................"

000000C0   00000000 00000000 00000000 00000000  "................"

000000D0   00000000 00000000 00000000 00000000  "................"

000000E0   00000000 00000000 00000000 00000000  "................"

000000F0   00000000 00000000 00000000 00000000  "................"

# lquerypv -h /dev/rhdisk3

00000000   C9C2D4C1 00000000 00000000 00000000  "................"

00000010   00000000 00000000 00000000 00000000  "................"

00000020   00000000 0000479D 00000000 00001100  "......G........."

00000030   00000000 00000000 00000000 00000000  "................"

00000040   01000100 000057B8 000057B8 00000000  "......W...W....."

00000050   00000000 00000000 00000000 00000000  "................"

00000060   00000000 0000479D 00000000 00001100  "......G........."

00000070   00000000 00000000 00000000 00000000  "................"

00000080   0003C89F D4026412 00000000 00000000  "......d........."  <-- PVID of rootvg!

00000090   00000000 00000000 00000000 00000000  "................"

000000A0   00000000 00000000 00000000 00000000  "................"

000000B0   00000000 00000000 00000000 00000000  "................"

000000C0   00000000 00000000 00000000 00000000  "................"

000000D0   00000000 00000000 00000000 00000000  "................"

000000E0   00000000 00000000 00000000 00000000  "................"

000000F0   00000000 00000000 00000000 00000000  "................"
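In the two dumps above, the PVID sits in the first eight bytes at offset 0x80. A short Python sketch for pulling it out of a saved lquerypv -h capture is shown below. It assumes the dump layout seen above (offset, four hex words, ASCII column); the function name is illustrative.

```python
# Sketch: extract the PVID (first 8 bytes at offset 0x80) from
# captured "lquerypv -h" output. Returns None if the field is all zeros.

def pvid_from_lquerypv(dump):
    for line in dump.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].upper() == "00000080":
            pvid = (fields[1] + fields[2]).lower()
            return None if set(pvid) == {"0"} else pvid
    return None  # offset 0x80 not present in the capture

rhdisk2 = """\
00000070   00000000 00000000 00000000 00000000  "................"
00000080   00000000 00000000 00000000 00000000  "................"
"""

rhdisk3 = """\
00000070   00000000 00000000 00000000 00000000  "................"
00000080   0003C89F D4026412 00000000 00000000  "......d........."
"""

print(pvid_from_lquerypv(rhdisk2))  # None
print(pvid_from_lquerypv(rhdisk3))  # 0003c89fd4026412
```

The value returned for rhdisk3 matches the rootvg PVID that lspv printed for hdisk2 and hdisk9, which is the mismatch this sample illustrates.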

To fix the issue in this scenario, run rmdev -dl hdisk2 and rmdev -dl hdisk9, then reboot the node.

Sample of "boot debugger" data

In this case I first searched for the second occurrence of ln, looked at the lines just before the hang at 552, and added the notes marked with <--.

showled + + bootinfo -b

dvc=hdisk5

+ [ ! hdisk5 ]

+ echo rc.boot: boot device is hdisk5

+ 1>> /tmp/boot_log

+ ln /dev/rhdisk5 /dev/ipldevice        <-- Boot device is pointing to hdisk5 which should be the correct device.

+ exit 0

+ PHASE=2

+ + bootinfo -p

PLATFORM=chrp

+ [ ! -x /usr/lib/boot/bin/bootinfo_chrp ]

+ [ 2 -eq 1 ]

+ + bootinfo -t

BOOTYPE=1

+ [ 0 -ne 0 ]

+ [ -z 1 ]

+ unset pdev_to_ldev native_netboot_cfg

+ unset disknet_odm_init config_ATM

+ /usr/lib/methods/showled 0x551 VARYON IPL DEV

showled + echo rc.boot: executing "ipl_varyon -v"

+ 1>> /tmp/boot_log

+ ipl_varyon -v

Starting device is ipldevice

Starting device's PVID: 0000957a978019d40000000000000000

Root VGID: 0000957a00004c00

hdiskpower0 is in boot disk's VGDA       <-- When ipl_varyon is called it scans the ODM and finds a match for the PVID but it's the incorrect device. It should be hdisk5

NO_QUORUM 1158

Number of PVs: 1

ERROR: lvm_VaryonVG() rc=-146

+ rc=1

+ loopled 0x552 IPLVARYON ERROR     <-- Here is the 552 LED
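The three things picked out by hand in the trace above (the device linked to /dev/ipldevice, the device ipl_varyon reports in the boot disk's VGDA, and the LED codes) can be pulled from a saved trace with a few regular expressions. The Python sketch below assumes the line formats seen in this sample; other AIX levels may print them differently, and the function name is illustrative.

```python
# Sketch: summarize a captured boot debugger trace.
import re

def summarize_boot_trace(trace):
    info = {"ipl_link": None, "vgda_disks": [], "leds": []}
    for raw in trace.splitlines():
        line = raw.strip()
        m = re.search(r"ln /dev/r(\S+) /dev/ipldevice", line)
        if m:
            info["ipl_link"] = m.group(1)  # last link wins
        m = re.search(r"^(\S+) is in boot disk's VGDA", line)
        if m:
            info["vgda_disks"].append(m.group(1))
        m = re.search(r"(?:showled|loopled) (0x[0-9a-fA-F]+)", line)
        if m:
            info["leds"].append(m.group(1))
    return info

sample = """\
+ ln /dev/rhdisk5 /dev/ipldevice
+ /usr/lib/methods/showled 0x551 VARYON IPL DEV
+ ipl_varyon -v
Starting device is ipldevice
hdiskpower0 is in boot disk's VGDA
ERROR: lvm_VaryonVG() rc=-146
+ loopled 0x552 IPLVARYON ERROR
"""

info = summarize_boot_trace(sample)
mismatch = info["ipl_link"] is not None and info["ipl_link"] not in info["vgda_disks"]
print(info)
print("ipldevice link differs from VGDA device:", mismatch)
```

A difference between the two names is only a hint. In this sample it corresponds to the problem noted in the trace, but the names alone do not prove the wrong device was chosen.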

Notes (Employees and Partners)

Note: If booting off a copy of rootvg, see Solution 46894

Article Metadata

Product: PowerPath, Symmetrix, CLARiiON, TimeFinder, CLARiiON CX Series, Symmetrix Remote Data Facility (SRDF)

Requested Publish Date: 5/25/2013 2:07 PM

External Source: Primus

Primus/Webtop solution ID: emc96534

SR Linking : 10 of 11

SR Number    Linking User      Linked Date / Time

12849375     CSKImportAdmin    4/14/2005 4:10 PM

12921990     CSKImportAdmin    4/25/2005 7:40 PM

13243886     CSKImportAdmin    6/16/2005 9:30 PM

13734335     CSKImportAdmin    9/12/2005 9:22 AM

13861351     CSKImportAdmin    9/27/2005 2:47 PM

14307205     CSKImportAdmin    12/10/2005 6:46 PM

15234554     CSKImportAdmin    5/19/2006 1:27 AM

28489280     CSKImportAdmin    3/15/2009 11:46 AM

28584616     CSKImportAdmin    4/5/2009 10:56 AM

31293640     CSKImportAdmin    11/16/2009 9:22 AM

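42 Posts

October 19, 2013 22:00

Try reinstalling the operating system.

4K Posts

October 20, 2013 06:00

I also suspect that Boot from SAN has been configured. What exactly are the symptoms when it fails to boot? Are there any error messages on the screen?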
