Well, for one thing, the trace you posted shows a lot of output from nodeboards 2 .. 4 (prefixed 2A .. 4A), but nothing from 1A and only one message from 1B, which isn't normal. If you swap nodeboards 1 and 4, do you get output from 1A .. 3A but not 4A (i.e. does the error move with the nodeboard)?
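If you want to check this quickly after each swap, you can tally the trace lines per CPU prefix. This is only a rough sketch: it assumes you've captured the console output as text and that every line starts with a prefix like `1A 000:`, as in the traces in this thread. The function names and the `min_lines` threshold are my own invention, and the sample lines are copied from the deskside trace further down.

```python
import re
from collections import Counter

# Prefix format assumed from the traces in this thread: "<nodeboard><cpu> <3 digits>:"
PREFIX = re.compile(r"^(\d+)([AB]) \d{3}:")

def count_by_cpu(lines):
    """Count trace lines per CPU prefix (e.g. '1A', '2B')."""
    counts = Counter()
    for line in lines:
        m = PREFIX.match(line)
        if m:  # lines garbled by interleaved output simply don't match and are skipped
            counts[m.group(1) + m.group(2)] += 1
    return counts

def silent_cpus(counts, nodeboards, cpus="AB", min_lines=1):
    """List the expected CPUs that logged fewer than min_lines lines."""
    return [f"{n}{c}" for n in nodeboards for c in cpus
            if counts[f"{n}{c}"] < min_lines]

sample = [
    "1A 000: Starting PROM Boot process",
    "2A 000: Starting PROM Boot process",
    "1A 000: Testing/Initializing memory ............... DONE",
    "1B 000: Testing/Initializing memory ............... DONE",
    "2A 000: Testing/Initializing memory ............... DONE",
    "2B 000: Testing/Initializing memory ............... DONE",
]
counts = count_by_cpu(sample)
print(dict(counts))
print(silent_cpus(counts, nodeboards=[1, 2, 3]))
```

Run it on the trace before and after the swap: if the silent prefix follows the board to its new slot, the board itself is the likely suspect.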
Quote:
What does "Local master entering slave loop" mean?
That is where the problem is located.
With nodeboard #1 down, #2 becomes the local master. It doesn't have the console, though, so it goes into the slave loop. If you had another compute module craylinked to this one, you could (probably) still access your working nodeboards.
For reference, here's a trace of another Onyx2, a deskside with 2 good nodeboards but a broken IO6G. Here too you end up with a headless node:
Code:
1A 000: Starting PROM Boot process
2A 000: Starting PROM Boot process
1A 000:
1A 000:
1A 000: IP27 PROM SGI Version 6.156 built 11:27:56 AM Nov 18, 2003
2A 000: WARNING: xbow_base: 0x9200000000000000 link: 15 Widget present, but link not
1A 000: *** Warning: MSC debug (dbg) switches are non-zero
2A 000: alive!
1A 000: *** Diag level set to None (2)
2A 000: WARNING: xbow_base: 0x9200000000000000 link: 15 Widget present, but link not
2A 000: alive!
2A 000:
2A 000:
2A 000: IP27 PROM SGI Version 6.156 built 11:27:56 AM Nov 18, 2003
2A 000: *** Warning: MSC debug (dbg) switches are non-zero
2A 000: *** Diag level set to None (2)
1A 000: Testing/Initializing memory ............... DONE
1B 000: Testing/Initializing memory ............... DONE
2A 000: Testing/Initializing memory ............... DONE
2B 000: Testing/Initializing memory ............... DONE
1A 000: Copying PROM code to memory ............... DONE
2A 000: Copying PROM code to memory ............... DONE
1A 000: Discovering local IO ...................... WARNING: xbow_base: 0x920000000
1A 000: 0000000 link: 15 Widget present, but link not alive!
1A 000: DONE
2A 000: Discovering local IO ...................... WARNING: xbow_base: 0x920000000
1A 000: Discovering NUMAlink connectivity ......... DONE
2A 000: 0000000 link: 15 Widget present, but link not alive!
1A 000: Found 3 objects (2 hubs, 1 routers) in 66354 usec
2A 000: WARNING: xbow_base: 0x9200000000000000 link: 15 Widget present, but link not
2A 000: alive!
2A 000: DONE
2A 000: Discovering NUMAlink connectivity ......... DONE
2A 000: Found 3 objects (2 hubs, 1 routers) in 1781 usec
2A 000: Waiting for peers to complete discovery.... DONE
2A 000: Recognized 390 MHz midplane
2A 000: *** Global master /hw/module/1/slot/n2 does not have a console
2A 000: Global master is /hw/module/1/slot/n2
1A 000: Waiting for peers to complete discovery.... DONE
1A 000: Recognized 390 MHz midplane
1A 000: *** Global master /hw/module/1/slot/n2 does not have a console
1A 000: Global master is /hw/module/1/slot/n2
2A 0001A 0Testing/Initializing all memory ........... DONE
2A 001:Testing/Initializing all memory ........... DONE
1A 000:Checking partitioning information ......... DONE
1A 000: *** Partition master /hw/module/1/slot/n2 does not have a console
2A 001:Checking partitioning information ......... DONE
2A 001: *** Partition master /hw/module/1/slot/n2 does not have a console
1A 000: nic_read_mfg: invalid crc16 reading redirection map page 3
1B 000: Local slave entering slave loop
1A 000:Local master entering slave loop
2B 001: Local slave entering slave loop
2A 001:*** No console found. Searching for console...
2A 001: *** No console found. You need a console to proceed.
2A 001: *** To recover: Add a BASEIO board and reset.
2A 001:
2A 001: *** Entering POD mode on node 1
2A 001: POD MSC Cac>
Also not good, but probably not the cause of your problems: you've got firmware rev 6.94 on your nodeboards, and your base I/O had rev 6.103 first and 6.80 later. Those are old (6.156 is current) and don't match, so apparently you've been swapping parts. Anything else you wish to share with us that could help?
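To compare the nodeboard PROM revisions without reading the whole trace by eye, something like the following would do. It's a sketch under assumptions: it only looks for the `IP27 PROM SGI Version` banner as it appears in the trace above, and the function name is made up. I haven't included the base I/O banner because its exact format isn't shown in this thread.

```python
import re

# Matches the nodeboard PROM banner as printed in the trace above.
VERSION = re.compile(r"^(\d+[AB]) \d{3}: IP27 PROM SGI Version (\d+\.\d+)")

def prom_versions(lines):
    """Map each CPU prefix to the IP27 PROM version it reported."""
    versions = {}
    for line in lines:
        m = VERSION.match(line)
        if m:
            versions[m.group(1)] = m.group(2)
    return versions

sample = [
    "1A 000: IP27 PROM SGI Version 6.156 built 11:27:56 AM Nov 18, 2003",
    "2A 000: IP27 PROM SGI Version 6.156 built 11:27:56 AM Nov 18, 2003",
]
versions = prom_versions(sample)
mismatch = len(set(versions.values())) > 1
print(versions, "MISMATCH" if mismatch else "all nodeboards match")
```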
_________________
Now this is a deep dark secret, so everybody keep it quiet 
It turns out that when reset, the WD33C93 defaults to a SCSI ID of 0, and it was simpler to leave it that way... -- Dave Olson, in comp.sys.sgi
Currently in commercial service: (2x)
In the museum: almost every MIPS/IRIX system.
Wanted: GM1 board for Professional Series GT graphics (030-0076-003, 030-0076-004)