New Tezro owner

marshallh
Posts: 18
Joined: Tue Nov 03, 2009 12:53 pm

Re: New Tezro owner

Post by marshallh » Sat Nov 11, 2017 9:35 am

It's the backplane side, which has Y-shaped split contacts. The node board has flat bladed contacts.


Putting in the original 2x800 just spews PCI/IO sanity errors immediately on PROM boot.

Back to the 4x700, which at least boots the PROM consistently, but the POD diags fail:

Code: Select all

A 000 001c01: POD SysCt Dex> dgxbow
A 000 001c01:
A 000 001c01: =====> xbow_sanity diag took an exception. <=====
A 000 001c01:  EPC    : 0xffffffffbfc0fa84
A 000 001c01:  BadVA  : 0xd7fffffff5ff3dbf
A 000 001c01:  Cause  : 0x000000000000801c
A 000 001c01: Hardware Error State: (Forced error dump)
A 000 001c01: +  Errors on node Nasid 0x0 (0)
A 000 001c01: +    IP35 in /hw/module/001c01/node [serial number MVX791]
A 000 001c01: +      BEDROCK signalled following errors.
A 000 001c01: +        BEDROCK PI 0 Error Status 0 A Register: 0x80000000b73c1e01
A 000 001c01: +          02<->00: rrb error type 1 Uncached Partial Error
A 000 001c01: +          16<->06: message supplemental 0x78
A 000 001c01: +          24<->17: message command 0x9e Reply(PRERR)
A 000 001c01: +          61<->25: error address 0x5b << 3 = (0x2d8)
A 000 001c01: +          63<->62: error status valid (no over_run)
A 000 001c01: +        BEDROCK PI 0 Error Status 1 A Register: 0x14c80000000000
A 000 001c01: +          52<->43: crb status 0x299
A 000 001c01: +        BEDROCK IIO Widget Status Register: 0x100000210
A 000 001c01: +          32: Error (crazy) bit set
A 000 001c01: +        PRB[0]: 0x2040000000003
A 000 001c01: +          49<->49: Read Response Timeout
A 000 001c01: +        BEDROCK NI Port Error Register: 0xff
A 000 001c01: +          07<->00: Number of LLP SN errors 0xff
A 000 001c01: END Hardware Error State (Forced error dump)
A 000 001c01: xbow_sanity failed: Took an exception
A 000 001c01: RSLT xbow_sanity    FAIL                diag_rc = 80  Took an exception
A 000 001c01: POD SysCt Dex> dgbrdg
A 000 001c01:
A 000 001c01: =====> bridge_sanity diag  took an exception. <=====
A 000 001c01:  EPC    : 0xc00000001fc42188
A 000 001c01:  BadVA  : 0xd7fffffff5ff3dbf
A 000 001c01:  Cause  : 0x000000000000801c
A 000 001c01: Hardware Error State: (Forced error dump)
A 000 001c01: +  Errors on node Nasid 0x0 (0)
A 000 001c01: +    IP35 in /hw/module/001c01/node [serial number MVX791]
A 000 001c01: +      BEDROCK signalled following errors.
A 000 001c01: +        BEDROCK PI 0 Error Status 0 A Register: 0xc0000000b73c1e01
A 000 001c01: +          02<->00: rrb error type 1 Uncached Partial Error
A 000 001c01: +          16<->06: message supplemental 0x78
A 000 001c01: +          24<->17: message command 0x9e Reply(PRERR)
A 000 001c01: +          61<->25: error address 0x5b << 3 = (0x2d8)
A 000 001c01: +          63<->62: error status valid (over_run)
A 000 001c01: +        BEDROCK PI 0 Error Status 1 A Register: 0x14c80000000001
A 000 001c01: +          20<->00: spool count 0x1
A 000 001c01: +          52<->43: crb status 0x299
A 000 001c01: +        BEDROCK PI 0 Error spool A:
A 000 001c01: +           Entry 0:
A 000 001c01: +            Cmd 0x9e(Reply:PRERR), RRB stat: P-------   CRB #0, T5 req #0, supp 0
A 000 001c01: +            Error 1 Uncached Partial Error, Cache line address 0xf000000 (uattr 1)
A 000 001c01: +        BEDROCK IIO Widget Status Register: 0x100000210
A 000 001c01: +          32: Error (crazy) bit set
A 000 001c01: +        PRB[0]: 0x2040000000003
A 000 001c01: +          49<->49: Read Response Timeout
A 000 001c01: +        BEDROCK NI Port Error Register: 0xff
A 000 001c01: +          07<->00: Number of LLP SN errors 0xff
A 000 001c01: END Hardware Error State (Forced error dump)
A 000 001c01: bridge_sanity failed: Took an exception
A 000 001c01: RSLT bridge_sanity  FAIL                diag_rc = 81  Took an exception
A 000 001c01: POD SysCt Dex> dgpci
A 000 001c01:
A 000 001c01: =====> pcibus_sanity diag took an exception. <=====
A 000 001c01:  EPC    : 0xc00000001fc47b60
A 000 001c01:  BadVA  : 0xd7fffffff5ff3dbf
A 000 001c01:  Cause  : 0x000000000000801c
A 000 001c01: Hardware Error State: (Forced error dump)
A 000 001c01: +  Errors on node Nasid 0x0 (0)
A 000 001c01: +    IP35 in /hw/module/001c01/node [serial number MVX791]
A 000 001c01: +      BEDROCK signalled following errors.
A 000 001c01: +        BEDROCK PI 0 Error Status 0 A Register: 0xc0000000b73c1e01
A 000 001c01: +          02<->00: rrb error type 1 Uncached Partial Error
A 000 001c01: +          16<->06: message supplemental 0x78
A 000 001c01: +          24<->17: message command 0x9e Reply(PRERR)
A 000 001c01: +          61<->25: error address 0x5b << 3 = (0x2d8)
A 000 001c01: +          63<->62: error status valid (over_run)
A 000 001c01: +        BEDROCK PI 0 Error Status 1 A Register: 0x14c80000000002
A 000 001c01: +          20<->00: spool count 0x2
A 000 001c01: +          52<->43: crb status 0x299
A 000 001c01: +        BEDROCK PI 0 Error spool A:
A 000 001c01: +           Entry 0:
A 000 001c01: +            Cmd 0x9e(Reply:PRERR), RRB stat: P-------   CRB #0, T5 req #0, supp 0
A 000 001c01: +            Error 1 Uncached Partial Error, Cache line address 0xf000000 (uattr 1)
A 000 001c01: +           Entry 1:
A 000 001c01: +            Cmd 0x9e(Reply:PRERR), RRB stat: P-------   CRB #0, T5 req #0, supp 0
A 000 001c01: +            Error 1 Uncached Partial Error, Cache line address 0xf000000 (uattr 1)
A 000 001c01: +        BEDROCK IIO Widget Status Register: 0x100000210
A 000 001c01: +          32: Error (crazy) bit set
A 000 001c01: +        PRB[0]: 0x2040000000003
A 000 001c01: +          49<->49: Read Response Timeout
A 000 001c01: +        BEDROCK NI Port Error Register: 0xff
A 000 001c01: +          07<->00: Number of LLP SN errors 0xff
A 000 001c01: END Hardware Error State (Forced error dump)
A 000 001c01: pcibus_sanity failed: Took an exception
A 000 001c01: RSLT pcibus_sanity  FAIL                diag_rc = 83  Took an exception
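
For anyone checking these dumps by hand, the bit ranges printed under each register (for example `24<->17`) can be reproduced with a small bit-field helper. The sketch below is only an illustration: the helper `bits` and the dictionary keys are made-up names, and the register values are copied from the xbow_sanity dump, reading the PI 0 Error Status 0 A value as the full 64-bit number 0x80000000b73c1e01 (in an 80-column capture the last digit can land on the following line). The Cause decoding assumes the standard MIPS layout, where ExcCode sits in bits 6..2 and the value 7 means a data bus error.

```python
def bits(value, hi, lo):
    """Return the field value[hi:lo], inclusive, with bit 0 as the LSB."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


# Values copied from the xbow_sanity dump in this post.
PI_ERR_STATUS0_A = 0x80000000B73C1E01
PI_ERR_STATUS1_A = 0x14C80000000000
CAUSE = 0x801C

# Field positions are the "hi<->lo" ranges printed by the PROM itself.
status0 = {
    "rrb_error_type": bits(PI_ERR_STATUS0_A, 2, 0),
    "message_supplemental": bits(PI_ERR_STATUS0_A, 16, 6),
    "message_command": bits(PI_ERR_STATUS0_A, 24, 17),
    "error_address": bits(PI_ERR_STATUS0_A, 61, 25),
    "error_status": bits(PI_ERR_STATUS0_A, 63, 62),
}
crb_status = bits(PI_ERR_STATUS1_A, 52, 43)

# Standard MIPS Cause register: ExcCode is bits 6..2 (7 = data bus error).
exc_code = bits(CAUSE, 6, 2)

for name, value in status0.items():
    print(f"{name:22s} {value:#x}")
print(f"{'error_address << 3':22s} {status0['error_address'] << 3:#x}")
print(f"{'crb_status':22s} {crb_status:#x}")
print(f"{'Cause.ExcCode':22s} {exc_code}")
```

The decoded fields agree with what the PROM prints (supplemental 0x78, command 0x9e, error address 0x5b << 3 = 0x2d8, crb status 0x299), and the ExcCode of 7 is consistent with the "Read Response Timeout" and "Uncached Partial Error" entries: an uncached read that never got a valid reply.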

Irinikus
Posts: 383
Joined: Wed Apr 27, 2016 4:25 am
Location: Cape Town, South Africa

Re: New Tezro owner

Post by Irinikus » Sat Nov 11, 2017 9:43 am

Can you confirm that this node board was working properly when you received the machine?

You are probably going to have to take a much closer look at the connectors on the mid-plane to ensure that the Y-shaped pins are perfectly aligned (use a magnifying glass if necessary).

It's a pity that the damage is on the mid-plane side, as a new one of these will probably be more difficult to source.

marshallh
Posts: 18
Joined: Tue Nov 03, 2009 12:53 pm

Re: New Tezro owner

Post by marshallh » Sat Nov 11, 2017 11:07 am

Yes, these boards both worked fine before.

Debug 0x10d verbose boot on both node boards:

4x700

Code: Select all

001c01-L1>power up
INFO: PIMM type changed, setting the default voltage margins

001c01 ATTN: Cooling system stabilized
001c01-L1>
entering console mode  001c01 CPU0, <CTRL_T> to escape to L1
Starting PROM Boot process
hubii_link_good: 9-brick attached to module 001c01.


IP35 PROM SGI Version 6.210  built 02:33:51 PM Aug 26, 2004
  built for bedrock rev. 1.1 or greater
SN12 Workstation.
Local master CPU A revision: f41
Local slave CPU B revision: f41
Local slave CPU D revision: f41
Local slave CPU C revision: f41
PROM length: 0x1686a8, BSS length: 0xa7a0, flash count: 49
Configured bedrock clock: 200.0 MHz
Status of local IO: 0x0 0x3fc03ff440a
Bedrock Rev: 2, Module: 1 (001c01) from Sys Ctlr
On PROM entry: ERR_EPC=0xc00000001fc02eac (0xc00000001fc02eac)
Configuring memory
Local memory configured: 4096 MB (premium)
*** Warning: System controller debug switches are non-zero (0x10d)
*** Diag level set to None (2)
*** Info level set to verbose
*** Boot stop requested at Global (2)
before reading NICHub NIC: 0x5448ca9b
SR1 set to 0x6000081690348000
SR0 set to 0x000000005448ca9b
Testing/Initializing memory ...............             DONE
Copying PROM code to memory ...............             Copy PROM (0x9000000018000000) to RAM (0x9600000001a00000), len 0x1686a8
Done
DONE
Skipping secondary cache diags
Skipping secondary cache diags
Skipping secondary cache diags
Skipping secondary cache diags
CPU B switching stack into UALIAS and invalidating D-cache
CPU A switching stack into UALIAS and invalidating D-cache
CPU C switching stack into UALIAS and invalidating D-cache
CPU D switching stack into UALIAS and invalidating D-cache
CPU B switching into node 0 cached RAM
CPU A switching into node 0 cached RAM
CPU B running cached
CPU A running cached
CPU C switching into node 0 cached RAM
CPU D switching into node 0 cached RAM
CPU C running cached
CPU D running cached
Initializing kldir.
Done initializing kldir.
Initializing klconfig.
init_klcfg: nasid 0 start 9600000000030000 size 10000
Done initializing klconfig.
CPU A initialized subnode
CPU C initialized subnode
Discovering NUMAlink connectivity .........
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 5892 usec
Waiting for peers to complete discovery....             Discovery results:
ENTRY 0: HUB(5448ca9b)
    NASID=-1 Mod=1 Flg=0x9400000 PROM=6.210 Route=N/A
    MODULE=001c01 PARTITION=0 SPACE=RESET
    Port 1 connection: Not connected
    Port status: NF
DONE
No other nodes present; becoming global master
Global master is entry 0, NIC 0x5448ca9b, /hw/rack/001/bay/01
Global master is /hw/rack/001/bay/01
Global barrier (line 4315)Global barrier passed.
Global barrier (line 4348)Global barrier passed.
Master System Topology Graph (pre-nasid_assign):
Local Slave : Waiting for my NASID ...
ENTRY 0: HUB(5448ca9b)
    NASID=-1 Mod=1 Flg=0x9400000 PROM=6.210 Route=N/A
    MODULE=001c01 PARTITION=0 SPACE=RESET
Local Slave : Waiting for my NASID ...
Local Slave : Waiting for my NASID ...
    Port 1 connection: Not connected
    Port status: NF
Calculating NASIDs
num_routers is 0
Master System Topology Graph:
ENTRY 0: HUB(5448ca9b)
    NASID=0 Mod=1 Flg=0x9400000 PROM=6.210 Route=N/A
    MODULE=001c01 PARTITION=0 SPACE=RESET
    Port 1 connection: Not connected
    Port status: NF
Distributing routing tables
Distributing NASIDs
*** NASID assigned to 0
CPU B switching to UALIAS
CPU C switching to UALIAS
CPU D switching to UALIAS
CPU A switching to UALIAS
CPU C running in UALIAS
CPU D running in UALIAS
CPU B running in UALIAS
CPU C Flushing and invalidating caches
CPU A running in UALIAS
CPU B Flushing and invalidating caches
CPU D Flushing and invalidating caches
Changing node ID to 0
Global barrier (line 4823)Global barrier passed.
CPU A Flushing and invalidating caches
Global barrier (line 4928)Global barrier passed.
CPU B switching to node 0 cached RAM
CPU D switching to node 0 cached RAM
CPU B running cached
CPU D running cached
CPU A switching to node 0 cached RAM
CPU C switching to node 0 cached RAM
CPU A running cached
CPU C running cached
Nasids in partition:  +0
Regions in partition:  +0
Intializing any CPUless nodes..............             Global barrier (line 7714)Global barrier passed.
Global barrier (line 7715)Global barrier passed.
DONE
Global barrier (line 5089)Global barrier passed.
hubii_link_good: No I/O brick attached to module 001c01.
nasid 0 ilcsr = 0x3fc03ff440a
Checking partitioning information .........             DONE
No other nodes present; becoming partition master
*** After partitioning ***
ENTRY 0: HUB(5448ca9b)
    NASID=0 Mod=1 Flg=0x9400000 PROM=6.210 Route=N/A
    MODULE=001c01 PARTITION=0 SPACE=RESET
    Port 1 connection: Not connected
    Port status: FE
Erecting partition fences ................                        DONE
Update config for routers connected to hubs
Update config for hubs and hubless routers
CPU B flushing cache
CPU C flushing cache
CPU A flushing cache
CPU D flushing cache
check_router_cfg: nasid 0 is_voyager 0 check_cfg = 0
Global barrier (line 5300)Global barrier passed.
Nasids in partition:  +0
Regions in partition: Local slave entering slave loop
Local slave entering slave loop
 +0Local slave entering slave loop

A 000 001c01:
A 000 001c01: *** Entering POD mode on node 0



2x800

Code: Select all

Starting PROM Boot process
hubii_link_good: 9-brick attached to module 001c01.
HUB at 0x0 attached as widget 0xa
001c01/0xa/xbow_arb: nasid= 0x0 xbow_base= 0x9200000000000000
001c01/0xa/xbow_arb: 631 SN[012] master is 0xa
Check_master: link 10 is master
hubii_link_good: 9-brick attached to module 001c01.
Check_master: link 10 is master
A 000 001c01:
A 000 001c01: *** General Exception on node 0
A 000 001c01: *** EPC: 0xc00000001fc47b60 (0xc00000001fc47b60)
A 000 001c01: *** Press ENTER to continue.



I'm curious what "9-brick" is.
I suspect the power supply may be failing. If so, it's probably failing too quickly for the L1 to notice.

Guess I'll have to find another Tezro to know for sure :D
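
One detail worth pulling out of the captures: the EPC reported by the 2x800 general exception is the same address as the EPC from the pcibus_sanity diag on the 4x700. A rough way to spot that kind of overlap is to extract the EPC from each line and group the captures by address. This is only a sketch: the labels and the `epc_of` helper are invented for illustration, and the sample lines are copied from the posts above. A shared EPC suggests, but does not prove, that both boards fault at the same place in the PROM code.

```python
import re

# Matches both "EPC    : 0x..." (POD diag dumps) and "*** EPC: 0x..." (boot exception).
EPC_RE = re.compile(r"EPC\s*:\s*(0x[0-9a-fA-F]+)")

# Lines copied from the captures in this thread; the labels are made up.
samples = {
    "4x700 dgxbow": "A 000 001c01:  EPC    : 0xffffffffbfc0fa84",
    "4x700 dgbrdg": "A 000 001c01:  EPC    : 0xc00000001fc42188",
    "4x700 dgpci": "A 000 001c01:  EPC    : 0xc00000001fc47b60",
    "2x800 boot": "A 000 001c01: *** EPC: 0xc00000001fc47b60 (0xc00000001fc47b60)",
}


def epc_of(line):
    """Return the first EPC value on the line as an int, or None if absent."""
    match = EPC_RE.search(line)
    return int(match.group(1), 16) if match else None


# Group capture labels by the EPC they report.
by_epc = {}
for label, line in samples.items():
    by_epc.setdefault(epc_of(line), []).append(label)

# Keep only addresses that show up in more than one capture.
shared = {hex(epc): labels for epc, labels in by_epc.items() if len(labels) > 1}
print(shared)
```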

Irinikus
Posts: 383
Joined: Wed Apr 27, 2016 4:25 am
Location: Cape Town, South Africa

Re: New Tezro owner

Post by Irinikus » Thu Nov 16, 2017 8:01 pm

Have you got any closer to solving your Tezro's issue yet?

EdwardSB711
Posts: 7
Joined: Sun Oct 14, 2012 7:36 am

Re: New Tezro owner

Post by EdwardSB711 » Sun Nov 26, 2017 3:22 pm

Nice Tezro.

I noticed you have premium memory and standard memory installed. Try just two sticks of standard.

Your 3D video card may be smoked, or a small component on the card may have failed, which could be repairable.

The power supply may be on its way out. Take it apart and test all the caps.

There is a guy on YouTube who may have a spare 32 MB video card.

I bought my Fuel from him a few years ago, and he had a stash of video cards.

This guy is a bad-ass hardware troubleshooter. He can fix any Sun system and is pretty good with SGI.

POWER ANIMATOR 8.5

mopar5150
Posts: 558
Joined: Tue Apr 24, 2012 6:02 pm
Location: Palm Springs, CA

Re: New Tezro owner

Post by mopar5150 » Mon Nov 27, 2017 8:20 am

I will send you another midplane to try out. I have seen a bad IO9 board give exceptions as well.
If the thing isn't on fire it's a software problem.

:Tezro: :O3x0: :A350:

marshallh
Posts: 18
Joined: Tue Nov 03, 2009 12:53 pm

Re: New Tezro owner

Post by marshallh » Mon Nov 27, 2017 12:56 pm

Many thanks! I've booted without the IO9 and get the same errors. The midplane doesn't really have anything on it besides the L1 and the two PIC PCI<>XIO bridges. I'm not familiar enough with the IP35 architecture to infer more.

