Nekochan Net

Official Chat Channel: #nekochan // irc.nekochan.net
It is currently Fri Oct 31, 2014 3:42 am

All times are UTC - 8 hours [ DST ]


Forum rules


Any posts concerning pirated software or offering to buy/sell/trade commercial software are subject to removal.



[ 6 posts ]
 Post subject: IO4 stuffing the Onyx up
Posted: Wed Oct 01, 2014 5:33 pm

Joined: Mon Aug 28, 2006 6:29 pm
Posts: 4311
Location: Kamloops, BC
I powered the machine on and went for a nap (and forgot to actually boot it, so it sat at the prompt for five hours). When I got up ten minutes ago and booted the system, it went down before the OS could do anything.

Code:
                 Starting up the system in single user mode...

Loading dksc(0,1,8)/sash: 896+111372+16725+3848 entry: 0xa80000001a64791c
3938268+850980 entry: 0xa8000000000076e0
PANIC: Bad IO Adaptor type 32 slot 3 adap 2
Exception: <vector=XUT>
Status register: 0xa2<IPL=8,KX,UX,MODE=KERNEL>
Cause register: 0xa808<CE=0,IP8,IP6,IP4,EXC=RMISS>
Exception PC: 0xa800000000006064, Exception RA: 0xa8000000000060bc
Read TLB miss exception, bad address: 0x3a0

*** Error/TimeOut Interrupt(s) Pending: 0x1000 ==
  Parity error on data from D-chip [15:0]

  VID #0's ARCS PDA:  &pda 0xa800000001898ba0, &regs 0xa80000000189f3a0, magic 0xadacab
  vid 0, pid 8, init_sp 0x0, fault_sp 0xa800000001996120, stack_mode 1
  mode_sv 0, EPC_sv 0xa800000000006064, AT_sv 0x0, badvaddr_sv 0x3a0
  ErrEPC_sv 0x640420000242400, CacheErr_sv 0x1de1ff1c, cause_sv 0xa808, v0_sv 0x0
  SP_sv 0xa8000000003ae200, SR_sv 0xa2, exc_sv 0x4, return_addr_sv 0xa8000000000060bc
  notfirst 0x1, firstEPC 0xa800000000006064, nofault 0x0

PANIC: Unexpected exception

Hmm. I powered down and went to reseat the IO4. The main regulator blocks seemed excessively hot (nearly burned my hand), but the PROM didn't record any temperature issues. After I reseated the IO4, it started to boot, then tanked again.

Code:
++FRU ANALYSIS BEGIN
++
++
++                      FRU Analysis Summary
++
++      IO4 BOARD
++              IO4 board in slot 3: 70% confidence.
++
++FRU ANALYSIS END
HARDWARE ERROR STATE:
+  IP25 in slot 2
+    CC in Cpu Slot 2, cpu 0
+      CC ERTOIP Register: 0x2000
+        13:Parity error on data from D-chip [31:16]
+      CC Error Address Register: 0x5046bbd8020
+        cause: read response error(1)
+        address: 0x46bbd8020
+  IO4 board in slot 3
+      IA IBUS Error Register: 0x50800
+        11: PIO ReadResponse Data Error
+        18..16: IOA number of Transaction: 5 (DANG)
+      IA EBUS Error Register: 0x2
+         1: My DATA_ERROR Received

DOUBLE PANIC: CPU 0: TLBMISS: KERNEL FAULT
PC: 0xa8000000000fe9e8 ep: 0xa8000000003ae098
EXC code:8, `Read TLB Miss '
Bad addr: 0xc0000fc000000000, cause: 0xa008<CE=0,IP8,IP6,EXC=RMISS>
sr: 0xa3<IPL=8,KX,UX,MODE=KERNEL,EXL,IE>
Reboot started from CPU 0
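The PROM has already decoded these registers, but the decode itself is plain bit-masking. As a minimal sketch (Python, with hypothetical helper names), the values from both dumps above can be checked against the bit meanings the PROM printed. It is not an SGI tool and not a full register map; only the bits that actually show up in this thread are labelled.

```python
# Minimal sketch: decode the error-register values printed by the PROM in this
# thread. Bit meanings are copied from the PROM's own decode; this is NOT a
# complete map of any IP25/IO4 register. Helper names are hypothetical.
from __future__ import annotations


def set_bits(value: int) -> list[int]:
    """Return the positions of all set bits in value, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


def field(value: int, hi: int, lo: int) -> int:
    """Extract the inclusive bit field value[hi:lo]."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


# (register name, value from the dump, bit -> meaning as printed by the PROM)
DUMPS = [
    ("Error/TimeOut Interrupt(s) Pending", 0x1000,
     {12: "Parity error on data from D-chip [15:0]"}),
    ("CC ERTOIP (IP25, slot 2)", 0x2000,
     {13: "Parity error on data from D-chip [31:16]"}),
    ("IA IBUS Error (IO4, slot 3)", 0x50800,
     {11: "PIO ReadResponse Data Error"}),
    ("IA EBUS Error (IO4, slot 3)", 0x2,
     {1: "My DATA_ERROR Received"}),
]

if __name__ == "__main__":
    for name, value, meanings in DUMPS:
        print(f"{name}: {value:#x}")
        for bit in set_bits(value):
            # Unlabelled bits (16 and 18 of the IBUS register) are part of the
            # multi-bit IOA field decoded below.
            print(f"  bit {bit}: {meanings.get(bit, '(part of a multi-bit field)')}")
    # Bits 18..16 of the IA IBUS Error Register hold the IOA number.
    print("IOA number:", field(0x50800, 18, 16))
```

Running it lists bit 12, bit 13, bits 11/16/18 and bit 1, and an IOA number of 5, which lines up with the PROM's "IOA number of Transaction: 5 (DANG)" line.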


Could this be a thermal issue (I'm running an extra fan in the shop right now to keep air moving), or could the tantalum cap that blew on the -12V rail a month or so ago have caused something else to go screwy?

EDIT: I pulled the system apart and reseated everything, including the RAM, before restarting with just the three bare-minimum boards. The MC3 started squealing and put the machine into POKA FAIL mode, so I swapped it out for a spare (thank you, TriOx!) and was able to boot back up into single-user mode and eventually bring the entire system up. I'm also running with the door open and a barn fan in front of it, ol' Crimson style! 8-) Still curious as to what might have happened. I'll try adding boards and reinstalling memory and see if the problem comes back.

EDIT 2: The system came up with all the RAM installed (though the slots were dirty, so it took a bit of "persuasion"). Now attempting to reinstall the mezz boards.

EDIT 3: I think the second problem was that IRIX doesn't like the ASO being moved around after it's been installed and the system reconfigured. Once I switched the mezz boards around (I must have shuffled them when I repaired the VCAM), the system came up reliably. Since none of the previous errors were related to the ATM, Sirius, or video boards, we should be okay to reinstall everything else. I still don't know what caused the initial failure, though. :?

EDIT 4: The system is completely up now. Crisis averted.

_________________
:Crimson: :Onyx: :O2000: :O200: :O200: :PI: :PI: :Indigo: :Indigo: :Indigo: :Octane: :O2: :1600SW: :Indigo2: :Indigo2: :Indigo2IMP: :Indigo2IMP: :Indy: :Indy: :Indy: :Cube:

A very happy forum member.


Posted: Thu Oct 02, 2014 9:25 am

Joined: Mon Aug 28, 2006 6:29 pm
Posts: 4311
Location: Kamloops, BC
Dammit. Started it up this morning, and halfway through the boot it tanked. Rebooted and tried again. Crashed even sooner.

Pulled the board out and cleaned both the mezz connector and the ASIC socket. Rebooted and it came up fine, though I have the front open again to push more air through with the box fan. Two variables here: either the board doesn't like the 25°C ambient in the shop with the system closed up, or there's a bad connection that flakes out.

...or the unthinkable. I've cooked my ASO. :sad:

EXPERIMENT TIME!

I'll run the system for a few hours with the front down and the fan up, then see if it survives a reboot. Then I'll close the front, run it for a few more hours, and see if it knocks itself out.

Posted: Thu Oct 02, 2014 1:14 pm

Joined: Thu Jun 17, 2004 11:35 am
Posts: 3930
Location: Wijchen, The Netherlands
Have you read the Challenge/Onyx Diagnostics Roadmap? It's a must-read when dealing with a sick Onyx.

Your first error message is:
Code:
PANIC: Bad IO Adaptor type 32 slot 3 adap 2

Your IO4 is in Ebus slot #3. From a quick glance at the diagram, I think adap 2 on the IO4 (in slot 3) is the F chip that connects the VME bus.

The VME bus is connected to the IO4 (and the rest of the system) through (insert drum roll) the VCAM. Yes, the same board you've been hacking away at after it blew a tant.

I'd start by removing all VME cards (if there are any). ATM, by chance? The Onyx won't run without a VCAM, but it might run with a sick VCAM when there's nothing bugging it from the VME side. The VCAM connects the graphics as well (that's adap 3), so this is a long shot.

I have the 'big iron' diagnostics for IRIX 6.2, but the system must be healthy enough to boot into IRIX (6.2).

_________________
Now this is a deep dark secret, so everybody keep it quiet :)
It turns out that when reset, the WD33C93 defaults to a SCSI ID of 0, and it was simpler to leave it that way... -- Dave Olson, in comp.sys.sgi

Currently in commercial service: :Onyx2:(2x) :O3x02L:
In the museum: almost every MIPS/IRIX system.
Wanted: GM1 board for Professional Series GT graphics (030-0076-003, 030-0076-004)


Posted: Thu Oct 02, 2014 3:28 pm

Joined: Mon Aug 28, 2006 6:29 pm
Posts: 4311
Location: Kamloops, BC
Good eye. I was getting thrown off by the reference to the ASO (the error output calls it the DANG) and assumed slot 3 was one of the mezz slots. I guess I should read that guide again. :P
Either way, I've been running a bunch of thermal tests and warm reboots, and it's been holding. We'll find out whether that stays the case once it's been left to cool down.

I have two VME cards. One is ATM (I don't think it even works under 6.5, so it's just sitting in there), and the other is Sirius Video, which is kind of necessary. I have IDE.IP25 sitting in my /admin folder, but I don't think we ever got far enough along to use dvhtool to stuff it into the volume header so we could run it.

Posted: Thu Oct 02, 2014 4:14 pm

Joined: Mon Sep 12, 2011 2:28 pm
Posts: 646
Location: Boston
Does it not work if you start it from sash? (IDE, I mean.)

_________________
:PI: :O2: :Indigo2IMP: :Indigo2IMP:


Posted: Fri Oct 03, 2014 8:02 am

Joined: Mon Aug 28, 2006 6:29 pm
Posts: 4311
Location: Kamloops, BC
Nah, it seems fine now that I've pulled the IO4 and cleaned everything socketed. I guess it was just a bad connection, because it's now even booting from a cold state.
We'll just have to cross our fingers.
