Onyx deskside memory issues

bjornl
Posts: 344
Joined: Tue May 09, 2006 11:55 am
Location: Sweden

Post by bjornl » Tue Jan 16, 2018 4:53 am

Hi,

I decided to start up my Onyx during the winter holiday, since it had been a long time since I last ran it.
I discovered that I will have to do the battery surgery on the NVRAM, and also that some of my memory had been disabled.
Since I can set the parameters manually to get this machine running, I decided to try to fix the memory issue, or at least move it
away from Bank A. I think Bank A is a bit more important than the other banks, and even though the machine worked I wanted no errors on Bank A.
This is how it looked when I started my quest.

Code:

IP25 SCC(E) SGI Version 6  built 10:11:49 AM May  8, 1996
R10000 2.5 194MHz BE (4-2-2/9) 1MB

Initializing hardware inventory...              ...done.
    CPU 02/00 is bootmaster
Testing Secondary Cache...                      ...passed.
Testing and clearing bus tags...                ...passed.
Configuring memory...
    Using standard interleave algorithm.
Running built-in memory test... 01
*** Self-test FAILED on slot 01, leaf 0, bank 0 (A)

                                                ...passed.
Writing cfginfo to memory
Initializing MPCONF blocks
Checking slave processor diag results..........
    Enabled 1280 Megabytes of main memory
    (Disabled 256 Megabytes of main memory)
    Enabled 2 processors
Downloading PROM header information...
Downloading PROM code...
Jumping into IO4 PROM.

PROM Segment Loader (R10000 IP25) SGI Version 2.1 Rev A MIPS3,   Sep  3, 1996
Loading and executing R10000 boot prom image...

IO4 PROM Monitor SGI Version 4.21 Rev A IP25,   Sep  3, 1996 (BE64)
Sizing caches...
Initializing exception vectors.
Initializing IO4 subsystems.
Fixing vpids...
Initializing environment

NVRAM checksum is incorrect: reinitializing.
Piggyback reads enabled.
Initializing software and devices.
All initialization and diagnostics completed.
Bootmaster processor already started.
Starting processor #1
Checking hardware inventory...
WARNING: hardware inventory is invalid.  Reinitializing...
***      Bank A on the MC3 in slot 1 failed diagnostics.
***        Reason: Memory built-in self-test failed.
***      Bank A on the MC3 in slot 1 is DISABLED.

Press <Enter> to continue


Because bank numbering on the Onyx is not the most logical, I started out carefully, checking what changed each time I moved the memory around.
However, things only got worse and worse. I swapped memory modules back and forth, and removed and replaced all memory except Bank A,
then all memory, and then the MC3 board itself.
This is where I ended up, and I can't get any memory back (at least not with what I know).

Code:

IP25 SCC(E) SGI Version 6  built 10:11:49 AM May  8, 1996
R10000 2.5 194MHz BE (4-2-2/9) 1MB

Initializing hardware inventory...              ...done.
    CPU 02/00 is bootmaster
Testing Secondary Cache...                      ...passed.
Testing and clearing bus tags...                ...passed.
Configuring memory...
    Using standard interleave algorithm.
Running built-in memory test... 01
*** Self-test FAILED on slot 01, leaf 0, bank 0 (A)

*** Self-test FAILED on slot 01, leaf 0, bank 1 (C)

*** Self-test FAILED on slot 01, leaf 0, bank 2 (E)

*** Self-test FAILED on slot 01, leaf 1, bank 0 (B)

*** Self-test FAILED on slot 01, leaf 1, bank 1 (D)

*** Self-test FAILED on slot 01, leaf 1, bank 2 (F)

*** CONFIGURATION FAILED: No operational memory was found

*** No memory configured
Writing cfginfo to memory
Initializing MPCONF blocks
General Exception

 EPC:   0x900000001fc140e8   ERROR-EPC: 0x35e02ce67f8bfc02
 BadVA: 0x0090200000001203   Return:    0x900000001fc140f0
 SP:    0xa8000000000fe818   A0:        0x0000000000000d00
 Cause: 0x8000801c Status: 0x24400082 Cache Error: 0xdefefffd
  Cause = ffffffff8000801c ( INT:8------- <Data Bus Err> )
*** Error/TimeOut Interrupt(s) Pending: 00000100 ==
         Addr Error on MyRequest on Ebus
Reason for entering POD mode: Unexpected exception.
Press ENTER to continue.
POD 02/00>
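
The bank numbering looks less odd once the six self-test lines above are put side by side. The sketch below pulls the slot, leaf, bank and letter out of those lines and checks a guessed rule against them. The rule (letter index = 2 * bank + leaf) is inferred only from this one log, and the names `bank_letter` and `failed_banks` are made up for illustration.

```python
import re

# Self-test lines copied from the POST output above.
POST_LOG = """\
*** Self-test FAILED on slot 01, leaf 0, bank 0 (A)
*** Self-test FAILED on slot 01, leaf 0, bank 1 (C)
*** Self-test FAILED on slot 01, leaf 0, bank 2 (E)
*** Self-test FAILED on slot 01, leaf 1, bank 0 (B)
*** Self-test FAILED on slot 01, leaf 1, bank 1 (D)
*** Self-test FAILED on slot 01, leaf 1, bank 2 (F)
"""

FAIL_RE = re.compile(
    r"Self-test FAILED on slot (\d+), leaf (\d+), bank (\d+) \(([A-Z])\)"
)


def bank_letter(leaf: int, bank: int) -> str:
    """Guess the bank letter from the leaf and bank numbers.

    ASSUMPTION (inferred from the six lines above only): letters alternate
    between the two leaves, so the letter index is 2 * bank + leaf.
    """
    return chr(ord("A") + 2 * bank + leaf)


def failed_banks(log: str):
    """Return (slot, leaf, bank, letter) for every failed-bank line."""
    return [
        (int(slot), int(leaf), int(bank), letter)
        for slot, leaf, bank, letter in FAIL_RE.findall(log)
    ]


failures = failed_banks(POST_LOG)
for slot, leaf, bank, letter in failures:
    # Check that the guessed rule agrees with what the PROM printed.
    assert bank_letter(leaf, bank) == letter
    print(f"slot {slot}: bank {letter} = leaf {leaf}, bank {bank}")
```

If the rule holds in general, leaf 0 carries banks A, C, E and leaf 1 carries B, D, F, so the letters interleave across the two leaves rather than running down one leaf first.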


And on the front display

Code:

General Exception
Diagnostic code #251

I hope the "General Exception" is because there is no RAM in Bank A and the PROM can't go any further, and not because I have
messed up the PROM as well.
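
For reference, the Cause value in the register dump can be decoded by hand. The sketch below uses the standard MIPS Cause register layout (BD in bit 31, pending interrupts in bits 15:8, ExcCode in bits 6:2); the helper name `decode_cause` is made up, and only the exception codes relevant here are listed. It says nothing about what the interrupt lines are wired to on this machine.

```python
# Subset of MIPS exception codes (the ExcCode field of the Cause register).
EXC_CODES = {
    0: "Int (interrupt)",
    6: "IBE (bus error on instruction fetch)",
    7: "DBE (bus error on data load/store)",
}


def decode_cause(cause: int) -> dict:
    """Split a MIPS Cause register value into its main fields."""
    cause &= 0xFFFFFFFF           # drop the sign extension shown in the log
    exc = (cause >> 2) & 0x1F     # ExcCode, bits 6:2
    ip = (cause >> 8) & 0xFF      # pending interrupts IP7..IP0, bits 15:8
    return {
        "branch_delay": bool(cause >> 31),  # BD, bit 31
        "exc_code": exc,
        "exc_name": EXC_CODES.get(exc, "not listed in this sketch"),
        "pending_ip": [i for i in range(8) if ip & (1 << i)],
    }


# Cause value from the General Exception dump above.
general = decode_cause(0x8000801C)
print(general)
```

The decoded exception code is 7, a data bus error, which matches the "<Data Bus Err>" annotation the PROM printed next to the Cause value.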

Sometimes I got hundreds (or thousands) of these:

Code:

*** Bus error occurred while checking mem board in slot 1
*** Error/TimeOut Interrupt(s) Pending: 00101000 ==
         Parity error on data from D-chip [15:0]
         Multiple Errors detected
    EPC 900000001fc037f0 CAUSE ffffffff8000a01c BADVADDR 490300081a01203 SCCADDR 50ffffffe00
*** Bank check read pass failed
NOTE: Reconfiguring memory.

Ending with a line that was cut off:

Code:

*** Bus error occ


On the Onyx2 there are "ENABLEALL", "CLEARALLLOGS", "INITALLLOGS", etc. in the POD, but I can't find the equivalents on the Onyx.
Well, there are "CLEAR" and "RESET", but they didn't help.

Can anyone help me with this? I have an extra MC3 and IO4 board, but I am afraid to put them in, in case something happens to them as well.


Re: Onyx deskside memory issues

Post by bjornl » Sun Jan 21, 2018 11:07 am

On closer inspection of the back of my MC3 board, it seems I have damaged some resistors or capacitors :-(
I don't think that can be fixed with POD commands.
This probably happened when removing or inserting the board, because I did that several times and at one point it
felt a bit stuck. A couple of the components are loose at one end, no longer connected to the solder pad. It is possible
that the first bank error I tried to repair was also caused by something like this.

A new MC3 board is probably difficult to find and/or quite expensive, so I'm going to attempt to solder
those loose SMDs back on. I may have to replace them if they have more damage than just being a bit loose.
They all look very similar. How can you tell whether one is a resistor or a capacitor?

The extra MC3 board I have is from my other Onyx, so I will only use it to make sure the rest of the machine is OK.
It is not a long-term solution, so if I don't succeed the next step will be a post in "Hardware Wanted" :-)

Raion-Fox
Donor
Posts: 1564
Joined: Thu Jan 30, 2014 5:01 pm
Location: near King George, Virginia

Re: Onyx deskside memory issues

Post by Raion-Fox » Sun Jan 21, 2018 11:15 am

Find someone who can do a repair. These boards are getting rare.

Irinikus
Posts: 510
Joined: Wed Apr 27, 2016 4:25 am
Location: Cape Town, South Africa

Re: Onyx deskside memory issues

Post by Irinikus » Sun Jan 21, 2018 11:25 am

Take a pic of the damaged board, as we may be able to help you identify the damaged components.

In the past, I have successfully scavenged SMT components off spare (broken) boards to fix damaged ones in my systems.

Capacitors are usually not critical in terms of their value, depending on where they are in the circuit, but resistor values, on the other hand, more than likely will be.

It would help if you had a hot air pencil (SMT soldering station), but it is possible to carry out SMT repairs using a soldering iron with a very fine tip.

kjaer
Posts: 433
Joined: Wed May 07, 2008 7:47 pm
Location: Seattle, WA

Re: Onyx deskside memory issues

Post by kjaer » Sun Jan 21, 2018 12:19 pm

Actually, I probably have more MC3s than I need. Let me know if you don't get anywhere trying to repair it, and I'll see what's up. If it'd been an IO4 I already know for sure I have too many of those.

