Is this SGI Tezro beyond repair?

Dzeimis
Posts: 4
Joined: Mon Dec 26, 2016 3:51 am

Post by Dzeimis » Mon Dec 26, 2016 3:58 am

Last week I visited a warehouse full of old hardware and found an SGI Tezro workstation. I got to take it home and let it dry for a few days. After that I vacuumed it and blew out the remaining dust with compressed air. When I first turned it on, the L1 LCD indicated that the 1.8V rail had dropped to 1.2V and the system turned off; on the next attempt, however, it reported that everything was okay.

However, not everything was okay: the front panel showed only a solid red LED, which, according to the manual, means "System node board failure (failed to read PROM at power on)". I removed the system node board and found some of the pins on the backplane bent (picture). I tried to straighten them with a needle and small tweezers, back to this (picture), yet it still failed to boot. Is there anything else I could do to fix this computer? Is there any information on the purpose of the 100 mil headers on the backplane, node board and IO board? The Tezro had no side covers when I found it, so it is possible that the jumpers were looted at some point.

I'm not sure if this is relevant, but when I turn the system on, the LED on the backplane board turns from green to red and one green LED lights up on the node board near the connectors (the one that had bent pins). The yellow LEDs on the node board are on all the time.

robespierre
Posts: 1578
Joined: Mon Sep 12, 2011 2:28 pm
Location: Boston

Re: Is this SGI Tezro beyond repair?

Post by robespierre » Mon Dec 26, 2016 9:41 am

I think you're hanging on a hope and a prayer with that, but at least it's Christmas.
The diagnostics and manual are written to troubleshoot expected failures, not "this machine was left in a disused barn and became a bat nursery".
I don't think the jumpers are critical; but there are many parts that must all work for a Tezro to boot. Even low batteries will prevent them working. Have you connected to the L1 console and read the error log?

pentium
Posts: 4746
Joined: Mon Aug 28, 2006 6:29 pm
Location: Kamloops, BC

Re: Is this SGI Tezro beyond repair?

Post by pentium » Mon Dec 26, 2016 9:03 pm

I second plugging a console into the L1 port and pasting whatever it reports here. One of the nice things about the later MIPS SGIs was that the L1 helped diagnose problems when the machine was being especially sick.

As for the connector, I've seen those spades get bent before, but it was on the CPU connector in the Power Macintosh G4, which is essentially the same thing. So long as you bend them back straight and uniform with the others it should be fine. It isn't like the compression connectors used on older machines.

jan-jaap
Donor
Posts: 4939
Joined: Thu Jun 17, 2004 11:35 am
Location: Wijchen, The Netherlands

Re: Is this SGI Tezro beyond repair?

Post by jan-jaap » Tue Dec 27, 2016 2:25 am

Assuming this system didn't leave the factory with bent connector pins, someone must have removed the CPU board at some point, and whoever installed the board that was inside when you got it didn't bother (or wasn't able) to do so carefully.

That unfortunately gives the impression of a system that developed a problem and either
    1. someone pulled the nodeboard, couldn't make sense of it, put it back together shoddily and got rid of the system, *or*
    2. this system was used as a donor to donate a nodeboard to another system, and the failed nodeboard of that system was put into this one.

You may be able to recover the system, but I'd be surprised if it didn't require replacing some part of it.

Dzeimis
Posts: 4
Joined: Mon Dec 26, 2016 3:51 am

Re: Is this SGI Tezro beyond repair?

Post by Dzeimis » Tue Dec 27, 2016 8:26 am

I connected to the L1 through a null modem cable and got a console. At one point I got some general errors (picture), and now it just occasionally says 'no response from 001C01 CPU0, system not responding'. I tried to get some data from the L1 controller (sorry for the pictures of text; my only computer with a serial port is a Windows 95 laptop with no simple way to transfer data off it):

config verbose

power

leds when computer is on

cpu and flash status

env

I also did "nvram reset", but it didn't change anything. Does this give any ideas as to what could be wrong? Are there more tests I could run to provide more information?

EDIT: sorry, forgot to add the log here. I forgot to check the log at first, so it was filled with me resetting the system. After clearing the log and starting the system it looked like this. Also, I tested the LEDs with the computer off and got this.

EDIT2: Figured out how to enter POD mode; now it shows General Exception errors again.

It gets stuck on LED status 0x2A; then, after issuing an NMI (either front button or command), it shifts through a few unknown states (transients?) and lands in POD mode 0xBC/0x80, where the exceptions occur.

pentium
Posts: 4746
Joined: Mon Aug 28, 2006 6:29 pm
Location: Kamloops, BC

Re: Is this SGI Tezro beyond repair?

Post by pentium » Tue Jan 03, 2017 5:25 pm

From the totally limited knowledge I have of the last-generation MIPS machines, the best I can say, going by my slightly older Origin 2000, is that there's a connection issue. One of the nodeboards on my system has a flaky compression connector, and occasionally on a cold start it will spam the console with exception logs and the first nodeboard will fail on discovery until I power down, literally punch the nodeboard in with my fist and power back up again.

I'm still a little curious as to why you had to let it dry for a few days. I recall that on a few of the earlier machines there's basically a compression connection between the CPU and the board itself, which is quite sensitive to moisture.

Dzeimis
Posts: 4
Joined: Mon Dec 26, 2016 3:51 am

Re: Is this SGI Tezro beyond repair?

Post by Dzeimis » Sun Jan 08, 2017 6:26 am

I left it for a few days in a dry room just to make sure there was no condensation inside the connectors and so on. The warehouse it was stored in is just a hangar-like shelter with free air circulation from outside, and that week was rather rainy and humid. I once had an explosive accident with another device stored in such conditions, so now my routine is to let a device sit for some time while I do some research first.

The connection between the main and node boards is not a compression connector, as far as I understand; however, it does not seem much easier to work with. The uncertainty over whether this is a board or a connector issue is what makes me reluctant to try to get another node board. I'll get home at the end of January and will take the mainboard out to have another look at the contacts. On the other hand, if the L1 controller identifies all the devices on the node board (VRMs, CPUs, RAM) as working fine, doesn't that mean that all the required contacts in the connector are working fine?

pentium
Posts: 4746
Joined: Mon Aug 28, 2006 6:29 pm
Location: Kamloops, BC

Re: Is this SGI Tezro beyond repair?

Post by pentium » Mon Jan 09, 2017 9:14 pm

Actually, you are right. If it's talking to other devices on the nodeboard, then the connector (which, you are correct, is not a compression connector) should be fine.

Dodoid
Posts: 643
Joined: Mon Jul 04, 2016 1:36 pm
Location: Ottawa, Canada

Re: Is this SGI Tezro beyond repair?

Post by Dodoid » Tue Jan 10, 2017 8:20 am

pentium wrote: One of the nodeboards on my system has a flaky compression connector and occasionally on a cold start it will spam the console with exception logs and the first nodeboard will otherwise fail on discovery until I power down, literally punch the nodeboard in with my fist and power back up again.


Pentium: supercomputer chiropractor :lol:

Dzeimis
Posts: 4
Joined: Mon Dec 26, 2016 3:51 am

Re: Is this SGI Tezro beyond repair?

Post by Dzeimis » Thu Jul 13, 2017 9:02 am

Hey, it's me again, with my Tezro in the same condition it was in half a year ago. I poked at it some more and found out the following:

When plugging in the machine, it immediately complains about voltage regulator modules:

Code:

INFO: Cannot enable VRM: 9
INFO: Cannot enable VRM: 10
INFO: Cannot enable VRM: 11

SGI SN1 L1 Controller
Firmware Image B: Rev. 1.26.5, Built 12/15/2003 12:58:02


LED status when the machine is powered down:

Code:

CPU  A: 0x00: PLED_RESET:  Slave loop (0x00/0x45=okay, solid 0x00=possibly hung)
CPU  B: < CPU not present >
CPU  C: 0x00: PLED_RESET:  Slave loop (0x00/0x45=okay, solid 0x00=possibly hung)
CPU  D: < CPU not present >
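
For anyone comparing LED readouts between machines, here is a minimal Python sketch that parses a listing like the one above. It only relies on the line layout shown in this post; `LED_LINE`, `parse_leds` and `looks_hung` are hypothetical names, and the "hung" rule is nothing more than the hint printed by the L1 itself (0x00/0x45 alternating is okay, solid 0x00 is possibly hung).

```python
import re

# Layout assumed from the listing above, e.g.
#   "CPU  A: 0x00: PLED_RESET:  Slave loop (...)"
#   "CPU  B: < CPU not present >"
LED_LINE = re.compile(
    r"^CPU\s+([A-D]):\s+"
    r"(?:0x([0-9a-fA-F]{2}):\s+(\w+):\s*(.*)|<\s*CPU not present\s*>)$"
)

def parse_leds(text):
    """Map CPU letter -> (code, state name), or None if the CPU is absent."""
    status = {}
    for line in text.splitlines():
        m = LED_LINE.match(line.strip())
        if not m:
            continue
        cpu, code, name, _desc = m.groups()
        status[cpu] = None if code is None else (int(code, 16), name)
    return status

def looks_hung(samples):
    """Apply the L1's own hint: several readings that are all 0x00."""
    return len(samples) > 1 and all(s == 0x00 for s in samples)

listing = """\
CPU  A: 0x00: PLED_RESET:  Slave loop (0x00/0x45=okay, solid 0x00=possibly hung)
CPU  B: < CPU not present >
CPU  C: 0x00: PLED_RESET:  Slave loop (0x00/0x45=okay, solid 0x00=possibly hung)
CPU  D: < CPU not present >
"""
status = parse_leds(listing)
print(status)
```

A single readout cannot distinguish "okay" from "hung"; `looks_hung` needs several readings of the same CPU taken over time, and a reading taken with the machine powered down probably says little either way.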


After powering it up, it raises some general exceptions:

Code:

A 000 001c01:
A 000 001c01: *** General Exception on node 0
A 000 001c01: *** EPC: 0xc00000001fc41180 (0xc00000001fc41180)
A 000 001c01: *** Press ENTER to continue.
A 000 001c01:
A 000 001c01: *** General Exception on node 0
A 000 001c01: *** EPC: 0xc00000001fc41148 (0xc00000001fc41148)
A 000 001c01: *** Press ENTER to continue.
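
A rough way to read the two EPC values, offered as a sketch and not a diagnosis: on MIPS the boot PROM conventionally sits at physical address 0x1fc00000, so if the low 32 bits of these EPCs fall in that window (an assumption; the Tezro's actual address map is not confirmed here), both exceptions were raised while executing PROM code. `PROM_BASE` and `prom_offset` are names made up for this illustration.

```python
# Assumption: the low 32 bits of each EPC are an address inside the boot PROM
# window, which on MIPS conventionally starts at physical 0x1fc00000.
PROM_BASE = 0x1FC00000

def prom_offset(epc):
    """Offset of an EPC from the assumed PROM base, using its low 32 bits."""
    low = epc & 0xFFFFFFFF
    return low - PROM_BASE

# The two EPC values from the console output above.
epcs = (0xC00000001FC41180, 0xC00000001FC41148)
for epc in epcs:
    print(hex(epc), "-> offset", hex(prom_offset(epc)))
```

Under that assumption the two faults are 0x38 bytes apart, which would be consistent with the same small piece of early boot code failing repeatedly.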


And trying to reboot (via soft reset) sends the controller into a panic:

Code:

INFO: Cannot enable VRM: 9
INFO: Cannot enable VRM: 10
INFO: Cannot enable VRM: 11


****************************************
controller firmware panic!   resetting...
****************************************

IMAGE B: Rev. 1.26.5
[thread ID 30004b44 stack]
   TR: fff6f158 fff6ea10 fff845d4 fff848ba fff84930 fff596f6 fff4f1a0
   TR: fff6510a fff10e88 fff7fb7e fff131b0 fff13b20 fff3be50 00000000

(if you see this, please email ssh@sgi.com and include
 the output from the 'log' command and a description of
 what caused the problem)


SGI SN1 L1 Controller
Firmware Image B: Rev. 1.26.5, Built 12/15/2003 12:58:02
07/12/17 03:38:10 power up (COMMAND)
07/12/17 03:38:15 Node 0 IP53 XTalk clock 88
07/12/17 03:38:18 reset again MIPS
07/12/17 03:38:18 Cooling system stabilized
07/12/17 03:38:22 Node 0 IP53 XTalk clock 88
07/12/17 03:40:03 reset (COMMAND)
07/12/17 03:40:04 Node 0 IP53 XTalk clock 88
07/12/17 03:40:33 soft reset (COMMAND)
07/12/17 03:40:33 PANIC: ioExp.c line 571 ; Illegal I/O expander index: 35
07/12/17 03:40:34 L1 booting 1.26.5
07/12/17 03:40:35 CONTROLLER FIRMWARE PANIC!
07/12/17 03:40:35 IMAGE B: Rev. 1.26.5
07/12/17 03:40:35 [thread ID 30004b44 stack]
07/12/17 03:40:35    TR: fff6f158 fff6ea10 fff845d4 fff848ba fff84930 fff596f6 fff4f1a0
07/12/17 03:40:35    TR: fff6510a fff10e88 fff7fb7e fff131b0 fff13b20 fff3be50 00000000
07/12/17 03:40:36 Cooling system stabilized
07/12/17 03:40:36 USB0: waiting on open
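
To make logs like this easier to compare between machines, here is a small Python sketch that splits the timestamped lines into entries and reports which event came right before the panic. It assumes the timestamps are MM/DD/YY (which fits the July 2017 dates here) and that a line without a timestamp is a terminal-wrapped tail of the previous entry; `parse_l1_log` and `event_before_panic` are hypothetical helper names, not part of any SGI tool.

```python
import re
from datetime import datetime

# Matches lines such as "07/12/17 03:40:33 PANIC: ioExp.c line 571 ; ..."
LOG_LINE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*)$")

def parse_l1_log(text):
    """Return a list of (timestamp, message) pairs from 'log' output."""
    entries = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        m = LOG_LINE.match(line)
        if m:
            # Assumption: dates are MM/DD/YY.
            stamp = datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S")
            entries.append((stamp, m.group(2)))
        elif entries:
            # No timestamp: treat as the wrapped tail of the previous entry.
            stamp, msg = entries[-1]
            entries[-1] = (stamp, msg + line)
    return entries

def event_before_panic(entries):
    """Return (previous entry, panic entry) for the first PANIC, or None."""
    for i, (_, msg) in enumerate(entries):
        if msg.startswith("PANIC:"):
            return (entries[i - 1] if i else None), entries[i]
    return None

sample = """\
07/12/17 03:40:03 reset (COMMAND)
07/12/17 03:40:04 Node 0 IP53 XTalk clock 88
07/12/17 03:40:33 soft reset (COMMAND)
07/12/17 03:40:33 PANIC: ioExp.c line 571 ; Illegal I/O expander index: 35
07/12/17 03:40:34 L1 booting 1.26.5
"""
entries = parse_l1_log(sample)
before, panic = event_before_panic(entries)
print(before[1], "->", panic[1])
```

On the excerpt above this pairs the soft reset command with the `ioExp.c` panic logged in the same second, which matches what was seen on the console.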


A couple of questions for people who own working Tezro machines:
Does the computer say anything about VRMs when plugged in?
Does the controller panic when you issue a soft reset after booting?
Is it worth emailing ssh@sgi.com with my second-hand (if not fourth-hand) legacy system's issues?

