Error messages

Komemiute
Posts: 8
Joined: Mon Aug 29, 2011 5:54 am

by Komemiute » Wed Jun 13, 2012 1:08 am

Good morning everyone,
I'm having an issue I've never seen before.

Machine is an Origin3000.

In the log of a C-Brick I find these messages alternating:

SMP_WQUE Q avail - lost: 0/0/5 repl:0 (BROAD)
SMP_WQUE Q full (BROAD)

Meanwhile, on the L1 display I get "1.5V Low warning limit reached @ something..." where something is lower than 1.5, obviously...

Any suggestion?

recondas
Moderator
Posts: 5312
Joined: Sun Jun 06, 2004 5:55 pm
Location: NC - USA

Re: Error messages

by recondas » Wed Jun 13, 2012 4:23 am

Komemiute wrote: While on the L1 display I get "1.5V Low warning limit reached"
There's not very much information in your post to work with, but I'd suspect that the 1.5V low warning is going to turn out to be just the tip of the iceberg.

You didn't mention which log you saw the message in (an IRIX generated log or one from the L1/L2/PROM/POD?), but just to cover all of the bases I'd suggest connecting a 38400, 8, N, 1 serial terminal (via null modem cable) to the L1/console port on the problem brick so you can examine the power-on diagnostic messages.

When you set up the serial terminal program, set a very large scroll-back buffer (at least 5k lines) so you can capture the entire session. Before you close the serial terminal program, be sure to save the session as a text file.

The initial serial terminal connection with the console port will be with the level 1 (L1) controller (the L1 is running anytime power is connected to the brick). Before you power up the system, run the "log" command at the L1 command prompt and capture the output. Then power up the system and immediately press the "control" and "d" keys (on the serial terminal) simultaneously to switch from L1 mode to console mode. The output of the power-on diagnostics session should scroll across the serial terminal window. If the problem brick powers itself down shortly after you've powered it up, repeat the power-up process without switching to console mode and run "env" against the L1 as soon as the brick is powered up.

If the problem brick does remain powered through the diagnostics session, press the "control" and "t" keys simultaneously to switch back to L1 mode. Once you're there (with the brick powered up and running), capture the output of the L1 "log" and "env" commands.

While you still have the serial terminal configured, move your serial cable to the 'console' port on the Level 2 (L2) controller, and run/capture the output of the L2 "log" and "env" commands.

Post the captured data here (you can trim the text to include just the pertinent bits) along with an "hinv -vm" (for the entire system).
***********************************************************************
Welcome to ARMLand - 0/0x0d00
running...(sherwood-root 0607201829)
* InfiniteReality/Reality Software, IRIX 6.5 Release *
***********************************************************************

Komemiute
Posts: 8
Joined: Mon Aug 29, 2011 5:54 am

Re: Error messages

by Komemiute » Thu Jun 14, 2012 12:01 am

I'll try this out.
Thanks!


EDIT:
Result of "hinv -vm":

Location: /hw/module/001c04/node
            IP35 Board: barcode LAW765     part 030-1604-004 rev -A
Location: /hw/module/001c04/node/cpubus/0
        IP35PIMM Board: barcode LRB847     part 030-1520-002 rev -F
Location: /hw/module/001c04/node/cpubus/1
        IP35PIMM Board: barcode MJW628     part 030-1799-002 rev -A
Location: /hw/module/001c04/Ibrick/xtalk/14
          IBRICK Board: barcode LJA190     part 030-1557-007 rev -E
Location: /hw/module/001c04/Ibrick/xtalk/15
          IBRICK Board: barcode LJA190     part 030-1557-007 rev -E
Location: /hw/module/001c07/node
            IP35 Board: barcode LZT994     part 030-1604-006 rev -A
Location: /hw/module/001c07/node/cpubus/0
        IP35PIMM Board: barcode LRC260     part 030-1520-002 rev -F
Location: /hw/module/001c07/node/cpubus/1
        IP35PIMM Board: barcode MFT587     part 030-1520-002 rev -J
Location: /hw/module/001c07/Xbrick/xtalk/8
   ODY128B_SWRDY Board: barcode MES977     part 030-1769-001 rev  C
Location: /hw/module/001c07/Xbrick/xtalk/14
         X2BRICK Board: barcode NAR815     part 030-1721-001 rev -B
Location: /hw/module/001c07/Xbrick/xtalk/15
         X2BRICK Board: barcode NAR815     part 030-1721-001 rev -B
Processor 0: 500 MHZ IP35
CPU: MIPS R14000 Processor Chip Revision: 1.4
FPU: MIPS R14010 Floating Point Chip Revision: 1.4
Processor 1: 500 MHZ IP35
CPU: MIPS R14000 Processor Chip Revision: 1.4
FPU: MIPS R14010 Floating Point Chip Revision: 1.4
Processor 2: 500 MHZ IP35
CPU: MIPS R14000 Processor Chip Revision: 2.4
FPU: MIPS R14010 Floating Point Chip Revision: 2.4
Processor 3: 500 MHZ IP35
CPU: MIPS R14000 Processor Chip Revision: 2.4
FPU: MIPS R14010 Floating Point Chip Revision: 2.4
Processor 4: 500 MHZ IP35
CPU: MIPS R14000 Processor Chip Revision: 1.4
FPU: MIPS R14010 Floating Point Chip Revision: 1.4
Processor 5: 500 MHZ IP35
CPU: MIPS R14000 Processor Chip Revision: 1.4
FPU: MIPS R14010 Floating Point Chip Revision: 1.4
Processor 6: 500 MHZ IP35
CPU: MIPS R14000 Processor Chip Revision: 1.4
FPU: MIPS R14010 Floating Point Chip Revision: 1.4
Processor 7: 500 MHZ IP35
CPU: MIPS R14000 Processor Chip Revision: 1.4
FPU: MIPS R14010 Floating Point Chip Revision: 1.4
CPU 0 at Module 001c04/Slot 0/Slice A: 500 Mhz MIPS R14000 Processor Chip (enabled)
  Processor revision: 1.4. Scache: Size 8 MB Speed 250 Mhz  Tap 0xa
CPU 1 at Module 001c04/Slot 0/Slice B: 500 Mhz MIPS R14000 Processor Chip (enabled)
  Processor revision: 1.4. Scache: Size 8 MB Speed 250 Mhz  Tap 0xa
CPU 2 at Module 001c04/Slot 0/Slice C: 500 Mhz MIPS R14000 Processor Chip (enabled)
  Processor revision: 2.4. Scache: Size 8 MB Speed 250 Mhz  Tap 0xa
CPU 3 at Module 001c04/Slot 0/Slice D: 500 Mhz MIPS R14000 Processor Chip (enabled)
  Processor revision: 2.4. Scache: Size 8 MB Speed 250 Mhz  Tap 0xa
CPU 4 at Module 001c07/Slot 0/Slice A: 500 Mhz MIPS R14000 Processor Chip (enabled)
  Processor revision: 1.4. Scache: Size 8 MB Speed 250 Mhz  Tap 0xa
CPU 5 at Module 001c07/Slot 0/Slice B: 500 Mhz MIPS R14000 Processor Chip (enabled)
  Processor revision: 1.4. Scache: Size 8 MB Speed 250 Mhz  Tap 0xa
CPU 6 at Module 001c07/Slot 0/Slice C: 500 Mhz MIPS R14000 Processor Chip (enabled)
  Processor revision: 1.4. Scache: Size 8 MB Speed 250 Mhz  Tap 0xa
CPU 7 at Module 001c07/Slot 0/Slice D: 500 Mhz MIPS R14000 Processor Chip (enabled)
  Processor revision: 1.4. Scache: Size 8 MB Speed 250 Mhz  Tap 0xa
Main memory size: 5120 Mbytes
Instruction cache size: 32 Kbytes
Data cache size: 32 Kbytes
Secondary unified instruction/data cache size: 8 Mbytes
Memory at Module 001c04/Slot 0: 1024 MB (enabled)
  Bank 0 contains 512 MB (Standard) DIMMS (enabled)
  Bank 1 contains 512 MB (Standard) DIMMS (enabled)
Memory at Module 001c07/Slot 0: 4096 MB (enabled)
  Bank 0 contains 1024 MB (Premium) DIMMS (enabled)
  Bank 1 contains 1024 MB (Premium) DIMMS (enabled)
  Bank 2 contains 1024 MB (Premium) DIMMS (enabled)
  Bank 3 contains 1024 MB (Premium) DIMMS (enabled)
Integral SCSI controller 0: Version Fibre Channel QL2200A, 33 MHz PCI
  Disk drive: unit 0 on SCSI controller 0 (unit 0)
  Disk drive: unit 0, lun 1 on SCSI controller 0 (unit 0)
  Disk drive: unit 1 on SCSI controller 0 (unit 1)
Integral SCSI controller 5: Version IEEE1394 SBP2
  IEEE1394 CDROM: node 1010031001a34c port 0 on SCSI controller 5
IOC3/IOC4 serial port: tty3
Graphics board: V12
Gigabit Ethernet: eg0, module 001c04, pci_bus 2, pci_slot 2, firmware version 12.4.10
Integral Fast Ethernet: ef0, version 1, module 001c04, pci 4
  PCI Adapter ID (vendor 0x114a, device 0x5565) PCI slot 1
  PCI Adapter ID (vendor 0x10a9, device 0x0009) PCI slot 2
  PCI Adapter ID (vendor 0x1077, device 0x2200) PCI slot 1
  PCI Adapter ID (vendor 0x10a9, device 0x0003) PCI slot 4
  PCI Adapter ID (vendor 0x11c1, device 0x5802) PCI slot 5
  PCI Adapter ID (vendor 0x104c, device 0x8009) PCI slot 6
IOC3/IOC4 external interrupts: 1
HUB in Module 001c04/Slot 0: Revision 2 Speed 200.00 Mhz (enabled)
HUB in Module 001c07/Slot 0: Revision 2 Speed 200.00 Mhz (enabled)
Dual Channel Display
IP35prom in Module 001c04/Slot n0: Revision 6.165
IP35prom in Module 001c07/Slot n0: Revision 6.166
IEEE 1394 High performance serial bus controller 0: Type: OHCI, Version 0 0
USB controller: type OHCI
USB Human Interface Device: device id 0 type keyboard
USB Human Interface Device: device id 0 type mouse
drlmsct1 4#



Result of the "log" command on the L1 of both C-Bricks:

001c04:

06/13/12 01:44:59 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:00 SMP_WQUE Q full (BROAD)

06/13/12 01:45:02 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:03 SMP_WQUE Q full (BROAD)

06/13/12 01:45:05 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:06 SMP_WQUE Q full (BROAD)

06/13/12 01:45:08 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:09 SMP_WQUE Q full (BROAD)

06/13/12 01:45:11 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:12 SMP_WQUE Q full (BROAD)

06/13/12 01:45:14 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:15 SMP_WQUE Q full (BROAD)

06/13/12 01:45:17 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:18 SMP_WQUE Q full (BROAD)

06/13/12 01:45:20 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:21 SMP_WQUE Q full (BROAD)

06/13/12 01:45:23 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:24 SMP_WQUE Q full (BROAD)

06/13/12 01:45:26 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:27 SMP_WQUE Q full (BROAD)

06/13/12 01:45:29 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:30 SMP_WQUE Q full (BROAD)

06/13/12 01:45:32 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:33 SMP_WQUE Q full (BROAD)

06/13/12 01:45:35 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:36 SMP_WQUE Q full (BROAD)

06/13/12 01:45:38 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:39 SMP_WQUE Q full (BROAD)

06/13/12 01:45:41 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:42 SMP_WQUE Q full (BROAD)

06/13/12 01:45:44 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

06/13/12 01:45:45 SMP_WQUE Q full (BROAD)

06/13/12 01:45:47 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)

001c07:

06/05/12 18:33:43 ALERT: Error reading monitor PIMM0 A interrupt status 1: no acknowledge

06/05/12 18:33:46 ALERT: Error reading monitor PIMM0 A interrupt status 1: no acknowledge

06/05/12 18:33:49 ALERT: Error reading monitor PIMM0 A interrupt status 1: no acknowledge

06/05/12 18:33:52 ALERT: Error reading monitor PIMM0 A interrupt status 1: no acknowledge

06/05/12 18:33:55 ALERT: Error reading monitor PIMM0 A interrupt status 1: no acknowledge

06/05/12 18:33:58 ALERT: Error reading monitor PIMM0 A interrupt status 1: no acknowledge

06/05/12 18:34:01 ALERT: Error reading monitor PIMM0 A interrupt status 1: no acknowledge

06/05/12 18:34:04 ALERT: Error reading monitor PIMM0 A interrupt status 1: no acknowledge

06/12/12 15:10:21 1.5V low fault limit reached @  1.199V.

06/12/12 15:10:26 1.5V low warning limit reached @  1.213V.

06/12/12 15:10:31 1.5V low warning limit reached @  1.213V.

06/12/12 15:10:36 1.5V low fault limit reached @  1.199V.

06/12/12 15:10:46 1.5V low fault limit reached @  1.199V.

06/12/12 15:10:56 1.5V low warning limit reached @  1.213V.

06/12/12 15:10:56 1.5V low fault limit reached @  1.199V.

06/12/12 15:11:06 1.5V low warning limit reached @  1.213V.

06/12/12 15:11:06 1.5V low fault limit reached @  1.199V.

06/12/12 15:11:16 power down (OS)

06/13/12 07:57:12 power down (COMMAND)

06/13/12 07:57:47 power up (COMMAND)

06/13/12 07:57:53 reset again MIPS

06/13/12 07:57:56 1.5V low warning limit reached @  1.269V.

06/13/12 08:34:14 power down (COMMAND)

06/13/12 08:37:24 L1 booting 1.32.15

06/13/12 08:37:27 USB0: waiting on open

06/13/12 08:37:48 power up (COMMAND)

06/13/12 08:37:54 reset again MIPS

06/13/12 08:37:54 1.5V low warning limit reached @  1.241V.


I've also found this horrible line in a manual...

6.12 Troubleshooting Power Problems
Refer to TIB 200627 for detailed information on troubleshooting
Origin 3000 series power problems.
• 1.5V low warning limit reached
The regulator (that is soldered to the IP35 motherboard at location G8B1) is faulty.
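For readers comparing their own logs, here is a minimal Python sketch that pulls the 1.5V entries out of an L1 log like the 001c07 one above and reports how far each reading sits below the nominal 1.5 V. The function names and the embedded sample are illustrative only; the line format is assumed from the entries in this post, and the real warning/fault thresholds are not known here. By this arithmetic the 1.199 V fault readings are roughly 20% under nominal.

```python
import re

NOMINAL_V = 1.5  # rail name as it appears in the log text

# Assumed line format, taken from the log entries in this post, e.g.:
# 06/12/12 15:10:21 1.5V low fault limit reached @  1.199V.
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"1\.5V low (warning|fault) limit reached @\s+(\d+\.\d+)V\.$"
)


def parse_voltage_events(log_text):
    """Return (timestamp, level, volts) for every 1.5V limit entry."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp, level, volts = match.groups()
            events.append((stamp, level, float(volts)))
    return events


def percent_below_nominal(volts, nominal=NOMINAL_V):
    """Shortfall relative to the nominal rail voltage, in percent."""
    return (nominal - volts) / nominal * 100.0


# A few entries copied from the 001c07 log above.
SAMPLE = """\
06/12/12 15:10:21 1.5V low fault limit reached @  1.199V.
06/12/12 15:10:26 1.5V low warning limit reached @  1.213V.
06/12/12 15:11:16 power down (OS)
06/13/12 07:57:56 1.5V low warning limit reached @  1.269V.
06/13/12 08:37:54 1.5V low warning limit reached @  1.241V.
"""

if __name__ == "__main__":
    for stamp, level, volts in parse_voltage_events(SAMPLE):
        shortfall = percent_below_nominal(volts)
        print(f"{stamp} {level:7s} {volts:.3f} V ({shortfall:.1f}% below nominal)")
```

To use it on a real capture, read the saved serial session text file and pass its contents to parse_voltage_events() instead of SAMPLE.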

lunatic
Posts: 42
Joined: Fri Jan 21, 2005 3:07 am
Location: Heidelberg, Germany

Re: Error messages

by lunatic » Thu Jun 14, 2012 5:20 am

You should be able to get much more info from the L1 controller with commands like these:

"* margin"
"* vrm"
"* env"
"* power"
Don't suffer from insanity...
Enjoy every minute of it.

recondas
Moderator
Posts: 5312
Joined: Sun Jun 06, 2004 5:55 pm
Location: NC - USA

Re: Error messages

by recondas » Thu Jun 14, 2012 6:59 am

I'd suspect there's a good chance the smp write queue errors logged by 001c04 will stop if you disconnect (and reroute the connections from) the brick with power issues (001c07).

You could try that as a test. If it works then you can decide if you want to repair, replace, or do without 001c07.
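
One way to judge whether the test worked is to compare message counts in the saved L1 "log" output from before and after disconnecting 001c07. The following is only a sketch in Python; the helper name is made up for illustration and the timestamp format is assumed from the log lines posted earlier in this thread.

```python
import re
from collections import Counter

# Assumed entry format: "MM/DD/YY HH:MM:SS <message text>"
STAMP_RE = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (.*)$")


def tally_messages(log_text):
    """Count how often each distinct message text appears in an L1 log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = STAMP_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Entries copied from the 001c04 log earlier in the thread.
SAMPLE = """\
06/13/12 01:44:59 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)
06/13/12 01:45:00 SMP_WQUE Q full (BROAD)
06/13/12 01:45:02 SMP_WQUE Q avail - lost: 0/0/5 repl: 0 (BROAD)
06/13/12 01:45:03 SMP_WQUE Q full (BROAD)
"""

if __name__ == "__main__":
    for message, count in tally_messages(SAMPLE).most_common():
        print(f"{count:4d}  {message}")
```

If the SMP_WQUE counts drop to zero in a capture taken after the reroute, that would support the idea that 001c07 is the source.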

