Tezro crashes

diegel
Posts: 285
Joined: Tue Nov 17, 2009 2:08 am
Location: Hamburg, Germany

Post by diegel » Thu Mar 22, 2012 11:50 am

My Tezro crashes sometimes under network traffic:

Code:

Dumpheader version 7, processor type IP35, running in M-mode

=======================
ICRASH CORE FILE REPORT
=======================

SYSTEM:
    system name:    IRIX64
    release:        6.5 (6.5.30m)
    node name:      sorcerer
    version:        07202013
    machine name:   IP35

GENERATED ON:
    Wed Feb 15 09:43:53 2012

TIME OF CRASH:
    1329295171 Wed Feb 15 09:39:31 2012

PANIC STRING:
    PANIC: CPU 0: in_cksum, ran out of data, 8 bytes left

NAMELIST:
    /unix [CREATE TIME: Thu Oct  6 18:28:43 2011]

COREFILE:
    /dev/swap [CREATE TIME: Wed Feb 15 09:42:45 2012]

================
COREFILE SUMMARY
================

    The system was brought down due to an internal panic.

===========
PUTBUF DUMP
===========
        <6>IRIX Release 6.5 IP35 Version 07202013 System V - 64 Bit
    Copyright 1987-2006 Silicon Graphics, Inc.
    All Rights Reserved.
   
    <5>NOTICE: Initialising Guaranteed Rate I/O v2 (Jul 20 2006 18:47:01)
    <5>NOTICE: /hw/module/001c01/IXbrick/xtalk/15/pci-x/1/2/scsi_ctlr/0:  1068 SAS/SATA firmware version 1.3.0.0
    <6>Selecting IO9 baseio
    <5>NOTICE: Start mounting filesystem: /
    <5>NOTICE: Starting XFS recovery on filesystem: / (dev: 0/210)
    <5>NOTICE: Ending XFS recovery for filesystem: / (/hw/module/001c01/IXbrick/xtalk/15/pci-x/0/3/scsi_ctlr/0/target/1/lun/0/disk/partition/0/block)
    <5>NOTICE: Starting failsoftd
    <5>NOTICE: XVM mirrors disabled
    <5>NOTICE: XVM snapshot disabled
    <5>NOTICE: Start mounting filesystem: /mnt
    <5>NOTICE: Ending clean XFS mount for filesystem: /mnt
    <5>NOTICE: tg0: Link up: 1000 Mbps, FULL duplex, flow_ctrl is OFF
    <5>NOTICE: pcmouse: type=3
    <5>NOTICE: pcmouse: type=3
    <5>NOTICE: pcmouse: type=3
    <4>WARNING: core: firefox-bin: PID 1084, failed to write a  text area (core file deleted)
    <5>NOTICE: pcmouse: type=3
    <5>NOTICE: pcmouse: type=3
    <5>NOTICE: pcmouse: type=3
    <4>WARNING: core: firefox-bin: PID 1628, failed to write a  text area (core file deleted)
   
    <0>PANIC: CPU 0: in_cksum, ran out of data, 8 bytes left
    <6>
    Dumping to /hw/module/001c01/IXbrick/xtalk/15/pci-x/0/3/scsi_ctlr/0/target/1/lun/0/disk/partition/1/block at block 0, space: 0x40000 pages
    <6>Waiting 5 seconds for I/O processor.
    <6>CPU 1 is the I/O processor.
    <6>Dumping low memory...<6>
    <6>Dumping static kernel pages...<6>.<6>.<6>.<6>.<6>.

===========
CPU SUMMARY
===========

  CPU 0 was in kernel mode running an xthread named 'netproc0'
  CPU 1 was in kernel mode running an sthread named 'dump2'

STACK TRACE:

===============================================================================
STACK TRACE FOR XTHREAD 0xa800000001b43800 (netproc0):

 1 dumpsys[../os/vmdump.c: 531, 0xc0000000002f32dc]
 2 syncreboot[../os/printf.c: 1686, 0xc0000000002db51c]
 3 icmn_err_tag[../os/printf.c: 597, 0xc0000000002d9e78]
 4 panic[../os/printf.c: 799, 0xc0000000002da2f0]
 5 in_cksum[../bsd/misc/in_cksum.c: 52, 0xc0000000001f5b20]
 6 rip6_input[../bsd/netinet/raw_ip6.c: 285, 0xc00000000037f1e8]
 7 icmp6_input[../bsd/netinet/ip6_icmp.c: 1095, 0xc00000000037a2ac]
 8 ip6_input[../bsd/netinet/ip6_input.c: 603, 0xc000000000377b8c]
 9 netproc[../bsd/net/netisr.c: 237, 0xc0000000001ed0fc]
10 xthread_prologue[../os/swtch.c: 1647, 0xc000000000308650]
11 xtresume[../os/swtch.c: 1695, 0xc00000000030871c]
===============================================================================

=======================
CRASH SUMMARY FOR CPU 0
=======================

 1 dumpsys[../os/vmdump.c: 531, 0xc0000000002f32dc]
 2 syncreboot[../os/printf.c: 1686, 0xc0000000002db51c]
 3 icmn_err_tag[../os/printf.c: 597, 0xc0000000002d9e78]
 4 panic[../os/printf.c: 799, 0xc0000000002da2f0]
 5 in_cksum[../bsd/misc/in_cksum.c: 52, 0xc0000000001f5b20]
 6 rip6_input[../bsd/netinet/raw_ip6.c: 285, 0xc00000000037f1e8]
 7 icmp6_input[../bsd/netinet/ip6_icmp.c: 1095, 0xc00000000037a2ac]
 8 ip6_input[../bsd/netinet/ip6_input.c: 603, 0xc000000000377b8c]
 9 netproc[../bsd/net/netisr.c: 237, 0xc0000000001ed0fc]
10 xthread_prologue[../os/swtch.c: 1647, 0xc000000000308650]
11 xtresume[../os/swtch.c: 1695, 0xc00000000030871c]

=======================
CRASH SUMMARY FOR CPU 1
=======================

 The sthread 'dump2' was running.
 1 panicspin[../os/printf.c: 1618, 0xc0000000002db40c]
 2 doacvec[../os/pda.c: 2016, 0xc0000000002c0d3c]
 3 cpuintr[../ml/SN/intr.c: 1622, 0xc000000000060938]
 4 intpend0[../ml/SN/intr.c: 1302, 0xc0000000000603c0]
 5 intr[../ml/SN/intr.c: 1466, 0xc0000000000606b4]
 6 VEC_int[../ml/LOCORE/vec_int.s: 84, 0xc000000000040198]


At first I thought it was a hardware problem, but now I am sure this is a software bug. I can reproduce this problem with a second Tezro. It only happens when I am connected with a GigE link. It happens with the IO9 board or any other GigE card with 3Com hardware. Any ideas?
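For readers unfamiliar with the panic string: frames 5 to 8 of the stack trace show an IPv6 packet passing through ip6_input, icmp6_input and rip6_input before in_cksum panics. The message suggests the checksum routine was asked to cover more bytes than the buffer chain actually held. Below is a minimal Python sketch of that failure mode. It is not IRIX source; the function name, the exception class and the list-of-bytes stand-in for an mbuf chain are all assumptions made for illustration.

```python
# Illustrative only: NOT the IRIX implementation. It models a BSD-style
# Internet checksum that walks a chain of buffers and is told how many
# bytes to cover.


class OutOfData(Exception):
    """Raised when the requested length exceeds the data in the chain."""


def in_cksum_sketch(chain, length):
    """Return the 16-bit Internet checksum of the first `length` bytes.

    `chain` is a list of bytes objects standing in for an mbuf chain.
    """
    total = 0
    remaining = length
    pending = None  # high byte of a 16-bit word still waiting for its low byte

    for buf in chain:
        if remaining == 0:
            break
        data = buf[:remaining]
        remaining -= len(data)
        for byte in data:
            if pending is None:
                pending = byte
            else:
                total += (pending << 8) | byte
                pending = None

    if remaining:
        # The real kernel panics here; the sketch raises instead.
        raise OutOfData(f"in_cksum, ran out of data, {remaining} bytes left")

    if pending is not None:
        total += pending << 8  # pad a trailing odd byte with zero

    # Fold the carries back into 16 bits (one's-complement addition).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

Asking for 28 bytes from a chain that only holds 20 raises the same "8 bytes left" message. Whether the bad length in the real crash comes from the packet itself, the driver, or corrupted memory cannot be told from this sketch.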
:Tezro: :Fuel: :Octane2: :Octane: :Onyx2: :O2+: :O2: :Indy: :Indigo: :Cube:

recondas
Moderator
Posts: 5286
Joined: Sun Jun 06, 2004 5:55 pm
Location: NC - USA

Re: Tezro crashes

Post by recondas » Thu Mar 22, 2012 2:01 pm

diegel wrote: My Tezro crashes sometimes under network traffic... At first I thought it was a hardware problem, but now I am sure this is a software bug. I can reproduce this problem with a second Tezro.

It only happens when I am connected with a GigE link. It happens with the IO9 board or any other GigE card with 3Com hardware.


There may be other differences involved, but I have a number of IP35/IP53 systems, including a Tezro and several O350s, that connect to a gigabit network. Most use the native Broadcom tg/Tigon3 interface on the IO9 (including one dual-module system that at one time had both IO9s connected to a gigabit network), and at least one uses an SGI-branded eg-based PCI GigE board as the active gigabit interface (rather than the port on the IO9). Here's the Gigabit section of an hinv for one with both tg and eg interfaces (the eg board is currently active):

Code:

Integral Gigabit Ethernet: tg0, module 001c01, PCI bus 1 slot 4
Gigabit Ethernet: tg1, module 001c02, PCI bus 1 slot 4
Gigabit Ethernet: eg0, module 001c02, slot -1, firmware version 12.4.10

All of the IP35 systems are running IRIX 6.5.30. So far, knock on wood, I haven't seen the crash you've experienced.
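If several people want to compare which gigabit drivers (tg versus eg) their machines use, the listing above can be reduced to a short table. This is only a sketch in Python: the regular expression is fitted to the three lines quoted in this post and is an assumption about the general hinv format, and the function name is made up.

```python
import re

# Pattern fitted to the three hinv lines quoted above; other IRIX releases
# or other boards may print something different.
HINV_GIGE = re.compile(
    r"^(?:Integral )?Gigabit Ethernet: "
    r"(?P<driver>[a-z]+)(?P<unit>\d+), module (?P<module>\w+),"
)


def gige_interfaces(hinv_text):
    """Return (driver, unit, module) tuples for each gigabit line found."""
    found = []
    for line in hinv_text.splitlines():
        m = HINV_GIGE.match(line.strip())
        if m:
            found.append((m["driver"], int(m["unit"]), m["module"]))
    return found


SAMPLE = """\
Integral Gigabit Ethernet: tg0, module 001c01, PCI bus 1 slot 4
Gigabit Ethernet: tg1, module 001c02, PCI bus 1 slot 4
Gigabit Ethernet: eg0, module 001c02, slot -1, firmware version 12.4.10
"""

for driver, unit, module in gige_interfaces(SAMPLE):
    print(f"{driver}{unit} in module {module}")
```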
***********************************************************************
Welcome to ARMLand - 0/0x0d00
running...(sherwood-root 0607201829)
* InfiniteReality/Reality Software, IRIX 6.5 Release *
***********************************************************************

jan-jaap
Posts: 4057
Joined: Thu Jun 17, 2004 11:35 am
Location: Wijchen, The Netherlands

Re: Tezro crashes

Post by jan-jaap » Fri Mar 23, 2012 2:50 am

diegel wrote:My Tezro crashes sometimes under network traffic

Is that network traffic NFS by any chance?
Now this is a deep dark secret, so everybody keep it quiet :)
It turns out that when reset, the WD33C93 defaults to a SCSI ID of 0, and it was simpler to leave it that way... -- Dave Olson, in comp.sys.sgi

Currently in commercial service: :Onyx2:(2x) :O3x02L:
In the museum: almost every MIPS/IRIX system.
Wanted: GM1 board for Professional Series GT graphics (030-0076-003, 030-0076-004)

diegel
Posts: 285
Joined: Tue Nov 17, 2009 2:08 am
Location: Hamburg, Germany

Re: Tezro crashes

Post by diegel » Fri Mar 23, 2012 6:37 am

jan-jaap wrote:
diegel wrote:My Tezro crashes sometimes under network traffic

Is that network traffic NFS by any chance?

If you want to reproduce a crash like this, start lots of traceroutes with mtr and let them run overnight. Telnetting to devices with a TCP window size of 65536 bytes also makes a nice crash. You can configure such a TCP window size on Cisco routers, for example.
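A rough way to script the "lots of traceroutes" part, for anyone who wants to try it, is sketched below in Python. It assumes mtr and Python 3 are both available on the machine generating the traffic (not a given on IRIX), the target addresses are placeholders from the documentation range, and the helper names are invented. The -6 switch is included only because the stack trace goes through the IPv6 code path; whether IPv6 is actually needed to trigger the panic is a guess.

```python
import shutil
import subprocess


def build_mtr_command(host, cycles=1000, ipv6=False):
    """Build an mtr command line in report mode without DNS lookups."""
    cmd = ["mtr", "--report", "--report-cycles", str(cycles), "--no-dns"]
    if ipv6:
        cmd.append("-6")
    cmd.append(host)
    return cmd


def launch_all(hosts, cycles=1000, ipv6=False):
    """Start one mtr process per host and return the Popen handles."""
    if shutil.which("mtr") is None:
        raise RuntimeError("mtr not found on PATH")
    return [
        subprocess.Popen(
            build_mtr_command(host, cycles, ipv6),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        for host in hosts
    ]


if __name__ == "__main__":
    # Placeholder targets: replace with real hosts on your own network.
    targets = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
    procs = launch_all(targets, cycles=100000)
    for proc in procs:
        proc.wait()
```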

diegel
Posts: 285
Joined: Tue Nov 17, 2009 2:08 am
Location: Hamburg, Germany

Re: Tezro crashes

Post by diegel » Tue Apr 03, 2012 1:56 pm

After some more investigation, I found out the crash is a memory problem. Don't mix 030-1060-003 and 030-1060-004 memory.

smj
Posts: 1443
Joined: Mon Nov 12, 2007 7:54 pm
Location: Berkeley, CA, USA, NA, Earth, Sol

Re: Tezro crashes

Post by smj » Tue Apr 03, 2012 2:20 pm

diegel wrote: After some more investigation, I found out the crash is a memory problem. Don't mix 030-1060-003 and 030-1060-004 memory.

It's really an -003/-004 issue, and not a bad/intermittent DIMM issue? You've reproduced, and it only occurred with mixed -00x versions? I'm using a number of these in a Fuel and O300 and assumed there was no issue with the revisions.

Can somebody else with appropriate versions double-check? I'll put this on the list for when I can see the tops of my O300s again and inventory/move the memory parts - I want to know if this is an IP35-family issue or Tezro-specific...
Then? :IRIS3130: ... Now? :O3x02L: :A3504L:- :A3502L: :1600SW:+MLA :Fuel: :Octane2: :Octane: :Indigo2IMP: ... Other: DEC :BA213: :BA123: Sun, DG AViiON, NeXT :Cube:

diegel
Posts: 285
Joined: Tue Nov 17, 2009 2:08 am
Location: Hamburg, Germany

Re: Tezro crashes

Post by diegel » Wed Apr 04, 2012 2:31 am

smj wrote:
diegel wrote: After some more investigation, I found out the crash is a memory problem. Don't mix 030-1060-003 and 030-1060-004 memory.

It's really an -003/-004 issue, and not a bad/intermittent DIMM issue? You've reproduced, and it only occurred with mixed -00x versions? I'm using a number of these in a Fuel and O300 and assumed there was no issue with the revisions.

Can somebody else with appropriate versions double-check? I'll put this on the list for when I can see the tops of my O300s again and inventory/move the memory parts - I want to know if this is an IP35-family issue or Tezro-specific...

I think it is an IP35 family issue. I have seen this problem in my Tezro and my Fuel. Currently I have 4GB -004 memory in my Fuel and don't see any problem. I have 6GB -003 in my Tezro and that works fine also.

smj
Posts: 1443
Joined: Mon Nov 12, 2007 7:54 pm
Location: Berkeley, CA, USA, NA, Earth, Sol

Re: Tezro crashes

Post by smj » Wed Apr 04, 2012 2:55 am

Thanks for the additional info. I'll have to take a look and note which revisions I've got in which machines, which banks, et cetera...
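For anyone doing the same inventory, a small Python helper like the one below could flag machines that hold both DIMM revisions. It is only a sketch: the machine names and their contents are a made-up example, not anyone's real configuration, and the part numbers have to be read off the DIMM labels by hand. Whether mixing the revisions actually causes the crash is still an open question in this thread.

```python
def mixed_revisions(inventory):
    """Return {machine: [revisions]} for machines holding more than one
    030-1060-xxx revision.

    `inventory` maps a machine name to a list of DIMM part numbers.
    """
    mixed = {}
    for machine, parts in inventory.items():
        revisions = sorted(
            {p.rsplit("-", 1)[1] for p in parts if p.startswith("030-1060-")}
        )
        if len(revisions) > 1:
            mixed[machine] = revisions
    return mixed


# Hypothetical inventory, for illustration only.
example = {
    "machine-a": ["030-1060-003", "030-1060-003"],
    "machine-b": ["030-1060-003", "030-1060-004"],
}
print(mixed_revisions(example))
```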

hamei
Posts: 10000
Joined: Tue Feb 24, 2004 4:10 pm
Location: over the rainbow

Re: Tezro crashes

Post by hamei » Wed Apr 04, 2012 11:55 pm

diegel wrote:I think it is an IP35 family issue. I have seen this problem in my Tezro and my Fuel. Currently I have 4GB -004 memory in my Fuel and don't see any problem. I have 6GB -003 in my Tezro and that works fine also.

Strange. I have had mixed memory types in both the Fuel and the Origin 350 and haven't noticed any unusual crashes. I only use it as a desktop tho, so crash causes are harder to pin down to a single source.

jan-jaap
Posts: 4057
Joined: Thu Jun 17, 2004 11:35 am
Location: Wijchen, The Netherlands

Re: Tezro crashes

Post by jan-jaap » Thu Apr 05, 2012 12:41 am

I suspect a faulty DIMM. Diagnostics for Fuel are (were?) freely available; if it crashes there too, you've got a good test case.