Altix 450 stability problem


by Twilight » Sat Nov 19, 2011 1:08 am

We have an entry-level Altix 450 with 3 blades, 1 CBrick each. The server is divided into 3 partitions (each blade is a separate partition).
Each partition has RHEL 5.7 installed with Oracle RAC + Oracle Enterprise Database 10.2. The RAC cluster works over the NUMAlink interface XP0. ProPack is NOT installed, because SGI does not support RHEL now. The PROM version is 1.39, L1/L2 is 1.54.0.

The third node of this machine had an L1 controller failure. After the controller was replaced, I started testing the machine's stability, and I found a problem: if any node is powered off with the "pwr d" command without first shutting down the operating system (a simulated power failure), the two remaining nodes can hang, reboot (by oprocd, hangcheck-timer or the MCA handler), drop to POD, or remain stable. When one of the nodes is starting up, the others freeze for 5-8 seconds. At that moment the starting node has reached this point: "Switching to RAM ..........................DONE Discovering NUMAlink connectivity .........DONE".
I have not found any records of acceptance tests, so this problem may not be connected with the L1 replacement. I have only worked with this machine since the L1 replacement, which was two months ago. We have active support from SGI, but they still have no answer as to what happened or what to do.
I requested the newest PROM and L1/L2 software from SGI, but I'm not sure this will help.
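
One way to put numbers on the 5-8 second freeze would be a small stall detector running on each surviving node while another node powers up. The Python sketch below is only an illustration, not a tool from SGI or Oracle: the names `sample` and `find_stalls`, the 0.1 s sampling interval and the 1 s threshold are all assumptions. It also assumes Python 3 (for `time.monotonic`, which the stock RHEL 5 interpreter does not have) and that the clock keeps advancing while the node is frozen.

```python
import time


def sample(duration, interval=0.1, clock=time.monotonic, sleep=time.sleep):
    """Record a timestamp every `interval` seconds for about `duration` seconds.

    `clock` and `sleep` are injectable so the loop can be exercised without
    real waiting; by default the real monotonic clock and sleep are used.
    """
    stamps = [clock()]
    end = stamps[0] + duration
    while stamps[-1] < end:
        sleep(interval)
        stamps.append(clock())
    return stamps


def find_stalls(timestamps, interval, threshold):
    """Return (start, lost) pairs for gaps longer than interval + threshold.

    `start` is the last timestamp seen before the gap and `lost` is the time
    missing beyond the expected sampling interval.
    """
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > interval + threshold:
            stalls.append((prev, gap - interval))
    return stalls
```

For example, `find_stalls(sample(600), 0.1, 1.0)` would watch a node for ten minutes and list every pause longer than about a second. Running it on both surviving partitions during a power-up of the third one would show whether the freezes line up with the "Discovering NUMAlink connectivity" step.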

Right now it all looks like a NUMAlink hang, because the problem happens only when the nodes are connected with NUMAlink cables.
Is this behaviour a malfunction? I have never worked with SGI machines before. What can I do about this problem?

P.S. Sorry for my dreadful English
