Nekochan Net

All times are UTC - 8 hours


Posted: Tue Apr 10, 2012 12:20 pm

Joined: Thu Oct 11, 2007 9:04 am
Posts: 19
Greetings;

I have a five-rack Onyx2 that I've had for some time now and am trying to bring back to the land of the living. I've never had it running, it's been mostly in storage, and so as I'm bringing things up I'm finding what's borked (lost a node board and a PSU so far) and what lives. Each rack comprises an upper Graphics module and a lower Compute module. For now I'm using just a single Graphics head, but CrayLink daisy-chaining all of the Computes into a string (not enough CrayLinks to do anything else).

Difficulty: I appear to have only one working MMSC, so I'm doing this all by hand.

My problem seems to be working out how to dictate the Global Master so I know who the head-node is. I seem to have fairly good luck bringing up three racks and getting it booting into IRIX, but when I add the other two racks, the Global Master tends to waffle around (/hw/module/2/n/1? No, wait, now it's 3/n/1. Make a decision!)

I had initially assumed that the last rack powered on would always become the Master, but this isn't the case, so there must be some way of telling it what's what when manually bringing things up. Any help would be gratefully received.

Thanks!

- JP


Posted: Wed Apr 11, 2012 7:20 am
Moderator

Joined: Sun Jun 06, 2004 4:55 pm
Posts: 5190
Location: NC - USA
jpaul wrote:
I have a five-rack Onyx2 that I've had for some time now that I'm trying to bring back to the land of the living....Difficulty: I appear to have only one working MMSC, so I'm doing this all by hand.
Having a working MMSC per rack would make setting up a five-rack configuration a more user-friendly proposition. If you have non-working MMSCs in the other four racks, you may be able to revive them. The power supply in the MMSC is not known for longevity. There have been a number of topics on nekochan that discuss replacing the OEM power supply with an ATX-style PS (Pontus has even supplied a few photos of the conversion). If you'd rather not use an external supply, some MMSCs used an Artesyn NFN40-7608 PS; if you'd like additional details or specifications, the data sheet for that PS is available as a PDF. This nekochan post mentions the Mean Well PT-65B as a drop-in replacement for the MMSC power supply. The PT-65B seems to be readily available in the $20-$30 range (depending on the source).

Quote:
My problem seems to be working out how to dictate the Global Master so I know who the head-node is. I seem to have fairly good luck bringing up three racks and getting it booting into IRIX, but when I add the other two racks, the Global Master tends to waffle around (/hw/module/2/n/1? No, wait, now it's 3/n/1. Make a decision!)

I had initially assumed that the last rack powered on would always become the Master, but this isn't the case, so there must be some way of telling it what's what when manually bringing things up. Any help would be gratefully received.
I haven't ever tried a multi-rack O2k/Onyx2 without using a linked MMSC per rack, but it's possible that the module numbers assigned in each individual PROM conflict, and that the master ends up being whichever module completes power-on diagnostics first. If that's the case (and you haven't already done so), you might take a look at the PROM commands "modnum" and "mvmodule". Depending on your PROM revision, you should be able to bring up brief usage synopses at the PROM command line by querying "help modnum" or "help mvmodule" (you may find assigning module numbers easier if you connect directly to the MSC of each unlinked compute module separately).

O2K/Onyx2 hardware stores an inventory found during the last successful power-on diagnostics session. If you change the location or configuration of what the inventory expects, the power-on diagnostics routine doesn't always play nice and may disable relocated, unexpected or unconfigured hardware as a fail-safe. If that happens, or you'd like to make sure any changes you've made have been assimilated, there's some background on the process of clearing stale entries from the power-on diagnostic (POD) logs in this thread. And for what it's worth, if I were faced with the same situation I'd probably work on getting each individual rack up and running on its own first, to minimize headaches when you CrayLink all five.

Good luck with the system!

_________________
***********************************************************************
Welcome to ARMLand - 0/0x0d00
running...(sherwood-root 0607201829)
* InfiniteReality/Reality Software, IRIX 6.5 Release *
***********************************************************************


Posted: Thu Apr 12, 2012 10:02 am

Joined: Wed Jul 19, 2006 7:37 am
Posts: 5749
Location: Renton, WA
Make sure you're following a correct topology when NUMAlinking, because the machines do expect a hypercube and will not work right if the connections are in unsupported topologies.
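To illustrate why the wiring pattern matters, here is a minimal, generic graph sketch in Python. It is not an SGI cabling guide: the node counts (8 routers, a 3-dimensional hypercube) and the function names are assumptions chosen for illustration only. It compares the worst-case number of hops between any two routers in a hypercube against a plain daisy chain.

```python
from collections import deque


def hypercube_links(dim):
    """Links of a dim-dimensional hypercube with 2**dim nodes.

    Two nodes are linked when their numbers differ in exactly one bit.
    """
    links = set()
    for node in range(2 ** dim):
        for bit in range(dim):
            neighbor = node ^ (1 << bit)
            links.add((min(node, neighbor), max(node, neighbor)))
    return links


def chain_links(count):
    """Links of a daisy chain: 0-1, 1-2, ..., (count-2)-(count-1)."""
    return {(i, i + 1) for i in range(count - 1)}


def max_hops(node_count, links):
    """Worst-case hop count between any two nodes (graph diameter)."""
    adjacency = {n: [] for n in range(node_count)}
    for a, b in links:
        adjacency[a].append(b)
        adjacency[b].append(a)

    worst = 0
    for start in range(node_count):
        # Breadth-first search from each node in turn.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in adjacency[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        if len(dist) != node_count:
            raise ValueError("topology is not fully connected")
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    print("hypercube, 8 nodes:", max_hops(8, hypercube_links(3)))
    print("daisy chain, 8 nodes:", max_hops(8, chain_links(8)))
```

With eight nodes, the hypercube needs at most 3 hops between any pair while the chain needs up to 7, which is the general reason a chained layout can work yet carry more latency between the far ends.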

_________________
Damn the torpedoes, full speed ahead!

There are those who say I'm a bit of a curmudgeon. To them I reply: "GET OFF MY LAWN!"

:Indigo: :Octane: :Indigo2: :Indigo2IMP: :Indy: :PI: :O3x0: :ChallengeL: :O2000R: (single-CM)


Posted: Thu Apr 12, 2012 12:00 pm

Joined: Thu Oct 11, 2007 9:04 am
Posts: 19
SAQ wrote:
Make sure you're following a correct topology when NUMAlinking, because the machines do expect a hypercube and will not work right if the connections are in unsupported topologies.


I'm not sure if this is correct, or otherwise.
I have no good reason to doubt your wisdom - except that when I've had the unit running with three racks side-by-side, I was daisy-chaining them together. Lowest ports on r1 router to lowest on r2, middle on r2 to middle on r3... and it booted up and displayed all 24 processors in PROM, and again in IRIX.

I have the impression it'll work just fine... but it will be noticeably faster if appropriately configured.

My serial term died on me last night, so I haven't had a chance to get modnum, but that is my next line of attack.

- JP


Posted: Fri Apr 13, 2012 9:20 am

Joined: Sat Mar 29, 2008 9:26 am
Posts: 89
Location: WI
I had a similar issue with a two-rack Origin 2000 system where one of the modules in the second rack kept trying to come up as the global master, even though module 1 in rack 1 was already the master (which it should be). It turned out to be a problem with one module in the second rack. The thread recondas referenced (http://forums.nekochan.net/viewtopic.php?f=3&t=17194&p=134279&#p134279) is what got that system back up and running.

You mentioned 5 racks, but what is the module configuration?

_________________
:A3504L: :320: :1600SW: :Indy: :O2: :O2: :O2+: :Octane2: :O2000R: :Indigo2IMP: :Indigo2IMP:

