Mirroring Techpubs Docs

DraconianTimes
Posts: 205
Joined: Fri Mar 05, 2004 5:39 am
Location: Leafy Surrey, UK

Posted by DraconianTimes » Tue May 09, 2006 12:55 pm

I asked a question about a year or so back about making my own mirror of techpubs but was unsuccessful. In light of recent SGI developments, I'd like to create my own snapshots of all the current SGI & IRIX docs should they become unavailable in the future. I've got original IRIX CD media with various docs, but I'd really like to grab the newest document versions direct from SGI themselves and then burn them onto my own DVDs.

In my previous attempts I had tried using wget in various ways to grab the info, but this ended up giving me all sorts of issues with their CGI setup. Has anyone here got a sure-fire way of grabbing the docs (preferably PDFs) off of the SGI Techpubs site (in an automated fashion)?

Many thanks in advance!
Nick

zafunk
Posts: 1060
Joined: Fri Feb 13, 2004 12:51 am
Location: Victoria, BC, Canada

Posted by zafunk » Tue May 09, 2006 1:18 pm

Have you tried Zoontf's technique?

viewtopic.php?t=4241&highlight=wget+techpubs

It worked for me (before SGI revamped the site). I seem to remember having to remove a couple of commands to get it to run. YMMV.
Last edited by josehill on Sun Apr 05, 2009 4:59 pm, edited 1 time in total.
Reason: fixed link

DraconianTimes
Posts: 205
Joined: Fri Mar 05, 2004 5:39 am
Location: Leafy Surrey, UK

Posted by DraconianTimes » Tue May 09, 2006 2:32 pm

I've tried the suggestions in the link but am having trouble getting it working.

Code:

$ cat grabdocs2.sh

#!/bin/sh
wget -r --accept="*.pdf,download.cgi*" \
--reject="browse.cgi,summary.cgi,init.cgi,help.cgi,feedback.cgi,shownew.cgi,listdocs.cgi" \
--domains=techpubs.sgi.com -nd -i techpubs.txt 2> log.txt &     

$ cat techpubs.txt                                                                                             
http://techpubs.sgi.com/library/tpl/cgi-bin/browse.cgi?db=bks&coll=hdwr&pth=ALL
http://techpubs.sgi.com/library/tpl/cgi-bin/browse.cgi?db=bks&coll=0650&pth=ALL           


It runs and dumps the html for each download's link page e.g.

Code:

...
-rw-r--r--  1 nick  nick  13641 May  9 22:16 download.cgi?coll=hdwr&db=bks&docnumber=860-0218-002
-rw-r--r--  1 nick  nick  13627 May  9 22:16 download.cgi?coll=hdwr&db=bks&docnumber=860-0219-002
-rw-r--r--  1 nick  nick  13549 May  9 22:25 download.cgi?coll=hdwr&db=bks&docnumber=860-0220-001
-rw-r--r--  1 nick  nick  13557 May  9 22:25 download.cgi?coll=hdwr&db=bks&docnumber=860-0221-001
-rw-r--r--  1 nick  nick  13505 May  9 22:26 download.cgi?coll=hdwr&db=bks&docnumber=860-0222-001
-rw-r--r--  1 nick  nick  13576 May  9 22:26 download.cgi?coll=hdwr&db=bks&docnumber=860-0223-001 
...


Any ideas where I might be going wrong?

Thanks!
Nick

jan-jaap
Posts: 3979
Joined: Thu Jun 17, 2004 11:35 am
Location: Wijchen, The Netherlands

Posted by jan-jaap » Wed May 10, 2006 1:52 am

I'm sure there must have been a nice spike in the network traffic at sgi.com yesterday; I refreshed my mirror as well :wink:

This is probably not the best way, but here's how I did it. Various Linuxisms (debian 3.1) may be hidden in here.

Code:

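#!/bin/bash

#set -x

# Freeware (fw) doesn't have books in its collection
COLLECTIONS="0530 0620 0630 0640 0650 hdwr linux nt"

# -m mirror, -nv terse output, -T60 60-second timeout, -t0 retry indefinitely,
# -nH no host directory, --cut-dirs=2 drop the leading "library/manuals"
WGETOPT="-m -nv -T60 -t0 -nH --cut-dirs=2"

for coll in $COLLECTIONS; do
    # -p so that a re-run does not fail on an existing directory
    mkdir -p "manuals_$coll"
    # Start the per-collection download script
    echo "#!/bin/sh" > "wget_$coll.sh"
    chmod 755 "wget_$coll.sh"
    echo "cd  manuals_$coll" >> "wget_$coll.sh"
    # Dump the collection's book list, including its link URLs
    lynx -dump -width=999 "http://techpubs.sgi.com/library/tpl/cgi-bin/browse.cgi?db=bks&coll=$coll&pth=ALL" > "dump_$coll.txt"
    # Get part numbers (the value after the third '=' in each download.cgi link)
    grep "download.cgi" < "dump_$coll.txt" | cut -d '=' -f4 | sort > "manuals_$coll.txt"
    MANUALS=$(cat "manuals_$coll.txt")
    for book in $MANUALS; do
        # The 5th character of the part number selects the "<digit>000" directory
        major=$(echo "$book" | cut -c5)
        echo "wget $WGETOPT http://techpubs.sgi.com/library/manuals/${major}000/$book/pdf/$book.pdf" >> "wget_$coll.sh"
        echo "wget $WGETOPT http://techpubs.sgi.com/library/manuals/${major}000/$book/dl/$book.html.tgz" >> "wget_$coll.sh"
    done
done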


This creates dirs 'manuals_0530' etc. and scripts 'wget_0530.sh' etc.
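
The part-number extraction step can be tried offline before hitting the site. What follows is only a minimal sketch: `extract_part_numbers` and `sample_dump` are hypothetical names, the sample lines are an assumption about what the lynx dump looks like (they reuse the download.cgi link format and part numbers seen earlier in this thread), and `sort -u` is swapped in for plain `sort` to drop repeated links.

```shell
#!/bin/sh
# Hypothetical helper: read a lynx dump on stdin, print unique part numbers.
# Assumes each download link has the form
#   download.cgi?coll=<coll>&db=bks&docnumber=<part number>
# so the part number is the 4th '='-separated field of the line.
extract_part_numbers() {
    grep "download.cgi" | cut -d '=' -f4 | sort -u
}

# Offline sample standing in for real lynx output (the layout is an assumption).
sample_dump() {
    cat <<'EOF'
  10. http://techpubs.sgi.com/library/tpl/cgi-bin/browse.cgi?db=bks&coll=hdwr&pth=ALL
  11. http://techpubs.sgi.com/library/tpl/cgi-bin/download.cgi?coll=hdwr&db=bks&docnumber=860-0219-002
  12. http://techpubs.sgi.com/library/tpl/cgi-bin/download.cgi?coll=hdwr&db=bks&docnumber=860-0218-002
  13. http://techpubs.sgi.com/library/tpl/cgi-bin/download.cgi?coll=hdwr&db=bks&docnumber=860-0218-002
EOF
}

# Prints the two distinct part numbers, one per line.
sample_dump | extract_part_numbers
```

If the real dump puts the query parameters in a different order, the field number passed to `cut` would need adjusting.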

Scripts look like this:

Code:

#!/bin/sh
cd  manuals_0530
wget -m -nv -T60 -t0 -nH --cut-dirs=2 http://techpubs.sgi.com/library/manuals/0000/007-0603-100/pdf/007-0603-100.pdf
wget -m -nv -T60 -t0 -nH --cut-dirs=2 http://techpubs.sgi.com/library/manuals/0000/007-0603-100/dl/007-0603-100.html.tgz
...
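
The way a part number turns into the two URLs above can be sketched as a small function. `manual_urls` is a hypothetical name; the URL layout is taken only from the generated lines shown in this thread, so treat it as a description of how the site looked at the time rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical helper: print the PDF and HTML-tarball URLs for one part number,
# following the layout used by the generated wget_*.sh scripts.
manual_urls() {
    book=$1
    # The 5th character of the part number selects the "<digit>000" directory.
    major=$(echo "$book" | cut -c5)
    base="http://techpubs.sgi.com/library/manuals/${major}000/${book}"
    echo "${base}/pdf/${book}.pdf"
    echo "${base}/dl/${book}.html.tgz"
}

# Reproduces the two URLs from the wget_0530.sh excerpt.
manual_urls 007-0603-100
```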

After inspecting the wget_*.sh scripts, you run them.

Expect these download volumes (kB)
184928 manuals_0530
247840 manuals_0620
252076 manuals_0630
284396 manuals_0640
562524 manuals_0650
805100 manuals_hdwr
210080 manuals_linux
68356 manuals_nt

These figures cover both the online (html.tgz) and PDF versions.
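
For planning disk or DVD space, the per-collection figures can be added up. A minimal sketch, assuming the list is in the "size name" form that `du -sk manuals_*` prints (how the figures were actually produced is not stated); `total_kb` and `quoted_sizes` are hypothetical names.

```shell
#!/bin/sh
# Sum the first column (kB) of "size name" lines read from stdin.
total_kb() {
    awk '{ sum += $1 } END { printf "%d\n", sum }'
}

# The per-collection figures quoted above, copied verbatim.
quoted_sizes() {
    cat <<'EOF'
184928 manuals_0530
247840 manuals_0620
252076 manuals_0630
284396 manuals_0640
562524 manuals_0650
805100 manuals_hdwr
210080 manuals_linux
68356 manuals_nt
EOF
}

# Prints 2615300 (kB), i.e. roughly 2.5 GB in total.
quoted_sizes | total_kb
```

On a finished mirror, `du -sk manuals_* | total_kb` would give the same kind of total for whatever was actually downloaded.
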
Now this is a deep dark secret, so everybody keep it quiet :)
It turns out that when reset, the WD33C93 defaults to a SCSI ID of 0, and it was simpler to leave it that way... -- Dave Olson, in comp.sys.sgi

Currently in commercial service: Image :Onyx2:(2x) :O3x02L:
In the museum: almost every MIPS/IRIX system.
Wanted: GM1 board for Professional Series GT graphics (030-0076-003, 030-0076-004)

yarrumevets
Posts: 116
Joined: Sat Sep 25, 2004 3:50 am
Location: Sydney, Australia

Posted by yarrumevets » Wed May 10, 2006 6:13 am

2 Jan-Jaap: working perfectly for me on my Linux box (Fedora Core 2) - well done champ, thanks very much indeed!
Steve

nekonoko
Site Admin
Posts: 7999
Joined: Thu Jan 23, 2003 2:31 am
Location: Pleasanton, California

Posted by nekonoko » Wed May 10, 2006 1:25 pm

jan-jaap wrote:Various Linuxisms (debian 3.1) may be hidden in here.


Works on IRIX too provided you change the first line to point to your local bash binary (and you have wget/lynx someplace). Been pulling down the docs for several hours now, thanks!
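
Before editing the first line, it can help to confirm where the required tools live. A minimal sketch using the POSIX `command -v` builtin; `missing_tools` is a hypothetical helper name, and the list of tools is simply the three the mirroring script relies on.

```shell
#!/bin/sh
# Hypothetical pre-flight check: print each named tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The mirroring script needs bash, wget and lynx.
missing=$(missing_tools bash wget lynx)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "bash, wget and lynx are all available"
fi
```

Running `command -v bash` on its own prints the path to put on the script's first line.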
Twitter: @neko_no_ko
IRIX Release 4.0.5 IP12 Version 06151813 System V
Copyright 1987-1992 Silicon Graphics, Inc.
All Rights Reserved.

DraconianTimes
Posts: 205
Joined: Fri Mar 05, 2004 5:39 am
Location: Leafy Surrey, UK

Posted by DraconianTimes » Wed May 10, 2006 2:35 pm

jan-jaap wrote:Various Linuxisms (debian 3.1) may be hidden in here.


I changed /bin/bash to /bin/sh (ksh) here on OpenBSD and it seems to be working just fine. Am now downloading the needed 6.5 and Hardware docs to my local server.

Really great scripts - very much appreciated.

Many thanks again.

Nick

Spidy
Posts: 333
Joined: Sat Jan 25, 2003 11:17 pm
Location: Melbourne, Australia

Posted by Spidy » Wed May 10, 2006 3:07 pm

It might be worth mirroring them here, right next to the nekoware stuff?

They don't seem all that big :wink:
Man is the only animal smart enough to build the Empire State Building, and the only one stupid enough to jump off it.

nekonoko
Site Admin
Posts: 7999
Joined: Thu Jan 23, 2003 2:31 am
Location: Pleasanton, California

Posted by nekonoko » Wed May 10, 2006 3:25 pm

Spidy wrote:It might be worth mirroring them here, right next to the nekoware stuff?

They don't seem all that big :wink:


Can't do that without their permission I'm afraid.

ipaddict
Posts: 652
Joined: Tue Mar 14, 2006 3:14 am
Location: ::1

Posted by ipaddict » Wed May 10, 2006 4:04 pm

Thank you jan-jaap! I am currently dumping everything onto my O200 and converting the PDFs to plaintext for easy grepping. :D

Any idea how large this is going to get, and how often SGI really updates their doc tree? I wonder, if I did an independent dump next week, how large the diff would be? Hmmm...

[Edit]Turned out to be just shy of 2.5GB. *LOTS* of 404s, though.[/Edit]
Last edited by ipaddict on Thu May 11, 2006 9:13 am, edited 1 time in total.
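
One possible way to see which manuals did not arrive (for example because of a 404) is to compare the part-number list against the files on disk. This is only a sketch: `missing_pdfs` is a hypothetical helper, and it assumes the directory layout that `wget -m -nH --cut-dirs=2` produces for the URLs generated by jan-jaap's script, with `manuals_<coll>.txt` sitting next to the `manuals_<coll>` directory.

```shell
#!/bin/sh
# Hypothetical helper: list part numbers from manuals_<coll>.txt whose PDF
# is absent under manuals_<coll>/. Assumed on-disk layout:
#   manuals_<coll>/<digit>000/<part>/pdf/<part>.pdf
missing_pdfs() {
    coll=$1
    while read -r book; do
        # Skip blank lines in the list
        [ -n "$book" ] || continue
        major=$(echo "$book" | cut -c5)
        pdf="manuals_${coll}/${major}000/${book}/pdf/${book}.pdf"
        [ -f "$pdf" ] || echo "$book"
    done < "manuals_${coll}.txt"
}

# Usage (run in the directory holding the manuals_* dirs and lists):
#   missing_pdfs 0650
```

A part number showing up here only means the PDF is not on disk; the matching html.tgz may still have downloaded.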

magellan
Posts: 230
Joined: Thu Mar 24, 2005 2:13 am
Location: Scotland

Posted by magellan » Thu May 11, 2006 12:24 am

And what about mirroring Supportfolio patches as well? :twisted:
BTW mirroring TPL works great on OSX as well.
:O2: :Indy: (KO) :Octane: (KO)

Looking for:
1600sw, O2 cam, Fuel

compuman86
Posts: 120
Joined: Sat Dec 08, 2007 3:05 pm
Location: algonquin, il

Posted by compuman86 » Sun Apr 05, 2009 11:15 am

OK, I've got it running now on my server; I started it 5 minutes ago and it's up to 100 MB already. What all does this archive? Is there a way to archive the freeware too? How about patches? I'm trying to archive all that I would ever need before it disappears!! :D
My SGI systems (in order received) :Indy: Deaconblues :Indigo: Badsneakers :Indigo2: Greenearrings :Indigo2IMP: Kidcharlemagne :Octane: Haitiandivorce :O2: Aja :320: :1600SW: MidnightCruiser
(looking for) :Fuel: Pretzellogic :Tezro: Blackfriday

josehill
Moderator
Posts: 2979
Joined: Mon Jun 06, 2005 9:53 pm
Location: USA

Posted by josehill » Sun Apr 05, 2009 12:04 pm

compuman86 wrote:ok, ive got it running now on my server, 5 minutes ago and up to 100mb already, what all does this archive?

No offense intended, but if you are running a script that is thwacking a website without understanding what the script is actually doing, you probably shouldn't be running the script, even if it comes from a reputable character like jan-jaap...

compuman86
Posts: 120
Joined: Sat Dec 08, 2007 3:05 pm
Location: algonquin, il

Posted by compuman86 » Sun Apr 05, 2009 1:26 pm

josehill wrote:No offense intended,

None taken.

josehill wrote:but if you are running a script that is thwacking a website without understanding what the script is actually doing, you probably shouldn't be running the script, even if it comes from a reputable character like jan-jaap...

I know, I know. I have a good general idea of what it does, but I just wanted to confirm what was happening here, and ask if there was a way to archive some of the other portions of the site. I mistakenly thought that the techpubs site also held software and patches; I realize that's Supportfolio. So the correct question from me is: is there a way to archive Supportfolio without clicking each link?

deBug
Posts: 840
Joined: Mon Feb 27, 2006 2:44 pm
Location: Sweden

Posted by deBug » Sun Apr 05, 2009 1:54 pm

OK, I have completed the download of the tech pubs from Diego.
He also provided me with some sites that he mirrored five years ago.
There is still some more stuff that I will download from him whenever he can find some more time.

So thank you so much Diego!


It is all available at the Swedish Nekoware Mirror http://se.mirror.nekoware.se
I have also put up some other miscellaneous SGI info and files I had.

If you have more stuff you can contribute like mirrored sites or IRIX patches/software
please PM me and I will put it up there.

I intend to keep this server up for a long time so there is no need for you to download it all from it :-)

Enjoy!
//deBug
Mein Führer, I can walk!

