Nekochan Net

Official Chat Channel: #nekochan // irc.nekochan.net
It is currently Wed Apr 23, 2014 4:57 pm

All times are UTC - 8 hours


Forum rules


Any posts concerning pirated software or offering to buy/sell/trade commercial software are subject to removal.



[ 37 posts ]  Page 1 of 3
 Post subject: Mirroring Techpubs Docs
Posted: Tue May 09, 2006 11:55 am

Joined: Fri Mar 05, 2004 4:39 am
Posts: 205
Location: Leafy Surrey, UK
I asked a question about a year or so back about making my own mirror of techpubs but was unsuccessful. In light of recent SGI developments, I'd like to create my own snapshots of all the current SGI & IRIX docs should they become unavailable in the future. I've got original IRIX CD media with various docs, but I'd really like to grab the newest document versions direct from SGI themselves and then burn them onto my own DVDs.

In my previous attempts I had tried using wget in various ways to grab the info, but this ended up giving me all sorts of issues with their CGI setup. Has anyone here got a sure-fire way of grabbing the docs (preferably PDFs) off of the SGI Techpubs site (in an automated fashion)?

Many thanks in advance!
Nick


 Post subject:
Posted: Tue May 09, 2006 12:18 pm

Joined: Thu Feb 12, 2004 11:51 pm
Posts: 1060
Location: Victoria, BC, Canada
Have you tried Zoontf's technique?

viewtopic.php?t=4241&highlight=wget+techpubs

It worked for me (before SGI revamped the site). I seem to remember having to remove a couple of commands to get it to run. YMMV.


Last edited by josehill on Sun Apr 05, 2009 3:59 pm, edited 1 time in total.
fixed link


 Post subject:
Posted: Tue May 09, 2006 1:32 pm

Joined: Fri Mar 05, 2004 4:39 am
Posts: 205
Location: Leafy Surrey, UK
I've tried the suggestions in the link but am having trouble getting it working.

Code:
$ cat grabdocs2.sh

#!/bin/sh
wget -r --accept="*.pdf,download.cgi*" \
--reject="browse.cgi,summary.cgi,init.cgi,help.cgi,feedback.cgi,shownew.cgi,listdocs.cgi" \
--domains=techpubs.sgi.com -nd -i techpubs.txt 2> log.txt &

$ cat techpubs.txt
http://techpubs.sgi.com/library/tpl/cgi-bin/browse.cgi?db=bks&coll=hdwr&pth=ALL
http://techpubs.sgi.com/library/tpl/cgi-bin/browse.cgi?db=bks&coll=0650&pth=ALL


It runs, but it only saves the HTML of each document's download page, e.g.:
Code:
...
-rw-r--r--  1 nick  nick  13641 May  9 22:16 download.cgi?coll=hdwr&db=bks&docnumber=860-0218-002
-rw-r--r--  1 nick  nick  13627 May  9 22:16 download.cgi?coll=hdwr&db=bks&docnumber=860-0219-002
-rw-r--r--  1 nick  nick  13549 May  9 22:25 download.cgi?coll=hdwr&db=bks&docnumber=860-0220-001
-rw-r--r--  1 nick  nick  13557 May  9 22:25 download.cgi?coll=hdwr&db=bks&docnumber=860-0221-001
-rw-r--r--  1 nick  nick  13505 May  9 22:26 download.cgi?coll=hdwr&db=bks&docnumber=860-0222-001
-rw-r--r--  1 nick  nick  13576 May  9 22:26 download.cgi?coll=hdwr&db=bks&docnumber=860-0223-001 
...


Any ideas where I might be going wrong?

Thanks!
Nick
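One possible workaround, sketched under assumptions rather than tested against SGI's server: the saved page names already contain each document's part number, so direct PDF URLs can be built from them using the same manuals/<N>000/<part>/pdf/<part>.pdf layout that jan-jaap's script relies on further down the thread. The script name `cgi2pdf.sh` and the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not from the thread: turn the download.cgi pages that
# wget saved into direct PDF URLs. Assumes SGI's
# manuals/<N>000/<part>/pdf/<part>.pdf layout (as in jan-jaap's script).

# "download.cgi?coll=hdwr&db=bks&docnumber=860-0218-002" -> "860-0218-002"
docnumber_from_name() {
    printf '%s\n' "$1" | sed -n 's/.*docnumber=\([0-9-]*\).*/\1/p'
}

# The 5th character of the part number picks the <N>000 directory
pdf_url() {
    major=`printf '%s\n' "$1" | cut -c5`
    printf 'http://techpubs.sgi.com/library/manuals/%s000/%s/pdf/%s.pdf\n' "$major" "$1" "$1"
}

# Print one URL per saved page named on the command line
for f in "$@"; do
    part=`docnumber_from_name "$f"`
    if [ -n "$part" ]; then
        pdf_url "$part"
    fi
done
```

Run it where the saved pages are, e.g. `sh cgi2pdf.sh download.cgi* > pdfs.txt`, then hand the list to `wget -nv -T60 -t0 -i pdfs.txt`. There is no guarantee that every part number has a PDF at that path.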


 Post subject:
Posted: Wed May 10, 2006 12:52 am

Joined: Thu Jun 17, 2004 10:35 am
Posts: 3774
Location: Wijchen, The Netherlands
I'm sure there must have been a nice spike in the network traffic @ sgi.com yesterday, I refreshed my mirror as well :wink:

This is probably not the best way, but here's how I did it. Various Linuxisms (debian 3.1) may be hidden in here.

Code:
#!/bin/bash

#set -x

# Freeware (fw) doesn't have books in its collection
COLLECTIONS="0530 0620 0630 0640 0650 hdwr linux nt"

# -m mirror, -nv terse output, -T60 60 s timeout, -t0 retry forever,
# -nH/--cut-dirs=2 drop the host and the library/manuals/ path prefix
WGETOPT="-m -nv -T60 -t0 -nH --cut-dirs=2"

for coll in $COLLECTIONS; do
    mkdir -p manuals_$coll
    # Write a per-collection download script to inspect before running
    echo "#!/bin/sh" > wget_$coll.sh
    chmod 755 wget_$coll.sh
    echo "cd  manuals_$coll" >> wget_$coll.sh
    lynx -dump -width=999 "http://techpubs.sgi.com/library/tpl/cgi-bin/browse.cgi?db=bks&coll=$coll&pth=ALL" > dump_$coll.txt
    # Get part numbers: the docnumber= value of each download.cgi link
    grep "download.cgi" < dump_$coll.txt | cut -d '=' -f4 | sort > manuals_$coll.txt
    MANUALS=`cat manuals_$coll.txt`
    for book in $MANUALS; do
        # 5th character of the part number (007-0603-100 -> 0) names the 0000/1000/... dir
        major=`echo $book | cut -c5`
        echo "wget $WGETOPT http://techpubs.sgi.com/library/manuals/${major}000/$book/pdf/$book.pdf" >> wget_$coll.sh
        echo "wget $WGETOPT http://techpubs.sgi.com/library/manuals/${major}000/$book/dl/$book.html.tgz" >> wget_$coll.sh
    done
done


This creates dirs 'manuals_0530' etc. and scripts 'wget_0530.sh' etc.

Scripts look like this:
Code:
#!/bin/sh
cd  manuals_0530
wget -m -nv -T60 -t0 -nH --cut-dirs=2 http://techpubs.sgi.com/library/manuals/0000/007-0603-100/pdf/007-0603-100.pdf
wget -m -nv -T60 -t0 -nH --cut-dirs=2 http://techpubs.sgi.com/library/manuals/0000/007-0603-100/dl/007-0603-100.html.tgz
...

After inspecting the wget_*.sh scripts, you run them.

Expect these download volumes (kB)
184928 manuals_0530
247840 manuals_0620
252076 manuals_0630
284396 manuals_0640
562524 manuals_0650
805100 manuals_hdwr
210080 manuals_linux
68356 manuals_nt

This is both the online (html.tgz) and pdf versions.
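As a rough cross-check, the eight figures above add up to 2615300 kB, about 2.5 GB. The sketch below compares a finished mirror against them. It is hypothetical (not part of jan-jaap's scripts), assumes it is run in the directory that holds the manuals_* dirs, and the sizes will drift as SGI revises documents.

```shell
#!/bin/sh
# Rough sanity check (a sketch): compare each manuals_* dir with the sizes
# quoted in this post. Run it where the manuals_* dirs live.

# kB figures from the post (May 2006); they will drift as SGI revises docs
expected() {
    cat <<'EOF'
184928 manuals_0530
247840 manuals_0620
252076 manuals_0630
284396 manuals_0640
562524 manuals_0650
805100 manuals_hdwr
210080 manuals_linux
68356 manuals_nt
EOF
}

expected_total() {
    expected | awk '{ t += $1 } END { print t }'
}

# One line per collection (0 kB if the dir doesn't exist yet), then the total
report() {
    expected | while read -r kb dir; do
        got=0
        if [ -d "$dir" ]; then
            got=`du -sk "$dir" | awk '{ print $1 }'`
        fi
        printf '%-14s expected %7s kB, have %7s kB\n' "$dir" "$kb" "$got"
    done
    printf 'total expected: %s kB\n' "`expected_total`"
}

report
```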

_________________
Now this is a deep dark secret, so everybody keep it quiet :)
It turns out that when reset, the WD33C93 defaults to a SCSI ID of 0, and it was simpler to leave it that way... -- Dave Olson, in comp.sys.sgi

Currently in commercial service: :Onyx2:(2x) :O3x02L:
In the museum: almost every MIPS/IRIX system.
Wanted: GM1 board for Professional Series GT graphics (030-0076-003, 030-0076-004)


 Post subject:
Posted: Wed May 10, 2006 5:13 am

Joined: Sat Sep 25, 2004 2:50 am
Posts: 116
Location: Sydney, Australia
2 Jan-Jaap: working perfectly for me on my Linux box (Fedora Core 2) - well done champ, thanks very much indeed!
Steve


 Post subject:
Posted: Wed May 10, 2006 12:25 pm
Site Admin

Joined: Thu Jan 23, 2003 1:31 am
Posts: 7956
Location: Pleasanton, California
jan-jaap wrote:
Various Linuxisms (debian 3.1) may be hidden in here.


Works on IRIX too provided you change the first line to point to your local bash binary (and you have wget/lynx someplace). Been pulling down the docs for several hours now, thanks!

_________________
Twitter: @neko_no_ko
IRIX Release 4.0.5 IP12 Version 06151813 System V
Copyright 1987-1992 Silicon Graphics, Inc.
All Rights Reserved.


 Post subject:
Posted: Wed May 10, 2006 1:35 pm

Joined: Fri Mar 05, 2004 4:39 am
Posts: 205
Location: Leafy Surrey, UK
jan-jaap wrote:
Various Linuxisms (debian 3.1) may be hidden in here.


I changed /bin/bash to /bin/sh (ksh) here on OpenBSD and it seems to be working just fine. Am now downloading the needed 6.5 and Hardware docs to my local server.

Really great scripts - very much appreciated.

Many thanks again.

Nick
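Both the IRIX and the OpenBSD fixes in this thread come down to editing the script's first line. A small hypothetical helper for that is sketched below; `fix_shebang` is not part of jan-jaap's script and assumes a POSIX-style /bin/sh. Pass the interpreter explicitly (e.g. /bin/sh for OpenBSD's ksh) or let it pick up the first bash on $PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of jan-jaap's script): replace a script's
# #! line with a given interpreter, or with the first bash on $PATH.
fix_shebang() {
    script=$1
    shell=$2
    if [ -z "$shell" ]; then
        shell=`command -v bash`
    fi
    if [ -z "$shell" ]; then
        echo "fix_shebang: no interpreter given and no bash on PATH" >&2
        return 1
    fi
    # refuse to touch files that don't start with a #! line
    case `sed -n 1p "$script"` in
        '#!'*) ;;
        *) echo "fix_shebang: $script has no #! line" >&2; return 1 ;;
    esac
    tmp="$script.tmp.$$"
    # new first line, then everything after the old one; keep it executable
    { printf '#!%s\n' "$shell"; sed 1d "$script"; } > "$tmp" && chmod 755 "$tmp" && mv "$tmp" "$script"
}

# e.g.: sh fixshebang.sh mirror.sh /bin/sh   (file names are hypothetical)
if [ $# -gt 0 ]; then
    fix_shebang "$@"
fi
```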


 Post subject:
Posted: Wed May 10, 2006 2:07 pm

Joined: Sat Jan 25, 2003 10:17 pm
Posts: 333
Location: Melbourne, Australia
It might be worth mirroring them here, right next to the nekoware stuff?

They don't seem all that big :wink:

_________________
Man is the only animal smart enough to build the Empire State Building, and the only one stupid enough to jump off it.


 Post subject:
Posted: Wed May 10, 2006 2:25 pm
Site Admin

Joined: Thu Jan 23, 2003 1:31 am
Posts: 7956
Location: Pleasanton, California
Spidy wrote:
It might be worth mirroring them here, right next to the nekoware stuff?

They don't seem all that big :wink:


Can't do that without their permission I'm afraid.

 Post subject:
Posted: Wed May 10, 2006 3:04 pm

Joined: Tue Mar 14, 2006 2:14 am
Posts: 652
Location: ::1
Thank you jan-jaap! I am currently dumping everything onto my O200 and converting the PDFs to plain text for easy grepping. :D
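The thread doesn't say which converter ipaddict used. A minimal sketch, assuming pdftotext (from xpdf or poppler) is installed and the shell's `test` supports `-nt` (bash and ksh do), could write a .txt copy next to each mirrored PDF and skip ones that are already up to date. The script name and helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: write a .txt copy next to every mirrored PDF so the docs can be grepped.
# Assumes pdftotext (xpdf or poppler) is installed; not part of the original scripts.

# manuals_0650/0000/X/pdf/X.pdf -> manuals_0650/0000/X/pdf/X.txt
txt_for() {
    printf '%s\n' "${1%.pdf}.txt"
}

convert_tree() {
    find "$@" -name '*.pdf' | while read -r pdf; do
        txt=`txt_for "$pdf"`
        # only (re)convert when the text copy is missing or older than the PDF
        if [ ! -f "$txt" ] || [ "$pdf" -nt "$txt" ]; then
            pdftotext "$pdf" "$txt" || echo "pdftotext failed: $pdf" >&2
        fi
    done
}

# e.g.: sh pdf2txt.sh manuals_*
if [ $# -gt 0 ]; then
    convert_tree "$@"
fi
```

Afterwards something like `grep -il 'xfs_repair' manuals_*/*/*/pdf/*.txt` lists the manuals that mention a term.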

Any idea how large this is going to get, and how often SGI really updates their doc tree? I wonder how large the diff would be if I did an independent dump next week. Hmmm...
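One cheap way to answer the diff question, sketched here as an assumption rather than something from the thread: keep an older copy of the sorted manuals_<coll>.txt part-number list that jan-jaap's script writes, re-run it later, and compare the two lists with `comm`. The `.old` naming and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: diff two snapshots of a collection's part-number list.
# Assumes you kept an older manuals_<coll>.txt (e.g. renamed to manuals_0650.old)
# before re-running jan-jaap's script; both lists come out of `sort`, which comm needs.

# part numbers present only in the newer list
added_parts() {
    comm -13 "$1" "$2"
}

# part numbers present only in the older list
dropped_parts() {
    comm -23 "$1" "$2"
}

# e.g.: sh partdiff.sh manuals_0650.old manuals_0650.txt
if [ $# -eq 2 ]; then
    echo "added:";   added_parts "$1" "$2"
    echo "dropped:"; dropped_parts "$1" "$2"
fi
```

This only catches documents that appear in or disappear from the listing; a PDF replaced under an unchanged part number would need a size or checksum comparison instead.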

[Edit]Turned out to be just shy of 2.5GB. *LOTS* of 404s, though.[/Edit]
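To see exactly which files the 404s cost you, a hypothetical check could walk each generated wget_<coll>.sh and report URLs with no matching local file. It assumes jan-jaap's `-nH --cut-dirs=2` options, so .../library/manuals/0000/X/pdf/X.pdf ends up as manuals_<coll>/0000/X/pdf/X.pdf, and that it runs from the directory holding the wget_*.sh scripts.

```shell
#!/bin/sh
# Sketch: list URLs from a generated wget_<coll>.sh whose file never arrived
# (404s and the like). Assumes the -nH --cut-dirs=2 options from jan-jaap's
# script. Run it from the directory holding the wget_*.sh scripts.

# http://host/library/manuals/0000/X/pdf/X.pdf -> 0000/X/pdf/X.pdf
local_path() {
    printf '%s\n' "$1" | sed 's|^http://[^/]*/||' | cut -d/ -f3-
}

missing_from() {
    # the generated scripts start with "cd  manuals_<coll>"
    dir=`sed -n 's/^cd  *//p' "$1"`
    dir=${dir:-.}
    grep '^wget ' "$1" | awk '{ print $NF }' | while read -r url; do
        p=`local_path "$url"`
        if [ ! -f "$dir/$p" ]; then
            echo "$url"
        fi
    done
}

# e.g.: sh missing.sh wget_*.sh > missing.txt
for s in "$@"; do
    missing_from "$s"
done
```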


Last edited by ipaddict on Thu May 11, 2006 8:13 am, edited 1 time in total.

 Post subject:
Posted: Wed May 10, 2006 11:24 pm

Joined: Thu Mar 24, 2005 1:13 am
Posts: 230
Location: Scotland
And what about mirroring Supportfolio patches as well? :twisted:
BTW mirroring TPL works great on OSX as well.

_________________
:O2: :Indy: (KO) :Octane: (KO)

Looking for:
1600sw, O2 cam, Fuel


Posted: Sun Apr 05, 2009 10:15 am

Joined: Sat Dec 08, 2007 2:05 pm
Posts: 120
Location: algonquin, il
OK, I've got it running on my server now: started 5 minutes ago and already up to 100 MB. What all does this archive? Is there a way to archive the freeware too? How about patches? I'm trying to archive all that I would ever need before it disappears!! :D

_________________
My SGI systems (in order received) :Indy: Deaconblues :Indigo: Badsneakers :Indigo2: Greenearrings :Indigo2IMP: Kidcharlemagne :Octane: Haitiandivorce :O2: Aja :320: :1600SW: MidnightCruiser
(looking for) :Fuel: Pretzellogic :Tezro: Blackfriday


Posted: Sun Apr 05, 2009 11:04 am
Moderator

Joined: Mon Jun 06, 2005 8:53 pm
Posts: 2857
Location: USA
Quote:
OK, I've got it running on my server now: started 5 minutes ago and already up to 100 MB. What all does this archive?
No offense intended, but if you are running a script that is thwacking a website without understanding what the script is actually doing, you probably shouldn't be running the script, even if it comes from a reputable character like jan-jaap...


Posted: Sun Apr 05, 2009 12:26 pm

Joined: Sat Dec 08, 2007 2:05 pm
Posts: 120
Location: algonquin, il
Quote:
No offense intended,
none taken
Quote:
but if you are running a script that is thwacking a website without understanding what the script is actually doing, you probably shouldn't be running the script, even if it comes from a reputable character like jan-jaap...

I know, I know. I have a good general idea of what it does, but I just wanted to confirm what was happening here, and ask if there was a way to archive some of the other portions of the site. I mistakenly thought the techpubs site also held software and patches; I realize that's Supportfolio. So the correct question from me is: is there a way to archive Supportfolio without clicking each link?

Posted: Sun Apr 05, 2009 12:54 pm

Joined: Mon Feb 27, 2006 1:44 pm
Posts: 839
Location: Sweden
OK, I have completed the download of the tech pubs from Diego.
He also provided me with some sites that he mirrored five years ago.
There is still some more stuff that I will download from him whenever he can find some more time.

So thank you so much Diego!


It is all available at the Swedish Nekoware Mirror http://se.mirror.nekoware.se
I have also put up some other miscellaneous SGI info and files I had.

If you have more stuff you can contribute like mirrored sites or IRIX patches/software
please PM me and I will put it up there.

I intend to keep this server up for a long time, so there is no need for you to download everything from it :-)

Enjoy!
//deBug

_________________
Mein Führer, I can walk!


Powered by phpBB® Forum Software © phpBB Group