Friday, October 23, 2009

Special Wget commands

Wget is a free utility for non-interactive download of files from the Web. It supports the HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. Every Linux user works in the terminal, and using it well takes some command-line knowledge, so today I have put together a list of wget commands that you may not have come across before.

1- Basic wget commands you have surely seen and used before:
cd picdir/ && wget -nd -pHEKk
Save the page you are currently browsing, with all the elements that compose it (images, CSS), into the picdir directory

wget -c http://www.unixmen/
Download a file with the ability to stop the download and resume it later
wget -r -nd -np -l1 -A '*.jpg'
Download all the .jpg files from the top level of a site into the current directory

wget -r
Download an entire website

echo 'wget -c' | at 09:00
Schedule a download to start at a given time (09:00 here)

wget ftp://remote/filex.iso/
FTP usage is just as simple; wget takes care of the login and password

wget --limit-rate=30k
Limit the download speed to 30 KB/s

wget -nv --spider --force-html -i bookmarks.html
Check the validity of the links listed in a file (here, bookmarks.html)

wget --mirror
Update a local copy of a website

2- This wget command saves an HTML page and converts it to a PDF (note the -qO - so the page actually reaches htmldoc's standard input):
wget -qO - "$URL" | htmldoc --webpage -f "$URL".pdf - ; xpdf "$URL".pdf &
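One practical snag with the command above: it names the output "$URL".pdf, and a URL containing slashes makes an invalid filename. A small sketch of a sanitizer (a hypothetical helper, not part of the original one-liner) that turns a URL into a safe PDF name:

```shell
# url_to_pdf_name: strip the scheme, replace characters that are awkward
# in filenames, and append .pdf. (Helper name and example URL are made up.)
url_to_pdf_name() {
  printf '%s.pdf\n' "$(printf '%s' "$1" | sed 's|^[a-z]*://||; s|[/:?&]|_|g')"
}
# url_to_pdf_name "http://example.com/page?x=1"  ->  example.com_page_x=1.pdf
```

You would then use "$(url_to_pdf_name "$URL")" in place of "$URL".pdf in both the htmldoc and xpdf calls.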

3- Wget command to get the photos from a Picasa web album:
wget 'link of a Picasa WebAlbum' -O - | perl -e 'while(<>){while(s/"media":{"content":\[{"url":"(.+?\.JPG)//){print "$1\n"}}' | wget -w1 -i -

4- Check whether you can connect to Twitter:
wget -q -O -

5- Wget command to get all the zip and PDF files from a website:
wget --reject html,htm --accept pdf,zip -rl1 url

If the website uses HTTPS, then:
wget --reject html,htm --accept pdf,zip -rl1 --no-check-certificate https-url

6- Wget command to check whether a remote file exists:
wget --spider -v
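With --spider, wget requests the file without downloading it and signals the result through its exit status (0 if found, non-zero otherwise). A tiny wrapper, sketched here with a hypothetical helper name, turns that into a readable answer:

```shell
# remote_file_exists: print "exists" or "missing" based on wget's
# --spider exit status. (Function name is an illustration, not from
# the original post; pass it any URL.)
remote_file_exists() {
  if wget --spider -q "$1"; then
    echo "exists"
  else
    echo "missing"
  fi
}
# Usage (requires network): remote_file_exists http://example.com/index.html
```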

7- Wget command to download files from a Rapidshare premium account:
wget -c -t 1 --load-cookies ~/.cookies/rapidshare

8- Wget command to extract a tarball from a remote host without saving it locally:
wget -qO - "" | tar zxvf -
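The point is that the archive is piped straight into tar, so it never exists as a file on your side. A local dry run of the same idea, with cat standing in for `wget -qO - "$URL"` (no real host is assumed; the tarball is built on the spot):

```shell
# Build a small tarball to play the role of the remote file.
workdir=$(mktemp -d)
mkdir -p "$workdir/src"
echo "hello" > "$workdir/src/greeting.txt"
tar -czf "$workdir/archive.tar.gz" -C "$workdir" src

# Same pipeline shape as the wget command: extract straight from the
# stream into the target directory.
mkdir "$workdir/extract"
cat "$workdir/archive.tar.gz" | tar -zxf - -C "$workdir/extract"
cat "$workdir/extract/src/greeting.txt"   # prints: hello
```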

9- Block known dirty hosts from reaching your machine:
wget -qO -|awk '!/#|[a-z]/&&/./{print "iptables -A INPUT -s "$1" -j DROP"}'

Blacklisted is a compiled list of all known dirty hosts (botnets, spammers, bruteforcers, etc.), updated hourly. This command fetches the list and generates the iptables rules for you; if you want the hosts blocked automatically, append |sh to the end of the command line. Blocking everything and allowing only specific hosts is the more practical approach, but many people don't or can't do that, which is where this one-liner comes in handy. For those using ipfw, a quick fix is to change the awk action to {print "add deny ip from "$1" to any"}. Be advised that the blacklist file itself filters out RFC 1918 addresses (10.x.x.x, 172.16-31.x.x, 192.168.x.x); even so, it is advisable to check/parse the list before you implement the rules.
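You can preview what the generated rules look like without fetching the real list by feeding the awk filter a few sample lines. The IPs below are documentation placeholders, not real blacklist entries:

```shell
# Sample input in the blacklist's format: comment lines start with '#',
# real entries are bare IP addresses. The awk pattern keeps non-empty
# lines with no '#' and no lowercase letters, i.e. the bare IPs.
printf '%s\n' \
  '# Blacklist sample (comment line)' \
  '203.0.113.7' \
  '198.51.100.22' |
awk '!/#|[a-z]/&&/./{print "iptables -A INPUT -s "$1" -j DROP"}'
# prints:
# iptables -A INPUT -s 203.0.113.7 -j DROP
# iptables -A INPUT -s 198.51.100.22 -j DROP
```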

10- Wget command to download an entire website:
wget --random-wait -r -p -e robots=off -U mozilla