Linux Wget: The Ultimate Command Line Downloader

GNU Wget is a free utility for non-interactive download of files from the Web. It supports the HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. Its features include recursive download, conversion of links for offline viewing of downloaded HTML, support for proxies, and much more.

Basic wget Commands

To download a single file from the web, use:

wget http://www.website.com/myfile.rar
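
By default, wget saves the file under its remote name. If you want it stored under a different name, wget's standard -O option lets you pick one (the filename here is just an example):

wget -O backup.rar http://www.website.com/myfile.rar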

Downloading a large file, for example an ISO image, can take some time. What do you do if your Internet connection goes down? Without any options, you would have to start the download all over again, which is very annoying if you are pulling a 700 MB ISO image over a slow connection. To get around this problem, use the -c parameter, which resumes a partially downloaded file after an interruption. The command looks like this:

wget -c http://www.website.com/ubuntu.iso
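
On a flaky connection you can also tell wget to keep retrying on its own. A sketch combining -c with the standard --tries option (0 means retry indefinitely, except for fatal errors):

wget -c --tries=0 http://www.website.com/ubuntu.iso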

Some websites do not allow you to download files using a download manager. To overcome this, use the command given below; the -U option passes wget off as a Mozilla web browser:

wget -U mozilla http://www.website.com/flower.jpg
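
The string after -U is simply the User-Agent header wget sends, so you can supply a full browser identification string instead. A sketch with an example Firefox user-agent (the exact string is just an illustration):

wget -U "Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0" http://www.website.com/flower.jpg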

Downloading an Entire Site

Wget is also able to download an entire website. Because this can put a heavy load on the server, wget obeys the robots.txt file by default. For simple cases, use a command like:

wget -r -p -l 2 http://www.website.com

-r = download recursively
-p = download all files (including images) necessary to render the HTML pages
-l 2 = descend a maximum of 2 levels (the default is 5)
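
For offline viewing you will usually also want the link-conversion feature mentioned in the introduction; wget's -k (--convert-links) option rewrites the links in downloaded pages so they work locally. For example:

wget -r -p -l 2 -k http://www.website.com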

If you don't want wget to obey the robots.txt file, you can simply add -e robots=off to the command, like this:

wget -r -p -e robots=off http://www.website.com

Some sites will not let you download the entire site; they check your browser's identity. To overcome this, add -U mozilla as before:

wget -r -p -e robots=off -U mozilla http://www.website.com

Some websites will not allow you to download the entire site at all. If the server sees that you are downloading a large number of files, it may automatically add you to its blacklist. The way around this is to wait a few seconds after every download. With wget, you do this by including --wait=X (where X is the number of seconds). You can also use the --random-wait parameter to let wget choose a random delay instead. To do this, use a command like:

wget --random-wait -r -p -e robots=off -U mozilla http://www.website.com
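
Note that --random-wait works off the base delay set by --wait: per the wget manual, it varies the pause between 0.5 and 1.5 times that value. A sketch combining the two (the 2-second base delay is just an example):

wget --wait=2 --random-wait -r -p -e robots=off -U mozilla http://www.website.com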