
Website downloads using wget

With wget you can create a complete, static clone of a website. In this way it is possible, for example, to provide a website for offline use.

The only prerequisite is the command-line tool wget, which is included in practically every Linux distribution and can also be installed on macOS (for example via Homebrew). A complete website can be downloaded with the following command:

wget --recursive --no-clobber --page-requisites --html-extension --convert-links --domains www.example.com https://www.example.com/

Options

--recursive
Downloads pages recursively, following all links.

--no-clobber
If the download is interrupted, pages that have already been downloaded will not be downloaded again.

--page-requisites
Also downloads the content (images, scripts) required to display the page.

--html-extension
Saves all downloaded pages with an .html file extension.

--convert-links
Converts the links so that the downloaded files link to each other (instead of to the original source on the Internet).

--domains www.example.com
Only downloads pages from the domains specified here.
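For reference, the same call can also be written with wget's short option forms, assuming a reasonably recent GNU wget: -r, -nc, -p, -E, -k and -D correspond to the long options listed above (newer versions also document --html-extension under the name --adjust-extension). The following line is simply an equivalent shorthand for the command shown earlier:

wget -r -nc -p -E -k -D www.example.com https://www.example.com/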

Procedure for extensive websites

In the case of particularly large websites, downloading all pages can take a long time and, above all, put a heavy load on the web server or even get the downloading machine blacklisted. To avoid this, the following two options can be added to the command:

--wait=20
Waits 20 seconds between successive downloads (the value can of course be set lower).

--limit-rate=20k
Limits the download rate to 20 kilobytes per second (which would be very conservative).
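Putting it all together, a more server-friendly download of the example site could look like the line below; the values for --wait and --limit-rate are only suggestions and should be adapted to the size of the site and the capacity of the target server:

wget --recursive --no-clobber --page-requisites --html-extension --convert-links --wait=20 --limit-rate=20k --domains www.example.com https://www.example.com/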