Mirroring a WordPress website with wget

Solution 1

Short answer: You can't, that's how the internet works.

Long answer:

Two factors make what you want impossible, and that's by design.

1) PHP files aren't sent to the client; they are evaluated server-side to produce the HTML documents that are then sent to the client. That allows the developer to keep the source code of the website to himself, which improves security (even though WordPress, in this case, is open source).

2) Most of the site's content is stored in a database, which is no more accessible to you than the PHP files (if it is, that's a severe security flaw); it is also the server that queries it to produce the HTML result.

All you can do is get a static version of the website. WinHTTrack (the Windows front-end of HTTrack), for example, lets you do that. There are equivalent tools for Linux.
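For a static copy, wget itself is usually enough. A minimal sketch, using the URL from the question (adjust depth and filters as needed):

    wget --mirror --convert-links --adjust-extension --page-requisites \
         --no-parent http://www.sharons.org.uk/

That fetches the rendered HTML plus the images/CSS/JS it references, and rewrites the links so the copy browses locally; you still won't get any .php source or the database.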

Solution 2

It's a common misconception that a PHP file can be grabbed with wget. When you run

wget -rkp -l3 -np -nH --cut-dirs=1 http://www.sharons.org.uk/
wget -r http://www.sharons.org.uk
wget --user-agent=Mozilla --content-disposition --mirror --convert-links -A php -E -K -p http://www.sharons.org.uk/

or anything similar, a lot of things happen on the server side:

  • The web server receives the request from you / wget
  • The web server then runs PHP against index.php (or whatever other file was requested)
  • PHP queries MySQL as instructed by the WordPress PHP files
  • PHP returns HTML-only data to the web server
  • That data is sent back to you as the home page you see (you can verify this with the quick check below)
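A quick way to see this for yourself is to inspect the response headers; only rendered HTML ever reaches the client. For example (the URL is the one from the question; exact headers will vary by server):

    curl -I http://www.sharons.org.uk/
    # typically something like:
    #   HTTP/1.1 200 OK
    #   Content-Type: text/html; charset=UTF-8

There is no way to coax the server into handing over the .php source this way; the PHP interpreter always runs first.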

The correct approach to your problem is:

  • SSH into your server, or log in to the administration interface (cPanel, WHM, etc.)
  • Archive or grab the whole public_html (or whatever the root directory of your site is)
  • Connect to your MySQL server and back up the WordPress database with mysqldump or phpMyAdmin (see the sketch below)
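A rough sketch of those two steps over SSH (the document-root path, database name, and user are placeholders; take the real values from your host and from wp-config.php):

    # archive the document root (adjust the path to your host's layout)
    tar -czf site-files.tar.gz -C /var/www public_html

    # dump the WordPress database (name and user are examples)
    mysqldump -u wp_user -p wordpress_db > wordpress_db.sql

Copy both files down with scp or rsync and you have a complete backup: PHP source, uploads, and content included.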

P.S.: if this is your own blog, as you state, credentials/logins should not be a problem.

P.S. 2: if, as I suspect, you are actually trying to mirror someone else's site without explicit permission, that is outside superuser.com's scope.

Sorry if I misunderstood.

Solution 3

I've just done something similar on my Ubuntu server; see if my steps can help with your issue. OK, let's go.

I have a standard LAMP stack on my server and had to mirror a site to GoDaddy; the easiest way was with wget. I did it like this:

  • stopped my Apache service => /etc/init.d/apache2 stop
  • changed to the root folder of my website => cd /var/www/webroot
  • ran a local Python web server on the HTTP port => python -m SimpleHTTPServer 80
  • on my GoDaddy server, over SSH, pulled the whole site => wget -m http://web-site.com

The -m flag is for mirroring, a proper full mirror, and it works :)
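Put together, the sequence above looks roughly like this (paths and hostname are the examples from the steps; binding port 80 needs root privileges, and on Python 3 the module is http.server instead of SimpleHTTPServer):

    # on the source server: stop Apache and serve the webroot statically
    /etc/init.d/apache2 stop
    cd /var/www/webroot
    python -m SimpleHTTPServer 80    # Python 3: python3 -m http.server 80

    # on the destination (GoDaddy) server, over SSH: pull the whole tree
    wget -m http://web-site.com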

Do not forget to change your wp-config.php database password afterwards, in case someone else also pulled your site down in the meantime, connection parameters included :)
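For reference, those connection parameters live as constants in wp-config.php; a quick way to see which credentials to rotate (the path is the example webroot from the steps above):

    grep -E "DB_(NAME|USER|PASSWORD|HOST)" /var/www/webroot/wp-config.php

Change the password on the MySQL side first, then update DB_PASSWORD in the file to match.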

that's it :)

hth, krex

Comments

  • boudiccas
    boudiccas almost 2 years

    I'm trying to download a WordPress website, my blog actually, and to get the PHP files as well. So far I've tried:

    wget -rkp -l3 -np -nH --cut-dirs=1 http://www.sharons.org.uk/
    wget -r http://www.sharons.org.uk
    wget --user-agent=Mozilla --content-disposition --mirror --convert-links -A php -E -K -p http://www.sharons.org.uk/
    

    but I can't get past the first index.html page.

    How can I do it please?

  • boudiccas
    boudiccas over 10 years
    Sorry, it is my website and blog, and I rsync to it, but I'm just trying to learn how to get the PHP files as well, and get past just the one index.html.
  • Sir.pOpE
    Sir.pOpE over 10 years
    OK, I understand. As I tried to explain, .php files are never sent to the user in raw form; they are processed by PHP (the Hypertext Preprocessor) itself, and the output is then routed to the user. Using wget, you behave as an ordinary site visitor.
  • Kamil Maciorowski
    Kamil Maciorowski over 6 years
    My wget says "Both --no-clobber and --convert-links were specified, only --convert-links will be used." I guess the command is not optimal then.
  • minimallinux
    minimallinux over 6 years
    Did you still get the whole site with wget using only --convert-links?
  • nilon
    nilon about 3 years
    @minimallinux No. Not at all.
  • nilon
    nilon about 3 years
    Could you please elaborate? This seems interesting.
  • nilon
    nilon about 3 years
    Was this answer actually useful to someone?