Import Images From A Migrated WordPress


Here's how to solve a common WordPress problem.

I want to re-import all my blog's images into the media library. I've moved my blog to a new host - but kept the same domain name. I started with a new WordPress install, the uploads folder still has all my images, but WordPress can't see them.

None of the plugins I found coped with large numbers of images spread across multiple directories.

One Line Fix

This uses the WP-CLI program. Install it on your server, then run:

./wp-cli.phar media import wp-content/uploads/2017/08/example.jpg --post_id=123 --skip-copy

That imports the image into the media library, attaches it to post 123, and (thanks to --skip-copy) leaves the file on disk in its current location.

Easy! But how do we get all the images, spanning a thousand blog posts?

Get All Post IDs

./wp-cli.phar post list --field=ID

Get the HTML of the post

./wp-cli.phar post get 123 --field=post_content

Extract the images

We can use PHP's DOMDocument to grab the images from the HTML.

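$html = "<html>....";
// Post HTML is rarely valid; collect parser warnings instead of printing them
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$images = $doc->getElementsByTagName('img');
// Print the src attribute of every <img> element
foreach ($images as $img) {
   echo $img->getAttribute('src') . "\n";
}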

And that spits out all the images in the post.

Putting it all together

BACK UP YOUR DATABASE FIRST, YOU MUPPET!

ahem

This will be a mixture of PHP and bash scripts.

  1. Save all the post IDs to a file.
  2. Iterate through the IDs and save the HTML to separate files, named after the post ID.
  3. Go through each file, grab every image URL.
  4. If the image URL starts with "https://shkspr.mobi/blog/" (my blog's address, swap in your own) add it to the media library and associate it with the post.
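
Step 4 boils down to a prefix check followed by stripping the prefix. The PHP script further down does this itself. As a standalone illustration, here is a minimal shell sketch, assuming the same blog address. The names BLOG_PREFIX and relative_upload_path are hypothetical and not part of WP-CLI.

```shell
# Assumption: every local image URL starts with this prefix.
BLOG_PREFIX="https://shkspr.mobi/blog/"

# Hypothetical helper: print the path relative to the blog root for an
# image URL on this blog, or return non-zero for images hosted elsewhere.
relative_upload_path() {
  url="$1"
  case "$url" in
    "$BLOG_PREFIX"*) printf '%s\n' "${url#"$BLOG_PREFIX"}" ;;
    *) return 1 ;;
  esac
}
```

Calling relative_upload_path with https://shkspr.mobi/blog/wp-content/uploads/1999/12/tezza.jpg prints wp-content/uploads/1999/12/tezza.jpg, which is the form the media import command expects when it is run from the blog root.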

1 Save all the post IDs

./wp-cli.phar post list --field=ID > posts.txt

Takes a minute to run.

2 Generate the HTML

mkdir -p test
while read -r id ; do ./wp-cli.phar post get "$id" --field=post_content > "test/$id.html" ; done < posts.txt

Takes a few minutes to run.
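
Before moving on, it may be worth checking that every post ID produced a non-empty file. This is a minimal sketch, assuming the posts.txt and test/ layout used here. The name find_missing_html is hypothetical. Posts that genuinely have no content will be listed too, so treat the output as a prompt to look, not as an error report.

```shell
# Hypothetical helper: list post IDs from an ID file whose saved HTML
# is missing or empty in the given directory.
find_missing_html() {
  ids_file="$1"
  html_dir="$2"
  while IFS= read -r id; do
    # Ignore blank lines in the ID file
    [ -n "$id" ] || continue
    # -s is true only when the file exists and is larger than zero bytes
    [ -s "$html_dir/$id.html" ] || printf '%s\n' "$id"
  done < "$ids_file"
}
```

Run it as find_missing_html posts.txt test from the directory that holds posts.txt.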

3 Grab images

Within the test directory create commands.php:

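<?php
// Only images hosted under this prefix are imported
$start = "https://shkspr.mobi/blog/";

// Post HTML is rarely valid; keep parser warnings out of the output
libxml_use_internal_errors(true);

foreach (glob("*.html") as $file) {
    // The filename is the post ID plus ".html"
    $id = substr($file, 0, -5);
    $html = file_get_contents($file);
    // Skip posts with no content; loadHTML() rejects an empty string
    if (trim($html) === '') continue;
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $images = $doc->getElementsByTagName('img');

    foreach ($images as $img) {
        $imgSrc = $img->getAttribute('src');
        if (strpos($imgSrc, $start) === 0) {
            // Strip the prefix to get the path relative to the blog root
            $imgLocation = substr($imgSrc, strlen($start));
            echo "./wp-cli.phar media import {$imgLocation} --post_id={$id} --skip-copy\n";
        }
    }
}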

This can be run as php commands.php > commands.txt

After a few seconds, we have a text file full of commands like these:

./wp-cli.phar media import wp-content/uploads/2000/11/Pratchett-Truth.jpg --post_id=24322 --skip-copy
./wp-cli.phar media import wp-content/uploads/1999/12/tezza.jpg --post_id=24353 --skip-copy
./wp-cli.phar media import wp-content/uploads/1999/12/frontpag.gif --post_id=24353 --skip-copy
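
Old posts sometimes point at images that no longer exist on disk, so it may be worth checking the generated commands before running them. This is a minimal sketch, assuming the exact command format shown above, where the image path is the fourth space-separated word and contains no spaces. The name missing_uploads is hypothetical.

```shell
# Hypothetical helper: print every generated command whose image file
# does not exist relative to the given blog root.
missing_uploads() {
  commands_file="$1"
  blog_root="$2"
  while IFS= read -r line; do
    # Assumption: the path is the fourth word, as in the output above
    path=$(printf '%s\n' "$line" | awk '{print $4}')
    [ -n "$path" ] || continue
    [ -f "$blog_root/$path" ] || printf '%s\n' "$line"
  done < "$commands_file"
}
```

Running missing_uploads commands.txt /path/to/blog lists the commands that would fail because their file is missing. No output means every file was found.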

To run all the commands:

  1. Move the commands.txt into the root of your blog.
  2. Run bash commands.txt

This will take a long time, depending on how many commands you have. Mine ran overnight.
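
With a run that long, it helps to know afterwards which imports failed. One possible alternative to a plain bash commands.txt is to run the commands one at a time and log the failures. This is a minimal sketch, assuming each line of the file is a complete command and that WP-CLI exits non-zero when an import fails. The name run_logged is hypothetical.

```shell
# Hypothetical helper: run each line of a command file in turn and
# append any line that exits non-zero to a failure log.
run_logged() {
  commands_file="$1"
  failed_log="$2"
  while IFS= read -r cmd; do
    # Ignore blank lines
    [ -n "$cmd" ] || continue
    # </dev/null stops the command from consuming the rest of the file
    sh -c "$cmd" < /dev/null || printf '%s\n' "$cmd" >> "$failed_log"
  done < "$commands_file"
}
```

Run it from the root of your blog as run_logged commands.txt failed.txt. The log is itself a list of commands, so it can be fed back in the same way once the cause is fixed.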

Your media library will now be full! All images will be correctly associated with their posts.
