Import Images From A Migrated WordPress
Here's how to solve a common WordPress problem.
I want to re-import all my blog's images into the media library. I've moved my blog to a new host but kept the same domain name. I started with a fresh WordPress install; the uploads folder still has all my images, but WordPress can't see them.
None of the plugins I found worked with huge numbers of images spread across multiple directories.
One Line Fix
This uses the WP-CLI program. Install it on your server (instructions are at https://wp-cli.org/), then, from the root of your blog, run:
./wp-cli.phar media import wp-content/uploads/2017/08/example.jpg --post_id=123 --skip-copy
That imports the image, attaches it to a post, and keeps it on disk in the same location.
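If you'd rather not fire off an import blind, a small bash wrapper can check that the file really is on disk first, since --skip-copy registers the file where it already sits instead of copying it into place. This is a minimal sketch, assuming you run it from your blog's root; import_image, WP_CLI and DRY_RUN are names made up for illustration, not part of WP-CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: import one image that already sits under wp-content/uploads,
# attach it to a post, and leave the file where it is (--skip-copy).
# Assumes it is run from the blog root; WP_CLI defaults to ./wp-cli.phar.
WP_CLI="${WP_CLI:-./wp-cli.phar}"

import_image() {
  local path="$1" post_id="$2"
  # --skip-copy only makes sense for a file that is already on disk.
  if [ ! -f "$path" ]; then
    echo "missing: $path" >&2
    return 1
  fi
  # With DRY_RUN=1, print the command instead of running it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$WP_CLI media import $path --post_id=$post_id --skip-copy"
  else
    "$WP_CLI" media import "$path" --post_id="$post_id" --skip-copy
  fi
}
```

For example, DRY_RUN=1 import_image wp-content/uploads/2017/08/example.jpg 123 prints the command it would run, and drop the DRY_RUN=1 to actually import.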
Easy! But how do we get all the images, spanning a thousand blog posts?
Get All Post IDs
./wp-cli.phar post list --field=ID
Get the HTML of the post
./wp-cli.phar post get 123 --field=post_content
Extract the images
We can use PHP's DOMDocument to grab the images from the HTML.
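$html = "<html>....";
// Stop libxml printing warnings about HTML5 tags it doesn't recognise
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$images = $doc->getElementsByTagName('img');
foreach ($images as $img) {
    echo $img->getAttribute('src') . "\n";
}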
And that spits out the URL of every image in the post.
Putting it all together
BACK UP YOUR DATABASE FIRST, YOU MUPPET!
ahem
This will be a mixture of PHP and bash scripts.
- Save all the post IDs to a file.
- Iterate through the IDs and save the HTML to separate files, named after the post ID.
- Go through each file, grab every image URL.
- If the image URL starts with "https://shkspr.mobi", add it to the media library and associate it with the post.
1 Save all the post IDs
./wp-cli.phar post list --field=ID > posts.txt
Takes a minute to run.
2 Generate the HTML
mkdir -p test && while read -r id ; do ./wp-cli.phar post get "$id" --field=post_content > "test/$id.html" ; done < posts.txt
Takes a few minutes to run.
3 Grab images
Within the test directory, create commands.php:
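<?php
// Run from the test directory, which holds one <post ID>.html file per post
$start = "https://shkspr.mobi/blog/";
// Keep libxml's warnings about HTML5 tags out of commands.txt
libxml_use_internal_errors(true);
foreach (glob("*.html") as $file) {
    $id = basename($file, ".html");
    $html = file_get_contents($file);
    // Skip empty posts: loadHTML() refuses an empty string
    if (trim($html) === "") continue;
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    foreach ($doc->getElementsByTagName('img') as $img) {
        $imgSrc = $img->getAttribute('src');
        // Only images hosted on this blog
        if (substr($imgSrc, 0, strlen($start)) === $start) {
            // Strip the domain, leaving a path relative to the blog root
            $imgLocation = substr($imgSrc, strlen($start));
            echo "./wp-cli.phar media import {$imgLocation} --post_id={$id} --skip-copy\n";
        }
    }
}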
This can be run as php commands.php > commands.txt
After a few seconds, we have a text file full of lines like these:
./wp-cli.phar media import wp-content/uploads/2000/11/Pratchett-Truth.jpg --post_id=24322 --skip-copy
./wp-cli.phar media import wp-content/uploads/1999/12/tezza.jpg --post_id=24353 --skip-copy
./wp-cli.phar media import wp-content/uploads/1999/12/frontpag.gif --post_id=24353 --skip-copy
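Before committing to an overnight run, it may be worth checking that every path in commands.txt actually exists, in case some files didn't make it across to the new host. A rough bash sketch, assuming every line has exactly the shape shown above (so the path is the fourth space-separated field) and that you run it from the blog root; check_paths is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report any image path in commands.txt that
# does not exist on disk. Assumes each line looks like
#   ./wp-cli.phar media import <path> --post_id=<id> --skip-copy
# so <path> is the 4th whitespace-separated field, and that it runs from the blog root.
check_paths() {
  local commands_file="${1:-commands.txt}"
  awk '{ print $4 }' "$commands_file" | while IFS= read -r path; do
    [ -z "$path" ] && continue   # ignore blank lines
    [ -f "$path" ] || echo "missing: $path"
  done
}
```

Paths containing spaces would be split by awk here, just as bash would split them when the commands run.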
To run all the commands:
- Move commands.txt into the root of your blog.
- Run bash commands.txt
This will take a long time, depending on how many commands you have. Mine ran overnight.
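Since bash commands.txt carries on past any command that fails, errors can scroll by unnoticed during a long run. One way to keep track, sketched in bash under the assumption that commands.txt holds one command per line, is to run each line separately and write the failures to a second file. run_commands and failed.txt are illustrative names.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to `bash commands.txt`: run each command in turn
# and record the ones that fail so they can be retried later.
run_commands() {
  local commands_file="${1:-commands.txt}" failed_file="${2:-failed.txt}"
  local total=0 failed=0
  : > "$failed_file"   # start with an empty failure log
  while IFS= read -r cmd; do
    [ -z "$cmd" ] && continue
    total=$((total + 1))
    # </dev/null stops the command from swallowing the rest of commands_file.
    if ! bash -c "$cmd" < /dev/null; then
      failed=$((failed + 1))
      printf '%s\n' "$cmd" >> "$failed_file"
    fi
  done < "$commands_file"
  echo "ran $total commands, $failed failed"
}
```

Running run_commands commands.txt failed.txt from the blog root would then let you retry just the failures with run_commands failed.txt retry.txt. Use a different output file each time, because the log is emptied at the start.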
Your media library will now be full! All images will be correctly associated with their posts.
Henry says:
This seems to work only through your eyes, with your knowledge. You should write a real step-by-step guide. I understand the purpose of the post, but couldn't do anything. I stopped at "./wp-cli.phar: No such file or directory".
@edent says:
Hi Henry, did you click on the "Install WP-CLI" link? It takes you to https://wp-cli.org/ which has all the instructions for installing it.
I didn't include those instructions because they depend on the server you are using and, as this blog post is over two years old, I didn't want them getting outdated.