Screenscraping Album Artwork From The Linux Command Line

Like many people, I've collected a fair number of CDs over the years. As hard drives and MicroSD cards have got larger and cheaper, I've gradually been ripping them to FLAC. Most CD rippers automatically tag the music files with the correct metadata and, nowadays, they will also download and embed album artwork.

(As an aside, it always boggled my mind that CDs don't come with metadata burned onto the disc. Even a single spare megabyte would be enough to hold a detailed track listing, artwork, etc.)

Back when I started, there was no way to get album artwork. Most media players will recognise that if a .jpg is in a folder with music, then it should be treated as the album artwork. This file is usually called "cover.jpg" or "albumart.jpg" - but that's only convention; any name will do.

So, rather than re-rip all my CDs, I wrote a quick bash script to scrape the images from the web. First the script, and then some notes about the choices I made when writing it.

#!/bin/bash -e
# This simple script will fetch the cover art for the album information provided on the command line.
# It will then download that cover image, and place it into the child directory.
# Usage examples:
# get_coverart "Beatles/Sgt Pepper"
# get_coverart Beatles/Sgt_Pepper
# get_coverart "Beatles - Sgt_Pepper"
# To auto-populate all directories in the current directory, run the following command
# find . -type d -exec ./get_coverart "{}" \;

# The album directory is the first argument
albumpath="$1"

# Escape any problematic characters
encoded="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$albumpath")"

# Skip if a cover.jpg exists in the directory
if [ -f "$albumpath/cover.jpg" ]
then
    echo "$albumpath/cover.jpg already exists"
    exit 0
fi

# Tell the user what is going on
echo ""
echo "Searching for: [$1]"

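# scraping
# SEARCH_URL_PREFIX is a placeholder (assumption): set it to the search page URL
# that the encoded query should be appended to.
url="${SEARCH_URL_PREFIX:?set SEARCH_URL_PREFIX to the search page URL}${encoded}"
echo "Searching ... [$url]"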

# Grab the first Amazon image without an underscore (usually the largest version)
coverurl=`wget -qO - "$url" | grep -m 1 -o '*/[%0-9a-zA-Z.,-]*.jpg'`

echo "Cover URL: [$coverurl]"

# Save the image
wget "$coverurl" -O "$albumpath/cover.jpg"
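
The escaping step relies on Perl's URI::Escape module. Where that module isn't installed, a pure-bash fallback is possible. The sketch below is an assumption rather than part of the original script: `urlencode` is a hypothetical helper name, and it percent-encodes everything except letters, digits and `-`, `.`, `_`, `~`.

```shell
#!/bin/bash
# urlencode: hypothetical helper, not part of the original script.
# Percent-encodes every byte except letters, digits and "-", ".", "_", "~".
urlencode() {
    local LC_ALL=C                 # work on bytes rather than multibyte characters
    local input="$1" out="" c i
    for (( i = 0; i < ${#input}; i++ )); do
        c="${input:i:1}"
        case "$c" in
            [a-zA-Z0-9.~_-]) out+="$c" ;;                   # safe: keep as-is
            *) printf -v c '%%%02X' "'$c"; out+="$c" ;;     # everything else: %XX
        esac
    done
    printf '%s\n' "$out"
}

urlencode "Beatles/Sgt Pepper"     # prints Beatles%2FSgt%20Pepper
```

The result could be assigned to `encoded` in place of the Perl call, at the cost of a few more lines of script.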


I originally suggested this as an enhancement for the popular ABCDE Linux ripper.
It's based on an older, now obsolete, script.

The script grabs Amazon images - why not just use the Amazon API?

The Amazon API is great, but it requires that you get an account with Amazon and include an API key with every request. That means you can't just dump the script on a box and start downloading - you'd need to configure it first.

Why the change from XPATH?

I love XPATH and use it regularly. What I found when deploying this script to a new Ubuntu install was that xmllint wasn't installed by default. On the other hand, grep is installed on every machine. Seeing as how the Amazon images are a fixed pattern, a regular expression works just fine.
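
To illustrate the grep approach: `-o` prints only the text that matched, and `-m 1` stops after the first matching line. The HTML and the `images.example.com` host below are made up for the demonstration; the real pattern would be anchored on the actual image host.

```shell
#!/bin/bash
# Made-up HTML standing in for a search results page (assumption).
html='<img src="http://images.example.com/I/41ZyXwVuTsL._SL160_.jpg">
<img src="http://images.example.com/I/51AbCdEfGhL.jpg">'

# -o prints only the matched text; -m 1 stops after the first matching line.
# The character class contains no underscore, so the resized variant on the
# first line ("._SL160_.jpg") does not match and the second line is picked.
coverurl="$(printf '%s\n' "$html" |
    grep -m 1 -o 'http://images\.example\.com/I/[%0-9a-zA-Z.,-]*\.jpg')"

echo "$coverurl"    # prints http://images.example.com/I/51AbCdEfGhL.jpg
```

Note that `-m 1` counts matching lines, not matches: if the first matching line contained several image URLs, `-o` would print all of them.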

The script automatically downloads the first image it finds. As this is a command-line tool, there's no practical way to display the various candidate images.
I did look at ASCII art conversion, but that's problematic.
Some albums work well, e.g. Little Mix's DNA, whereas Sgt Pepper is hard to make out.

There are amazing tools like aview - but again, that's an extra program which the user might not have.

If your album directories are sensibly named, the first hit is usually good enough.

Hang on! There's a mistake!

Quite probably; this is a quick and dirty script. I'm sure there are lots of edge cases and (no doubt) some poor coding practices. If you wish to contribute a patch, please drop it in the comments.

9 thoughts on “Screenscraping Album Artwork From The Linux Command Line”

  1. says:

    I use beets, which does almost all the management of my 80+GB music library. It tags, fetches art, manages the directory structure and is totally customizable.

    Your script works well, there's an <relative> at the last line.

    1. I've started using beets as well. Bit of a pain to configure, but once it works it's incredible. I have found that it does miss some album art though - hence the script.

      Thanks for the correction - now amended.

  2. The search doesn't like it if you send a query that contains parentheses. For example, "Devo-Recombo_DNA_(disc_1_of_2_-_Sequence_A)" won't yield any results. Abcde names directories like this for multi-disc packages. An apparently decent way around this is to lop off the string submitted to albumart at the first paren encountered. Another enhancement is to refrain from writing a cover.jpg file when no cover image was found. Here's a sloppy diff-ish thing of what I'm talking about:

    +title=`echo "$albumpath" | cut -d'(' -f1`
    -encoded="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$albumpath")"
    +encoded="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$title")"
    -echo "Searching for: [$1]"
    +echo "Searching for: [$title]"
    +if [ -z "$coverurl" ] ; then
    + echo "Unable to find cover art for $title."
    + exit
    +fi
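
The same trimming can be done without spawning `cut`, using bash parameter expansion. This is only a sketch of the idea in the comment above, not a drop-in patch: `strip_disc_suffix` is a hypothetical name, and removing the trailing separator is an extra step that `cut` alone would not do.

```shell
#!/bin/bash
# strip_disc_suffix: hypothetical helper. Drops everything from the first "("
# onwards, then any trailing "_" or "-" left behind (an addition to the cut approach).
strip_disc_suffix() {
    local title="${1%%(*}"              # remove the longest suffix starting with "("
    while [[ "$title" == *[_-] ]]; do   # tidy up trailing separators
        title="${title%?}"
    done
    printf '%s\n' "$title"
}

strip_disc_suffix "Devo-Recombo_DNA_(disc_1_of_2_-_Sequence_A)"   # prints Devo-Recombo_DNA
```

Names without a parenthesis pass through unchanged, so the helper is safe to apply to every directory.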

  3. Oli says:

    A slightly different concept ...


    if [ $# != 1 ]
    then
        echo "Need Path for Search!"
        exit 1
    fi

    cd "$1" || exit 1
    find . -maxdepth 2 -mindepth 2 -type d | while read -r dir
    do
        # Assumption: search on the artist/album path without the leading "./"
        albumpath="${dir#./}"

        # Escape any problematic characters
        encoded="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$albumpath")"

        # Skip if a cover.jpg exists in the directory
        if [ -f "$albumpath/cover.jpg" ]
        then
            echo "$albumpath/cover.jpg already exists"
            continue
        fi

        # Tell the user what is going on
        echo ""
        echo "Searching for: [$albumpath]"

        # scraping
        # SEARCH_URL_PREFIX is a placeholder (assumption) for the search page URL
        url="${SEARCH_URL_PREFIX:?set SEARCH_URL_PREFIX to the search page URL}${encoded}"
        echo "Searching ... [$url]"

        # Grab the first Amazon image without an underscore (usually the largest version)
        coverurl=`wget -qO - "$url" | grep -m 1 -o '*/[%0-9a-zA-Z.,-]*.jpg'`

        if [ "x" == "x$coverurl" ]
        then
            # Nothing found: retry with the parent directory (the artist) only
            albumpath=`dirname "$albumpath"`
            echo "Retrying with '$albumpath'"
            encoded="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$albumpath")"
            url="${SEARCH_URL_PREFIX}${encoded}"
            coverurl=`wget -qO - "$url" | grep -m 1 -o '*/[%0-9a-zA-Z.,-]*.jpg'`
        fi

        if [ "x" != "x$coverurl" ]
        then
            echo "Cover URL: [$coverurl]"
            # Save the image
            wget "$coverurl" -O "$dir/cover.jpg" 2> /dev/null
        fi
    done

  4. Will says:

    I had to use the following URL format for it to work today:

  5. Will says:

    And to get the above format with +'s in it, I found it useful to do this (since all my directory names use _ not spaces)


    # Split albumpath into artist and album

    # I need to replace the underscores with a '+' character

    echo "artist: $artist"
    echo "album: $album"


    # Scrape

    echo "Searching ... [$url]"
    coverurl=`wget -qO - "$url" | grep -m 1 -o '*/[%0-9a-zA-Z.,-]*.jpg'`

    echo "Cover URL: [$coverurl]"

    # Save the image jpg file
    wget "$coverurl" -O "$DEST_DIR/cover.jpg"

    And running it like this: james_morrison/undiscovered

    gives this output:
    artist: james+morrison
    album: undiscovered

    Searching for album art for: [james_morrison/undiscovered]
    Searching for: [james+morrison+undiscovered]
    Searching ... []
    Cover URL: []
    --2014-12-06 09:48:32--
    Resolving (,,, ...
    Connecting to (||:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 35448 (35K) [image/jpeg]
    Saving to: ‘/home/music/flac/james_morrison/undiscovered/cover.jpg’

    100%[======================================>] 35,448 --.-K/s in 0.02s

    2014-12-06 09:48:32 (1.85 MB/s) - ‘/home/music/flac/james_morrison/undiscovered/cover.jpg’ saved [35448/35448]

    And finally, to get cover art for ALL my music in one fell swoop, I ran this:

    for i in *; do for j in $i/*; do $j; done; done
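
A more defensive take on that loop quotes its variables, so directory names containing spaces survive, and skips anything that isn't a directory. In this sketch `build_query` and `process_album` are hypothetical names: `process_album` only prints the query and stands in for a call to the actual scraping script. The artist/album layout and the underscore-to-plus substitution follow the output shown above.

```shell
#!/bin/bash
# build_query: hypothetical helper. Turns "artist/album" into "artist+album",
# replacing underscores with "+".
build_query() {
    local artist="${1%%/*}"     # text before the first "/"
    local album="${1#*/}"       # text after the first "/"
    artist="${artist//_/+}"
    album="${album//_/+}"
    printf '%s+%s\n' "$artist" "$album"
}

# process_album: placeholder for the real scraping script (assumption).
process_album() {
    echo "Would search for: [$(build_query "$1")]"
}

# Walk artist/album directories two levels deep, relative to the current directory.
for artist_dir in */; do
    [ -d "$artist_dir" ] || continue          # glob matched nothing
    for album_dir in "$artist_dir"*/; do
        [ -d "$album_dir" ] || continue       # artist directory without albums
        process_album "${album_dir%/}"
    done
done
```

Run from the top of a music library containing `james_morrison/undiscovered`, this would report a search for `james+morrison+undiscovered`.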

  6. You may be interested to see that abcde now has the capability to download album art, I apologise for not using the approach that you suggested! The eventual successful patches came from the same thread on GoogleCode where your patch was suggested...

    Currently available only in the git version but it will go mainstream when 2.6.1 is released. The abcde FAQ in git has some detailed information on how to get it all working although sane defaults should guarantee a good result anyway. I am planning a web page with more detailed information, this will come out in a week or so...

