Screenscraping Album Artwork From The Linux Command Line


Like many people, I've collected a fair number of CDs over the years. As hard drives and MicroSD cards have got larger and cheaper, I've gradually been ripping them to FLAC. Most CD rippers automatically tag the music files with the correct metadata and, nowadays, they will also download and embed album artwork.

(As an aside, it always boggled my mind that CDs don't come with metadata burned onto the disc. Even a single spare megabyte would be enough to hold detailed track listing, artwork, etc.)

Back when I started, there was no way to get album artwork. Most media players will recognise that if a .jpg is in a folder with music, then it should be treated as the album artwork. This file is usually called "cover.jpg" or "albumart.jpg" - but that's only convention; any name will do.

So, rather than re-rip all my CDs, I wrote a quick bash script to scrape the images from albumart.org. First the script, and then some notes about the choices I made when writing it.

#!/bin/bash -e
# get_coverart.sh
#
# This simple script will fetch the cover art for the album information provided on the command line.
# It will then download that cover image, and place it into the child directory.
#
# ./get_coverart.sh <relative-path>
#
# ./get_coverart.sh "Beatles/Sgt Pepper"
#
# ./get_coverart.sh Beatles/Sgt_Pepper
#
# ./get_coverart.sh "Beatles - Sgt_Pepper"
#
# To auto-populate all directories in the current directory, run the following command
#
# find . -type d -exec ./get_coverart.sh "{}" \;
albumpath="$1"

# Escape any problematic character
encoded="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$albumpath")"

# Skip if a cover.jpg exists in the directory
if [ -f "$albumpath/cover.jpg" ]
then
echo "$albumpath/cover.jpg already exists"
exit
fi

# Tell the user what is going on
echo ""
echo "Searching for: [$1]"

# scraping AlbumArt.org
url="http://www.albumart.org/index.php?skey=$encoded&itempage=1&newsearch=1&searchindex=Music"
echo "Searching ... [$url]"

# Grab the first Amazon image without an underscore (usually the largest version)
coverurl=`wget -qO - "$url" | grep -m 1 -o 'http://ecx.images-amazon.com/images/I/*/[%0-9a-zA-Z.,-]*.jpg'`

echo "Cover URL: [$coverurl]"

# Save the image
wget "$coverurl" -O "$albumpath/cover.jpg"
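
The escaping step above leans on Perl's URI::Escape module. On a box where that module is missing, a pure-bash stand-in along the following lines might do. This is only a sketch: `urlencode` is a hypothetical helper that is not part of the original script, it assumes plain ASCII directory names, and it makes no claim to match `uri_escape` character for character.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): percent-encode a string.
# Assumption: the input is plain ASCII; other bytes are not handled here.
urlencode() {
    local LC_ALL=C            # keep character ranges and lengths byte-oriented
    local input="$1" out="" ch hex i
    for (( i = 0; i < ${#input}; i++ )); do
        ch="${input:i:1}"
        case "$ch" in
            # Unreserved characters pass through untouched
            [A-Za-z0-9._~-]) out+="$ch" ;;
            # Everything else becomes %XX (the leading quote tells printf
            # to use the character's numeric code)
            *) printf -v hex '%%%02X' "'$ch"; out+="$hex" ;;
        esac
    done
    printf '%s\n' "$out"
}

urlencode "Beatles/Sgt Pepper"
```

With that in place, the `encoded=` line could call `urlencode "$albumpath"` instead of Perl, at the cost of a few more lines of shell.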

Notes

I originally suggested this as an enhancement for the popular ABCDE Linux ripper. It's based off this older, now obsolete, script.

AlbumArt.org uses images from Amazon.com - why not just use the Amazon API?

The Amazon API is great but it requires that you get an account with Amazon and include an API key with every request. That means you can't just dump the script on a box and start downloading - you'd need to configure it first.

Why the change from XPATH?

I love XPATH and use it regularly. When I deployed this script to a new Ubuntu install, though, I found that xmllint wasn't installed by default. grep, on the other hand, is installed on practically every machine. Since the Amazon image URLs follow a fixed pattern, a regular expression works just fine.
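
To see what that regular expression picks out, here is a small offline sketch. The `extract_cover_url` wrapper and the sample markup are made up for illustration; only the grep pattern comes from the script. The character class has no underscore in it, which is why thumbnail names such as `._SL160_.jpg` are skipped.

```shell
#!/bin/bash
# Hypothetical wrapper: read HTML on stdin, print the first full-size cover URL.
extract_cover_url() {
    grep -m 1 -o 'http://ecx.images-amazon.com/images/I/*/[%0-9a-zA-Z.,-]*.jpg'
}

# Made-up sample markup standing in for a real results page.
sample_html='<img src="http://ecx.images-amazon.com/images/I/51SAKEc0HEL._SL160_.jpg">
<a href="http://ecx.images-amazon.com/images/I/51SAKEc0HEL.jpg">Full size</a>'

# The thumbnail on the first line is ignored; the second line matches.
printf '%s\n' "$sample_html" | extract_cover_url
```

Run against the sample, this prints only the second URL, the one without the underscore suffix.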

What if there are multiple results from a search?

This will automatically download the first one. As this is a command line tool, there's no practical way to display the various images. I did look at ASCII art conversion, but that's problematic. Some albums work well - e.g. Little Mix's DNA:

.=====+++O887?+++.+===~~INMNMMMMMN?~==~.
.=7I+=~~NMM8ND$=+.====~OMMMMMMMMMMMD~=~.
.~=+=ONMD8ZNNN88I.~~~~:MMMM=?MMMMMMMN=~.
.~==ZNN+:,,,N8ODO.:~~~=NMMZZ:$MMMMMMN=~.
.===NZNII..+O88ONI.=~:=,DM+$::7NMMMMN:~.
.+=7DD8,::~,,DD88O.OOIZOMMM~=7.+MMND+~,:
.=+8DND+,:~,=DD8DD~,8DZ+:NMZ?~?8MMMNNMO~
.==DNNNNI=++NDDNNDZ$MNOOOON7++IMMM8?==~M
.+?NMNNN78N8MNNNNNOI?+???I?I+=+IM8$+++I:
.=NMMMNDN==MNNMNMN$=+?8$I+?+==+???III==+
.8NMMMNN?=+M$IIII:I,77II7D?+~~=??:ONMNZ?
.MNMNMMZ7~+MO7~II:7=7???77$,......,:??=M
.MMMM8=.......,~=~=.I:I,7,I=+==+=====+=.
...,~=?8D878DO+~~~=?~=7++NNM8NNMD===+==.
.==~=~DD8DNNON8+7I=+?=78ND~:::~NNNI===~.
.==~~~NI:,,:I88+:~~~.~=8M?:,?,=+NOM7+=~.
.=~=~~8~,...,?8~~~~~.+=MDND:,I::8MMZ=+=.
.==~~~ZDD+.+7+$~~~~~,,~MM7:~=.,=DNNMI+=.
.~~~~:?:,~,,,7=:~::~~.+NMMI7:~:~NDNNN+=.
.~~==~:I,,,,~D::~:~~=.DMDNNI::O+NN8DN+=.
.====~=:,,,~N~:::~~==.N8NDDN+=~+DOODN$=.
.+====~D+,M~~,.,:====.ND8NNN+::=DD8OND=.
.??,++++I~,?+:,,:::~+87D8DNM:::~MNODND=.
.?=,.~=~,+,$+?+??++~IMNNNMNN$,,,MNNMNNOI
..$+~~=.:~,Z====++++7MMNMNM,~,,,MMNNNMM.

Whereas Sgt Pepper is hard to make out.

:::::::::::::::::::::::~~~~~~::::::,,,,,
:,:::::,~::==:::O7~~:~8~=~~=Z+~:I:::,,,.
~+.:+8~D7:::~+8??:==I:O?~$:~$ZI=7IO:,,~:
:I$=,N$=:8O7?7+I?=.8?78OI:~+I~7O$Z$O+?$:
,O$.~D8$~~Z+ZD7$$DOO=8ZZ?OI8?:D7=887Z8Z?
~87+Z$+:~8+,:OD8O~=NDD7OO$+Z8D:ZDDDD88Z?
$O88Z8?DDD?ZZN+ODI,D?88D887$7=D8D+D8O8Z=
,7ID=88II7IOZ7ZZZ7$:===$77~I:~88ZIIO88$?
~..DDDDDDDDOIN=7$7~?I?=OIZ$I:$Z7+?Z+88ZI
Z+,8DDDDDDD+D7O?8ZZ777?8Z8I==OD~8~D$7OZ?
,,D8DDDDNDDDO8Z=$Z8+IZDOZ+ZI+D?88+78O8~=
I?ZDDDDNDN8N=+O+7II$7?ZO+Z$7I8DDOZ?DDII=
=~DDDONNNNNDZZOZZI+?Z+87OOZOZ$8D888D8+II
+=D8NNDDDNDD8N+Z,?OZ~?~88ODOZ+ODDD88DI:?
~~DDDDD8DNDDDO~??:,+7~=DD8DNDDD8788:D,::
~=D8==8DDNNNDO=$=?O$~I:DDO$?DDDDDDD=?OZ+
OD8DZONNDNNDNNN=~7~Z+,DNN8?$DNZZ?DDDO8$7
7O=O8:D78DZNZZN77$7OD8?7$I7=?~D$O8O~~Z7I
=Z$O88D$N$O8N88ZZZ$ON8NDNNZO7ON$7OD8?Z$~
IZD8ZOD8O$OOD$8DDNZNN$DDNN$8ZDNDO$DDZ8OI
7ZDO8ODDO8DOD8N88$ZNDDNDNN8DD8N8$OOZ8O$?
IZDNNNNN8NNNNNNNMZ7N=+Z7O=O7ZI:~DD8DD8OI
Z8DND+$Z+OD+NNNDN8DNDNNNN++I+$=ZDDDDDD8?
ONNDDNO8=D8NNNNDDZD8~NNNONNNNNN8NDDDDDD$
8D88ZNNNOONNNNNDDD$ODNDD8ZMNNDN8D8NNDDDI

There are amazing tools like aview - but again, that's an extra program which the user might not have.

If your album directories are sensibly named, the first hit is usually good enough.

Hang on! There's a mistake!

Quite probably. This is a quick and dirty script. I'm sure there are lots of edge cases and (no doubt) some poor coding practices. If you wish to contribute a patch, please drop it in the comments.

9 thoughts on “Screenscraping Album Artwork From The Linux Command Line”

2013-04-22 15:42

thameera says:

I use beets (http://beets.radbox.org/) which does almost all the management of my 80+GB music library. It tags, fetches art, manages the directory structure and is totally customizable.

Your script works well, there's an <relative> at the last line.

1. 2013-04-23 09:04
  
  Terence Eden says:
  
  I've started using beets as well. Bit of a pain to configure, but once it works it's incredible. I have found that it does miss some album art though - hence the script.
  
  Thanks for the correction - now amended.
  
2014-03-17 08:38

David Griffith says:

Albumart.org doesn't like it if you send a query that contains parentheses. For example: "Devo-Recombo_DNA_(disc_1_of_2_-_Sequence_A)" won't yield any results. Abcde names directories like this for multi-disc packages. An apparently decent way around this is to lop off the string submitted to albumart at the first paren encountered. Another enhancement is to refrain from writing a cover.jpg file when it didn't find a cover image. Here's a sloppy diff-ish thing of what I'm talking about:

 albumpath="$1"
+title=`echo $albumpath | cut -d'(' -f1`
 ...
-encoded="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$albumpath")"
+encoded="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$title")"
 ...
-echo "Searching for: [$1]"
+echo "Searching for: [$title]"
 ...
+if [ -z "$coverurl" ] ; then
+    echo "Unable to find cover art for $title."
+    exit
+fi
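
For anyone wanting to try the two suggestions in this comment on their own, they can be sketched as a pair of small functions. This is not the commenter's code: `search_title` and `save_cover` are hypothetical names, and the parameter expansion is swapped in for the `cut` call.

```shell
#!/bin/bash
# Hypothetical helper: drop everything from the first "(" onwards.
search_title() {
    printf '%s\n' "${1%%\(*}"
}

# Hypothetical helper: only download when a cover URL was actually found,
# so that no empty cover.jpg is left behind.
save_cover() {
    local coverurl="$1" dest="$2"
    if [ -z "$coverurl" ]; then
        echo "Unable to find cover art." >&2
        return 1
    fi
    wget -q "$coverurl" -O "$dest/cover.jpg"
}

search_title "Devo-Recombo_DNA_(disc_1_of_2_-_Sequence_A)"
```

The last line prints `Devo-Recombo_DNA_`, which is the string that would then be escaped and sent to the search page.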


2014-08-27 20:47

Oli says:

A little different concept ...

#!/bin/bash

if [ $# != 1 ]
then
    echo "Need Path for Search!"
    exit 1
fi

cd $1
find . -maxdepth 2 -mindepth 2 -type d | while read -r dir
do
albumpath="$dir"

# Escape any problematic character
encoded="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$albumpath")"

# Skip if a cover.jpg exists in the directory
if [ -f "$albumpath/cover.jpg" ]
then
    echo "$albumpath/cover.jpg already exists"
    continue
fi

# Tell the user what is going on
echo ""
echo "Searching for: [$albumpath]"

# scraping AlbumArt.org
url="http://www.albumart.org/index.php?searchkey=$encoded&itempage=1&newsearch=1&searchindex=Music"
echo "Searching ... [$url]"

# Grab the first Amazon image without an underscore (usually the largest version)
coverurl=`wget -qO - "$url" | grep -m 1 -o 'http://ecx.images-amazon.com/images/I/*/[%0-9a-zA-Z.,-]*.jpg'`

if [ "x" == "x$coverurl" ]
then
    albumpath=`dirname "$albumpath"`
    echo "Retrying with '$albumpath'"
    encoded="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$albumpath")"
    url="http://www.albumart.org/index.php?searchkey=$encoded&itempage=1&newsearch=1&searchindex=Music"
    coverurl=`wget -qO - "$url" | grep -m 1 -o 'http://ecx.images-amazon.com/images/I/*/[%0-9a-zA-Z.,-]*.jpg'`
fi

if [ "x" != "x$coverurl" ]
then
    echo "Cover URL: [$coverurl]"
    # Save the image
    wget "$coverurl" -O "$dir/cover.jpg" 2> /dev/null
fi

done

2014-12-05 17:24

Will says:

I had to use the following URL format for it to work today: url="http://www.albumart.org/index.php?searchk=abba+gold&itempage=1&newsearch=1&searchindex=Music"

2014-12-06 10:52

Will says:

And to get the above format with +'s in it, I found it useful to do this (since all my directory names use _ not spaces)

GUTS OF MY SCRIPT

albumpath="$1"

# Split albumpath into artist and album
artist_input="${albumpath%/*}"
album_input="${albumpath##*/}"

# I need to replace the underscores with a '+' character
artist=${artist_input//[_]/+}
album=${album_input//[_]/+}

echo "artist: $artist"
echo "album: $album"

search_terms="$artist+$album"

# Scrape AlbumArt.org
url="http://www.albumart.org/index.php?searchk=$search_terms&itempage=1&newsearch=1&searchindex=Music"

echo "Searching ... [$url]"
coverurl=`wget -qO - "$url" | grep -m 1 -o 'http://ecx.images-amazon.com/images/I/*/[%0-9a-zA-Z.,-]*.jpg'`

echo "Cover URL: [$coverurl]"

# Save the image jpg file
wget "$coverurl" -O "$DEST_DIR/cover.jpg"

RUNNING IT

And running it like this:

GetCoverArt.sh james_morrison/undiscovered

gives this output:

artist: james+morrison
album: undiscovered

Searching for album art for: [james_morrison/undiscovered]
Searching for: [james+morrison+undiscovered]
Searching ... [http://www.albumart.org/index.php?searchk=james+morrison+undiscovered&itempage=1&newsearch=1&searchindex=Music]
Cover URL: [http://ecx.images-amazon.com/images/I/51SAKEc0HEL.jpg]
--2014-12-06 09:48:32-- http://ecx.images-amazon.com/images/I/51SAKEc0HEL.jpg
Resolving ecx.images-amazon.com (ecx.images-amazon.com)... 54.230.198.142, 54.230.199.201, 54.230.199.90, ...
Connecting to ecx.images-amazon.com (ecx.images-amazon.com)|54.230.198.142|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 35448 (35K) [image/jpeg]
Saving to: ‘/home/music/flac/james_morrison/undiscovered/cover.jpg’

100%[======================================>] 35,448 --.-K/s in 0.02s

2014-12-06 09:48:32 (1.85 MB/s) - ‘/home/music/flac/james_morrison/undiscovered/cover.jpg’ saved [35448/35448]

DO ALL MUSIC

And finally, to get cover art for ALL my music in one fell swoop, I ran this:

for i in *; do for j in $i/*; do GetCoverArt.sh $j; done; done
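
That one-liner works when no directory name contains a space. For libraries where some do, a quoted variant might look like the sketch below. `for_each_album` is a hypothetical helper, not part of the comment above, and it assumes the same artist/album two-level layout.

```shell
#!/bin/bash
# Hypothetical helper: run a command once per artist/album directory under
# a music root, passing the album path as the final argument.
for_each_album() {
    local root="$1"; shift
    local artist album
    for artist in "$root"/*/; do
        for album in "$artist"*/; do
            # An unmatched glob stays literal, so skip anything that is
            # not a real directory
            [ -d "$album" ] || continue
            "$@" "${album%/}"
        done
    done
}

# Dry run from the music root: print each command instead of running it
# for_each_album . echo GetCoverArt.sh
```

Dropping the `echo` from the usage line would run the script for real on every album directory.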

2015-05-10 04:36

Andrew Strong says:

You may be interested to see that abcde now has the capability to download album art, I apologise for not using the approach that you suggested! The eventual successful patches came from the same thread on GoogleCode where your patch was suggested...

Currently available only in the git version but it will go mainstream when 2.6.1 is released. The abcde FAQ in git has some detailed information on how to get it all working although sane defaults should guarantee a good result anyway. I am planning a web page with more detailed information, this will come out in a week or so...

1. 2015-05-10 11:26
  
  Terence Eden says:
  
  Brilliant news! Thanks Andrew 🙂
  
2015-05-22 02:33

Andrew says:

OK the preliminary web page is done:

abcde: Downloading Album Art... http://www.andrews-corner.org/getalbumart.html

Still a little fine tuning to do but it should definitely get the word out that abcde is ready for album art 🙂
