Screenscraping Album Artwork From The Linux Command Line
Like many people, I've collected a fair number of CDs over the years. As hard drives and MicroSD cards have got larger and cheaper, I've gradually been ripping them to FLAC. Most CD rippers automatically tag the music files with the correct metadata and, nowadays, they will also download and embed album artwork.
(As an aside, it always boggled my mind that CDs don't come with metadata burned onto the disc. Even a single spare megabyte would be enough to hold a detailed track listing, artwork, etc.)
Back when I started, there was no way to get album artwork. Most media players will recognise that if a .jpg is in a folder with music, then it should be treated as the album artwork. This file is usually called "cover.jpg" or "albumart.jpg" - but that's only convention; any name will do.
So, rather than re-rip all my CDs, I wrote a quick bash script to scrape the images from albumart.org. First the script, and then some notes about the choices I made when writing it.
#!/bin/bash -e
# get_coverart.sh
#
# This simple script will fetch the cover art for the album information provided on the command line.
# It will then download that cover image, and place it into the child directory.
#
# ./get_coverart.sh
#
# get_coverart Beatles/Sgt Pepper
#
# get_coverart Beatles/Sgt_Pepper
#
# get_coverart "Beatles - Sgt_Pepper"
#
# To auto-populate all directories in the current directory, run the following command
#
# find . -type d -exec ./get_coverart "{}" \;
albumpath="$1"
# Escape any problematic character
encoded="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$albumpath")"
# Skip if a cover.jpg exists in the directory
if [ -f "$albumpath/cover.jpg" ]
then
echo "$albumpath/cover.jpg already exists"
exit
fi
# Tell the user what is going on
echo ""
echo "Searching for: [$1]"
# scraping AlbumArt.org
url="http://www.albumart.org/index.php?skey=$encoded&itempage=1&newsearch=1&searchindex=Music"
echo "Searching ... [$url]"
# Grab the first Amazon image without an underscore (usually the largest version)
coverurl=`wget -qO - "$url" | grep -m 1 -o 'http://ecx.images-amazon.com/images/I/*/[%0-9a-zA-Z.,-]*.jpg'`
echo "Cover URL: [$coverurl]"
# Save the image
wget "$coverurl" -O "$albumpath/cover.jpg"
Notes
I originally suggested this as an enhancement for the popular ABCDE Linux ripper. It's based off this older, now obsolete, script.
AlbumArt.org uses images from Amazon.com - why not just use the Amazon API?
The Amazon API is great, but it requires that you get an account with Amazon and include an API key with every request. That means you can't just dump the script on a box and start downloading - you'd need to configure it first.
Why the change from XPATH?
I love XPATH and use it regularly. What I found when deploying this script to a new Ubuntu install was that xmllint wasn't installed by default. On the other hand, grep is installed on every machine. Seeing as how the Amazon images are a fixed pattern, a regular expression works just fine.
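To illustrate the grep approach, here is a minimal sketch run against a made-up snippet of HTML rather than a live page. The sample markup and the `extract_cover_url` function name are assumptions for illustration, and the pattern is a slightly tightened variant of the one in the script (the dots are escaped), not the exact original.

```shell
#!/bin/bash
# Hypothetical sample of search-result markup; a real page will look different.
sample_html='<a href="http://ecx.images-amazon.com/images/I/51SAKEc0HEL._SL160_.jpg">thumb</a>
<a href="http://ecx.images-amazon.com/images/I/51SAKEc0HEL.jpg">full</a>
<a href="http://ecx.images-amazon.com/images/I/41AAAAAAAAL.jpg">other</a>'

# extract_cover_url: read HTML on stdin and print the first Amazon image URL
# whose file name contains no underscore.
#   -o  prints only the matching part of the line
#   -m 1 stops after the first matching line
extract_cover_url() {
    grep -m 1 -o 'http://ecx\.images-amazon\.com/images/I/[%0-9a-zA-Z.,-]*\.jpg'
}

coverurl="$(printf '%s\n' "$sample_html" | extract_cover_url)"
echo "Cover URL: [$coverurl]"
```

Because the character class excludes the underscore, the `_SL160_` thumbnail on the first line is skipped and the plain `.jpg` on the second line is the one returned.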
What if there are multiple results from a search?
This will automatically download the first one. As this is a command line tool, there's no practical way to display the various images. I did look at ASCII art conversion, but that's problematic. Some albums work well - e.g. Little Mix's DNA
.=====+++O887?+++.+===~~INMNMMMMMN?~==~.
.=7I+=~~NMM8ND$=+.====~OMMMMMMMMMMMD~=~.
.~=+=ONMD8ZNNN88I.~~~~:MMMM=?MMMMMMMN=~.
.~==ZNN+:,,,N8ODO.:~~~=NMMZZ:$MMMMMMN=~.
.===NZNII..+O88ONI.=~:=,DM+$::7NMMMMN:~.
.+=7DD8,::~,,DD88O.OOIZOMMM~=7.+MMND+~,:
.=+8DND+,:~,=DD8DD~,8DZ+:NMZ?~?8MMMNNMO~
.==DNNNNI=++NDDNNDZ$MNOOOON7++IMMM8?==~M
.+?NMNNN78N8MNNNNNOI?+???I?I+=+IM8$+++I:
.=NMMMNDN==MNNMNMN$=+?8$I+?+==+???III==+
.8NMMMNN?=+M$IIII:I,77II7D?+~~=??:ONMNZ?
.MNMNMMZ7~+MO7~II:7=7???77$,......,:??=M
.MMMM8=.......,~=~=.I:I,7,I=+==+=====+=.
...,~=?8D878DO+~~~=?~=7++NNM8NNMD===+==.
.==~=~DD8DNNON8+7I=+?=78ND~:::~NNNI===~.
.==~~~NI:,,:I88+:~~~.~=8M?:,?,=+NOM7+=~.
.=~=~~8~,...,?8~~~~~.+=MDND:,I::8MMZ=+=.
.==~~~ZDD+.+7+$~~~~~,,~MM7:~=.,=DNNMI+=.
.~~~~:?:,~,,,7=:~::~~.+NMMI7:~:~NDNNN+=.
.~~==~:I,,,,~D::~:~~=.DMDNNI::O+NN8DN+=.
.====~=:,,,~N~:::~~==.N8NDDN+=~+DOODN$=.
.+====~D+,M~~,.,:====.ND8NNN+::=DD8OND=.
.??,++++I~,?+:,,:::~+87D8DNM:::~MNODND=.
.?=,.~=~,+,$+?+??++~IMNNNMNN$,,,MNNMNNOI
..$+~~=.:~,Z====++++7MMNMNM,~,,,MMNNNMM.
Whereas Sgt Pepper is hard to make out.
:::::::::::::::::::::::~~~~~~::::::,,,,,
:,:::::,~::==:::O7~~:~8~=~~=Z+~:I:::,,,.
~+.:+8~D7:::~+8??:==I:O?~$:~$ZI=7IO:,,~:
:I$=,N$=:8O7?7+I?=.8?78OI:~+I~7O$Z$O+?$:
,O$.~D8$~~Z+ZD7$$DOO=8ZZ?OI8?:D7=887Z8Z?
~87+Z$+:~8+,:OD8O~=NDD7OO$+Z8D:ZDDDD88Z?
$O88Z8?DDD?ZZN+ODI,D?88D887$7=D8D+D8O8Z=
,7ID=88II7IOZ7ZZZ7$:===$77~I:~88ZIIO88$?
~..DDDDDDDDOIN=7$7~?I?=OIZ$I:$Z7+?Z+88ZI
Z+,8DDDDDDD+D7O?8ZZ777?8Z8I==OD~8~D$7OZ?
,,D8DDDDNDDDO8Z=$Z8+IZDOZ+ZI+D?88+78O8~=
I?ZDDDDNDN8N=+O+7II$7?ZO+Z$7I8DDOZ?DDII=
=~DDDONNNNNDZZOZZI+?Z+87OOZOZ$8D888D8+II
+=D8NNDDDNDD8N+Z,?OZ~?~88ODOZ+ODDD88DI:?
~~DDDDD8DNDDDO~??:,+7~=DD8DNDDD8788:D,::
~=D8==8DDNNNDO=$=?O$~I:DDO$?DDDDDDD=?OZ+
OD8DZONNDNNDNNN=~7~Z+,DNN8?$DNZZ?DDDO8$7
7O=O8:D78DZNZZN77$7OD8?7$I7=?~D$O8O~~Z7I
=Z$O88D$N$O8N88ZZZ$ON8NDNNZO7ON$7OD8?Z$~
IZD8ZOD8O$OOD$8DDNZNN$DDNN$8ZDNDO$DDZ8OI
7ZDO8ODDO8DOD8N88$ZNDDNDNN8DD8N8$OOZ8O$?
IZDNNNNN8NNNNNNNMZ7N=+Z7O=O7ZI:~DD8DD8OI
Z8DND+$Z+OD+NNNDN8DNDNNNN++I+$=ZDDDDDD8?
ONNDDNO8=D8NNNNDDZD8~NNNONNNNNN8NDDDDDD$
8D88ZNNNOONNNNNDDD$ODNDD8ZMNNDN8D8NNDDDI
There are amazing tools like aview - but again, that's an extra program which the user might not have.
If your album directories are sensibly named, the first hit is usually good enough.
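If your directory names are less tidy, one option is to clean them up before searching. The sketch below is only a guess at what "sensible" might mean: underscores and slashes become spaces, and anything from the first opening parenthesis onwards is dropped. `clean_query` is a hypothetical helper, not part of the script above.

```shell
#!/bin/bash
# clean_query PATH: hypothetical helper that turns an album path into
# plain search terms. The clean-up rules are assumptions, not a standard.
clean_query() {
    local query="$1"
    local -a words
    query="${query%/}"       # drop a single trailing slash
    query="${query#./}"      # drop the leading ./ that find produces
    query="${query%%\(*}"    # cut everything from the first "(" onwards
    query="${query//_/ }"    # underscores become spaces
    query="${query//\// }"   # slashes become spaces
    # Split on whitespace and re-join with single spaces to trim the result.
    read -r -a words <<< "$query"
    echo "${words[*]}"
}

clean_query "./Devo/Recombo_DNA_(disc_1_of_2_-_Sequence_A)"
# prints: Devo Recombo DNA
```

The cleaned string would still need URL-encoding before being placed in the search URL, as the script already does with `uri_escape`.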
Hang on! There's a mistake!
Quite probably. This is a quick and dirty script, and I'm sure there are lots of edge cases and (no doubt) some poor coding practices. If you wish to contribute a patch, please drop it in the comments.
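One edge case worth guarding against is a search that finds no image at all. As a rough sketch, and not a tested patch, a hypothetical `save_cover` helper could refuse to write cover.jpg unless a URL was found and the download produced a non-empty file. The function name and its two arguments are assumptions for illustration.

```shell
#!/bin/bash
# save_cover URL DIR: hypothetical helper, not part of the original script.
# Downloads URL to a temporary file inside DIR and only renames it to
# cover.jpg if the download succeeded and the file is not empty.
save_cover() {
    local coverurl="$1" albumpath="$2" tmpfile

    # Nothing was scraped, so leave the directory untouched.
    if [ -z "$coverurl" ]; then
        echo "Unable to find cover art for [$albumpath]" >&2
        return 1
    fi

    tmpfile="$(mktemp "$albumpath/cover.XXXXXX")" || return 1

    if wget -q "$coverurl" -O "$tmpfile" && [ -s "$tmpfile" ]; then
        chmod 644 "$tmpfile"   # mktemp creates the file readable by the owner only
        mv "$tmpfile" "$albumpath/cover.jpg"
    else
        rm -f "$tmpfile"
        echo "Download failed for [$coverurl]" >&2
        return 1
    fi
}
```

In the script this would stand in for the final wget line, called as `save_cover "$coverurl" "$albumpath"`.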
thameera says:
I use beets (http://beets.radbox.org/) which does almost all the management of my 80+GB music library. It tags, fetches art, manages the directory structure and is totally customizable.
Your script works well, there's an <relative> at the last line.
Terence Eden says:
I've started using beets as well. Bit of a pain to configure, but once it works it's incredible. I have found that it does miss some album art though - hence the script.
Thanks for the correction - now amended.
David Griffith says:
Albumart.org doesn't like it if you send a query that contains parentheses. For example: "Devo-Recombo_DNA_(disc_1_of_2_-_Sequence_A)" won't yield any results. Abcde names directories like this for multi-disc packages. An apparently decent way around this is to lop off the string submitted to albumart at the first paren encountered. Another enhancement is to refrain from writing a cover.jpg file when it didn't find a cover image. Here's a sloppy diff-ish thing of what I'm talking about:
albumpath="$1"
+title=`echo $albumpath | cut -d'(' -f1`
...
-encoded="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$albumpath")"
+encoded="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$title")"
...
-echo "Searching for: [$1]"
+echo "Searching for: [$title]"
...
+if [ -z $coverurl ] ; then
+ echo "Unable to find cover art for $title."
+ exit
+fi
Oli says:
A little different concept ...
#!/bin/bash
if [ $# != 1 ]
then
echo "Need Path for Search!"
exit 1
fi
cd $1
find . -maxdepth 2 -mindepth 2 -type d | while read -r dir
do
albumpath="$dir"
done
Will says:
I had to use the following URL format for it to work today: url="http://www.albumart.org/index.php?searchk=abba+gold&itempage=1&newsearch=1&searchindex=Music"
Will says:
And to get the above format with +'s in it, I found it useful to do this (since all my directory names use _ not spaces)
GUTS OF MY SCRIPT
albumpath="$1"
# Split albumpath into artist and album
artist_input="${albumpath%/*}"
album_input="${albumpath##*/}"
# I need to replace the underscores with a '+' character
artist=${artist_input//[_]/+}
album=${album_input//[_]/+}
echo "artist: $artist"
echo "album: $album"
search_terms="$artist+$album"
# Scrape AlbumArt.org
url="http://www.albumart.org/index.php?searchk=$search_terms&itempage=1&newsearch=1&searchindex=Music"
echo "Searching ... [$url]"
coverurl=`wget -qO - "$url" | grep -m 1 -o 'http://ecx.images-amazon.com/images/I/*/[%0-9a-zA-Z.,-]*.jpg'`
echo "Cover URL: [$coverurl]"
# Save the image jpg file
wget "$coverurl" -O "$DEST_DIR/cover.jpg"
RUNNING IT
And running it like this:
GetCoverArt.sh james_morrison/undiscovered
gives this output:
artist: james+morrison
album: undiscovered
Searching for album art for: [james_morrison/undiscovered]
Searching for: [james+morrison+undiscovered]
Searching ... [http://www.albumart.org/index.php?searchk=james+morrison+undiscovered&itempage=1&newsearch=1&searchindex=Music]
Cover URL: [http://ecx.images-amazon.com/images/I/51SAKEc0HEL.jpg]
--2014-12-06 09:48:32-- http://ecx.images-amazon.com/images/I/51SAKEc0HEL.jpg
Resolving ecx.images-amazon.com (ecx.images-amazon.com)... 54.230.198.142, 54.230.199.201, 54.230.199.90, ...
Connecting to ecx.images-amazon.com (ecx.images-amazon.com)|54.230.198.142|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 35448 (35K) [image/jpeg]
Saving to: ‘/home/music/flac/james_morrison/undiscovered/cover.jpg’
100%[======================================>] 35,448 --.-K/s in 0.02s
2014-12-06 09:48:32 (1.85 MB/s) - ‘/home/music/flac/james_morrison/undiscovered/cover.jpg’ saved [35448/35448]
DO ALL MUSIC
And finally, to get cover art for ALL my music in one fell swoop, I ran this:
for i in *; do for j in $i/*; do GetCoverArt.sh $j; done; done
Andrew Strong says:
You may be interested to see that abcde now has the capability to download album art, I apologise for not using the approach that you suggested! The eventual successful patches came from the same thread on GoogleCode where your patch was suggested...
Currently available only in the git version but it will go mainstream when 2.6.1 is released. The abcde FAQ in git has some detailed information on how to get it all working although sane defaults should guarantee a good result anyway. I am planning a web page with more detailed information, this will come out in a week or so...
Terence Eden says:
Brilliant news! Thanks Andrew 🙂
Andrew says:
OK the preliminary web page is done:
abcde: Downloading Album Art... http://www.andrews-corner.org/getalbumart.html
Still a little fine tuning to do but it should definitely get the word out that abcde is ready for album art 🙂