An API for Amazon Wishlists
In the glorious past, Amazon had an API for interacting with its "Wishlist" service. Not any more though.
So, here's the inspiring story of how a rag-tag band of adventurers brought it back from the dead!
Several years ago, Justin Scarpetti created a tool to extract data from an Amazon wishlist - the imaginatively named Amazon Wish Lister. It used that most vulgar of programming practices - Screen Scraping!
Yup, gobble up the HTML and attempt to parse it. Needs must in a dire situation. It worked, but it felt dirty!
Actually, scratch that. It mostly worked. It wasn't retrieving prices due to a change in the page structure and it was hard coded to the US site.
Time to fix things!
The Aim
I want to add something to my Amazon Wishlist and have it automatically appear on my Secret Santa Tumblr.
The Changes
- Add support for international Amazon stores - not just .com. This was pretty easy: all Amazon stores use the same URL structure for their wishlists and (mostly) the same HTML for their pages. Some Amazon stores don't have wishlists at all.
- Get the prices. Again, a simple fix - although it is at the mercy of Amazon changing their page structure. Such are the perils of screen scraping.
- Better XML Output. Some weirdos prefer XML to JSON. Fools! This was intended to be a precursor to RSS support before I realised there was an easier way to do it.
- Get the ASIN. Due to Amazon's consistent URL structure, it was easy to grab the item's ID. Useful for the next few parts.
- Bigger images. By default, Amazon gives fairly small images. A little URL manipulation and we can get much larger pictures served over HTTPS. Weirdly, all images are served from the .com site.
- Amazon Affiliate Support. One of my major reasons for doing this project is that I earn a small amount of revenue from Amazon whenever someone clicks my links & then goes on to buy something.
- RSS Feed Generation. With all that groundwork laid, I was able to quickly generate a (mostly) valid RSS feed. Every time something is added to the wishlist, the RSS feed updates. Perfect!
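As a rough sketch of the ASIN and affiliate steps above (in Python for brevity; the project itself uses phpQuery, so the real code is PHP). The `/dp/` and `/gp/product/` path shapes, the `tag` query parameter, the function names, and the `example-21` tag are all assumptions for illustration, not taken from the project's source.

```python
import re
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: an ASIN is a 10-character alphanumeric ID that follows
# /dp/ or /gp/product/ in the item's URL path.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN found in an item URL, or None if there isn't one."""
    match = ASIN_PATTERN.search(urlsplit(url).path)
    return match.group(1) if match else None


def add_affiliate_tag(url: str, tag: str) -> str:
    """Return the URL with its `tag` query parameter set to `tag`."""
    parts = urlsplit(url)
    # Drop any existing tag so we never end up with two of them.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "tag"
    ]
    query.append(("tag", tag))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    item = "https://www.amazon.co.uk/dp/B00EXAMPLE/ref=xyz?psc=1"
    print(extract_asin(item))
    print(add_affiliate_tag(item, "example-21"))
```

Matching against the path only (rather than the whole URL) keeps stray query strings from confusing the pattern.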
Rather than monkeying around with the Tumblr API, I just used IFTTT. It checks the RSS and should post any new items onto the blog.
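For the curious, the RSS generation step could look something like this minimal Python sketch. It is not the project's actual code: the `build_rss` function and the shape of the item dictionaries (`title`, `link`, `asin`, `added`) are made up for illustration. It uses the ASIN as the item's `guid`, which is one plausible way to let a feed reader such as IFTTT tell new items from old ones.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, items):
    """Build an RSS 2.0 document from a list of wishlist item dicts.

    Each item is assumed (hypothetically) to have the keys
    'title', 'link', 'asin' and 'added' (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        # The ASIN is stable, so it makes a reasonable unique ID.
        guid = ET.SubElement(node, "guid", isPermaLink="false")
        guid.text = item["asin"]
        # RSS wants RFC-822 style dates; format_datetime produces them.
        ET.SubElement(node, "pubDate").text = format_datetime(item["added"])

    # ElementTree escapes characters such as & and < for us.
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_rss(
        "My Wishlist",
        "https://example.com/wishlist",
        "Stocking fillers",
        [{
            "title": "Socks & Sandals",
            "link": "https://example.com/dp/B00EXAMPLE",
            "asin": "B00EXAMPLE",
            "added": datetime(2013, 12, 1, 9, 30, tzinfo=timezone.utc),
        }],
    ))
```

Building the tree with a library instead of concatenating strings avoids a whole class of "mostly valid" feed problems, since escaping is handled automatically.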
The Result
IT WORKS! Visit my FiverFun Tumblr to see it in action. Each time I find a cool stocking-filler under five quid, I add it to a specific wishlist. IFTTT picks it up and sends it to Tumblr.
You can download the updated code from GitHub.
Bugs
It wouldn't be an Open Source project without a few bugs!
If you can help with any of these, please chip in over at GitHub.
- Japanese text encoding is hard! I'm not sure if it's a Shift_JIS vs UTF-8 issue, or a multibyte thing - but something is screwing up the text formatting.
- India's prices are nested within another set of tags. Need a way to always get the innermost content.
- Both of the above may be related to the screenscraping library - the ancient phpQuery. It may be worth either fixing it or moving onto a new library.
- International dates are problematic. I need to find a way to easily convert them into RFC-822 format.
- Transient text encoding issues - mostly, I think, due to vendors copy & pasting from MS Word.
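On the date bug above, one possible approach is to try a list of known date formats in turn and then emit RFC-822. The sketch below is in Python rather than the project's PHP, and the formats in `CANDIDATE_FORMATS` are guesses at what different stores might print, not a verified list. It also only handles English month names; other languages would need a month-name lookup table. Treating the date as midnight UTC is another assumption, since the wishlist page gives no time of day.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumption: hypothetical date layouts, e.g. "1 December 2013" (UK style)
# and "December 1, 2013" (US style). Extend per store as needed.
CANDIDATE_FORMATS = ("%d %B %Y", "%B %d, %Y", "%Y-%m-%d")


def to_rfc822(date_text: str) -> str:
    """Convert a scraped date string into an RFC-822 style date."""
    cleaned = date_text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # Not this layout; try the next one.
        # No time of day is available, so assume midnight UTC.
        return format_datetime(parsed.replace(tzinfo=timezone.utc))
    raise ValueError(f"Unrecognised date format: {date_text!r}")


if __name__ == "__main__":
    print(to_rfc822("1 December 2013"))
    print(to_rfc822("December 1, 2013"))
```

Raising an error on an unknown layout, rather than silently guessing, makes it obvious when a store changes its date format.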
If you've enjoyed this post, you can buy something for me from my Amazon Wishlist :-)
Lester G says:
"Need a way to always get the innermost content." "Both of the above may be related to the screenscraping library - the ancient phpQuery. It may be worth either fixing it or moving onto a new library." "International dates are problematic. I need to find a way to easily convert them into RFC-822 format." "Transient text encoding issues - mostly, I think, due to vendors copy & pasting from MS Word."
You, sir, need the Ultimate Web Scraper Toolkit. The included TagFilter class is for dealing with ugly HTML that Simple HTML DOM can't handle. Here's a link:
https://github.com/cubiclesoft/ultimate-web-scraper
For clean UTF-8 handling, you'll want the UTF-8 class from here:
https://github.com/cubiclesoft/php-libs/