"Need a way to always get the innermost content." "Both of the above may be related to the screenscraping library - the ancient phpQuery. It may be worth either fixing it or moving onto a new library." "International dates are problematic. I need to find a way to easily convert them into RFC-822 format." "Transient text encoding issues - mostly, I think, due to vendors copy & pasting from MS Word." You, sir, need the Ultimate Web Scraper Toolkit. The included TagFilter class is for dealing with ugly HTML that Simple HTML DOM can't handle. Here's a link: https://github.com/cubiclesoft/ultimate-web-scraper For clean UTF-8 handling, you'll want the UTF-8 class from here: https://github.com/cubiclesoft/php-libs/