“Need a way to always get the innermost content.”
“Both of the above may be related to the screen-scraping library – the ancient phpQuery. It may be worth either fixing it or moving on to a new library.”
“International dates are problematic. I need to find a way to easily convert them into RFC-822 format.”
“Transient text encoding issues – mostly, I think, due to vendors copying and pasting from MS Word.”

You, sir, need the Ultimate Web Scraper Toolkit. The included TagFilter class is for dealing with ugly HTML that Simple HTML DOM can’t handle. Here’s a link:

https://github.com/cubiclesoft/ultimate-web-scraper

For clean UTF-8 handling, you’ll want the UTF-8 class from here:

https://github.com/cubiclesoft/php-libs/
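
As for the international dates, PHP's built-in DateTime classes may already be enough. Here is a minimal sketch, assuming you know (or can guess per vendor) the format each date arrives in. The helper name `toRfc822` and the sample format strings are made up for illustration; only `DateTimeImmutable::createFromFormat()`, `getLastErrors()`, and the `DATE_RSS` constant are PHP built-ins. Note that `DATE_RSS` emits a four-digit year, which is what feed readers generally expect, while `DATE_RFC822` emits the strict two-digit year.

```php
<?php
// Hypothetical helper: parse a date in a known vendor format and
// return it as an RFC 822 style string, or null if it does not parse.
function toRfc822(string $raw, string $format, string $tz = 'UTC'): ?string
{
    // The leading "!" resets fields missing from the format (such as
    // the time of day) to zero instead of filling them in from "now".
    $dt = DateTimeImmutable::createFromFormat('!' . $format, $raw, new DateTimeZone($tz));
    if ($dt === false) {
        return null;
    }

    // Reject dates that only parsed by rolling over, e.g. 31 February.
    // getLastErrors() returns false on newer PHP versions when clean.
    $errors = DateTimeImmutable::getLastErrors();
    if ($errors !== false && ($errors['warning_count'] > 0 || $errors['error_count'] > 0)) {
        return null;
    }

    // DATE_RSS is "D, d M Y H:i:s O" (RFC 822 layout, four-digit year).
    return $dt->format(DATE_RSS);
}

// Day-first date as used in much of Europe.
echo toRfc822('25/12/2015', 'd/m/Y'), "\n"; // Fri, 25 Dec 2015 00:00:00 +0000
```

Passing the format explicitly avoids the guessing that `strtotime()` does with ambiguous inputs such as 03/04/2015, at the cost of needing one format string per vendor.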