"Need a way to always get the innermost content."
"Both of the above may be related to the screen-scraping library, the ancient phpQuery. It may be worth either fixing it or moving to a new library."
"International dates are problematic. I need to find a way to easily convert them into RFC-822 format."
"Transient text encoding issues, mostly, I think, due to vendors copying and pasting from MS Word."
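On the date point quoted above, PHP's built-in date classes may already be enough. Here is a minimal sketch, assuming you can list the input formats your vendors use; `toRfc822()` and the `$formats` list are hypothetical names for illustration, not part of phpQuery or any toolkit mentioned here. Note that `createFromFormat()` only understands English month names, so localized names would need separate handling.

```php
<?php
// Hypothetical helper, not part of any library named in this thread.
// $formats lists the input layouts you expect from vendors (an assumption).
function toRfc822(string $raw, array $formats, string $tz = 'UTC'): ?string
{
    $zone = new DateTimeZone($tz);
    foreach ($formats as $format) {
        $dt = DateTimeImmutable::createFromFormat($format, trim($raw), $zone);
        // getLastErrors() is false (PHP 8.2+) or has zero counts after a clean parse.
        $errors = DateTimeImmutable::getLastErrors();
        $clean = !$errors
            || ($errors['warning_count'] === 0 && $errors['error_count'] === 0);
        if ($dt !== false && $clean) {
            return $dt->format(DATE_RFC822);
        }
    }
    return null; // none of the listed formats matched
}

// The leading "!" resets unparsed fields (here, the time of day) to zero.
$formats = ['!d/m/Y', '!Y-m-d', '!d.m.Y'];
echo toRfc822('25/12/2015', $formats), PHP_EOL; // Fri, 25 Dec 15 00:00:00 +0000
```

The order of `$formats` matters for ambiguous inputs such as `03/04/2015`, so put the layout you trust most first. If the target is an RSS feed, `DATE_RSS` gives the same layout with a four-digit year.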
You, sir, need the Ultimate Web Scraper Toolkit. The included TagFilter class is for dealing with ugly HTML that Simple HTML DOM can't handle. Here's a link:
For clean UTF-8 handling, you'll want the UTF-8 class from here:
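If you just want a quick stopgap for the Word copy-and-paste problem, the sketch below uses only PHP's mbstring extension. It assumes that any input which is not valid UTF-8 was really Windows-1252, which is typical for Word pastes but is still a guess; `normalizeToUtf8()` is a hypothetical name and is not the API of the UTF-8 class mentioned above.

```php
<?php
// Hypothetical helper; requires the mbstring extension.
function normalizeToUtf8(string $text): string
{
    // Already valid UTF-8: leave it untouched to avoid double encoding.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Assumption: invalid bytes come from Windows-1252 (curly quotes, dashes).
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

// Windows-1252 bytes for curly double quotes (0x93, 0x94) and an en dash (0x96).
$pasted = "\x93Quoted\x94 \x96 text";
echo normalizeToUtf8($pasted), PHP_EOL;
```

This will not repair text that mixes both encodings in a single string; for that case a byte-by-byte cleaner is the safer choice.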