“Need a way to always get the innermost content.”
“Both of the above may be related to the screen-scraping library – the ancient phpQuery. It may be worth either fixing it or moving on to a new library.”
“International dates are problematic. I need to find a way to easily convert them into RFC-822 format.”
“Transient text encoding issues – mostly, I think, due to vendors copying and pasting from MS Word.”
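
On the RFC-822 date point, here is a minimal sketch of one way to normalise scraped dates. It is in Python rather than PHP, purely as an illustration. The `to_rfc822` helper and the `CANDIDATE_FORMATS` list are hypothetical names, the formats are guesses at what vendor pages might contain, and the code assumes day-first ordering and UTC when the page gives no time zone. Note that `format_datetime` emits a four-digit year (RFC 2822 style), which RSS readers generally accept.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical list of formats a vendor page might use; extend as needed.
# Assumption: ambiguous dates such as 03/04/2012 are day-first.
CANDIDATE_FORMATS = ("%d/%m/%Y", "%d.%m.%Y", "%Y-%m-%d", "%d %B %Y")


def to_rfc822(text, formats=CANDIDATE_FORMATS):
    """Return an RFC-822-style date string, or None if no format matches."""
    for fmt in formats:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next candidate format
        # Assumption: scraped dates carry no time zone, so treat them as UTC.
        return format_datetime(parsed.replace(tzinfo=timezone.utc))
    return None


print(to_rfc822("03/04/2012"))
```

Returning `None` for anything unrecognised, instead of guessing, makes it easy to log the dates that still need a new format added to the list.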

You, sir, need the Ultimate Web Scraper Toolkit. The included TagFilter class is for dealing with ugly HTML that Simple HTML DOM can’t handle. Here’s a link:


For clean UTF-8 handling, you’ll want the UTF-8 class from here: