Working around an old and buggy HTML Tidy in PHP


Dan Q very kindly shared his script to make WordPress do good HTML. But I couldn't get it working.

Looking at the HTML it was spitting out, the meta generator said it was HTML Tidy version 5.6.0. That's quite old! I confirmed this by running:

echo tidy_get_release();

Which spat out 2017/11/25. Aha!
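If you want to gate the workarounds on the Tidy version, here is a minimal sketch. It assumes the release string always looks like the `Y/m/d` date above. The helper name `tidy_release_is_older_than` and the cutoff date are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: true when the reported release date is before the cutoff.
function tidy_release_is_older_than(string $release, string $cutoff): bool
{
    // The leading "!" resets the time fields so only the date is compared.
    $released = DateTimeImmutable::createFromFormat('!Y/m/d', $release);
    $limit    = DateTimeImmutable::createFromFormat('!Y/m/d', $cutoff);

    if ($released === false || $limit === false) {
        return false; // Unparseable date: make no claim either way.
    }

    return $released < $limit;
}

// Fall back to the date quoted in this post when the tidy extension is missing.
$release = function_exists('tidy_get_release') ? tidy_get_release() : '2017/11/25';

// The cutoff is an arbitrary illustrative date, not a real Tidy release date.
$needs_workarounds = tidy_release_is_older_than($release, '2021/01/01');
```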

There are a few bugs in this version of HTML Tidy, some of which are fixed in later versions.

Here's how to fix them.

Auto indent doesn't work. This is fixed by manually specifying "indent" => 2,

Indent with tabs doesn't work. So I told it to indent with 8 spaces using "indent-spaces" => 8,

Then I used a regex (naughty!) to replace 8 spaces with a tab.

$tidy = preg_replace( '/        /', "\t", $tidy );
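That regex swaps every run of 8 spaces, including any that appear inside text content. A slightly more cautious sketch only touches indentation at the start of a line. The function name `spaces_to_tabs` is my own invention, and it will still rewrite indented lines inside a `<pre>`, so treat it as a starting point.

```php
<?php
// Hypothetical helper: convert leading groups of $width spaces into tabs.
function spaces_to_tabs(string $html, int $width = 8): string
{
    $result = preg_replace_callback(
        // Match one or more complete groups of $width spaces at the start of a line.
        '/^(?: {' . $width . '})+/m',
        // Emit one tab per matched group of spaces.
        fn (array $m): string => str_repeat("\t", intdiv(strlen($m[0]), $width)),
        $html
    );

    // preg_replace_callback() returns null on error; keep the input in that case.
    return $result ?? $html;
}

$tidy = "<ul>\n        <li>eight        spaces</li>\n</ul>";
$tidy = spaces_to_tabs($tidy);
```

The run of spaces inside the `<li>` text is left alone because it isn't at the start of a line.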

Older versions of Tidy don't support newer HTML elements like <search>. This can be fixed with "new-blocklevel-tags" => "search",
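Put together, the options mentioned so far might look something like this. It is a sketch rather than the exact configuration from Dan's script: the sample `$html` is made up, and the call is wrapped in a check so the snippet still runs where the tidy extension isn't installed.

```php
<?php
// Only the three options discussed in this post; a real config will have more.
$config = [
    'indent'              => 2,        // Numeric value, as described above.
    'indent-spaces'       => 8,        // Swapped for tabs afterwards with a regex.
    'new-blocklevel-tags' => 'search', // Teach old Tidy about <search>.
];

// Made-up sample document for illustration.
$html = '<!doctype html><html><head><title>Test</title></head>'
      . '<body><search><p>Hello</p></search></body></html>';

if (extension_loaded('tidy')) {
    $tidy = tidy_repair_string($html, $config, 'utf8');
} else {
    $tidy = $html; // No tidy extension available: leave the markup untouched.
}
```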

The <summary> element isn't closed properly. This was an annoying one. I had to manually rewrite my HTML to remove an <h2> element from inside the summary.

Although not really a bug, I like to have HTML comments on a newline.

$tidy = preg_replace( '/><!--/', ">\n<!--", $tidy );
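To see what that replacement does, here it is run against a made-up string. The `$sample` markup is purely illustrative.

```php
<?php
// Made-up markup with a comment jammed up against the preceding tag.
$sample = '<p>Hello</p><!-- a comment --><p>World</p>';

// Insert a newline between any closing ">" and the start of a comment.
$result = preg_replace('/><!--/', ">\n<!--", $sample);

echo $result;
```

The comment now starts on its own line, although whatever follows the closing `-->` stays on the same line as the comment.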

Sadly, the last release of HTML Tidy was back in 2021. While some of the above bugs are fixed, there are more piling up.

So I'll continue with these workarounds for now. Hit "view source" and tell me what you think!

