Introducing Pretty Print HTML for PHP 8.4


I'm delighted to announce the first release of my opinionated HTML Pretty Printer for new versions of PHP.

There are several prettifiers on Packagist, but I think mine is the only one which works with the new Dom\HTMLDocument class.

What

This takes hard-to-read HTML like:

<!doctype html><html><head><meta charset="UTF-8"></head><body><div id="main" class="news main"><h1 id="top">Title</h1><p>How <em>exciting</em>!</p></div>

And pretty-prints it with some opinionated formatting:

<!doctype html>
<html>
    <head>
        <meta charset=UTF-8>
    </head>
    <body>
        <div class="main news" id=main>
            <h1 id=top>Title</h1>
            <p>How <em>exciting</em>!</p>
        </div>
    </body>
</html>

All elements are indented where possible. Attributes are sorted alphabetically. Attribute values are unquoted if possible. CSS and JS are unaltered. These options are configurable.
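To make the attribute rules concrete, here is a minimal sketch of sorting attributes by name and quoting values only when HTML requires it. This is not the library's code: `canBeUnquoted()` and `renderAttributes()` are hypothetical helper names invented for illustration, and the sketch works on a plain array rather than a real DOM. It also leaves values untouched, whereas the sample output above additionally reorders the class names.

```php
<?php
// Illustrative sketch only: NOT the pretty printer's actual implementation.
// canBeUnquoted() and renderAttributes() are hypothetical helper names.

/**
 * An HTML attribute value can be written without quotes when it is
 * non-empty and contains no whitespace or any of: " ' = < > `
 */
function canBeUnquoted(string $value): bool
{
    return $value !== '' && preg_match('/[\s"\'=<>`]/', $value) === 0;
}

/**
 * Sort attributes by name and render them, quoting only when needed.
 *
 * @param array<string, string> $attributes Map of attribute name => value.
 */
function renderAttributes(array $attributes): string
{
    // Alphabetical order by attribute name.
    ksort($attributes, SORT_STRING);

    $parts = [];
    foreach ($attributes as $name => $value) {
        $parts[] = canBeUnquoted($value)
            ? "{$name}={$value}"
            : $name . '="' . htmlspecialchars($value, ENT_QUOTES | ENT_HTML5) . '"';
    }

    return implode(' ', $parts);
}

echo renderAttributes(['id' => 'main', 'class' => 'news main']), "\n";
// class="news main" id=main
```

The `class` value keeps its quotes because it contains a space, while `id=main` can safely drop them.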

To get an idea of what it outputs, take a look at the source code of this page!

How

This is designed to be simple to use, but with enough options to be useful to as many people as possible.

//  HTML as a string:
$html = "<div>This is <span> an <em>example</em>";
//  Or as a file:
$html = file_get_contents( "example.html" );

//  Turn the HTML into a Dom\HTMLDocument
$dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR, "UTF-8" );

//  Create the pretty printer
$formatter = new Edent\PrettyPrintHtml\PrettyPrintHtml();

//  Output the result
echo $formatter->serializeHtml( $dom );

Limitations

Whitespace is hard. There are many different types. Sometimes it is for display, sometimes it isn't. Adding extra newlines and tabs will almost certainly cause layout changes somewhere on your page.

You can either change your CSS to minimise this, add elements to the preserveElements list to stop them being altered, or re-write your original HTML. The choice is yours.
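To see why extra whitespace matters, here is a rough sketch that approximates the text a browser would show for inline content. It is only an approximation, since real rendering depends on CSS, and `visibleText()` is a hypothetical helper rather than part of the pretty printer.

```php
<?php
// Rough approximation only: real rendering depends on CSS (display, white-space, etc.).
// visibleText() is a hypothetical helper, not part of the pretty printer.

/**
 * Strip the tags, then collapse runs of whitespace into a single space,
 * roughly as a browser does for inline content.
 */
function visibleText(string $html): string
{
    $text = strip_tags($html);
    $collapsed = preg_replace('/\s+/', ' ', $text) ?? $text;

    return trim($collapsed);
}

$compact  = '<p><b>Pretty</b><i>Print</i></p>';
$indented = "<p>\n    <b>Pretty</b>\n    <i>Print</i>\n</p>";

echo visibleText($compact), "\n";   // PrettyPrint
echo visibleText($indented), "\n";  // Pretty Print
```

Indenting the two inline elements onto separate lines introduces a visible space between the words, which is the kind of layout change described above.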

Why

As was written long ago:

A computer language is not just a way of getting a computer to perform operations but rather … it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute.

PHP's new Dom\HTMLDocument class produces syntactically valid HTML code. The code is very easy for a computer to parse. But because there is no indenting, the code is difficult for a human to parse.

Adding newlines and indents before every new element can introduce spacing errors when the HTML is rendered to screen. Some of these can be fixed with extra CSS, some cannot.

This pretty-printer attempts to make code readable for humans by striking a balance between legibility when rendered on screen and legibility when viewed as source code.

Why is human readability so important?

As Ana Rodrigues said:

Today's heavily optimized websites have largely killed the "view source" learning experience. The code is minified, bundled, and often incomprehensible to beginners trying to understand how things work. […] I want anyone, regardless of skill level, to inspect elements, understand the structure, and learn from readable code.

Using this pretty printer should give you and your users an excellent "view source" experience, without sacrificing the browser's ability to render the code.

Next Steps

I'm sure there are many bugs and oddities. I'd love you to report any problems on GitLab. Feel free to contribute test-cases and code.

