Stop using preg_* on HTML and start using \Dom\HTMLDocument instead


The PHP logo.

It is a truth universally acknowledged that a programmer in possession of some HTML will eventually try to parse it with a regular expression. This makes many people very angry and is widely regarded as a bad move. In the bad old days, it was somewhat understandable for a PHP coder to run a quick-and-dirty preg_replace() on a scrap of code. They probably could control the input and there wasn't …


An opinionated HTML Serializer for PHP 8.4


The HTML5 Logo.

A few days ago, I wrote a shitty pretty-printer for PHP 8.4's new Dom\HTMLDocument class. I've since re-written it to be faster and more stylistically correct. It turns this: <html lang="en-GB"><head><title id="something">Test</title></head><body><h1 class="top upper">Testing</h1><main><p>Some <em>HTML</em> and an <img src="example.png" alt="Alternate Text"></p>Text not in an…


How to check something isn't an email address?


The HTML5 Logo.

In web-development circles, it is a well-known fact that trying to validate an email using a regular expression is… complex. The full set of modern email standards allows for such wonderful addresses as: chief.o'brien+ds9@spásárthach.भारत So determining whether or not your user has entered a valid email address becomes an ever-increasing challenge. But what if you have the opposite issue? Suppo…


I think I kind of hate lazy loading


An icon representing a broken image.

Yesterday I was on a train. I clicked on a link and my browser loaded a long article for me to read. Halfway through reading it, the train went into a tunnel and I lost signal. That meant I couldn't see the images on the other half of the page for the rest of the journey. I had a sea of broken images. Even though the page had fully loaded, the images were set to "lazy" loading. When my…


This link is only available by keyboard navigation


The HTML5 Logo.

There's a link, right here ➡️⬅️ but, if you're on a touchscreen, you can't tap on it. Using a mouse? Nope, that won't work either. The only way to navigate to it is via keyboard navigation. Hit your Tab ⭾ button! There's a little bit of me that wants to build an entire website which can only be navigated by keyboard. What would the world look like if Engelbart never invented the mouse? Or if Johnson…


Can this device make a phone call?


The Doctor holding Ryan's phone.

I want to detect if a web browser is running on a device which is capable of placing a telephone call. Is that possible? I'm going to go with a cautious "no - not quite". Although there are several proxies which get you part of the way there. Here's a link to a telephone number "call me!" - the HTML is: <a href="tel:+441234815259">call me!</a> You can use tel: in just…


Does AI mean we don't need the Semantic Web?


User: Read the following sentence: "Our opening hours are: Weekdays 10 until 7. Weekend 10 until 10 (Early closing 9 o'clock Sunday)."
ChatGPT: The given sentence states the opening hours of a certain establishment. Here is the breakdown of the opening hours mentioned: Weekdays: The establishment is open from 10 in the morning until 7 in the evening. Weekend: On weekends, the establishment is open from 10 in the morning until 10 at night. Early closing on Sunday: On Sundays, the establishment closes earlier than usual, at 9 o'clock in the evening. Please note that this information is based solely on the provided sentence, and the specific establishment or its type is not mentioned.
User: Is the establishment open at 2130 on Sunday?
ChatGPT: Based on the information provided in the sentence, the establishment has early closing on Sunday, with the closing time set at 9 o'clock in the evening. Therefore, the establishment would not be open at 21:30 (or 9:30 PM) on Sunday.

If you hang around with computerists long enough, they start talking about the Semantic Web. If you can represent human knowledge in a way that's easy for computers to understand, it will be transformative for information processing. But computers, traditionally, haven't been very good at parsing ambiguous human text. Suppose you saw this text written for a human: Our opening hours are:…


The limits of CSS styling select options


Checkboxes in a line.

Sometimes you learn the most from failures! I wanted a <select multiple> element where the <option>s were laid out in a grid. I nearly got there. It's possible to have the <option>s in a horizontal row - but only on Chrome and Firefox. Here's a quick fiddle showing the results: As you can see, it's possible to do some pretty extravagant styling of the individual <option>s - you can even…


Responsive Yearly Calendar with Flexbox


A grid of calendars. One calendar item has moved to the next line.

This blog has a calendar showing my yearly archives. It was in a table layout - which made sense when I first designed it - but had a few spacing niggles and was hard to make responsive. Now, it behaves like this: The code is relatively straightforward. The HTML for the calendar looks like this: <div class="calendars"> <div class="calendar"> <div class="calendar-year">2018</div> …


How to password protect a static HTML page with no JS


Screenshot of some garbled text on screen.

I recently saw Robin Moisson's method of password protecting a statically served HTML page. It's quite neat! But it does rely on JavaScript. That got me wondering if there was a way to encrypt a static page only using CSS? And... I think I've done it! I'll warn you now, this is a deeply stupid way to solve the problem. Here's a screencast of the demo in action: …


What's the most malicious thing you can do with an injected HTML heading element?


The HTML5 Logo.

A bit of a thought experiment - similar to my Minimum Viable XSS and SVG injection investigations. I recently found a popular website which echoed back user input. It correctly sanitised < to &lt; to prevent any HTML injection. Except… It let through <h2> elements unaltered! Why? I suspect because the output was: <h2>Your search for ... returned no results</h2> And, somehow, the …


Simultaneous Translation in HTML


The HTML5 Logo.

How do you show two languages simultaneously in HTML? If you want to show text in a foreign language, the markup is simple: <html lang="en-GB"> ... As Caesar said: <i lang="la">veni vidi vici</i> That says the page is in British English (en-GB) but the specific phrase is in Latin (la). But how can you offer an in-text translation of that phrase into the…
