<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/rss-style.xsl" type="text/xsl"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	     xmlns:dc="http://purl.org/dc/elements/1.1/"
	   xmlns:atom="http://www.w3.org/2005/Atom"
	     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	  xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>php &#8211; Terence Eden’s Blog</title>
	<atom:link href="https://shkspr.mobi/blog/tag/php/feed/" rel="self" type="application/rss+xml" />
	<link>https://shkspr.mobi/blog</link>
	<description>Regular nonsense about tech and its effects 🙃</description>
	<lastBuildDate>Sun, 15 Mar 2026 13:04:02 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://shkspr.mobi/blog/wp-content/uploads/2023/07/cropped-avatar-32x32.jpeg</url>
	<title>php &#8211; Terence Eden’s Blog</title>
	<link>https://shkspr.mobi/blog</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title><![CDATA[Some updates to ActivityBot]]></title>
		<link>https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/</link>
					<comments>https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Mon, 16 Mar 2026 12:34:57 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[ActivityBot]]></category>
		<category><![CDATA[ActivityPub]]></category>
		<category><![CDATA[mastodon]]></category>
		<category><![CDATA[php]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=68592</guid>

					<description><![CDATA[A couple of years ago, I developed ActivityBot - the simplest way to build Mastodon Bots. It is a single PHP file which can run an entire ActivityPub server and it is less than 80KB.  It works! You can follow @openbenches@bot.openbenches.org to see the latest entries on OpenBenches.org, and @colours@colours.bots.edent.tel for a slice of colour in your day, and @solar@solar.bots.edent.tel to see…]]></description>
										<content:encoded><![CDATA[<p>A couple of years ago, I developed <a href="https://shkspr.mobi/blog/2024/11/introducing-activitybot-the-simplest-way-to-build-mastodon-bots/">ActivityBot - the simplest way to build Mastodon Bots</a>. It is a <em>single</em> PHP file which can run an entire ActivityPub server and it is less than 80KB.</p>

<p>It works! You can follow <code>@openbenches@bot.openbenches.org</code> to see the latest entries on OpenBenches.org, and <code>@colours@colours.bots.edent.tel</code> for a slice of colour in your day, and <code>@solar@solar.bots.edent.tel</code> to see what my solar panels are up to.</p>

<p>This is <em>so</em> easy to use. Copy the PHP file (and a <code>.env</code> and <code>.htaccess</code>) to literally any web host running PHP 8.5 and you have a fully-fledged bot which can post to Mastodon.</p>

<p><a href="https://gitlab.com/edent/activity-bot/">Grab the code and start today</a>!</p>

<h2 id="features"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#features">Features</a></h2>

<p>Over the years I've added a few more features to it, so I thought I'd run through what they are. Note, this is all hand-written. No sycophantic plagiarism machines were involved in this code or blog post. I just really like emoji, OK⁉️</p>

<h3 id="%f0%9f%94%8d-be-discovered-on-the-fediverse"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%94%8d-be-discovered-on-the-fediverse">🔍 Be discovered on the Fediverse</a></h3>

<p>This is the big one: you can find <code>@example@example.viii.fi</code> in your favourite Fediverse client, thanks to its WebFinger support.</p>
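
<p>For the curious, WebFinger (RFC 7033) is what turns <code>@example@example.viii.fi</code> into the URL of an ActivityPub actor. Here is a minimal sketch of the JSON a server might return for <code>/.well-known/webfinger?resource=acct:example@example.viii.fi</code>. The function name and the <code>https://{server}/{username}</code> actor URL layout are assumptions for illustration, not ActivityBot's actual code.</p>

<pre><code class="language-php">//  Hypothetical helper: build a WebFinger (RFC 7033) response for one bot account.
//  The actor URL layout "https://{server}/{username}" is an assumption.
function webfinger_response( string $username, string $server ): string {
    $response = [
        "subject" => "acct:{$username}@{$server}",
        "links"   => [
            [
                "rel"  => "self",
                "type" => "application/activity+json",
                "href" => "https://{$server}/{$username}",
            ],
        ],
    ];
    //  Keep the slashes in URLs readable rather than escaping them.
    return json_encode( $response, JSON_UNESCAPED_SLASHES );
}

echo webfinger_response( "example", "example.viii.fi" );
</code></pre>

<p>The response is normally served with a <code>Content-Type: application/jrd+json</code> header. A client then fetches the <code>href</code> with an <code>Accept: application/activity+json</code> header to get the actor document.</p>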

<h3 id="%f0%9f%91%89-be-followed-by-other-accounts"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%91%89-be-followed-by-other-accounts">👉 Be followed by other accounts</a></h3>

<p>No point being discovered if you can't be followed. This accepts follow requests and sends back a signed accept.</p>

<h3 id="%f0%9f%9a%ab-be-unfollowed-by-accounts"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%9a%ab-be-unfollowed-by-accounts">🚫 Be unfollowed by accounts</a></h3>

<p>Sometimes people want to unfollow. Too bad, so sad. Again, this will accept the undo request and delete the unfollowing user's information.</p>

<h3 id="%f0%9f%93%a9-send-messages-to-the-fediverse"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%93%a9-send-messages-to-the-fediverse">📩 Send messages to the Fediverse</a></h3>

<p>If a bot can be followed, but never posts, does it make a sound? This sends a post to all of your followers' (shared) inboxes. Includes some HTML formatting.</p>
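
<p>With shared inboxes, a server hosting fifty of your followers receives one copy of a post rather than fifty. Here is a minimal sketch of working out where to deliver, assuming each follower's actor document has already been fetched and decoded into an array. <code>delivery_inboxes()</code> is a hypothetical name, not ActivityBot's actual function. It prefers <code>endpoints.sharedInbox</code> and falls back to the follower's personal <code>inbox</code>.</p>

<pre><code class="language-php">//  Hypothetical helper: given decoded follower actor documents,
//  return the unique list of inboxes to deliver a post to.
function delivery_inboxes( array $followers ): array {
    $inboxes = [];
    foreach ( $followers as $actor ) {
        //  Prefer the server-wide shared inbox; fall back to the personal one.
        $inbox = $actor["endpoints"]["sharedInbox"] ?? $actor["inbox"] ?? null;
        if ( $inbox !== null ) {
            $inboxes[] = $inbox;
        }
    }
    //  Several followers on one server share an inbox, so de-duplicate.
    return array_values( array_unique( $inboxes ) );
}

$followers = [
    [ "inbox" => "https://a.example/users/alice/inbox", "endpoints" => [ "sharedInbox" => "https://a.example/inbox" ] ],
    [ "inbox" => "https://a.example/users/bob/inbox",   "endpoints" => [ "sharedInbox" => "https://a.example/inbox" ] ],
    [ "inbox" => "https://b.example/carol/inbox" ],
];
print_r( delivery_inboxes( $followers ) );
</code></pre>

<p>With these three example followers, the post goes to two inboxes: <code>https://a.example/inbox</code> and <code>https://b.example/carol/inbox</code>.</p>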

<h3 id="%f0%9f%92%8c-send-direct-messages-to-users"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%92%8c-send-direct-messages-to-users">💌 Send direct messages to users</a></h3>

<p>Not every message is for the wider public. If you want a bot which sends you a private message, this'll set the visibility correctly.</p>
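
<p>On the wire, "visibility" is really just addressing. Here is a minimal sketch, assuming the post is a <code>Note</code> and the bot's followers collection URL is known. <code>note_addressing()</code> is a hypothetical helper, not ActivityBot's actual code. A public post is sent to the special Public collection and copied to followers. A direct message is addressed only to its recipient and carries a <code>Mention</code> tag.</p>

<pre><code class="language-php">//  The special collection that marks an object as public.
const AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public";

//  Hypothetical helper: return the "to", "cc", and "tag" properties for a Note.
//  With no recipient the post is public; with a recipient it is a direct message.
function note_addressing( string $followers_url, ?string $recipient = null, string $recipient_name = "" ): array {
    if ( $recipient === null ) {
        return [
            "to"  => [ AS_PUBLIC ],
            "cc"  => [ $followers_url ],
            "tag" => [],
        ];
    }
    return [
        "to"  => [ $recipient ],
        "cc"  => [],
        //  Also tag the recipient as a Mention, as Mastodon does for its own direct messages.
        "tag" => [
            [ "type" => "Mention", "href" => $recipient, "name" => $recipient_name ],
        ],
    ];
}
</code></pre>

<p>Mastodon generally treats a post as a direct message when neither the Public collection nor the followers collection appears in its <code>to</code> or <code>cc</code>.</p>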

<h3 id="%f0%9f%93%b7-attach-images-alt-text-to-a-message-%f0%9f%86%95%f0%9f%86%95"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%93%b7-attach-images-alt-text-to-a-message-%f0%9f%86%95%f0%9f%86%95">📷 Attach images &amp; alt text to a message 🆕🆕</a></h3>

<p>A picture is worth a thousand words. But those pictures are meaningless without alt text. Attach as many images as you like. Note: most Mastodon servers only accept a maximum of four.</p>

<h3 id="%f0%9f%8d%bf-video-upload-%f0%9f%86%95%f0%9f%86%95"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%8d%bf-video-upload-%f0%9f%86%95%f0%9f%86%95">🍿 Video Upload 🆕🆕</a></h3>

<p>No transcoding or anything fancy. Upload a video and it'll be sent to your followers.</p>

<h3 id="%f0%9f%94%8a-audio-upload-%f0%9f%86%95%f0%9f%86%95"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%94%8a-audio-upload-%f0%9f%86%95%f0%9f%86%95">🔊 Audio Upload 🆕🆕</a></h3>

<p>Same as video. Raw audio posted to your followers' feeds.</p>
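
<p>Images, video, and audio all travel the same way: as entries in the post's <code>attachment</code> array, with the alt text in the <code>name</code> property. Here is a minimal sketch, assuming the file has already been uploaded to a public URL. <code>media_attachment()</code> and the example URLs are hypothetical. Choosing <code>Image</code>, <code>Video</code>, or <code>Audio</code> from the MIME type is one option. Mastodon itself labels its outgoing attachments as <code>Document</code> and relies on <code>mediaType</code>, so treat the mapping as a choice rather than a requirement.</p>

<pre><code class="language-php">//  Hypothetical helper: describe an uploaded file as an ActivityStreams attachment.
//  The "name" property carries the alt text; the object type follows the MIME type.
function media_attachment( string $url, string $media_type, string $alt = "" ): array {
    $type = match ( explode( "/", $media_type )[0] ) {
        "image" => "Image",
        "video" => "Video",
        "audio" => "Audio",
        default => "Document",
    };
    return [
        "type"      => $type,
        "mediaType" => $media_type,
        "url"       => $url,
        "name"      => $alt,
    ];
}

print_r( media_attachment( "https://bot.example/media/cat.jpg", "image/jpeg", "A ginger cat asleep on a keyboard" ) );
</code></pre>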

<h3 id="%f0%9f%95%b8%ef%b8%8f-autolink-urls-hashtags-and-mentions"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%95%b8%ef%b8%8f-autolink-urls-hashtags-and-mentions">🕸️ Autolink URLs, hashtags, and @ mentions</a></h3>

<p>URLs, hashtags, and @ mentions are <em>mostly</em> autolinked correctly, but there's a lot of fuzziness in how it works.</p>
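
<p>In ActivityPub terms, a hashtag that Mastodon can index needs a <code>Hashtag</code> entry in the post's <code>tag</code> array, as well as a link in the HTML. Here is a minimal sketch of the tag half, using a simple regular expression. <code>hashtag_tags()</code> and the <code>/tags/</code> URL layout are assumptions, not ActivityBot's actual code.</p>

<pre><code class="language-php">//  Hypothetical helper: find #hashtags in plain text and build the
//  ActivityStreams "Hashtag" tags for a post.
//  The tag URL layout "https://{server}/tags/{tag}" is an assumption.
function hashtag_tags( string $text, string $server ): array {
    preg_match_all( "/#(\w+)/u", $text, $matches );
    $tags = [];
    //  Only emit each hashtag once, in order of first appearance.
    foreach ( array_unique( $matches[1] ) as $tag ) {
        $tags[] = [
            "type" => "Hashtag",
            "href" => "https://{$server}/tags/" . rawurlencode( $tag ),
            "name" => "#{$tag}",
        ];
    }
    return $tags;
}

print_r( hashtag_tags( "Sunny day #solar #energy #solar", "bot.example" ) );
</code></pre>

<p>The pattern also shows where the fuzziness creeps in: it would happily treat the fragment of a URL such as <code>https://example.com/page#section</code> as the hashtag <code>#section</code>, so a sturdier version needs to skip anything inside a link.</p>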

<h3 id="%f0%9f%a7%b5-threads"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%a7%b5-threads">🧵 Threads</a></h3>

<p>You can reply to specific messages in order to create a thread.</p>

<h3 id="%f0%9f%91%88-follow-unfollow-block-and-unblock-other-accounts"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%91%88-follow-unfollow-block-and-unblock-other-accounts">👈 Follow, Unfollow, Block, and Unblock other accounts</a></h3>

<p>It might be useful for you to remove followers or follow specific accounts.</p>

<h3 id="%f0%9f%97%91%ef%b8%8f-delete-posted-messages-and-their-attachments-%f0%9f%86%95%f0%9f%86%95"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%97%91%ef%b8%8f-delete-posted-messages-and-their-attachments-%f0%9f%86%95%f0%9f%86%95">🗑️ Delete posted messages and their attachments 🆕🆕</a></h3>

<p>We all make mistakes. This will delete your post along with any attachments and send that delete message to everyone. Note: because of the federated nature of the Fediverse, you cannot guarantee that a remote server will delete anything.</p>

<h3 id="%e2%9c%8f%ef%b8%8f-edit-posts-%f0%9f%86%95%f0%9f%86%95"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%e2%9c%8f%ef%b8%8f-edit-posts-%f0%9f%86%95%f0%9f%86%95">✏️ Edit Posts 🆕🆕</a></h3>

<p>If you don't want to delete and re-post, you can edit your existing posts.</p>

<h3 id="%f0%9f%a6%8b-bridge-to-bluesky-with-your-domain-name-via-bridgy-fed"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%a6%8b-bridge-to-bluesky-with-your-domain-name-via-bridgy-fed">🦋 Bridge to Bluesky with your domain name via Bridgy Fed</a></h3>

<p>Not everyone is on the Fediverse. If you want to bridge to Bluesky, you can use the <a href="https://fed.brid.gy/">Bridgy Fed service</a>.</p>

<h3 id="%f0%9f%9a%9a-move-followers-from-an-old-account-and-to-a-new-account-%f0%9f%86%95%f0%9f%86%95"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%9a%9a-move-followers-from-an-old-account-and-to-a-new-account-%f0%9f%86%95%f0%9f%86%95">🚚 Move followers from an old account to a new account 🆕🆕</a></h3>

<p>Perhaps you started as <code>@electric@sex.pants</code> but now you want to become <code>@chaste@nunslife.biz</code> - no worries! You can tell followers you've moved and what your new name is.</p>

<p>Similarly, if ActivityBot is no longer right for you, it's simple to tell your existing followers to move to your new account.</p>
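
<p>Both directions use the same ActivityPub <code>Move</code> activity, which the old account sends to its followers. Here is a minimal sketch with example URLs. <code>move_activity()</code> and the <code>#move-</code> ID scheme are hypothetical. Mastodon only acts on a move if the new account lists the old one in its <code>alsoKnownAs</code> property, so that needs to be set up first.</p>

<pre><code class="language-php">//  Hypothetical helper: build a Move activity announcing that $old_actor now lives at $new_actor.
//  Receiving servers such as Mastodon check that $new_actor lists $old_actor in "alsoKnownAs".
function move_activity( string $old_actor, string $new_actor ): array {
    return [
        "@context" => "https://www.w3.org/ns/activitystreams",
        "id"       => $old_actor . "#move-" . time(),    //  Assumed ID scheme; any unique URL works.
        "type"     => "Move",
        "actor"    => $old_actor,    //  The account doing the moving...
        "object"   => $old_actor,    //  ...is also the thing being moved...
        "target"   => $new_actor,    //  ...and this is where it is going.
    ];
}

echo json_encode( move_activity( "https://old.example/users/bot", "https://new.example/users/bot" ), JSON_UNESCAPED_SLASHES );
</code></pre>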

<h3 id="%f0%9f%97%a8%ef%b8%8f-allow-quote-posts-%f0%9f%86%95%f0%9f%86%95"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%97%a8%ef%b8%8f-allow-quote-posts-%f0%9f%86%95%f0%9f%86%95">🗨️ Allow quote posts 🆕🆕</a></h3>

<p>Rather than just reposting your message, this sets the quote policy to allow people to share your message and attach some commentary of your own.</p>

<h3 id="%f0%9f%91%80-show-followers"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%91%80-show-followers">👀 Show followers</a></h3>

<p>Your follower count isn't just a number, it is a living list of <em>who</em> chooses to follow you.</p>

<h3 id="%e2%9a%a0%ef%b8%8f-content-warnings-%f0%9f%86%95%f0%9f%86%95"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%e2%9a%a0%ef%b8%8f-content-warnings-%f0%9f%86%95%f0%9f%86%95">⚠️ Content Warnings 🆕🆕</a></h3>

<p>Perhaps you want to hide a bit of what you're saying. Add a content warning to hide part of your message.</p>
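
<p>In Mastodon's model, the warning text lives in the post's <code>summary</code> and the post is flagged as <code>sensitive</code>. Clients show the summary and hide the <code>content</code> behind it. Here is a minimal sketch, assuming the post is already a PHP array. <code>with_content_warning()</code> is a hypothetical name, not ActivityBot's actual function.</p>

<pre><code class="language-php">//  Hypothetical helper: add a content warning to a Note represented as a PHP array.
function with_content_warning( array $note, string $warning ): array {
    $note["summary"]   = $warning;    //  Shown in place of the content.
    $note["sensitive"] = true;        //  Tells clients to hide the content (and any media) by default.
    return $note;
}

$note = with_content_warning(
    [ "type" => "Note", "content" => "The butler did it." ],
    "Spoilers for the final episode"
);
echo json_encode( $note );
</code></pre>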

<h3 id="%f0%9f%94%8f-verify-cryptographic-signatures"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%94%8f-verify-cryptographic-signatures">🔏 Verify cryptographic signatures</a></h3>

<p><a href="https://shkspr.mobi/blog/2024/03/i-made-a-mistake-in-verifying-http-message-signatures/">HTTP Message Signatures is <em>hard</em></a>. I think I've mostly got it sorted.</p>

<h3 id="%f0%9f%aa%b5-log-sent-messages-and-errors"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%aa%b5-log-sent-messages-and-errors">🪵 Log sent messages and errors</a></h3>

<p>This is primarily a learning aid, so have a rummage through the logs and see what's going on.</p>

<h3 id="%f0%9f%9a%ae-clear-logs-when-there-are-too-many"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%9a%ae-clear-logs-when-there-are-too-many">🚮 Clear logs when there are too many</a></h3>

<p>ActivityPub is a <em>chatty</em> protocol. Your server can easily fill up with hundreds of thousands of messages from others. This regularly prunes down to something more manageable.</p>
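
<p>One simple pruning strategy is to keep only the newest <em>n</em> log files. Here is a minimal sketch, assuming one message per file in a single directory. <code>prune_logs()</code> is hypothetical, and ActivityBot's actual pruning rules may differ.</p>

<pre><code class="language-php">//  Hypothetical helper: keep only the newest $keep files in $directory.
//  Returns how many files were deleted.
function prune_logs( string $directory, int $keep ): int {
    $files = [];
    foreach ( glob( $directory . "/*" ) ?: [] as $file ) {
        if ( is_file( $file ) ) {
            $files[ $file ] = filemtime( $file );
        }
    }
    //  Sort newest first, then delete everything after the first $keep entries.
    arsort( $files );
    $deleted = 0;
    foreach ( array_slice( $files, $keep, null, true ) as $file => $modified ) {
        if ( unlink( $file ) ) {
            $deleted++;
        }
    }
    return $deleted;
}
</code></pre>

<p>Running something like <code>prune_logs( __DIR__ . "/logs", 1000 )</code> at the end of each request keeps the directory bounded without needing a cron job.</p>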

<h3 id="%ef%b8%8f%e2%83%a3-hashed-passwords-for-posting-%f0%9f%86%95%f0%9f%86%95"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%ef%b8%8f%e2%83%a3-hashed-passwords-for-posting-%f0%9f%86%95%f0%9f%86%95">#️⃣ Hashed passwords for posting 🆕🆕</a></h3>

<p>Bit of a guilty moment here. I was originally storing the password in plaintext. Naughty! Passwords are now salted and hashed.</p>
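
<p>PHP's built-in <code>password_hash()</code> and <code>password_verify()</code> functions handle the salting for you. Here is a minimal sketch. Storing the hash in the <code>.env</code> file and the name <code>is_authorised()</code> are assumptions, not a description of ActivityBot's actual code.</p>

<pre><code class="language-php">//  Hypothetical helper: check a password supplied with a posting request.
function is_authorised( string $supplied, string $stored_hash ): bool {
    //  password_verify() reads the algorithm and salt back out of the stored hash.
    return password_verify( $supplied, $stored_hash );
}

//  Run once, then keep only this hash (for example in the .env file), never the password.
//  password_hash() generates a random salt and embeds it in the returned string.
$stored_hash = password_hash( "correct horse battery staple", PASSWORD_DEFAULT );

var_dump( is_authorised( "correct horse battery staple", $stored_hash ) );
var_dump( is_authorised( "hunter2", $stored_hash ) );
</code></pre>

<p>Because every call to <code>password_hash()</code> generates a fresh salt, hashing the same password twice gives two different strings, and both will still verify.</p>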

<h3 id="%f0%9f%92%bb-basic-website-for-showing-posts"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%92%bb-basic-website-for-showing-posts">💻 Basic website for showing posts</a></h3>

<p>A nice-enough looking front end if people want to view the posts directly on your domain.</p>

<h2 id="some-deficiencies"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#some-deficiencies">Some Deficiencies</a></h2>

<p>Not every piece of software is perfect. ActivityBot is less perfect than most things. Here are some of the things it can't do and, perhaps, will never do.  If you'd like to help tackle any of these, <a href="https://gitlab.com/edent/activity-bot/">fork the code from my git repo</a>!</p>

<h3 id="%e2%8f%b3-retry-failed-messages"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%e2%8f%b3-retry-failed-messages">⏳ Retry Failed Messages</a></h3>

<p>A <em>proper</em> Mastodon server will keep trying to send messages to unresponsive hosts. ActivityBot is one-and-done. If a remote server didn't respond in time, or was offline, or something else went wrong - it may not get the message.</p>

<h3 id="%f0%9f%94%84-reposts-announce-quote"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%94%84-reposts-announce-quote">🔄 Reposts / Announce / Quote</a></h3>

<p>You cannot boost other posts, or even your own. Nor can you send quote posts.</p>

<h3 id="%f0%9f%a4%96-act-on-instructions"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%a4%96-act-on-instructions">🤖 Act On Instructions</a></h3>

<p>This is a basic bot. It contains no logic. If you send it a message asking it to take action, it will not. You will need to build something else to make it truly interactive.</p>

<h3 id="%f0%9f%93%a5-receive-messages"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%93%a5-receive-messages">📥 Receive Messages</a></h3>

<p>In fact, other than the follow / unfollow stuff, the bot can't receive any messages from the Fediverse. It doesn't know when a post has been replied to, liked, or reposted.</p>

<h3 id="%f0%9f%98%8e-set-post-visibility"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%98%8e-set-post-visibility">😎 Set Post Visibility</a></h3>

<p>Your posts are either public or a DM. There's no support for things like quiet followers.</p>

<h3 id="%f0%9f%93%8a-create-polls"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%93%8a-create-polls">📊 Create Polls</a></h3>

<p>Everyone loves to vote on meaningless polls - but this is quite a hard problem for ActivityBot. It would need to keep track of votes, prevent double voting, and probably some other difficult stuff.</p>

<h3 id="%f0%9f%97%a8%ef%b8%8f-change-quote-post-visibility"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%97%a8%ef%b8%8f-change-quote-post-visibility">🗨️ Change Quote Post Visibility</a></h3>

<p>As quote posts are still quite new to Mastodon, I'm not sure how best to implement this.</p>

<h3 id="%f0%9f%94%97-proper-html-markdown-support"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%94%97-proper-html-markdown-support">🔗 Proper HTML / Markdown Support</a></h3>

<p>Autolinking names, hashtags, and links just about works - but not very reliably. In theory the bot <em>could</em> parse Markdown and create richly formatted HTML from it. But that may require an external library which would bloat the size. Perhaps posting raw HTML could work?</p>

<h3 id="%f0%9f%96%bc%ef%b8%8f-focus-points-for-images"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%f0%9f%96%bc%ef%b8%8f-focus-points-for-images">🖼️ Focus Points for Images</a></h3>

<p>Perhaps of less use now, but still of interest to people?</p>

<h3 id="%e2%9d%93-other-stuff"><a href="https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/#%e2%9d%93-other-stuff">❓ Other Stuff</a></h3>

<p>I don't know what I don't know. Maybe some stuff is totally broken? Maybe it is wildly out of spec? If you spot something dodgy, please let me know or <a href="https://gitlab.com/edent/activity-bot/">raise a Pull Request</a>.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=68592&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2026/03/some-updates-to-activitybot/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[A big list of things I disable in WordPress]]></title>
		<link>https://shkspr.mobi/blog/2025/11/a-big-list-of-things-i-disable-in-wordpress/</link>
					<comments>https://shkspr.mobi/blog/2025/11/a-big-list-of-things-i-disable-in-wordpress/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sun, 30 Nov 2025 12:34:23 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[HowTo]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[WordPress]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=63344</guid>

					<description><![CDATA[There are many things I like about the WordPress blogging software, and many things I find irritating. The most annoying aspect is that WordPress insists that its way is the best and there shall be no deviance. That means a lot of forced cruft being injected into my site. Headers that bloat my page size, Gutenberg stuff I&#039;ve no use for, and ridiculous editorial decisions.  To double-down on the…]]></description>
										<content:encoded><![CDATA[<p>There are many things I like about the WordPress blogging software, and many things I find irritating. The most annoying aspect is that WordPress insists that its way is the best and there shall be no deviance. That means a <em>lot</em> of forced cruft being injected into my site. Headers that bloat my page size, Gutenberg stuff I've no use for, and <a href="https://developer.wordpress.org/reference/functions/capital_p_dangit/">ridiculous editorial decisions</a>.</p>

<p>To double-down on the annoyance, there's no simple way to turn them off. In part, that is due to the "<a href="https://wordpress.org/about/philosophy/">WordPress Philosophy</a>":</p>

<blockquote><p><strong>Decisions, not options</strong></p>

<p>[…] Every time you give a user an option, you are asking them to make a decision. When a user doesn’t care or understand the option this ultimately leads to frustration.</p></blockquote>

<p>I broadly agree with that. Having hundreds of options is a burden for users and a nightmare for maintainers. Do please read this <a href="https://tommcfarlin.com/wordpress-philosophy-decisions-not-options/">excellent discussion from Tom McFarlin for a more detailed analysis</a>.</p>

<p>But I <em>want</em> to turn things off. Luckily, there is a way. If you're a developer, you can remove a fair number of these "enforced" decisions. Add the following to your theme's <code>functions.php</code> file and watch the mandatory WordPress bloat wither away. I've commented each removal and, where possible, given a source for more information. Feel free to leave a comment suggesting how this script can be improved and simplified.</p>

<pre><code class="language-php">//  Remove mandatory classic theme.
function disable_classic_theme_styles() {
    wp_deregister_style( "classic-theme-styles" );
    wp_dequeue_style(    "classic-theme-styles" );
}
add_action( "wp_enqueue_scripts", "disable_classic_theme_styles" );

//  Remove WP Emoji.
//  http://www.denisbouquet.com/remove-wordpress-emoji-code/
remove_action( "wp_head",             "print_emoji_detection_script", 7 );
remove_action( "wp_print_styles",     "print_emoji_styles"              );
remove_action( "admin_print_scripts", "print_emoji_detection_script"    );
remove_action( "admin_print_styles",  "print_emoji_styles"              );
//  https://wordpress.org/support/topic/remove-the-new-dns-prefetch-code/
add_filter( "emoji_svg_url", "__return_false" );

//  Stop emoji replacement with images in RSS / Atom Feeds
//  https://danq.me/2023/09/04/wordpress-stop-emoji-images/
remove_filter( "the_content_feed", "wp_staticize_emoji" );
remove_filter( "comment_text_rss", "wp_staticize_emoji" );

//  Remove automatic formatting.
//  https://css-tricks.com/snippets/wordpress/disable-automatic-formatting/
remove_filter( "the_content",  "wptexturize" );
remove_filter( "the_excerpt",  "wptexturize" );
remove_filter( "comment_text", "wptexturize" );
remove_filter( "the_title",    "wptexturize" );

//  More formatting crap.
add_action("init", function() {
    remove_filter( "the_content", "convert_smilies", 20 );
    foreach ( array( "the_content", "the_title", "wp_title", "document_title" ) as $filter ) {
        remove_filter( $filter, "capital_P_dangit", 11 );
    }
    remove_filter( "comment_text", "capital_P_dangit", 31 );    //  No idea why this is separate
    remove_filter( "the_content",  "do_blocks", 9 );
}, 11);

//  Remove Gutenberg Styles.
//  https://wordpress.org/support/topic/how-to-disable-inline-styling-style-idglobal-styles-inline-css/
remove_action( "wp_enqueue_scripts", "wp_enqueue_global_styles" );

//  Remove Gutenberg editing widgets.
//  From https://wordpress.org/plugins/classic-widgets/
//  Disables the block editor from managing widgets in the Gutenberg plugin.
add_filter( "gutenberg_use_widgets_block_editor", "__return_false" );
//  Disables the block editor from managing widgets.
add_filter( "use_widgets_block_editor", "__return_false" );

//  Remove Gutenberg Block Library CSS from loading on the frontend.
//  https://smartwp.com/remove-gutenberg-css/
function remove_wp_block_library_css() {
    wp_dequeue_style( "wp-block-library"       );
    wp_dequeue_style( "wp-block-library-theme" );
    wp_dequeue_style( "wp-components"          );
}
add_action( "wp_enqueue_scripts", "remove_wp_block_library_css", 100 );

//  Remove hovercards on comment links in admin area.
//  https://wordpress.org/support/topic/how-to-disable-mshots-service/#post-12946617
add_filter( "akismet_enable_mshots", "__return_false" );

//  Remove Unused Plugin code.
function remove_plugin_css_js() {
    wp_dequeue_style( "image-sizes" );
}
add_action( "wp_enqueue_scripts", "remove_plugin_css_js", 100 );

//  Remove WordPress forced image size
//  https://core.trac.wordpress.org/ticket/62413#comment:40
add_filter( "wp_img_tag_add_auto_sizes", "__return_false" );

//  Remove &lt;img&gt; enhancements
//  https://developer.wordpress.org/reference/functions/wp_filter_content_tags/
remove_filter( "the_content",  "wp_filter_content_tags", 12 );

//  Stop rewriting http:// URls for the main domain.
//  https://developer.wordpress.org/reference/hooks/wp_should_replace_insecure_home_url/
remove_filter( "the_content", "wp_replace_insecure_home_url", 10 );

//  Remove the attachment stuff
//  https://developer.wordpress.org/news/2024/01/building-dynamic-block-based-attachment-templates-in-themes/
remove_filter( "the_content", "prepend_attachment" );

//  Remove the block filter
remove_filter( "the_content", "apply_block_hooks_to_content_from_post_object", 8 );

//  Remove browser check from Admin dashboard.
//  https://core.trac.wordpress.org/attachment/ticket/27626/disable-wp-check-browser-version.0.2.php
if ( !empty( $_SERVER["HTTP_USER_AGENT"] ) ) {
    add_filter( "pre_site_transient_browser_" . md5( $_SERVER["HTTP_USER_AGENT"] ), "__return_null" );
}

//  Remove shortlink.
//  https://stackoverflow.com/questions/42444063/disable-wordpress-short-links
remove_action( "wp_head", "wp_shortlink_wp_head" );

//  Remove RSD.
//  https://wpengineer.com/1438/wordpress-header/
remove_action( "wp_head", "rsd_link" );

//  Remove extra feed links.
//  https://developer.wordpress.org/reference/functions/feed_links/
add_filter( "feed_links_show_comments_feed", "__return_false" );
add_filter( "feed_links_show_posts_feed",    "__return_false" );

//  Remove api.w.org link.
//  https://wordpress.stackexchange.com/questions/211467/remove-json-api-links-in-header-html
remove_action( "wp_head", "rest_output_link_wp_head" );
//  https://wordpress.stackexchange.com/questions/211817/how-to-remove-rest-api-link-in-http-headers
//  https://developer.wordpress.org/reference/functions/rest_output_link_header/
remove_action( "template_redirect", "rest_output_link_header", 11 );
</code></pre>

<p>You can find the latest version of <a href="https://gitlab.com/edent/blog-theme/-/blob/master/includes/remove.php">my debloat script</a> in my theme's repo.</p>

<p>If there are other things you find helpful to remove, or a better way to organise this file, please drop a comment in the box.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=63344&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2025/11/a-big-list-of-things-i-disable-in-wordpress/feed/</wfw:commentRss>
			<slash:comments>14</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[A Self-Hosted Favicon Proxy written in PHP]]></title>
		<link>https://shkspr.mobi/blog/2025/10/a-self-hosted-favicon-proxy-written-in-php/</link>
					<comments>https://shkspr.mobi/blog/2025/10/a-self-hosted-favicon-proxy-written-in-php/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Tue, 28 Oct 2025 12:34:54 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[favicon]]></category>
		<category><![CDATA[HowTo]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[php]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=63434</guid>

					<description><![CDATA[In theory, you should be able to get the base favicon of any domain by calling /favicon.ico - but the reality is somewhat more complex than that. Plenty of sites use a wide variety of semi-standardised images which are usually only discoverable from the site&#039;s HTML.  There are several services which allow you to get favicons based on a domain. But they all have their problems.   Google   Exposes…]]></description>
										<content:encoded><![CDATA[<p>In theory, you should be able to get the base favicon of any domain by calling <code>/favicon.ico</code> - but the reality is somewhat more complex than that. Plenty of sites use a wide variety of semi-standardised images which are usually only discoverable from the site's HTML.</p>

<p>There are several services which allow you to get favicons based on a domain. But they all have their problems.</p>

<ul>
<li><a href="https://www.google.com/s2/favicons?domain=shkspr.mobi&amp;sz=256">Google</a>

<ul>
<li>Exposes your users to Google's tracking.</li>
<li>Relies on redirects.</li>
</ul></li>
<li><a href="https://icons.duckduckgo.com/ip9/shkspr.mobi.ico">DuckDuckGo</a>

<ul>
<li>Not officially supported by DDG.</li>
</ul></li>
<li><a href="https://favicon.is/shkspr.mobi">Favicon.is</a>

<ul>
<li>No privacy policy whatsoever.</li>
</ul></li>
<li><a href="https://icon.horse/">Icon.horse</a>

<ul>
<li>Paid service.</li>
<li>Only small size icons.</li>
</ul></li>
<li><a href="https://favicone.com/shkspr.mobi">Favicone</a>

<ul>
<li>No privacy policy.</li>
<li>Only small size icons.</li>
</ul></li>
</ul>

<p>I want to show favicons next to specific links, but I don't want to expose my visitors to unnecessary tracking. How can I proxy these images so they are stored and served locally?</p>

<p>There are a few existing services. Some use <a href="https://github.com/seadfeng/favicons-proxy">Cloudflare Workers</a> or other <a href="https://github.com/shaklain125/gicon">cloud services</a>, while some local-first ones are <a href="https://github.com/toolness/favicon-proxy">unmaintained</a>. None of them is modern, self-hosted, and as easy to deploy as uploading a single PHP file.</p>

<p>So here's my attempt to make something which will preserve user privacy and serve moderately up-to-date icons, while remaining fast and efficient.</p>

<p></p><nav role="doc-toc"><menu><li><h2 id="table-of-contents"><a href="https://shkspr.mobi/blog/2025/10/a-self-hosted-favicon-proxy-written-in-php/#table-of-contents">Table of Contents</a></h2><menu><li><a href="https://shkspr.mobi/blog/2025/10/a-self-hosted-favicon-proxy-written-in-php/#getting-the-domain">Getting the domain</a></li><li><a href="https://shkspr.mobi/blog/2025/10/a-self-hosted-favicon-proxy-written-in-php/#getting-the-image">Getting the image</a></li><li><a href="https://shkspr.mobi/blog/2025/10/a-self-hosted-favicon-proxy-written-in-php/#getting-the-structure-right">Getting the structure right</a></li><li><a href="https://shkspr.mobi/blog/2025/10/a-self-hosted-favicon-proxy-written-in-php/#preventing-abuse">Preventing abuse</a></li><li><a href="https://shkspr.mobi/blog/2025/10/a-self-hosted-favicon-proxy-written-in-php/#putting-it-all-together">Putting it all together</a></li></menu></li></menu></nav><p></p>

<h2 id="getting-the-domain"><a href="https://shkspr.mobi/blog/2025/10/a-self-hosted-favicon-proxy-written-in-php/#getting-the-domain">Getting the domain</a></h2>

<p>Assume the request comes in to <code>https://proxy.example.com/?domain=bbc.co.uk</code>.</p>

<p>PHP has a <a href="https://www.php.net/manual/en/filter.constants.php#constant.filter-validate-domain">handy <code>FILTER_VALIDATE_DOMAIN</code> filter</a> which will determine if the string is a domain.</p>

<pre><code class="language-php">//  Returns the hostname, or false if it isn't a valid one.
$domain = filter_var( $_GET["domain"] ?? "", FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME );
</code></pre>

<h3 id="dealing-with-idns"><a href="https://shkspr.mobi/blog/2025/10/a-self-hosted-favicon-proxy-written-in-php/#dealing-with-idns">Dealing with IDNs</a></h3>

<p>Some domains contain non-ASCII characters, for example <a href="https://莎士比亚.org/">https://莎士比亚.org/</a>, and not all favicon services support Internationalised Domain Names.</p>

<p>Using <a href="https://www.php.net/manual/en/function.idn-to-ascii.php">the <code>idn_to_ascii()</code> function</a>, it is possible to get the Punycode domain.</p>

<pre><code class="language-php">$domain = idn_to_ascii("莎士比亚.org");
</code></pre>
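
<p>Order matters here. <code>FILTER_FLAG_HOSTNAME</code> only accepts ASCII letters, digits, hyphens, and dots, so an IDN needs converting to Punycode <em>before</em> it is validated. Here is a minimal sketch combining the two steps. <code>normalise_domain()</code> is a hypothetical name. <code>idn_to_ascii()</code> requires the <code>intl</code> extension, so this falls back to validating the raw input when that extension isn't installed.</p>

<pre><code class="language-php">//  Hypothetical helper: turn user input into a validated ASCII hostname, or false.
function normalise_domain( string $input ): string|false {
    $domain = strtolower( trim( $input ) );
    //  Convert IDNs such as 莎士比亚.org to Punycode (needs the intl extension).
    if ( function_exists( "idn_to_ascii" ) ) {
        $domain = idn_to_ascii( $domain );
        if ( $domain === false ) {
            return false;
        }
    }
    //  Only ASCII letters, digits, hyphens, and dots survive this check.
    return filter_var( $domain, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME );
}

var_dump( normalise_domain( $_GET["domain"] ?? "bbc.co.uk" ) );
</code></pre>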

<h2 id="getting-the-image"><a href="https://shkspr.mobi/blog/2025/10/a-self-hosted-favicon-proxy-written-in-php/#getting-the-image">Getting the image</a></h2>

<ol>
<li>Check if the icon has previously been downloaded.</li>
<li>Rotate randomly between a few different favicon services.</li>
<li>Download the icon.</li>
<li>Save it somewhere.</li>
</ol>
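
<p>Those four steps can be sketched as a single function. This is a minimal version under several assumptions: the cache path is passed in, each service is a <code>sprintf()</code> template with <code>%s</code> standing in for the domain, and there are no timeouts, size limits, or checks that the response really is an image, all of which a production proxy would want. <code>get_favicon()</code> is a hypothetical name.</p>

<pre><code class="language-php">//  Hypothetical sketch of the four steps: check the cache, pick a service, download, save.
//  Each service is a sprintf() template where %s is replaced by the domain.
function get_favicon( string $domain, string $cache_file, array $services ): ?string {
    //  1. Serve a previously downloaded icon if there is one.
    if ( is_file( $cache_file ) ) {
        return file_get_contents( $cache_file ) ?: null;
    }
    //  2. Rotate randomly between the available services.
    $url = sprintf( $services[ array_rand( $services ) ], rawurlencode( $domain ) );
    //  3. Download the icon. No timeout or content check here; a real proxy should add both.
    $icon = @file_get_contents( $url );
    if ( $icon === false ) {
        return null;
    }
    //  4. Save it so the next request is served locally.
    if ( !is_dir( dirname( $cache_file ) ) ) {
        mkdir( dirname( $cache_file ), 0755, true );
    }
    file_put_contents( $cache_file, $icon );
    return $icon;
}

//  URL templates inferred from the DuckDuckGo and Favicone links above.
$services = [
    "https://icons.duckduckgo.com/ip9/%s.ico",
    "https://favicone.com/%s",
];
</code></pre>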

<h2 id="getting-the-structure-right"><a href="https://shkspr.mobi/blog/2025/10/a-self-hosted-favicon-proxy-written-in-php/#getting-the-structure-right">Getting the structure right</a></h2>

<p>I know from my work on OpenBenches that storing tens of thousands of files in a single directory can be problematic. So I'll store the retrieved favicon in: <code>/tld/domain/subdomain/</code></p>

<p>That will make it quick to see if an icon exists. I'll save the file with a filename based on the current timestamp. That will allow me to check if an icon is out of date, and will prevent people downloading the icons directly from me.</p>
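
<p>Here is a minimal sketch of that layout, assuming icons are saved with nothing but the Unix timestamp as their filename. Both function names are hypothetical. The split simply reverses the labels of the hostname, so <code>www.bbc.co.uk</code> becomes <code>uk/co/bbc/www</code>. It doesn't consult the Public Suffix List.</p>

<pre><code class="language-php">//  Hypothetical helper: map a hostname to a cache directory, most significant label first.
//  A naive split on dots; "co" in "bbc.co.uk" is treated like any other label.
function favicon_directory( string $base, string $domain ): string {
    $labels = array_reverse( explode( ".", $domain ) );
    return $base . "/" . implode( "/", $labels );
}

//  Hypothetical helper: return the newest icon in a directory and its age in seconds.
//  Files are assumed to be named after the Unix timestamp at which they were saved.
function newest_icon( string $directory ): ?array {
    //  Only files count; subdomains live in subdirectories alongside the icons.
    $files = array_filter( glob( $directory . "/*" ) ?: [], "is_file" );
    if ( count( $files ) === 0 ) {
        return null;
    }
    $newest = max( array_map( fn( $file ) => (int) basename( $file ), $files ) );
    return [
        "path" => $directory . "/" . $newest,
        "age"  => time() - $newest,
    ];
}

echo favicon_directory( "/var/cache/favicons", "www.bbc.co.uk" );
</code></pre>

<p>Checking whether an icon is out of date is then a comparison of <code>age</code> against whatever refresh interval you choose.</p>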

<h2 id="preventing-abuse"><a href="https://shkspr.mobi/blog/2025/10/a-self-hosted-favicon-proxy-written-in-php/#preventing-abuse">Preventing abuse</a></h2>

<p>I don't want anyone but visitors to my site to be able to use this service. So I'll add a (weak) check to see if the request came from my domain.</p>

<pre><code class="language-php">$referer = parse_url( $_SERVER["HTTP_REFERER"] ?? "", PHP_URL_HOST );
if ( $referer === "shkspr.mobi" ) {
   …
}
</code></pre>

<p>Some browsers may not send referers for privacy reasons. So they won't see the favicons. But they probably wouldn't have seen the images loaded from a 3<sup>rd</sup> party service. So I'll serve a default image.</p>
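
<p>Here is a minimal sketch of the check and the fallback together. <code>icon_to_serve()</code> and the default image path are hypothetical. It also falls back when the cached icon is missing, which keeps the caller simple. The <code>Referer</code> header is trivially forged, so this only discourages casual hotlinking.</p>

<pre><code class="language-php">//  Hypothetical helper: decide which file to send back.
//  No referer, a foreign referer, or a missing icon all fall back to the default image.
function icon_to_serve( array $server, string $allowed_host, string $icon_path, string $default_path ): string {
    $host = parse_url( $server["HTTP_REFERER"] ?? "", PHP_URL_HOST );
    if ( $host !== $allowed_host || !is_file( $icon_path ) ) {
        return $default_path;
    }
    return $icon_path;
}

echo icon_to_serve( $_SERVER, "shkspr.mobi", "/var/cache/favicons/uk/co/bbc/1700000000", __DIR__ . "/default.png" );
</code></pre>

<p>The caller would then send the chosen file with <code>readfile()</code> and an appropriate <code>Content-Type</code> header.</p>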

<h2 id="putting-it-all-together"><a href="https://shkspr.mobi/blog/2025/10/a-self-hosted-favicon-proxy-written-in-php/#putting-it-all-together">Putting it all together</a></h2>

<p>You can grab the code from <a href="https://git.edent.tel/edent/Favicon-Proxy-PHP">my personal git service</a>.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=63434&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2025/10/a-self-hosted-favicon-proxy-written-in-php/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Stop using preg_* on HTML and start using \Dom\HTMLDocument instead]]></title>
		<link>https://shkspr.mobi/blog/2025/05/stop-using-preg_-on-html-and-use-domhtmldocument/</link>
					<comments>https://shkspr.mobi/blog/2025/05/stop-using-preg_-on-html-and-use-domhtmldocument/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Fri, 09 May 2025 11:34:56 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[php]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=60375</guid>

					<description><![CDATA[It is a truth universally acknowledged that a programmer in possession of some HTML will eventually try to parse it with a regular expression.  This makes many people very angry and is widely regarded as a bad move.  In the bad old days, it was somewhat understandable for a PHP coder to run a quick-and-dirty preg_replace() on a scrap of code. They probably could control the input and there wasn&#039;t …]]></description>
										<content:encoded><![CDATA[<p>It is a truth universally acknowledged that a programmer in possession of some HTML will eventually try to parse it with a regular expression.</p>

<p><a href="https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454">This makes many people very angry and is widely regarded as a bad move</a>.</p>

<p>In the bad old days, it was somewhat understandable for a PHP coder to run a quick-and-dirty <code>preg_replace()</code> on a scrap of code. They probably could control the input and there wasn't a great way to manipulate an HTML5 DOM.</p>

<p>Rejoice, sinners! PHP 8.4 is here to save your wicked souls. There's a new <a href="https://wiki.php.net/rfc/domdocument_html5_parser">HTML5 Parser</a> which makes <em>everything</em> better and stops you having to write brittle regexen.</p>

<p>Here are a few tips - mostly notes to myself - but I hope you'll find them useful.</p>

<h2 id="sanitise-html"><a href="https://shkspr.mobi/blog/2025/05/stop-using-preg_-on-html-and-use-domhtmldocument/#sanitise-html">Sanitise HTML</a></h2>

<p>This is the most basic example. This loads HTML into a DOM, tries to fix all the mistakes it finds, and then spits out the result.</p>

<pre><code class="language-php">$html = '&lt;p id="yes" id="no"&gt;&lt;em&gt;Hi&lt;/div&gt;&lt;h2&gt;Test&lt;/h3&gt;&lt;img /&gt;';
$dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED , "UTF-8" );
echo $dom-&gt;saveHTML();
</code></pre>

<p>It uses <code>LIBXML_HTML_NOIMPLIED</code> because we don't want the parser to wrap our fragment in implied <code>&lt;html&gt;</code>, <code>&lt;head&gt;</code>, and <code>&lt;body&gt;</code> elements.</p>

<p>If you want <a href="https://shkspr.mobi/blog/2025/04/introducing-pretty-print-html-for-php-8-4/">Pretty Printing, you can use my library</a>.</p>

<h2 id="get-the-plain-text"><a href="https://shkspr.mobi/blog/2025/05/stop-using-preg_-on-html-and-use-domhtmldocument/#get-the-plain-text">Get the plain text</a></h2>

<p>OK, so you've got the DOM. How do you get the text of the body without any of the surrounding HTML?</p>

<pre><code class="language-php">$html = '&lt;p&gt;&lt;em&gt;Hello&lt;/em&gt; World!&lt;/p&gt;';
$dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR , "UTF-8" );
echo $dom-&gt;body-&gt;textContent;
</code></pre>

<p>Note, this doesn't replace images with their alt text.</p>
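<p>If you do want the alt text, one approach is to swap each image for a text node before reading the text. This is a sketch, assuming PHP 8.4's <code>Dom\Element::replaceWith()</code> and <code>Dom\Document::createTextNode()</code>, which follow the DOM standard:</p>

<pre><code class="language-php">$html = '&lt;p&gt;&lt;em&gt;Hello&lt;/em&gt; &lt;img src="world.png" alt="World"&gt;!&lt;/p&gt;';
$dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR, "UTF-8" );

//  Swap every &lt;img&gt; for a text node containing its alt text (or nothing).
foreach ( $dom-&gt;querySelectorAll( "img" ) as $img ) {
    $img-&gt;replaceWith( $dom-&gt;createTextNode( $img-&gt;getAttribute( "alt" ) ?? "" ) );
}

echo $dom-&gt;body-&gt;textContent; //  Hello World!
</code></pre>

<p>This modifies the DOM, so do it on a copy if you still need the images afterwards.</p>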

<h2 id="get-a-single-element"><a href="https://shkspr.mobi/blog/2025/05/stop-using-preg_-on-html-and-use-domhtmldocument/#get-a-single-element">Get a single element</a></h2>

<p>You can use <a href="https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector">the same <code>querySelector()</code> function as you do in JavaScript</a>!</p>

<pre><code class="language-php">$element = $dom-&gt;querySelector( "h2" );
</code></pre>

<p>That returns a <em>reference</em> to the element (a <code>Dom\Element</code> object, or <code>null</code> if nothing matches) rather than a copy. This means you can run:</p>

<pre><code class="language-php">$element-&gt;setAttribute( "id", "interesting" );
echo $dom-&gt;querySelector( "h2" )-&gt;attributes["id"]-&gt;value;
</code></pre>

<p>And you will see that the DOM has been manipulated!</p>

<h2 id="search-for-multiple-elements"><a href="https://shkspr.mobi/blog/2025/05/stop-using-preg_-on-html-and-use-domhtmldocument/#search-for-multiple-elements">Search for multiple elements</a></h2>

<p>Suppose you have a bunch of headings and you want to get all of them. You can use <a href="https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelectorAll">the same <code>querySelectorAll()</code> function as you do in JavaScript</a>!</p>

<p>To get all headings, in the order they appear:</p>

<pre><code class="language-php">$headings = $dom-&gt;querySelectorAll( "h1, h2, h3, h4, h5, h6" );
foreach ( $headings as $heading ) {
   // Do something
}
</code></pre>
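<p>For example, here is a minimal sketch (my own, not part of any library) which turns the headings into a simple outline, perhaps as the basis of a table of contents:</p>

<pre><code class="language-php">$html = '&lt;h1 id="top"&gt;Title&lt;/h1&gt;&lt;p&gt;Intro&lt;/p&gt;&lt;h2 id="first"&gt;First&lt;/h2&gt;&lt;h3&gt;Detail&lt;/h3&gt;';
$dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR, "UTF-8" );

$outline = [];
foreach ( $dom-&gt;querySelectorAll( "h1, h2, h3, h4, h5, h6" ) as $heading ) {
    $outline[] = [
        //  localName is lower-case for HTML elements, e.g. "h2" becomes level 2.
        "level" =&gt; (int) substr( $heading-&gt;localName, 1 ),
        //  getAttribute() returns null if there is no id.
        "id"    =&gt; $heading-&gt;getAttribute( "id" ),
        "text"  =&gt; $heading-&gt;textContent,
    ];
}
</code></pre>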

<h2 id="advanced-search"><a href="https://shkspr.mobi/blog/2025/05/stop-using-preg_-on-html-and-use-domhtmldocument/#advanced-search">Advanced Search</a></h2>

<p>Suppose you have a bunch of links and you want to find only those which point to "example.com/test/". Again, you can use <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors">the same attribute selectors</a> as you would in CSS.</p>

<pre><code class="language-php">$dom-&gt;querySelectorAll( "a[href^=https\:\/\/example\.com\/test\/]" );
</code></pre>
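<p>Here is a sketch of what you might do with the results. Adding <code>rel="nofollow"</code> is just an arbitrary example. Quoting the attribute value in the selector means the punctuation doesn't need escaping:</p>

<pre><code class="language-php">$html = '&lt;a href="https://example.com/test/1"&gt;One&lt;/a&gt; &lt;a href="https://example.com/other/"&gt;Two&lt;/a&gt;';
$dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR, "UTF-8" );

//  Quoting the value means ":", "/" and "." don't need escaping.
$links = $dom-&gt;querySelectorAll( 'a[href^="https://example.com/test/"]' );
foreach ( $links as $link ) {
    //  Example manipulation: mark each matching link.
    $link-&gt;setAttribute( "rel", "nofollow" );
}
echo $links-&gt;length; //  1
</code></pre>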

<h2 id="replacing-content"><a href="https://shkspr.mobi/blog/2025/05/stop-using-preg_-on-html-and-use-domhtmldocument/#replacing-content">Replacing content</a></h2>

<p>Sadly, it isn't quite as simple as setting the <code>innerHTML</code>. Each search returns a node. That node may have <em>children</em>. Those children will also be nodes which, themselves, may have children, and so on.</p>

<p>Let's take a simple example:</p>

<pre><code class="language-php">$html = '&lt;h2&gt;Hello&lt;/h2&gt;';
$dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );
$element = $dom-&gt;querySelector( "h2" );
$element-&gt;childNodes[0]-&gt;textContent = "Goodbye";
echo $dom-&gt;saveHTML();
</code></pre>

<p>That changes "Hello" to "Goodbye".</p>

<p>But what if the element has child nodes?</p>

<pre><code class="language-php">$html = '&lt;h2&gt;Hello &lt;em&gt;friend&lt;/em&gt;&lt;/h2&gt;';
$dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );
$element = $dom-&gt;querySelector( "h2" );
$element-&gt;childNodes[0]-&gt;textContent = "Goodbye";
echo $dom-&gt;saveHTML();
</code></pre>

<p>That outputs <code>&lt;h2&gt;Goodbye&lt;em&gt;friend&lt;/em&gt;&lt;/h2&gt;</code> - so think carefully about the structure of the DOM and what you want to replace.</p>
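<p>If you want to throw away the child elements as well, setting <code>textContent</code> on the element itself should replace <em>all</em> of its children with a single text node, as the DOM standard specifies. A sketch:</p>

<pre><code class="language-php">$html = '&lt;h2&gt;Hello &lt;em&gt;friend&lt;/em&gt;&lt;/h2&gt;';
$dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );
$element = $dom-&gt;querySelector( "h2" );
//  Setting textContent on the element itself removes all of its children,
//  including the &lt;em&gt;, and replaces them with a single text node.
$element-&gt;textContent = "Goodbye";
echo $dom-&gt;saveHTML();
</code></pre>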

<h2 id="adding-a-new-node"><a href="https://shkspr.mobi/blog/2025/05/stop-using-preg_-on-html-and-use-domhtmldocument/#adding-a-new-node">Adding a new node</a></h2>

<p>This one is tricky!  Let's suppose you have this:</p>

<pre><code class="language-html">&lt;div id="page"&gt;
   &lt;main&gt;
      &lt;h2&gt;Hello&lt;/h2&gt;
</code></pre>

<p>You want to add an <code>&lt;h1&gt;</code> <em>before</em> the <code>&lt;h2&gt;</code>. Here's how to do this.</p>

<p>First, you need to construct the DOM:</p>

<pre><code class="language-php">$html = '&lt;div id="page"&gt;&lt;main&gt;&lt;h2&gt;Hello&lt;/h2&gt;';
$dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );
</code></pre>

<p>Next, you need to construct <em>an entirely new</em> DOM for your new node.</p>

<pre><code class="language-php">$newHTML = "&lt;h1&gt;Title&lt;/h1&gt;";
$newDom = \Dom\HTMLDocument::createFromString( $newHTML, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );
</code></pre>

<p>Next, extract the new element from the new DOM, and import it into the original DOM:</p>

<pre><code class="language-php">$element = $dom-&gt;importNode( $newDom-&gt;firstChild, true ); 
</code></pre>

<p>The element now needs to be inserted <em>somewhere</em> in the original DOM. In this case, get the <code>h2</code>, tell its parent node to insert the new node <em>before</em> the <code>h2</code>:</p>

<pre><code class="language-php">$h2 = $dom-&gt;querySelector( "h2" );
$h2-&gt;parentNode-&gt;insertBefore( $element, $h2 );
echo $dom-&gt;saveHTML();
</code></pre>

<p>Out pops (indented here for readability; <code>saveHTML()</code> itself won't add the whitespace):</p>

<pre><code class="language-html">&lt;div id="page"&gt;
   &lt;main&gt;
      &lt;h1&gt;Title&lt;/h1&gt;
      &lt;h2&gt;Hello&lt;/h2&gt;
   &lt;/main&gt;
&lt;/div&gt;
</code></pre>

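<p>An alternative is to use <a href="https://www.php.net/manual/en/domnode.appendchild.php">the <code>appendChild()</code> method</a>. Note that it adds the node to the <em>end</em> of the parent's children. Because <code>$element</code> is already in the document, this <em>moves</em> it rather than copying it. For example:</p>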

<pre><code class="language-php">$div = $dom-&gt;querySelector( "#page" );
$div-&gt;appendChild( $element );
echo $dom-&gt;saveHTML();
</code></pre>

<p>Produces:</p>

<pre><code class="language-html">&lt;div id="page"&gt;
   &lt;main&gt;
      &lt;h2&gt;Hello&lt;/h2&gt;
   &lt;/main&gt;
   &lt;h1&gt;Title&lt;/h1&gt;
&lt;/div&gt;
</code></pre>
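<p>Another option, which I haven't used above, is sketched here. You can create the element in the same document with <code>createElement()</code> and place it with the DOM's <code>before()</code> method. This avoids the second document and the <code>importNode()</code> step. Both methods should be available on PHP 8.4's <code>Dom</code> classes, but check the manual for your version.</p>

<pre><code class="language-php">$html = '&lt;div id="page"&gt;&lt;main&gt;&lt;h2&gt;Hello&lt;/h2&gt;&lt;/main&gt;&lt;/div&gt;';
$dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );

//  Create the new element inside the same document, so no importNode() is needed.
$h1 = $dom-&gt;createElement( "h1" );
$h1-&gt;textContent = "Title";

//  Insert it directly before the &lt;h2&gt;.
$dom-&gt;querySelector( "h2" )-&gt;before( $h1 );
echo $dom-&gt;saveHTML();
</code></pre>

<p>The trade-off is that building anything more than a simple element this way gets verbose. Parsing a scrap of HTML into a second document, as above, may be easier for larger snippets.</p>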

<h2 id="and-more"><a href="https://shkspr.mobi/blog/2025/05/stop-using-preg_-on-html-and-use-domhtmldocument/#and-more">And more?</a></h2>

<p>I've only scratched the surface of what the new 8.4 HTML Parser can do. I've already rewritten lots of my yucky old <code>preg_</code> code to something which (hopefully) is less likely to break in catastrophic ways.</p>

<p>If you have any other tips, please leave a comment.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=60375&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2025/05/stop-using-preg_-on-html-and-use-domhtmldocument/feed/</wfw:commentRss>
			<slash:comments>5</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Using Tempest Highlight with WordPress]]></title>
		<link>https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/</link>
					<comments>https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sat, 26 Apr 2025 11:34:19 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[css]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[WordPress]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=59866</guid>

					<description><![CDATA[I like to highlight bits of code on my blog. I was using GeSHi - but it has ceased to receive updates and the colours it uses aren&#039;t WCAG compliant.  After skimming through a few options, I found Tempest Highlight. It has nearly everything I want in a code highlighter:        PHP with no 3rd party dependencies.      Lots of common languages.      Modern, with regular updates.      Easy to use fun…]]></description>
										<content:encoded><![CDATA[<p>I like to highlight bits of code on my blog. I <em>was</em> using <a href="https://shkspr.mobi/blog/2025/04/a-small-php-update-to-geshi/">GeSHi</a> - but it has ceased to receive updates and the colours it uses aren't WCAG compliant.</p>

<p>After skimming through a few options, I found <a href="https://github.com/tempestphp/highlight">Tempest Highlight</a>. It has <em>nearly</em> everything I want in a code highlighter:</p>

<ul style="list-style-type: &quot;✅&quot;;">
    <li>&nbsp;PHP with no 3rd party dependencies.</li>
    <li>&nbsp;Lots of common languages.</li>
    <li>&nbsp;Modern, with regular updates.</li>
    <li>&nbsp;Easy to use functions.</li>
    <li>&nbsp;Range of different style sheets.</li>
</ul>

<p>But, on the downside:</p>

<ul style="list-style-type: &quot;❌&quot;;">
    <li>&nbsp;No WordPress plugin.</li>
    <li>&nbsp;Not all languages supported.</li>
    <li>&nbsp;CSS embedded in HTML.</li>
</ul>

<p>I can live without some esoteric languages, but I don't really want to run <code>composer install</code> on my blog. I just want a quick WordPress plugin.  So, here's how I did it.</p>

<p></p><nav role="doc-toc"><menu><li><h2 id="table-of-contents"><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#table-of-contents">Table of Contents</a></h2><menu><li><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#here-be-dragons">Here Be Dragons</a></li><li><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#the-art-of-loading-without-loading">The Art of Loading without Loading</a></li><li><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#testing">Testing</a></li><li><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#draw-the-rest-of-the-owl">Draw The Rest of the Owl</a></li><li><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#todo">ToDo</a></li><li><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#get-the-code">Get the code</a></li></menu></li></menu></nav><p></p>

<h2 id="here-be-dragons"><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#here-be-dragons">Here Be Dragons</a></h2>

<p>This is a quick prototype. It has an audience of one: me. It may break in unexpected ways. Use at your own risk.</p>

<p>The file layout is relatively simple:</p>

<pre><code class="language-_">WordPress Plugins
├── Highlight_Plugin
│&nbsp;&nbsp; ├── src/
│&nbsp;&nbsp; ├── autoload.php
│&nbsp;&nbsp; ├── index.php
│&nbsp;&nbsp; └── base.css
</code></pre>

<p>The <code>src/</code> directory contains the <code>src/</code> directory from <a href="https://github.com/tempestphp/highlight">Tempest Highlight</a>.</p>

<h2 id="the-art-of-loading-without-loading"><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#the-art-of-loading-without-loading">The Art of Loading without Loading</a></h2>

<p>Normally, to install a PHP package, the <code>composer</code> app creates an autoloader which will magically import everything you need into your project.  We can't do that here. Instead, we need to manually load the library.</p>

<p>Create a file in the plugin's directory called <code>autoload.php</code> - its job is to autoload everything in the <code>src/</code> directory.</p>

<pre><code class="language-php">&lt;?php
spl_autoload_register( function ( $class ) {
    //  Project-specific namespace prefix
    $prefix = "Tempest\\Highlight\\";

    //  Base directory for the namespace prefix
    $base_dir = __DIR__ . "/src/";

    //  Does the class use the namespace prefix?
    $len = strlen( $prefix );
    if ( strncmp( $prefix, $class, $len ) !== 0) {
        //  No, move to the next registered autoloader
        return;
    }

    //  Get the relative class name
    $relative_class = substr( $class, $len );

    //  Replace namespace separators with directory separators, append with .php
    $file = $base_dir . str_replace( "\\", "/", $relative_class ) . ".php";

    //  If the file exists, require it
    if ( file_exists( $file ) ) {
        require $file;
    }
});
</code></pre>

<p>I don't know if that's the <em>easiest</em> way to do it. But it works!</p>

<h2 id="testing"><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#testing">Testing</a></h2>

<p>The <code>index.php</code> file can now be tested:</p>

<pre><code class="language-php">//  Load the Tempest Highlight library
require_once __DIR__ . "/autoload.php";

//  Import the classes we need
use Tempest\Highlight\Highlighter;
use Tempest\Highlight\Themes\InlineTheme;

//  Define the theme.
$theme = new InlineTheme( __DIR__ . "/src/Themes/Css/light-plus.css" );

//  Create the highlighter.
$highlighter = new Highlighter( $theme );

//  Print some formatted HTML
echo $highlighter-&gt;parse( "&lt;em id='foo' class='bar'&gt;test&lt;/em&gt;", "html" );
</code></pre>

<p>All being well, that should produce this:</p>

<pre><code class="language-_">&amp;lt;&lt;span style="color: #0000ff;"&gt;em&lt;/span&gt; id='foo' class='bar'&amp;gt;test&amp;lt;/&lt;span style="color: #0000ff;"&gt;em&lt;/span&gt;&amp;gt;
</code></pre>

<p>That has the CSS embedded. Not ideal, but certainly good enough.  I picked "light-plus" because it was the only theme which seemed to meet at least WCAG AA when on a white background.</p>
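<p>To check a theme yourself, you can compute the WCAG 2.x contrast ratio from the colours. Here's a minimal sketch (the function names are mine) using the WCAG relative-luminance formula. AA requires at least 4.5:1 for normal-sized text. For the <code>#0000ff</code> blue in the output above, against white, it gives roughly 8.59:1.</p>

<pre><code class="language-php">//  Sketch of the WCAG 2.x contrast ratio for two "#rrggbb" colours.
function relative_luminance( string $hex ): float {
    //  Split "#0000ff" into three 0-255 integers.
    $rgb = sscanf( ltrim( $hex, "#" ), "%2x%2x%2x" );
    $linear = array_map( function ( int $channel ): float {
        $c = $channel / 255;
        //  Undo the sRGB gamma curve.
        return ( $c &lt;= 0.03928 ) ? $c / 12.92 : ( ( $c + 0.055 ) / 1.055 ) ** 2.4;
    }, $rgb );
    return 0.2126 * $linear[0] + 0.7152 * $linear[1] + 0.0722 * $linear[2];
}

function contrast_ratio( string $foreground, string $background ): float {
    $a = relative_luminance( $foreground );
    $b = relative_luminance( $background );
    //  The lighter colour always goes on top of the fraction.
    return ( max( $a, $b ) + 0.05 ) / ( min( $a, $b ) + 0.05 );
}

printf( "%.2f\n", contrast_ratio( "#0000ff", "#ffffff" ) ); //  8.59
</code></pre>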

<p>OK, so how do we go from printing out a scrap of HTML to extracting all the code snippets from a WordPress blog?</p>

<h2 id="draw-the-rest-of-the-owl"><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#draw-the-rest-of-the-owl">Draw The Rest of the Owl</a></h2>

<p>In <em>theory</em> the code is relatively straightforward.</p>

<h3 id="find-code-snippets"><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#find-code-snippets">Find code snippets</a></h3>

<p>My <a href="https://codeberg.org/edent/markdown-extra-unofficial/">Markdown plugin</a> transforms this:</p>

<pre><code class="language-_"> ```javascript
 var a = 2.0;
 ``` 
</code></pre>

<p>Into this:</p>

<pre><code class="language-html">&lt;pre&gt;&lt;code class="language-javascript"&gt;
var a = 2.0;
&lt;/code&gt;&lt;/pre&gt;
</code></pre>

<p>There's no need to use a regex - the new PHP 8.4 HTMLDocument gives us direct programmatic access to the HTML.</p>

<pre><code class="language-php">//  Load the content into PHP 8.4's HTML DOM.
$dom = Dom\HTMLDocument::createFromString( $content, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );

//  Select the code snippets.
//  `&lt;pre&gt;&lt;code class="language-*"&gt;`
$codeSnippets = $dom-&gt;querySelectorAll( "pre&gt;code[class^=language-]" );
</code></pre>

<h3 id="replace-the-snippets"><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#replace-the-snippets">Replace the snippets</a></h3>

<p>From the above, I have the language and code, so it can "easily" be replaced.</p>

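<pre><code class="language-php">//  Iterate through each snippet.
foreach ( $codeSnippets as $code ) {
    //  Get the language from the class, e.g. "language-php" becomes "php".
    //  (Assumes "language-*" is the only class on the element.)
    $language = substr( $code-&gt;getAttribute( "class" ), strlen( "language-" ) );
    //  Get the plain text from within the &lt;code&gt;.
    $originalCode = $code-&gt;textContent;
    //  Replace the contents of &lt;code&gt; with the highlighted HTML.
    $code-&gt;innerHTML = $highlighter-&gt;parse( $originalCode, $language );
}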
</code></pre>

<p>Replacing the contents of each node manipulates the original DOM, which means that after looping through all the snippets, I can return the altered HTML like so:</p>

<pre><code class="language-php">return $dom-&gt;saveHTML();
</code></pre>

<h3 id="and-then"><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#and-then">And then…</a></h3>

<p>Obviously, there's a bit more to it than that. It ignores RSS feeds, adds a base CSS style to the head, embeds some SVGs, includes semantic metadata, and it all gets a bit tangled and complicated.</p>

<h2 id="todo"><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#todo">ToDo</a></h2>

<p>A few things need to happen to make this even better.</p>

<ul>
<li>Encode comments as well as posts.</li>
<li>Add new languages.</li>
<li>Don't in-line the CSS into the HTML, but add it as a separate stylesheet.</li>
</ul>

<p>But, for now, it is running on my blog and that's good enough for me!</p>

<h2 id="get-the-code"><a href="https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/#get-the-code">Get the code</a></h2>

<p>You can <a href="https://github.com/edent/highlight">play about with the WordPress plugin</a>. Bug reports, pull requests, and suggestions are all warmly welcomed.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=59866&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2025/04/using-tempest-highlight-with-wordpress/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[A small PHP update to GeSHi]]></title>
		<link>https://shkspr.mobi/blog/2025/04/a-small-php-update-to-geshi/</link>
					<comments>https://shkspr.mobi/blog/2025/04/a-small-php-update-to-geshi/#respond</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Wed, 23 Apr 2025 11:34:53 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[php]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=59807</guid>

					<description><![CDATA[The faithful old GeSHi Syntax Highlighter hasn&#039;t seen an update in a many a long year.  It&#039;s a tried and trusted way to do server-side code highlighting - turning a myriad of programming languages into beautiful HTML &#38; CSS.  A few weeks ago, I noticed someone had proposed an update to its HTML rendering. The changes were mostly adding in new element names.  PHP has been updated several times…]]></description>
										<content:encoded><![CDATA[<p>The faithful old GeSHi Syntax Highlighter hasn't seen an update in a many a long year.  It's a tried and trusted way to do server-side code highlighting - turning a myriad of programming languages into beautiful HTML &amp; CSS.</p>

<p>A few weeks ago, I noticed someone had <a href="https://github.com/GeSHi/geshi-1.0/pull/156">proposed an update to its HTML rendering</a>. The changes were mostly adding in new element names.</p>

<p>PHP has been updated several times since GeSHi was last updated, so I thought I'd do the same. Here's <a href="https://github.com/GeSHi/geshi-1.0/pull/162">an update to the PHP highlighter</a>.</p>

<p>Getting all the current PHP functions was fairly simple:</p>

<pre><code class="language-php">$functions = get_defined_functions();
$builtInFunctions = $functions['internal'];
sort($builtInFunctions);
foreach ( $builtInFunctions as $key =&gt; $value ) {
   echo "'{$value}', "; 
}
</code></pre>

<p>Now I'm wondering if there's a <em>better</em> code highlighter.  Here's what I'm looking for:</p>

<ul>
<li>Server-side. I don't want to clutter the web with JavaScript.</li>
<li>PHP only. I don't want to add something more complicated to my tech stack.</li>
<li>WordPress for preference (but not blocks-only). Although I can build around a library.</li>
<li>Accessible colours. GeSHi's style-sheet doesn't always meet WCAG.</li>
<li>Actively maintained. If it hasn't been updated in 2 years, it's probably broken.</li>
<li>Somewhat hackable. I like to add a bit of semantic fluff around the output.</li>
</ul>

<p>Any thoughts?</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=59807&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2025/04/a-small-php-update-to-geshi/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Introducing Pretty Print HTML for PHP 8.4]]></title>
		<link>https://shkspr.mobi/blog/2025/04/introducing-pretty-print-html-for-php-8-4/</link>
					<comments>https://shkspr.mobi/blog/2025/04/introducing-pretty-print-html-for-php-8-4/#respond</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sat, 19 Apr 2025 11:34:54 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[php]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=59672</guid>

					<description><![CDATA[I&#039;m delight to announce the first release of my opinionated HTML Pretty Printer for new versions of PHP.   Grab the code from Packagist Contribute on GitLab   There are several prettifiers on Packagist, but I think mine is the only one which works with the new Dom\HTMLDocument class.  Table of ContentsWhatHowLimitationsWhyNext Steps  What  This takes hard-to-read HTML like:  &#60;!doctype…]]></description>
										<content:encoded><![CDATA[<p>I'm delight to announce the first release of my opinionated HTML Pretty Printer for new versions of PHP.</p>

<ul>
<li><a href="https://packagist.org/packages/edent/pretty-print-html">Grab the code from Packagist</a></li>
<li><a href="https://gitlab.com/edent/pretty-print-html-using-php/">Contribute on GitLab</a></li>
</ul>

<p>There are several prettifiers on Packagist, but I think mine is the only one which works with <a href="https://wiki.php.net/rfc/domdocument_html5_parser">the new <code>Dom\HTMLDocument</code> class</a>.</p>

<p></p><nav role="doc-toc"><menu><li><h2 id="table-of-contents"><a href="https://shkspr.mobi/blog/2025/04/introducing-pretty-print-html-for-php-8-4/#table-of-contents">Table of Contents</a></h2><menu><li><a href="https://shkspr.mobi/blog/2025/04/introducing-pretty-print-html-for-php-8-4/#what">What</a></li><li><a href="https://shkspr.mobi/blog/2025/04/introducing-pretty-print-html-for-php-8-4/#how">How</a></li><li><a href="https://shkspr.mobi/blog/2025/04/introducing-pretty-print-html-for-php-8-4/#limitations">Limitations</a></li><li><a href="https://shkspr.mobi/blog/2025/04/introducing-pretty-print-html-for-php-8-4/#why">Why</a></li><li><a href="https://shkspr.mobi/blog/2025/04/introducing-pretty-print-html-for-php-8-4/#next-steps">Next Steps</a></li></menu></li></menu></nav><p></p>

<h2 id="what"><a href="https://shkspr.mobi/blog/2025/04/introducing-pretty-print-html-for-php-8-4/#what">What</a></h2>

<p>This takes hard-to-read HTML like:</p>

<p><code>&lt;!doctype html&gt;&lt;html&gt;&lt;head&gt;&lt;meta charset="UTF-8"&gt;&lt;/head&gt;&lt;body&gt;&lt;div id="main" class="news main"&gt;&lt;h1 id="top"&gt;Title&lt;/h1&gt;&lt;p&gt;How &lt;em&gt;exciting&lt;/em&gt;!&lt;/p&gt;&lt;/div&gt;</code></p>

<p>And pretty-prints it with some <em>opinionated</em> formatting:</p>

<pre><code class="language-html">&lt;!doctype html&gt;
&lt;html&gt;
    &lt;head&gt;
        &lt;meta charset=UTF-8&gt;
    &lt;/head&gt;
    &lt;body&gt;
        &lt;div class="main news" id=main&gt;
            &lt;h1 id=top&gt;Title&lt;/h1&gt;
            &lt;p&gt;How &lt;em&gt;exciting&lt;/em&gt;!&lt;/p&gt;
        &lt;/div&gt;
    &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>All elements are indented where possible. Attributes are sorted alphabetically. Attribute values are unquoted if possible. CSS and JS are unaltered. These options are configurable.</p>
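<p>Two of those rules can be illustrated with a short sketch. This is hypothetical code of my own, not the library's implementation. It sorts attribute names and only quotes values containing characters which the HTML spec forbids in unquoted attribute values (whitespace, quotes, <code>=</code>, <code>&lt;</code>, <code>&gt;</code>, backticks), or values which are empty.</p>

<pre><code class="language-php">//  Hypothetical sketch of two of the rules above; not the library's own code.
function can_be_unquoted( string $value ): bool {
    //  HTML forbids whitespace, quotes, =, &lt;, &gt; and ` in unquoted values,
    //  and they can't be empty. "&amp;" is also excluded here, to be cautious.
    return $value !== "" &amp;&amp; strpbrk( $value, " \t\n\r\f\"'=&lt;&gt;`&amp;" ) === false;
}

function format_attributes( array $attributes ): string {
    //  Sort attributes alphabetically by name.
    ksort( $attributes );
    $parts = [];
    foreach ( $attributes as $name =&gt; $value ) {
        $parts[] = can_be_unquoted( $value )
            ? "{$name}={$value}"
            : $name . '="' . htmlspecialchars( $value ) . '"';
    }
    return implode( " ", $parts );
}

echo format_attributes( [ "id" =&gt; "main", "class" =&gt; "main news" ] );
//  class="main news" id=main
</code></pre>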

<p>To get an idea of what it outputs, take a look at the source code of this page!</p>

<h2 id="how"><a href="https://shkspr.mobi/blog/2025/04/introducing-pretty-print-html-for-php-8-4/#how">How</a></h2>

<p>This is designed to be simple to use, but with enough options to be useful to as many people as possible.</p>

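<pre><code class="language-php">//  Load Composer's autoloader (after running: composer require edent/pretty-print-html)
require __DIR__ . "/vendor/autoload.php";

//  HTML as a string: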
$html = "&lt;div&gt;This is &lt;span&gt; an &lt;em&gt;example&lt;/em&gt;";
//  Or as a file:
$html = file_get_contents( "example.html" );

//  Turn the HTML into a Dom\HTMLDocument
$dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR, "UTF-8" );

//  Create the pretty printer
$formatter = new Edent\PrettyPrintHtml\PrettyPrintHtml();

//  Output the result
echo $formatter-&gt;serializeHtml( $dom );
</code></pre>

<h2 id="limitations"><a href="https://shkspr.mobi/blog/2025/04/introducing-pretty-print-html-for-php-8-4/#limitations">Limitations</a></h2>

<p>Whitespace is <em>hard</em>. There are many different types. Sometimes it is for display, sometimes it isn't. Adding extra newlines and tabs almost certainly <em>will</em> cause layout changes somewhere on your page.</p>

<p>You can either change your CSS to minimise this, add elements to the <code>preserveElements</code> list to stop them being altered, or re-write your original HTML.  The choice is yours.</p>

<h2 id="why"><a href="https://shkspr.mobi/blog/2025/04/introducing-pretty-print-html-for-php-8-4/#why">Why</a></h2>

<p><a href="https://libraries.mit.edu/150books/2011/05/11/1985/">As was written long ago</a>:</p>

<blockquote><p>A computer language is not just a way of getting a computer to perform operations but rather … it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute.</p></blockquote>

<p>PHP's new <code>Dom\HTMLDocument</code> class produces syntactically valid HTML code. The code is very easy for a computer to parse. But because there is no indenting, the code is difficult for a human to parse.</p>

<p>Adding newlines and indents before every new element can introduce spacing errors when the HTML is rendered to screen. Some of these can be fixed with extra CSS, some cannot.</p>

<p>This pretty-printer attempts to make code readable for humans by striking a balance between legibility when rendered on screen and when viewed as source code.</p>

<p>Why is human readability so important?</p>

<p>As <a href="https://ohhelloana.blog/in-defense-of-unpolished-websites/">Ana Rodrigues said</a>:</p>

<blockquote><p>Today's heavily optimized websites have largely killed the "view source" learning experience. The code is minified, bundled, and often incomprehensible to beginners trying to understand how things work. […] I want anyone, regardless of skill level, to inspect elements, understand the structure, and learn from readable code.</p></blockquote>

<p>Using this pretty printer should give you and your users an excellent "view source" experience, without sacrificing the browser's ability to render the code.</p>

<h2 id="next-steps"><a href="https://shkspr.mobi/blog/2025/04/introducing-pretty-print-html-for-php-8-4/#next-steps">Next Steps</a></h2>

<p>I'm sure there are many bugs and oddities. I'd love you to <a href="https://gitlab.com/edent/pretty-print-html-using-php/">report any problems on GitLab</a>. Feel free to contribute test-cases and code.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=59672&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2025/04/introducing-pretty-print-html-for-php-8-4/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[An opinionated HTML Serializer for PHP 8.4]]></title>
		<link>https://shkspr.mobi/blog/2025/04/an-opinionated-html-serializer-for-php-8-4/</link>
					<comments>https://shkspr.mobi/blog/2025/04/an-opinionated-html-serializer-for-php-8-4/#respond</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Wed, 02 Apr 2025 11:34:36 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[HowTo]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[php]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=59322</guid>

					<description><![CDATA[A few days ago, I wrote a shitty pretty-printer for PHP 8.4&#039;s new Dom\HTMLDocument class.  I&#039;ve since re-written it to be faster and more stylistically correct.  It turns this:  &#60;html lang=&#34;en-GB&#34;&#62;&#60;head&#62;&#60;title id=&#34;something&#34;&#62;Test&#60;/title&#62;&#60;/head&#62;&#60;body&#62;&#60;h1 class=&#34;top upper&#34;&#62;Testing&#60;/h1&#62;&#60;main&#62;&#60;p&#62;Some &#60;em&#62;HTML&#60;/em&#62; and an &#60;img src=&#34;example.png&#34; alt=&#34;Alternate Text&#34;&#62;&#60;/p&#62;Text not in an…]]></description>
										<content:encoded><![CDATA[<p>A few days ago, <a href="https://shkspr.mobi/blog/2025/03/pretty-print-html-using-php-8-4s-new-html-dom/">I wrote a shitty pretty-printer</a> for PHP 8.4's new <a href="https://www.php.net/manual/en/class.dom-htmldocument.php">Dom\HTMLDocument class</a>.</p>

<p>I've since re-written it to be faster and more stylistically correct.</p>

<p>It turns this:</p>

<pre><code class="language-html">&lt;html lang="en-GB"&gt;&lt;head&gt;&lt;title id="something"&gt;Test&lt;/title&gt;&lt;/head&gt;&lt;body&gt;&lt;h1 class="top upper"&gt;Testing&lt;/h1&gt;&lt;main&gt;&lt;p&gt;Some &lt;em&gt;HTML&lt;/em&gt; and an &lt;img src="example.png" alt="Alternate Text"&gt;&lt;/p&gt;Text not in an element&lt;ol&gt;&lt;li&gt;List&lt;/li&gt;&lt;li&gt;Another list&lt;/li&gt;&lt;/ol&gt;&lt;/main&gt;&lt;/body&gt;&lt;/html&gt;
</code></pre>

<p>Into this:</p>

<pre><code class="language-html">&lt;!doctype html&gt;
&lt;html lang=en-GB&gt;
    &lt;head&gt;
        &lt;title id=something&gt;Test&lt;/title&gt;
    &lt;/head&gt;
    &lt;body&gt;
        &lt;h1 class="top upper"&gt;Testing&lt;/h1&gt;
        &lt;main&gt;
            &lt;p&gt;
                Some 
                &lt;em&gt;HTML&lt;/em&gt;
                 and an 
                &lt;img src=example.png alt="Alternate Text"&gt;
            &lt;/p&gt;
            Text not in an element
            &lt;ol&gt;
                &lt;li&gt;List&lt;/li&gt;
                &lt;li&gt;Another list&lt;/li&gt;
            &lt;/ol&gt;
        &lt;/main&gt;
    &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>I say it is "opinionated" because it does the following:</p>

<ul>
<li>Attributes are unquoted unless necessary.</li>
<li>Every element is logically indented.</li>
<li>Text content of CSS and JS is unaltered. No pretty-printing, minification, or checking for correctness.</li>
<li>Text content of elements <em>may</em> have extra newlines and tabs. Browsers tend to collapse runs of whitespace into a single space unless the CSS tells them otherwise.

<ul>
<li>This fucks up <code>&lt;pre&gt;</code> blocks which contain markup.</li>
</ul></li>
</ul>

<p>It is primarily designed to make the <em>markup</em> easy to read. Because <a href="https://libraries.mit.edu/150books/2011/05/11/1985/">according to the experts</a>:</p>

<blockquote><p>A computer language is not just a way of getting a computer to perform operations but rather … it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute.</p></blockquote>

<p>I'm <em>fairly</em> sure this all works properly. But feel free to argue in the comments or <a href="https://gitlab.com/edent/pretty-print-html-using-php/">send me a pull request</a>.</p>

<p>Here's how it works.</p>

<h2 id="when-is-an-element-not-an-element-when-it-is-a-void"><a href="https://shkspr.mobi/blog/2025/04/an-opinionated-html-serializer-for-php-8-4/#when-is-an-element-not-an-element-when-it-is-a-void">When is an element not an element? When it is a void!</a></h2>

<p>Modern HTML has the concept of "<a href="https://developer.mozilla.org/en-US/docs/Glossary/Void_element">Void Elements</a>". Normally, something like <code>&lt;a&gt;</code> <em>must</em> eventually be followed by a closing <code>&lt;/a&gt;</code>. But Void Elements, such as <code>&lt;img&gt;</code> and <code>&lt;br&gt;</code>, have no content and never take a closing tag.</p>

<p>This keeps a list of elements which must not be explicitly closed.</p>

<pre><code class="language-php">$void_elements = [
    "area",
    "base",
    "br",
    "col",
    "embed",
    "hr",
    "img",
    "input",
    "link",
    "meta",
    "param",
    "source",
    "track",
    "wbr",
];
</code></pre>
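<p>As a rough illustration of how that list gets used, here is a minimal sketch of a hypothetical <code>render_tag()</code> helper. It is not part of the original script. It assumes it is passed a list like <code>$void_elements</code> above, compares element names case-insensitively, and simply leaves out the closing tag (and any content) for voids.</p>

<pre><code class="language-php">//  Hypothetical helper, not part of the original script.
//  $voids is expected to be a list like $void_elements above.
function render_tag( string $name, array $voids, string $attributes = "", string $inner = "" ): string
{
    //  Void elements get an opening tag only: no content, no closing tag.
    if ( in_array( strtolower( $name ), $voids, true ) ) {
        return "&lt;{$name}{$attributes}&gt;";
    }
    //  Everything else is opened, filled, and explicitly closed.
    return "&lt;{$name}{$attributes}&gt;{$inner}&lt;/{$name}&gt;";
}
</code></pre>

<p>For example, <code>render_tag( "img", $void_elements, " src=example.png" )</code> returns <code>&lt;img src=example.png&gt;</code>, while <code>render_tag( "p", $void_elements, "", "Hello" )</code> returns <code>&lt;p&gt;Hello&lt;/p&gt;</code>.</p>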

<h2 id="tabs-%f0%9f%86%9a-space"><a href="https://shkspr.mobi/blog/2025/04/an-opinionated-html-serializer-for-php-8-4/#tabs-%f0%9f%86%9a-space">Tabs 🆚 Space</a></h2>

<p>Tabs, obviously. Users can set their tab width to their personal preference and it won't get confused with semantically significant whitespace.</p>

<pre><code class="language-php">$indent_character = "\t";
</code></pre>

<h2 id="setting-up-the-dom"><a href="https://shkspr.mobi/blog/2025/04/an-opinionated-html-serializer-for-php-8-4/#setting-up-the-dom">Setting up the DOM</a></h2>

<p>The new <code>Dom\HTMLDocument</code> should be broadly familiar to anyone who has used the older <code>DOMDocument</code>.</p>

<pre><code class="language-php">$html = '&lt;html lang="en-GB"&gt;&lt;head&gt;&lt;title id="something"&gt;Test&lt;/title&gt;&lt;/head&gt;&lt;body&gt;&lt;h1 class="top upper"&gt;Testing&lt;/h1&gt;&lt;main&gt;&lt;p&gt;Some &lt;em&gt;HTML&lt;/em&gt; and an &lt;img src="example.png" alt="Alternate Text"&gt;&lt;/p&gt;Text not in an element&lt;ol&gt;&lt;li&gt;List&lt;/li&gt;&lt;li&gt;Another list&lt;/li&gt;&lt;/ol&gt;&lt;/main&gt;&lt;/body&gt;&lt;/html&gt;';
$dom = Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR, "UTF-8" );
</code></pre>

<p>This automatically adds <code>&lt;head&gt;</code> and <code>&lt;body&gt;</code> elements. If you don't want that, use the <a href="https://www.php.net/manual/en/libxml.constants.php#constant.libxml-html-noimplied"><code>LIBXML_HTML_NOIMPLIED</code> flag</a>:</p>

<pre><code class="language-php">$dom = Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );
</code></pre>

<h2 id="to-quote-or-not-to-quote"><a href="https://shkspr.mobi/blog/2025/04/an-opinionated-html-serializer-for-php-8-4/#to-quote-or-not-to-quote">To Quote or Not To Quote?</a></h2>

<p>Traditionally, HTML attributes needed quotes:</p>

<pre><code class="language-html">&lt;img src="example.png" class="avatar no-border" id="user-123"&gt;
</code></pre>

<p>Modern HTML allows those attribute values to be <em>un</em>quoted as long as they are not empty and don't contain <a href="https://infra.spec.whatwg.org/#ascii-whitespace">ASCII Whitespace</a> or <a href="https://html.spec.whatwg.org/multipage/syntax.html#unquoted">certain other characters</a>.</p>

<p>For example, the above becomes:</p>

<pre><code class="language-html">&lt;img src=example.png class="avatar no-border" id=user-123&gt;
</code></pre>

<p>This function looks for the presence of those characters:</p>

<pre><code class="language-php">function value_unquoted( $haystack )
{
    //  Must not be null or empty. Check this first, as str_contains() expects a string.
    if ( $haystack === null || $haystack === "" ) { return false; }

    //  Must not contain specific characters
    $needles = [ 
        //  https://infra.spec.whatwg.org/#ascii-whitespace
        "\t", "\n", "\f", "\r", " ", 
        //  https://html.spec.whatwg.org/multipage/syntax.html#unquoted 
        "\"", "'", "=", "&lt;", "&gt;", "`" ];
    foreach ( $needles as $needle )
    {
        if ( str_contains( $haystack, $needle ) )
        {
            return false;
        }
    }
    return true;
}
</code></pre>
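<p>To see how that decision plays out on a whole attribute, here's a minimal sketch of a hypothetical <code>format_attribute()</code> function. It uses PHP's <code>strpbrk()</code> to look for any of the same forbidden characters in a single call, treats an empty value as needing quotes, and escapes <code>&amp;</code> and <code>"</code> so a quoted value can't break out of its quotes. The escaping goes beyond what the function above does.</p>

<pre><code class="language-php">//  Hypothetical helper, not part of the original script.
function format_attribute( string $name, ?string $value ): string
{
    $value = $value ?? "";

    //  Escape ampersands and double quotes so the output is always safe.
    $escaped = str_replace( [ "&amp;", "\"" ], [ "&amp;amp;", "&amp;quot;" ], $value );

    //  ASCII whitespace, plus the characters banned from unquoted values.
    $forbidden = " \t\n\f\r\"'=&lt;&gt;`";

    //  Leave it unquoted only if it is non-empty and has none of those characters.
    if ( $value !== "" &amp;&amp; strpbrk( $value, $forbidden ) === false ) {
        return "{$name}={$escaped}";
    }
    return "{$name}=\"{$escaped}\"";
}
</code></pre>

<p>With this, <code>format_attribute( "id", "user-123" )</code> gives <code>id=user-123</code> and <code>format_attribute( "class", "avatar no-border" )</code> gives <code>class="avatar no-border"</code>.</p>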

<h2 id="re-re-re-recursion"><a href="https://shkspr.mobi/blog/2025/04/an-opinionated-html-serializer-for-php-8-4/#re-re-re-recursion">Re-re-re-recursion</a></h2>

<p>I've tried to document this as best I can.</p>

<p>It traverses the DOM tree, printing out correctly indented opening elements and their attributes. If there's text content, that's printed. If an element needs closing, that's printed with the appropriate indentation.</p>

<pre><code class="language-php">function serializeHTML( $node, $treeIndex = 0, $output = "")
{
    global $indent_character, $preserve_internal_whitespace, $void_elements;

    //  Manually add the doctype to start.
    if ( $output == "" ) {
        $output .= "&lt;!doctype html&gt;\n";
    }

    if( property_exists( $node, "localName" ) ) {
        //  This is an Element.

        //  Get all the Attributes (id, class, src, &amp;c.).
        $attributes = "";
        if ( property_exists($node, "attributes")) {
            foreach( $node-&gt;attributes as $attribute ) {
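                $value = $attribute-&gt;nodeValue;
                //  Only add " if the value contains specific characters.
                $quote = value_unquoted( $value ) ? "" : "\"";
                //  Escape &amp; and " so the value can't break out of its quotes.
                $value = str_replace( [ "&amp;", "\"" ], [ "&amp;amp;", "&amp;quot;" ], $value );

                $attributes .= " {$attribute-&gt;nodeName}={$quote}{$value}{$quote}";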
            }
        }

        //  Print the opening element and all attributes.
        $output .= "&lt;{$node-&gt;localName}{$attributes}&gt;";

    } else if( property_exists( $node, "nodeName" ) &amp;&amp;  $node-&gt;nodeName == "#comment" ) {
        //  Comment
        $output .= "&lt;!-- {$node-&gt;textContent} --&gt;";
    }

    //  Increase indent.
    $treeIndex++;
    $tabStart = "\n" . str_repeat( $indent_character, $treeIndex ); 
    $tabEnd   = "\n" . str_repeat( $indent_character, $treeIndex - 1);

    //  Does this node have children?
    if( property_exists( $node, "childElementCount" ) &amp;&amp; $node-&gt;childElementCount &gt; 0 ) {

        //  Loop through the children.
        $i=0;
        while( $childNode = $node-&gt;childNodes-&gt;item( $i++ ) ) {

            //  Is this a text node?
            if ($childNode-&gt;nodeType == 3 ) {
                //  Only print output if there's no HTML inside the content.
                //  Ignore Void Elements.
                if ( 
                      !str_contains( $childNode-&gt;textContent, "&lt;" ) &amp;&amp; 
                    property_exists( $childNode, "localName" ) &amp;&amp; 
                          !in_array( $childNode-&gt;localName, $void_elements ) ) 
                {
                    $output .= $tabStart . $childNode-&gt;textContent;
                }
            } else {
                $output .= $tabStart;
            }

            //  Recursively indent all children.
            $output = serializeHTML( $childNode, $treeIndex, $output );
        };

        //  Suffix with a "\n" and a suitable number of "\t"s.
        $output .= "{$tabEnd}"; 

    } else if ( property_exists( $node, "childElementCount" ) &amp;&amp; property_exists( $node, "innerHTML" ) ) {
        //  If there are no children and the node contains content, print the contents.
        $output .= $node-&gt;innerHTML;
    }

    //  Close the element, unless it is a void.
    if( property_exists( $node, "localName" ) &amp;&amp; !in_array( $node-&gt;localName, $void_elements ) ) {
        $output .= "&lt;/{$node-&gt;localName}&gt;";
    }

    //  Return a string of fully indented HTML.
    return $output;
}
</code></pre>
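<p>Stripped of the DOM, the recursion is easier to see. This is a simplified sketch, not the function above: it walks a hypothetical nested array (an element is an array with <code>name</code> and <code>children</code> keys, and text is a plain string), prints one node per line with a tab per level of depth, and ignores attributes, voids, and inline content.</p>

<pre><code class="language-php">//  Simplified sketch over a hypothetical array tree, not the Dom serializer above.
//  An element is [ "name" =&gt; "ol", "children" =&gt; [ ... ] ]; text is a plain string.
function serialize_tree( $node, int $depth = 0, string $indent = "\t" ): string
{
    $pad = str_repeat( $indent, $depth );

    //  Text is printed on its own line at the current depth.
    if ( is_string( $node ) ) {
        return $pad . $node . "\n";
    }

    //  Open the element, recurse one level deeper for each child, then close it.
    $output = "{$pad}&lt;{$node['name']}&gt;\n";
    foreach ( $node['children'] ?? [] as $child ) {
        $output .= serialize_tree( $child, $depth + 1, $indent );
    }
    return $output . "{$pad}&lt;/{$node['name']}&gt;\n";
}

echo serialize_tree( [ "name" =&gt; "ol", "children" =&gt; [ [ "name" =&gt; "li", "children" =&gt; [ "List" ] ] ] ] );
</code></pre>

<p>Each recursive call gets <code>$depth + 1</code>, which is the same job that <code>$treeIndex</code> does in the real serializer.</p>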

<h2 id="print-it-out"><a href="https://shkspr.mobi/blog/2025/04/an-opinionated-html-serializer-for-php-8-4/#print-it-out">Print it out</a></h2>

<p>The serialized string hardcodes the <code>&lt;!doctype html&gt;</code> - which is probably fine.  The full HTML is shown with:</p>

<pre><code class="language-php">echo serializeHTML( $dom-&gt;documentElement );
</code></pre>

<h2 id="next-steps"><a href="https://shkspr.mobi/blog/2025/04/an-opinionated-html-serializer-for-php-8-4/#next-steps">Next Steps</a></h2>

<p>Please <a href="https://gitlab.com/edent/pretty-print-html-using-php/">raise any issues on GitLab</a> or leave a comment.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2025/04/an-opinionated-html-serializer-for-php-8-4/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Pretty Print HTML using PHP 8.4's new HTML DOM]]></title>
		<link>https://shkspr.mobi/blog/2025/03/pretty-print-html-using-php-8-4s-new-html-dom/</link>
					<comments>https://shkspr.mobi/blog/2025/03/pretty-print-html-using-php-8-4s-new-html-dom/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Mon, 31 Mar 2025 11:34:54 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[HowTo]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[php]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=59238</guid>

					<description><![CDATA[Those whom the gods would send mad, they first teach recursion.  PHP 8.4 introduces a new Dom\HTMLDocument class it is a modern HTML5 replacement for the ageing XHTML based DOMDocument.  You can read more about how it works - the short version is that it reads and correctly sanitises HTML and turns it into a nested object. Hurrah!  The one thing it doesn&#039;t do is pretty-printing.  When you call…]]></description>
										<content:encoded><![CDATA[<p>Those whom the gods would send mad, they first teach recursion.</p>

<p>PHP 8.4 introduces a new <a href="https://www.php.net/manual/en/class.dom-htmldocument.php">Dom\HTMLDocument class</a>. It is a modern HTML5 replacement for the ageing DOMDocument, whose HTML parser predates HTML5. You can <a href="https://wiki.php.net/rfc/domdocument_html5_parser">read more about how it works</a> - the short version is that it parses HTML according to the HTML5 rules, repairing broken markup along the way, and turns it into a nested object. Hurrah!</p>

<p>The one thing it <em>doesn't</em> do is pretty-printing.  When you call <code>$dom-&gt;saveHTML()</code> it will output something like:</p>

<pre><code class="language-html">&lt;html lang="en-GB"&gt;&lt;head&gt;&lt;title&gt;Test&lt;/title&gt;&lt;/head&gt;&lt;body&gt;&lt;h1&gt;Testing&lt;/h1&gt;&lt;main&gt;&lt;p&gt;Some &lt;em&gt;HTML&lt;/em&gt; and an &lt;img src="example.png"&gt;&lt;/p&gt;&lt;ol&gt;&lt;li&gt;List&lt;/li&gt;&lt;li&gt;Another list&lt;/li&gt;&lt;/ol&gt;&lt;/main&gt;&lt;/body&gt;&lt;/html&gt;
</code></pre>

<p>Perfect for a computer to read, but slightly tricky for humans.</p>

<p>As was <a href="https://libraries.mit.edu/150books/2011/05/11/1985/">written by the sages</a>:</p>

<blockquote><p>A computer language is not just a way of getting a computer to perform operations but rather … it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute.</p></blockquote>

<p>HTML <em>is</em> a programming language. Making markup easy to read for humans is a fine and noble goal.  The aim is to turn the single line above into something like:</p>

<pre><code class="language-html">&lt;html lang="en-GB"&gt;
    &lt;head&gt;
        &lt;title&gt;Test&lt;/title&gt;
    &lt;/head&gt;
    &lt;body&gt;
        &lt;h1&gt;Testing&lt;/h1&gt;
        &lt;main&gt;
            &lt;p&gt;Some &lt;em&gt;HTML&lt;/em&gt; and an &lt;img src="example.png"&gt;&lt;/p&gt;
            &lt;ol&gt;
                &lt;li&gt;List&lt;/li&gt;
                &lt;li&gt;Another list&lt;/li&gt;
            &lt;/ol&gt;
        &lt;/main&gt;
    &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>Cor! That's much better!</p>

<p>I've cobbled together a script which is <em>broadly</em> accurate. There are a million-and-one edge cases and about twice as many personal preferences. This aims to be quick, simple, and basically fine. I am indebted to <a href="https://topic.alibabacloud.com/a/php-domdocument-recursive-formatting-of-indented-html-documents_4_86_30953142.html">this random Chinese script</a> and to <a href="https://github.com/wasinger/html-pretty-min">html-pretty-min</a>.</p>

<h2 id="step-by-step"><a href="https://shkspr.mobi/blog/2025/03/pretty-print-html-using-php-8-4s-new-html-dom/#step-by-step">Step By Step</a></h2>

<p>I'm going to walk through how everything works. This is as much for my benefit as for yours! This is beta code. It sorta-kinda-works for me. Think of it as a first pass at an attempt to prove that something can be done. Please don't use it in production!</p>

<h3 id="setting-up-the-dom"><a href="https://shkspr.mobi/blog/2025/03/pretty-print-html-using-php-8-4s-new-html-dom/#setting-up-the-dom">Setting up the DOM</a></h3>

<p>The new <code>Dom\HTMLDocument</code> should be broadly familiar to anyone who has used the older <code>DOMDocument</code>.</p>

<pre><code class="language-php">$html = '&lt;html lang="en-GB"&gt;&lt;head&gt;&lt;title&gt;Test&lt;/title&gt;&lt;/head&gt;&lt;body&gt;&lt;h1&gt;Testing&lt;/h1&gt;&lt;main&gt;&lt;p&gt;Some &lt;em&gt;HTML&lt;/em&gt; and an &lt;img src="example.png"&gt;&lt;/p&gt;&lt;ol&gt;&lt;li&gt;List&lt;li&gt;Another list&lt;/body&gt;&lt;/html&gt;';
$dom = Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR, "UTF-8" );
</code></pre>

<p>This automatically adds <code>&lt;head&gt;</code> and <code>&lt;body&gt;</code> elements. If you don't want that, use the <a href="https://www.php.net/manual/en/libxml.constants.php#constant.libxml-html-noimplied"><code>LIBXML_HTML_NOIMPLIED</code> flag</a>:</p>

<pre><code class="language-php">$dom = Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR | LIBXML_HTML_NOIMPLIED, "UTF-8" );
</code></pre>

<h3 id="where-not-to-indent"><a href="https://shkspr.mobi/blog/2025/03/pretty-print-html-using-php-8-4s-new-html-dom/#where-not-to-indent">Where <em>not</em> to indent</a></h3>

<p>There are certain elements whose contents shouldn't be pretty-printed, because the extra whitespace might change the meaning or layout of the text. For example, indenting everything inside a paragraph would produce this:</p>

<pre><code class="language-html">&lt;p&gt;
    Some 
    &lt;em&gt;
        HT
        &lt;strong&gt;M&lt;/strong&gt;
        L
    &lt;/em&gt;
&lt;/p&gt;
</code></pre>
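<p>Why does that matter? Browsers collapse runs of whitespace in normal text into a single space. This rough sketch approximates that with <code>strip_tags()</code> and a regular expression (a crude stand-in for real CSS whitespace processing, used only for illustration) to show what a reader would see from the over-indented paragraph above.</p>

<pre><code class="language-php">//  Crude approximation of how a browser shows text under white-space: normal.
function rendered_text( string $html ): string
{
    //  Drop the tags, keeping the whitespace that sat between them.
    $text = strip_tags( $html );
    //  Collapse every run of whitespace to one space, then trim the ends.
    return trim( preg_replace( '/\s+/', ' ', $text ) );
}

$indented = "&lt;p&gt;\n    Some \n    &lt;em&gt;\n        HT\n        &lt;strong&gt;M&lt;/strong&gt;\n        L\n    &lt;/em&gt;\n&lt;/p&gt;";

echo rendered_text( $indented ), "\n";  //  Some HT M L
echo rendered_text( "&lt;p&gt;Some &lt;em&gt;HT&lt;strong&gt;M&lt;/strong&gt;L&lt;/em&gt;&lt;/p&gt;" ), "\n";  //  Some HTML
</code></pre>

<p>The indented version gains two spurious spaces inside the word, which is why inline elements like these are left alone.</p>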

<p>I've picked these elements from <a href="https://html.spec.whatwg.org/multipage/text-level-semantics.html#text-level-semantics">text-level semantics</a> and a few others which I consider sensible. Feel free to edit this list if you want.</p>

<pre><code class="language-php">$preserve_internal_whitespace = [
    "a", 
    "em", "strong", "small", 
    "s", "cite", "q", 
    "dfn", "abbr", 
    "ruby", "rt", "rp", 
    "data", "time", 
    "pre", "code", "var", "samp", "kbd", 
    "sub", "sup", 
    "b", "i", "mark", "u",
    "bdi", "bdo", 
    "span",
    "h1", "h2", "h3", "h4", "h5", "h6",
    "p",
    "li",
    "button", "form", "input", "label", "select", "textarea",
];
</code></pre>

<p>The function has an option to <em>force</em> indenting every time it encounters an element.</p>

<h3 id="tabs-%f0%9f%86%9a-spaces"><a href="https://shkspr.mobi/blog/2025/03/pretty-print-html-using-php-8-4s-new-html-dom/#tabs-%f0%9f%86%9a-spaces">Tabs 🆚 Spaces</a></h3>

<p>Tabs, obviously. Users can set their tab width to their personal preference and it won't get confused with semantically significant whitespace.</p>

<pre><code class="language-php">$indent_character = "\t";
</code></pre>

<h3 id="recursive-function"><a href="https://shkspr.mobi/blog/2025/03/pretty-print-html-using-php-8-4s-new-html-dom/#recursive-function">Recursive Function</a></h3>

<p>This function reads through each node in the HTML tree. If the node should be indented, the function inserts a new node with the requisite number of tabs before the existing node. It also adds a suffix node to indent the next line appropriately. It then goes through the node's children and recursively repeats the process.</p>

<p><strong>This modifies the existing Document</strong>.</p>

<pre><code class="language-php">function prettyPrintHTML( $node, $treeIndex = 0, $forceWhitespace = false )
{    
    global $indent_character, $preserve_internal_whitespace;

    //  If this node contains content which shouldn't be separately indented
    //  And if whitespace is not forced
    if ( property_exists( $node, "localName" ) &amp;&amp; in_array( $node-&gt;localName, $preserve_internal_whitespace ) &amp;&amp; !$forceWhitespace ) {
        return;
    }

    //  Does this node have children?
    if( property_exists( $node, "childElementCount" ) &amp;&amp; $node-&gt;childElementCount &gt; 0 ) {
        //  Move in a step
        $treeIndex++;
        $tabStart = "\n" . str_repeat( $indent_character, $treeIndex ); 
        $tabEnd   = "\n" . str_repeat( $indent_character, $treeIndex - 1);

        //  Remove any existing indenting at the start of the line
        $node-&gt;innerHTML = trim($node-&gt;innerHTML);

        //  Loop through the children
        $i=0;

        while( $childNode = $node-&gt;childNodes-&gt;item( $i++ ) ) {
            //  Was the *previous* sibling a text-only node?
            //  If so, don't add a previous newline
            if ( $i &gt; 0 ) {
                $olderSibling = $node-&gt;childNodes-&gt;item( $i-1 );

                if ( $olderSibling-&gt;nodeType == XML_TEXT_NODE  &amp;&amp; !$forceWhitespace ) {
                    $i++;
                    continue;
                }
                $node-&gt;insertBefore( $node-&gt;ownerDocument-&gt;createTextNode( $tabStart ), $childNode );
            }
            $i++; 
            //  Recursively indent all children
            prettyPrintHTML( $childNode, $treeIndex, $forceWhitespace );
        };

        //  Suffix with a node which has "\n" and a suitable number of "\t"
        $node-&gt;appendChild( $node-&gt;ownerDocument-&gt;createTextNode( $tabEnd ) ); 
    }
}
</code></pre>
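<p>One thing worth knowing when inserting nodes mid-loop: <code>childNodes</code> is a live list, so every inserted text node shifts the indices of the siblings after it. A common way to sidestep that is to copy the children into a plain array first. Here's a minimal sketch of that idea using the older <code>DOMDocument</code> class and <code>loadXML()</code>, chosen so the output is predictable. It is not a drop-in replacement for the function above: it has no preserve list and indents every child of any element that contains other elements.</p>

<pre><code class="language-php">//  Sketch using the older DOMDocument API: indent element children,
//  snapshotting childNodes first because the live list shifts on insert.
function indentChildren( DOMElement $node, int $depth = 0 ): void
{
    $doc = $node-&gt;ownerDocument;

    //  Copy the live NodeList into a plain array before changing anything.
    $children = iterator_to_array( $node-&gt;childNodes );

    //  Leave text-only elements such as &lt;li&gt;List&lt;/li&gt; on one line.
    $elements = array_filter( $children, fn( $child ) =&gt; $child instanceof DOMElement );
    if ( count( $elements ) === 0 ) {
        return;
    }

    foreach ( $children as $child ) {
        //  A newline plus one more tab than the parent, before every child.
        $node-&gt;insertBefore( $doc-&gt;createTextNode( "\n" . str_repeat( "\t", $depth + 1 ) ), $child );
        if ( $child instanceof DOMElement ) {
            indentChildren( $child, $depth + 1 );
        }
    }
    //  Bring the closing tag back to the parent's indentation.
    $node-&gt;appendChild( $doc-&gt;createTextNode( "\n" . str_repeat( "\t", $depth ) ) );
}

$dom = new DOMDocument();
$dom-&gt;loadXML( "&lt;ul&gt;&lt;li&gt;List&lt;/li&gt;&lt;li&gt;Another list&lt;/li&gt;&lt;/ul&gt;" );
indentChildren( $dom-&gt;documentElement );
echo $dom-&gt;saveXML( $dom-&gt;documentElement );
</code></pre>

<p>Because the loop runs over the snapshot rather than the live list, there's no need to juggle <code>$i</code> to skip the freshly inserted whitespace nodes.</p>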

<h3 id="printing-it-out"><a href="https://shkspr.mobi/blog/2025/03/pretty-print-html-using-php-8-4s-new-html-dom/#printing-it-out">Printing it out</a></h3>

<p>First, call the function.  <strong>This modifies the existing Document</strong>.</p>

<pre><code class="language-php">prettyPrintHTML( $dom-&gt;documentElement );
</code></pre>

<p>Then call <a href="https://www.php.net/manual/en/dom-htmldocument.savehtml.php">the normal <code>saveHTML()</code> serialiser</a>:</p>

<pre><code class="language-php">echo $dom-&gt;saveHTML();
</code></pre>

<p>Note - this does not print a <code>&lt;!doctype html&gt;</code> - you'll need to include that manually if you're intending to use the entire document.</p>

<h2 id="licence"><a href="https://shkspr.mobi/blog/2025/03/pretty-print-html-using-php-8-4s-new-html-dom/#licence">Licence</a></h2>

<p>I consider the above too trivial to license - but you may treat it as MIT if that makes you happy.</p>

<h2 id="thoughts-comments-next-steps"><a href="https://shkspr.mobi/blog/2025/03/pretty-print-html-using-php-8-4s-new-html-dom/#thoughts-comments-next-steps">Thoughts? Comments? Next steps?</a></h2>

<p>I've not written any formal tests, nor have I measured its speed. There may be subtle bugs and catastrophic errors. I know it doesn't work well if the HTML is already indented. It mysteriously prints double newlines for some unfathomable reason.</p>

<p>I'd love to know if you find this useful. Please <a href="https://gitlab.com/edent/pretty-print-html-using-php/">get involved on GitLab</a> or drop a comment here.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2025/03/pretty-print-html-using-php-8-4s-new-html-dom/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Create a Table of Contents based on HTML Heading Elements]]></title>
		<link>https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/</link>
					<comments>https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Wed, 26 Mar 2025 12:34:31 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[HowTo]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[php]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=59105</guid>

					<description><![CDATA[Some of my blog posts are long. They have lots of HTML headings like &#60;h2&#62; and &#60;h3&#62;. Say, wouldn&#039;t it be super-awesome to have something magically generate a Table of Contents?  I&#039;ve built a utility which runs server-side using PHP. Give it some HTML and it will construct a Table of Contents.  Let&#039;s dive in!  Table of ContentsBackgroundHeading ExampleWhat is the purpose of a table of…]]></description>
										<content:encoded><![CDATA[<p>Some of my blog posts are long<sup id="fnref:too"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#fn:too" class="footnote-ref" title="Too long really, but who can be bothered to edit?" role="doc-noteref">0</a></sup>. They have lots of HTML headings like <code>&lt;h2&gt;</code> and <code>&lt;h3&gt;</code>. Say, wouldn't it be super-awesome to have something magically generate a Table of Contents?  I've built a utility which runs server-side using PHP. Give it some HTML and it will construct a Table of Contents.</p>

<p>Let's dive in!</p>

<p></p><nav role="doc-toc"><menu><li><h2 id="table-of-contents"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#table-of-contents">Table of Contents</a></h2><menu><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#background">Background</a><menu><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#heading-example">Heading Example</a></li><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#what-is-the-purpose-of-a-table-of-contents">What is the purpose of a table of contents?</a></li></menu></li><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#code">Code</a><menu><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#load-the-html">Load the HTML</a><menu><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#using-php-8-4">Using PHP 8.4</a></li></menu></li><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#parse-the-html">Parse the HTML</a><menu><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#php-8-4-queryselectorall">PHP 8.4 querySelectorAll</a></li></menu></li><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#recursive-looping">Recursive looping</a><menu><li><a href="#"></a><menu><li><a href="#"></a><menu><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#missing-content">Missing content</a></li></menu></li></menu></li></menu></li><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#converting-to-html">Converting to 
HTML</a></li></menu></li><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#semantic-correctness">Semantic Correctness</a><menu><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#epub-example">ePub Example</a></li><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#split-the-difference-with-a-menu">Split the difference with a menu</a></li><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#where-should-the-heading-go">Where should the heading go?</a></li></menu></li><li><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#conclusion">Conclusion</a></li></menu></li></menu></nav><p></p>

<h2 id="background"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#background">Background</a></h2>

<p>HTML has <a href="https://html.spec.whatwg.org/multipage/sections.html#the-h1,-h2,-h3,-h4,-h5,-and-h6-elements">six levels of headings</a><sup id="fnref:beatles"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#fn:beatles" class="footnote-ref" title="Although Paul McCartney disagrees." role="doc-noteref">1</a></sup> - <code>&lt;h1&gt;</code> is the main heading for content, <code>&lt;h2&gt;</code> is a sub-heading, <code>&lt;h3&gt;</code> is a sub-sub-heading, and so on.</p>

<p>Together, they form a hierarchy.</p>

<h3 id="heading-example"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#heading-example">Heading Example</a></h3>

<p>HTML headings are expected to be used a bit like this (I've indented this example so you can see the hierarchy):</p>

<pre><code class="language-html">&lt;h1&gt;The Theory of Everything&lt;/h1&gt;
   &lt;h2&gt;Experiments&lt;/h2&gt;
      &lt;h3&gt;First attempt&lt;/h3&gt;
      &lt;h3&gt;Second attempt&lt;/h3&gt;
   &lt;h2&gt;Equipment&lt;/h2&gt;
      &lt;h3&gt;Broken equipment&lt;/h3&gt;
         &lt;h4&gt;Repaired equipment&lt;/h4&gt;
      &lt;h3&gt;Working Equipment&lt;/h3&gt;
…
</code></pre>

<h3 id="what-is-the-purpose-of-a-table-of-contents"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#what-is-the-purpose-of-a-table-of-contents">What is the purpose of a table of contents?</a></h3>

<p>Wayfinding. On a long document, it is useful to be able to see an overview of the contents and then immediately navigate to the desired location.</p>

<p>The ToC has to provide a hierarchical view of all the headings and then link to them.</p>

<h2 id="code"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#code">Code</a></h2>

<p>I'm running this as part of a WordPress plugin. You may need to adapt it for your own use.</p>

<h3 id="load-the-html"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#load-the-html">Load the HTML</a></h3>

<p>This uses <a href="https://www.php.net/manual/en/class.domdocument.php">PHP's DOMDocument</a>. I've manually prepended a doctype and a <code>&lt;meta charset=UTF-8&gt;</code> element so that Unicode is preserved. If your HTML already declares its encoding, you can remove that addition from the code.</p>

<pre><code class="language-php">//  Load it into a DOM for manipulation
$dom = new DOMDocument();
//  Suppress warnings about HTML errors
libxml_use_internal_errors( true );
//  Force UTF-8 support
$dom-&gt;loadHTML( "&lt;!DOCTYPE html&gt;&lt;html&gt;&lt;head&gt;&lt;meta charset=UTF-8&gt;&lt;/head&gt;&lt;body&gt;" . $content, LIBXML_NOERROR | LIBXML_NOWARNING );
libxml_clear_errors();
</code></pre>

<h4 id="using-php-8-4"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#using-php-8-4">Using PHP 8.4</a></h4>

<p>PHP 8.4 contains <a href="https://www.php.net/manual/en/class.dom-htmldocument.php">a better HTML-aware DOM</a>. It can be used like this:</p>

<pre><code class="language-php">$dom = Dom\HTMLDocument::createFromString( $content, LIBXML_NOERROR, "UTF-8" );
</code></pre>

<h3 id="parse-the-html"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#parse-the-html">Parse the HTML</a></h3>

<p>It is not a good idea to use Regular Expressions to parse HTML - no matter how well-formed you think it is. Instead, use <a href="https://www.php.net/manual/en/class.domxpath.php">XPath</a> to extract data from the DOM.</p>

<pre><code class="language-php">//  Parse with XPath
$xpath = new DOMXPath( $dom );

//  Look for all h* elements
$headings = $xpath-&gt;query( "//h1 | //h2 | //h3 | //h4 | //h5 | //h6" );
</code></pre>

<p>This produces a <code>DOMNodeList</code> containing all the heading elements in the order they appear in the document.</p>

<h4 id="php-8-4-queryselectorall"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#php-8-4-queryselectorall">PHP 8.4 querySelectorAll</a></h4>

<p>Rather than using XPath, PHP 8.4's new <code>Dom\HTMLDocument</code> can use <a href="https://www.php.net/manual/en/dom-parentnode.queryselectorall.php">querySelectorAll</a>:</p>

<pre><code class="language-php">$headings = $dom-&gt;querySelectorAll( "h1, h2, h3, h4, h5, h6" );
</code></pre>

<h3 id="recursive-looping"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#recursive-looping">Recursive looping</a></h3>

<p>This is a bit knotty. The loop turns the flat list of headings into a nested array, recording each heading's text and <code>id</code> attribute. The end result should be something like:</p>

<pre><code class="language-_">array (
  array (
    'text' =&gt; '&lt;h2&gt;Table of Contents&lt;/h2&gt;',
    'raw' =&gt; true,
  ),
  array (
    'text' =&gt; 'The Theory of Everything',
    'id' =&gt; 'the-theory-of-everything',
    'children' =&gt; 
    array (
      array (
        'text' =&gt; 'Experiments',
        'id' =&gt; 'experiments',
        'children' =&gt; 
        array (
          array (
            'text' =&gt; 'First attempt',
            'id' =&gt; 'first-attempt',
          ),
          array (
            'text' =&gt; 'Second attempt',
            'id' =&gt; 'second-attempt',
          ),
        ),
      ),
    ),
  ),
)
</code></pre>

<p>The code is moderately complex, but I've commented it as best as I can.</p>

<pre><code class="language-php">//  Start an array to hold all the headings in a hierarchy
$root = [];
//  Add an h2 with the title
$root[] = [
    "text"     =&gt; "&lt;h2&gt;Table of Contents&lt;/h2&gt;", 
    "raw"      =&gt; true, 
    "children" =&gt; []
];

// Stack to track current hierarchy level
$stack = [&amp;$root]; 

//  Loop through the headings
foreach ($headings as $heading) {

    //  Get the information
    //  Expecting &lt;h2 id="something"&gt;Text&lt;/h2&gt;
    $element = $heading-&gt;nodeName;  //  e.g. h2, h3, h4, etc
    $text    = trim( $heading-&gt;textContent );   
    $id      = $heading-&gt;getAttribute( "id" );

    //  h2 becomes 2, h3 becomes 3 etc
    $level = (int) substr($element, 1);

    //  Get data from element
    $node = array( 
        "text"     =&gt; $text, 
        "id"       =&gt; $id , 
        "children" =&gt; [] 
    );

    //  If this heading is shallower than the previous one (e.g. an h2 after an h4), climb back up the hierarchy
    while ( count( $stack ) &gt; $level ) {
        array_pop( $stack );
    }

    //  If a gap exists (e.g., h4 without an immediately preceding h3), create placeholders
    while ( count( $stack ) &lt; $level ) {
        //  What's the last element in the stack?
        $stackSize = count( $stack );
        $lastIndex = count( $stack[ $stackSize - 1] ) - 1;
        if ($lastIndex &lt; 0) {
            //  If there is no previous sibling, create a placeholder parent
            $stack[$stackSize - 1][] = [
                "text"     =&gt; "",   //  This could have some placeholder text to warn the user?
                "children" =&gt; []
            ];
            $stack[] = &amp;$stack[count($stack) - 1][0]['children'];
        } else {
            $stack[] = &amp;$stack[count($stack) - 1][$lastIndex]['children'];
        }
    }

    //  Add the node to the current level
    $stack[count($stack) - 1][] = $node;
    //  Then make the new node's children the current level, ready for any deeper headings
    $stack[] = &amp;$stack[count($stack) - 1][count($stack[count($stack) - 1]) - 1]['children'];
}
</code></pre>
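<p>If the reference juggling feels fragile, a recursive function can build the same kind of tree while keeping its state in the call stack. This is a swapped-in technique, not the code above, and it makes one simplifying assumption: skipped levels get no placeholder, so an <code>&lt;h4&gt;</code> straight after an <code>&lt;h2&gt;</code> simply becomes that <code>&lt;h2&gt;</code>'s child. The function name and sample headings are illustrative:</p>

<pre><code class="language-php">&lt;?php
//  Hypothetical, reference-free alternative to the stack approach.
//  Each heading is ["level" =&gt; int, "text" =&gt; string, "id" =&gt; string], in document order.
//  Skipped levels get no placeholder: a deeper heading just becomes a child of the previous shallower one.
function build_heading_tree( array $headings, int &amp;$i = 0, int $parentLevel = 0 ): array {
    $nodes = [];
    //  Keep consuming headings while they are deeper than the parent
    while ( $i &lt; count( $headings ) &amp;&amp; $headings[$i]["level"] &gt; $parentLevel ) {
        $node = $headings[$i];
        $i++;
        //  Everything deeper than this heading, up to the next one at the same or a shallower level, is a child
        $node["children"] = build_heading_tree( $headings, $i, $node["level"] );
        $nodes[] = $node;
    }
    return $nodes;
}

//  Sample data
$tree = build_heading_tree( [
    [ "level" =&gt; 2, "text" =&gt; "The Theory of Everything", "id" =&gt; "the-theory-of-everything" ],
    [ "level" =&gt; 3, "text" =&gt; "Experiments",              "id" =&gt; "experiments" ],
    [ "level" =&gt; 4, "text" =&gt; "First attempt",            "id" =&gt; "first-attempt" ],
    [ "level" =&gt; 4, "text" =&gt; "Second attempt",           "id" =&gt; "second-attempt" ],
    [ "level" =&gt; 2, "text" =&gt; "Conclusion",               "id" =&gt; "conclusion" ],
] );
print_r( $tree );
</code></pre>

<p>The trade-off is that this version won't insert the placeholder parents that the stack code creates for missing levels.</p>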

<h6 id="missing-content"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#missing-content">Missing content</a></h6>

<p>The trickiest part of the above is dealing with missing levels in the hierarchy. If you're <em>sure</em> your headings never skip a level (for example, jumping from an <code>&lt;h3&gt;</code> straight to an <code>&lt;h6&gt;</code>), you can remove the code that deals with that edge case.</p>
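<p>To find out whether your documents ever do skip a level, a quick check over the heading levels is enough. A sketch, with a hypothetical <code>find_heading_gaps()</code> helper that takes the levels in document order (e.g. from <code>(int) substr( $heading-&gt;nodeName, 1 )</code>) and reports each jump of more than one level downwards:</p>

<pre><code class="language-php">&lt;?php
//  Hypothetical checker: report every place where a heading skips a level on the way down
function find_heading_gaps( array $levels ): array {
    $gaps     = [];
    $previous = null;
    foreach ( $levels as $index =&gt; $level ) {
        //  Going deeper by more than one level (e.g. h3 straight to h6) is a gap.
        //  Going back up any number of levels is fine.
        if ( $previous !== null &amp;&amp; $level &gt; $previous + 1 ) {
            $gaps[] = [ "index" =&gt; $index, "from" =&gt; $previous, "to" =&gt; $level ];
        }
        $previous = $level;
    }
    return $gaps;
}

print_r( find_heading_gaps( [ 2, 3, 6, 2, 3 ] ) );
</code></pre>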

<h3 id="converting-to-html"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#converting-to-html">Converting to HTML</a></h3>

<p>OK, so there's a hierarchical array. How does it become HTML?</p>

<p>Again, a little bit of recursion:</p>

<pre><code class="language-php">function arrayToHTMLList( $array, $style = "ul" )
{
    $html = "";

    //  Loop through the array
    foreach( $array as $element ) {
        //  Get the data of this element
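        $text     = $element["text"];
        $id       = $element["id"] ?? "";          //  The raw title and placeholders have no id
        $children = $element["children"] ?? [];
        $raw      = $element["raw"] ?? false;

        if ( $raw ) {
            //  Add it to the HTML without adding an internal link
            $html .= "&lt;li&gt;{$text}";
        } else {
            //  Add it to the HTML, escaping the heading text (textContent decodes entities like &amp;amp;)
            $safe_text = htmlspecialchars( $text );
            $html .= "&lt;li&gt;&lt;a href=#{$id}&gt;{$safe_text}&lt;/a&gt;";
        }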

        //  If the element has children
        if ( sizeof( $children ) &gt; 0 ) {
            //  Recursively add it to the HTML
            $html .=  "&lt;{$style}&gt;" . arrayToHTMLList( $children, $style ) . "&lt;/{$style}&gt;";
        } 
    }

    return $html;
}
</code></pre>

<h2 id="semantic-correctness"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#semantic-correctness">Semantic Correctness</a></h2>

<p>Finally, what should a table of contents look like in HTML?  There is no <code>&lt;toc&gt;</code> element, so what is most appropriate?</p>

<h3 id="epub-example"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#epub-example">ePub Example</a></h3>

<p>Modern eBooks use the ePub standard which is based on HTML. Here's how <a href="https://kb.daisy.org/publishing/docs/navigation/toc.html">an ePub creates a ToC</a>.</p>

<pre><code class="language-html">&lt;nav role="doc-toc" epub:type="toc" id="toc"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ol&gt;
  &lt;li&gt;
    &lt;a href="s01.xhtml"&gt;A simple link&lt;/a&gt;
  &lt;/li&gt;
  …
&lt;/ol&gt;
&lt;/nav&gt;
</code></pre>

<p>The modern(ish) <code>&lt;nav&gt;</code> element!</p>

<blockquote><p>The nav element represents a section of a page that links to other pages or to parts within the page: a section with navigation links.
<a href="https://html.spec.whatwg.org/multipage/sections.html#the-nav-element">HTML Specification</a></p></blockquote>

<p>But there's a slight wrinkle. The ePub example above uses <code>&lt;ol&gt;</code>, an ordered list. The HTML example in the spec uses <code>&lt;ul&gt;</code>, an <em>un</em>ordered list.</p>

<p>Which is right? Well, that depends on whether you think the contents on your page should be referred to in order or not. There is, however, a secret third way.</p>

<h3 id="split-the-difference-with-a-menu"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#split-the-difference-with-a-menu">Split the difference with a menu</a></h3>

<p>I decided to use <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/menu">the <code>&lt;menu&gt;</code> element</a> for my navigation. It is semantically the same as <code>&lt;ul&gt;</code> but just feels a bit closer to what I expect from navigation. Feel free to argue with me in the comments.</p>

<h3 id="where-should-the-heading-go"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#where-should-the-heading-go">Where should the heading go?</a></h3>

<p>I've put the title of the list into the list itself. That's valid HTML and, if my understanding is correct, should announce itself as the title of the navigation element to screen-readers and the like.</p>
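<p>Put together, the wrapper only needs a <code>&lt;nav&gt;</code> around the list. A minimal sketch with a hypothetical <code>wrap_toc()</code> helper; it assumes the <code>&lt;li&gt;</code> items have already been generated (for example by <code>arrayToHTMLList()</code>), and gives the <code>&lt;nav&gt;</code> an <code>id</code> of <code>table-of-contents</code> so that a <code>#table-of-contents</code> link can jump to it:</p>

<pre><code class="language-php">&lt;?php
//  Hypothetical wrapper: put the generated list items inside &lt;nav&gt; and a list element
function wrap_toc( string $listItems, string $style = "menu" ): string {
    return "&lt;nav id=table-of-contents&gt;"
         . "&lt;{$style}&gt;{$listItems}&lt;/{$style}&gt;"
         . "&lt;/nav&gt;";
}

echo wrap_toc( "&lt;li&gt;&lt;h2&gt;Table of Contents&lt;/h2&gt;&lt;menu&gt;&lt;li&gt;&lt;a href=#intro&gt;Intro&lt;/a&gt;&lt;/menu&gt;" );
</code></pre>

<p>Passing <code>"ol"</code> or <code>"ul"</code> as the second argument switches to whichever list type you prefer.</p>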

<h2 id="conclusion"><a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#conclusion">Conclusion</a></h2>

<p>I've used <em>slightly</em> more headings in this post than I would usually, but hopefully the <a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#table-of-contents">Table of Contents at the top</a> demonstrates how this works.</p>

<p>If you want to reuse this code, I consider it too trivial to licence. But, if it makes you happy, you can treat it as MIT.</p>

<p>Thoughts? Comments? Feedback? Drop a note in the box.</p>

<div id="footnotes" role="doc-endnotes">
<hr>
<ol start="0">

<li id="fn:too">
<p>Too long really, but who can be bothered to edit?&nbsp;<a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#fnref:too" class="footnote-backref" role="doc-backlink">↩︎</a></p>
</li>

<li id="fn:beatles">
<p>Although <a href="https://www.nme.com/news/music/paul-mccartney-12-1188735">Paul McCartney disagrees</a>.&nbsp;<a href="https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/#fnref:beatles" class="footnote-backref" role="doc-backlink">↩︎</a></p>
</li>

</ol>
</div>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=59105&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2025/03/create-a-table-of-contents-based-on-html-heading-elements/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Change the way dates are presented in WordPress's admin view]]></title>
		<link>https://shkspr.mobi/blog/2025/02/change-the-way-dates-are-presented-in-wordpresss-admin-view/</link>
					<comments>https://shkspr.mobi/blog/2025/02/change-the-way-dates-are-presented-in-wordpresss-admin-view/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Wed, 26 Feb 2025 12:34:21 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[HowTo]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[WordPress]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=58427</guid>

					<description><![CDATA[WordPress does not respect an admin&#039;s preferred date format.  Here&#039;s how the admin list of posts looks to me:    I don&#039;t want it to look like that. I want it in RFC3339 format.  I know what you&#039;re thinking, just change the default date display - but that only seems to work in some areas of WordPress. It doesn&#039;t change the column-date format.  Here&#039;s what mine is set to:    So that doesn&#039;t work. …]]></description>
										<content:encoded><![CDATA[<p>WordPress does not respect an admin's preferred date format.</p>

<p>Here's how the admin list of posts looks to me:</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2025/02/WP-Date-Wrong-fs8.png" alt="Column with the date format separated by slashes." width="420" height="674" class="aligncenter size-full wp-image-58437">

<p>I don't want it to look like that. I want it in RFC3339 format.</p>

<p>I know what you're thinking: <a href="https://wordpress.org/documentation/article/customize-date-and-time-format/">just change the default date display</a>. But that only seems to work in some areas of WordPress. It doesn't change the <code>column-date</code> format. Here's what mine is set to:</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2025/02/WP-date-format-fs8.png" alt="Settings screen showing date format set to dashes." width="940" height="414" class="aligncenter size-full wp-image-58432">

<p>So that doesn't work.</p>

<p>Instead, you need to use <a href="https://developer.wordpress.org/reference/hooks/post_date_column_time/">the slightly obscure <code>post_date_column_time</code> filter</a>.</p>

<p>Add this to your theme's <code>functions.php</code>:</p>

<pre><code class="language-php">//  Admin view - change date format
function rfc3339_post_date_time( $time, $post ) {
    //  Modify the default time format
    $rfc3339_time = date( "Y-m-d H:i", strtotime( $post-&gt;post_date ) );
    return $rfc3339_time;
}
add_filter( "post_date_column_time", "rfc3339_post_date_time", 10, 2 );
</code></pre>

<p>And, hey presto, your date column will look like this:
<img src="https://shkspr.mobi/blog/wp-content/uploads/2025/02/WP-Date-Rigth-fs8.png" alt="Column with the date format separated by dashes." width="420" height="670" class="aligncenter size-full wp-image-58438"></p>

<p>Obviously, you can change that code to whichever date format you prefer.</p>
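<p>Only the format string needs to change. To preview what a few alternatives produce, here's a standalone sketch using PHP's <code>DateTimeImmutable</code> rather than WordPress's own functions. The helper name and sample date are illustrative, and it assumes the date string is UTC (WordPress also stores a <code>post_date_gmt</code> field on each post):</p>

<pre><code class="language-php">&lt;?php
//  Hypothetical helper: reformat a WordPress-style "Y-m-d H:i:s" date string.
//  Assumption: the string is treated as UTC.
function format_post_date( string $postDate, string $format ): string {
    $date = DateTimeImmutable::createFromFormat( "Y-m-d H:i:s", $postDate, new DateTimeZone( "UTC" ) );
    if ( $date === false ) {
        return $postDate;   //  Fall back to the raw string if it can't be parsed
    }
    return $date-&gt;format( $format );
}

echo format_post_date( "2025-02-26 12:34:21", "Y-m-d H:i" ) . "\n";   //  2025-02-26 12:34
echo format_post_date( "2025-02-26 12:34:21", DATE_RFC3339 ) . "\n";  //  2025-02-26T12:34:21+00:00
echo format_post_date( "2025-02-26 12:34:21", "j M Y" ) . "\n";       //  26 Feb 2025
</code></pre>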
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=58427&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2025/02/change-the-way-dates-are-presented-in-wordpresss-admin-view/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Graphing the connections between my blog posts]]></title>
		<link>https://shkspr.mobi/blog/2025/01/graphing-the-connections-between-my-blog-posts/</link>
					<comments>https://shkspr.mobi/blog/2025/01/graphing-the-connections-between-my-blog-posts/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Thu, 09 Jan 2025 12:34:56 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[graphs]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[WordPress]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=55159</guid>

					<description><![CDATA[I love ripping off good ideas from other people&#039;s blogs.  I was reading Alvaro Graves-Fuenzalida&#039;s blog when I saw this nifty little force-directed graph:    When zoomed in, it shows the relation between posts and tags.    In this case, I can see that the posts about Small Gods and Pyramids both share the tags of Discworld, Fantasy, and Book Review. But only Small Gods has the tag of Religion. …]]></description>
										<content:encoded><![CDATA[<p>I love ripping off good ideas from other people's blogs.  I was reading <a href="https://stuff.graves.cl/posts/2024-03-05_20_41-book-review---small-gods-by-terry-pratchett.html">Alvaro Graves-Fuenzalida's blog</a> when I saw this nifty little force-directed graph:</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2024/12/Graph-fs8.png" alt="A graph of interconnected nodes." width="800" height="600" class="aligncenter size-full wp-image-55160">

<p>When zoomed in, it shows the relation between posts and tags.</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2024/12/Graph-detail-fs8.png" alt="Text labels on the nodes show that two of the posts share a common tag." width="1600" height="1200" class="aligncenter size-full wp-image-55161">

<p>In this case, I can see that the posts about Small Gods and Pyramids both share the tags of Discworld, Fantasy, and Book Review. But only Small Gods has the tag of Religion.</p>

<p>Isn't that cool! It is a native feature of <a href="https://quartz.jzhao.xyz/features/graph-view">Quartz's GraphView</a>. How can I build something like that for my WordPress blog?</p>

<h2 id="aim"><a href="https://shkspr.mobi/blog/2025/01/graphing-the-connections-between-my-blog-posts/#aim">Aim</a></h2>

<p>Create an interactive graph which shows the relationship between a post, its links, and their tags.</p>

<p>It will end up looking something like this:</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2024/12/fdg.png" alt="A force directed graph showing how four different posts link to each other and how their hashtags relate." width="1188" height="1088" class="aligncenter size-full wp-image-55169">

<p>You can <a href="https://gitlab.com/edent/blog-theme/-/blob/master/includes/graph.php">get the code</a> or follow along to see how it works.</p>

<p>This is a multi-stage process. Let's begin!</p>

<h2 id="what-we-need"><a href="https://shkspr.mobi/blog/2025/01/graphing-the-connections-between-my-blog-posts/#what-we-need">What We Need</a></h2>

<p>When on a single Post, we need the following:</p>

<ul>
<li>The tags assigned to that Post.</li>
<li>Internal links back to that Post.</li>
<li>Internal links from that Post.</li>
<li>The tags assigned to links to and from that Post.</li>
</ul>

<h2 id="tags-assigned-to-that-post"><a href="https://shkspr.mobi/blog/2025/01/graphing-the-connections-between-my-blog-posts/#tags-assigned-to-that-post">Tags assigned to that Post.</a></h2>

<p>This is pretty easy!  Using the <a href="https://developer.wordpress.org/reference/functions/get_the_tag_list/"><code>get_the_tag_list()</code> function</a> we can, unsurprisingly, get all the tags associated with a post.</p>

<pre><code class="language-php">$post_tags_text = get_the_tag_list( "", ",", $ID );
$post_tags_array = explode( "," , $post_tags_text );
</code></pre>

<p>That gets each tag as an HTML link to its archive page, but not the tag's ID. If we want the tag IDs as well, we need to use <a href="https://developer.wordpress.org/reference/functions/get_the_tags/">the <code>get_the_tags()</code> function</a>.</p>

<pre><code class="language-php">$post_tags = get_the_tags($ID);
$tags = array();
foreach($post_tags as $tag) {
    $tags[$tag-&gt;term_id] = $tag-&gt;name; 
}
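<p>If you'd rather keep using <code>get_the_tag_list()</code> and only need the names, the <code>&lt;a&gt;</code> markup around each one has to come off first. Here's a standalone sketch; the <code>tag_names_from_list()</code> helper and the sample string are illustrative, standing in for a real call:</p>

<pre><code class="language-php">&lt;?php
//  Hypothetical helper: turn a get_the_tag_list()-style string into plain tag names
function tag_names_from_list( string $tagList ): array {
    if ( $tagList === "" ) {
        return [];
    }
    $names = [];
    foreach ( explode( ",", $tagList ) as $link ) {
        //  Remove the &lt;a&gt; markup, then decode entities such as &amp;amp;
        $names[] = html_entity_decode( trim( strip_tags( $link ) ), ENT_QUOTES | ENT_HTML5, "UTF-8" );
    }
    return $names;
}

$sample = '&lt;a href="https://example.com/tag/discworld/" rel="tag"&gt;Discworld&lt;/a&gt;,&lt;a href="https://example.com/tag/q-a/" rel="tag"&gt;Q&amp;amp;A&lt;/a&gt;';
print_r( tag_names_from_list( $sample ) );
</code></pre>

<p>Splitting on commas does assume none of your tag names contain a comma.</p>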
</code></pre>

<h2 id="backlinks"><a href="https://shkspr.mobi/blog/2025/01/graphing-the-connections-between-my-blog-posts/#backlinks">Backlinks</a></h2>

<p>Finding internal links back to the Post is slightly trickier. WordPress doesn't save relational information like that. Instead, we get the Post's URL and <a href="https://shkspr.mobi/blog/2023/10/displaying-internal-linkbacks-on-wordpress/">search for that in the database</a>. Then we get the post IDs of all the posts which contain that string.</p>

<pre><code class="language-php">//  Get all the posts which link to this one, oldest first
$the_query = new WP_Query(
    array(
        's' =&gt; $search_url,
        'post_type' =&gt; 'post',
        "posts_per_page" =&gt; "-1",
        "order" =&gt; "ASC"
    )
);

//  Nothing to do if there are no inbound links
if ( !$the_query-&gt;have_posts() ) {
    return;
}
</code></pre>
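<p>To make the idea concrete without a WordPress install, here's an illustrative sketch of the same test in plain PHP. The <code>find_backlinks()</code> helper and the sample posts are hypothetical, and the real query runs in the database rather than in a loop like this:</p>

<pre><code class="language-php">&lt;?php
//  Illustration only: an in-memory stand-in for the search above.
//  $posts maps post ID =&gt; content, assumed to be in date order already (oldest first).
function find_backlinks( array $posts, string $url ): array {
    $ids = [];
    foreach ( $posts as $id =&gt; $content ) {
        //  A post "links back" if its content contains the URL anywhere
        if ( strpos( $content, $url ) !== false ) {
            $ids[] = $id;
        }
    }
    return $ids;
}

$posts = [
    101 =&gt; '&lt;p&gt;See &lt;a href="https://example.com/blog/2024/01/target/"&gt;this&lt;/a&gt;.&lt;/p&gt;',
    102 =&gt; '&lt;p&gt;No links here.&lt;/p&gt;',
    103 =&gt; '&lt;p&gt;Also https://example.com/blog/2024/01/target/ in plain text.&lt;/p&gt;',
];
print_r( find_backlinks( $posts, "https://example.com/blog/2024/01/target/" ) );
</code></pre>

<p>A substring search like this also counts a plain-text mention of the URL as a backlink, which may or may not be what you want.</p>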

<h2 id="backlinks-tags"><a href="https://shkspr.mobi/blog/2025/01/graphing-the-connections-between-my-blog-posts/#backlinks-tags">Backlinks' Tags</a></h2>

<p>Once we have an array of posts which link back here, we can get their tags as above:</p>

<pre><code class="language-php">//  Loop through the posts
while ( $the_query-&gt;have_posts() ) {
    //  Set it up
    $the_query-&gt;the_post();
    $id                  = get_the_ID();
    $title               = esc_html( get_the_title() );
    $url                 = get_the_permalink();
    $backlink_tags_text  = get_the_tag_list( "", ",", "", $id );
    $backlink_tags_array = explode( "," , $backlink_tags_text );
}
//  Restore the global $post after the custom loop
wp_reset_postdata();
</code></pre>

<h2 id="links-from-the-post"><a href="https://shkspr.mobi/blog/2025/01/graphing-the-connections-between-my-blog-posts/#links-from-the-post">Links from the Post</a></h2>

<p>Again, WordPress's lack of relational links is a weakness. In order to get internal links, we need to:</p>

<ol>
<li>Render the HTML using all the filters</li>
<li>Search for all <code>&lt;a href="…"&gt;</code></li>
<li>Extract the ones which start with the blog's domain</li>
<li>Get those posts' IDs.</li>
</ol>

<p>Rendering the content into HTML is done with:</p>

<pre><code class="language-php">$content = apply_filters( "the_content", get_the_content( null, false, $ID ) );
</code></pre>

<p>Searching for links is slightly more complex. The easiest way is to load the HTML into a DOMDocument, then extract all the anchors. All my blog posts' URLs start with <code>/blog/YYYY/</code>, so I can avoid selecting links to tags, uploaded files, or other things. Your blog may be different.</p>

<pre><code class="language-php">$dom = new DOMDocument();
libxml_use_internal_errors( true ); //  Suppress warnings from malformed HTML
$dom-&gt;loadHTML( $content );
libxml_clear_errors();

$links = [];
foreach ( $dom-&gt;getElementsByTagName( "a" ) as $anchor ) {
    $href = $anchor-&gt;getAttribute( "href" );
    //  Post URLs look like https://shkspr.mobi/blog/YYYY/MM/slug/
    if ( preg_match( '/^https:\/\/shkspr\.mobi\/blog\/\d{4}\//', $href ) ) {
        $links[] = $href;
    }
}
</code></pre>
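<p>Wrapped up as a standalone function, the extraction can be tried outside WordPress. This is a sketch: the <code>extract_post_links()</code> name and the sample HTML are mine, the pattern assumes post URLs begin <code>https://shkspr.mobi/blog/YYYY/</code>, and it drops duplicate links, which the loop above does not:</p>

<pre><code class="language-php">&lt;?php
//  Hypothetical helper: collect links to blog posts from rendered HTML
function extract_post_links( string $html, string $pattern = '/^https:\/\/shkspr\.mobi\/blog\/\d{4}\//' ): array {
    //  loadHTML() refuses an empty string
    if ( $html === "" ) {
        return [];
    }
    $dom = new DOMDocument();
    $dom-&gt;loadHTML( $html, LIBXML_NOERROR | LIBXML_NOWARNING );

    $links = [];
    foreach ( $dom-&gt;getElementsByTagName( "a" ) as $anchor ) {
        $href = $anchor-&gt;getAttribute( "href" );
        if ( preg_match( $pattern, $href ) ) {
            $links[] = $href;
        }
    }
    //  The same post may be linked more than once
    return array_values( array_unique( $links ) );
}

$html = '&lt;p&gt;&lt;a href="https://shkspr.mobi/blog/2024/12/some-post/"&gt;one&lt;/a&gt; '
      . '&lt;a href="https://shkspr.mobi/blog/tag/php/"&gt;tag&lt;/a&gt; '
      . '&lt;a href="https://shkspr.mobi/blog/2024/12/some-post/"&gt;again&lt;/a&gt;&lt;/p&gt;';
print_r( extract_post_links( $html ) );
</code></pre>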

<p>The ID of each post can be found with <a href="https://developer.wordpress.org/reference/functions/url_to_postid/">the <code>url_to_postid()</code> function</a>. That means we can re-use the earlier code to see what tags those posts have.</p>

<h2 id="building-a-graph"><a href="https://shkspr.mobi/blog/2025/01/graphing-the-connections-between-my-blog-posts/#building-a-graph">Building a graph</a></h2>

<p>OK, so we have all our constituent parts. Let's build a graph!</p>

<p>Graphs consist of nodes (posts and tags) and edges (links between them). The exact format of the graph is going to depend on the graph library we use.</p>

<p>I've decided to use <a href="https://github.com/d3/d3-force?tab=readme-ov-file">D3.js's Force Graph</a> as it is relatively simple and produces a reasonably good looking interactive SVG.</p>

<p>Imagine there are two blog posts and two hashtags.</p>

<pre><code class="language-js">const nodes = [
    { id: 1, label: "Blog Post 1",    url: "https://example.com/post/1", group: "post" },
    { id: 2, label: "Blog Post 2",    url: "https://example.com/post/2", group: "post" },
    { id: 3, label: "hashtag",        url: "https://example.com/tag/3",  group: "tag"  },
    { id: 4, label: "anotherHashtag", url: "https://example.com/tag/4",  group: "tag"  },
];
</code></pre>

<ul>
<li>Blog Post 1 links to Blog Post 2.</li>
<li>Blog Post 1 has a #hashtag.</li>
<li>Both 1 &amp; 2 share #anotherHashtag.</li>
</ul>

<pre><code class="language-js">const links = [
    { source: 1, target: 2 },
    { source: 3, target: 1 },
    { source: 4, target: 1 },
    { source: 4, target: 2 },
];
</code></pre>

<p>Here's how to create a list of nodes and their links.  You will need to edit it for your own blog's peculiarities.</p>

<pre><code class="language-php">&lt;?php 
// Load WordPress environment
require_once( "wp-load.php" );

//  Set up arrays for nodes and links
$nodes = array();
$links = array();

//  ID of the Post
$main_post_id = 12345;

//  Get the Post's details
$main_post_url   = get_permalink( $main_post_id );
$main_post_title = get_the_title( $main_post_id );

//  Function to add new nodes
function add_item_to_nodes( &amp;$nodes, $id, $label, $url, $group ) {
    $nodes[] = [ 
        "id"    =&gt; $id, 
        "label" =&gt; $label, 
        "url"   =&gt; $url, 
        "group" =&gt; $group
    ];

}

//  Function to add new relationships
function add_relationship( &amp;$links, $source, $target ) {
    $links[] = [
        "source" =&gt; $source,
        "target" =&gt; $target
    ];
}

//  Add Post to the nodes
add_item_to_nodes( $nodes, $main_post_id, $main_post_title, $main_post_url, "post" );

//  Get the tags of the Post
$main_post_tags = get_the_tags( $main_post_id ) ?: [];   //  false if the Post has no tags

//  Add the tags as nodes, and create links to main Post
foreach( $main_post_tags as $tag ) {
    $id   = $tag-&gt;term_id;
    $name = $tag-&gt;name;

    //  Add the node
    add_item_to_nodes( $nodes, $id, $name, get_tag_link( $tag ), "tag" );
    //  Add the relationship
    add_relationship( $links, $id, $main_post_id );
}

//  Get all the posts which link to this one, oldest first
$the_query = new WP_Query(
    array(
        's'              =&gt; $main_post_url,
        'post_type'      =&gt; 'post',
        "posts_per_page" =&gt; "-1",
        "order"          =&gt; "ASC"
    )
);

//  Only continue if there are inbound links
if ( $the_query-&gt;have_posts() ) {
    //  Loop through the posts
    while ( $the_query-&gt;have_posts() ) {
        //  Set up the query
        $the_query-&gt;the_post();
        $post_id = get_the_ID();
        $title = esc_html( get_the_title() );
        $url   = get_the_permalink();

        //  Add the node
        add_item_to_nodes( $nodes, $post_id, $title, $url, "post" );
        //  Add the relationship
        add_relationship( $links, $post_id, $main_post_id );

        //  Get the tags of the Post
        $post_tags = get_the_tags( $post_id ) ?: [];

        //  Add the tags as nodes, and create links to main Post
        foreach($post_tags as $tag) {

            $id   = $tag-&gt;term_id;
            $name = $tag-&gt;name;

            //  Add the node
            add_item_to_nodes( $nodes, $id, $name, get_tag_link( $tag ), "tag" );
            //  Add the relationship
            add_relationship( $links, $id, $post_id );
        }

    }
    //  Restore the global $post after the custom loop
    wp_reset_postdata();
}

//  Get all the internal links from this post
//  Render the post as HTML
$content = apply_filters( "the_content", get_the_content( null, false, $main_post_id ) );

//  Load it into HTML
$dom = new DOMDocument();
libxml_use_internal_errors( true );
$dom-&gt;loadHTML( $content );
libxml_clear_errors();

//  Get any &lt;a href="…" which starts with https://shkspr.mobi/blog/
$internal_links = [];
foreach ( $dom-&gt;getElementsByTagName( "a" ) as $anchor ) {
    $href = $anchor-&gt;getAttribute( "href" );
    if ( preg_match( '/^https:\/\/shkspr\.mobi\/blog\/\d{4}\//', $href ) ) {
        $internal_links[] = $href;
    }
}

//  Loop through the internal links, get their hashtags
foreach ( $internal_links as $url ) {
    $post_id = url_to_postid( $url );
    //  Skip links that don't resolve to a Post
    if ( 0 === $post_id ) {
        continue;
    }
    //  Get the Post's details
    $post_title = get_the_title( $post_id );

    //  Add the node
    add_item_to_nodes( $nodes, $post_id, $post_title, $url, "post" );
    //  Add the relationship
    add_relationship($links, $main_post_id, $post_id );

    //  Get the tags of the Post
    $post_tags = get_the_tags( $post_id ) ?: [];

    //  Add the tags as nodes, and create links to main Post
    foreach( $post_tags as $tag ) {
        $id   = $tag-&gt;term_id;
        $name = $tag-&gt;name;

        //  Add the node
        add_item_to_nodes( $nodes, $id, $name, get_tag_link( $tag ), "tag" );
        //  Add the relationship
        add_relationship( $links, $id, $post_id );
    }
}

//  Deduplicate the nodes and links
$nodes_unique = array_unique( $nodes, SORT_REGULAR );
$links_unique = array_unique( $links, SORT_REGULAR );

//  Put them in the keyless format that D3 expects
$nodes_output = array_values( $nodes_unique );
$links_output = array_values( $links_unique );

//  Return the JSON
echo json_encode( $nodes_output, JSON_PRETTY_PRINT );
echo "\n";
echo json_encode( $links_output, JSON_PRETTY_PRINT );
</code></pre>
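<p>One thing to watch for: post IDs and tag IDs come from different database tables, so a Post and a tag can end up with the same number. D3's <code>forceLink</code> looks nodes up by <code>id</code>, so links meant for either node would all attach to just one of them. A hypothetical workaround is to prefix every ID with its group; the <code>node_key()</code> and <code>add_node()</code> helpers below are illustrative, not part of the code above:</p>

<pre><code class="language-php">&lt;?php
//  Hypothetical helpers: make node IDs unique across posts and tags
function node_key( string $group, int $id ): string {
    return "{$group}-{$id}";   //  e.g. "post-12" and "tag-12" no longer clash
}

function add_node( array &amp;$nodes, string $group, int $id, string $label, string $url ): void {
    $key = node_key( $group, $id );
    //  Keying by the prefixed ID means adding the same node twice is harmless
    $nodes[$key] = [ "id" =&gt; $key, "label" =&gt; $label, "url" =&gt; $url, "group" =&gt; $group ];
}

$nodes = [];
add_node( $nodes, "post", 12, "A post", "https://example.com/post/12" );
add_node( $nodes, "tag",  12, "php",    "https://example.com/tag/php" );
add_node( $nodes, "tag",  12, "php",    "https://example.com/tag/php" );

//  Links must use the same prefixed keys
$links = [ [ "source" =&gt; node_key( "tag", 12 ), "target" =&gt; node_key( "post", 12 ) ] ];

//  array_values() drops the string keys so json_encode() emits a JSON array
echo json_encode( [ "nodes" =&gt; array_values( $nodes ), "links" =&gt; $links ], JSON_UNESCAPED_SLASHES );
</code></pre>

<p>As a side effect, keying the array this way also removes duplicate nodes without needing <code>array_unique()</code>.</p>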

<h2 id="creating-a-force-directed-svg"><a href="https://shkspr.mobi/blog/2025/01/graphing-the-connections-between-my-blog-posts/#creating-a-force-directed-svg">Creating a Force Directed SVG</a></h2>

<p>Once the data are spat out, you can include them in a web-page. Here's a basic example:</p>

<pre><code class="language-html">&lt;!DOCTYPE html&gt;
&lt;html lang="en"&gt;
    &lt;head&gt;
        &lt;meta charset="UTF-8"&gt;
        &lt;meta name="viewport" content="width=device-width, initial-scale=1.0"&gt;
        &lt;title&gt;Force Directed Graph&lt;/title&gt;
        &lt;script src="https://d3js.org/d3.v7.min.js"&gt;&lt;/script&gt;
    &lt;/head&gt;
    &lt;body&gt;
        &lt;svg width="800" height="600"&gt;
            &lt;defs&gt;
                &lt;marker id="arrowhead" markerWidth="10" markerHeight="7" refX="10" refY="3.5" orient="auto" fill="#999"&gt;
                &lt;path d="M0,0 L10,3.5 L0,7 Z"&gt;&lt;/path&gt;
                &lt;/marker&gt;
            &lt;/defs&gt;
        &lt;/svg&gt;
        &lt;script&gt;
</code></pre>

<pre><code class="language-js">            const nodes = [];
            const links = [];

            const width  = 800;
            const height = 600;

            const svg = d3.select("svg")
                .attr( "width",  width  )
                .attr( "height", height );

            const simulation = d3.forceSimulation( nodes )
                .force( "link",   d3.forceLink( links ).id( d =&gt; d.id ).distance( 100 ) )
                .force( "charge", d3.forceManyBody().strength( -300 ) )
                .force( "center", d3.forceCenter( width / 2, height / 2 ) );

            //  Run simulation with simple animation
            simulation.on("tick", () =&gt; {
                link
                    .attr("x1", d =&gt; d.source.x)
                    .attr("y1", d =&gt; d.source.y)
                    .attr("x2", d =&gt; d.target.x)
                    .attr("y2", d =&gt; d.target.y);   node
                    .attr("transform", d =&gt; `translate(${d.x},${d.y})`);
            });

            // Draw links
            const link = svg.selectAll( ".link" )
                .data(links)
                .enter().append("line")
                .attr( "stroke", "#999" )
                .attr( "stroke-width", 2 )
                .attr( "x1", d =&gt; d.source.x )
                .attr( "y1", d =&gt; d.source.y )
                .attr( "x2", d =&gt; d.target.x )
                .attr( "y2", d =&gt; d.target.y )
                .attr( "marker-end", "url(#arrowhead)" );

            //  Draw nodes
            const node = svg.selectAll( ".node" )
                .data( nodes )
                .enter().append( "g" )
                .attr( "class", "node" )
                .attr( "transform", d =&gt; `translate(${d.x},${d.y})` )
                .call(d3.drag() //  Make nodes draggable
                    .on( "start", dragStarted )
                    .on( "drag",  dragged )
                    .on( "end",   dragEnded ) 
                );

            //  Add hyperlink
            node.append("a")
            .attr( "xlink:href", d =&gt; d.url ) //    Link to the node's URL
            .attr( "target", "_blank" ) //  Open in a new tab
            .each(function (d) {
                const a = d3.select(this);
                //  Different shapes for posts and tags
                if ( d.group === "post" ) {
                    a.append("circle")
                        .attr("r", 10)
                        .attr("fill", "blue");
                } else if ( d.group === "tag" ) {
                    //  White background rectangle
                    a.append("rect")
                        .attr("width", 20)
                        .attr("height", 20)
                        .attr("x", -10)
                        .attr("y", -10)
                        .attr("fill", "white"); 
                    // Red octothorpe
                    a.append("path")
                        .attr("d", "M-10,-5 H10 M-10,5 H10 M-5,-10 V10 M5,-10 V10") 
                        .attr("stroke", "red")
                        .attr("stroke-width", 2)
                        .attr("fill", "none");
                }
                //  Text label
                a.append( "text")
                    .attr( "dy", 4 )
                    .attr( "x", d =&gt; ( d.group === "post" ? 12 : 14 ) )
                    .attr( "fill", "black" )
                    .style("font-size", "12px" )
                    .text( d.label );
            });

            //  Standard helper functions to make nodes draggable
            function dragStarted( event, d ) {
                if ( !event.active ) simulation.alphaTarget(0.3).restart();
                d.fx = d.x;
                d.fy = d.y;
            }
            function dragged( event, d ) {
                d.fx = event.x;
                d.fy = event.y;
            }
            function dragEnded( event, d ) {
                if (!event.active) simulation.alphaTarget(0);
                d.fx = null;
                d.fy = null;
            }
</code></pre>

<pre><code class="language-html">        &lt;/script&gt;
    &lt;/body&gt;
&lt;/html&gt;
</code></pre>
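<p>Rather than pasting the JSON in by hand, the PHP could print those two <code>const</code> lines directly. A minimal sketch with a hypothetical <code>d3_data_script()</code> helper: the <code>JSON_HEX_TAG</code> flag converts <code>&lt;</code> and <code>&gt;</code> to <code>\u003C</code> and <code>\u003E</code>, so a post title containing <code>&lt;/script&gt;</code> can't break out of the script element.</p>

<pre><code class="language-php">&lt;?php
//  Hypothetical helper: emit the two JavaScript constants the page expects
function d3_data_script( array $nodes, array $links ): string {
    //  JSON_HEX_TAG turns &lt; and &gt; into \u003C and \u003E, so "&lt;/script&gt;" can't close the script element early
    $flags = JSON_HEX_TAG | JSON_UNESCAPED_SLASHES;
    return "const nodes = " . json_encode( array_values( $nodes ), $flags ) . ";\n"
         . "const links = " . json_encode( array_values( $links ), $flags ) . ";\n";
}

echo d3_data_script(
    [ [ "id" =&gt; 1, "label" =&gt; "&lt;/script&gt; is just text", "url" =&gt; "https://example.com/post/1", "group" =&gt; "post" ] ],
    []
);
</code></pre>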

<h2 id="next-steps"><a href="https://shkspr.mobi/blog/2025/01/graphing-the-connections-between-my-blog-posts/#next-steps">Next Steps</a></h2>

<p>It needs a bit of cleaning up if I want to turn it into a WordPress plugin. It might be nice to make it a static SVG rather than relying on JavaScript. And the general æsthetic needs a bit of work.</p>

<p>Perhaps I could make it 3D like my <a href="https://shkspr.mobi/blog/2023/04/msc-dissertation-exploring-the-visualisation-of-hierarchical-cybersecurity-data-within-the-metaverse/">MSc Dissertation</a>?</p>

<p>But I'm pretty happy with that for an afternoon hack!</p>

<p>You can <a href="https://gitlab.com/edent/blog-theme/-/blob/master/includes/graph.php">get the code</a> if you want to play.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=55159&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2025/01/graphing-the-connections-between-my-blog-posts/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Order WordPress Posts by Most Comments]]></title>
		<link>https://shkspr.mobi/blog/2024/12/order-wordpress-posts-by-most-comments/</link>
					<comments>https://shkspr.mobi/blog/2024/12/order-wordpress-posts-by-most-comments/#respond</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Thu, 12 Dec 2024 12:34:42 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[WordPress]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=54404</guid>

					<description><![CDATA[I take great delight in seeing people reply to my blog posts.  I use WebMentions to collect replies from social media and other sites. But which of my posts has the most comments? Here&#039;s a snippet to stick in your functions.php file. It allows you to add ?comment-order to any WordPress URl and have the posts with the most comments on top.  //  Add ordering by comments add_action( &#039;pre_get_posts&#039;, …]]></description>
										<content:encoded><![CDATA[<p>I take great delight in seeing people reply to my blog posts.  I use WebMentions to collect replies from social media and other sites. But which of my posts has the most comments? Here's a snippet to stick in your <code>functions.php</code> file. It allows you to add <code>?comment-order</code> to any WordPress URl and have the posts with the most comments on top.</p>

<pre><code class="language-php">//  Add ordering by comments
add_action( 'pre_get_posts', 'pre_get_posts_by_comments' );
function pre_get_posts_by_comments( $query ) {
    //  Only reorder the page's main query, not secondary queries such as widgets
    if ( ! $query-&gt;is_main_query() ) {
        return;
    }

    //  Do nothing unless "comment-order" is present in the URL's query string
    if ( ! isset( $_GET['comment-order'] ) ) {
        return;
    }

    $query-&gt;set( "orderby", "comment_count" );  //  Default: date
    $query-&gt;set( "order", "DESC" ); //  Biggest first
}
</code></pre>

<p>This makes use of <a href="https://developer.wordpress.org/reference/hooks/pre_get_posts/">the <code>pre_get_posts</code> hook</a> to rewrite the posts query. That means it works on most WordPress pages.</p>

<p>For example:</p>

<ul>
<li>My homepage <a href="https://shkspr.mobi/blog/?comment-order">https://shkspr.mobi/blog/?comment-order</a></li>
<li>Posts with a specific tag <a href="https://shkspr.mobi/blog/tag/blockchain/?comment-order">https://shkspr.mobi/blog/tag/blockchain/?comment-order</a></li>
<li>Dates <a href="https://shkspr.mobi/blog/2012/?comment-order">https://shkspr.mobi/blog/2012/?comment-order</a></li>
</ul>
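<p>Setting <code>orderby</code> to <code>comment_count</code> and <code>order</code> to <code>DESC</code> asks for posts sorted by their stored comment count, biggest first. Purely as an illustration of that ordering (this is not WordPress's query code), here is a standalone PHP sketch which sorts a hypothetical array of posts the same way:</p>

```php
<?php
//  Hypothetical stand-in for posts: only the fields this sketch needs.
//  Sorts descending by comment_count, mirroring orderby=comment_count, order=DESC.
function order_by_comment_count( array $posts ): array {
    usort( $posts, function ( array $a, array $b ): int {
        return $b["comment_count"] <=> $a["comment_count"];
    } );
    return $posts;
}

$posts = [
    [ "title" => "Quiet post",    "comment_count" => 0 ],
    [ "title" => "Popular post",  "comment_count" => 42 ],
    [ "title" => "Middling post", "comment_count" => 7 ],
];

foreach ( order_by_comment_count( $posts ) as $post ) {
    echo "{$post['comment_count']}\t{$post['title']}\n";
}
```

<p>Since PHP 8, <code>usort()</code> is stable, so posts with equal counts keep their original relative order in this sketch.</p>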

<p>Did you find this post useful? Please leave a comment here!</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=54404&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2024/12/order-wordpress-posts-by-most-comments/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Change WordPress Fragment Links in RSS Feeds to be Permalinks]]></title>
		<link>https://shkspr.mobi/blog/2024/12/change-wordpress-fragment-links-in-rss-feeds-to-be-permalinks/</link>
					<comments>https://shkspr.mobi/blog/2024/12/change-wordpress-fragment-links-in-rss-feeds-to-be-permalinks/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Fri, 06 Dec 2024 12:34:14 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[HowTo]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[WordPress]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=54080</guid>

					<description><![CDATA[Here&#039;s a knotty problem. Lots of my posts use URl Fragments. Those are links which start with #. They allow me to write:  &#60;a href=&#34;#where-is-this-a-problem&#34;&#62;Jump to heading&#60;/a&#62;   So when someone clicks on a link, they go straight to the relevant section.  For example, they might want to skip straight to how to fix it.  Isn&#039;t that clever?  Where is this a problem?  This works great when someone is…]]></description>
										<content:encoded><![CDATA[<p>Here's a knotty problem. Lots of my posts use <a href="https://developer.mozilla.org/en-US/docs/Web/URI/Fragment">URl Fragments</a>. Those are links which start with <code>#</code>. They allow me to write:</p>

<pre><code class="language-html">&lt;a href="https://shkspr.mobi/blog/2024/12/change-wordpress-fragment-links-in-rss-feeds-to-be-permalinks/#where-is-this-a-problem&gt;Jump%20to%20heading&lt;/a&gt;/code/prepSo%20when%20someone%20clicks%20on%20a%20link,%20they%20a%20href="#where-is-this-a-problem">go straight to the relevant section</a>.  For example, they might want to skip straight to <a href="https://shkspr.mobi/blog/2024/12/change-wordpress-fragment-links-in-rss-feeds-to-be-permalinks/#how-to-fix-it">how to fix it</a>.</p>

<p>Isn't that clever?</p>

<h2 id="where-is-this-a-problem"><a href="https://shkspr.mobi/blog/2024/12/change-wordpress-fragment-links-in-rss-feeds-to-be-permalinks/#where-is-this-a-problem">Where is this a problem?</a></h2>

<p>This works great when someone is on my website. They're on the page, and a fragment links straight to the correct section of that page.</p>

<p>But some people view this blog in RSS &amp; Atom feeds - and those feeds also power my newsletter.</p>

<p>When those people see a fragment, it is devoid of its original context. So they end up going to some random location, or my homepage.</p>

<h2 id="how-to-fix-it"><a href="https://shkspr.mobi/blog/2024/12/change-wordpress-fragment-links-in-rss-feeds-to-be-permalinks/#how-to-fix-it">How to fix it?</a></h2>

<p>Stick this into your WordPress theme's <code>functions.php</code> file:</p>

<pre><code class="language-php">//  In the RSS feed, change #whatever to &lt;permalink&gt;#whatever
function rewrite_fragment_links_in_rss($content) {
    global $post;

    //  Ensure this is a feed
    if ( is_feed() &amp;&amp; $post instanceof WP_Post ) {
        //  Get the permalink
        $base_url = get_permalink( $post );

        //  Find fragment links: href="#something" or href='#something'
        $content = preg_replace_callback(
            '/href=["\']#([^"\']+)["\']/',
            function ( $matches ) use ( $base_url ) {
                return 'href="' . esc_url( $base_url . '#' . $matches[1] ) . '"';
            },
            $content
        );
    }

    return $content;
}

//  Hook into feed filters for both excerpts and full content
add_filter( "the_excerpt_rss",  "rewrite_fragment_links_in_rss" );
add_filter( "the_content_feed", "rewrite_fragment_links_in_rss" );
</code></pre>

<p>That listens out for the RSS feed being generated and replaces <code>#whatever</code> with <code>https://shkspr.mobi/blog/2024/12/change-wordpress-fragment-links-in-rss-feeds-to-be-permalinks/#whatever</code>.</p>

<p>Nifty!</p>
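<p>If you want to try the pattern outside WordPress, here is a minimal standalone sketch. It assumes a hard-coded permalink in place of <code>get_permalink()</code>, and uses <code>htmlspecialchars()</code> as a rough stand-in for WordPress's <code>esc_url()</code>, which does more thorough URL sanitising. The function name is hypothetical:</p>

```php
<?php
//  Standalone sketch of the fragment rewrite, without WordPress.
//  $base_url stands in for get_permalink(); htmlspecialchars() stands in for esc_url().
function rewrite_fragment_links( string $content, string $base_url ): string {
    return preg_replace_callback(
        '/href=["\']#([^"\']+)["\']/',
        function ( array $matches ) use ( $base_url ): string {
            $url = $base_url . '#' . $matches[1];
            return 'href="' . htmlspecialchars( $url, ENT_QUOTES ) . '"';
        },
        $content
    );
}

$html = '<a href="#how-to-fix-it">Fix</a> <a href="https://example.com/">Home</a>';
echo rewrite_fragment_links( $html, "https://example.com/blog/my-post/" ), "\n";
```

<p>Only links whose <code>href</code> starts with <code>#</code> are touched; absolute links pass through unchanged.</p>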

<p>Hopefully, if you click on the links in my emails and feeds, it should take you to the right place now.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=54080&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2024/12/change-wordpress-fragment-links-in-rss-feeds-to-be-permalinks/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[A simple and free way to post RSS feeds to Threads]]></title>
		<link>https://shkspr.mobi/blog/2024/11/a-simple-and-free-way-to-post-rss-feeds-to-threads/</link>
					<comments>https://shkspr.mobi/blog/2024/11/a-simple-and-free-way-to-post-rss-feeds-to-threads/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Wed, 06 Nov 2024 12:30:00 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[HowTo]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[rss]]></category>
		<category><![CDATA[Threads]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=53776</guid>

					<description><![CDATA[Threads is Meta&#039;s attempt to disrupt the social media landscape. Whether you care for it or not, there are a lot of users there. And, sometimes, you have to go where the audience is.  Here&#039;s how I built a really simple PHP tool to post to Threads using their official API.  This allows you to send a single status update programmatically, or regularly send new items from your RSS feed to an account. …]]></description>
										<content:encoded><![CDATA[<p><a href="https://threads.net">Threads</a> is Meta's attempt to disrupt the social media landscape. Whether you care for it or not, there are a lot of users there. And, sometimes, you have to go where the audience is.</p>

<p>Here's how I built a really simple PHP tool to post to Threads using their official API.  This allows you to send a single status update programmatically, or regularly send new items from your RSS feed to an account.</p>

<p>You can see the bot in action at <a href="https://www.threads.net/@openbenches_org">https://www.threads.net/@openbenches_org</a></p>

<h2 id="get-the-code"><a href="https://shkspr.mobi/blog/2024/11/a-simple-and-free-way-to-post-rss-feeds-to-threads/#get-the-code">Get the code</a></h2>

<p>The <a href="https://codeberg.org/edent/RSS2Threads">code is available as Open Source</a>. It should be fairly self-explanatory for a moderately competent programmer - but feel free to open an issue if you think it is confusing.</p>

<h2 id="get-it-working"><a href="https://shkspr.mobi/blog/2024/11/a-simple-and-free-way-to-post-rss-feeds-to-threads/#get-it-working">Get it working</a></h2>

<ol>
<li>Create an account on Threads (duh!) - this involves signing up to Instagram.</li>
<li>Create a Facebook Developer account.</li>
<li>Create <a href="https://developers.facebook.com/apps/">an app which requests the Threads posting API</a>.

<ul>
<li>You do not need to publish this app if you're only using it yourself.</li>
</ul></li>
<li>Create a User Token using the "User Token Generator"</li>
<li>Get your <a href="https://developers.facebook.com/docs/threads/threads-profiles/">Threads account's User ID</a> with:

<ul>
<li><code>curl -s -X GET "https://graph.threads.net/v1.0/me?ields=id,username,name,threads_profile_picture_url,threads_biography&amp;access_token=TOKEN"</code></li>
<li>(Yes, <code>ields</code>. If you use <code>fields</code> you get something else!)</li>
</ul></li>
<li>Clone the <a href="https://codeberg.org/edent/RSS2Threads">RSS2Threads repo</a> and stick it on a webserver somewhere.</li>
<li>Rename <code>config.sample.php</code> to <code>config.php</code> and add your feeds' details, along with your ID and Token.</li>
<li>Run <code>php rss2threads.php</code></li>
</ol>

<p>And that's it!</p>

<p>The service will download your RSS feed, check if it has posted the entries to Threads and, if not, post them.</p>

<h2 id="how-i-built-it"><a href="https://shkspr.mobi/blog/2024/11/a-simple-and-free-way-to-post-rss-feeds-to-threads/#how-i-built-it">How I built it</a></h2>

<p>Shoulders of giants, and all that! I have been using <a href="https://codeberg.org/nesges/rss2bsky">Thomas Nesges's RSS2BSky</a> for auto-posting to Bluesky. I also used <a href="https://github.com/0xjessel/threads-bart-bot">Jesse Chen's Python Threads example code</a>.</p>

<p>Posting is a two-stage process.</p>

<ol>
<li>POST the URl encoded text to:

<ul>
<li><code>https://graph.threads.net/USER_ID/threads?text=My%20post&amp;access_token=TOKEN&amp;media_type=TEXT</code></li>
<li>If successful, the API will return a Creation ID.</li>
</ul></li>
<li>POST the Creation ID to:

<ul>
<li><code>https://graph.threads.net/USER_ID/threads_publish?creation_id=CREATION_ID&amp;access_token=TOKEN</code></li>
<li>If successful, the API will return a Post ID.</li>
</ul></li>
</ol>
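<p>To make the two stages concrete, here is a minimal PHP sketch which only <em>builds</em> the two request URLs listed above; it does not send them. <code>USER_ID</code>, <code>TOKEN</code>, and <code>CREATION_ID</code> are placeholders, and the helper names are hypothetical rather than taken from RSS2Threads:</p>

```php
<?php
//  Hypothetical helpers which build the two Threads API request URLs.
//  They do not perform the POSTs; send each URL with your HTTP client of choice.
const THREADS_API = "https://graph.threads.net";

//  Stage 1: create a text container. The API should reply with a Creation ID.
function threads_create_url( string $user_id, string $token, string $text ): string {
    $query = http_build_query( [
        "text"         => $text,
        "access_token" => $token,
        "media_type"   => "TEXT",
    ], "", "&", PHP_QUERY_RFC3986 );   //  RFC 3986 encodes spaces as %20
    return THREADS_API . "/" . rawurlencode( $user_id ) . "/threads?" . $query;
}

//  Stage 2: publish the container, using the Creation ID from stage 1.
function threads_publish_url( string $user_id, string $token, string $creation_id ): string {
    $query = http_build_query( [
        "creation_id"  => $creation_id,
        "access_token" => $token,
    ], "", "&", PHP_QUERY_RFC3986 );
    return THREADS_API . "/" . rawurlencode( $user_id ) . "/threads_publish?" . $query;
}

echo threads_create_url( "USER_ID", "TOKEN", "My post" ), "\n";
echo threads_publish_url( "USER_ID", "TOKEN", "CREATION_ID" ), "\n";
```

<p>Using <code>http_build_query()</code> rather than string concatenation means any spaces, ampersands, or emoji in the post text get URl encoded for you.</p>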

<p>Successful RSS posts are stored in a simple SQLite database. If an RSS entry was posted successfully, it won't be reposted.</p>

<h2 id="caveats"><a href="https://shkspr.mobi/blog/2024/11/a-simple-and-free-way-to-post-rss-feeds-to-threads/#caveats">Caveats</a></h2>

<ul>
<li>There are no unit tests, fuzzing, or exception handling. It's assumed you're running this on well-formed RSS that you trust.</li>
<li>The Threads API is <strong>slow!</strong> It takes ages for a post to be sent to it.</li>
<li><a href="https://developers.facebook.com/docs/development/create-an-app/threads-use-case">Getting a Threads API token</a> is <strong>difficult</strong> and the margin is too small for me to explain it here.</li>
</ul>

<h2 id="feedback"><a href="https://shkspr.mobi/blog/2024/11/a-simple-and-free-way-to-post-rss-feeds-to-threads/#feedback">Feedback</a></h2>

<p>Please leave a comment here or <a href="https://codeberg.org/edent/RSS2Threads">on the code repository</a>.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=53776&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2024/11/a-simple-and-free-way-to-post-rss-feeds-to-threads/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Using phpList for a blog's newsletter]]></title>
		<link>https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/</link>
					<comments>https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Thu, 31 Oct 2024 12:34:36 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[HowTo]]></category>
		<category><![CDATA[newsletter]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[rss]]></category>
		<category><![CDATA[WordPress]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=53583</guid>

					<description><![CDATA[Some people like to receive this blog via email. I previously used JetPack to send out subscriber messages - but it became increasingly clear that Automattic isn&#039;t a good steward of such things.  I couldn&#039;t find any services which would let me send a few thousand subscribers a few emails per week, at zero cost.  So, redecentralise!  I installed phpList which is an open source email campaign tool. …]]></description>
										<content:encoded><![CDATA[<p>Some people like to receive this blog via email. I previously used JetPack to send out subscriber messages - but it became increasingly clear that Automattic isn't a good steward of such things.  I couldn't find any services which would let me send a few thousand subscribers a few emails per week, at zero cost.</p>

<p>So, redecentralise!</p>

<p>I installed <a href="https://www.phplist.org/">phpList</a> which is an open source email campaign tool.  My webhost - <a href="https://krystal.io/">Krystal</a> - had a one-click install option. But, phpList isn't quite one-click for sending out a regular blog newsletter.  <a href="https://discuss.phplist.org/t/daily-rss-problems-there-are-no-feed-items-that-will-be-included-in-the-first-campaign/9835/">I found the set-up to be quite confusing</a>, so here are the steps I took to turn an RSS feed into an Email Newsletter for free.</p>

<h2 id="install-the-plugins"><a href="https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#install-the-plugins">Install the plugins</a></h2>

<ol>
<li>Navigate to Config → Manage plugins</li>
<li>Enable "CommonPlugin"</li>
<li>Add the <a href="https://resources.phplist.com/plugin/rssfeed">RSS Feed Plugin</a> using the Plugin package URL <code>https://github.com/bramley/phplist-plugin-rssfeed/archive/master.zip</code></li>
</ol>

<h2 id="configure-the-rss-feed-plugin"><a href="https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#configure-the-rss-feed-plugin">Configure the RSS Feed Plugin</a></h2>

<ol>
<li>Navigate to Config → Settings</li>
<li>Scroll down to the RSS Settings</li>
<li>Set both Minimum <em>and</em> Maximum number of items to 1<br><img src="https://shkspr.mobi/blog/wp-content/uploads/2024/10/rsssettings-fs8.png" alt="RSS Settings Screen." width="888" height="450" class="aligncenter size-full wp-image-53584"><br>That will ensure you only send the latest RSS item as your newsletter.</li>
<li>Set "Use the item summary content (the description or summary element) instead of the content element" to "No". This will allow the full text of the RSS item to be sent.</li>
</ol>

<h2 id="edit-config-php"><a href="https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#edit-config-php">Edit <code>config.php</code></a></h2>

<p>For some reason, you need to edit this file manually in a text editor, rather than through phpList's web interface.</p>

<ol>
<li>Set <code>define('USE_REPETITION', 1);</code> - this allows the newsletter to be sent whenever there is a new RSS item.</li>
<li>Set <code>define('CLICKTRACK', 0);</code> - this removes tracking links from your emails. I don't care who opens my emails or what they click on.</li>
</ol>

<h2 id="add-the-campaign"><a href="https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#add-the-campaign">Add The Campaign</a></h2>

<ol>
<li>Go to  Campaigns → Send a campaign.</li>
<li>Start a new campaign.</li>
</ol>

<h3 id="tab-1"><a href="https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#tab-1">Tab 1</a></h3>

<ol>
<li>Campaign subject should be <code>[RSSITEM:TITLE]</code> - that will make the subject line the same as your <strong>post</strong>'s title</li>
<li>Compose message should be <code>[RSS]</code> - that will ensure the contents come from your RSS feed.</li>
</ol>

<h3 id="tab-2"><a href="https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#tab-2">Tab 2</a></h3>

<ol>
<li>Add your RSS feed's URl</li>
<li>Order items "Newest" first - to get the most recent item.</li>
<li>Add a custom HTML template. I used one from <a href="https://emailframe.work/">https://emailframe.work/</a></li>
</ol>

<pre><code class="language-html">&lt;div style="margin:0; padding:0; background-color:#F2F2F2;"&gt;
  &lt;h1&gt;&lt;a href="[URL]"&gt;[TITLE]&lt;/a&gt;&lt;/h1&gt;
  &lt;table width="100%" border="0" cellpadding="0" cellspacing="0" bgcolor="#F2F2F2"&gt;
      &lt;tr&gt;
          &lt;td valign="top"&gt;
              [CONTENT]
          &lt;/td&gt;
      &lt;/tr&gt;
  &lt;/table&gt;
&lt;/div&gt;
</code></pre>
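<p>The template relies on placeholders like <code>[URL]</code>, <code>[TITLE]</code>, and <code>[CONTENT]</code>, which get swapped for the feed item's details. Purely to illustrate that idea (this is not phpList's own code, and the function name is hypothetical), a template could be filled in with PHP's <code>strtr()</code>:</p>

```php
<?php
//  Hypothetical illustration of placeholder substitution; not phpList's implementation.
function fill_template( string $template, array $item ): string {
    //  strtr() replaces every placeholder in a single pass,
    //  so replaced text is never re-scanned for other placeholders.
    return strtr( $template, [
        "[URL]"     => htmlspecialchars( $item["url"], ENT_QUOTES ),
        "[TITLE]"   => htmlspecialchars( $item["title"], ENT_QUOTES ),
        "[CONTENT]" => $item["content"],   //  Assumed to already be HTML from the feed
    ] );
}

$template = '<h1><a href="[URL]">[TITLE]</a></h1><div>[CONTENT]</div>';
echo fill_template( $template, [
    "url"     => "https://example.com/post/",
    "title"   => "Fish & Chips",
    "content" => "<p>Hello!</p>",
] ), "\n";
```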

<h3 id="tab-3"><a href="https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#tab-3">Tab 3</a></h3>

<ol>
<li>Send as HTML</li>
</ol>

<h3 id="tab-4"><a href="https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#tab-4">Tab 4</a></h3>

<ol>
<li>"Stop sending after" - choose the furthest date in the future possible.</li>
<li>"Repeat campaign every" - I chose "hour". That should check the RSS feed each hour.</li>
</ol>

<h3 id="tab-5"><a href="https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#tab-5">Tab 5</a></h3>

<ol>
<li>"Lists" - pick the email list you want to send to.</li>
</ol>

<h3 id="tab-6"><a href="https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#tab-6">Tab 6</a></h3>

<ol>
<li>You should be finished! It will tell you if there are any errors.</li>
<li>Place the campaign in the queue for processing.</li>
</ol>

<h2 id="wordpress-sign-up-form"><a href="https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#wordpress-sign-up-form">WordPress Sign Up Form</a></h2>

<p>You can either redirect users to your phpList subscription page, or put a form directly on your site.</p>

<pre><code class="language-html">&lt;form method="post" action="/YourSubscribePage/?p=subscribe&amp;id=1" name="subscribeform"&gt;
    &lt;label for="email"&gt;Email address:&lt;/label&gt;
    &lt;input type="email" name="email" required="required" placeholder="" size="40" id="email"&gt;
    &lt;input type="hidden" name="htmlemail" value="1"&gt;
    &lt;input type="hidden" name="list[2]" value="signup"&gt;
    &lt;input type="hidden" name="listname[2]" value="newsletter"&gt;
    &lt;div style="display:none"&gt;
        &lt;input type="text" name="VerificationCodeX" value="" size="20"&gt;
    &lt;/div&gt;
    &lt;input type="submit" name="subscribe" value="Subscribe"&gt;
&lt;/form&gt;
</code></pre>

<p>Adjust the hidden parameters based on your list.</p>

<p>If in doubt, go to Config →  Subscribe pages, and generate a new subscribe page. Then copy the form from that.</p>

<h2 id="cron-jobs"><a href="https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#cron-jobs">Cron Jobs</a></h2>

<p>You need two cron jobs set up.</p>

<h3 id="update-the-rss-feed"><a href="https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#update-the-rss-feed">Update the RSS feed</a></h3>

<p>I run this every hour:</p>

<p><code>/usr/bin/php /path/to/YourSubscribePage/admin/index.php -p get -m RssFeedPlugin -c /path/to/YourSubscribePage/config/config.php</code></p>

<h3 id="process-the-queue"><a href="https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#process-the-queue">Process the Queue</a></h3>

<p>I run this a few minutes after the RSS feed is updated:</p>

<p><code>/usr/bin/php -q /path/to/YourSubscribePage/admin/index.php -p processqueue -c /path/to/YourSubscribePage/config/config.php &gt;/dev/null</code></p>

<h2 id="and-then"><a href="https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/#and-then">And then...</a></h2>

<p>That <em>should</em> be it.  There are lots of options which you can fiddle around with. But the above should be enough to get your first newsletter out.</p>

<p>Huge thanks to <a href="https://dcameron.me.uk/">Duncan Cameron</a> for graciously answering my noddy questions and helping me out with the config.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=53583&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2024/10/using-phplist-for-a-blogs-newsletter/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[WordPress - Display hook action priority in the dashboard]]></title>
		<link>https://shkspr.mobi/blog/2024/08/wordpress-display-hook-action-priority-in-the-dashboard/</link>
					<comments>https://shkspr.mobi/blog/2024/08/wordpress-display-hook-action-priority-in-the-dashboard/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sat, 31 Aug 2024 11:34:14 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[widget]]></category>
		<category><![CDATA[WordPress]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=52230</guid>

					<description><![CDATA[If your WordPress site has lots of plugins, it&#039;s sometimes difficult to keep track of what is manipulating your content. Ever wondered what priority all your various actions and filters have? This is a widget which will show you which actions are registered to your blog&#039;s hooks, and their priority order.  It looks like this:    Stick this code in your theme&#039;s functions.php or in its own plugin. …]]></description>
										<content:encoded><![CDATA[<p>If your WordPress site has lots of plugins, it's sometimes difficult to keep track of what is manipulating your content. Ever wondered what priority all your various actions and filters have? This is a widget which will show you which actions are registered to your blog's hooks, and their priority order.</p>

<p>It looks like this:</p>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2024/08/priorities-fs8.png" alt="List of actions with various priorities." width="600" height="599" class="aligncenter size-full wp-image-52231">

<p>Stick this code in your theme's <code>functions.php</code> or in its own plugin.</p>

<pre><code class="language-php">function edent_priority_dashboard_widget_contents() {
    global $wp_filter; 
    //  Change this to the hook you're interested in
    $hook_name = "the_content";
    if ( isset( $wp_filter[$hook_name] ) ) {

        //  Display the hook name in the widget
        echo "&lt;h3&gt;{$hook_name}&lt;/h3&gt;";

        //  Start a list
        echo "&lt;ul&gt;";

        //  Loop through the callbacks in priority order
        foreach ( $wp_filter[$hook_name]-&gt;callbacks as $priority =&gt; $callbacks ) {
            echo "&lt;li&gt;Priority: {$priority}&lt;ul&gt;";

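            foreach ( $callbacks as $callback ) {
                $function = $callback["function"];
                //  Some callbacks are arrays of [ object or class name, method name ]
                if ( is_array( $function ) ) {
                    if ( is_object( $function[0] ) ) {
                        $callback_info = get_class( $function[0] ) . '::' . $function[1];
                    } else {
                        $callback_info = $function[0] . '::' . $function[1];
                    }
                } elseif ( $function instanceof Closure ) {
                    //  Anonymous functions have no name, and can't be converted to a string
                    $callback_info = "Closure";
                } elseif ( is_object( $function ) ) {
                    //  Invokable objects
                    $callback_info = get_class( $function ) . '::__invoke';
                } else {
                    $callback_info = $function;
                }
                //  Show the information, escaped for safe output
                echo "&lt;li&gt;Callback: " . esc_html( $callback_info ) . "&lt;/li&gt;";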
            }
            echo "&lt;/ul&gt;&lt;/li&gt;";
        }
        echo '&lt;/ul&gt;';

    } 
    else {
        echo "No filters found for hook: {$hook_name}";
    }

    //  Scrap of CSS to ensure list items display properly on the dashboard
    $priority_css_code = "#edent_dashboard_widget ul { list-style: circle; padding: 1em; }";
    //  Inline the CSS
    echo "&lt;link rel=\"stylesheet\" type=\"text/css\" href=\"data:text/css;base64," . 
        base64_encode($priority_css_code) . "\"&gt;";

}

//  Register the widget with the admin dashboard
function edent_register_dashboard_widget() {
    wp_add_dashboard_widget(
        "edent_dashboard_widget",   //  ID of the widget
        "Priorities",   //  Title of the widget
        "edent_priority_dashboard_widget_contents"  //  Function to run
    );
}
add_action( "wp_dashboard_setup", "edent_register_dashboard_widget" );
</code></pre>
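<p>The trickiest part of the widget is turning a callback into a readable name, because PHP accepts several kinds of callable. Here is a standalone sketch of just that step, which runs without WordPress; the function name and the sample class are hypothetical:</p>

```php
<?php
//  Turn any PHP callable into a human-readable label.
function describe_callback( $function ): string {
    if ( is_string( $function ) ) {
        return $function;                                //  "my_function" or "Class::method"
    }
    if ( is_array( $function ) ) {
        //  [ $object, "method" ] or [ "ClassName", "method" ]
        $target = is_object( $function[0] ) ? get_class( $function[0] ) : $function[0];
        return $target . "::" . $function[1];
    }
    if ( $function instanceof Closure ) {
        return "Closure";                                //  Anonymous functions have no name
    }
    if ( is_object( $function ) ) {
        return get_class( $function ) . "::__invoke";    //  Invokable objects
    }
    return "Unknown";
}

class Example_Plugin {
    public function filter( $content ) { return $content; }
    public static function static_filter( $content ) { return $content; }
    public function __invoke( $content ) { return $content; }
}

$plugin = new Example_Plugin();
$samples = [ "wpautop", [ $plugin, "filter" ], [ "Example_Plugin", "static_filter" ], fn( $c ) => $c, $plugin ];
foreach ( $samples as $callback ) {
    echo describe_callback( $callback ), "\n";
}
```

<p>The <code>Closure</code> check has to come before the general <code>is_object()</code> check, because every closure is also an object.</p>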

<h2 id="why"><a href="https://shkspr.mobi/blog/2024/08/wordpress-display-hook-action-priority-in-the-dashboard/#why">Why?</a></h2>

<p>WordPress lets you <a href="https://developer.wordpress.org/plugins/hooks/">add actions and filters to hooks</a>.  For example, whenever your blog wants to show some content, a hook of <code>the_content</code> is run.</p>

<p>You can add a filter to run a function when that happens. For example, if you want to make all the text in your blog posts lowercase, you could add this to your theme or plugin:</p>

<pre><code class="language-php">function lower_case_everything( $content ) {
   return strtolower( $content );
}
add_filter( 'the_content', 'lower_case_everything', 99 );
</code></pre>

<p>The <code>add_filter</code> says "When the hook called <code>the_content</code> is fired, run the function <code>lower_case_everything</code>, with a priority of 99".  The lower the number, the sooner the function is run.</p>
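<p>To see how priority controls the order, here is a tiny standalone sketch of a filter registry. It is a deliberately simplified, hypothetical model, not WordPress's <code>WP_Hook</code> class: callbacks are grouped by priority, priorities run lowest first, each callback receives the previous one's output, and callbacks sharing a priority run in the order they were added:</p>

```php
<?php
//  Simplified, hypothetical model of a filter hook. Not WordPress's WP_Hook.
final class Mini_Hook {
    /** @var array<int, callable[]> callbacks grouped by priority */
    private array $callbacks = [];

    public function add_filter( callable $callback, int $priority = 10 ): void {
        $this->callbacks[ $priority ][] = $callback;
        ksort( $this->callbacks );                 //  Lowest priority number first
    }

    public function apply_filters( $value ) {
        foreach ( $this->callbacks as $group ) {
            foreach ( $group as $callback ) {      //  Same priority: order of registration
                $value = $callback( $value );      //  Each filter gets the previous output
            }
        }
        return $value;
    }
}

$the_content = new Mini_Hook();
$the_content->add_filter( "strtolower", 99 );
$the_content->add_filter( fn( $text ) => $text . " (Edited)", 10 );

echo $the_content->apply_filters( "Hello World" ), "\n";
```

<p>Even though <code>strtolower</code> was added first, its priority of 99 means it runs last, so it also lowercases the text added at priority 10.</p>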
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=52230&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2024/08/wordpress-display-hook-action-priority-in-the-dashboard/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Liberate your Markdown posts from JetPack in WordPress]]></title>
		<link>https://shkspr.mobi/blog/2024/08/liberate-your-markdown-posts-from-jetpack-in-wordpress/</link>
					<comments>https://shkspr.mobi/blog/2024/08/liberate-your-markdown-posts-from-jetpack-in-wordpress/#respond</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sun, 25 Aug 2024 11:34:20 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[jetpack]]></category>
		<category><![CDATA[markdown]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[WordPress]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=51755</guid>

					<description><![CDATA[A scrap of code which I hope helps you.  Problem  You installed the WordPress JetPack plugin and wrote all your blog posts in Markdown. Now you want to remove JetPack or replace it with a better Markdown parser.  You turn off JetPack&#039;s &#34;Write posts or pages in plain-text Markdown syntax&#34;.  You click edit on a post and see the HTML version of your page. Where did the Markdown version go? …]]></description>
										<content:encoded><![CDATA[<p>A scrap of code which I hope helps you.</p>

<h2 id="problem"><a href="https://shkspr.mobi/blog/2024/08/liberate-your-markdown-posts-from-jetpack-in-wordpress/#problem">Problem</a></h2>

<p>You installed the WordPress JetPack plugin and wrote all your blog posts in Markdown. Now you want to remove JetPack or replace it with a better Markdown parser.</p>

<p>You turn off JetPack's "Write posts or pages in plain-text Markdown syntax".  You click edit on a post and see the HTML version of your page. Where did the Markdown version go?</p>

<h2 id="background"><a href="https://shkspr.mobi/blog/2024/08/liberate-your-markdown-posts-from-jetpack-in-wordpress/#background">Background</a></h2>

<p>When you write using JetPack's Markdown plugin, the Markdown version is stored in <code>post_content_filtered</code>. When you hit "publish" or "update", the page is parsed as Markdown and the HTML output is stored in <code>post_content</code>.</p>

<p>When you hit "edit", the <code>post_content_filtered</code> version is loaded into the editor - and the process starts again.</p>

<h2 id="solution"><a href="https://shkspr.mobi/blog/2024/08/liberate-your-markdown-posts-from-jetpack-in-wordpress/#solution">Solution</a></h2>

<p>When you edit a post, replace the content with the filtered version, then delete the filtered version.</p>

<p>Place this code in your theme's <code>functions.php</code>.</p>

<pre><code class="language-php">function edit_markdown_content( $content, $id ) {
    $post = get_post( $id );
    if ( $post &amp;&amp; ! empty( $post-&gt;post_content_filtered ) ) {
        //  Get the Markdown version
        $markdown = $post-&gt;post_content_filtered;

        //  Delete the post_content_filtered version
        //  Using $wpdb-&gt;posts means this works with any table prefix, not just "wp_"
        global $wpdb;
        $wpdb-&gt;query(
            $wpdb-&gt;prepare(
                "UPDATE {$wpdb-&gt;posts} SET `post_content_filtered` = '' WHERE `ID` = %d",
                $id
            )
        );

        //  Send the Markdown to the editor with a comment recording when it was restored
        return "&lt;!-- Restored from post_content_filtered \n" . date( "c" ) . "\n--&gt;" . $markdown;
    }
    //  Nothing to restore, so leave the editor's content untouched
    return $content;
}
add_filter( "edit_post_content", "edit_markdown_content", 1, 2 );
</code></pre>

<p>I adapted it from <a href="https://github.com/terrylinooo/githuber-md/blob/5ae517a549600f719645d35baf30b29a8069ebcc/src/Controllers/Markdown.php#L1111">WP Githuber MD</a>.</p>
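<p>Stripped of the WordPress plumbing, the swap itself is simple. Here is a standalone sketch which treats a post as a plain object with the same two field names. The function name is hypothetical, and unlike the real filter it does not write anything back to the database:</p>

```php
<?php
//  Hypothetical, standalone model of the swap: the Markdown moves back into post_content.
function restore_markdown( stdClass $post ): bool {
    if ( empty( $post->post_content_filtered ) ) {
        return false;                                              //  Nothing to restore
    }
    $post->post_content          = $post->post_content_filtered;   //  Markdown becomes the editable content
    $post->post_content_filtered = "";                             //  Clear the stored copy
    return true;
}

$post = (object) [
    "post_content"          => "<p><em>Hello</em></p>",
    "post_content_filtered" => "*Hello*",
];
var_dump( restore_markdown( $post ) );
echo $post->post_content, "\n";
```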

<h2 id="direct-mysql"><a href="https://shkspr.mobi/blog/2024/08/liberate-your-markdown-posts-from-jetpack-in-wordpress/#direct-mysql">Direct MySQL</a></h2>

<p>If you want to automatically convert all your posts, you can edit your database directly.</p>

<pre><code class="language-mysql">UPDATE wp_posts
SET 
    post_content = post_content_filtered,
    post_content_filtered = ''
WHERE post_content_filtered IS NOT NULL AND post_content_filtered&lt;&gt;'';
</code></pre>

<h2 id="warning"><a href="https://shkspr.mobi/blog/2024/08/liberate-your-markdown-posts-from-jetpack-in-wordpress/#warning">Warning</a></h2>

<p>Once converted, <code>post_content</code> holds raw Markdown. If you do not have a Markdown parser installed, posts will come out looking <em>very</em> strange.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=51755&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2024/08/liberate-your-markdown-posts-from-jetpack-in-wordpress/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Working around an old and buggy HTML Tidy in PHP]]></title>
		<link>https://shkspr.mobi/blog/2024/08/working-around-and-old-and-buggy-html-tidy-in-php/</link>
					<comments>https://shkspr.mobi/blog/2024/08/working-around-and-old-and-buggy-html-tidy-in-php/#respond</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sat, 17 Aug 2024 11:34:50 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[php]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=51208</guid>

					<description><![CDATA[Dan Q very kindly shared his script to make WordPress do good HTML. But I couldn&#039;t get it working.  Looking at the HTML it was spitting out, the meta generator said it was HTML Tidy version 5.6.0.  That&#039;s quite old!  I confirmed this by running:  echo tidy_get_release();   Which spat out 2017/11/25. Aha!  There are a few bugs in this version of HTML Tidy, some of which are fixed in later…]]></description>
										<content:encoded><![CDATA[<p>Dan Q very kindly shared his <a href="https://github.com/Dan-Q/wp-htmltidy-hack-demo">script to make WordPress do good HTML</a>. But I couldn't get it working.</p>

<p>The HTML it was spitting out had a meta generator tag saying it was made by HTML Tidy version 5.6.0. That's quite old! I confirmed this by running:</p>

<pre><code class="language-php">echo tidy_get_release();
</code></pre>

<p>Which spat out <code>2017/11/25</code>. Aha!</p>

<p>There are a few bugs in this version of HTML Tidy, some of which are fixed in later versions.</p>

<p>Here's how to fix them.</p>

<p><a href="https://www.php.net/manual/en/tidy.examples.basic.php#107877">Auto Indent doesn't work</a>. This is fixed by manually specifying <code>"indent" =&gt; 2</code>.</p>

<p><a href="https://github.com/htacg/tidy-html5/issues/1107">Indenting with tabs doesn't work</a>, so I told it to indent with 8 spaces using <code>"indent-spaces" =&gt; 8</code>.</p>

<p>Then I used a regex (<a href="https://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not">naughty!</a>) to replace 8 spaces with a tab.</p>

<pre><code class="language-php">$tidy = preg_replace( '/        /', "\t", $tidy );
</code></pre>
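<p>That regex swaps <em>every</em> run of 8 spaces, including any in the middle of a line of text. A slightly safer sketch, assuming you only want to convert indentation, anchors the match to the start of each line. The function name <code>tidy_spaces_to_tabs()</code> is hypothetical, and it still can't tell a line of <code>&lt;pre&gt;</code> content that happens to begin with 8 spaces from Tidy's own indentation.</p>

<pre><code class="language-php">//  Hypothetical helper: turn leading runs of 8 spaces (one Tidy indent level,
//  given "indent-spaces" of 8) into tabs, leaving spaces mid-line untouched
function tidy_spaces_to_tabs( string $html ): string {
    return preg_replace_callback(
        '/^(?: {8})+/m',  //  The "m" flag makes ^ match at the start of every line
        function ( array $m ): string {
            //  One tab for every complete block of 8 spaces
            return str_repeat( "\t", intdiv( strlen( $m[0] ), 8 ) );
        },
        $html
    );
}
</code></pre>

<p>For example, <code>tidy_spaces_to_tabs( "        &lt;p&gt;a        b&lt;/p&gt;" )</code> returns a tab followed by <code>&lt;p&gt;a        b&lt;/p&gt;</code>, with the inner spaces left intact.</p>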

<p>Older versions of Tidy <a href="https://github.com/htacg/tidy-html5/issues/1097">don't support newer HTML elements like <code>&lt;search&gt;</code></a>.  This can be fixed with <code>"new-blocklevel-tags" =&gt; "search",</code></p>

<p><a href="https://github.com/htacg/tidy-html5/issues/895">The <code>&lt;summary&gt;</code> element isn't closed properly</a>. This was an annoying one. I had to manually rewrite my HTML to remove an <code>&lt;h2&gt;</code> element from inside the summary.</p>

<p>Although not really a bug, I like to have HTML comments on a new line.</p>

<pre><code class="language-php">$tidy = preg_replace( '/&gt;&lt;!--/', "&gt;\n&lt;!--", $tidy );
</code></pre>
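<p>Gathered into one function, the workarounds might look something like this. It's a sketch that assumes PHP's <code>tidy</code> extension is installed. The wrapper name <code>tidy_html()</code> is hypothetical, and <code>str_replace()</code> stands in for the two regexes because both patterns are plain strings, so the output is the same. The <code>&lt;summary&gt;</code> problem still has to be fixed in the source HTML.</p>

<pre><code class="language-php">//  Hypothetical wrapper around the HTML Tidy 5.6.0 workarounds described above
function tidy_html( string $html ): string {
    $config = array(
        "indent"              =&gt; 2,         //  Work around the broken auto indent
        "indent-spaces"       =&gt; 8,         //  Tabs don't work, so indent with 8 spaces...
        "new-blocklevel-tags" =&gt; "search",  //  Teach old Tidy about &lt;search&gt;
    );
    $tidy = tidy_repair_string( $html, $config, "utf8" );
    if ( false === $tidy ) {
        return $html;  //  Tidy failed, so hand back the input unchanged
    }
    //  ...then swap each run of 8 spaces for a tab
    $tidy = str_replace( "        ", "\t", $tidy );
    //  Put HTML comments on a new line
    return str_replace( "&gt;&lt;!--", "&gt;\n&lt;!--", $tidy );
}
</code></pre>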

<p>Sadly, <a href="https://github.com/htacg/tidy-html5/releases">the last release of HTML Tidy was back in 2021</a>. While some of the above bugs have been fixed since, <a href="https://github.com/htacg/tidy-html5/issues">more are piling up</a>.</p>

<p>So I'll continue with these workarounds for now. Hit "view source" and tell me what you think!</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=51208&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2024/08/working-around-and-old-and-buggy-html-tidy-in-php/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[link rel="alternate" type="text/plain"]]></title>
		<link>https://shkspr.mobi/blog/2024/05/link-relalternate-typetext-plain/</link>
					<comments>https://shkspr.mobi/blog/2024/05/link-relalternate-typetext-plain/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Fri, 10 May 2024 11:34:57 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[HowTo]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[WordPress]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=50490</guid>

					<description><![CDATA[Hot on the heels of yesterday&#039;s post, I&#039;ve now made all of this blog available in text-only mode.  Simply append .txt to the URl of any page and you&#039;ll get back the contents in plain UTF-8 text. No formatting, no images (although you can see the alt text), no nothing!   Front page https://shkspr.mobi/blog/.txt This blog post https://shkspr.mobi/blog/2024/05/link-relalternate-typetext-plain/.txt A …]]></description>
										<content:encoded><![CDATA[<p>Hot on the heels of <a href="https://shkspr.mobi/blog/2024/05/a-completely-plaintext-wordpress-theme/">yesterday's post</a>, I've now made all of this blog available in text-<em>only</em> mode.</p>

<p>Simply append <code>.txt</code> to the URL of <strong>any</strong> page and you'll get back the contents in plain UTF-8 text. No formatting, no images (although you can see the alt text), no nothing!</p>

<ul>
<li>Front page <a href="https://shkspr.mobi/blog/.txt">https://shkspr.mobi/blog/.txt</a></li>
<li>This blog post <a href="https://shkspr.mobi/blog/2024/05/link-relalternate-typetext-plain/.txt">https://shkspr.mobi/blog/2024/05/link-relalternate-typetext-plain/.txt</a></li>
<li>A tag <a href="https://shkspr.mobi/blog/tag/solar.txt">https://shkspr.mobi/blog/tag/solar.txt</a></li>
</ul>

<p>This was slightly tricky to get right!  While there might be an easier way to do it, here's how I got it to work.</p>

<p>Firstly, when someone requests <code>/whatever.txt</code>, WordPress is going to 404, because that page doesn't exist. So my theme's <code>functions.php</code> detects any URLs which end in <code>.txt</code> and hands them to a different template using the <code>template_include</code> filter.</p>

<pre><code class="language-php">//  Theme Switcher
add_filter( "template_include", "custom_theme_switch" );
function custom_theme_switch( $template ) {

    //  What was requested?
    $requested_url = $_SERVER["REQUEST_URI"];

    //  Check if the URL ends with .txt
    if ( substr( $requested_url, -4 ) === ".txt")  {    
        //  Get the path to the custom template
        $custom_template = get_template_directory() . "/templates/txt-template.php";
        //  Check if the custom template exists
        if ( file_exists( $custom_template ) ) {
            return $custom_template;
        }
    }

    //  Return the default template
    return $template;
}
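</code></pre>

<p>One catch with checking the last four characters of <code>REQUEST_URI</code> is that it includes any query string, so a request such as <code>/whatever.txt?utm_source=feed</code> won't match. A minimal sketch, assuming you want to ignore query strings, checks only the path. The helper name <code>is_txt_request()</code> is hypothetical.</p>

<pre><code class="language-php">//  Hypothetical helper: does the requested path (ignoring any ?query string) end in ".txt"?
function is_txt_request( string $request_uri ): bool {
    $path = parse_url( $request_uri, PHP_URL_PATH );
    //  parse_url() returns null when there's no path, or false for a badly malformed URL
    if ( ! is_string( $path ) ) {
        return false;
    }
    return substr( $path, -4 ) === ".txt";
}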
</code></pre>

<p>The <code>txt-template.php</code> file is more complex. It takes the requested URL, strips off the <code>.txt</code>, matches it against the WordPress rewrite rules, and then constructs the <code>WP_Query</code> which would have been run if the <code>.txt</code> hadn't been there.</p>

<pre><code class="language-php">//  Run the query for the URl requested
$requested_url = $_SERVER['REQUEST_URI'];    // This will be /whatever
$blog_details = wp_parse_url( home_url() );  // Get the blog's domain to construct a full URl
$query = get_query_for_url( 
    $blog_details["scheme"] . "://" . $blog_details["host"] . substr( $requested_url, 0, -4 )
);

function get_query_for_url( $url ) {
    //  Get all the rewrite rules
    global $wp_rewrite;

    //  Get the WordPress site URL path
    $site_path = parse_url( get_site_url(), PHP_URL_PATH ) . "/";

    //  Parse the requested URL
    $url_parts = parse_url( $url );

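    //  Remove the domain and site path from the start of the URL
    //  For example, change `https://example.com/blog/2024/04/test` to just `2024/04/test`
    //  Only the leading prefix is stripped, so a site at the root (site path "/") keeps its other slashes
    $url_path = isset( $url_parts['path'] ) ? preg_replace( '#^' . preg_quote( $site_path, '#' ) . '#', '', $url_parts['path'] ) : '';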

    //  Match the URL against WordPress rewrite rules
    $rewrite_rules = $wp_rewrite-&gt;wp_rewrite_rules();
    $matched_rule = false;

    foreach ( $rewrite_rules as $pattern =&gt; $query ) {
        if ( preg_match( "#^$pattern#", $url_path, $matches ) ) {
            $matched_rule = $query;
            break;
        }
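    }

    //  If no rule matched, return an empty query rather than trying to parse `false`
    if ( false === $matched_rule ) {
        return new WP_Query();
    }
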
    //  Replace each occurrence of $matches[N] with the corresponding value
    foreach ( $matches as $key =&gt; $value ) {
        $matched_rule = str_replace( "\$matches[{$key}]", $value, $matched_rule );
    }

    //  Turn the query string into a WordPress query
    $query_params = array();
    parse_str(
        parse_url( $matched_rule, PHP_URL_QUERY), 
        $query_params
    );

    //  Construct a new WP_Query object using the extracted query parameters
    $query = new WP_Query($query_params);

    //  Return the result of the query
    return $query;
}
</code></pre>
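<p>The matching step can be tried out without WordPress. Here's a standalone sketch, assuming rules shaped like the <code>pattern =&gt; query</code> array that <code>wp_rewrite_rules()</code> returns. The function name and the example rules are hypothetical. It differs slightly from the code above: a placeholder for an optional group that didn't match (such as a missing page number) becomes an empty string instead of staying as the literal text <code>$matches[4]</code>, and captured values are URL-encoded before <code>parse_str()</code> decodes them. I believe WordPress core does this substitution with its <code>WP_MatchesMapRegex</code> class, which may be worth using inside WordPress.</p>

<pre><code class="language-php">//  Hypothetical, WordPress-free version of the rule matching above.
//  $rules maps regex patterns to query strings, like $wp_rewrite-&gt;wp_rewrite_rules()
function rewrite_query_vars( string $path, array $rules ): ?array {
    foreach ( $rules as $pattern =&gt; $query ) {
        if ( preg_match( "#^$pattern#", $path, $matches ) ) {
            //  Replace each $matches[N] placeholder with the Nth captured group,
            //  or with "" if that group didn't take part in the match
            $query = preg_replace_callback(
                '/\$matches\[(\d+)\]/',
                function ( array $m ) use ( $matches ): string {
                    return urlencode( $matches[ (int) $m[1] ] ?? "" );
                },
                $query
            );
            //  Turn "index.php?year=2024&amp;monthnum=05" into an array of query vars
            parse_str( (string) parse_url( $query, PHP_URL_QUERY ), $vars );
            return $vars;
        }
    }
    return null;  //  No rule matched this path
}
</code></pre>

<p>For example, with a hypothetical rule <code>'tag/([^/]+)/?$' =&gt; 'index.php?tag=$matches[1]'</code>, the path <code>tag/solar</code> gives <code>array( "tag" =&gt; "solar" )</code>.</p>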

<p>From there, it's a case of iterating over the posts returned by the query. You can <a href="https://gitlab.com/edent/blog-theme/-/blob/master/templates/txt-template.php">see the full code on my GitLab</a>.</p>
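<p>As for the text itself, here's a rough sketch of flattening a post's HTML into plain text while keeping each image's alt text. It isn't necessarily how the linked template does it. The function name is hypothetical, and because it uses a regex on HTML, it only recognises double-quoted <code>alt</code> attributes.</p>

<pre><code class="language-php">//  Hypothetical helper: flatten post HTML into plain UTF-8 text,
//  replacing each &lt;img&gt; with its alt text before the tags are stripped
function html_to_plain_text( string $html ): string {
    //  Only double-quoted alt="..." attributes are recognised
    $html = preg_replace( '/&lt;img\b[^&gt;]*\balt="([^"]*)"[^&gt;]*&gt;/i', '$1', $html );
    //  Remove every remaining tag (images without alt text vanish here)
    $text = strip_tags( $html );
    //  Decode entities such as &amp;amp; back into real characters
    return html_entity_decode( $text, ENT_QUOTES | ENT_HTML5, "UTF-8" );
}
</code></pre>

<p>For example, <code>html_to_plain_text( '&lt;p&gt;Tom &amp;amp; Jerry &lt;img src="cat.png" alt="A cat"&gt;&lt;/p&gt;' )</code> returns <code>Tom &amp; Jerry A cat</code>.</p>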
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=50490&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2024/05/link-relalternate-typetext-plain/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
	</channel>
</rss>
