WebMentions, Privacy, and DDoS - Oh My!
Mastodon - the distributed social network - has two interesting challenges when it comes to how users share links. I'd like to discuss those issues and suggest a possible way forward.
When you click on a link on my website which takes you to another website, your browser sends a Referer header. This says to the other site "Hey, I came here using a link on shkspr.mobi". This is useful because it lets a site owner know who is linking to them. I love seeing which weird and wonderful sites have linked to my content.
It is also something of a privacy nightmare as it lets sites see who is clicking and from where they're clicking. So Mastodon sets a noreferrer attribute on all links. This tells the browser not to send the Referer.
This means sites no longer know who is sending them traffic.
That's either a good thing from a privacy perspective or a disaster from a marketing perspective. Or a little bit of both.
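To make that concrete, here's roughly what the difference looks like from the destination site's point of view. This is a toy sketch in Python with placeholder URLs - a real browser adds or omits the header for you.
Python
import requests

# Following an ordinary link from my blog: the browser includes a Referer header,
# so example.com can see that the visit came from shkspr.mobi.
requests.get("https://example.com/",
             headers={"Referer": "https://shkspr.mobi/blog/some-post/"})

# Following a link marked rel="noreferrer": the header is simply omitted,
# so example.com has no idea where the visitor came from.
requests.get("https://example.com/")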
Here's a related issue. When a user posts a link to your website on Mastodon, the server checks your page to see if there are any oEmbed tags for a rich link preview. But, at the moment, it doesn't check your website's robots.txt file - which lets it know whether it is allowed to scrape your content.
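That check is not hard to do. Here's a rough sketch using Python's standard urllib.robotparser - the user-agent string is a made-up placeholder, and error handling is omitted.
Python
from urllib.robotparser import RobotFileParser
from urllib.parse import urlsplit, urlunsplit

USER_AGENT = "Mastodon-LinkPreview"  # placeholder name for the preview fetcher

def may_fetch_preview(url: str) -> bool:
    """Check the target site's robots.txt before scraping it for preview metadata."""
    parts = urlsplit(url)
    robots_url = urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
    parser = RobotFileParser()
    parser.set_url(robots_url)
    parser.read()  # fetches and parses robots.txt
    return parser.can_fetch(USER_AGENT, url)

# e.g. before building a preview for a shared article
print(may_fetch_preview("https://example.com/some-article"))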
In the case of something like Twitter or Facebook, this is fine. If a million users post a link, the centralised social network checks the link once and caches the result.
With - potentially - thousands of distributed Mastodon sites, this presents a problem. If a popular account posts a link, their instance fetches a rich preview. Then every instance which has users following them also requests that URL. Essentially, this is a DDoS attack.
I can fix you
So here are my thoughts on how to fix this.
When a user posts a link to Mastodon, their instance should send a WebMention to the site hosting the link. This informs the website that someone has shared their content. Perhaps a user could adjust their privacy settings to allow or deny this.
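As a sketch of what sending that notification could involve: discover the WebMention endpoint, then make a simple form-encoded POST. This uses the requests library, skips HTML-based endpoint discovery and error handling, and the URLs are placeholders.
Python
import requests
from urllib.parse import urljoin

def send_webmention(source: str, target: str) -> None:
    """Tell `target` that it has been linked to from `source` (the Mastodon post's URL)."""
    # Discover the WebMention endpoint from the HTTP Link header.
    # A complete implementation would also look for <link>/<a rel="webmention"> in the HTML.
    resp = requests.get(target, timeout=10)
    endpoint = resp.links.get("webmention", {}).get("url")
    if not endpoint:
        return  # the site doesn't advertise a WebMention endpoint
    # The WebMention protocol is a form-encoded POST of `source` and `target`.
    requests.post(urljoin(resp.url, endpoint),
                  data={"source": source, "target": target},
                  timeout=10)

# e.g. a post on mastodon.social linking to example.com
send_webmention("https://mastodon.social/@Edent/123456", "https://example.com/")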
The instance would check the site's robots.txt and, if allowed, scrape the site to see if there were any Open Graph Protocol metadata elements on it.
That metadata should be included in the post as it is shared across the network.
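A minimal sketch of that scraping step might look like this, assuming the robots.txt check above has already passed. It uses requests plus the standard library's HTML parser to pull out any og:* properties from the page's <meta> elements; the fetch_ogp name is just for illustration.
Python
import requests
from html.parser import HTMLParser

class OGPExtractor(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""
    def __init__(self):
        super().__init__()
        self.ogp = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if prop.startswith("og:") and attrs.get("content"):
            self.ogp[prop] = attrs["content"]

def fetch_ogp(url: str) -> dict:
    """Return the Open Graph Protocol metadata found at `url`."""
    extractor = OGPExtractor()
    extractor.feed(requests.get(url, timeout=10).text)
    return extractor.ogp  # e.g. {"og:title": "My amazing site", ...}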
For example, a status could look like this:
JSON
{
    "id": "123",
    "created_at": "2022-03-16T14:44:31.580Z",
    "in_reply_to_id": null,
    "in_reply_to_account_id": null,
    "visibility": "public",
    "language": "en",
    "uri": "https://mastodon.social/users/Edent/statuses/123",
    "content": "<p>Check out https://example.com/</p>",
    "ogp_allowed": true,
    "ogp": {
        "og:title": "My amazing site",
        "og:image:url": "https://cdn.mastodon.social/cache/example.com/preview.jpg",
        "og:description": "A long description. Perhaps the first paragraph of the text."
        ...
    }
    ...
}
When a post is boosted across the network, the instances can see that there is rich metadata associated with the link. If there is an image associated with the post, that will be loaded from the cache on the original Mastodon instance - avoiding overloading the website.
Now, there is a flaw in this idea. A malicious Mastodon server could serve up a fake OGP image and description. So a link to McDonald's might display a fake image promoting Burger King.
To protect against this, a receiving instance could randomly or periodically check the OGP metadata that it receives. If it has been changed, the instance can update it.
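In code terms, that could be as simple as spot-checking a random sample of incoming posts and re-scraping the linked page directly - reusing the hypothetical fetch_ogp() helper from the sketch above. The shared_link field and the sampling rate are assumptions for the example.
Python
import random

def spot_check(posts, sample_rate=0.05):
    """Re-fetch OGP metadata for a small random sample of posts and correct any mismatches."""
    for post in posts:
        if random.random() > sample_rate or not post.get("ogp_allowed"):
            continue
        fresh = fetch_ogp(post["shared_link"])  # scrape the linked page directly
        if fresh != post["ogp"]:
            post["ogp"] = fresh  # the metadata we received was stale - or faked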
Perhaps a diagram would help?

Feedback?
Is this a problem? Does this present a viable solution? Have I missed something obvious? Please leave a comment and let me know 😃
More comments on Mastodon.